A duplex speech-to-speech model changes the premise: The intelligence layer consumes audio and produces audio directly. The model can attend to what was said and how it was said—content and delivery ...
The new model, called VSSFlow, leverages a creative architecture to generate sounds and speech with a single unified system, with state-of-the-art results. Watch (and hear) some demos below. Currently ...