Text-to-Speech that produces natural voices, offline
An easy-to-embed speech synthesis SDK (software development kit) that produces lifelike voices in 65 languages for any portable, mobile or embedded system.
Getting familiar with the technology
Text-to-Speech (or speech synthesis), what is it?
Text-to-Speech (also known as TTS, speech synthesis or voice synthesis) produces voices from text, as its name suggests. This technology relies on voice engines that translate graphemes (units of written text) into phonemes (units of speech sound).
To sound like a human voice, machine learning is used to evaluate and choose the most appropriate way to pronounce words and sentences. With the addition of SSML (Speech Synthesis Markup Language), customization can go further still (pitch, timbre, level, speed…).
A fully customizable and offline Text-to-Speech engine
Different voice, gender, quality to choose from…
We offer a highly diverse set of TTS voices. Our resources cover multiple genders and age ranges, in more than 60 different languages.
Voice quality is also important. To comply with most hardware requirements, we provide different quality levels, from compact to high. This allows our users to choose the right trade-off between quality and resource size.
Give it a twist with customized SSML
Speech Synthesis Markup Language (SSML) is a markup language used to change the way TTS engines read the provided text.
Tone, pitch, timbre, speed, emphasis… are the kinds of parameters that can be personalized.
A typical SSML tag looks like this: <say-as interpret-as="…">VDK</say-as>, which spells out the word VDK instead of pronouncing it as a whole.
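As a rough illustration of how such markup can be assembled, here is a minimal Python sketch that builds a small SSML document with the standard library. It does not use the VDK API, which is not described on this page. The element names (speak, say-as, prosody) come from the W3C SSML specification, but the attribute values shown ("characters", "slow", "high") are assumptions, and the values a given engine accepts may differ.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(acronym: str, sentence: str) -> str:
    """Return an SSML string that spells `acronym`, then reads `sentence`."""
    speak = ET.Element(
        "speak",
        {"version": "1.1", "xmlns": SSML_NS, "xml:lang": "en-US"},
    )

    # Assumption: "characters" asks the engine to read the text letter by
    # letter. Some engines use a different value for the same behaviour.
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "characters"})
    say_as.text = acronym
    say_as.tail = " "

    # <prosody> adjusts speaking rate and pitch for the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = sentence

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("VDK", "is ready to speak.")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps the document well-formed and escapes special characters in the input text.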
Use cases and existing applications
How to leverage offline Text-to-Speech in the field?
Speech synthesis to create audio content for impaired users and enhance service accessibility.
Produce life-like voices for voice assistants or interactive voice response (IVR).
Ability to provide hands-free instructions or other types of information while preserving safety and focus.
Humanize any product or service with a natural voice that can be customized with emphasis.
Create a speech-to-speech translation system that speaks the translated content aloud.
Flexible and modular announcement system with speech synthesis for public transportation services.
Adopting voice AI in your business starts here
Get in touch with our team to bring your company into the voice-first world.
Why should you choose our Text-to-Speech (TTS) solution?
No Wi-Fi or network connection is required to produce advanced, lifelike voices.
Size flexibility from 726 kB up to 580 MB depending on the quality.
Add SSML (Speech Synthesis Markup Language) to fine-tune your TTS.
On-device processing means no network latency, enabling a real-time voice AI experience.
We offer 60 different languages and 115+ voices, so you will most likely find what you need now and in the future.
We support PCM audio output in a variety of audio formats and sampling rates, which makes integration into your audio pipeline easier.
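To show what working with raw PCM output can look like, the following Python sketch wraps a PCM buffer in a WAV container using only the standard library. It is an illustration under stated assumptions, not the VDK API: the buffer is a generated sine tone standing in for engine output, and the sample rate (22050 Hz), bit depth (16-bit) and channel count (mono) are hypothetical values chosen for the example.

```python
import io
import math
import struct
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw signed little-endian PCM bytes in a WAV (RIFF) container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def fake_pcm(n_frames: int = 2205, sample_rate: int = 22050) -> bytes:
    """Stand-in for engine output: a 440 Hz tone as 16-bit mono PCM."""
    samples = (
        int(12000 * math.sin(2 * math.pi * 440 * i / sample_rate))
        for i in range(n_frames)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


wav_bytes = pcm_to_wav(fake_pcm())
print(len(wav_bytes), "bytes of WAV data")
```

Because PCM carries no header, the sample rate, sample width and channel count passed to the wrapper must match the settings the audio was actually produced with, otherwise playback will sound too fast, too slow or distorted.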
For developers, by developers
Start developing your voice AI solution with the VDK
Sign up to request a free trial
Share your information to access an evaluation version of the VDK.
Develop and test your use cases
Design, create and try all of your features.
Export and integrate
Available for Windows, Android or Linux devices.
Good morning, Guten Tag, Bonjour, Bom Dia, Buongiorno...
Language support is not a problem
Companies that have chosen our solution
Our customers' feedback on our solutions
“Vivoka’s technologies have been integrated into our solutions dedicated to Human Factors in industry, guaranteeing performance and data confidentiality.”
Head of the Human Factors Technology Laboratory
"Vivoka’s solutions can run offline, on-device, to operate anywhere, anytime and with a broad language support capability which is for our products a distinct competitive advantage."
Chief Executive Officer (CEO)
"We partnered with Vivoka on several innovative projects on embedded Linux system, and delivered, in a very short time, multi-lingual natural voice interactions fully running “at the edge” to our customers."
VP of Technology
Requirements & Quick-Start
How to develop with our offline Text-to-Speech engine?
Our Text-to-Speech engine can produce lifelike voices in more than 65 languages. We cover 180+ different voices, male and female, across different age ranges to fit your branding requirements.
The voices come in 5 different quality types:
Compact is the lightest, Pro is the heaviest of them. Depending on the resource size, the quality of the voice and how natural it sounds will be affected.
– Language count: 65
– Resource Size:
- Compact: 1 to 30MB
- Pro: 5 to 100MB
- High: 30 to 300MB
- Premium: 40 to 500MB
– SDK Code Size: from 5MB to 65MB
– Supported Hardware: Microprocessor Units
– Supported Platforms:
- Windows – x86_64
- Linux – x86_64 | armv7hf | armv8
- Android 6.0 (API 23)
– On the Device
Fully-embedded voice technology for brands seeking the convenience of a voice user interface without the privacy or connectivity concerns of the internet. Full access to custom commands and the ability to instantly update command codes during development make voice-enabling your product fast and easy.
– On Premise
Get the power of cloud connectivity combined with the reliability of embedded voice technology. On premise (or hybrid) solutions ensure that your device is always-on and responsive to commands. Seamlessly push product updates and deliver a broader voice experience with the level of cloud-connectivity that best matches your product and users.
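To relate the resource sizes listed above to a device's storage budget, here is a small Python sketch. The size ranges are copied from the list in this section. The function name and the selection rule (keep the last listed quality type whose maximum size fits the budget) are hypothetical, and the sketch assumes the types are listed from lightest to heaviest, as the size ranges suggest.

```python
from typing import Optional

# Resource size ranges in MB, copied from the list above: (minimum, maximum).
QUALITY_SIZES_MB = {
    "Compact": (1, 30),
    "Pro": (5, 100),
    "High": (30, 300),
    "Premium": (40, 500),
}


def best_guaranteed_type(budget_mb: float) -> Optional[str]:
    """Return the last listed quality type whose maximum size fits the budget.

    Returns None when even the smallest maximum exceeds the budget.
    """
    choice = None
    for quality, (_min_mb, max_mb) in QUALITY_SIZES_MB.items():
        if max_mb <= budget_mb:
            choice = quality
    return choice


for budget in (20, 120, 500):
    print(budget, "MB ->", best_guaranteed_type(budget))
```

Checking against the maximum of each range is deliberately conservative. A specific voice may be much smaller than the upper bound, so a real selection would use the actual size of the chosen voice resource.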
Discover other technologies in our stack
Trigger speech recognition process by detecting a unique word or sentence.
Turn human speech into text data that can be processed by complex systems.
Seamlessly identify or authenticate users by recognizing their voice pattern.
Enhance the quality of the voice audio signal to boost speech recognition accuracy.
Speech-to-text: uses & evolution
Speech-to-text (or automatic speech recognition - ASR) and voice technologies in general have become indispensable features in upcoming products and/or services. Already existing ones will also...