An embedded engine that produces real-time synthetic voice (Text-to-Speech) to vocalize your use cases with natural-sounding feedback.
What is Text-to-Speech?
Text-to-Speech, also known as voice synthesis or Text-to-Voice, is a technology that generates speech in real time to read configured text aloud. The synthetic voices can be selected by language, gender and quality.
Human language is full of particularities that make it as rich as it is complex. To reproduce human speech as accurately as possible, parameters such as pitch, speed, power, emotion and word pronunciation can be customized.
Text-to-Speech is a key component of speech-enabled interfaces because it provides the feedback in human-machine interaction. Beyond improving the user experience, voice synthesis is a great tool for making services and products more accessible to visually impaired individuals.
Our Must-have features
Our TTS (Text-to-Speech) is available in 65 languages, covering the large majority of speakers and needs, so you can vocalize any use case.
Many parameters can be set to your preferences, such as pitch, speed, power, emotion and pronunciation, to fit the synthetic voice even more closely to its context.
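As a rough illustration of how such parameters are commonly expressed, the sketch below wraps text in an SSML `<prosody>` element (SSML is the W3C markup language for speech synthesis). Whether this engine accepts SSML input is an assumption, and `build_prosody_ssml` is a hypothetical helper, not part of the SDK.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_prosody_ssml(text, pitch="medium", rate="medium", volume="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper).

    pitch, rate and volume take W3C SSML values such as "high", "slow" or
    "loud", or relative values like "+10%". Mapping "speed" to `rate` and
    "power" to `volume` is an assumption, not something the SDK documents.
    """
    attrs = " ".join(
        f"{name}={quoteattr(value)}"
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
    )
    # escape() protects characters such as & and < inside the spoken text.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<prosody {attrs}>{escape(text)}</prosody></speak>"
    )


ssml = build_prosody_ssml("Turn left in 200 meters", pitch="+10%", rate="slow")
print(ssml)
```

Keeping the parameters in one small function makes it easy to define a few presets (for example, a slower and louder voice for navigation prompts) and reuse them across messages.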
Large range of voices
More than 100 voices are available. They are sorted by gender, emotion and quality so you can tailor the responses you give your users in selected contexts.
Very low CPU usage
Our TTS needs only a minimal CPU load to operate. This comes from its lightweight design and the technical performance of the tools used to create it.
Some of our clients use Text-to-Speech
Text-to-Speech, or Speech Synthesis, is a commonly used technology in the world of voice interfaces and assistants, especially for audio feedback and user information. Some of our clients have developed interesting features for their projects and innovations thanks to our embedded TTS.
Operating Software for embedded SDK platform
- Windows: 32-bit and 64-bit
- Linux x86: 32-bit and 64-bit
Standard ports and Tools
- Linux ARM: ARM32 Hardfp, ARM32 Softfp, ARM64
- Android v4.0 (Ice Cream Sandwich), API level 14+, ARM32-v7a
- Android v7.0 (Nougat), API level 24+, ARM64-v8a
- iOS: arm64, armv7, armv7s, i386 and x86_64 simulator
The code size for a fully featured TTS Embedded engine is 10 to 13.5 MB, depending on the target platform. This can be optimized based on the required language set, features and compiler choices.
Voice Operating Point (VOP) with relative flash size (w/o code) and RAM usage
- Embedded Compact – Small versatile TTS suited for constrained platforms
- Flash Size: Ave. 10MB / Max. 21MB
- RAM Usage: Ave. 6MB / Max. 23MB
- Embedded Pro – High quality TTS optimized for navigation, info readout and reading capabilities
- Flash Size: Ave. 55MB / Max. 131MB
- RAM Usage: Ave. 14MB / Max. 38MB
- Embedded High – High quality TTS read-out for SMS, news, e-mail on embedded targets
- Flash Size: Ave. 120MB / Max. 325MB
- RAM Usage: Ave. 24MB / Max. 69MB
- Embedded Premium – Highest quality deep learning based concatenative synthesis, selected voice only
- Flash Size: Ave. 337MB / Max. 558MB
- RAM Usage: Ave. 159MB / Max. 198MB
Multi-lingual voices include recorded material for one or several foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) compared to the numbers above.
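To make the sizing arithmetic concrete, here is a minimal sketch that estimates a worst-case memory budget for one voice. The figures are the maximums listed above for two operating points; adding the engine code size to the voice's flash figure is an assumption (the list gives flash size without code), and `worst_case_budget` is a hypothetical helper.

```python
# Maximum footprints in MB, copied from the operating-point list above.
# Only two operating points are shown to keep the sketch short.
MAX_FOOTPRINT_MB = {
    "Embedded Pro": {"flash": 131, "ram": 38},
    "Embedded High": {"flash": 325, "ram": 69},
}

MAX_CODE_SIZE_MB = 13.5        # upper bound of the engine code size
MULTILINGUAL_OVERHEAD = 0.5    # "up to 50% more memory"


def worst_case_budget(operating_point, multilingual=False):
    """Return worst-case (flash_mb, ram_mb) for one voice.

    Assumption: the engine code is stored in the same flash as the voice
    data, so the two figures are added together.
    """
    voice = MAX_FOOTPRINT_MB[operating_point]
    factor = 1 + MULTILINGUAL_OVERHEAD if multilingual else 1
    flash = voice["flash"] * factor + MAX_CODE_SIZE_MB
    ram = voice["ram"] * factor
    return flash, ram


print(worst_case_budget("Embedded Pro", multilingual=True))
```

For a multi-lingual Embedded Pro voice this gives at most 131 × 1.5 + 13.5 = 210 MB of flash and 38 × 1.5 = 57 MB of RAM.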
Our Text-to-Speech (TTS) is an embedded technology that is made to be integrated into devices. To do so, these products need to meet specific criteria to handle the speech synthesis and make it work properly to perform your use case.
Frequently Asked Questions on Text-to-Speech (TTS)
A few things to know…
Text-to-Speech can be tricky since it is a complex technology. We cover some of the recurrent topics about it in order to give you insights.
Is it possible to customize the generated voice from the TTS engine?
Several parameters can be modified directly with our tool, such as pitch, speed, power, emotion or pronunciation.
Do the synthetic voices have limits in terms of text length?
You can vocalize text of any length you need and tune the voice parameters to make it sound natural.
Is specific hardware required to play the synthetic voices?
Text-to-Speech lets you generate voice output in different file formats, especially the most popular ones (mp3, wav…).
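As an illustration of the WAV case, the sketch below stores raw audio samples in a WAV file using Python's standard `wave` module. It assumes the synthesized audio is available as 16-bit mono PCM samples; the sample rate and the generated tone are placeholders standing in for real engine output, and `write_wav` is a hypothetical helper.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        # Pack the samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Placeholder audio: a half-second 440 Hz tone stands in for engine output.
rate = 22050
tone = [int(12000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate // 2)]
write_wav("tts_output.wav", tone, rate)
```

The resulting file can be played by any standard audio player, which is why no dedicated playback hardware is implied by the file format itself.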
How can you produce a synthetic voice that doesn't sound robotic?
In most cases, pitch, speed and breaks are responsible for a robotic sound. With our engine, you can customize these parameters at will.
Can the Text-to-Speech engine correctly pronounce specialized or uncommon words?
The phonetic editor lets you build custom phonetic transcriptions for technical words that might otherwise be pronounced incorrectly.
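The phonetic editor's own format is not described here, so the sketch below only illustrates the underlying idea with a simpler stand-in: a lookup table that replaces hard-to-pronounce words with respellings before the text is sent to synthesis. A real phonetic editor typically works on phonemes rather than respellings; the lexicon entries and `apply_lexicon` are hypothetical.

```python
import re

# Hypothetical lexicon: written form -> respelling the voice reads correctly.
CUSTOM_PRONUNCIATIONS = {
    "nginx": "engine x",
    "SQL": "sequel",
    "kubectl": "cube control",
}


def apply_lexicon(text, lexicon=CUSTOM_PRONUNCIATIONS):
    """Replace whole words found in the lexicon, ignoring case."""
    if not lexicon:
        return text
    lookup = {word.lower(): spoken for word, spoken in lexicon.items()}
    # \b keeps the match to whole words, so "sqlite" is left untouched.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in lexicon) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


print(apply_lexicon("Restart nginx before running the SQL migration."))
```

This prints "Restart engine x before running the sequel migration."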