What is Text-to-Speech?
Text-to-Speech, also known as voice synthesis or Text-to-Voice, is a technology that generates speech in real time from a given text. The synthetic voices can be selected by language, gender and quality.
Human language is full of particularities that make it as rich as it is complex. To produce the most natural-sounding speech, parameters such as pitch, speed, power, emotion and word pronunciation can be customized.
Natural Voice Feedback
Text-to-Speech is a key component of speech-enabled interfaces because it provides the feedback in human-machine interaction. Beyond improving the user experience, voice synthesis is a great tool for making services and products more accessible to visually impaired individuals.
Some use cases made real with Text-to-Speech (TTS)
Text-to-Speech is the other end of voice interactions, giving conversational AI the ability to answer users with human-like voices. The applications are as numerous as the benefits, from branding to usability.
Voice User Experience
Create human-like interactions with voice assistants by giving them a natural voice to answer and interact with users.
Voice Information System
Use voice synthesis to generate customized voices that provide clients or users with spoken information in addition to traditional displays.
Make products and services available to everyone, including the visually impaired, thanks to natural voice synthesis.
A technology available inside the Voice Development Kit
1. Choose your voices and languages
Select your favorite voices and languages from our wide range of resources (60+ languages, 110+ voices) to meet the expectations of your use case.
2. Test and add SSML to customize
Use sample texts to try different voices in different languages and find the best mix. Customize your selection with SSML to change tone, pitch, timbre and more…
3. Save and export your voices
When the expected results are reached, save your voices and export them in order to integrate them inside your products and devices.
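Step 2 mentions SSML, the W3C markup language for controlling synthesized speech. As a rough illustration, the Python sketch below builds a small SSML document using the standard `prosody` and `break` elements. The function name `build_ssml`, its default values and the sample sentences are assumptions made for this example; which SSML elements and value ranges a given voice supports is not specified here and should be checked against the engine.

```python
import xml.etree.ElementTree as ET


def build_ssml(first, second, pitch="+10%", rate="90%", pause_ms=400, lang="en-US"):
    """Return an SSML string: two sentences sharing prosody settings,
    separated by a pause. All defaults are illustrative values."""
    speak = ET.Element("speak", {"version": "1.1", "xml:lang": lang})
    # <prosody> adjusts pitch and speaking rate for everything it contains.
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "rate": rate})
    prosody.text = first
    # <break> inserts a pause; the text after it is stored as the element's tail.
    pause = ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    pause.tail = second
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Your train leaves at nine.", "Please go to platform two.")
print(ssml)
```

Building the markup with an XML library rather than string concatenation means characters such as `&` or `<` in the input text are escaped automatically.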
Benefits of our Text-to-Speech technology
Multilingual Natural Text-to-Speech
Our Text-to-Speech (TTS) is available in 65 languages, covering the large majority of speakers and needs so that any use case can be vocalized.
Many Available Voices
More than 100 voices are available. They are sorted by gender, emotion and quality so that you can tailor the answers you give your users to each context.
Fully Customizable Voices
Many parameters can be set to your preferences, such as pitch, speed, power, emotion and pronunciation, to fit the synthetic voice even more closely to its context.
Lightweight Text-to-Speech Engine
Depending on the voice quality, the required storage capacity can vary considerably. This makes it possible to build TTS for anything from low-spec devices to much more powerful systems.
Try Text-to-Speech with the Voice Development Kit
The purpose of this form is to get to know you first and foremost in order to offer you the best test configuration for your needs.
If we determine that your use case is suitable, one of our consultants will contact you to start the evaluation period.
Your information is kept in our database and will only be used to contact you for this evaluation purpose.
Powered by the Voice Development Kit
Standard ports and Tools
- Android (version 6.0, API 23)
- Linux: x86_64, armv7hf, armv8
- Windows: x86_64
Voice Operating Point (VOP) with relative flash size (w/o code) and RAM usage
- Embedded Compact – Small versatile TTS suited for constrained platforms
  - Flash Size: Avg. 10 MB / Max. 21 MB
  - RAM Usage: Avg. 6 MB / Max. 23 MB
- Embedded Pro – High-quality TTS optimized for navigation, info readout and reading capabilities
  - Flash Size: Avg. 55 MB / Max. 131 MB
  - RAM Usage: Avg. 14 MB / Max. 38 MB
- Embedded High – High-quality TTS read-out for SMS, news, e-mail on embedded targets
  - Flash Size: Avg. 120 MB / Max. 325 MB
  - RAM Usage: Avg. 24 MB / Max. 69 MB
- Embedded Premium – Highest-quality deep-learning-based concatenative synthesis, selected voice only
  - Flash Size: Avg. 337 MB / Max. 558 MB
  - RAM Usage: Avg. 159 MB / Max. 198 MB
Multilingual voices include recorded material for one or several foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) than the figures above.
Our Text-to-Speech solution is an embedded technology that is made to be integrated into devices. To do so, these products need to meet specific criteria to handle the technology and make it work properly to perform your use case.
Frequently Asked Questions on
A few things to know…
Text-to-Speech is a well-known technology, but not to everybody. We cover some of the recurring questions about it to give you some insight.
Is it possible to customize the generated voice from the TTS engine?
Several parameters can be modified directly in our tool, such as pitch, speed, power, emotion or pronunciation.
Do the synthetic voices have limits in terms of text length?
You can produce vocalized text of any length you need and tune the voice parameters to make it sound natural.
What are the technical specifications required to integrate Text-to-Speech?
Text-to-Speech specifications are essential for its integration. To get access to this information, please contact us.
Is specific hardware required to play the synthetic voices?
Text-to-Speech lets you generate voice output in different file formats, including the most popular ones (mp3, wav…).
How can I produce a synthetic voice that doesn't sound robotic?
In most cases, pitch, speed and breaks are responsible for a robotic sound. With our engine, you can customize these parameters at will.
Can Text-to-Speech correctly pronounce specialized or uncommon words?
The phonetic editor lets you define custom phonetic transcriptions for technical words that might otherwise be mispronounced.
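The phonetic editor itself is a tool, and its file format is not described here. As a generic illustration of the same idea, the Python sketch below applies a small custom lexicon to a text by wrapping matching words in the standard SSML `phoneme` element with IPA transcriptions. The lexicon entries, the transcriptions and the function name `apply_lexicon` are illustrative assumptions, and support for `phoneme` and IPA depends on the engine and voice.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical custom lexicon: word -> IPA transcription (illustrative values).
LEXICON = {
    "SQL": "ˈsiːkwəl",
    "nginx": "ˌɛndʒɪnˈɛks",
}


def apply_lexicon(text, lexicon):
    """Wrap every lexicon word found in `text` in an SSML <phoneme> element."""
    if not lexicon:
        return "<speak>" + escape(text) + "</speak>"
    # Longest entries first so that longer words win over shorter prefixes.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the result stays valid XML.
        parts.append(escape(text[last:match.start()]))
        word = match.group(1)
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>'
            f"{escape(word)}</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


result = apply_lexicon("Restart nginx & check the SQL logs.", LEXICON)
print(result)
```

Matching is case-sensitive and whole-word only in this sketch, so "sql" or "SQLite" would be left untouched; a production lexicon would need rules for such variants.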