Embedded engine that produces a real-time synthetic voice (Text-to-Speech) to vocalize your use cases with natural feedback.

What is Text-to-Speech?

Text-to-Speech, also known as voice synthesis or Text-to-Voice, is a technology that generates a voice in real time to read configured text aloud. These synthetic voices can be selected according to language, gender and quality.

Human language is full of particularities that make it as rich as it is complex. To reproduce it as accurately as possible, parameters such as pitch, speed, power, emotion and word pronunciation can be customized.

Text-to-Speech is a key component of speech-enabled interfaces because of its feedback role in human-machine interaction. Beyond improving the user experience, voice synthesis is a great tool for making services and products more accessible to visually impaired individuals.

Our Must-have features


Multilingual voices

Our TTS (Text-to-Speech) is available in 65 languages, which cover the large majority of speakers and needs, so you can vocalize any use case.

Customizable voices

Many parameters, such as pitch, speed, power, emotion, pronunciation…, can be set to your preference so that the synthetic voice fits its context even better.


Large range of voices

More than 100 voices are available. They are sorted by gender, emotion and quality so that you can tailor the answers you give your users to each context.

Very low CPU usage

Our TTS needs only a minimal CPU load to operate. This comes from its lightweight design and the technical performance of the tools used to build it.

Some of our clients use Text-to-Speech

Text-to-Speech, or Speech Synthesis, is a commonly used technology in the world of voice interfaces and assistants, especially for audio feedback and user information. Some of our clients have developed interesting features for their projects and innovations thanks to our embedded TTS.

Your project has never been this close to its solution!

Browsing through our projects and technologies might have given you some insight into what is possible when working with us. We can help you further to achieve your goals.

Operating Software for embedded SDK platform

  • Windows: 32-bit and 64-bit
  • Linux x86: 32-bit and 64-bit

Standard ports and Tools

  • Linux ARM: ARM32 Hardfp, ARM32 Softfp, ARM64
  • Android v4.0 (Ice Cream Sandwich), API level 14+, ARM32-v7a
  • Android v7.0 (Nougat), API level 24+, ARM64-v8a
  • iOS: arm64, armv7, armv7s, i386 and x86_64 simulator

The code size for a fully featured embedded TTS engine is 10 to 13.5 MB, depending on the target platform. This can be optimized based on the required language set, features and compiler choices.

Voice Operating Point (VOP) with relative flash size (w/o code) and RAM usage

  • Embedded Compact – Small versatile TTS suited for constrained platforms
    • Flash Size: Ave. 10MB / Max. 21MB
    • RAM Usage: Ave. 6MB / Max. 23MB
  • Embedded Pro – High quality TTS optimized for navigation, info readout and reading capabilities
    • Flash Size: Ave. 55MB / Max. 131MB
    • RAM Usage: Ave. 14MB / Max. 38MB
  • Embedded High – High quality TTS read-out for SMS, news, e-mail on embedded targets
    • Flash Size: Ave. 120MB / Max. 325MB
    • RAM Usage: Ave. 24MB / Max. 69MB
  • Embedded Premium – Highest quality deep-learning-based concatenative synthesis, selected voices only
    • Flash Size: Ave. 337MB / Max. 558MB
    • RAM Usage: Ave. 159MB / Max. 198MB

Multilingual voices include recorded material for one or several foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) than the numbers above.
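To get a feel for these numbers, here is a minimal Python sketch that estimates a worst-case memory budget for one voice. The figures are copied from the list above. The function name and the way the numbers are combined are assumptions made for illustration: it supposes that the engine code (upper bound 13.5 MB) is stored in flash next to the voice data and that a multilingual voice needs the full 50% extra. Actual requirements should be confirmed for your target platform.

```python
# Figures copied from the operating-point list above, in MB.
# Each entry: (average flash, max flash, average RAM, max RAM).
OPERATING_POINTS = {
    "Embedded Compact": (10, 21, 6, 23),
    "Embedded Pro": (55, 131, 14, 38),
    "Embedded High": (120, 325, 24, 69),
    "Embedded Premium": (337, 558, 159, 198),
}

CODE_SIZE_MAX_MB = 13.5    # upper end of the 10 to 13.5 MB engine code size
MULTILINGUAL_FACTOR = 1.5  # "up to 50% more memory" for multilingual voices


def worst_case_budget(vop: str, multilingual: bool = False) -> dict:
    """Return a worst-case flash and RAM estimate (MB) for a single voice.

    Hypothetical helper: it assumes the engine code sits in flash
    alongside the voice data, which the page does not state explicitly.
    """
    if multilingual and vop == "Embedded Compact":
        raise ValueError("Multilingual voices are not released for Embedded Compact")
    _, max_flash, _, max_ram = OPERATING_POINTS[vop]
    factor = MULTILINGUAL_FACTOR if multilingual else 1.0
    return {
        "flash_mb": max_flash * factor + CODE_SIZE_MAX_MB,
        "ram_mb": max_ram * factor,
    }


if __name__ == "__main__":
    for name in OPERATING_POINTS:
        print(name, worst_case_budget(name))
```

Under these assumptions, `worst_case_budget("Embedded Pro", multilingual=True)` gives 210.0 MB of flash and 57.0 MB of RAM.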

Text-to-Speech Requirements

Our Text-to-Speech (TTS) is an embedded technology made to be integrated into devices. To do so, these devices need to meet specific criteria so they can handle speech synthesis and run your use case properly.

Frequently Asked Questions on Text-to-Speech (TTS)

A few things to know…

Text-to-Speech can be tricky because it is a complex technology. We cover some of the recurring topics here to give you some insight.

Is it possible to customize the generated voice from the TTS engine?

Several parameters can be modified directly with our tool, such as pitch, speed, power, emotion or pronunciation.

Do the synthetic voices have limits in terms of text length?

You can produce vocalized text as long as you need it to be, and tune the voice parameters to make it sound natural.

What are the technical specifications for integrating Text-to-Speech?

Text-to-Speech specifications are essential for its integration. To get access to this information, please contact us.

Is specific hardware required to play the synthetic voices?

Text-to-Speech lets you generate voice output in different file formats, especially the most popular ones (mp3, wav…).

How do you produce a synthetic voice that doesn't sound robotic?

In most cases, pitch, speed and breaks are responsible for this aspect. With our engine, you can customize these parameters at will.
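As an illustration of how pitch, speed and breaks are commonly expressed, the Python sketch below builds a W3C SSML document using only the standard library. This page does not say which input format the engine accepts, so SSML support is an assumption here, and `build_ssml` with its default values is a hypothetical helper rather than part of the product.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences, rate="95%", pitch="-1st", pause_ms=300, lang="en-US"):
    """Wrap sentences in SSML with prosody settings and pauses between them.

    Hypothetical helper: whether the engine accepts SSML is an assumption.
    """
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    # <prosody> sets the speaking rate and a relative pitch shift.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    for i, text in enumerate(sentences):
        sentence = ET.SubElement(prosody, "s")
        sentence.text = text
        # Insert a pause between sentences, but not after the last one.
        if i < len(sentences) - 1:
            ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Your route is ready.", "Turn left in 200 meters."]))
```

Slightly slower speech, a small pitch shift and short pauses between sentences are typical first adjustments. The right values depend on the voice and the use case.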

Can Text-to-Speech pronounce specialized or uncommon words correctly?

The phonetic editor lets you build custom phonetic transcriptions for technical words that might otherwise be pronounced incorrectly.

More technologies to discover

Voice Development Kit

Build a voice assistant or interface in record time with our all-in-one voice solution

Automatic Speech Recognition

Technology used to turn voice into text or commands automatically

Wake Word Detection Tool

An easy tool to generate multilingual wake words to embed in devices