Offline Text-to-Speech (TTS)

An embedded engine that produces real-time synthetic voice (Text-to-Speech) to vocalize
your use cases with natural voice-based feedback.


What is Text-to-Speech?

Text-to-Speech, also known as voice synthesis or Text-to-Voice, is a technology that generates speech in real time from configured text. The synthetic voices can be selected by language, gender and quality.


Human language is full of particularities that make it as rich as it is complex. To produce the most natural-sounding speech, parameters such as pitch, speed, power, emotion and word pronunciation can be customized.

Natural Voice Feedback

Text-to-Speech is a key component of speech-enabled interfaces because of its feedback role in human-machine interaction. Besides improving the user experience, voice synthesis is a great tool for making services and products more accessible to visually impaired individuals.

Some use cases made real with Text-to-Speech (TTS)

Text-to-Speech is the other end of voice interactions, giving conversational AI the ability to answer users with human-like voices. The applications are as endless as the benefits, from branding to usability.

Voice User Experience

Create human-like interactions with voice assistants by giving them a natural voice to answer and interact with users.

Voice Information System

Use voice synthesis to generate customized voices that provide clients or users with spoken information in addition to traditional displays.

Voice Accessibility

Make products and services available to anyone, including the visually impaired, thanks to natural voice synthesis.

A technology available inside the Voice Development Kit

1. Choose your voices and languages

Select your favorite voices and languages from our large range of resources (60+ languages, 110+ voices) to meet the expectations of your use case.

2. Test and add SSML to customize

Use sample texts to try different voices in different languages and benchmark the best mix. Customize your selection with SSML to change tone, pitch, timbre and more…
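As an illustration of the SSML customization step, here is a minimal Python sketch that wraps a sample text in a standard W3C SSML prosody element. The function name build_ssml and the default values are hypothetical, and which SSML attributes the engine actually honors is an assumption to check against its documentation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.1 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, pitch: str = "+10%", rate: str = "95%",
               volume: str = "medium", lang: str = "en-US") -> str:
    """Wrap plain text in <speak><prosody>...</prosody></speak>.

    Hypothetical helper: the attribute names come from the SSML standard,
    not from any vendor-specific API.
    """
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": SSML_NS,
        "xml:lang": lang,
    })
    prosody = ET.SubElement(speak, "prosody", {
        "pitch": pitch,    # relative pitch change
        "rate": rate,      # speaking rate as a percentage of the default
        "volume": volume,  # loudness label
    })
    prosody.text = text    # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your destination is on the left."))
```

Generating the markup through an XML library rather than string concatenation keeps special characters in the sample text from breaking the document.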

3. Save and export your voices

When the expected results are reached, save your voices and export them in order to integrate them inside your products and devices.

Benefits of our Text-to-Speech technology

Multilingual Natural Text-to-Speech

Our Text-to-Speech (TTS) is available in 65 languages, covering the large majority of speakers and needs, so that any use case can be vocalized.

Many Available Voices

More than 100 voices are available. These are sorted by gender, emotions and quality for you to customize the answers you give to your users in selected contexts.

Fully Customizable Voices

Many parameters can be set to your preferences, such as pitch, speed, power, emotion and pronunciation, to fit the synthetic voice even more closely to its context.

Lightweight Text-to-Speech Engine

Depending on the voice quality, the required storage capacity can vary a lot. This makes it possible to build TTS for anything from low-spec devices to much more powerful systems.

Try Text-to-Speech with the Voice Development Kit


The purpose of this form is to get to know you first and foremost in order to offer you the best test configuration for your needs.

If we determine that your use case is suitable, one of our consultants will contact you to start the evaluation period.

Your information is kept in our database and will only be used to contact you for this evaluation purpose.



Your project has never been this close to its solution!

Browsing through our projects and technologies might have given you some insight into what is possible when working with us. We can help you further to achieve your goals.

Standard ports and Tools

  • Android (version 6.0, API 23)
  • Linux: x86_64, armv7hf, armv8
  • Windows: x86_64


Voice Operating Point (VOP) with relative flash size (w/o code) and RAM usage

  • Embedded Compact – Small versatile TTS suited for constrained platforms
    • Flash Size: Ave. 10MB / Max. 21MB
    • RAM Usage: Ave. 6MB / Max. 23MB
  • Embedded Pro – High quality TTS optimized for navigation, info readout and reading capabilities
    • Flash Size: Ave. 55MB / Max. 131MB
    • RAM Usage: Ave. 14MB / Max. 38MB
  • Embedded High – High quality TTS read-out for SMS, news, e-mail on embedded targets
    • Flash Size: Ave. 120MB / Max. 325MB
    • RAM Usage: Ave. 24MB / Max. 69MB
  • Embedded Premium – Highest quality deep learning based concatenative synthesis, selected voice only
    • Flash Size: Ave. 337MB / Max. 558MB
    • RAM Usage: Ave. 159MB / Max. 198MB

Multilingual voices include recorded material for one or several foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) than the figures above.
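To make these figures easier to apply, the following Python sketch picks the highest operating point whose worst-case ("Max.") footprint fits a given flash and RAM budget. The class and function names are hypothetical. The sketch assumes the operating points are listed in ascending quality order, and it applies the full 50% overhead to multilingual voices as a conservative upper bound. Flash figures exclude the engine code itself, as stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    flash_max_mb: int   # worst-case flash size, without engine code
    ram_max_mb: int     # worst-case RAM usage
    multilingual: bool  # whether multilingual voices are released for it


# "Max." values copied from the list above, smallest to largest.
OPERATING_POINTS = [
    OperatingPoint("Embedded Compact", 21, 23, False),
    OperatingPoint("Embedded Pro", 131, 38, True),
    OperatingPoint("Embedded High", 325, 69, True),
    OperatingPoint("Embedded Premium", 558, 198, True),
]

# "up to 50% more memory (flash and RAM)" for multilingual voices.
MULTILINGUAL_FACTOR = 1.5


def best_operating_point(flash_mb: float, ram_mb: float,
                         multilingual: bool = False) -> Optional[OperatingPoint]:
    """Return the largest operating point that fits the budget, or None."""
    factor = MULTILINGUAL_FACTOR if multilingual else 1.0
    best = None
    for vop in OPERATING_POINTS:
        if multilingual and not vop.multilingual:
            continue  # Embedded Compact has no multilingual voices
        if (vop.flash_max_mb * factor <= flash_mb
                and vop.ram_max_mb * factor <= ram_mb):
            best = vop  # later entries are higher quality
    return best


if __name__ == "__main__":
    choice = best_operating_point(flash_mb=150, ram_mb=64)
    print(choice.name if choice else "No operating point fits")
```

Sizing against the maximum rather than the average figures is a deliberately cautious choice. A specific voice may need far less, so the average values can serve as a first estimate when the budget is tight.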

Tech Requirements

Our Text-to-Speech solution is an embedded technology that is made to be integrated into devices. To do so, these products need to meet specific criteria to handle the technology and make it work properly to perform your use case.

Frequently Asked Questions

A few things to know…

Text-to-Speech is a well-known technology, but not to everybody. We cover some of the recurring topics here to give you some insight.

Is it possible to customize the generated voice from the TTS engine?

Several parameters can be modified directly with our tool, such as pitch, speed, power, emotion or pronunciation.

Do the synthetic voices have limits in terms of text length?

You can produce vocalized text as long as you need it to be and optimize the voice parameters to make it sound natural.

What are the technical specifications in order to integrate Text-to-Speech?

Text-to-Speech specifications are essential for its integration. To get access to this information, please contact us.

Is specific hardware required to play the synthetic voices?

Text-to-Speech allows you to create voice output in different file formats, especially the most popular ones (mp3, wav…).

How to produce a synthetic voice that doesn't sound robotic?

In most cases, pitch, speed and breaks are responsible for this aspect. With our engine, you can customize these parameters at will.
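As a rough illustration of how breaks can be controlled, this Python sketch splits a text into sentences and inserts a standard SSML break element between them. The helper name add_breaks and the 300 ms default are hypothetical, the sentence splitter is deliberately naive (abbreviations such as "e.g." would be split too), and support for these elements in the engine is an assumption.

```python
import re
from xml.sax.saxutils import escape


def add_breaks(text: str, pause_ms: int = 300) -> str:
    """Return SSML with a fixed pause between sentences.

    Naive approach: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape each sentence so characters like '&' do not break the markup.
    body = pause.join(f"<s>{escape(s)}</s>" for s in sentences)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(add_breaks("Turn left in 200 meters. Then keep right!"))
```

Shorter pauses tend to sound rushed and longer ones hesitant, so the value is worth tuning by ear for each voice.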

Is Text-to-Speech able to correctly say specialized or uncommon words?

The phonetic editor allows you to build custom phonetics for technical words that might otherwise be pronounced incorrectly.
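The phonetic editor itself is not described here, but the same idea can be sketched with the standard SSML phoneme element. In the Python example below, the dictionary, its IPA transcriptions and the function name apply_pronunciations are illustrative assumptions, not part of the product.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical custom dictionary: word -> IPA transcription (illustrative values).
CUSTOM_PRONUNCIATIONS = {
    "Nginx": "ˈɛndʒɪnˌɛks",
    "SQL": "ˈsiːkwəl",
}


def apply_pronunciations(text: str, lexicon: dict) -> str:
    """Wrap every known word in an SSML <phoneme> element."""
    if not lexicon:
        return escape(text)
    # Longest words first so that longer entries win over their prefixes.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        word = match.group(0)
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>'
            f"{escape(word)}</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(apply_pronunciations("Restart Nginx & retry the SQL query.",
                               CUSTOM_PRONUNCIATIONS))
```

Matching is case-sensitive and limited to whole words, which keeps an entry like "SQL" from being applied inside longer identifiers.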

More technologies to discover…


Continuous and on-device voice transcription to transform speech into written text

Voice Commands (ASR)

Grammar-based automatic speech recognition for specific vocabulary voice commands

Wake-up Word

Easy tool to generate multilingual wake-up words to voice-activate any device

Voice Biometrics

Authenticate or identify users with offline text (in)dependent voice biometrics models