Speech-to-Text (STT)

Continuous, multilingual Speech-to-Text technology that turns sentences and
discussions into written text using machine learning models.

What is Speech-to-Text?

Speech-to-Text is a voice technology, based on deep learning language models, that is used to transform audio signals into transcribed text.

The results are determined statistically: for the context it identifies, the model favors the most frequent sentence structures and word occurrences.

Some use cases made real with Speech-to-Text

Speech-to-Text is the foundation of the speech recognition systems and voice assistants we know. The technology is designed to be paired with other solutions to produce innovative voice use cases.

Voice Transcription

Automatic voice transcription of discussions and meetings to be processed as voice dictation with specific speech recognition models.

Voice Commands

Turning voice into text to be interpreted by NLP/NLU engines to identify the user’s intent for voice commands.
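
To illustrate the hand-off from transcription to intent detection, here is a minimal sketch in Python. It is not the API of any NLP/NLU engine: the intent names and patterns are hypothetical, and simple keyword rules stand in for what would normally be a trained model.

```python
import re

# Hypothetical intents and trigger patterns, for illustration only.
# A real NLP/NLU engine would rely on a trained model instead of rules.
INTENT_PATTERNS = {
    "lights_on": re.compile(r"\b(turn|switch) on\b.*\blights?\b"),
    "lights_off": re.compile(r"\b(turn|switch) off\b.*\blights?\b"),
    "set_temperature": re.compile(r"\bset\b.*\btemperature\b"),
}


def detect_intent(transcript):
    """Return the first intent whose pattern matches the transcript, else None."""
    normalized = transcript.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(normalized):
            return intent
    return None


# The transcript is the text produced by the Speech-to-Text step.
print(detect_intent("Please turn on the kitchen lights"))  # lights_on
```

Keeping the transcription and the intent detection as separate steps means either one can be replaced without touching the other.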

Voice Messaging Systems

Fast and automatic voice transcription used for messaging applications on devices.

A technology available inside the Voice Development Kit

1. Set up your Speech-to-Text engine

Start by selecting the language you need for your project and determining the right configuration (the confidence threshold, for instance).

2. Upload audio or record it on the VDK

Plug in your microphone and start speaking, or upload your audio files, to transcribe their content into text and benchmark which solution works best.

3. Analyze results, optimize and integrate

The Speech-to-Text transcription is returned as several hypotheses (the number is configurable) to help you tune the confidence threshold before integration.
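
As a rough sketch of how hypotheses and a confidence threshold can work together, the Python snippet below keeps the most confident hypothesis only if it reaches the threshold. The `Hypothesis` structure, the 0.0 to 1.0 confidence scale, and the scores are assumptions made for illustration; they do not describe the actual output format of the Voice Development Kit.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical structure, not the VDK API)."""
    text: str
    confidence: float  # assumed to range from 0.0 to 1.0


def select_transcription(hypotheses, threshold):
    """Return the most confident hypothesis at or above the threshold, else None."""
    accepted = [h for h in hypotheses if h.confidence >= threshold]
    if not accepted:
        return None  # nothing is reliable enough, e.g. ask the user to repeat
    return max(accepted, key=lambda h: h.confidence)


# Made-up scores, for illustration only.
n_best = [
    Hypothesis("turn on the light", 0.82),
    Hypothesis("turn on the lights", 0.74),
    Hypothesis("turn of the light", 0.31),
]

best = select_transcription(n_best, threshold=0.6)
print(best.text if best else "no reliable transcription")
```

Raising the threshold rejects more uncertain transcriptions at the cost of asking users to repeat more often; benchmarking on your own audio is the way to find the right balance.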

Language coverage

Our voice technologies are available in more than 40 languages to voice-enable your products and services wherever you need them to be deployed in the world. Here are some of them.

English (US)

English (UK)


Benefits of our Speech-to-Text technology

Frictionless Voice Transcription

Speech-to-Text lets users interact by voice with any product or service, without the vocabulary restrictions that other solutions may impose.

High Accuracy Machine Learning

A low Word Error Rate (WER) is achieved thanks to edge machine learning models that return result hypotheses, which can be tracked to improve the system's recognition.

Low Footprint Embedded Solution

Our STT needs only a minimal CPU load to operate. This efficiency comes from its innovative design and the reduced size of its machine learning models.

Privacy by Design

Embedded solutions have the ability to work without an internet connection or other services. Avoiding data transfers makes an embedded solution private by its very design.

Start building your voice solution now!

Get access to the Voice Development Kit to begin the creation of your enterprise-grade voice solutions.

Please note that only businesses and organisations are able to use our technology; individual use is not yet allowed.

Thank you for your understanding.

Your project has never been this close to its solution!

Browsing through our projects and technologies might have given you some insights into what is possible when working with us. We can help you further to achieve your goals.

Standard ports and Tools

  • Android (version 6.0, API 23)
  • Linux: x86_64, armv7hf, armv8
  • Windows: x86_64


Resource size

  • Between 250 and 300 MB

SDK code size

  • Around 50 MB


Tech Requirements

Our Speech-to-Text solution is an embedded technology designed to be integrated into devices. These devices need to meet specific criteria to run the technology properly and support your use case.

Frequently Asked Questions

A few things to know…

Speech-to-Text (STT) can be tricky since it is a complex technology. We cover some of the recurring topics here to give you insights.

Can STT understand spelled letters and numbers?

Our STT can indeed identify separate letters and numbers when they are spelled out, for instance a licence plate or a customer reference.
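
Once letters and numbers have been recognized one by one, they usually need to be joined into a single reference. The Python sketch below shows one way to do that. It assumes the transcript contains single letters and English digit words separated by spaces, which is an illustrative assumption rather than a documented output format of our STT.

```python
# Assumed input: single letters and English digit words separated by spaces.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def join_spelled_tokens(transcript):
    """Collapse spelled letters and digit words into one reference string."""
    parts = []
    for token in transcript.lower().split():
        if token in DIGIT_WORDS:
            parts.append(DIGIT_WORDS[token])
        elif len(token) == 1 and token.isalnum():
            parts.append(token.upper())
        else:
            # Reject anything that is not a spelled letter or digit.
            raise ValueError(f"not a spelled letter or digit: {token!r}")
    return "".join(parts)


print(join_spelled_tokens("a b one two three c d"))  # AB123CD
```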

Is Speech-to-Text able to recognize specific vocabulary?

Yes, if combined with a specific NLP/NLU (Natural Language Processing/Understanding) solution. Otherwise, you should try our grammar-based ASR.

What are the technical specifications for integrating STT?

STT specifications are essential for its integration. To get access to this information, please contact us.

Can Speech-to-Text work in noisy environments?

Speech-to-Text can work in very noisy environments if the microphone is adapted to the noise conditions (e.g. in factories).

What type of microphone is best suited for listening?

STT-friendly audio hardware exists. The best way to find a suitable microphone is to contact us so that different alternatives can be tested.

What is the average error rate of our STT technology?

The WER (Word Error Rate) of our STT depends on text complexity and hardware quality. Contact us for more personalized information.
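
For readers who want to measure WER on their own recordings, here is a minimal Python sketch of the standard definition: the number of word substitutions, deletions and insertions needed to turn the transcription into the reference text, divided by the number of words in the reference. It is a generic illustration, not our internal evaluation tool, and it compares words exactly as written (no normalization of case or punctuation).

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1] / len(ref)


# One deleted word ("the") and one substitution ("lights" -> "light").
print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

In this example two errors over five reference words give a WER of 0.4, i.e. 40%.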

More technologies to discover…

Voice Commands (ASR)

Grammar-based automatic speech recognition for specific vocabulary voice commands


Automatic generation of multilingual natural voices that runs offline on the edge

Wake-up Word

Easy tool to generate multilingual wake-up words to voice-activate any device

Voice Biometry

Authenticate or identify users with offline text-dependent or text-independent voice biometric models