What is Speech-to-Text?
Speech-to-Text is a voice technology, based on deep learning language models, that is used to transform audio signals into transcribed text.
The results are determined statistically, based on the most frequent sentence structures and word occurrences for the identified context.
Some use cases made real with Speech-to-Text
Speech-to-Text is the foundation of speech recognition and voice assistants that we know. This technology is designed to be paired with other solutions to produce innovative voice use cases.
Automatic voice transcription of discussions and meetings to be processed as voice dictation with specific speech recognition models.
Turning voice into text to be interpreted by NLP/NLU engines to identify the user’s intent for voice commands.
Voice Messaging Systems
Fast and automatic voice transcription used for messaging applications on devices.
A technology available inside the Voice Development Kit
1. Set up your Speech-to-Text engine
Start by selecting the language you need for your project and determining the right configuration (the confidence threshold, for instance).
2. Upload audio or record it on the VDK
Plug in your microphone and start speaking, or upload your audio files, to transcribe their content into text and benchmark which solution works best.
3. Analyze results, optimize and integrate
The Speech-to-Text transcription is provided as several hypotheses (the number of which can be adjusted) to help you optimize the confidence threshold for further integration.
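As an illustration of how a confidence threshold interacts with a list of hypotheses, here is a minimal Python sketch. It does not use the Voice Development Kit API: the `Hypothesis` class, the `pick_transcription` function, the 0-to-1 confidence scale and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical structure, not the VDK's)."""
    text: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def pick_transcription(hypotheses: List[Hypothesis],
                       threshold: float) -> Optional[Hypothesis]:
    """Return the most confident hypothesis at or above the threshold, else None."""
    accepted = [h for h in hypotheses if h.confidence >= threshold]
    if not accepted:
        return None  # nothing is confident enough: reject the utterance
    return max(accepted, key=lambda h: h.confidence)


# Made-up hypotheses for a single utterance.
n_best = [
    Hypothesis("turn on the light", 0.91),
    Hypothesis("turn on the lights", 0.72),
    Hypothesis("turn of the light", 0.35),
]

best = pick_transcription(n_best, threshold=0.8)
print(best.text if best else "rejected")
```

Raising the threshold rejects more utterances but lets fewer wrong transcriptions through; lowering it does the opposite. Comparing the hypotheses against what was actually said is one way to pick a value that suits the use case.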
Our voice technologies are available in more than 40 languages to voice-enable your products and services wherever you need them to be deployed in the world. Here are some of them.
Benefits of our Speech-to-Text technology
Frictionless Voice Transcription
Speech-to-Text gives users the ability to interact by voice with any product or service, without the vocabulary restrictions that other solutions may impose.
High Accuracy Machine Learning
A low Word Error Rate (WER) is achieved thanks to edge machine learning models that provide result hypotheses, which can be tracked to improve the system's recognition.
Low Footprint Embedded Solution
Our STT requires only a minimal CPU load to operate. This comes from its innovative design and the reduced size of its machine learning models.
Privacy by Design
Embedded solutions have the ability to work without an internet connection or other services. Avoiding data transfers makes an embedded solution private by its very design.
Start building your voice solution now!
Get access to the Voice Development Kit to begin the creation of your enterprise-grade voice solutions.
Please note that only businesses and organisations can use our technology; individual use is not yet allowed.
Thank you for your understanding.
Powered by the Voice Development Kit
Our Speech-to-Text solution is an embedded technology that is made to be integrated into devices. To do so, these products need to meet specific criteria to handle the technology and make it work properly to perform your use case.
Frequently Asked Questions on
A few things to know…
Speech-to-Text (STT) can be tricky since it is a complex technology. We cover some of the recurrent topics about it in order to give you insights.
Can STT understand spelled letters and numbers?
Our STT can indeed identify individual letters and numbers when they are spelled out, for instance in a licence plate or a customer reference.
Is Speech-to-Text able to recognize specific vocabulary?
Yes, if combined with a specific NLP/NLU (Natural Language Processing/Understanding) solution. Otherwise, you should try our grammar-based ASR.
What are the technical specifications for integrating STT?
STT specifications are essential for its integration. To get access to this information, please contact us.
Can Speech-to-Text work in noisy environments?
Speech-to-Text can work in very noisy environments (e.g. in factories) if the microphone is adapted to the noise conditions.
What type of microphone is best suited for listening?
STT-friendly audio hardware exists. The best way to find a suitable microphone is to contact us so that different options can be tested.
What is the average error rate of our STT technology?
The WER (Word Error Rate) of our STT depends on the complexity of the text and the quality of the hardware. Contact us for more personalized information.
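For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the transcription into the reference text, divided by the number of words in the reference. The Python sketch below shows that standard calculation; the function name and the sample sentences are illustrative, and it is not the tooling used to benchmark our STT.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of four.
print(word_error_rate("turn on the light", "turn of the light"))
```

Because insertions count as errors, the value can exceed 1.0 when the transcription is much longer than the reference.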