Automatic Speech Recognition (ASR)
Turn speech into text (STT) or commands automatically with our embedded grammar-based and Private by Design Automatic Speech Recognition technology.
What is Automatic Speech Recognition?
Our 100% offline ASR (Automatic Speech Recognition) engine is built around grammar creation. Grammars are dictionaries of commands that you create for the desired use cases.
Once compiled with a Machine Learning engine, this corpus of commands provides a bank of phonetics that correspond to the queries.
A second Machine Learning engine analyzes the sound frequencies of the voices recorded during use and associates each segment with the most appropriate phonetics, and therefore with the matching word or group of words.
Together, these steps make it possible to transcribe a complete sentence from a sound recording, turning voice into text.
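To make the grammar idea concrete, here is a minimal sketch in Python. It is not our engine or its API: the `GRAMMAR` structure, the `expand` and `recognize` helpers, and the command names are all hypothetical, and the matching is done on text with the standard library, whereas a real engine works on phonetics derived from audio. It only illustrates how a small set of command rules expands into a closed list of accepted phrases, and how restricting recognition to that list keeps the problem small.

```python
import difflib
import itertools

# Hypothetical toy grammar: each command is a sequence of slots,
# and each slot lists the alternative wordings it accepts.
GRAMMAR = {
    "lights_on": [["turn on", "switch on"], ["the"], ["light", "lamp"]],
    "lights_off": [["turn off", "switch off"], ["the"], ["light", "lamp"]],
}


def expand(grammar):
    """Map every phrase the grammar accepts to its command name."""
    phrases = {}
    for command, slots in grammar.items():
        for combo in itertools.product(*slots):
            phrases[" ".join(combo)] = command
    return phrases


def recognize(hypothesis, phrases, cutoff=0.8):
    """Return the command whose phrase is closest to the hypothesis.

    Returns None when no phrase is similar enough, which is how an
    out-of-grammar utterance is rejected in this sketch.
    """
    match = difflib.get_close_matches(
        hypothesis.lower(), list(phrases), n=1, cutoff=cutoff
    )
    return phrases[match[0]] if match else None


PHRASES = expand(GRAMMAR)
print(recognize("Turn on the light", PHRASES))
print(recognize("play some music", PHRASES))
```

The two rules above expand to eight accepted phrases. Anything outside that list is rejected instead of being transcribed approximately, which is the trade-off a grammar-based approach makes compared with open dictation.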
Our Must-have features
Our ASR supports 36 languages, including the most widely spoken (English, Mandarin, Spanish and more), so you can scale your solution worldwide.
Reduced Word Error Rate
Generic ASR engines try to understand and recognize everything, and face colossal technical constraints as a result. We focus instead on specialized commands and wordings.
Our ASR lets you define specific words, business jargon and technical vocabularies, and specializes in speech recognition for predefined use cases.
Very low CPU usage
Our ASR needs only a minimal CPU load to operate. This comes from its grammar-based design, and therefore its specialization, and from the technical performance of the tools used to create it.
Some of our clients use Automatic Speech Recognition
Automatic Speech Recognition (ASR) is a commonly used technology in the world of voice interfaces and assistants. Some of our clients have developed interesting features with our ASR to deliver their projects and innovations.
OS-dependent API binding and packaging
- Android: ASR (within VDK) comes with a Java API binding compiled into an Android archive (AAR)
- Windows/Linux: ASR (within VDK) comes with a C++ API binding
- Apple iOS: ASR (within VDK) comes with a Swift binding
Standard ports and Tools
- iOS (version 7.0 and up): arm64 and x86_64
- Android (version 5.0 and up): armv7 (32-bit), arm64 and x86_64
- Linux: armv7 (32-bit), arm64 and x86_64
- Windows: x86_64
Functionality code size
- Basic command & control (C&C) application: 3.2MB
- Full functionality, largest acoustic models: 9.5MB
Components and relative data size per language
- Acoustic models, per language
  - Gen 4 compact: 900kB
  - Gen 5: approx. 4MB
  - Gen 6: approx. 6MB
- GLIC – mono-lingual – General purpose transcriptions: 300-7300kB
- GLC – multi-lingual – Music collection compilation: 700-3000kB
Components and relative data size per language and total RAM usage
- Digit recognition: 4kB / 1.25MB
- Basic C&C application, 100 to 10,000 commands: 10-500kB / 1.3-1.8MB
- Telephony (voice-activated dialing) with grammars + SLMs, including NLU, 1350 contacts: 0.52MB / 12.6MB
- 1-shot voice destination entry POI & addresses (UDE) all USA, FST based, including NLU: 300MB / 56MB
- Embedded dictation: 100MB / 100MB
Our Automatic Speech Recognition is an embedded technology designed to be integrated into devices. To run the ASR (STT) properly for your use case, these devices need to meet specific criteria.
Frequently Asked Questions on Automatic Speech Recognition
A few things to know…
Automatic Speech Recognition is a complex technology and can be tricky. We cover some of the recurring topics here to give you insights.
Can ASR understand spelled letters and numbers?
Yes. Our ASR can identify individual letters and numbers when they are spelled out, for instance a licence plate or a customer reference.
Is Automatic Speech Recognition able to recognize specific vocabulary?
Our ASR is designed to understand very specific vocabulary through the creation of specialized grammars.
Can Automatic Speech Recognition work in noisy environments?
Automatic Speech Recognition can work in very noisy environments (e.g. in factories) if the microphone is adapted to the noise conditions.
What type of microphone is best suited for listening?
ASR-friendly audio hardware exists. The best way to find a suitable microphone is to contact us so that we can test different alternatives.
What is the average error rate of ASR technology?
The WER (Word Error Rate) of our ASR depends on the grammar complexity and the hardware quality. A 0% error rate is something we can achieve.
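For readers who want to measure this themselves, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below is a plain Python implementation of that standard definition using a word-level edit distance. The function name `word_error_rate` and the sample sentences are illustrative, not part of our SDK.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the light", "turn on the light"))   # 0.0
print(word_error_rate("turn on the light", "turn off the light"))  # 0.25
```

Because the result is normalized by the reference length, a hypothesis with many inserted words can score above 1.0. Averaging over a representative set of recordings from the target microphone and environment gives a more meaningful figure than any single utterance.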