Automatic Speech Recognition (ASR)

Turn speech into text (STT) or commands automatically with our embedded, grammar-based, private-by-design Automatic Speech Recognition technology.

What is Automatic Speech Recognition?

Our 100% offline ASR (Automatic Speech Recognition) engine is built around grammar creation. A grammar is a dictionary of commands that you define for your intended use cases.

Once compiled by a first Machine Learning engine, this corpus of commands yields a bank of phonetics that correspond to the expected queries.

A second Machine Learning engine then analyzes the sound frequencies of the voices recorded during use and associates each segment with the most appropriate phonetics, and therefore with the matching word or group of words.

Together, these steps transcribe a complete sentence from a sound recording, turning voice into text.
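To make the idea of a grammar as a "dictionary of commands" concrete, here is a minimal text-only sketch. It assumes a toy grammar made of slots with alternatives; the `GRAMMAR` structure and the `expand` and `recognize` helpers are hypothetical names for illustration and are not the VDK grammar format or API. The real engine works on phonetics and audio, which this sketch does not attempt to model.

```python
from itertools import product

# Hypothetical toy grammar: each slot lists the alternatives a speaker may say.
# Illustration only; this is not the VDK grammar format.
GRAMMAR = [
    ["turn on", "turn off"],
    ["the"],
    ["light", "fan"],
]


def expand(grammar):
    """Return every command phrase the toy grammar accepts."""
    return [" ".join(words) for words in product(*grammar)]


def recognize(transcript, grammar):
    """Return the normalized command if the grammar accepts it, else None."""
    # Lowercase and collapse whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    return normalized if normalized in set(expand(grammar)) else None


COMMANDS = expand(GRAMMAR)
```

Because the set of accepted phrases is closed, anything outside the grammar is rejected rather than guessed, which is the trade-off a grammar-based design makes in exchange for specialization.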

Our must-have features

Multilingual Speech

Our ASR supports 36 languages, including the most widely spoken (English, Mandarin, Spanish...), so you can scale your solution worldwide.

Reduced Word Error Rate

Generic ASRs try to understand and recognize everything, which imposes colossal technical constraints. We focus instead on specialized commands and wordings, which reduces the word error rate.

Grammar-based Technology

Our ASR allows you to define specific words, business jargon and technical vocabularies, while specializing in speech recognition for pre-defined use cases.

Very low CPU usage

Our ASR places a minimal load on the CPU. This efficiency comes from its grammar-based design, and therefore its specialization, and from the technical performance of the tools used to build it.

Some of our clients use ASR

Automatic Speech Recognition (ASR) is a commonly used technology in the world of voice interfaces and assistants. Some of our clients have developed interesting features with our ASR to carry out their projects and innovations.

Your project has never been this close to its solution!

Browsing through our projects and technologies might have given you some insight into what you can achieve by working with us. We can help you further to reach your goals.

OS-dependent API binding and packaging

  • Android: ASR (within VDK) comes with a Java API binding compiled into an Android archive (AAR)
  • Windows/Linux: ASR (within VDK) comes with a C++ API binding
  • Apple iOS: ASR (within VDK) comes with a Swift binding

Standard ports and tools

  • iOS (version 7.0 and up): arm64 and x86_64
  • Android (version 5.0 and up): armv7 (32-bit), arm64 and x86_64
  • Linux: armv7 (32-bit), arm64 and x86_64
  • Windows: x86_64

Functionality code size

  • Basic command & control (C&C) application: 3.2MB
  • Full functionality, largest acoustic models: 9.5MB

Components and relative data size per language

  • Acoustic models, per language
    • Gen 4 compact: 900kB
    • Gen 5: approx. 4MB
    • Gen 6: approx. 6MB
  • GLIC – mono-lingual – General purpose transcriptions: 300-7300kB
  • GLC – multi-lingual – Music collection compilation: 700-3000kB

Components and relative data size per language and total RAM usage

  • Digit Recognition: 4kB / 1.25MB
  • Basic C&C application 100/10,000 commands: 10-500kB / 1.3-1.8MB
  • Telephony (voice-activated dialing) with grammars + SLMs, including NLU, 1350 contacts: 0.52MB / 12.6MB
  • 1-shot voice destination entry POI & addresses (UDE) all USA, FST based, including NLU: 300MB / 56MB
  • Embedded dictation: 100MB / 100MB

Tech Requirements

Our Automatic Speech Recognition is an embedded technology designed to be integrated into devices. To run the ASR (STT) properly for your use case, these devices need to meet specific criteria.

Frequently Asked Questions on ASR

A few things to know…

Automatic Speech Recognition is a complex technology and can be tricky. We cover some recurring topics here to give you some insight.

Can ASR understand spelled letters and numbers?

Our ASR can indeed identify separate letters and numbers when they are spelled out, for instance a license plate or a customer reference.
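As an illustration of what an application might do with such a result, the sketch below joins a sequence of recognized tokens into a single reference string. It assumes the recognizer returns one token per spelled letter or spoken digit and that digits are spoken in English; the `DIGITS` table and the `join_spelled` helper are hypothetical and are not part of the VDK API.

```python
# Hypothetical mapping from spoken English digits to characters.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def join_spelled(tokens):
    """Join spelled letters and spoken digits into one reference string."""
    chars = []
    for token in tokens:
        word = token.lower()
        if word in DIGITS:
            chars.append(DIGITS[word])
        elif len(word) == 1 and word.isalnum():
            # A single letter (or a digit already given as a character).
            chars.append(word.upper())
        else:
            raise ValueError(f"not a spelled letter or digit: {token!r}")
    return "".join(chars)


plate = join_spelled(["A", "B", "one", "two", "three", "C", "D"])
```

Rejecting unknown tokens with an error, rather than skipping them, avoids silently producing a wrong plate or customer reference.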

Is Automatic Speech Recognition able to recognize specific vocabulary?

Our ASR is designed to understand very specific vocabulary through the creation of specialized grammars.

What are the technical specifications for integrating ASR?

The technical specifications are essential for integrating ASR. To get access to this information, please contact us.

Can Automatic Speech Recognition work in noisy environments?

Automatic Speech Recognition can work in very noisy environments (e.g. in factories) if the microphone is suited to the noise conditions.

What type of microphone is best suited for listening?

ASR-friendly audio hardware exists. The best way to find a suitable microphone is to contact us so that we can test different options.

What is the average error rate of ASR technology?

The WER (Word Error Rate) of our ASR depends on the complexity of the grammar and the quality of the hardware. With a well-scoped grammar and suitable hardware, a 0% error rate is achievable.
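For reference, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The sketch below implements that standard definition with a word-level edit distance; the function name `word_error_rate` is our own, and the code is a generic illustration rather than the tool we use for our measurements.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


perfect = word_error_rate("turn on the light", "turn on the light")
one_substitution = word_error_rate("turn on the light", "turn off the light")
```

With a four-word reference, a single wrong word gives a WER of 25%, which shows why short, well-defined commands make the metric very sensitive to each individual error.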

More technologies to discover

Voice Development Kit

Build a voice assistant or interface in record time with our all-in-one voice solution

Embedded Voice Synthesis

Automatic generation of multilingual natural voices that runs offline on device

Wake Word Detection Tool

An easy tool for generating multilingual wake words to embed in devices