The voice, our expertise

We stand by it: what drives us is building the best possible product.

Digital transformation has put machines, applications, information kiosks and every other kind of IT platform at the heart of our daily lives. Our ambition is simple: to improve human-machine interaction between your customers or users and your services, entirely through voice.

Whatever your field of activity, our job is to meet the expectations of your users or customers, taking the whole situation into account, with a quality of response that is still unequalled. Since our creation, years of research have enabled us to work on the leading issues in this field, known as speech recognition.

There are many steps involved in the communication process between a human and a voice assistant.

In general, the steps can be divided into 5 bricks:

1. The trigger word

Also called a hotword or wake-up word, the trigger word is the first step in the voice recognition process. It consists of detecting a predefined keyword that triggers the system.

As soon as this keyword is heard, the listening phase of the system starts. For example, “Hey Siri”, “Ok Google” and “Alexa” are wake-up words. At Vivoka, we can customize this keyword or find alternatives.


2. The transformation of voice into text

ASR or STT (Automatic Speech Recognition or Speech to Text) is the step that converts the user’s voice (a sound signal) into written, human-readable text. The complexity and effectiveness of this step depend on factors such as language, accent, surrounding noise, slang and microphone quality. We strive for the highest possible accuracy in going from speech to text.

3. Understanding the user’s intentions (contextualization)

Known as NLP (Natural Language Processing) or NLU (Natural Language Understanding), this step consists of a semantic analysis of the sentence in order to extract one or more intentions, often accompanied by associated context elements. With voice, the user does not choose between 2, 5 or 10 buttons as in an application, but between millions of words to express what they expect. One of our specialities is therefore a system designed to understand what the user wants, in the right context.


4. Artificial intelligence

Artificial intelligence is one of the important steps in the process. Whatever you may have read about highly advanced AIs, the one that suits you best will be the one designed for your business and based on your users’ data. That’s why, building on the intentions clearly identified by our NLP module and on precise contexts, we design a custom-made artificial intelligence adapted to your needs.

5. The machine answers

TTS (Text to Speech) is the last brick of the process, a module commonly called “voice synthesis”. It transforms written text (in French, for example) into sound as close as possible to a human voice. The artificial intelligence uses it to respond to the user, with either a standard voice or one custom-made for your needs.
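The five bricks above can be sketched as a chain of functions. This is a minimal illustration, not Vivoka's implementation or API: every name here (`WAKE_WORD`, `detect_wake_word`, `speech_to_text`, `understand`, `decide`, `text_to_speech`, `handle`) is hypothetical, the "audio" is represented by a plain string, and the ASR and TTS steps are stand-ins for real engines.

```python
from dataclasses import dataclass, field

WAKE_WORD = "hey assistant"  # hypothetical trigger phrase


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def detect_wake_word(utterance: str) -> bool:
    """Step 1: listen only if the utterance starts with the wake word."""
    return utterance.lower().startswith(WAKE_WORD)


def speech_to_text(utterance: str) -> str:
    """Step 2 (stand-in): a real ASR engine would decode audio.
    Here the input is already text, so we only drop the wake word."""
    return utterance[len(WAKE_WORD):].strip(" ,.!?").lower()


def understand(text: str) -> Intent:
    """Step 3: toy keyword-based intent extraction with one context slot."""
    words = text.split()
    if "weather" in words:
        # Treat the word following "in" as the city, if there is one.
        city = words[words.index("in") + 1] if "in" in words[:-1] else None
        return Intent("get_weather", {"city": city})
    if "light" in words or "lights" in words:
        return Intent("set_light", {"state": "on" if "on" in words else "off"})
    return Intent("unknown")


def decide(intent: Intent) -> str:
    """Step 4: business-specific logic picks the reply for the intent."""
    if intent.name == "get_weather":
        city = intent.slots.get("city") or "your area"
        return f"Here is the weather for {city}."
    if intent.name == "set_light":
        return f"Turning the lights {intent.slots['state']}."
    return "Sorry, I did not understand."


def text_to_speech(reply: str) -> bytes:
    """Step 5 (stand-in): a real TTS engine would return synthesized audio.
    Here we simply return the reply as UTF-8 bytes."""
    return reply.encode("utf-8")


def handle(utterance: str):
    """Run the five steps in order; return None if the wake word is absent."""
    if not detect_wake_word(utterance):
        return None
    text = speech_to_text(utterance)
    intent = understand(text)
    reply = decide(intent)
    return text_to_speech(reply)
```

Calling `handle("Hey assistant, what is the weather in Paris?")` runs all five steps, while an utterance without the wake word is ignored. In a real system each function would be replaced by a dedicated engine; the point of the sketch is only the order of the steps and the data passed between them.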


Understanding the user’s intentions


Identify what your customer wants, and offer it to them. That’s all.

The experience gained across our various projects has given us expertise in different types of artificial intelligence, whether cloud-based or embedded, while addressing constraints of deadlines, security and Big Data.

We make a point of understanding your context and environment.

This is what sets us apart, and the only way for you to obtain a solution entirely adapted to your business.

Where does the team come from?


Our ability to offer you the best services lies in our R&D department, which must always be up to date on the latest developments in the sector. Coming from backgrounds such as CNRS, INRIA and Epitech, our specialists are recruited for their skills, but also for their vision and passion for the field in which they work.

At the forefront of research

In addition to constantly improving our solution, our R&D department specializes in new areas, including Speech to Emotion, a process that extracts the user’s emotions as additional context, which will allow us to further improve our artificial intelligence.

To learn more about our R&D activities