Frequently Asked Questions

All you need to know about voice technologies. If your questions are left unanswered, contact us!


About products

What is the Voice Development Kit?

The VDK is an all-in-one software solution that enables companies and developers to create a fully on-device voice assistant on their own. The VDK provides all the technologies needed to deliver an end-to-end voice experience, in more than 40 languages, and is compatible with most hardware solutions.

How many languages are supported?

The VDK supports 41 languages for speech recognition and 65 languages for speech synthesis.

What are Vivoka’s pricing models?

The VDK has a fixed base price, to which resources and technologies add up; this covers using the VDK to develop your solution. To commercially exploit it, Vivoka offers flexibility in terms of business models.

Our main pricing model is royalty-based licensing: you pay a specific price per unit that is equipped with our technologies. We also offer subscription models that can fit into your existing pricing.

If you have specific requirements for your revenue model, let us know and we will find a way to adapt.

How do I get started?

To start developing with the VDK, you need to request an evaluation period by clicking the “Get Started” button. Once your information has been approved by our team, you will be able to download the VDK and prototype with it for 30 days.

Do you offer a free trial?

We let companies try our solution for 30 days with full resources available. To access it, log on to lab.vivoka.com.

Who is using the VDK?

We have a broad range of customers across many industries, which you can discover here. The VDK is mostly used by large firms and mid-size companies from EMEA and North America.

What kind of use cases are supported by the VDK?

This question should probably be “what are the use cases that are NOT supported by the VDK?”, since we are field-agnostic. We offer the tools to create any kind of voice use case: from pick-by-voice in the supply chain, to voice form filling and input in MRO (Maintenance & Repair Operations), to voice-enabled Smart Glasses navigation…

What is voice AI?

Voice AI (Voice Artificial Intelligence) is a way to name the principle behind voice assistants. It is a conversational system that takes voice commands, understands their intent, and forwards it for processing. Once the user’s request has been fulfilled, the system responds with feedback in natural language, via Text-to-Speech for instance, closing the conversation loop.

What is AI voice?

Not to be confused with voice AI, AI voice mostly refers to synthetic voices that can be generated through artificial intelligence. Deepfakes, voices that mimic real-life individuals, come from the field of AI voice.

What is a voice assistant?

A voice assistant is a voice-based conversational system that understands and answers a user, calling on multiple technologies (automatic speech recognition, natural language understanding, text-to-speech…) to do so.

The term commonly refers to the conversational AIs, also called virtual agents, that are hosted in popular smart speakers and devices.

Sometimes the term is conflated with the hardware itself: smart speakers from GAFAMs are called voice assistants as well, even though Google Assistant, for instance, also runs in phones and cars.

What is Voice-First?

Voice-first is a trend, a vision, an ambition in which technology developments mainly promote the use of voice interactions in products and services. In this sense, voice would be used as touch is today.

What do we call speech recognition?

Speech recognition refers to the field of computational linguistics that develops technologies and methodologies to recognize human language, speech, and transform it into data, mostly text. In other words, it is the process that allows computers and other devices to recognize and respond to the audio frequencies created by human speech.

What is voice recognition?

Voice recognition is often used in the same sense as speech recognition: the process that allows computers and other devices to recognize and respond to the audio frequencies created by human speech.

For others, voice recognition refers to voice biometrics and speaker recognition: the principle of identifying an individual through the characteristics of their voice.

This dichotomy opposes Speech, the content of the voice, and Voice itself, the object, as the recognized element.

At Vivoka, we consider voice recognition to be the same concept as speech recognition.

What is voice control?

Voice control allows users to interact with digital devices and applications with their voice as a hands-free user interface. Voice control is a possible use outcome of voice technologies and it relies mainly on the use of Automatic Speech Recognition (also known as speech-to-text).

What is a voice command?

A voice command is the name given to an action that users can initiate on any application equipped with speech recognition capabilities, along with the solution to understand and process the intent behind the command.

Voice commands are usually formed as orders or instructions: “Increase the fan speed to 70%” or “Go to the main menu” for instance.
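As a minimal sketch (a hypothetical illustration, not the VDK’s actual API), a grammar-based command can be reduced to matching a transcribed utterance against a fixed pattern and extracting a slot value, such as the fan-speed percentage above:

```cpp
#include <optional>
#include <sstream>
#include <string>

// Toy intent extractor for one fixed command pattern:
// "increase the fan speed to <N>%". Hypothetical sketch only;
// real grammar engines compile many such patterns at once.
std::optional<int> parseFanSpeedCommand(const std::string& utterance) {
    const std::string prefix = "increase the fan speed to ";
    if (utterance.rfind(prefix, 0) != 0) {
        return std::nullopt;  // utterance does not match this intent
    }
    std::istringstream rest(utterance.substr(prefix.size()));
    int percent = -1;
    rest >> percent;  // read the numeric slot ("70" from "70%")
    if (percent < 0 || percent > 100) {
        return std::nullopt;  // slot value out of range
    }
    return percent;
}
```

In a real system, the recognized intent and its slot values would then be dispatched to the application logic that actually changes the fan speed.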

What is a voice input?

Voice input is, broadly, the text output of Automatic Speech Recognition (Speech-to-Text, STT) solutions. It is not necessarily a command that triggers an action or specific event; it can also simply be dictation (words, sentences, numbers…).

What is a Wake Word?

The wake word is the first element of an end-to-end voice interaction. You are probably most familiar with the wake words associated with popular voice assistants such as Siri, Google Assistant or Alexa. This technology is used to literally “wake” the assistant by listening for a single word or phrase. Once it is detected, the assistant knows that it has to process the voice commands coming right after.
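The gating logic described above can be sketched as a tiny state machine (an illustrative toy, not the VDK’s API; the wake word “hey vdk” is a made-up example):

```cpp
#include <string>
#include <vector>

// Minimal wake-word gate: utterances are ignored until the wake
// word is heard, then the next utterance is forwarded as a command.
// Real detectors work on audio, not text; this only shows the flow.
class WakeWordGate {
public:
    explicit WakeWordGate(std::string wakeWord)
        : wakeWord_(std::move(wakeWord)) {}

    // Returns the utterances that made it past the gate.
    std::vector<std::string> feed(const std::vector<std::string>& utterances) {
        std::vector<std::string> accepted;
        for (const auto& u : utterances) {
            if (awake_) {
                accepted.push_back(u);  // command right after the wake word
                awake_ = false;         // back to passive listening
            } else if (u == wakeWord_) {
                awake_ = true;          // wake word detected
            }
        }
        return accepted;
    }

private:
    std::string wakeWord_;
    bool awake_ = false;
};
```

Note how the same sentence is dropped when spoken before the wake word and accepted right after it.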

What is Automatic Speech Recognition (ASR)?

Automatic Speech Recognition, often referred to as Speech-to-Text or simply Speech Recognition, is the main technology that makes voice a way to interact. At the pinnacle of Voice AI, it combines complex components such as acoustic models, natural language understanding (NLU) and audio signal processing. This technical stack allows ASR to turn human speech into normalized data (text, intents, values…) that can be processed by complex systems.

What is Voice Biometrics?

Voice biometrics turns voice into identifying biometric data used to authenticate or recognize individuals. Sometimes referred to as voice verification or speaker recognition, voice biometrics offers fast, frictionless and highly secure access for a range of use cases.

What is Text-to-Speech (TTS)?

Text-to-Speech (also known as TTS, speech or voice synthesis) produces voices from text, as its name suggests. This technology relies on voice engines able to translate graphemes (small units of text) into phonemes (small units of sound). To sound like a human voice, machine learning is used extensively to find the most appropriate way to pronounce words and sentences. With the addition of SSML (Speech Synthesis Markup Language), customization can go far beyond that (pitch, timbre, level, speed…).
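The grapheme-to-phoneme step can be pictured with a toy lookup table (an illustration only; real TTS engines use trained models rather than a word list, and the phoneme strings below are just an ARPAbet-style example):

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy grapheme-to-phoneme step: maps whole words to phoneme
// strings via a tiny lexicon. Real engines use trained models.
std::vector<std::string> wordsToPhonemes(const std::string& sentence) {
    static const std::map<std::string, std::string> lexicon = {
        {"hello", "HH AH L OW"},
        {"world", "W ER L D"},
    };
    std::vector<std::string> phonemes;
    std::istringstream words(sentence);
    std::string w;
    while (words >> w) {
        auto it = lexicon.find(w);
        // Unknown words are passed through unchanged in this sketch.
        phonemes.push_back(it != lexicon.end() ? it->second : w);
    }
    return phonemes;
}
```

The resulting phoneme sequence is what the synthesis stage would turn into audio.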

What is Audio Front End (AFE)?

Audio Front End is an agnostic audio signal processing technology. It is commonly used to facilitate voice-enabled HMI (Human-Machine Interactions) using built-in microphones in today’s electronics. AFE allows a more accurate recognition of voice commands (far-field or close-field) in any use environment by removing interfering sounds captured by the microphone. It extracts the user’s voice and cancels out unintended sounds to provide crystal-clear speech recognition and user understanding.

What is Natural Language Processing (NLP)?

Natural language processing (NLP) is a field of computer science closely related to Artificial Intelligence that focuses on giving computers and digital devices the ability to understand human languages the way a human does.

What is Natural Language Understanding (NLU)?

Natural Language Understanding (NLU) is a component of Natural Language Processing (NLP) that focuses on the understanding side: extracting meaning and intent from language.

What is Speaker Recognition?

Speaker Recognition can be considered the application that results from voice biometrics: the ability to recognize an individual from the characteristics of their voice. This process is based on defining a voice print (like a fingerprint) to identify and match. It is often used for diarization, the process of parsing a discussion per speaker, or to authenticate someone to grant physical or digital access.

What is SSML (Speech Synthesis Markup Language)?

SSML is a markup language made for text-to-speech (or speech synthesis) applications. It is used to apply tweaks and adjustments to synthetic voices. SSML markup looks like <break time="2s"/> to pause the voice for 2 seconds, or <prosody pitch="high"> to raise the voice on a specific part of the speech.
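Put together, a small SSML document using these elements might look like the following (support for specific attributes varies from one engine to another):

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Your order is confirmed.
  <break time="2s"/>
  <prosody pitch="high" rate="slow">Thank you for choosing us!</prosody>
</speak>
```

The engine reads the plain text normally, pauses for two seconds at the break, then speaks the last sentence higher and slower.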

What is Speech-to-Text (STT)?

Speech-to-Text is another name for speech recognition or automatic speech recognition (ASR). The technology behind it is basically the same: the ability to automatically transcribe human voice into text, as its name suggests.

What are VUX and VUI (Voice User Experience and Interface)?

VUX and VUI are specialized applications of user experience and interface methodologies that focus on voice-based products and services. They guide companies in designing seamless and enjoyable user journeys that use voice to interact with interfaces, features or a conversational AI. For more information, check our blog posts.

How is embedded voice technology competing against Cloud?

Embedded voice technology is a great choice when data privacy and service reliability are mandatory for you. By design, embedded technology has to be compact and lightweight, to fit inside devices and run locally. Embedded is great for predictable and simple use cases, grammar-based commands and complex environments (remote locations…).

Cloud technologies will have better overall performance (deeper understanding, language flexibility…) since they communicate with remote servers that grant them computing power and storage. But for that power you sacrifice service reliability (latency and connection outages can occur) as well as privacy: even if communications are secured, data is transferred.

About company

When was Vivoka created?

Vivoka is a French company founded in 2015, with 7 years of experience in the field of voice technologies and conversational AIs.

How many employees are currently at Vivoka?

Vivoka has a strong team of 40 people, the large majority of whom are PhDs and senior developers working on both R&D projects and product development.

Is Vivoka recruiting?

Vivoka is constantly looking for new talent to build its teams. From sales and marketing to engineers and developers, if you want to write the future of voice tech with passion and leadership, contact us! See the open positions right now!

Has Vivoka ever won awards?

Vivoka is proud to be the winner of multiple CES awards (2019 and 2020) in the smart home and smart city categories. We have also recently been awarded the IoT Innovation award, with honors for the “Compliant” project, thanks to our private-by-design technologies.

In which countries is Vivoka based?

Vivoka is based in France, its home market, but also has a presence and offices in Italy, Germany and Belgium.

How does Vivoka position itself regarding GAFAMs?

Vivoka is much more enterprise-oriented than competitors such as the GAFAMs. Our typical customers are businesses with which we have a deeper relationship: we are partners, not only providers. In this sense, we strive to offer the best support and technologies to create bespoke voice assistants with privacy at their core.

What hardware is supported?

We currently support microprocessors only. Most semiconductor brands are compatible with the VDK; you can check our Developers section to see each technology’s requirements.

What operating systems are supported?

Windows, Linux and Android are currently supported by our technologies. We actively work on further software compatibility.

What are the available programming languages?

C++ and Java are the main programming languages currently required to develop with our technologies. As with compatible OSs, we are working to expand the number of programming languages we support.

What type of microphone is recommended for voice technologies?

Since we are not hardware specialists, we cannot really suggest a specific type of microphone. Still, we can put you in touch with our partners; just contact us to let us know. Overall, we suggest running several tests with multiple devices and conditions to make sure that the voice engine runs smoothly.

Can VDK’s technologies run on the Cloud?

Technically, VDK’s technologies can be hosted online.

You can also go on-premise, which can be the perfect hybrid solution: combining the reliability of local speech processing with data transfers limited to servers on your own premises.