Offline Voice Solution – Voice Development Kit

Create your voice assistant or any voice-enabled feature with our specialized software development kit (SDK). This offline voice solution includes technologies such as speech recognition and synthesis, along with many other features that help you develop faster and more easily.

Voice-specialized software development kit (SDK) for embedded systems


The Voice Development Kit (VDK) is a multifunctional software development kit combined with an intuitive graphical user interface specialized for voice use cases. It allows any company and any developer to configure an offline voice solution composed of one or more voice-based technologies (transcription, synthesis, grammar management…) in record time.

The complexity inherent in these technologies and their associated plugins has been abstracted away and optimized, drawing on know-how from five years of projects and engineering.

The offline technologies embedded in the Voice Development Kit come from several vendors, ourselves included, to give our clients the greatest versatility.

Every required technology in a single software solution with graphical interface

Over 60 languages available working offline (Private by Design)

An intuitive graphical user interface

All-in-one solution

Creating voice features usually means running numerous plugins in parallel. With VDK, everything lives in the same software and the same interface, in one view. Every offline voice technology and its tools are within easy reach.

Simplified workflows

Development processes have been rethought around the VDK. From grammar definition, through coding, to testing and integration, everything takes place in the VDK interface. Furthermore, all tools run on the same operating system.

(Main dashboard view of the Voice Development Kit with active plugins)

A wide range of plugins and smart tools

Simple Assistant Maker

SAM is a tool designed and built by Vivoka to create any functional offline voice command in a minimum amount of time. To work, this plugin needs three things: a command to be recognized, a response to be given and a script to be executed. All commands can be tested in real time.
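To make the command / response / script idea concrete, here is a minimal Python sketch. It is not the VDK or SAM API: the `VoiceCommand` class, the `handle` function and the sample command are all hypothetical, and the recognized text is assumed to arrive as a plain string.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceCommand:
    """Hypothetical SAM-style entry: what to hear, what to say, what to run."""
    phrase: str        # command to be recognized
    response: str      # response to be given
    script: List[str]  # script to be executed, as an argv list


def handle(transcript: str, commands: List[VoiceCommand]) -> Optional[str]:
    """Run the script of the first matching command and return its response."""
    normalized = " ".join(transcript.lower().split())
    for command in commands:
        if command.phrase.lower() == normalized:
            subprocess.run(command.script, check=True)
            return command.response
    return None  # no command matched the transcript


# Illustrative command; the script simply prints a line.
commands = [
    VoiceCommand(
        phrase="turn on the light",
        response="Turning the light on.",
        script=[sys.executable, "-c", "print('light: on')"],
    ),
]
```

Calling `handle("Turn on the light", commands)` runs the script and returns the response text, which a TTS engine could then speak. An unknown phrase returns `None`.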

Customization modules

Grammar Editor, COPFile Editor, Voice Studio… All these tools are at your disposal for the different stages of creating offline voice features: creating recognized vocabularies, customizing synthetic voices, managing dynamic files…

(Plugin view with the complete list of available voices for speech synthesis)

Tutorials and clear documentation

Detailed tutorials

It is often complicated to find your way around in the early stages of creating a voice interface. To reduce this complexity, tutorials (creation of an ASR, a TTS…) are available to accompany you step by step in the development of your offline voice solution.

Sample code & docs

The most popular use cases come with example code to make the technologies easier to understand. For more complex cases, complete documentation and our services are available through the Voice Development Kit platform.

(Simple Assistant Maker plugin and opened tutorial window on the right)

Controlled costs

Voice Development Kit is able to produce an offline voice solution or interface in record time, thus reducing HR and technology costs.


Faster, easier design, development and testing reduce the time needed to produce an operational solution.

State of the Art

Voice Development Kit provides the best current embedded/offline voice technologies, born from our R&D and our partners' solutions.


More than 60 languages are available (ASR, TTS, Wake Word or NLP) to configure your local voice features according to your users and target regions.

Private by Design

Voice Development Kit Studio is a fully offline voice-enabling software. All its technologies run without internet connection, thus preventing data leaks.

Low CPU Usage

Our offline technologies are designed to run on low-specification embedded systems. CPU usage is kept low so that they operate seamlessly.

New features available in the Voice Development Kit

Embedded Free Speech Technology

Unlike grammar-based ASRs, which narrow what can be understood in exchange for better precision, Free Speech is an ASR model made to transcribe as many words as possible. As usual, this technology runs 100% embedded on the device (microprocessor or microcontroller, for instance) with low power and storage requirements.
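The trade-off can be illustrated at the text level with a short Python sketch. This is only an analogy, not how the VDK engines work: real grammar-based ASR constrains the decoder itself, whereas the hypothetical `GRAMMAR`, `grammar_based` and `free_speech` names below operate on already-transcribed text.

```python
from itertools import product

# Hypothetical grammar: each slot lists the words allowed at that position.
GRAMMAR = [
    ["turn", "switch"],
    ["on", "off"],
    ["the"],
    ["light", "radio"],
]


def expand(grammar):
    """Enumerate every phrase the grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}


def grammar_based(transcript, accepted):
    """Keep the phrase only if the grammar accepts it, otherwise reject it."""
    phrase = " ".join(transcript.lower().split())
    return phrase if phrase in accepted else None


def free_speech(transcript):
    """Keep every word, whatever was said."""
    return " ".join(transcript.lower().split())


ACCEPTED = expand(GRAMMAR)
```

With this grammar, "turn on the light" is accepted, while "please dim the light" is rejected by `grammar_based` and kept word for word by `free_speech`.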

Phonetic Editor Plugin

This tool allows you to define and try out specific phonetics for your chosen words. Phonetic transcriptions can then be used to build grammars and refine the way your voice commands are said or understood, whether for Wake Word, ASR or TTS usage.

New Set of Embedded Voices for TTS

This second version of the Voice Development Kit comes with a new range of voices for TTS applications. These new samples are available in multiple qualities (thus reducing data storage), genders and emotions to customize your use case and comply with your hardware needs.

Voice Development Kit Phonetic Editor

(Phonetic Editor interface on the Voice Development Kit Studio view)

Some companies that chose the Voice Development Kit for their offline voice solution

Since its creation, the Voice Development Kit has won over many companies across a range of sectors. To best reflect their inventiveness and the performance of our tool, we carry out case studies.

Your project has never been this close to its solution!

Browsing through our projects and technologies may have given you some insight into what is possible when working with us. We can help you go further and achieve your goals.

Operating systems for the embedded SDK platform

  • Windows: 32-bit and 64-bit
  • Linux x86: 32-bit and 64-bit



– OS-dependent API binding and packaging

  • Android: ASR (within VDK) will be served with a Java-API binding compiled into an Android archive (AAR)
  • Win/Linux: ASR (within VDK) will come with C++ API binding
  • Apple iOS: ASR (within VDK) will come with Swift binding

– Standard ports and Tools

  • iOS (version 7.0 and up): arm64 and x86_64
  • Android (version 5.0 and up): armv7 (32-bit), arm64 and x86_64
  • Linux: armv7 (32-bit), arm64 and x86_64
  • Windows: x86_64

– Functionality code size

  • Basic command & control (C&C) application: 3.2MB
  • Full functionality, largest acoustic models: 9.5MB

– Components and relative data size per language

  • Acoustic models, per language
    • Gen 4 compact: 900kB
    • Gen 5: approx. 4MB
    • Gen 6: approx. 6MB
  • GLIC – mono-lingual – General purpose transcriptions: 300-7300kB
  • GLC – multi-lingual – Music collection compilation: 700-3000kB

– Components and relative data size per language and total RAM usage

  • Digit Recognition: 4kB / 1.25MB
  • Basic C&C application 100/10,000 commands: 10-500kB / 1.3-1.8MB
  • Telephony (voice-activated dialing) with grammars + SLMs, including NLU. 1350 contacts: 0.52MB / 12.6MB
  • 1-shot voice destination entry POI & addresses (UDE) all USA, FST based, including NLU: 300MB / 56MB
  • Embedded dictation: 100MB / 100MB
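As a rough way to read these figures, the Python sketch below checks which use cases fit a given device. It rests on several assumptions: the first number of each pair is treated as data size and the second as total RAM, upper bounds are used where a range is given, 1MB is taken as 1000kB, and code size and acoustic models are left out. The `USE_CASES` table and the `feasible` function are illustrative names, not part of the VDK.

```python
# (data size in MB, total RAM in MB), using the upper bound of each range.
USE_CASES = {
    "digit recognition": (0.004, 1.25),
    "basic C&C": (0.5, 1.8),
    "telephony": (0.52, 12.6),
    "destination entry": (300.0, 56.0),
    "embedded dictation": (100.0, 100.0),
}


def feasible(free_storage_mb, free_ram_mb):
    """List the use cases whose data and RAM needs fit the given budget."""
    return [
        name
        for name, (data_mb, ram_mb) in USE_CASES.items()
        if data_mb <= free_storage_mb and ram_mb <= free_ram_mb
    ]
```

For example, `feasible(64, 16)` keeps digit recognition, basic C&C and telephony, and drops destination entry and dictation.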

– Standard ports and Tools

  • Linux ARM: ARM32 Hardfp, ARM32 Softfp, ARM64
  • Android v4.0 (Ice Cream Sandwich), API level 14+, ARM32-v7a
  • Android v7.0 (Nougat), API level 24+, ARM64-v8a
  • iOS: arm64, armv7, armv7s, i386 and x86_64 simulator

– Data and code size

The code size for a fully featured TTS Embedded engine is 10 to 13.5 MB depending on the target platform. This can be reduced depending on the required language set, features and compiler choices.

– Voice Operating Point (VOP) with relative flash size (w/o code) and RAM usage

  • Embedded Compact – Small versatile TTS suited for constrained platforms
    • Flash Size: Ave. 10MB / Max. 21MB
    • RAM Usage: Ave. 6MB / Max. 23MB
  • Embedded Pro – High quality TTS optimized for navigation, info readout and reading capabilities
    • Flash Size: Ave. 55MB / Max. 131MB
    • RAM Usage: Ave. 14MB / Max. 38MB
  • Embedded High – High quality TTS read-out for SMS, news, e-mail on embedded targets
    • Flash Size: Ave. 120MB / Max. 325MB
    • RAM Usage: Ave. 24MB / Max. 69MB
  • Embedded Premium – Highest quality deep learning based concatenative synthesis, selected voice only
    • Flash Size: Ave. 337MB / Max. 558MB
    • RAM Usage: Ave. 159MB / Max. 198MB

Multi-lingual voices include recorded material for one or several foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) compared to the numbers above.
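The operating points and the multi-lingual overhead can be turned into a simple sizing check. The Python sketch below is an illustration under stated assumptions: it uses the maximum flash and RAM figures listed above, applies the full 50% overhead to multi-lingual voices, and ignores the engine code size. `OPERATING_POINTS` and `best_operating_point` are hypothetical names, not VDK identifiers.

```python
# (name, max flash in MB, max RAM in MB), ordered from lowest to highest quality.
OPERATING_POINTS = [
    ("Embedded Compact", 21, 23),
    ("Embedded Pro", 131, 38),
    ("Embedded High", 325, 69),
    ("Embedded Premium", 558, 198),
]
MULTILINGUAL_FACTOR = 1.5  # worst case of "up to 50% more memory"


def best_operating_point(flash_mb, ram_mb, multilingual=False):
    """Return the highest-quality operating point that fits, or None."""
    best = None
    for name, flash, ram in OPERATING_POINTS:
        if multilingual:
            if name == "Embedded Compact":
                continue  # multi-lingual voices are not released for Compact
            flash, ram = flash * MULTILINGUAL_FACTOR, ram * MULTILINGUAL_FACTOR
        if flash <= flash_mb and ram <= ram_mb:
            best = name  # later entries are higher quality, so keep the last fit
    return best
```

With 600MB of flash and 256MB of RAM, a mono-lingual voice fits Embedded Premium, while a multi-lingual voice falls back to Embedded High because the worst-case Premium flash size (837MB) exceeds the budget.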

Tech Requirements

Voice Development Kit is an embedded, offline voice technology made to be integrated into devices. To run the VDK's solutions properly and support your use cases, these devices need to meet specific criteria.

Frequently Asked Questions on VDK

A few things to know…

Voice Development Kit can be a complex piece of software to understand and adopt. Here we cover some of the recurring questions about it.

How many languages are available in the Voice Development Kit?

30+ for ASR and Wake Word, 50+ for TTS. If you want the complete list of languages, please contact us.

Which programming languages are used to develop with VDK?

For now, C++ is the main language available. For mobile devices, Java (Android) and Swift (iOS) are used.

What are the technical specifications for integrating VDK's technologies?

The Voice Development Kit's specifications are essential for integrating its technologies. To get access to this information, please contact us.

Is there access to documentation and help to use the tools?

Complete documentation is available within the VDK interface. Multiple tutorials are also at your disposal to help you.

Do you provide support and training for our tech teams?

Our experts can provide complete support to help you build your projects. The knowledge shared stays with you.

Is there any interest in using the VDK for a single offline voice solution?

Developing with a single technology is just as difficult as developing with many. VDK is designed to be easier and faster than any other solution.

More technologies to discover

Voice Development Kit

Build a voice assistant or interface in record time with our all-in-one voice solution

Embedded Voice Synthesis

Automatic generation of multilingual natural voices that runs offline on device

Wake Word Detection Tool

Easy tool to generate multilingual wake words to embed in devices