Offline Voice Solution – Voice Development Kit
Create your voice assistant or any voice-enabled feature with our specialized software development kit (SDK). This offline voice solution includes technologies such as speech recognition and speech synthesis, as well as many other features to help you develop faster and more easily.
Voice-specialized software development kit (SDK) for embedded systems
The Voice Development Kit (VDK) is a multifunctional software development kit combined with an intuitive graphical user interface, specialized for voice use cases. It allows any company or developer to configure an offline voice solution composed of one or more voice-based technologies (transcription, synthesis, grammar management…) in record time.
The inherent complexity of these technologies and their associated plugins has been abstracted away and optimized using the know-how we have gained over five years of projects and engineering.
The offline technologies embedded in the Voice Development Kit come from several providers, including us, to give our clients the greatest possible versatility.
Every required technology in a single software solution with graphical interface
Over 60 languages available offline (Private by Design)
An intuitive graphical user interface
Creating voice features usually means using numerous plugins in parallel. With VDK, everything is in the same software and the same interface, in a single view. Access every offline voice technology and powerful tool with ease.
The VDK rethinks the development process. From grammar definition through coding to testing and integration, everything takes place in the VDK interface. Furthermore, all tools run on the same operating system.
(Main dashboard view of the Voice Development Kit with active plugins)
A wide range of plugins and smart tools
Simple Assistant Maker
SAM is a tool designed and created by Vivoka to build functional offline voice commands in minimal time. For each command, this plugin needs three things: a command to be recognized, a response to be given and a script to be executed. All commands can be tested in real time.
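To make the command / response / script triple concrete, here is a minimal Python sketch. It is not SAM's actual file format or API, which this page does not describe: the `VoiceCommand` class, its fields and the `build_table` and `handle` functions are hypothetical, the "script" is modelled as a plain Python callable, and recognition is reduced to an exact string match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class VoiceCommand:
    """Hypothetical record for one command (not SAM's real format)."""
    phrase: str                 # the command to be recognized
    response: str               # the response to be given
    script: Callable[[], None]  # the script to be executed


def build_table(commands: Iterable[VoiceCommand]) -> Dict[str, VoiceCommand]:
    """Index commands by lower-cased phrase for exact-match lookup."""
    return {c.phrase.lower(): c for c in commands}


def handle(table: Dict[str, VoiceCommand], recognized: str) -> Optional[str]:
    """Run the script for a recognized phrase and return its response.

    Returns None when the phrase is not a known command.
    """
    command = table.get(recognized.strip().lower())
    if command is None:
        return None
    command.script()
    return command.response


# Usage: two commands whose scripts switch a simulated light.
state = {"light": False}


def light_on() -> None:
    state["light"] = True


def light_off() -> None:
    state["light"] = False


table = build_table([
    VoiceCommand("turn on the light", "The light is on.", light_on),
    VoiceCommand("turn off the light", "The light is off.", light_off),
])

print(handle(table, "Turn on the light"))  # prints: The light is on.
print(state["light"])                      # prints: True
print(handle(table, "open the door"))      # prints: None
```

Keeping the phrase, the response and the action in one record mirrors the three things the plugin needs per command; in a real assistant the recognized text would come from the ASR engine rather than from a string lookup.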
Grammar Editor, COPFile Editor, Voice Studio… All these tools are at your disposal for the different stages of creating offline voice features: defining the vocabulary to be recognized, customizing synthetic voices, managing dynamic files…
(Plugin view with the complete list of available voices for speech synthesis)
Tutorials and clear documentation
Finding your way around is often difficult in the early stages of creating a voice interface. To reduce this complexity, tutorials guide you step by step through the development of your offline voice solution, for example creating an automatic speech recognition (ASR) or text-to-speech (TTS) feature.
Sample code & documentation
The most popular use cases come with code samples that make the technologies easier to understand. For more complex cases, complete documentation and our services are available through the Voice Development Kit platform.
(Simple Assistant Maker plugin and opened tutorial window on the right)
The Voice Development Kit can produce an offline voice solution or interface in record time, reducing HR and technology costs.
Faster and easier design, development and testing reduce the time needed to deliver an operational solution.
State of the Art
The Voice Development Kit provides state-of-the-art embedded/offline voice technologies from our own R&D and from our partners' solutions.
More than 60 languages are available (ASR, TTS, Wake Word or NLP) so you can configure your local voice features for your users and target regions.
Private by Design
Voice Development Kit Studio is fully offline voice-enabling software. All its technologies run without an internet connection, so voice data does not need to leave the device, which prevents data leaks.
Low CPU Usage
Our offline technologies are designed to run on low-spec embedded systems, with reduced CPU usage so that they operate seamlessly.
New features available in the Voice Development Kit
Embedded Free Speech Technology
Unlike grammar-based ASR, which limits what can be understood in exchange for better precision, Free Speech is an ASR model designed to transcribe as many words as possible. As with our other technologies, it runs 100% embedded on the device (on a microprocessor or microcontroller, for instance) with low power and storage requirements.
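The grammar-based side of this trade-off can be illustrated with a toy Python example: a grammar-based recognizer can only return sentences its grammar generates, so its output is a closed set. The sketch below expands a tiny hypothetical grammar (slots of alternatives written as Python lists; this is not the VDK's grammar syntax) into that set and checks whether a transcript belongs to it. It is a simplification: real grammar-based ASR constrains the decoding search itself rather than filtering text afterwards.

```python
from itertools import product
from typing import List, Set

# Hypothetical grammar: a sequence of slots, each listing its alternatives.
# Illustration only; this is not the VDK grammar syntax.
GRAMMAR: List[List[str]] = [
    ["turn on", "turn off"],
    ["the"],
    ["light", "radio", "heating"],
]


def expand(grammar: List[List[str]]) -> Set[str]:
    """Return every sentence the grammar can generate (one choice per slot)."""
    return {" ".join(choice) for choice in product(*grammar)}


def in_grammar(transcript: str, sentences: Set[str]) -> bool:
    """Check whether a transcript belongs to the closed set of sentences."""
    normalized = " ".join(transcript.lower().split())
    return normalized in sentences


SENTENCES = expand(GRAMMAR)
print(len(SENTENCES))                                  # prints: 6
print(in_grammar("Turn off the radio", SENTENCES))     # prints: True
print(in_grammar("play some jazz please", SENTENCES))  # prints: False
```

With only six possible sentences, choosing among them is easy and precise, but the user cannot be understood outside that set; a free-speech model lifts this restriction at the cost of a much larger search space.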
Phonetic Editor Plugin
This tool lets you define and test specific phonetic transcriptions for your chosen words. These transcriptions can then be used to build grammars and refine how your voice commands are pronounced or recognized, for Wake Word, ASR or TTS.
New Set of Embedded Voices for TTS
This second version of the Voice Development Kit comes with a new range of voices for TTS applications. These new voices are available in multiple quality levels (lower levels reduce data storage), genders and emotions, so you can tailor them to your use case and your hardware constraints.
(Phonetic Editor interface on the Voice Development Kit Studio view)
Some companies that chose the Voice Development Kit for their offline voice solution
Since its creation, the Voice Development Kit has won over many companies in various sectors. To showcase their inventiveness and the performance of our tool, we carry out case studies.
Operating systems for the embedded SDK platform
- Windows: 32-bit and 64-bit
- Linux x86: 32-bit and 64-bit
– OS-dependent API binding and packaging
- Android: ASR (within VDK) comes with a Java API binding packaged as an Android Archive (AAR)
- Win/Linux: ASR (within VDK) comes with a C++ API binding
- Apple iOS: ASR (within VDK) comes with a Swift API binding
– Standard ports and tools (ASR)
- iOS (version 7.0 and up): arm64 and x86_64
- Android (version 5.0 and up): armv7 (32-bit), arm64 and x86_64
- Linux: armv7 (32-bit), arm64 and x86_64
- Windows: x86_64
– Code size by functionality
- Basic command & control (C&C) application: 3.2MB
- Full functionality, largest acoustic models: 9.5MB
– Components and data size per language
- Acoustic models, per language:
  - Gen 4 compact: 900kB
  - Gen 5: approx. 4MB
  - Gen 6: approx. 6MB
- GLIC – monolingual – general-purpose transcription: 300-7300kB
- GLC – multilingual – music collection compilation: 700-3000kB
– Components: data size per language / total RAM usage
- Digit recognition: 4kB / 1.25MB
- Basic C&C application, 100 to 10,000 commands: 10-500kB / 1.3-1.8MB
- Telephony (voice-activated dialing) with grammars + SLMs, including NLU, 1,350 contacts: 0.52MB / 12.6MB
- One-shot voice destination entry (UDE), POIs & addresses for the whole USA, FST-based, including NLU: 300MB / 56MB
- Embedded dictation: 100MB / 100MB
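The figures above can be combined into a rough flash estimate. The Python sketch below adds the engine code size, one acoustic model per language and the application data. It assumes that these sizes simply add up and that each extra language needs its own acoustic model and its own application data; neither assumption is stated on this page, and the function and key names are illustrative only, so treat the result as an order-of-magnitude check rather than a specification.

```python
# Sizes in kB, copied from the lists above (1MB is counted as 1000kB here).
CODE_KB = {
    "basic_cc": 3200,  # basic command & control application: 3.2MB
    "full": 9500,      # full functionality, largest acoustic models: 9.5MB
}
ACOUSTIC_MODEL_KB = {
    "gen4_compact": 900,
    "gen5": 4000,      # "approx. 4MB"
    "gen6": 6000,      # "approx. 6MB"
}


def flash_estimate_kb(code: str, model: str, app_data_kb: int,
                      languages: int = 1) -> int:
    """Rough flash footprint: engine code once, model and data per language.

    Assumption of this sketch (not stated in the spec): sizes add up
    linearly and each language needs its own acoustic model and its own
    application data (grammars, contacts, ...).
    """
    if languages < 1:
        raise ValueError("at least one language is required")
    per_language_kb = ACOUSTIC_MODEL_KB[model] + app_data_kb
    return CODE_KB[code] + languages * per_language_kb


# Basic C&C with up to 10,000 commands (upper bound: 500kB of data), Gen 5 model.
print(flash_estimate_kb("basic_cc", "gen5", 500))  # prints: 7700
# Same application with 100 commands (10kB) in two languages, Gen 4 compact.
print(flash_estimate_kb("basic_cc", "gen4_compact", 10, languages=2))  # prints: 5020
```

For the first case, the sketch suggests about 7.7MB of flash, while the list above gives 1.3-1.8MB of RAM for the C&C application itself.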
– Standard ports and tools (TTS)
- Linux ARM: ARM32 Hardfp, ARM32 Softfp, ARM64
- Android v4.0 (Ice Cream Sandwich), API level 14+: ARM32-v7a
- Android v7.0 (Nougat), API level 24+: ARM64-v8a
- iOS: arm64, armv7, armv7s, i386 and x86_64 simulator
– Data and code size
The code size for a fully featured embedded TTS engine is 10 to 13.5MB depending on the target platform. It can be reduced depending on the required language set, features and compiler choices.
– Voice Operating Points (VOP): flash size (excluding code) and RAM usage
- Embedded Compact – Small, versatile TTS suited for constrained platforms
  - Flash size: Ave. 10MB / Max. 21MB
  - RAM usage: Ave. 6MB / Max. 23MB
- Embedded Pro – High-quality TTS optimized for navigation, information readout and reading
  - Flash size: Ave. 55MB / Max. 131MB
  - RAM usage: Ave. 14MB / Max. 38MB
- Embedded High – High-quality TTS readout of SMS, news and e-mail on embedded targets
  - Flash size: Ave. 120MB / Max. 325MB
  - RAM usage: Ave. 24MB / Max. 69MB
- Embedded Premium – Highest-quality, deep-learning-based concatenative synthesis; selected voices only
  - Flash size: Ave. 337MB / Max. 558MB
  - RAM usage: Ave. 159MB / Max. 198MB
Multilingual voices include recorded material for one or more foreign languages. They are released for all operating points except Embedded Compact and require up to 50% more memory (flash and RAM) than the figures above.
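The operating points and the multilingual note can be turned into a quick sizing check. The Python sketch below picks the highest-quality VOP whose worst-case footprint fits a device budget: maximum voice flash plus the 13.5MB upper bound for engine code, and maximum RAM, both scaled by 1.5 for multilingual voices. Using the maximum figures, adding code size to voice flash and applying the full 50% overhead are conservative assumptions of this sketch; `best_vop` and the other names are illustrative only.

```python
from typing import Optional

# (name, max flash in MB excluding code, max RAM in MB), lowest to highest quality.
VOPS = [
    ("Embedded Compact", 21, 23),
    ("Embedded Pro", 131, 38),
    ("Embedded High", 325, 69),
    ("Embedded Premium", 558, 198),
]
MAX_CODE_MB = 13.5           # upper bound of the TTS engine code size
MULTILINGUAL_OVERHEAD = 1.5  # "up to 50% more memory (flash and RAM)"


def best_vop(flash_budget_mb: float, ram_budget_mb: float,
             multilingual: bool = False) -> Optional[str]:
    """Return the highest VOP whose worst-case footprint fits the budget."""
    best = None
    for name, flash_mb, ram_mb in VOPS:
        if multilingual:
            if name == "Embedded Compact":
                continue  # multilingual voices are not released for Compact
            flash_mb *= MULTILINGUAL_OVERHEAD
            ram_mb *= MULTILINGUAL_OVERHEAD
        if flash_mb + MAX_CODE_MB <= flash_budget_mb and ram_mb <= ram_budget_mb:
            best = name  # later entries are higher quality, so keep overwriting
    return best


print(best_vop(256, 64))                    # prints: Embedded Pro
print(best_vop(64, 32, multilingual=True))  # prints: None
print(best_vop(1024, 512))                  # prints: Embedded Premium
```

With 256MB of flash and 64MB of RAM, Embedded Pro fits even with a multilingual voice (about 210MB of flash and 57MB of RAM in the worst case), whereas Embedded High would need about 338.5MB of flash.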
The Voice Development Kit is an offline (embedded) voice technology designed to be integrated into devices. To run the VDK's solutions properly and perform your use cases, these devices must meet specific hardware and software requirements.
Frequently Asked Questions on VDK
A few things to know…
The Voice Development Kit can be complex software to understand and adopt. Here we cover some recurring questions about it to give you some insight.
How many languages are available in the Voice Development Kit?
30+ for ASR and Wake Word, 50+ for TTS. For the complete list of languages, please contact us.
Which programming languages are used to develop with VDK?
For now, C++ is the main language available. For mobile devices, Java (Android) and Swift (iOS) are used.
What are the technical specifications for integrating VDK's technologies?
The Voice Development Kit’s specifications are essential for integrating its technologies. To access this information, please contact us.
Is there access to documentation and help to use the tools?
Complete documentation is available in the VDK’s interface. Multiple tutorials are also at your disposal to help you.
Do you provide support and training for our tech teams?
Our experts can provide complete support to help you build your projects. The knowledge we share stays with you.
Is the VDK worth using for a single offline voice technology?
Yes. With the VDK, developing with many technologies is no more difficult than using only one, and the kit is designed to make development easier and faster than with other solutions.