✨ VDK 6.3 is now available — Powered by Cerence’s latest generation of embedded neural speech recognition technology.

VDK Voice AI Platform

Offline Voice AI
for Professional Applications

The complete suite for scalable Voice AI projects, offering a full development platform with Console, Studio, Developer Toolbox, and Runtime components. Build custom voice-enabled solutions and voice-guided workflows that work offline, on any hardware, with full control over your data and deployment.


Voice Technologies

65+ Languages

100% Offline and on Edge

Operate without an internet connection

Hardware Agnostic

Deploy on any device or platform

Fully Customizable

Tailor to your specific workflows

Voice Technologies

All included by default in VDK 6

Six Powerful Voice AI Technologies

Build sophisticated voice experiences with our comprehensive technology suite.

Voice Commands

Intelligent Voice Control

In-app navigation and task execution through voice. Enables frontline workers and caregivers to complete actions faster with less physical interaction, significantly boosting productivity.

✓ Multi-language support
✓ Offline processing
✓ Context-aware responses


Wake Word Detection

Always Listening, Always Ready

Activate the voice interface with custom wake words. Low-power, passive, always-on detection ensures instant response while preserving device battery life. Supports anti-wake words to prevent accidental activations, ensuring the system only wakes up when truly intended.

✓ Custom wake word training
✓ Low power consumption
✓ High accuracy detection
✓ Anti-wake word filtering
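One way to picture anti-wake word filtering is as a decision rule over per-phrase detection scores. The sketch below is only an illustration under stated assumptions: the function `should_wake`, the score dictionary, the example phrases, and the 0.8 threshold are all hypothetical and are not part of the VDK API.

```python
from typing import Dict, Set


def should_wake(
    scores: Dict[str, float],
    wake_words: Set[str],
    anti_wake_words: Set[str],
    threshold: float = 0.8,
) -> bool:
    """Wake only if the best-scoring phrase is a wake word above the threshold."""
    if not scores:
        return False
    best_phrase = max(scores, key=lambda phrase: scores[phrase])
    if best_phrase in anti_wake_words:
        return False  # a look-alike phrase matched better, so stay asleep
    return best_phrase in wake_words and scores[best_phrase] >= threshold


# Hypothetical phrases and scores, for illustration only.
wake = {"hello device"}
anti = {"yellow device"}
print(should_wake({"hello device": 0.91, "yellow device": 0.40}, wake, anti))  # True
print(should_wake({"hello device": 0.85, "yellow device": 0.93}, wake, anti))  # False
print(should_wake({"hello device": 0.55}, wake, anti))                         # False
```

The second call shows the point of an anti-wake word: the wake word scores above the threshold, but a similar-sounding phrase scores higher, so the activation is rejected.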


Voice Synthesis (TTS)

Natural, Human-Like Speech

Delivers clear, adaptive voice instructions for frontline workers and caregivers. Supports speed adjustments for efficiency, volume boosts for clarity, and optimal playback for varied operational environments.

✓ 65+ voice options
✓ 18 neural TTS languages
✓ SSML support
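As an illustration of how speed and volume adjustments can be expressed in SSML, here is a minimal sketch that builds a standard SSML 1.1 document with a `prosody` element. The helper `build_ssml` is hypothetical, and which SSML elements and attribute values a given VDK voice accepts is an assumption to verify against the product documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, rate: str = "medium", volume: str = "medium",
               lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document with prosody controls."""
    speak = ET.Element(
        "speak",
        {
            "version": "1.1",
            "xmlns": "http://www.w3.org/2001/10/synthesis",
            "xml:lang": lang,
        },
    )
    # rate and volume use the predefined SSML labels (e.g. "fast", "loud").
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "volume": volume})
    prosody.text = text  # special characters such as "&" are escaped on output
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Pick 3 items from aisle 12 & scan.", rate="fast", volume="loud")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps the instruction text safely escaped, which matters when prompts are generated from workflow data.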


Voice Biometrics

Secure Voice Authentication

Identify and authenticate users by their unique voice characteristics. Provide secure, frictionless access without passwords or PINs.

✓ Speaker authentication
✓ Speaker identification
✓ Anti-spoofing protection
✓ Fast enrollment


Audio Enhancement

Crystal Clear Audio Processing

Advanced signal processing to remove noise, echo, and reverberation. Ensure optimal audio quality in any environment for better recognition accuracy.

✓ Noise suppression
✓ Echo cancellation
✓ Beamforming
✓ Gain control


Coming Soon

Voice-Text-Input technology demo will be available shortly.

Voice-Text-Input

Free-Form Speech Recognition

Transcribes continuous speech into text with high accuracy. Ideal for documentation, reporting, note-taking, and long-form voice input.

✓ Continuous recognition
✓ High recognition accuracy
✓ Custom vocabulary
✓ Real-time transcription

Coming Soon

Voice Error Correction

High-Accuracy Voice for Real-World Environments

Vivoka makes voice recognition accurate and reliable in the real world, even where traditional ASR fails.

ASR Alone Struggles

🔇 Noise limits accuracy

🗣️ Accents create recognition errors

⚡ Fast or natural speech breaks the pipeline

⚠️ Real operational use cases become unreliable

Vivoka Unlocks Accuracy

🎯 Cleans noise intelligently through advanced audio processing

🌍 Adapts to any accent with a lightweight Transformer correction model

💬 Handles fast speech and imperfect pronunciation

📊 Boosts accuracy with context lists that guide correction toward valid sequences

⚙️ Supports very large context lists with no impact on performance
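To make the idea of a context list concrete, the sketch below snaps a recognized alphanumeric sequence to the closest valid entry. It is a deliberately simplified stand-in: it uses plain string similarity from Python's `difflib`, not Vivoka's Transformer correction model, and the function `correct_with_context`, the sample codes, and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib
from typing import Sequence


def correct_with_context(hypothesis: str, context_list: Sequence[str],
                         cutoff: float = 0.6) -> str:
    """Return the closest valid sequence, or the hypothesis unchanged if none is close."""
    candidate = hypothesis.upper().replace(" ", "")
    if candidate in context_list:
        return candidate  # already a valid sequence
    matches = difflib.get_close_matches(candidate, context_list, n=1, cutoff=cutoff)
    return matches[0] if matches else hypothesis


# Hypothetical warehouse location codes acting as the context list.
valid_locations = ["A12B7", "C40F1", "K9X22"]
print(correct_with_context("A12P7", valid_locations))  # A12B7
print(correct_with_context("ZZZZZ", valid_locations))  # ZZZZZ (no close match)
```

The second call shows the conservative fallback: when nothing in the context list is close enough, the original hypothesis is returned rather than forced onto an unrelated valid code.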

The Impact

77% Fewer Errors*

*Internal benchmark on real-world alphanumeric use cases

⚡ Faster workflows and smoother task execution

✅ Fewer operator mistakes, even in noisy or multilingual environments

🚀 Broader real-world use cases thanks to higher accuracy and reliability

💎 Innovation Included

VEC technology is part of the Logistics Performance Pack, supporting alphanumeric sequences (1–7 characters) with ultra-low latency (<10 ms) and running directly inside the ASR pipeline with no additional dependencies.

⚡ Ready to deploy

🏅 Industry Standard

Aligned with Gartner's 2025 WMS Critical Capabilities, where usability and voice accuracy are essential in retail & e-commerce fulfillment. VEC delivers the precision required for modern warehouse operations.

New in VDK 6.2

Voice Recognition User Words

Make speech recognition adapt to real people, real pronunciation, and real operational vocabulary by linking user-recorded audio samples to specific words or phrases.

Challenge

Generic models miss what matters most

🗣️ Names, acronyms, and industry terms are often misrecognized

🌍 Pronunciation varies from one user, region, or team to another

⏱️ Manual tuning and correction loops slow down deployment

Why it matters

Better recognition where standard ASR struggles

✅ Reduce friction in high-value voice interactions

⚡ Improve reliability in production from day one

🎯 Increase confidence on custom vocabulary without complexity

🎙️ Personalized recognition

Runtime 6.2 feature

Teach the system how your users actually say the words that matter.

Voice Recognition User Words lets applications record a user saying a word or phrase and associate that audio with a target word or phrase. The result is more personalized, more accurate recognition of names, jargon, acronyms, and domain-specific vocabulary, without reworking your existing application logic.

Capture

Record a user-provided audio sample directly from the real speaker in the real context.

Associate

Link that sample to a target word or phrase such as a name, code, acronym, or business term.

Recognize

Improve recognition accuracy in live usage without retraining full models or rebuilding the app.

Ideal for

Employee names

Product references

Acronyms

Medical terms

Warehouse vocabulary

Brand-specific language

What you gain

More accuracy on custom vocabulary

Improve recognition quality for the words that matter most to your users, workflows, and business environment.

Benefits

Fast to deploy, easy to scale

🔌 No changes required to existing application code

🧩 Fits naturally into existing voice workflows

📈 Scales across users, teams, and operational contexts

Use cases

Built for real operational language

🏥 Healthcare terms and practitioner names

📦 Logistics references and item identifiers

🏭 Industrial jargon and site-specific vocabulary


Global Voice Coverage

65+ Languages

✨ VDK 6.3 update · 23 languages upgraded to Neural STT

Cerence’s latest generation of embedded neural speech recognition technology is now integrated into the VDK, improving speech recognition for multilingual, accented, and noisy real-world environments, with Vivoka’s Speech Enhancement, VEC, and User Words adding further optimization on top.

For Voice Commands & Wake Word (ASR)

For Voice Text Input (STT)

For Voice Synthesis (TTS)

For Human-like Voice Synthesis (TTS)


🇺🇸 English (United States): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇬🇧 English (United Kingdom): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇫🇷 French (France): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇩🇪 German (Germany): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇪🇸 Spanish (Spain): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇲🇽 Spanish (Mexico): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇮🇹 Italian (Italy): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇧🇷 Portuguese (Brazil): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇵🇹 Portuguese (Portugal): ASR, STT, TTS, Human-Like TTS
🇳🇱 Dutch (Netherlands): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇵🇱 Polish (Poland): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇷🇺 Russian (Russia): ASR, STT, TTS, Human-Like TTS
🇨🇳 Mandarin (China): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇸🇪 Swedish (Sweden): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇳🇴 Norwegian (Norway): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇩🇰 Danish (Denmark): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇨🇿 Czech (Czechia): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇮🇳 English (India): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇬🇷 Greek (Greece): ASR, STT, TTS, Human-Like TTS
🇮🇳 Hindi (India): ASR, STT (Neural STT, new in VDK 6.3), TTS, Human-Like TTS
🇦🇺 English (Australia): ASR, STT, TTS
🇧🇬 Bulgarian (Bulgaria): ASR, STT, TTS
🇭🇰 Cantonese (Hong Kong): ASR, STT (Neural STT, new in VDK 6.3), TTS
🇨🇳 Chinese (Sichuan): ASR, STT, TTS
🇫🇮 Finnish (Finland): ASR, STT, TTS
🇨🇦 French (Canada): ASR, STT (Neural STT, new in VDK 6.3), TTS
🇮🇱 Hebrew (Israel): ASR, STT, TTS
🇭🇺 Hungarian (Hungary): ASR, STT (Neural STT, new in VDK 6.3), TTS
🇮🇩 Indonesian (Indonesia): ASR, STT, TTS
🇯🇵 Japanese (Japan): ASR, STT (Neural STT, new in VDK 6.3), TTS
🇰🇷 Korean (South Korea): ASR, STT, TTS
🇹🇼 Mandarin (Taiwan): ASR, STT, TTS
🇸🇰 Slovak (Slovakia): ASR, STT, TTS
🇮🇳 Tamil (Tamil Nadu): ASR, STT, TTS
🇮🇳 Telugu (India): ASR, STT, TTS
🇹🇭 Thai (Thailand): ASR, STT, TTS
🇹🇷 Turkish (Turkey): ASR, STT, TTS
🇸🇦 Arabic (Saudi Arabia): ASR, STT (Neural STT, new in VDK 6.3)
🇨🇳 Cantonese (China): ASR, STT (Neural STT, new in VDK 6.3)
🇨🇳 English (China): ASR, STT
🇯🇵 English (Japan): ASR, STT
🇰🇷 English (South Korea): ASR, STT
🇲🇾 English (Malaysia): ASR, STT
🌏 Arabic (Persian Gulf): TTS
🌍 Arabic (World): TTS
🇪🇸 Basque (Spain): TTS
🇮🇳 Bengali (India): TTS
🇮🇳 Bhojpuri (Jharkhand): TTS
🇪🇸 Catalan (Spain): TTS
🇨🇳 Mandarin (North-East China): TTS
🇨🇳 Chinese (Shanghai): TTS
🇨🇳 Chinese (Shaanxi): TTS
🇭🇷 Croatian (Croatia): TTS
🇧🇪 Dutch (Belgium): TTS
🏴󠁧󠁢󠁳󠁣󠁴󠁿 English (Scotland): TTS
🇮🇪 English (Ireland): TTS
🇿🇦 English (South Africa): TTS
🌏 Farsi (Persian Gulf): TTS
🇧🇪 French (Belgium): TTS
🇪🇸 Galician (Galicia): TTS
🇮🇳 Kannada (Karnataka): TTS
🇲🇾 Malay (Malaysia): TTS
🇮🇳 Marathi (India): TTS
🇷🇴 Romanian (Romania): TTS
🇸🇮 Slovenian (Slovenia): TTS
🇦🇷 Spanish (Argentina): TTS
🇨🇱 Spanish (Chile): TTS
🇨🇴 Spanish (Colombia): TTS
🇺🇦 Ukrainian (Ukraine): TTS
🇪🇸 Valencian (Valencia): TTS
🇻🇳 Vietnamese (Vietnam): TTS

✨ State-of-the-Art Human-Like TTS & Neural STT

20 languages with human-like speech quality, plus 23 languages upgraded to Neural STT in VDK 6.3.

The Components of the Next-Generation Voice AI Platform

Complete Suite for Scalable Voice AI Projects

From management to deployment, everything you need to build and scale voice-enabled solutions

Management Platform

VDK Console

Centralizes project access, role management, and technology assignment within a single collaborative hub. Work from anywhere on any device without local installations or version updates.

Full visibility and control across all projects and teams
Multi-project and multi-user environment support
Real-time access to the latest tools and dashboards

Development Platform

Build, Integrate & Accelerate

VDK Studio

Web-based development environment, always up to date. Design, configure, and test offline voice applications with AI-assisted voice command generation and real-time validation.

Browser-based access
AI Command Builder
One-click translation
Batch Unit Testing

VDK Developer Toolbox

Pre-configured samples, templates, and utilities that simplify setup. Includes package management, sample code, and detailed guides.

Code templates
Package management
Guided documentation

VDK API

Cloud-based solution enabling dynamic management of voice commands across all deployments. Create and update commands instantly without manual file handling.

Dynamic command management
No manual files
Cloud-based

Runtime Platform

Built for real-time execution

Replace traditional request/response bottlenecks with a continuous streaming architecture. VDK Service is designed to handle high-throughput voice data with zero-latency overhead.

VDK Service

Real-time audio processing engine for building end-to-end voice workflows. Design modular pipelines where audio flows from input to processing to output through a structured sequence of Producer, Modifiers, and Consumers.

Each pipeline runs inside a session that manages execution and communication, allowing you to stream audio and receive results instantly. Replace multiple voice services with one cohesive system, deployable across Windows, Linux, and Android as an embeddable runtime, with reliable offline performance even in low-connectivity environments.

Architecture

Modular pipeline architecture with Producer, Modifiers, and Consumers
Modifiers transform audio in real time, including enhancement and channel extraction
Consumers deliver results such as transcription, audio output, storage, and biometrics
Support for parallel outputs so audio is processed once and reused in multiple ways
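The Producer, Modifiers, and Consumers structure described above can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the VDK Service API: the `Pipeline` class, the `producer`, `gain`, and `clip` helpers, and the two toy consumers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

AudioChunk = List[float]  # one block of mono samples


def producer(samples: List[float], chunk_size: int) -> Iterator[AudioChunk]:
    """Producer: yield fixed-size chunks, standing in for a mic, file, or stream."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def gain(factor: float) -> Callable[[AudioChunk], AudioChunk]:
    """Modifier factory: scale every sample by a constant factor."""
    return lambda chunk: [s * factor for s in chunk]


def clip(limit: float) -> Callable[[AudioChunk], AudioChunk]:
    """Modifier factory: clamp samples to the range [-limit, limit]."""
    return lambda chunk: [max(-limit, min(limit, s)) for s in chunk]


@dataclass
class Pipeline:
    modifiers: List[Callable[[AudioChunk], AudioChunk]] = field(default_factory=list)
    consumers: List[Callable[[AudioChunk], None]] = field(default_factory=list)

    def run(self, source: Iterable[AudioChunk]) -> None:
        for chunk in source:
            for modify in self.modifiers:   # modifiers apply in order
                chunk = modify(chunk)
            for consume in self.consumers:  # each consumer gets the same processed chunk
                consume(chunk)


stored: List[float] = []  # stands in for file persistence
peaks: List[float] = []   # stands in for an analysis consumer

pipeline = Pipeline(
    modifiers=[gain(2.0), clip(1.0)],
    consumers=[stored.extend, lambda c: peaks.append(max(abs(s) for s in c))],
)
pipeline.run(producer([0.1, -0.2, 0.6, 0.3, -0.7], chunk_size=2))
print(stored)  # [0.2, -0.4, 1.0, 0.6, -1.0]
print(peaks)   # [0.4, 1.0, 1.0]
```

Each chunk is modified once and then handed to every consumer, which mirrors the "processed once, reused in multiple ways" idea behind parallel outputs.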

Execution

Real-time streaming via WebSocket for continuous input and output
Session-based execution, configure first and run on demand
REST API for lifecycle and configuration management
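The "configure first, run on demand" model with continuous streaming can be illustrated with a small asyncio sketch. Everything here is an assumption for illustration: the `Session` class and its methods are hypothetical, and an in-memory `asyncio.Queue` stands in for the WebSocket connection so the example stays self-contained.

```python
import asyncio
from typing import List, Optional


class Session:
    """Hypothetical session: configured once, then run on demand."""

    def __init__(self) -> None:
        self.chunk_size: Optional[int] = None

    def configure(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    async def run(self, audio: bytes) -> List[str]:
        if self.chunk_size is None:
            raise RuntimeError("configure() must be called before run()")
        size = self.chunk_size
        queue: asyncio.Queue = asyncio.Queue()  # stands in for the socket
        results: List[str] = []

        async def send() -> None:
            # Push chunks as they become available instead of one large request.
            for start in range(0, len(audio), size):
                await queue.put(audio[start:start + size])
            await queue.put(None)  # end-of-stream marker

        async def receive() -> None:
            # Produce one result per chunk while the sender is still streaming.
            while (chunk := await queue.get()) is not None:
                results.append(f"processed {len(chunk)} bytes")

        await asyncio.gather(send(), receive())
        return results


session = Session()
session.configure(chunk_size=4)
results = asyncio.run(session.run(b"0123456789"))
print(results)  # ['processed 4 bytes', 'processed 4 bytes', 'processed 2 bytes']
```

Results arrive per chunk rather than after the whole input has been sent, which is the practical difference between streaming and a single request/response exchange.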

Deployment

Cross-platform support across Windows, Linux, and Android
Embeddable runtime for on-device deployment
Reliable offline performance in low-connectivity environments

[Diagram: VDK Service engine, real-time pipeline flow in a live session. Producer (audio input from mic, file, or stream; data sent through a socket) → Modifiers (processing: noise suppression, channel extraction) → Consumers (transcription (ASR), speaker biometrics, file persistence, low-latency player; data received from a socket).]

Built for real-time execution

Think in sessions for execution and pipelines for flows

Stream audio continuously instead of relying on request/response

Build once and scale from simple flows to more complex systems

Ready to build?

Start with a simple pipeline and scale up as needed.

Business Benefits

Transform your business with strategic advantages

Fast Return on Investment

Proven impact with faster onboarding, improved productivity, safer operations, and measurable ROI in 6–9 months

Enhanced Safety

Ensures that only authorized individuals can access critical systems and workflows

Simplified Operations

Ensures consistent performance across a diverse set of hardware equipment

Simplified Onboarding

Enables faster setup and reduces training time

Enhanced Worker Satisfaction

Delivers clear and responsive communication

Support for Worker Diversity

Adapts to various accents, dialects, and languages

Ready to Transform?

Discuss with our team how you can transform your solutions today



VDK 6.3 introduces Neural STT

More reliable natural speech capture

Voice AI beyond command workflows

Natural notes and comments

Exception reporting

AI assistants and copilots

Hands-free documentation

Field service workflows

No workflow rewrite
