Text-to-Speech that produces natural voices, offline

Our easy-to-embed offline Text-to-Speech SDK (software development kit) produces lifelike voices in 65 languages for any portable, mobile or embedded system.


Getting familiar with the technology

Offline Text-to-Speech (or speech synthesis), what is it?

Text-to-Speech (TTS), also referred to as speech synthesis, is a technology that generates speech from written text. Its fundamental process involves the conversion of graphemes (written characters) into their corresponding phonemes (speech sounds).

Through machine learning, the TTS system is able to accurately and naturally pronounce words and sentences, emulating the nuances of human speech. With the addition of SSML (Speech Synthesis Markup Language), extensive customization options allow for adjustments in pitch, timbre, volume, speed, and more, offering a personalized and lifelike auditory experience.

Features

A fully customizable and offline Text-to-Speech engine

Different voices, genders and quality levels to choose from

We offer a highly diverse set of TTS voices. Our resources cover multiple genders and age ranges, in more than 60 different languages.

Voice quality is also important. To comply with most hardware requirements, we provide different quality levels, from compact to high. This allows our users to choose the best trade-off between quality and resource size.

Give it a twist with customized SSML

Speech Synthesis Markup Language (SSML) is a markup language used to control how TTS engines read the provided text.

Tone, pitch, timbre, speed, emphasis and more are among the parameters that can be personalized.

A typical SSML tag looks like this: <sub alias="Voice Development Kit">VDK</sub>, which makes the engine say "Voice Development Kit" wherever the text reads VDK instead of spelling out the letters.
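As a rough illustration, the substitution tag can be combined with a speed adjustment when the SSML string is generated programmatically. The sketch below uses only Python's standard library; the helper name `build_ssml` is hypothetical, and it assumes the target engine accepts the standard `<speak>`, `<prosody>` and `<sub>` elements (tag support varies between engines, and the `version`/`xmlns` attributes that strict SSML expects on `<speak>` are omitted for brevity).

```python
import xml.etree.ElementTree as ET


def build_ssml(before: str, abbreviation: str, spoken_form: str,
               after: str, rate: str = "medium") -> str:
    """Build an SSML string that reads `abbreviation` as `spoken_form`.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    speak = ET.Element("speak")
    # <prosody rate="..."> adjusts the speaking speed of everything inside it.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = before
    # <sub alias="..."> replaces the written text with the spoken alias.
    sub = ET.SubElement(prosody, "sub", {"alias": spoken_form})
    sub.text = abbreviation
    sub.tail = after
    # ElementTree escapes special characters such as & and < automatically.
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Welcome to the ", "VDK", "Voice Development Kit",
                  " demo.", rate="slow")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps the document well-formed even when the input text contains characters that need escaping.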

Use cases and existing applications

How to leverage offline Text-to-Speech in the field?

Accessibility

Use speech synthesis to create audio content for users with impairments and make services more accessible.

Conversational

Produce lifelike voices for voice assistants or interactive voice response (IVR).

Information

Provide hands-free instructions or other types of information while preserving safety and focus.

Experience

Humanize any product or service with a natural voice that can be customized with emphasis.

Translation

Create a speech-to-speech translation system that speaks the translated content aloud.

Announcement

Flexible and modular announcement system with offline text-to-speech for public transportation services.

Adopting voice solutions in your business starts here

Get in touch with our team to bring your company into the Voice First world!

Contact our experts

Try it now

Benefits

Why should you choose our Text-to-Speech (TTS) solution?

Offline

No Wi-Fi or network connection is required to produce advanced, lifelike voices.

Small Footprint

Size flexibility from 726 kB up to 580 MB depending on the quality.

Custom TTS

Add SSML (Speech Synthesis Markup Language) to fine-tune your TTS.

Real-time Processing

On-device processing removes network latency, enabling a real-time voice AI experience.

Multilingual Voices

We offer 60 different languages and 115+ voices, so you will most likely find what you need, both now and in the future.

Cross-Platform

We support PCM audio output in a variety of audio formats and sampling rates, which simplifies integration on any platform.
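To give an idea of what handling raw PCM looks like on the application side, the sketch below wraps a PCM buffer in a WAV container using Python's standard `wave` module. Everything about the buffer is an assumption for illustration: a generated sine tone stands in for synthesized speech, and the format (16-bit signed little-endian, mono, 22050 Hz) is an example rather than the engine's documented output. The function names are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def make_test_tone(sample_rate: int = 22050, seconds: float = 0.5,
                   frequency: float = 440.0) -> bytes:
    """Generate 16-bit mono PCM as a stand-in for TTS engine output."""
    frame_count = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(frame_count)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer.
    return b"".join(struct.pack("<h", sample) for sample in samples)


def write_pcm_as_wav(path: str, pcm_bytes: bytes, sample_rate: int = 22050,
                     channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM bytes in a WAV header so standard players can open it."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


pcm = make_test_tone()
wav_path = os.path.join(tempfile.gettempdir(), "tts_output_example.wav")
write_pcm_as_wav(wav_path, pcm)
```

The sample rate, channel count and sample width passed to the writer must match the format the PCM data was actually produced in, otherwise playback will sound too fast, too slow or distorted.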

For developers, by developers

Try our voice technologies now

1. Sign up first on the Console

Before integrating with VDK, test our online playground: Vivoka Console.

2. Develop and test your use cases

Design, create and try all of your features.

3. Submit your project

Share your project and discuss it with our experts to plan a real integration.

Sign up on Console

Requirements & Quick-Start

How to develop with our offline Text-to-Speech engine?

Go to developers section

Our offline Text-to-Speech engine produces lifelike voices in more than 65 languages. We cover 180+ different voices, male and female, across different age ranges to fit your branding requirements.

The voices come in four different quality types:

Compact
High
Premium
Pro

Compact is the lightest, Pro is the heaviest of them. Depending on the resource size, the voice quality and how natural it sounds will vary.

– Language count: 65

– Resource Size:

Compact: 1 to 30MB
Pro: 5 to 100MB
High: 30 to 300MB
Premium: 40 to 500MB

– SDK Code Size: from 5MB to 65MB

– Supported Hardware: Microprocessor Units

– Supported Platforms:

Windows – x86_64
Linux – x86_64 | armv7hf | armv8
Android 6.0 (API 23)
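As a small illustration of the quality/resource size trade-off, the sketch below encodes the resource size ranges listed above and filters the quality types that fit a given storage budget. The ranges come from this page; the type and function names (`QualityTier`, `tiers_within_budget`) are hypothetical and not part of the SDK, and the real size of a given voice should be checked before deployment.

```python
from typing import List, NamedTuple


class QualityTier(NamedTuple):
    name: str
    min_mb: int  # smallest listed resource size, in MB
    max_mb: int  # largest listed resource size, in MB


# Resource size ranges as listed in the specifications above.
TIERS = [
    QualityTier("Compact", 1, 30),
    QualityTier("Pro", 5, 100),
    QualityTier("High", 30, 300),
    QualityTier("Premium", 40, 500),
]


def tiers_within_budget(budget_mb: float, worst_case: bool = True) -> List[str]:
    """Return the quality types whose voice resources fit in `budget_mb`.

    With worst_case=True, a tier qualifies only if even its largest listed
    voice fits. With worst_case=False, it qualifies if its smallest one fits.
    """
    fitting = []
    for tier in TIERS:
        size = tier.max_mb if worst_case else tier.min_mb
        if size <= budget_mb:
            fitting.append(tier.name)
    return fitting


print(tiers_within_budget(100))
print(tiers_within_budget(35, worst_case=False))
```

The worst-case check is the conservative choice for devices with fixed storage, while the optimistic check is useful for shortlisting tiers before looking at the size of one specific voice.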

– On the Device

Fully-embedded voice technology for brands seeking the convenience of a voice user interface without the privacy or connectivity concerns of the internet. Full access to custom commands and the ability to instantly update command codes during development make voice-enabling your product fast and easy.

– On Premise

Get the power of cloud connectivity combined with the reliability of embedded voice technology. On premise (or hybrid) solutions ensure that your device is always-on and responsive to commands. Seamlessly push product updates and deliver a broader voice experience with the level of cloud-connectivity that best matches your product and users.

Complementary Technologies

Discover other technologies in our stack

Explore the VDK

Wake Word

Trigger the speech recognition process by detecting a unique word or phrase.

Speech Recognition

Turn human speech into text data that can be processed by complex systems.

Voice Biometrics

Seamlessly identify or authenticate users by recognizing their voice pattern.

Audio Enhancement

Enhance the quality of the voice audio signal to boost speech recognition accuracy.
