Do you want to fully customize your own ASR model? Do you want it to understand words that even you have trouble catching? Stay with us: today we are going to learn how to create a custom voice command with the Grammar Editor plugin, which is available in the Voice Development Kit.
Quick reminder: the Voice Development Kit (VDK) is a toolkit specialized in offline speech technologies. It allows companies to easily and quickly integrate automatic speech recognition, language processing and voice synthesis into their applications and devices.
1) Start by creating your new project with VDK
Choosing the type of project you want to build
We begin this tutorial by creating the project. Using the wizard, create a custom application, which is the type of project that gives you access to the Grammar Editor.
Setting the voice project information and languages
When creating the project, you will be asked for 3 pieces of information: a project name, a directory and the languages you need.
More than 50 languages are available with the Voice Development Kit. You can even create multilingual voice assistants.
Define the technologies to be developed and integrated (ASR in order to customize voice command grammar)
In this step, you choose among the 4 available technologies: Wake-Up Word (WUW), Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Text-to-Speech (TTS).
In our case, we need ASR to work on the grammar.
2) Discovering the Speech Recognition Grammar Editor Plugin
We will now write the grammar for your voice command. This takes 5 steps:
- Click on the file to open the Grammar Editor
- Select the language of your grammar (mainly useful if you work on a multilingual grammar)
- Write your custom grammar in the editor
- Save and compile the grammar so that it can be used
- Click on “Test” and try your grammar with your microphone or with pre-recorded audio files
Some useful elements for writing a grammar:
Every line should end with a semicolon (;)
<x> -> Rule name
[x] -> Optional value
| -> “or”
!function(X, Y) -> Function call
Useful functions:
- !tag(X, Y); attaches the tag (X) to the words (Y), which makes the recognition result much easier to interpret
- !repeat(X, Y, Z); repeats the words (X) at least Y times and at most Z times
- !pronounce "X" PRONAS "Y"; gives a word (X) the pronunciation of another word (Y). Append | PRONAS "Z" to add more pronunciations.
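To make the `[x]` and `|` notation concrete, here is a minimal Python sketch, not part of VDK, that expands a rule into every sentence it accepts. The slot-list encoding and the `expand` helper are hypothetical illustrations of the idea, not how the Grammar Editor works internally.

```python
from itertools import product

# Hypothetical encoding, for illustration only: a rule is a list of slots,
# and each slot lists its alternatives. "a | b" becomes ["a", "b"],
# and an optional "[x]" becomes ["", "x"] (the empty string means "skipped").
VERB = ["I want", "I would like"]   # <verb>: I (want | would like);
OPTIONAL_PIZZA = ["", "pizza"]      # [pizza]

def expand(slots):
    """Return every sentence obtained by picking one alternative per slot."""
    sentences = []
    for choice in product(*slots):
        # Skip empty strings so that omitted optional words leave no double space.
        sentences.append(" ".join(word for word in choice if word))
    return sentences

rule = [VERB, ["a"], OPTIONAL_PIZZA, ["margherita", "calzone"]]
for sentence in expand(rule):
    print(sentence)
```

This small rule already yields 8 sentences (2 verbs × 2 choices for the optional word × 2 pizzas), which is why alternatives and optional elements let a short grammar cover many phrasings.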
3) Speech Recognition Grammar Editing: How-to
<main>: I want a pizza;
Your assistant can now detect a pizza order.
Let’s make this assistant smarter.
<main>: I want a [pizza] !tag(PIZZA_TYPE, <pizza>);
<pizza>: margherita | prosciutto e funghi | capricciosa | vegetariana | calzone;
Now my assistant lets me select a type of pizza from a predefined list.
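One way to picture what `!tag(PIZZA_TYPE, <pizza>)` gives you is a regular expression with a named group: the whole sentence must match, and the tagged part comes back under a name. The Python sketch below is only an analogy, assuming the transcript is available as plain text; the `pizza_type` helper is hypothetical and is not the VDK result API.

```python
import re

# Alternatives of the <pizza> rule.
PIZZAS = ["margherita", "prosciutto e funghi", "capricciosa", "vegetariana", "calzone"]

# <main>: I want a [pizza] !tag(PIZZA_TYPE, <pizza>);
# The named group PIZZA_TYPE plays the role of the tag.
MAIN = re.compile(
    r"i want a (?:pizza )?(?P<PIZZA_TYPE>" + "|".join(map(re.escape, PIZZAS)) + r")"
)

def pizza_type(transcript):
    """Return the tagged pizza type, or None if the sentence is out of grammar."""
    match = MAIN.fullmatch(transcript.lower())
    return match.group("PIZZA_TYPE") if match else None

print(pizza_type("I want a pizza capricciosa"))  # capricciosa
print(pizza_type("I want a hamburger"))          # None
```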
To improve detection of the pizza name, whether it is pronounced the American way or the Italian way, we add the !pronounce function:
!pronounce "capricciosa" PRONAS "caprikiosa" | PRONAS "caprichioza";
This lets us declare several expected pronunciations for a single word.
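Conceptually, `!pronounce` lets one written word be recognized from several spoken forms while the result still reports the written word. The Python sketch below models that with a plain lookup table; the respellings stand in for real phonetic transcriptions, the `LEXICON` and `written_form` names are hypothetical, and keeping the default pronunciation alongside the added ones is our assumption.

```python
# Hypothetical lexicon: written word -> spoken forms it should accept.
# Assumption: the word's default pronunciation is kept next to the added ones.
LEXICON = {
    "capricciosa": ["capricciosa", "caprikiosa", "caprichioza"],
}

# Reverse index: whichever form is heard maps back to the written word.
SPOKEN_TO_WRITTEN = {
    spoken: written
    for written, spoken_forms in LEXICON.items()
    for spoken in spoken_forms
}

def written_form(heard):
    """Return the word reported in the result for a heard form (unchanged if unknown)."""
    return SPOKEN_TO_WRITTEN.get(heard, heard)

print(written_form("caprichioza"))  # capricciosa
```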
We improve our grammar further to accept different action verbs and to allow ordering several pizzas:
<main>: <verb> (a | <number>) <pizza>;
<verb>: I (want | would like);
<number>: !tag(NUMBER, 1 | 2 | 3);
<pizza>: [pizza] !tag(PIZZA_TYPE, margherita | prosciutto e funghi | capricciosa | vegetariana | calzone);
We also moved the !tag function into the <pizza> rule to keep the <main> rule readable.
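The same regex analogy extends to this version: the verb is an untagged alternative, and `(a | <number>)` matches either the article or a tagged NUMBER. This Python sketch is illustrative only; `parse_order` is a hypothetical helper, and reading "a" as a quantity of 1 is our own interpretation choice, since the grammar only tags explicit numbers.

```python
import re

PIZZAS = ["margherita", "prosciutto e funghi", "capricciosa", "vegetariana", "calzone"]

# Structure mirrored from the grammar: a verb, then "a" or a tagged number
# (1 | 2 | 3), then an optional word "pizza" and a tagged pizza type.
ORDER = re.compile(
    r"i (?:want|would like) "
    r"(?:a|(?P<NUMBER>[123])) "
    r"(?:pizza )?(?P<PIZZA_TYPE>" + "|".join(PIZZAS) + r")"
)

def parse_order(transcript):
    """Return (quantity, pizza type), or None if the sentence is out of grammar."""
    match = ORDER.fullmatch(transcript.lower())
    if match is None:
        return None
    # No NUMBER tag means the speaker said "a": treat it as one pizza.
    quantity = int(match.group("NUMBER") or 1)
    return quantity, match.group("PIZZA_TYPE")

print(parse_order("I would like 2 pizza vegetariana"))  # (2, 'vegetariana')
print(parse_order("I want a calzone"))                  # (1, 'calzone')
```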
Let’s use the !repeat function so our assistant can take the order for a whole group of people at once:
<main>: <verb> !repeat((a | <number>) <pizza> [and [<verb>]], 1, *);
The !repeat function allows the repetition of a segment of our sentence.
As you can see, we added [and [<verb>]] at the end of the repeated segment. This part is entirely optional, but it allows a more natural interaction with the assistant.
Examples of valid sentences:
- I want a margherita
- I would like a pizza capricciosa and 2 vegetariana
- I want a margherita, 2 capricciosa and I would like a calzone
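To check that the three example sentences really belong to the grammar, the `!repeat(..., 1, *)` segment can be mimicked with a regex group repeated one or more times (`+`), reading `*` as "no upper bound". This Python sketch assumes the transcript arrives without punctuation, so the comma in the third example is stripped first; `accepts` is a hypothetical helper, not a VDK call.

```python
import re

PIZZA_TYPES = "margherita|prosciutto e funghi|capricciosa|vegetariana|calzone"
VERB = r"i (?:want|would like)"
# One repeated item: (a | <number>) <pizza> [and [<verb>]]
ITEM = rf"(?:a|[123]) (?:pizza )?(?:{PIZZA_TYPES})(?: and(?: {VERB})?)?"

# <main>: <verb> !repeat(ITEM, 1, *);  -> ITEM at least once, no upper bound
MAIN = re.compile(rf"{VERB}(?: {ITEM})+")

def accepts(sentence):
    """True if the sentence, lowercased and without commas, matches <main>."""
    return MAIN.fullmatch(sentence.lower().replace(",", "")) is not None

examples = [
    "I want a margherita",
    "I would like a pizza capricciosa and 2 vegetariana",
    "I want a margherita, 2 capricciosa and I would like a calzone",
]
for sentence in examples:
    print(accepts(sentence), sentence)
```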
With all that food, some drinks would be appreciated:
<main>: <verb> !repeat((a | <number>) (<pizza> | <drinks>) [and [<verb>]], 1, *);
<drinks>: [!tag(DRINK_FORMAT, glass | bottle) of] !tag(DRINK_TYPE, water | coca | wine | beer);
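With several tagged items per sentence, a result has to be read as a sequence of tags rather than a single value. The Python sketch below mimics that by scanning a transcript for the pizza and drink patterns with `re.finditer`; the tag names come from the grammar, but `tagged_items` and its list-of-dicts output are hypothetical stand-ins for whatever the recognizer actually returns.

```python
import re

# One ordered item: "a" or a tagged number, then a tagged pizza or a tagged drink.
ITEM = re.compile(
    r"\b(?:(?P<NUMBER>[123])|a) "
    r"(?:(?:pizza )?(?P<PIZZA_TYPE>margherita|prosciutto e funghi|capricciosa"
    r"|vegetariana|calzone)"
    r"|(?:(?P<DRINK_FORMAT>glass|bottle) of )?(?P<DRINK_TYPE>water|coca|wine|beer))"
)

def tagged_items(transcript):
    """Return one dict of tags per ordered item, in sentence order."""
    items = []
    for match in ITEM.finditer(transcript.lower()):
        # Keep only the tags that actually took part in this match.
        items.append({tag: value for tag, value in match.groupdict().items() if value})
    return items

print(tagged_items("I want 2 margherita and a glass of wine"))
# [{'NUMBER': '2', 'PIZZA_TYPE': 'margherita'}, {'DRINK_FORMAT': 'glass', 'DRINK_TYPE': 'wine'}]
```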
Finally, let’s add another action to make our assistant more complete: I want to be able to ask how much time is left before my order is ready.
To do this, I create a new rule <time_left>, move the content of <main> into a new rule <order>, and let <main> accept either of them:
The finished and customized voice command grammar
#BNF+EM V2.1;
!grammar ASRENG-US;
!start <main>;
!pronounce "capricciosa" PRONAS "caprikiosa" | PRONAS "caprichioza";
<main>: <order> | <time_left>;
<time_left>: How (much | many) time (left | remaining) [for (our | my) order];
<order>: <verb> !repeat((a | <number>) (<pizza> | <drinks>) [(and | with) [<verb>]], 1, *);
<verb>: I (want | would like);
<number>: !tag(NUMBER, 1 | 2 | 3);
<pizza>: [pizza] !tag(PIZZA_TYPE, margherita | prosciutto e funghi | capricciosa | vegetariana | calzone);
<drinks>: [!tag(DRINK_FORMAT, glass | bottle) of] !tag(DRINK_TYPE, water | coca | wine | beer);
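Putting the pieces together, the Python sketch below models the finished grammar so you can see which top-level rule a sentence falls into: `<main>` becomes a choice between an `<order>` pattern and a `<time_left>` pattern. As before, this regex translation and the `intent_of` helper are illustrative assumptions (lowercase, punctuation-free transcripts), not how VDK compiles BNF+EM.

```python
import re

VERB = r"i (?:want|would like)"
QUANTITY = r"(?:a|[123])"  # (a | <number>)
PIZZA = r"(?:pizza )?(?:margherita|prosciutto e funghi|capricciosa|vegetariana|calzone)"
DRINKS = r"(?:(?:glass|bottle) of )?(?:water|coca|wine|beer)"

# <order>: <verb> !repeat((a | <number>) (<pizza> | <drinks>) [(and | with) [<verb>]], 1, *);
ORDER = rf"{VERB}(?: {QUANTITY} (?:{PIZZA}|{DRINKS})(?: (?:and|with)(?: {VERB})?)?)+"
# <time_left>: How (much | many) time (left | remaining) [for (our | my) order];
TIME_LEFT = r"how (?:much|many) time (?:left|remaining)(?: for (?:our|my) order)?"

# <main>: <order> | <time_left>;
RULES = {"order": re.compile(ORDER), "time_left": re.compile(TIME_LEFT)}

def intent_of(transcript):
    """Name of the <main> alternative the sentence matches, or None."""
    text = transcript.lower()
    for name, pattern in RULES.items():
        if pattern.fullmatch(text):
            return name
    return None

print(intent_of("I want a capricciosa and a bottle of water"))  # order
print(intent_of("How much time left for my order"))             # time_left
```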
Now let’s place an order:
“I want a capricciosa and a bottle of water.”
Since we tagged the grammar, the result is really easy to interpret.
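On the application side, "easy to interpret" means the code only has to walk through the tags instead of parsing free text. The Python sketch below assumes, hypothetically, that the recognition result has already been turned into a list of (tag, value) pairs in sentence order; the actual VDK result format is not shown in this article, so the input shape and the `build_order` helper are assumptions to adapt.

```python
def build_order(tags):
    """Group an ordered list of (tag, value) pairs into (quantity, item, drink_format) lines.

    Assumes NUMBER and DRINK_FORMAT come before the item they qualify, as in the
    grammar; when no NUMBER precedes an item, the speaker said "a" (quantity 1).
    """
    lines = []
    quantity, drink_format = 1, None
    for tag, value in tags:
        if tag == "NUMBER":
            quantity = int(value)
        elif tag == "DRINK_FORMAT":
            drink_format = value
        elif tag in ("PIZZA_TYPE", "DRINK_TYPE"):
            lines.append((quantity, value, drink_format))
            quantity, drink_format = 1, None  # reset for the next item
    return lines

# Hypothetical tags for "I want a capricciosa and a bottle of water"
tags = [("PIZZA_TYPE", "capricciosa"), ("DRINK_FORMAT", "bottle"), ("DRINK_TYPE", "water")]
print(build_order(tags))
# [(1, 'capricciosa', None), (1, 'water', 'bottle')]
```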
This was a short example showing one way to design a grammar. We recommend working iteratively, as it usually makes it easy to cover as many use cases as possible.
Thank you for reading this article. We hope it helps you find cool ideas for your projects, and have fun building grammars!