In the perpetual quest to renew the user experience (UX), voice looks more inevitable with each passing year. A successor to touch and facial recognition, it has many applications beyond interaction, including the identification of individuals through voice biometrics. According to Rita Singh, a Carnegie Mellon researcher specializing in machine learning applied to voice: “It has been known for centuries that the voice carries a wealth of information.” We can use artificial intelligence to extract that information.
Identifying ourselves by voice was, until recently, reserved for science fiction, yet we are closer to it than it seems! However, as with other new technologies that have stirred controversy, one question remains: is it a reliable and secure process?
What is voice biometrics actually?
“The sound of your voice is becoming a new kind of fingerprint.”
Voice biometrics is a scientific and technological field, closely related to speech recognition, that aims to develop applications to verify a person’s identity solely through their voice.
Prosody plays a central role here: it is the set of vocal characteristics (timbre, pitch, valence, etc.) specific to each human being. Together, these characteristics form a true voiceprint. They are extracted and matched against a reference model, which is what makes identification possible.
Technically, machine learning is widely used in this field of research because it allows a system to improve by itself. This matters because the reliability of the technology depends in part on the accuracy it can offer. Under the machine learning principle, when information is entered (i.e., a user speaks), the system uses this data in order to:
- perform its task, on the one hand;
- refine its results, on the other.
How does it work?
Voice biometrics systems enroll a known person by creating an initial template. Several templates can usually be merged to obtain a better-quality representation of the person’s voice. This initial template is called the enrollment template or enrollment voiceprint.
Basically, authentication consists of two steps:
- Enrollment: creating the voice template from an audio file or stream;
- Verification: matching a new voice template against the enrolled one.
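The two steps above can be sketched as follows. This is only an illustration under stated assumptions: templates are taken to be fixed-length lists of numbers (embeddings) produced by some voice model that is not shown, merging is done by simple averaging, matching uses cosine similarity, and the 0.8 threshold is arbitrary. The function names are hypothetical and do not come from any particular product.

```python
import math

def merge_templates(templates):
    """Average several same-length templates into one enrollment template."""
    if not templates:
        raise ValueError("at least one template is required")
    dim = len(templates[0])
    if any(len(t) != dim for t in templates):
        raise ValueError("all templates must have the same length")
    return [sum(t[i] for t in templates) / len(templates) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify(enrollment_template, sample_template, threshold=0.8):
    """Accept the sample if it is close enough to the enrollment template."""
    return cosine_similarity(enrollment_template, sample_template) >= threshold

# Enrollment: three made-up templates from the same (hypothetical) speaker.
enrolled = merge_templates([
    [0.9, 0.1, 0.3],
    [1.0, 0.2, 0.2],
    [0.8, 0.0, 0.4],
])

# Verification: compare new samples against the enrollment template.
same_speaker = verify(enrolled, [0.85, 0.15, 0.3])
other_speaker = verify(enrolled, [0.1, 0.9, 0.2])
print(same_speaker, other_speaker)
```

In a real system the templates would come from a trained model and the threshold would be tuned on recorded data rather than chosen by hand.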
Voice biometrics systems support two types of authentication: speaker verification and speaker identification.
Speaker verification
Speaker verification occurs when the voice biometrics system captures a new speech sample, creates a template from it and compares it to the enrollment template. Simply put, the system already knows who the speaker claims to be and checks that claim.
Speaker identification
In the case of speaker identification, the system compares a sample from an unknown person to multiple enrollment templates. The aim is to find the speaker within the set of enrollment templates.
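As a rough illustration of identification, the sketch below compares one unknown sample against every enrolled template and keeps the best match. As before, the templates are assumed to be plain lists of numbers, cosine similarity is one possible scoring choice among others, and the speaker names, values and 0.8 threshold are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(sample, enrolled, threshold=0.8):
    """Return (name, score) for the closest enrolled speaker.

    The name is None when even the best score is below the threshold,
    meaning the speaker is treated as unknown.
    """
    best_name, best_score = None, -1.0
    for name, template in enrolled.items():
        score = cosine_similarity(sample, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical enrollment templates for three speakers.
enrolled_speakers = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.9, 0.2],
    "carol": [0.2, 0.3, 0.9],
}

known_name, known_score = identify([0.15, 0.85, 0.25], enrolled_speakers)
unknown_name, unknown_score = identify([1.0, 1.0, 0.0], enrolled_speakers)
print(known_name, unknown_name)
```

Verification is a one-to-one comparison, whereas identification is one-to-many, so its cost and its chance of a wrong match both grow with the number of enrolled speakers.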
Beyond this distinction, voice biometrics can be text dependent or text independent. Depending on your use case, you may prefer one over the other. For a more secure system, you can also combine them. Let’s look at the main difference between the two.
Text dependent voice biometrics
Also known as “active voice biometrics”, this mode requires the person to say a specific word or phrase. The system prompts you, and you need to give the correct answer in the correct voice (your voice characteristics are checked as well as the words). It can be any phrase you have previously set up, but it is usually something like “My voice is my password”.
Text independent voice biometrics
Also called “passive voice biometrics”, this mode doesn’t rely on a key phrase. Instead, it can passively listen to a conversation and capture the voice’s specific features in order to identify the person who is speaking. It is based on voiceprint identification. Both the enrollment template and the verification template are created simply by speaking. The amount of speech needed to create the templates may vary, but keep in mind that the longer the sample, the more accurate the result.
The two modes can be used together to create a more secure model, and both are compatible with speaker verification and speaker identification systems.
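One possible way to combine the two modes is score fusion, sketched below. This is an assumption for illustration, not a description of how any given product does it: each mode is supposed to return a score between 0 and 1, the scores are blended with a weighted average, and the weight and threshold values are arbitrary.

```python
def fuse_scores(text_dependent_score, text_independent_score, weight=0.5):
    """Weighted average of the two scores (both assumed to be in [0, 1]).

    `weight` is the share given to the text-dependent score.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (weight * text_dependent_score
            + (1.0 - weight) * text_independent_score)

def accept(text_dependent_score, text_independent_score,
           threshold=0.8, weight=0.5):
    """Accept the speaker only if the fused score reaches the threshold."""
    fused = fuse_scores(text_dependent_score, text_independent_score, weight)
    return fused >= threshold

# Passphrase and free speech both match well: accepted.
both_match = accept(0.95, 0.9)
# Passphrase matches but the free speech does not: rejected.
one_mismatch = accept(0.95, 0.4)
print(both_match, one_mismatch)
```

A stricter alternative would be to require each score to pass its own threshold, which trades some convenience for security.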
Some sectors, such as banking and call centres, already take advantage of voice biometrics technology. For instance, voice biometric authentication can be used to verify a customer’s identity while they log into a mobile app. Call centres also use speaker recognition technology in their IVR (interactive voice response) systems.
Can we rely on voice biometrics?
Often imagined and used for authentication, voice biometrics raises questions about its reliability and security. The risk of fraud applies to voice too, because stealing a voice is as technically feasible as stealing a code.
Voice biometrics technology is safe and constantly progressing
Voice biometrics is now a proven technology used in many cases. For instance, banking players have been experimenting with it and integrating it for years. It allows seamless authentication in low-risk cases and much stronger security when needed.
The technology has made huge advances in recent years, especially thanks to machine learning, and has strengthened what it can offer security systems. One major improvement is liveness detection (also known as anti-spoofing), which is capable of distinguishing a live voice from a recorded or synthetic one. Machine learning models are well suited to this task because they refine their accuracy as they process more examples. As a result, today’s intelligent systems draw on a huge amount of information and are proving increasingly reliable.
Moreover, voice contains a hundred or so specific characteristics. These, depending on the quality of the audio capture and information processing, make voice a robust means of identification.
Yet voice biometrics remains a sensitive subject. Behind its attractive features lie significant risks of loopholes. Keep in mind that, in the field of security, the ingenuity of the people designing the systems is matched by that of those seeking to defeat them.
Technical limits persist
First of all, like any technology, voice biometrics can make mistakes. Its accuracy strongly depends on the quality of the samples collected and on the quality of the enrollment templates. Depending on the chosen authentication system, there may therefore be more or fewer “false rejects” and/or “false accepts”.
Mistakes happen
A “false reject” occurs when the system fails to confirm the speaker’s identity even though they really are the person behind the template. A “false accept”, by contrast, occurs when the system accepts a speaker whose identity doesn’t correspond to the template.
In both cases, there are repercussions that may cause serious problems for an organisation. A “false reject” can cause annoyance and discomfort for end users. At the company level, it can reduce efficiency and even lead to lost contracts. Imagine that your customer’s satisfaction depends on your ability to carry out a task rapidly and you get stuck because your authentication system blocks you (or your end user): you may lose precious time.
On the other hand, in the case of a “false accept”, you can easily imagine what could happen if a malicious person gained access to company data or files they are not supposed to see.
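To make the trade-off between the two error types concrete, here is a small sketch that counts them for a given decision threshold. The scores are invented for the example: “genuine” scores come from the right speaker, “impostor” scores from someone else, and a sample is accepted when its score is at or above the threshold.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) for a threshold."""
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # A genuine speaker scoring below the threshold is wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    # An impostor scoring at or above the threshold is wrongly accepted.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr

# Made-up similarity scores for illustration only.
genuine = [0.95, 0.91, 0.78, 0.88, 0.97]
impostor = [0.30, 0.55, 0.82, 0.41, 0.12]

for threshold in (0.5, 0.8, 0.9):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: false accepts={far:.0%}, false rejects={frr:.0%}")
```

With these numbers, raising the threshold lowers the false accept rate but raises the false reject rate, which is why the threshold is usually chosen according to how sensitive the protected resource is.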
Deep voices
In addition, the risk of fraud also applies to voice: stealing a voice is as technically feasible as stealing a code. Have you heard of deepfakes? Deep voices exist too, and they are becoming more and more “accurate”, to the point that they may fool authentication systems. Nevertheless, liveness detection (anti-spoofing) algorithms exist and are now capable of distinguishing a live voice from a recorded or synthetic one.
The answer as to reliability is therefore mixed:
- Like any mode of identification, flaws exist and will exist;
- Voice is positioning itself as robust enough to deserve its chance and would perfectly complement another identification system.
What can we expect from voice biometrics?
In our opinion, and in the opinion of many experts, voice biometrics should be used in addition to other, more proven authentication methods. That way, the respective advantages of the different methods complement one another. For example, many players are already exploring the combination of voice and facial identification.
Two-factor authentication
Nowadays, this technology is increasingly used as a second factor in two-factor authentication (2FA). 2FA has been mandatory for online payments in France since 2021, and many other use cases have followed. 2FA is thus becoming a security standard, and voice biometrics could make it more seamless, since the user only has to speak to confirm their identity. As passwords tend to disappear on certain devices in favor of other means of authentication, biometrics in general provides a good user experience. Your voice is part of you: there is nothing to remember, and nothing to reset because you have forgotten it (even though resetting a password every now and then is good security practice).
Other emerging uses
Moreover, short-utterance speaker recognition has also evolved, and voice biometrics systems can now recognise a speaker from short phrases. The technology can even tell different speakers apart during a conversation. This is called diarization, and it is useful in many cases. In medical records, for instance, it makes it possible to separate the doctor’s speech from the patient’s. It also helps with the analysis of call centre data.
However, biometrics is not limited to that! One example you can already use at home, if you have a smart speaker, is Voice Match: the ability of an assistant to recognize the individual members of a household. This enables advanced personalization of the experience, in terms of preferences, accessibility or authorizations, for example. It also helps secure your personal information and prevents kids from ordering toys from Amazon without permission, for instance.
The VDK includes voice biometrics systems and gives you access to all voice technologies (speech recognition, audio front end, wake word, etc.) within a single SDK. If you want more information, feel free to contact us!
Sources:
https://www.crim.ca/fr/realisations/Biometrie-vocale-vers-une-identification-incontournable
https://usbeketrica.com/article/reconnaissance-vocale-parle-je-te-dirai-qui-tu-es