The field of voice technologies is continuously growing. While interest in voice dipped briefly over the past few years, today there is no doubt that voice is experiencing a resurgence. Major advances in hardware and technology, along with a rise in voice-enabled interfaces, are fueling innovative use cases. And in many instances, it’s not only about voice commands, but additional voice technologies such as this blog’s subject: Speaker Recognition.
Speaker Recognition: What do we know about it?
According to Dr. Sadaoki Furui’s definition, “Speaker Recognition is the process of automatically recognizing who is speaking by using the speaker-specific information included in speech waves to verify identities being claimed by people accessing systems; that is, it enables access control of various services by voice”. Fundamentally, it is used to answer the question “Who is speaking?”.
The typical way Speaker Recognition works is summarized in the following diagram:
As of today, there are two major reasons for using Speaker Recognition: identification and verification. Speaker Identification focuses on determining the origin of a given utterance between different enrolled or registered speakers. In contrast, Speaker Verification works to accept or reject the identity claimed by a speaker.
Technically, it is the number of possible alternatives that really separates identification from verification. In the first case, speaker identification, there are as many alternatives as there are registered speakers, whereas Speaker Verification is a 1:1 matching process with only two possible decisions: acceptance or rejection.
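The 1:N versus 1:1 distinction can be made concrete with a minimal, hypothetical sketch. Real systems compare learned voice embeddings extracted from audio; the toy vectors, speaker names, and the 0.8 threshold below are illustrative assumptions only:

```python
# Minimal sketch of identification (1:N) vs. verification (1:1).
# Real systems derive embeddings from speech; these toy vectors are invented.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Enrolled speaker embeddings (toy 3-dimensional vectors).
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob":   [0.1, 0.8, 0.2],
}

def identify(probe):
    """1:N — return the enrolled speaker whose embedding is closest."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))

def verify(claimed, probe, threshold=0.8):
    """1:1 — accept or reject a claimed identity against a threshold."""
    return cosine(probe, enrolled[claimed]) >= threshold

probe = [0.85, 0.15, 0.05]
print(identify(probe))         # closest enrolled speaker: "alice"
print(verify("alice", probe))  # True: similarity exceeds the threshold
print(verify("bob", probe))    # False: the claimed identity does not match
```

Identification scales with the number of enrolled speakers, while verification only ever compares the probe against a single enrolled voiceprint, which is why the two tasks carry different accuracy and performance trade-offs.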
Speaker Recognition as we know it: mostly IVRs in Call Centers
Historically, Speaker Recognition debuted with some experiments in the late 1990s. In fact, between 1996 and 1998, the technology was used at the Scobey–Coronach Border (between the USA and Canada) to enable enrolled local residents with nothing to declare to cross the border when the inspection stations were closed for the night.
Since then, Speaker Recognition technologies have appeared in many fields and applications, but in the vast majority of cases they revolve around Call Centers, verifying a claimed customer identity during a conversation with a live agent or an IVR (Interactive Voice Response) interaction. The technology certainly isn’t new: in 2013, Barclays was among the first adopters of a passive, “free speech,” voice verification system for authenticating private banking clients.
To date, the most widely deployed fields and use cases are:
- Financial Services, in Call Centers or banking institutions, where most use cases center on caller verification for specific operations.
- Healthcare and legal services, and other areas where access to personal databases requires robust verification techniques.
- Retail and infomercials: a field where Speaker Recognition was, and still is, used to identify a user and verify his or her identity for operations such as purchasing an item.
- Hospitality, such as hotels and medical appointments, following the same principle: identifying someone who may be present in the client database and verifying their identity for sensitive operations involving payment or privacy.
Challenges such as robustness in noisy environments, poor telephony connections, the compression of audio in the contact center, and the ability to efficiently isolate speakers impacted the accuracy of early voice biometric products. As such, the technology was often met with resistance by business leaders believing it lacked the necessary security and/or performance to effectively and efficiently authenticate users. This is now changing.
Advancements and innovations made possible by AI are breaking down technological barriers to voice biometric adoption as rising fraud drives contact centers to replace weak, inefficient authentication methods.
These factors are resulting in new advantages and opportunities for using voice biometrics in the contact center… and elsewhere.
What to expect from the future of Speaker Recognition technologies
While most Speaker Recognition features to date have been deployed in Contact Centers over phone channels, future use cases are driven by emerging consumer environments and devices that focus on innovative experiences and secure interactions.
With IoT booming, smart equipment is everywhere and capable of many things, including understanding and answering the user with voice. Smart homes, cars and cities are closer than we think. And for speaker recognition, this is a new playground to expand into.
Speaker and Speech Recognition to build tailored user experiences
Shifting from telephone channels to smart devices, embedded or not. This is what to expect from the future of speaker recognition.
Recognizing a speaker by their voice makes it possible to fully customize any voice experience. With more voice-enabled devices making their way into our daily routines, this ability is undeniably important for companies who strive for better customer satisfaction and engagement.
Smart Speakers for instance would be able to recognize who is speaking and immediately adapt their behaviours and answers depending on the person and their authorizations. If a child tries to make a payment through a voice-enabled service, it would be rejected since the required authorization would not be met.
On top of that, a whole new world of experiences could be addressed when different people are using the same product or service, with a specific workflow for each of them, tailored around preferences and behavioural information.
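The smart-speaker scenario above boils down to routing each request through the recognized speaker's profile and permissions. Here is a rough, hypothetical sketch; the profile fields, speaker names, and replies are all invented for the example:

```python
# Hypothetical sketch: gating voice actions on the recognized speaker's profile.
# Speaker names, profile fields, and replies are invented for illustration.
profiles = {
    "parent": {"can_pay": True},
    "child":  {"can_pay": False},
}

def handle_request(speaker, action):
    """Tailor the response to who is speaking and what they are allowed to do."""
    profile = profiles.get(speaker)
    if profile is None:
        # Unenrolled voice: no personalization, no sensitive actions.
        return "Sorry, I don't recognize your voice."
    if action == "pay" and not profile["can_pay"]:
        # Required authorization not met, e.g. a child attempting a payment.
        return "This account is not authorized to make payments."
    return f"OK: performing '{action}' for {speaker}."

print(handle_request("child", "pay"))   # rejected: authorization not met
print(handle_request("parent", "pay"))  # accepted for the authorized speaker
```

The same lookup could just as easily drive preferences and per-user workflows rather than permissions.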
And what’s even better is that devices do not need to be connected. Speaker recognition technologies can be embedded anywhere thanks to major advances in model size and overall technology footprint.
Our partner IDR&D recently joined our Voice Development Kit to introduce on-device voice biometrics to our set of technologies. Coupled with our embedded ASRs, our customers can now address these kinds of tailored use cases, where a speaker can be recognized when voice-interacting with any device.
Added layer of security in customer journey
Where user experience is mostly related to user identification, security is often associated with speaker verification. This side of voice biometrics also has a tumultuous past due to the lack of trust businesses had in this technology, especially for tasks carrying potential risks.
But, just as the technology eventually proved itself over phone channels, today’s speaker recognition can meet enterprise-grade requirements and be considered a reliable authentication method.
With businesses wanting to increase security while also enhancing their product and service workflows through frictionless customer journeys, speaker verification is emerging as one of the most anticipated solutions.
To reinforce authentication, companies are looking for new methods and tools, as well as combining different solutions (2FA with mobile or third-party apps like Authenticator). Speaker verification is, in this sense, perfectly suited to merge with other authentication processes, compensating for their limitations while granting intuitive and robust voice-print authentication, in physical or digital access use cases for instance.
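Combining speaker verification with a second factor can be pictured as a simple conjunction of independent checks. This is a hypothetical sketch; the score scale (0 to 1), the 0.8 threshold, and the function names are assumptions for the example:

```python
# Hypothetical sketch: speaker verification as one factor in a 2FA flow.
# The score scale (0..1) and the 0.8 threshold are assumptions for the example.
def authenticate(voice_score, second_factor_ok, voice_threshold=0.8):
    """Accept only when both the voice match and the second factor pass.

    voice_score:      similarity between the probe voice and the enrolled
                      voiceprint, as produced by a verification engine.
    second_factor_ok: result of the other factor, e.g. an OTP check.
    """
    return voice_score >= voice_threshold and second_factor_ok

print(authenticate(0.9, True))   # both factors pass: accepted
print(authenticate(0.5, True))   # voice match too weak: rejected
print(authenticate(0.9, False))  # second factor failed: rejected
```

Because either factor alone can reject, a weak voice match or a failed OTP is enough to deny access, which is exactly how voice biometrics can compensate for the limitations of other authentication methods and vice versa.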