Voice biometrics is a field of technology that involves identifying an individual by their voice. We can already see signs of it in many of the systems we interact with daily (Siri, OK Google, Alexa…!). Naturally, the buzz around the possibilities of this new technology has helped generate its share of myths and false beliefs. We will attempt to set the record straight on the current state of affairs in voice biometrics with this article, based on a talk given by CRIM researcher Gilles Boulianne at Desjardins Lab on March 1, 2018.
First off, it must be said…
Voice biometrics… isn’t actually biometrics.
Biometrics refers to the analysis of a person’s physical characteristics. This means using unique, measurable features to establish an individual’s identity with high reliability. There are many examples of biometrics: fingerprints, iris scans, etc.
The human voice, however, does not present physically measurable characteristics. Moreover, a voice varies according to the speaker’s demeanour, state of health, emotional state, age, etc. Unlike our fingerprints, our voice differs dramatically depending on whether we’ve just rolled out of bed or delivered the karaoke performance of a lifetime!
Despite its relative inaccuracy, voice biometrics is nevertheless likely to spread more and more widely, as it offers a number of advantages. Firstly, there’s no need for specialized equipment or intrusive physical contact. What’s more, voice is the natural means of human communication. As a result, voice identification is becoming much more organically integrated into our daily rituals, as demonstrated by the modules already used by some smartphones.
If I have a cold, will the system recognize me?
The wide variability of a voice under different circumstances is indeed a big problem for researchers, but they’ve been working on it for decades and are making steady progress. We can now identify a voice more accurately than ever, regardless of nasal congestion or ambient noise.
Voice biometric systems are not trained solely on the voices of the few individuals who use them to identify themselves. Instead, the systems maintain a model trained on tens of thousands of speakers’ voice samples from various databases. This helps the system “learn” how the human voice can vary from one individual to another, so it can recognize you despite the frog in your throat!
Is it truly safe? Can someone hack my voice?
Yes and no. Some types of fraud are easy to detect and avoid, but others are more difficult to prevent, at least with current systems.
In systems where each user has a fixed password and only their voice is used as an identifier, it is easy to record a person’s voice and play it over the telephone to gain access to their account.
More sophisticated systems ward off such attacks by using variable passwords, such as a series of numbers. When you want to log in, the system asks you to say four or five numbers, in a different order each time. So there’s no point in someone recording you as you say your password!
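To make the variable-password idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the function names are hypothetical, and the digits the user actually said are assumed to come from a speech recognizer that is not shown. A real system would also compare the voice itself against the enrolled voice; this sketch covers only the "different numbers every time" part.

```python
import secrets

def make_challenge(n_digits=5):
    """Return a fresh, unpredictable list of digits for the user to say aloud."""
    # secrets (rather than random) is used so the sequence is hard to predict.
    return [secrets.randbelow(10) for _ in range(n_digits)]

def challenge_matches(challenge, spoken_digits):
    """Check that the recognized digits match the challenge, in the same order.

    `spoken_digits` is assumed to be the output of a speech recognizer
    (hypothetical, not part of this sketch).
    """
    return list(spoken_digits) == list(challenge)

challenge = make_challenge()
print("Please say:", " ".join(str(d) for d in challenge))
```

Because a new sequence is drawn for every login attempt, a recording of an earlier session will almost certainly contain the wrong digits or the wrong order.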
Could someone successfully imitate my voice?
If a (human) impersonator or actor tries to pass themselves off as you by modifying their voice, it won’t work. Voice biometrics systems use a wide variety of sophisticated factors to identify a voice, and a human impersonator, no matter how talented, will almost never be able to outsmart them. So even if Jim Carrey is highly motivated to defraud you, you have nothing to fear!
Can you recreate someone else’s voice on a computer?
The answer is yes. In fact, if you have 20 minutes of recordings of a person’s voice, you can use specific software programs to build a model of their voice and turn it into a synthesized voice that can be made to say whatever you want. Such attacks are the most dangerous, and researchers worldwide are working hard to prevent them. Their efforts are focused primarily on detecting very subtle sound artifacts to determine whether a voice is synthesized or human.
And what about vocal doppelgangers?
It’s the only type of attack for which we don’t yet have a solution, but it’s also the rarest… as far as we know, it has never happened!
In fact, the researchers figure that with billions of humans on the planet, there’s bound to be someone somewhere with the same voice as us or a voice similar enough to fool the system. So if that person wanted to, they could indeed access our accounts. But the chances of them speaking our language, finding us and being highly motivated to defraud us are seriously limited!
Bonus CSI question:
I have a recording of someone committing a crime. Can I record a suspect’s voice and prove it’s the same person using voice biometrics?
In a nutshell: no.
In fact, when we try to authenticate ourselves by voice, the system compares the voice it captures with a reference recording of our voice provided when we registered with the system. The system then produces a similarity score between the two voices, and it’s up to the human programmer to determine at what score two voices can be accepted as belonging to the same person. This limit varies according to the type of information we’re trying to access: it will probably be higher for accessing our bank account than for asking Google Home to dim the lights.
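The score-and-threshold logic described above can be sketched in a few lines of Python. Several things here are assumptions made for illustration: voices are represented as short lists of numbers (real systems derive much richer representations from audio), the similarity measure is cosine similarity, and the threshold values and use-case names are invented.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two voice representations, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical thresholds: stricter for sensitive actions, looser for trivial ones.
THRESHOLDS = {"bank_account": 0.90, "dim_lights": 0.60}

def accept(enrolled_voice, captured_voice, use_case):
    """Accept the speaker if the score reaches the threshold for this use case."""
    return cosine_similarity(enrolled_voice, captured_voice) >= THRESHOLDS[use_case]

enrolled = [1.0, 0.0]   # toy reference recorded at registration
captured = [0.8, 0.6]   # toy capture at login
print(accept(enrolled, captured, "dim_lights"))    # looser threshold
print(accept(enrolled, captured, "bank_account"))  # stricter threshold
```

The same pair of recordings can therefore be accepted for one purpose and rejected for another: the score does not change, only the limit chosen by the people who configure the system.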
So, each biometric system provides a similarity score after comparing two voices. Moreover, the similarity score only makes sense in the context of a comparison: it’s always relative. It’s like giving a person’s age (a fixed, measurable datum) and then asking whether they are old (a relative, comparative judgment): the answer will be more complex and vary according to the point of view.
The same applies to voice biometric systems, which process and analyze many voice samples. They can determine how similar two samples are because they have many other recordings in the bank, made under the same conditions, to serve as reference points. If you only have two recordings taken on their own and no reference population to compare them with, claiming that you can identify someone beyond any doubt is risky. The legal weight of voice biometric evidence has yet to be established.
And before that can happen, it will be necessary to determine what score is sufficient for voice identification to be used as evidence in a trial, and what biometric system will be used and accepted.
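Why a reference population matters can be shown with a small Python sketch. It uses a simple normalization (expressing a score in standard deviations above the average score obtained against other speakers), which is one common way to put a raw score in context; the article does not say which method any given system uses, and the numbers below are invented.

```python
import statistics

def normalized_score(raw_score, reference_scores):
    """Express a raw similarity score relative to a reference population.

    `reference_scores` are scores obtained by comparing the same recording
    with other speakers recorded under similar conditions (illustrative data).
    """
    mean = statistics.mean(reference_scores)
    spread = statistics.stdev(reference_scores)
    return (raw_score - mean) / spread

raw = 0.8
clean_conditions = [0.1, 0.2, 0.3, 0.4, 0.5]     # strangers usually score low
noisy_phone_line = [0.7, 0.75, 0.8, 0.85, 0.9]   # here, everyone scores high

print(normalized_score(raw, clean_conditions))   # far above the typical stranger
print(normalized_score(raw, noisy_phone_line))   # no better than a typical stranger
```

The same raw score of 0.8 is remarkable in the first setting and meaningless in the second, which is why two isolated recordings with no reference population cannot settle a question of identity.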
In short, Batman and Robin can rest easy… it will be some time before the Gotham police can track down the Dynamic Duo using only their legendary pronouncements!