Blog

Hiba Behidj

Artificial Intelligence

August 24, 2018

Should we be afraid… of voice biometrics?

Voice biometrics is a field of technology that involves identifying an individual by their voice. We can already see signs of it in many of the systems we interact with daily (Siri, OK Google, Alexa…!). Naturally, the buzz around the possibilities of this new technology has helped generate its share of myths and false beliefs. We will attempt to set the record straight on the current state of affairs in voice biometrics with this article, based on a talk given by CRIM researcher Gilles Boulianne at Desjardins Lab on March 1, 2018.

Okay.

First off, it must be said…

Voice biometrics… isn’t actually biometrics.

Biometrics refers to the analysis of a person’s physical characteristics. This means using unique, measurable features to establish an individual’s identity with high reliability. There are many examples of biometrics: fingerprints, iris scans, etc.

The human voice, however, does not present fixed, physically measurable characteristics of this kind. Moreover, a voice varies according to the speaker’s demeanour, state of health, emotional state, age, etc. Unlike our fingerprints, our voice differs dramatically depending on whether we’ve just rolled out of bed or delivered the karaoke performance of a lifetime!

Despite its relative inaccuracy, voice biometrics is nevertheless likely to spread more and more widely, as it offers a number of advantages. Firstly, there’s no need for specialized equipment or intrusive physical contact. What’s more, voice is the natural means of human communication. As a result, voice identification is becoming much more organically integrated into our daily rituals, as demonstrated by the modules already used by some smartphones.

If I have a cold, will the system recognize me?

The wide variability of a voice under different circumstances is indeed a big problem for researchers, but they’ve been working on it for decades and are making steady progress. We can now identify a voice more accurately than ever, even in the presence of nasal congestion or ambient noise.

Voice biometric systems are not trained solely on the voices of the few individuals who use them to identify themselves. Instead, the systems maintain a model trained on tens of thousands of speakers’ voice samples from various databases. This helps the system “learn” how the human voice can vary from one individual to another, so it can recognize you despite the frog in your throat!

Is it truly safe? Can someone hack my voice?

Yes and no. Some types of fraud are easy to detect and avoid, but others are more difficult to prevent, at least with current systems.

Recording attacks

In systems where each user has a fixed password and only their voice is used as an identifier, it is easy to record a person’s voice and play it over the telephone to gain access to their account.

The most sophisticated systems ward off such attacks by using variable passwords, such as a series of numbers. When you want to log in, the system asks you to say four or five numbers, in a different order each time. So there’s no point in someone recording you as you say your password!
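To make the idea of a variable password concrete, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, the challenge is assumed to be a string of random digits, and the step that turns the caller's audio into digits (speech recognition) is assumed to happen elsewhere.

```python
import secrets


def make_digit_challenge(length: int = 5) -> str:
    """Return a fresh random digit string for the user to read aloud.

    Hypothetical helper: a real system may draw its prompts differently.
    """
    return "".join(secrets.choice("0123456789") for _ in range(length))


def transcript_matches(challenge: str, spoken_digits: str) -> bool:
    """Check that the digits recognized in the audio equal the challenge.

    `spoken_digits` is assumed to come from a separate speech recognizer.
    """
    return secrets.compare_digest(challenge, spoken_digits)


if __name__ == "__main__":
    challenge = make_digit_challenge()
    print("Please say:", " ".join(challenge))
```

Because the challenge changes at every login, a recording of an earlier session would contain the wrong digits and fail the transcript check, whatever the voice comparison says.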

Could someone successfully imitate my voice?

If a (human) impersonator or actor tries to pass themselves off as you by modifying their voice, it won’t work. Voice biometrics systems use a wide variety of sophisticated factors to identify a voice, and a human impersonator, no matter how talented, will almost never be able to outsmart them. So even if Jim Carrey is highly motivated to defraud you, you have nothing to fear!

Can you recreate someone else’s voice on a computer?

The answer is yes. In fact, if you have 20 minutes of recordings of a person’s voice, you can use specific software programs to build a model of their voice and turn it into a synthesized voice that can be made to say whatever you want. Such attacks are the most dangerous, and researchers worldwide are working hard to prevent them. Their efforts are focused primarily on detecting very subtle sound artifacts to determine whether a voice is synthesized or human.

And what about vocal doppelgangers?

It’s the only type of attack for which we don’t yet have a solution, but it’s also the rarest… as far as we know, it has never happened!

In fact, the researchers figure that with billions of humans on the planet, there’s bound to be someone somewhere with the same voice as us or a voice similar enough to fool the system. So if that person wanted to, they could indeed access our accounts. But the chances of them speaking our language, finding us and being highly motivated to defraud us are seriously limited!
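The reasoning behind "there’s bound to be someone" can be sketched with a little arithmetic. The snippet below is purely illustrative: the per-person match probability is a made-up number, not a measured error rate of any real system, and the function name is hypothetical.

```python
import math


def prob_at_least_one_match(population: int, match_probability: float) -> float:
    """Probability that at least one person in `population` matches a voice.

    Computes 1 - (1 - p)^N, assuming each person matches independently
    with the same probability p (a simplifying assumption).
    """
    # log1p/expm1 keep the result accurate when p is tiny and N is huge.
    return -math.expm1(population * math.log1p(-match_probability))


if __name__ == "__main__":
    # Hypothetical figures: one chance in a million per person,
    # seven billion people.
    print(prob_at_least_one_match(7_000_000_000, 1e-6))
```

Even with a very small chance per person, the probability over billions of people is close to 1. The practical risk stays low for the reasons given above: that person would also need to speak our language, find us and want to defraud us.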

Bonus CSI question:

I have a recording of someone committing a crime. Can I record a suspect’s voice and prove it’s the same person using voice biometrics?

In a nutshell: no.

In fact, when we try to authenticate ourselves by voice, the system compares the voice it captures with a reference recording of our voice provided when we registered with the system. The system then produces a similarity score between the two voices, and it’s up to the human programmer to determine at what score two voices can be accepted as belonging to the same person. This limit varies according to the type of information we’re trying to access: it will probably be higher for accessing our bank account than for asking Google Home to dim the lights.
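As a rough sketch of this score-and-threshold logic, the Python snippet below compares two voice representations and accepts or rejects depending on the use case. Several things here are assumptions made for illustration: voices are represented as plain lists of numbers (real systems derive such vectors from audio with trained models), cosine similarity stands in for whatever score a given system actually computes, and the threshold values and names are invented.

```python
import math

# Hypothetical thresholds: stricter for sensitive actions.
THRESHOLDS = {"bank_account": 0.85, "smart_home": 0.60}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two voice vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def accept(enrolled: list[float], captured: list[float], use_case: str) -> bool:
    """Accept the speaker if the score reaches the use case's threshold."""
    return cosine_similarity(enrolled, captured) >= THRESHOLDS[use_case]


if __name__ == "__main__":
    enrolled = [1.0, 0.0]   # toy reference recorded at registration
    captured = [0.8, 0.6]   # toy sample captured at login
    print(accept(enrolled, captured, "smart_home"))    # lenient threshold
    print(accept(enrolled, captured, "bank_account"))  # strict threshold
```

The same pair of voices can therefore be accepted for a low-stakes request and rejected for a high-stakes one: the score does not change, only the limit chosen for it.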

So, each biometric system provides a similarity score after comparing two voices. Moreover, the similarity score only makes sense in the context of a comparison: it’s always relative. It’s like being given a person’s age (a fixed, measurable datum) and then being asked whether they are old (a relative, comparative judgment): the answer is more complex and varies according to the point of view.

The same applies to voice biometric systems, which process and analyze many voice samples. They can determine how similar two samples are because they have several other recordings in the bank made under the same conditions to serve as reference points. If you only have two recordings taken on their own and no reference population to compare them with, asserting that you can identify someone beyond any doubt is hazardous. The legal weight of voice biometric evidence has yet to be established.

And before that can happen, it will be necessary to determine what score is sufficient for voice identification to be used as evidence in a trial, and what biometric system will be used and accepted.
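One common way to express a score relative to a reference population is to normalize it against the scores obtained with other speakers. The sketch below shows the general idea using a simple z-score; it is an assumption made for illustration, not a description of the method used by any particular system, and the function name and numbers are hypothetical.

```python
import statistics


def z_normalize(raw_score: float, cohort_scores: list[float]) -> float:
    """Express a raw similarity score in standard deviations above
    the average score obtained against a reference population."""
    mean = statistics.mean(cohort_scores)
    stdev = statistics.stdev(cohort_scores)
    return (raw_score - mean) / stdev


if __name__ == "__main__":
    # Toy scores of the same sample against five unrelated speakers
    # recorded under the same conditions (made-up values).
    cohort = [0.2, 0.3, 0.4, 0.5, 0.6]
    print(z_normalize(0.8, cohort))
```

Without the cohort, a raw score of 0.8 says very little. With it, we can at least say how unusual that score is compared with scores from people who are known to be different speakers.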

In short, Batman and Robin can rest easy… it will be some time before the Gotham police can track down the Dynamic Duo using only their legendary pronouncements!

Additional sources of information

Visit the CRIM website to find out (almost) everything about the voice biometric techniques developed by our experts!

Keywords

audio, automated, biometrics, processing, security, speech, voice
