CRIM among the world leaders in voice biometrics!



Summary: The Speech and Text team has been developing various speech and speaker recognition technologies for over 30 years. Its expertise soon placed it among the world leaders in the field. Since 1992, the team has been taking part in international competitions in order to test its methods and compare them with those of other researchers in the field.

This type of event, in which several research teams try to solve the same problem, makes it possible to evaluate the performance of CRIM's technology against similar tools developed by other researchers and to reveal the most promising avenues for development in this rapidly evolving field.

In the scientific community, CRIM’s participation in these competitions has led to several changes in the testing paradigms, as well as to the adoption by many research sites of certain methods used by CRIM, such as i-vectors, joint factor analysis and probabilistic linear discriminant analysis.

NIST SRE - Speaker Recognition Evaluation Campaign


The Speaker Recognition Evaluation of the National Institute of Standards and Technology (NIST) in the United States is one of the most renowned technology assessment processes in the field. These evaluations are intended for all researchers working on the general problem of automatic speaker recognition. More than forty teams take part in each edition of the evaluation.

CRIM's Speech and Text team participated in NIST evaluations in 2005, 2006, 2008, 2010, 2012 and 2016. In its first year of participation, CRIM ranked among the best in the world. At each edition since 2006, CRIM has ranked well under several NIST SRE evaluation conditions.

How does it work?

Once the challenge is announced, teams have three to four months to train their systems, using machine learning techniques and the data provided by NIST, to perform the requested task.

When the evaluation begins, teams receive evaluation data to test their systems. This data consists of enrollment recordings, used to train the speaker models, and unknown test recordings, on which speaker recognition is performed. Teams send the scores they obtain to NIST. Since all teams use the same data to train their models, system performance can be compared and ranked.
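To make the scoring step concrete, here is a minimal sketch of how a set of trials might be scored and summarized. It assumes each recording has already been reduced to a fixed-length embedding, uses cosine similarity as the score, and summarizes the result with an equal error rate (EER). The speaker names, embeddings and function names are hypothetical, and the EER is only one common summary: the official metrics are defined by NIST in each evaluation plan.

```python
import math

def cosine_score(enroll, test):
    """Cosine similarity between an enrollment embedding and a test embedding."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and keep the
    point where the miss rate and the false-alarm rate are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

# Hypothetical toy data: one enrollment embedding per speaker (assumption).
enrolled = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}

# Each trial: (claimed speaker, test embedding, whether it is truly that speaker).
trials = [
    ("spk_a", [0.9, 0.1], True),
    ("spk_a", [0.1, 0.9], False),
    ("spk_b", [0.2, 0.8], True),
    ("spk_b", [0.8, 0.3], False),
]

scores = [(cosine_score(enrolled[spk], emb), is_target) for spk, emb, is_target in trials]
targets = [s for s, is_target in scores if is_target]
nontargets = [s for s, is_target in scores if not is_target]
eer = equal_error_rate(targets, nontargets)
print(f"EER on the toy trials: {eer:.2f}")
```

In a real evaluation, the team submits only the scores. The ground-truth labels are held by the organizers, who compute the error metrics for every participant.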

Once results are out, a collaborative workshop is organized so that researchers can meet, observe their respective results and share ideas.

Who takes part in these evaluations?

Several dozen research groups take part, each comprising many partners (research centres, universities and private companies) and based on four continents. Participants also include the research departments of major tech companies such as IBM Research and Alibaba, and companies specialized in voice technologies such as Nuance Communications.

What purpose do the competitions serve?

For researchers, they are an opportunity to test new ideas and share their know-how with other specialists in the field. Although its system is one of the most accurate in the world, CRIM modifies its approach or adds components to its system at each competition in order to test new techniques and compare them with those of other researchers: this sharing of knowledge is what propels the greatest innovations.

Last NIST competition - SRE 2016

CRIM took part in the last NIST SRE evaluation as a member of the ABC consortium, which also included Agnitio Voice ID (Spain) and Brno University of Technology Speech@FIT and IT4I Center of Excellence (Czech Republic).

Nature of the SRE 2016 challenge
The task for the SRE 2016 competition presented two challenges for researchers. The first was the duration of the recordings. In most competitions, the recordings provided are of similar length. In 2016, NIST asked teams to identify speakers in voice recordings of variable duration, which is more demanding for a system, particularly for short excerpts, which are difficult to analyze properly.

The second challenge was the nature of the training data (or background data) compared to the evaluation data: the two datasets did not belong to the same domain (domain mismatch). The training data contained only English speech. During the evaluation, however, the systems also had to identify voices speaking other languages, such as Mandarin and Tagalog. Successfully adapting a system to out-of-domain data was a big challenge for many participants!
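One simple way to picture domain mismatch compensation is to re-centre speaker embeddings with statistics from each domain. The sketch below is only an illustration and is not presented as the method used by CRIM or the ABC consortium. It assumes that a small set of unlabeled recordings from the evaluation domain is available, and all embeddings and function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def center(vectors, mean):
    """Subtract a domain mean from every embedding."""
    return [[x - m for x, m in zip(vector, mean)] for vector in vectors]

# Hypothetical toy embeddings (assumption): English training data, a small
# unlabeled sample from the evaluation domain, and one evaluation recording.
train_embeddings = [[1.0, 2.0], [3.0, 4.0]]
unlabeled_in_domain = [[11.0, 2.0], [13.0, 6.0]]
eval_embeddings = [[12.0, 5.0]]

# Each domain is centred with its own mean, so the constant offset between
# the two domains is removed before any scoring takes place.
train_centered = center(train_embeddings, mean_vector(train_embeddings))
eval_centered = center(eval_embeddings, mean_vector(unlabeled_in_domain))

print(train_centered)
print(eval_centered)
```

Centring removes only a constant shift between domains. Practical systems typically combine it with more elaborate adaptation of the back-end classifier.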

CRIM’s performance in the NIST SRE
Since its first participation in NIST evaluations, CRIM's system has consistently ranked among the most successful speaker recognition systems. Our experts have been recognized as world leaders in the field for decades.
In addition, during NIST SRE 2016, CRIM proposed a completely different approach from that of other participants, focusing on deep learning embeddings, unsupervised domain adaptation and a Beta-Bernoulli classifier.

From one edition to the next, are challenges similar?

Not necessarily (apart from the "speaker recognition" aspect, of course!). The evolution of the challenges proposed in competitions often echoes the needs of companies that use these technologies in their products or activities.

Some evaluation campaigns are sponsored by large tech companies like Google and IBM: in those cases, competitions are a way to encourage and accelerate innovation, as well as to create tools that will meet current and future market demands.

For example, in the early NIST SRE competitions, speaker recognition was mostly done using telephone recordings. Today, most competitions use recordings captured by microphones, often several at a time, because this situation resembles the reality of smartphones and connected devices (such as Amazon’s Alexa or Google Home), which often have several microphones that combine their data to analyse the speaker’s voice.

To meet the needs of this rapidly changing market, competitions such as the NIST SRE have begun to present more demanding challenges that resemble real-life situations rather than controlled lab conditions: recognizing a voice despite the reverberation of a room, the background noise, the poor quality of the microphone, the interruption by a second voice, etc.

Anti-spoofing tests are also becoming more frequent, as companies that provide voice authentication systems to their customers want to ensure that their systems are fraud-proof.

And it’s only the beginning! Follow CRIM’s social media and website updates so you don't miss any of the upcoming evaluations in which our experts will participate!

Scientific publications

CRIM has participated in the NIST Speaker Recognition Evaluation in 2005, 2006, 2008, 2010, 2012 and 2016. The publications recorded since 2008 are listed below.

NIST SRE 2016
[1] A. Silnova et al., “Analysis and Description of ABC Submission to NIST SRE 2016,” in Interspeech, 2017, pp. 1348–1352.

[2] J. Alam, P. Kenny, G. Bhattacharya, and M. Kockmann, “Speaker Verification Under Adverse Conditions Using I-vector Adaptation and Neural Networks,” in Interspeech, 2017, pp. 3732–3736.

[3] T. Stafylakis, P. Kenny, V. Gupta, J. Alam, and M. Kockmann, “Compensation for phonetic nuisance variability in speaker recognition using DNNs,” in Odyssey The Speaker and Language Recognition Workshop, 2016, pp. 340–345.

NIST SRE 2012
[1] P. Kenny, V. N. Gupta, T. Stafylakis, M. J. Alam, and P. Ouellet, “Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition,” IEEE Speaker and Language Recognition Workshop. IEEE, pp. 1–18, 19-Jun-2014.

[2] P. Kenny, “A small footprint i-vector extractor,” in IEEE Speaker and Language Recognition Workshop, 2012, pp. 1–6.

[3] M. Senoussaoui, N. Dehak, P. Kenny, R. Dehak, and P. Dumouchel, “First attempt at Boltzmann machines for speaker verification,” IEEE Speaker and Language Recognition Workshop. IEEE, pp. 117–121, 01-Jan-2012.

[4] T. Stafylakis, P. Kenny, M. Senoussaoui, and P. Dumouchel, “Preliminary investigation of Boltzmann machine classifiers for speaker recognition,” in IEEE Speaker and Language Recognition Workshop, 2012, pp. 109–116.

[5] T. Stafylakis, V. Katsouros, P. Kenny, and P. Dumouchel, “Mean shift algorithm for exponential families with applications to speaker clustering,” Odyssey, pp. 324–329, Jan. 2012.

NIST SRE 2010

[1] P. Kenny, “Bayesian Speaker Verification with Heavy-Tailed Priors,” in IEEE Speaker and Language Recognition Workshop, 2010, pp. 1–41.

[2] M. Senoussaoui, P. Kenny, N. Dehak, and P. Dumouchel, “An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech,” Odyssey Speak. Lang. Recognit. Work., p. 6, Jan. 2010.

[3] N. Dehak, R. Dehak, J. R. Glass, D. A. Reynolds, and P. Kenny, “Cosine Similarity Scoring without Score Normalization Techniques,” Odyssey Speak. Lang. Recognit. Work., p. 15, Jan. 2010.

NIST SRE 2008

[1] N. Dehak, R. Dehak, P. Kenny, and P. Dumouchel, “Comparison between factor analysis and GMM support vector machines for speaker verification,” Odyssey, p. 9, Jan. 2008.

[2] P. Kenny, N. Dehak, P. Ouellet, V. N. Gupta, and P. Dumouchel, “Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation,” Proc Interspeech, pp. 1401–1404, Jan. 2008.
