Interspeech 2010
DOI: 10.21437/interspeech.2010-152
Approaching human listener accuracy with modern speaker verification

Abstract: Being able to recognize people from their voice is a natural ability that we take for granted. Recent advances have shown significant improvement in automatic speaker recognition performance. Besides being able to process large amounts of data in a fraction of the time required by humans, automatic systems are now able to deal with diverse channel effects. The goal of this paper is to examine how a state-of-the-art automatic system performs in comparison with human listeners, and to investigate the strategy for human-…

Cited by 18 publications (4 citation statements) · References 8 publications
“…Although machines outperform humans on long utterances in certain conditions (e.g., Hautamäki et al., 2010; Kahn et al., 2011), their performance on short utterances is seemingly worse than that of humans. For example, a state-of-the-art text-independent ASV system using MFCCs was 97.60% accurate at discriminating speakers with 2.5-min-long pairs, but it was only 89.48% accurate with 5-s-long pairs, and performance worsened to 77.69% accuracy with 2-s-long pairs on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2003 database, compared to 82.36% for humans hearing short utterances, as noted earlier.…”
Section: B. Machine Speaker Verification and How It Compares to Human …
confidence: 98%
“…For example, to compute the FAR and FRR for the female population, we aggregated the verification trials where both the enrolment and test utterances belong to the female gender. Following Pereira and Mercel [40], we do not consider cross-gender trials (where enrolment and test utterances belong to different genders), because they tend to produce substantially lower FARs than same-gender trials [105]. To compute the demographic-agnostic FAR values used to evaluate the auFaDR-FAR metric described in Section 4, we pooled all the verification trials regardless of their demographic attributes.…”
Section: Evaluation Setup
confidence: 99%
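The grouping scheme described in the statement above can be sketched in a few lines. This is a minimal illustration, not the cited paper's code: the trial fields (`enroll_gender`, `test_gender`, `score`, `is_target`) and the fixed decision threshold are assumptions introduced here for clarity.

```python
def far_frr(trials, threshold):
    """FAR = accepted impostor trials / all impostor trials;
    FRR = rejected target trials / all target trials."""
    impostors = [t for t in trials if not t["is_target"]]
    targets = [t for t in trials if t["is_target"]]
    far = sum(t["score"] >= threshold for t in impostors) / len(impostors)
    frr = sum(t["score"] < threshold for t in targets) / len(targets)
    return far, frr

def group_far_frr(trials, gender, threshold):
    # Keep only same-gender trials for the requested group, mirroring the
    # exclusion of cross-gender trials described in the quoted passage.
    same = [t for t in trials
            if t["enroll_gender"] == t["test_gender"] == gender]
    return far_frr(same, threshold)
```

Pooling all trials (i.e., calling `far_frr` on the full list without the gender filter) would give the demographic-agnostic FAR mentioned at the end of the passage.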
“…As a way of modelling the perceptual experiences of listeners with voices, ASV systems typically train models on large datasets with many different speakers. Only a handful of studies have compared human and machine performance, and these have focused on the effects of speech types, such as the length of utterances [17,22,48]. However, to the authors' knowledge no findings have been reported regarding the effectiveness of automatic scores for modelling listener responses across perceptual tasks.…”
Section: Introduction
confidence: 99%