Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing 2021
DOI: 10.18653/v1/2021.gebnlp-1.10

Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

Abstract: In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We design an experiment based on the Librispeech corpus and build three training corpora that vary only in the proportion of data produced by each gender category. We observe that, while our system is overall robust to gender balance or imbalance in the training data, its performance nonetheless depends on the match between the individuals present in the training and testing sets.
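
To make the corpus construction concrete, the following minimal sketch (not the authors' released code) shows one way such gender-varied training subsets could be drawn from the Librispeech speaker metadata. It assumes the standard SPEAKERS.TXT file distributed with Librispeech (pipe-separated columns ID | SEX | SUBSET | MINUTES | NAME); the subset name, speaker counts, and ratios below are illustrative only.

import random

def load_speakers(path="SPEAKERS.TXT", subset="train-clean-100"):
    # Parse the Librispeech speaker table: comment lines start with ";",
    # data lines are "ID | SEX | SUBSET | MINUTES | NAME".
    speakers = {"F": [], "M": []}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(";") or not line.strip():
                continue
            fields = [field.strip() for field in line.split("|")]
            speaker_id, sex, speaker_subset = fields[0], fields[1], fields[2]
            if speaker_subset == subset and sex in speakers:
                speakers[sex].append(speaker_id)
    return speakers

def sample_corpus(speakers, n_speakers, female_ratio, seed=0):
    # Draw a speaker list with the requested proportion of female speakers.
    rng = random.Random(seed)
    n_female = round(n_speakers * female_ratio)
    n_male = n_speakers - n_female
    return rng.sample(speakers["F"], n_female) + rng.sample(speakers["M"], n_male)

if __name__ == "__main__":
    by_gender = load_speakers()
    # Three illustrative corpora: gender-balanced, mostly female, mostly male.
    for ratio in (0.5, 0.7, 0.3):
        selection = sample_corpus(by_gender, n_speakers=100, female_ratio=ratio)
        print(f"female ratio {ratio:.1f}: {len(selection)} speakers selected")

Varying only female_ratio while holding the total amount of selected data constant mirrors the design described in the abstract, where everything is kept fixed except the gender proportion.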

Cited by 16 publications (9 citation statements); references 18 publications (16 reference statements).

Citation statements:
“…We reviewed the techniques on identifying and resolving representation bias mostly in tabular data sets. The existing research has briefly investigated these issues in other data types such as multimedia [13,26,71], text [33,49], graphs, streams [25], spatio-temporal [28], etc. Still, identification and resolving biases in visual data sets has drawn more attention from different research communities and in this section we present a review of the existing works.…”
Section: Expanding the Scope to Other Data Types (mentioning)
confidence: 99%
“…In recent years, a new research area has emerged that investigates the discriminatory performance of AI systems and its causes (Hovy and Spruit, 2016; Garnerin et al., 2021). In the ASR field, traditional metrics like the aggregated WER and CER are used to measure the overall performance of the models.…”
Section: Analysis of ASR Accuracy w.r.t. Speaker Metadata (mentioning)
confidence: 99%
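
As a point of reference for the aggregated metrics mentioned in the statement above, the following sketch (illustrative only, not taken from the paper) computes a word error rate with a standard word-level Levenshtein alignment; the character error rate is the same computation over characters instead of words.

def word_error_rate(reference, hypothesis):
    # WER = (substitutions + deletions + insertions) / number of reference words,
    # computed with a word-level edit-distance alignment.
    ref, hyp = reference.split(), hypothesis.split()
    distance = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        distance[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        distance[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            distance[i][j] = min(
                distance[i - 1][j] + 1,                 # deletion
                distance[i][j - 1] + 1,                 # insertion
                distance[i - 1][j - 1] + substitution,  # substitution or match
            )
    return distance[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion over six reference words, i.e. about 0.167.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))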
“…Data balancing [7] [23], [24] [13], [25], [26] AT [27], [28], [29] [14], [30], [31], [32], [33], [34] MTL [8] [35], [36], [37], [38]…” [flattened excerpt of a table in the citing paper mapping mitigation techniques (data balancing, AT, MTL) to references across the ASV, ASR, and other ML-domain columns]
Section: ASV, ASR, and Other ML Domains (mentioning)
confidence: 99%
“…Feng et al. [71] have analyzed the biases in a Dutch ASR system with respect to gender, age, etc. Evaluations of ASR systems using criteria commonly used in Fair-ML research have been explored extensively [24, 72-74]. However, a systematic evaluation of fairness in ASV systems is scarce in the current literature.…”
Section: Fairness in ASV (mentioning)
confidence: 99%