Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.350
Breeding Gender-aware Direct Speech Translation Systems

Abstract: In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vocal characteristics) that is otherwise lost in the cascade framework. Although such ability proved to be useful for…

Cited by 7 publications (9 citation statements)
References 44 publications
“…As a baseline, we report both the ST system developed by Bentivogli et al. (2020), where target text is represented at the character level, and the BPE-based system by Gaido et al. (2021): they demonstrated that target-text segmentation is an important factor in systems' ability to translate gender, and our systems segment target text with BPE, as this segmentation method leads to the best translation quality. We measure the ability to translate gender with gender accuracy (Gaido et al., 2020c), i.e. the percentage of correct gender realizations among the words produced by the system and annotated in MuST-SHE.…”
Section: KD and Gender Translation
confidence: 99%
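The gender accuracy metric quoted above can be sketched in code. This is a minimal illustration, not the official MuST-SHE evaluation script; the function name and the pairing of each annotated word with a (correct_form, wrong_form) alternative are assumptions made for the example.

```python
# Illustrative sketch of gender accuracy: the percentage of correct gender
# realizations among system-produced words that are annotated in the test set.
# NOT the official MuST-SHE scorer; data layout is assumed for the example.
def gender_accuracy(system_words, annotations):
    """system_words: tokens produced by the ST system.
    annotations: (correct_form, wrong_form) pairs for gender-marked words.
    A pair is counted as evaluated only if the system produced either form."""
    produced = set(system_words)
    correct = evaluated = 0
    for correct_form, wrong_form in annotations:
        if correct_form in produced:
            correct += 1
            evaluated += 1
        elif wrong_form in produced:
            evaluated += 1
    return correct / evaluated if evaluated else 0.0
```

For instance, with a hypothesis containing the Italian feminine form "andata" but the masculine "stanco", against annotations ("andata", "andato") and ("stanca", "stanco"), one of the two evaluated realizations is correct, giving an accuracy of 0.5.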
“…For the translation of speaker-related gender phenomena, direct ST systems have been shown to exploit the speaker's vocal characteristics as a gender cue to improve feminine translation. However, as addressed by Gaido et al. (2020), relying on physical gender cues (e.g., pitch) for this task implies reductionist gender classifications (Zimman, 2020), making systems potentially harmful for a diverse range of users. Similarly, although image-guided translation has been claimed useful for gender translation since it relies on visual inputs for disambiguation (Frank et al., 2018; Ive et al., 2019), it could bend toward stereotypical assumptions about appearance.…”
Section: Conclusion and Key Challenges
confidence: 99%
“…Accordingly, straightforward procedures (e.g., balancing the number of speakers in existing datasets) do not ensure a fairer representation of gender in MT outputs. Since datasets are a crucial source of bias, it is also essential to advocate for careful data curation (Mehrabi et al., 2019; Paullada et al., 2020; Hanna et al., 2021; Bender et al., 2021), guided by pragmatically and socially informed analyses (Hitti et al., 2019; Sap et al., 2020; Devinney et al., 2020) and annotation practices (Gaido et al., 2020).…”
Section: Technical Bias
confidence: 99%
“…As per standard procedure, the encoder of our ST systems is initialized with the weights of an automatic speech recognition (ASR) model (Bahar et al., 2019a) trained on MuST-C audio-transcript pairs. In our ST training, we use the MuST-C gender-balanced validation set (Gaido et al., 2020b) to avoid rewarding systems' biased predictions. Each mini-batch consists of 8 samples; with an update frequency of 8 and training on 4 GPUs, each effective batch contains 256 samples.…”
Section: Speech Translation Models
confidence: 99%
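The effective batch size in the quoted setup follows from multiplying the per-device mini-batch by the gradient-accumulation (update frequency) factor and the number of GPUs. A one-line check, with variable names chosen for the example rather than taken from any specific toolkit:

```python
# Effective batch size of the quoted training setup:
# per-GPU mini-batch x update frequency (gradient accumulation) x GPU count.
samples_per_gpu = 8    # mini-batch per device
update_frequency = 8   # gradients accumulated over this many mini-batches
num_gpus = 4           # data-parallel devices
effective_batch = samples_per_gpu * update_frequency * num_gpus
print(effective_batch)  # 256, matching the "256 samples" stated above
```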