Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021) 2021
DOI: 10.18653/v1/2021.iwslt-1.9

The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task

Abstract: This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates English audio directly into German text without intermediate transcription. We use the Transformer-based model architecture and enhance it with Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated into German. Finally, we employ ensemble decoding to integrat…

Citations: cited by 2 publications (3 citation statements)
References: 24 publications
“…Two main word detection strategies are currently employed by the community: fixed (Ma et al., 2020b), and adaptive (Ma et al., 2020b; Ren et al., 2020; Zeng et al., 2021; Chen et al., 2021). The fixed word detection strategy represents the easiest way to address the problem since it assumes that a fixed amount of time is required to pronounce every word, disregarding the information contained in the audio.…”
Section: Word Detection For Wait-k (mentioning)
confidence: 99%
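
The fixed strategy quoted above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the function names, the `ms_per_word` parameter, and its default of 280 ms are assumptions chosen only for illustration. The word count is derived from elapsed audio time alone, and a wait-k style decision is then made from that count.

```python
def fixed_word_count(audio_ms: float, ms_per_word: float = 280.0) -> int:
    """Estimate how many source words have been heard so far.

    Assumes every word takes exactly `ms_per_word` milliseconds
    (a hypothetical constant); the audio content is never inspected.
    """
    if ms_per_word <= 0:
        raise ValueError("ms_per_word must be positive")
    return int(audio_ms // ms_per_word)


def wait_k_action(audio_ms: float, emitted_words: int, k: int,
                  ms_per_word: float = 280.0) -> str:
    """Return "WRITE" if the estimated source word count is at least
    k words ahead of the target words already emitted, else "READ"."""
    heard = fixed_word_count(audio_ms, ms_per_word)
    return "WRITE" if heard - emitted_words >= k else "READ"
```

With these assumed values, one second of audio counts as three words, so a wait-3 policy would emit its first target word at that point and then read more audio before emitting the second.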
“…On the contrary, the adaptive word detection strategy determines the number of words by looking at the content of the audio. The decision about waiting or emitting can be taken through an Automatic Speech Recognition decoder (Chen et al., 2021) or through a Connectionist Temporal Classification (Graves et al., 2006), or CTC, module (Ren et al., 2020; Zeng et al., 2021), responsible for directly detecting the number of words every time a speech chunk is received by the system.…”
Section: Word Detection For Wait-k (mentioning)
confidence: 99%
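
As a rough illustration of how a CTC output could be turned into a word count, the sketch below applies the standard greedy CTC collapse (merge repeated labels, then drop blanks) to per-frame label ids and counts runs of non-boundary tokens. The label ids `BLANK` and `SPACE`, the function names, and the idea of counting words via a boundary label are assumptions made for this example; the cited systems may detect words differently.

```python
from itertools import groupby

BLANK = 0  # assumed id of the CTC blank label
SPACE = 1  # hypothetical id of a word-boundary label


def ctc_collapse(frame_labels):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != BLANK]


def count_words(frame_labels):
    """Count words as maximal runs of non-boundary tokens after collapse."""
    tokens = ctc_collapse(frame_labels)
    runs = groupby(tokens, key=lambda t: t == SPACE)
    return sum(1 for is_boundary, _ in runs if not is_boundary)


# Per-frame argmax labels for one received speech chunk (made-up values).
frames = [0, 5, 5, 0, 6, 1, 1, 0, 7, 7, 0, 0, 8]
detected = count_words(frames)
```

Unlike the fixed strategy, the count here depends on what the model predicts for each frame, so slow and fast speech of the same duration can yield different word counts.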