Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
DOI: 10.18653/v1/2021.iwslt-1.8

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

Abstract: This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trai…
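
The synthetic data mentioned in the abstract correspond to sequence-level knowledge distillation (SeqKD): source transcripts are translated with an MT teacher and the resulting machine translations are paired with the original audio as training targets. A minimal sketch of this generation step, assuming a generic Hugging Face translation pipeline (the checkpoint name and example data are illustrative, not FBK's actual setup):

```python
# Sketch of SeqKD data generation: translate English transcripts with an MT
# teacher and use the outputs as synthetic German targets for the ST student.
# The checkpoint name below is an assumption, not the system used in the paper.
from transformers import pipeline

mt_teacher = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

def build_seqkd_targets(transcripts, max_length=256):
    """Return one synthetic German target per English transcript."""
    outputs = mt_teacher(transcripts, max_length=max_length)
    return [out["translation_text"] for out in outputs]

if __name__ == "__main__":
    transcripts = ["This paper describes our speech translation system."]
    for src, tgt in zip(transcripts, build_seqkd_targets(transcripts)):
        print(src, "->", tgt)
```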

Cited by 7 publications (11 citation statements) · References 33 publications

“…knowledge from the easier MT task, in which models obtain better performance, and hence improve the quality of the resulting ST student model. (Gaido et al. 2020a; Papi et al. 2021), instead, leverage KD from an MT model trained on a large amount of data to distill into the ST student model information that such a model could not directly access because of the different input modality. All these works employ the Word-KD method.…”
Section: Knowledge Distillation in ST
confidence: 99%
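
Word-KD trains the ST student to match the MT teacher's per-token output distribution. A minimal PyTorch sketch of such a loss, mixing a KL term against the teacher with the usual cross-entropy on the references (tensor shapes, the padding index, and the mixing weight are illustrative assumptions):

```python
# Word-level knowledge distillation (Word-KD) sketch: KL divergence between the
# ST student's and the MT teacher's per-token distributions, combined with
# cross-entropy on the reference tokens. Hyper-parameters are assumptions.
import torch
import torch.nn.functional as F

def word_kd_loss(student_logits, teacher_logits, targets,
                 temperature=1.0, kd_weight=0.5, pad_id=1):
    # student_logits, teacher_logits: (batch, tgt_len, vocab); targets: (batch, tgt_len)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1)  # per token
    mask = targets.ne(pad_id).float()
    kd = (kd * mask).sum() / mask.sum()
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets, ignore_index=pad_id)
    return kd_weight * kd + (1.0 - kd_weight) * ce
```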
“…Context-aware ST models have been shown to be robust towards error-prone automatic segmentations of the test set at inference time (Zhang et al., 2021a). Our method bears similarities with Gaido et al. (2020b); Papi et al. (2021) in that it re-segments the train set to create synthetic data. However, unlike their approach, where they split at random words in the transcript, we use a specialized Audio Segmentation method (Tsiamas et al., 2022b) to directly split the audio into segments resembling proper sentences.…”
Section: Relevant Research
confidence: 99%
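
The random splitting referred to here can be pictured as cutting each training utterance at random word boundaries of its transcript and deriving the corresponding audio segments from word-level time alignments. A minimal sketch of that idea (the `(word, start, end)` representation and the segment-length bounds are hypothetical):

```python
# Sketch of random re-segmentation of the training set: split a transcript at
# random word boundaries and derive new segments from word-level timestamps.
# The input format and the segment-length bounds are illustrative assumptions.
import random

def random_resegment(words, min_words=3, max_words=20, seed=None):
    """words: list of (token, start_sec, end_sec) tuples for one utterance."""
    rng = random.Random(seed)
    segments, i = [], 0
    while i < len(words):
        n = rng.randint(min_words, max_words)
        chunk = words[i:i + n]
        segments.append({
            "text": " ".join(token for token, _, _ in chunk),
            "start": chunk[0][1],
            "end": chunk[-1][2],
        })
        i += n
    return segments
```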
“…As such, our training set comprised the synthetic data built using SeqKD and the native ST data, both filtered with the method described in Section 2.2. The two types of data were distinguished by means of a tag pre-pended to the target text (Gaido et al, 2020b;Papi et al, 2021a).…”
Section: Data
confidence: 99%
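
The tag mentioned here is a marker token prepended to the target text so the model can distinguish synthetic (SeqKD) examples from native ST examples during training. A minimal sketch (the tag strings are illustrative, not the ones used in the papers):

```python
# Sketch of target-side tagging to distinguish synthetic from native ST data:
# a marker token is prepended to the target text and learned like any other token.
SYNTHETIC_TAG = "<synthetic>"  # illustrative tag strings, not the papers' own
NATIVE_TAG = "<native>"

def tag_target(target_text, is_synthetic):
    return f"{SYNTHETIC_TAG if is_synthetic else NATIVE_TAG} {target_text}"

print(tag_target("Das ist ein Beispiel.", is_synthetic=True))
# -> "<synthetic> Das ist ein Beispiel."
```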
“…We add the CTC loss in the 8th encoder layer since (Papi et al., 2021a) has demonstrated that it compares favourably with adding the CTC on top of the encoder outputs or in other layers (Bahar et al., 2019).…”
confidence: 99%
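
Placing the CTC loss at an intermediate encoder layer means projecting that layer's hidden states to the source vocabulary and computing CTC against the transcripts, in addition to the main translation loss. A minimal PyTorch sketch with the auxiliary loss taken from the 8th of 12 layers (layer count, dimensions, and vocabulary size are illustrative assumptions):

```python
# Sketch of an auxiliary CTC loss computed on an intermediate encoder layer
# (the 8th of 12 here) rather than on top of the final encoder outputs.
# Dimensions, vocabulary size, and layer count are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderWithIntermediateCTC(nn.Module):
    def __init__(self, d_model=512, n_layers=12, ctc_layer=8, src_vocab=1000):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
             for _ in range(n_layers)])
        self.ctc_layer = ctc_layer
        self.ctc_proj = nn.Linear(d_model, src_vocab)  # index 0 is the CTC blank

    def forward(self, x, transcripts, transcript_lens):
        # x: (batch, frames, d_model) speech features after subsampling
        ctc_loss = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.ctc_layer:
                log_probs = F.log_softmax(self.ctc_proj(x), dim=-1)  # (B, T, V)
                input_lens = torch.full((x.size(0),), x.size(1), dtype=torch.long)
                ctc_loss = F.ctc_loss(log_probs.transpose(0, 1), transcripts,
                                      input_lens, transcript_lens, blank=0)
        return x, ctc_loss
```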