Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), 2022
DOI: 10.18653/v1/2022.iwslt-1.14

Effective combination of pretrained models - KIT@IWSLT2022

Abstract: Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we seek an empirical answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that these models, together with an advanced audio segmentation method, improve on the previous End-to-end system by up to 7 BLEU points. More importantly, t…
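A rough sketch of the kind of combination the abstract describes: wiring a pretrained acoustic encoder to a pretrained multilingual text decoder. The Hugging Face SpeechEncoderDecoderModel API and the public checkpoints used below (facebook/wav2vec2-xls-r-300m, facebook/mbart-large-50) are stand-in assumptions for illustration, not the paper's actual recipe:

```python
# Hedged sketch: build an end-to-end ST model from pretrained acoustic and
# textual components. Checkpoints are public stand-ins, not the paper's.
from transformers import AutoTokenizer, SpeechEncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")

# Encoder: self-supervised wav2vec-family speech model;
# decoder: multilingual mBART-50 text model.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    "facebook/mbart-large-50",
)

# The combined model needs the decoder's special tokens set before
# fine-tuning on (speech, translation) pairs; the cross-attention weights
# joining the two components start out randomly initialized.
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["de_DE"]
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
```

Fine-tuning on parallel speech-translation data is still required before such a combination produces useful output; the point of starting from pretrained components is that far less of it is needed.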

Cited by 4 publications (2 citation statements) | References 18 publications

Citation statements:
“…The English ASR models are built based on pretrained WavLM (Chen et al., 2022) and BART (Lewis et al., 2019) [6], while for Multilingual ASR we utilized the XLS-R models (Babu et al., 2021) for the encoder and the MBART-50 model (Liu et al., 2020b) for the decoder, following (Pham et al., 2022). On the other hand, the translation models are based on the pretrained…” ([6] With the recipe available here.)
Section: Transcription and Translation Models (mentioning; confidence: 99%)
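The pairing this statement describes (an XLS-R encoder with an mBART-50 decoder) can be exercised end to end with a single fine-tuning step. The sketch below uses the same generic transformers API, placeholder audio, and a placeholder transcript; it is an assumption-laden illustration, not the cited recipe:

```python
# Hedged sketch: one fine-tuning step for an XLS-R encoder + mBART-50
# decoder combination. Checkpoints and data are placeholders.
import torch
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    SpeechEncoderDecoderModel,
)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m", "facebook/mbart-large-50"
)
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["en_XX"]
model.config.pad_token_id = tokenizer.pad_token_id

# One (audio, transcript) pair: 2 s of 16 kHz audio, zeros as a stand-in.
audio = torch.zeros(32000).numpy()
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")
labels = tokenizer("a placeholder transcript", return_tensors="pt").input_ids

# The forward pass returns the cross-entropy loss used for fine-tuning.
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
```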
“…Another way of incorporating ASR and MT is to leverage large pretrained speech and text models as a foundation for end-to-end ST systems (Gállego et al., 2021; Han et al., 2021; Zhang and Ao, 2022; Pham et al., 2022; Tsiamas et al., 2022b). However, these systems encounter representation discrepancy issues, which can hinder the full exploitation of pretrained foundation models.…”
Section: Introduction (mentioning; confidence: 99%)