Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), 2022
DOI: 10.18653/v1/2022.iwslt-1.14

Effective combination of pretrained models - KIT@IWSLT2022

Abstract: Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we seek an empirical answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that these models, together with an advanced audio segmentation method, improve on the previous End-to-end system by up to 7 BLEU points. More importantly, t…
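A rough sketch of the kind of combination the abstract describes: wiring a pretrained acoustic encoder to a pretrained multilingual text decoder. The Hugging Face SpeechEncoderDecoderModel API and the public checkpoints used below (facebook/wav2vec2-xls-r-300m, facebook/mbart-large-50) are stand-in assumptions for illustration, not the paper's actual recipe:

```python
# Hedged sketch: build an end-to-end ST model from pretrained acoustic and
# textual components. Checkpoints are public stand-ins, not the paper's.
from transformers import AutoTokenizer, SpeechEncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")

# Encoder: self-supervised wav2vec-family speech model;
# decoder: multilingual mBART-50 text model.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    "facebook/mbart-large-50",
)

# The combined model needs the decoder's special tokens set before
# fine-tuning on (speech, translation) pairs; the cross-attention weights
# joining the two components start out randomly initialized.
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["de_DE"]
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
```

Fine-tuning on parallel speech-translation data is still required before such a combination produces useful output; the point of starting from pretrained components is that far less of it is needed.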

Cited by 4 publications (2 citation statements) | References 18 publications

Citation statements:
“…The English ASR models are built based on pretrained WavLM (Chen et al., 2022) and BART (Lewis et al., 2019) [6], while for Multilingual ASR we utilized the XLS-R models (Babu et al., 2021) for the encoder and the MBART-50 model (Liu et al., 2020b) for the decoder, following (Pham et al., 2022). On the other hand, the translation models are based on the pretrained…” ([6] With the recipe available here.)
Section: Transcription and Translation Models (mentioning; confidence: 99%)
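The pairing this statement describes (an XLS-R encoder with an mBART-50 decoder) can be exercised end to end with a single fine-tuning step. The sketch below uses the same generic transformers API, placeholder audio, and a placeholder transcript; it is an assumption-laden illustration, not the cited recipe:

```python
# Hedged sketch: one fine-tuning step for an XLS-R encoder + mBART-50
# decoder combination. Checkpoints and data are placeholders.
import torch
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    SpeechEncoderDecoderModel,
)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m", "facebook/mbart-large-50"
)
model.config.decoder_start_token_id = tokenizer.lang_code_to_id["en_XX"]
model.config.pad_token_id = tokenizer.pad_token_id

# One (audio, transcript) pair: 2 s of 16 kHz audio, zeros as a stand-in.
audio = torch.zeros(32000).numpy()
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="pt")
labels = tokenizer("a placeholder transcript", return_tensors="pt").input_ids

# The forward pass returns the cross-entropy loss used for fine-tuning.
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
```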
“…Another way of incorporating ASR and MT is to leverage large pretrained speech and text models as a foundation for end-to-end ST systems (Gállego et al., 2021; Han et al., 2021; Zhang and Ao, 2022; Pham et al., 2022; Tsiamas et al., 2022b). However, these systems encounter representation discrepancy issues, which can hinder the full exploitation of pretrained foundation models.…”
Section: Introduction (mentioning; confidence: 99%)