ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413719
Cascaded Models with Cyclic Feedback for Direct Speech Translation

Abstract: Direct speech translation describes a scenario where only speech inputs and corresponding translations are available. Such data are notoriously limited. We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data in addition to out-of-domain MT and ASR data. After pre-training MT and ASR, we use a feedback cycle where the downstream performance of the MT system is used as a signal to improve the ASR system by…
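The abstract describes using the cascade's downstream MT performance as a signal for improving the ASR component. A minimal sketch of that idea, assuming an illustrative scoring metric, threshold, and hypothesis-selection rule (the abstract is truncated, so these details are placeholders, not the authors' method):

```python
# Sketch: score ASR hypotheses by the quality of their downstream
# translations, and keep the ones that score well. The kept transcripts
# could then serve as additional training signal for the ASR model.

def overlap_score(hypothesis, reference):
    """Toy stand-in for an MT quality metric such as BLEU: word-overlap ratio."""
    ref_words = reference.split()
    if not ref_words:
        return 0.0
    return sum(1 for w in hypothesis.split() if w in ref_words) / len(ref_words)

def select_by_downstream_mt(asr_nbest, translate, reference_translation,
                            threshold=0.5):
    """Keep ASR hypotheses whose MT output scores above the threshold
    against the reference translation (the only supervision available in
    the direct speech translation setting)."""
    kept = []
    for transcript in asr_nbest:
        if overlap_score(translate(transcript), reference_translation) >= threshold:
            kept.append(transcript)
    return kept
```

For example, with an identity function standing in for the MT system, a transcript whose "translation" matches the reference is kept while a garbled one is filtered out.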

Cited by 4 publications (2 citation statements)
References 23 publications
“…Inspired by random online backtranslation, we created our version, explained in Algorithm 1, to help our model better utilize the training dataset and the 892 monolingual Bambara sentences from Wikipedia. Our approach, dubbed Cyclic backtranslation (Lam et al., 2021), would theoretically enable the model to leverage the available training and monolingual datasets by compelling the MT model for each direction, at each step k, to learn from a concatenation of the original training dataset, its synthetically generated sentences, and those generated by the MT model of the opposite direction in the previous step. Despite its potential benefits, implementing backtranslation presented several challenges.…”
Section: Team Alpha
confidence: 99%
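The data composition the quote describes can be sketched as follows. This is a hedged illustration, assuming toy word-substitution "models" and simple (source, target) pair lists; the cited Algorithm 1 may differ in its details:

```python
# Sketch of one direction's step-k training data in cyclic backtranslation:
# original parallel data + its own synthetic pairs + the opposite
# direction's synthetic pairs from step k-1.

def back_translate(model, monolingual):
    """Generate synthetic source sides with a toy word-level 'model' (dict)."""
    return [" ".join(model.get(w, w) for w in sent.split())
            for sent in monolingual]

def training_set(original, own_synthetic, opposite_prev_synthetic):
    """Concatenate the three data sources the quote names; the opposite
    direction's pairs are flipped into this direction's orientation."""
    flipped = [(tgt, src) for src, tgt in opposite_prev_synthetic]
    return original + own_synthetic + flipped
```

Each direction's model would then be retrained on its concatenated set, and its synthetic pairs handed to the opposite direction at the next step, closing the cycle.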
“…Speech translation can be broadly categorized into cascade systems and end-to-end speech translation (E2E ST). A cascade system (Sperber et al., 2017; Lam et al., 2021) usually combines automatic speech recognition (ASR) and machine translation (MT). The MT subsystem uses ASR transcripts as input, which provide a clear textual representation but may contain errors stemming from ASR.…”
Section: Introduction
confidence: 99%