2020
DOI: 10.48550/arxiv.2011.12167
Preprint

Tight Integrated End-to-End Training for Cascaded Speech Translation

Abstract: A cascaded speech translation model relies on discrete and non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between ASR and MT models. Direct speech translation is an alternative method to avoid error propagation; however, its performance is often behind the cascade system. To use an intermediate representation and preserve the end-to-end trainability, previ…

citations
Cited by 1 publication
(1 citation statement)
references
References 27 publications
(44 reference statements)
“…Unlike consecutive translation, where the translation is done after the speaker pauses, in SI the translation process starts while the speaker is still talking. With recent developments in machine translation and speech processing, various studies have been conducted aiming at automatic speech translation (Inaguma et al., 2021; Bahar et al., 2021), including SI (Oda et al., 2014; Zheng et al., 2019; Arivazhagan et al., 2019; Zhang et al., 2020; Nguyen et al., 2021), based on speech corpora.…”
Section: Introduction
Citation type: mentioning
confidence: 99%