2020
DOI: 10.1162/tacl_a_00340

Consistent Transcription and Translation of Speech

Abstract: The conventional paradigm in speech translation starts with a speech recognition step to generate transcripts, followed by a translation step with the automatic transcripts as input. To address various shortcomings of this paradigm, recent work explores end-to-end trainable direct models that translate without transcribing. However, transcripts can be an indispensable output in practical applications, which often display transcripts alongside the translations to users. We make this common requirement explicit…

citations
Cited by 8 publications
(2 citation statements)
references
References 23 publications
(31 reference statements)
“…The convergence of model structures for ASR and ST inspires works that use a single model to perform both ASR and ST [8,9,10,11,12]. Liu et al. proposed an interactive decoding strategy between ASR and ST [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…Assuming a realistic setup and not ignoring other available speech-to-source and source-to-target corpora, where cascade and direct models are trained on non-equal amounts of data, the performance of direct models is often behind cascaded systems. The end-to-end methods either conduct translation without transcribing or suffer from inconsistency between transcriptions and translations [25]. Transcriptions are essential in many applications and required to be displayed together with translations to users.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%