2022
DOI: 10.48550/arxiv.2204.03783
Preprint
Does Simultaneous Speech Translation need Simultaneous Models?

Abstract: In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental impact caused by these costs, we investigate whether a single model trained offline can serve not only the offline but…

Cited by 1 publication (3 citation statements) · References 28 publications (48 reference statements)
“…In light of the recent work that questions the necessity of a dedicated training procedure for simultaneous models (Papi et al., 2022), we participate in the Simultaneous task with the same model used for the Offline task. Their finding is perfectly aligned with the spirit of this submission toward the reduction of training computational costs.…”
Section: Simultaneous
confidence: 99%
“…We determine when to start generating the output translation by adopting the wait-k strategy (Ma et al., 2019), which simply prescribes waiting for k words before starting to generate the translation, where k is a user-controlled hyper-parameter that can be increased or decreased to directly control the latency of the system. The number of words in a given input speech is determined with an adaptive word detection strategy (Ren et al., 2020), because of its superiority over the fixed strategy in strong models trained in high-resource data conditions (Papi et al., 2022). Our adaptive word detection mechanism exploits the predicted output of the CTC module in the encoder (Ren et al., 2020; Zeng et al., 2021) to count the number of words in the source speech.…”
Section: Simultaneous
confidence: 99%
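The wait-k policy and the CTC-based word counting described in the excerpt above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `"▁"` word-boundary marker (a SentencePiece-style convention), and the blank token label are all assumptions made for the example.

```python
def waitk_policy(num_source_words: int, num_target_words: int, k: int) -> str:
    """Wait-k decision rule (Ma et al., 2019): emit the next target word
    only once the source is at least k words ahead of the target."""
    if num_source_words - num_target_words >= k:
        return "WRITE"  # generate the next target word
    return "READ"       # wait for more source speech

def count_words_from_ctc(ctc_tokens, blank="<blank>", boundary="▁"):
    """Hypothetical adaptive word detector in the spirit of Ren et al. (2020):
    collapse repeated CTC predictions, drop blanks, and count tokens that
    start a new word (marked here with a SentencePiece-style boundary)."""
    collapsed, prev = [], None
    for tok in ctc_tokens:
        if tok != prev and tok != blank:
            collapsed.append(tok)
        prev = tok
    return sum(1 for tok in collapsed if tok.startswith(boundary))
```

For example, with k = 3 the policy reads source speech until the CTC-based counter has detected three words, then alternates: `waitk_policy(3, 0, 3)` returns `"WRITE"`, while `waitk_policy(2, 0, 3)` returns `"READ"`.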