2022
DOI: 10.48550/arxiv.2204.03783
Preprint
Does Simultaneous Speech Translation need Simultaneous Models?

Abstract: In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental impact caused by these costs, we investigate whether a single model trained offline can serve not only the offline but…

Cited by 1 publication (3 citation statements) · References 28 publications (48 reference statements)
“…In light of the recent work that questions the necessity of a dedicated training procedure for simultaneous models (Papi et al., 2022), we participate in the Simultaneous task with the same model used for the Offline task. Their finding is perfectly aligned with the spirit of this submission toward the reduction of training computational costs.…”
Section: Simultaneous
confidence: 99%
“…We determine when to start generating the output translation by adopting the wait-k strategy (Ma et al., 2019), which simply prescribes waiting for k words before starting to generate the translation, where k is a user-controlled hyper-parameter that can be increased or decreased to directly control the latency of the system. The number of words in a given input speech is determined with an adaptive word detection strategy (Ren et al., 2020), because of its superiority over the fixed strategy in strong models trained in high-resource data conditions (Papi et al., 2022). Our adaptive word detection mechanism exploits the predicted output of the CTC module in the encoder (Ren et al., 2020; Zeng et al., 2021) to count the number of words in the source speech.…”
Section: Simultaneous
confidence: 99%
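The wait-k policy and the CTC-based word counting described in the excerpt above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `"▁"` word-boundary marker (a SentencePiece-style convention), and the blank token label are all assumptions made for the example.

```python
def waitk_policy(num_source_words: int, num_target_words: int, k: int) -> str:
    """Wait-k decision rule (Ma et al., 2019): emit the next target word
    only once the source is at least k words ahead of the target."""
    if num_source_words - num_target_words >= k:
        return "WRITE"  # generate the next target word
    return "READ"       # wait for more source speech

def count_words_from_ctc(ctc_tokens, blank="<blank>", boundary="▁"):
    """Hypothetical adaptive word detector in the spirit of Ren et al. (2020):
    collapse repeated CTC predictions, drop blanks, and count tokens that
    start a new word (marked here with a SentencePiece-style boundary)."""
    collapsed, prev = [], None
    for tok in ctc_tokens:
        if tok != prev and tok != blank:
            collapsed.append(tok)
        prev = tok
    return sum(1 for tok in collapsed if tok.startswith(boundary))
```

For example, with k = 3 the policy reads source speech until the CTC-based counter has detected three words, then alternates: `waitk_policy(3, 0, 3)` returns `"WRITE"`, while `waitk_policy(2, 0, 3)` returns `"READ"`.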