ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414159
Jointly Trained Transformers Models for Spoken Language Translation

Abstract: End-to-end and cascade (ASR-MT) spoken language translation (SLT) systems are reaching comparable performance; however, a large degradation is observed when translating ASR hypotheses compared to using oracle input text. In this work, the degradation in performance is reduced by creating an end-to-end differentiable pipeline between the ASR and MT systems: SLT systems are trained with the ASR objective as an auxiliary loss, and the two networks are connected through the neural hidden representatio…
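As a rough illustration of the joint training the abstract describes, the combined objective can be sketched as the translation loss plus a weighted auxiliary ASR loss. This is a minimal sketch; the function name and the weight value are assumptions, not the paper's actual implementation.

```python
def joint_slt_loss(mt_loss: float, asr_loss: float, aux_weight: float = 0.3) -> float:
    """Combine the translation (MT) loss with the auxiliary ASR loss.

    `aux_weight` is a hypothetical tuning knob; the abstract only states that
    the ASR objective is used as an auxiliary loss, not its exact weighting.
    """
    return mt_loss + aux_weight * asr_loss

# Example: equal batch losses, auxiliary term down-weighted.
total = joint_slt_loss(mt_loss=2.0, asr_loss=2.0, aux_weight=0.5)  # 3.0
```

Because the pipeline is end-to-end differentiable, gradients from the MT loss can also flow back through the shared hidden representation into the ASR encoder.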

Cited by 17 publications (12 citation statements)
References 12 publications (9 reference statements)
“…This is because CSLR acts as an auxiliary task [43], and the intermediate CTC loss used to optimize CSLR acts as an auxiliary loss function. Including an auxiliary task has consistently improved the performance of the main task (SLT in our case) [44], [45], [46].…”
Section: Discussion
confidence: 63%
“…End-to-end ST: To overcome the error propagation and high latency of cascaded ST systems, Bérard et al. (2016) and Duong et al. (2016) proved the potential of end-to-end ST without intermediate transcription, which has attracted much attention in recent years (Vila et al., 2018; Salesky et al., 2018, 2019; Di Gangi et al., 2019b,c; Bahar et al., 2019a; Inaguma et al., 2020). Since it is difficult to train an end-to-end ST model directly, training techniques such as pretraining (Weiss et al., 2017; Berard et al., 2018; Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020b; Dong et al., 2021a; Alinejad and Sarkar, 2020; Zheng et al., 2021b), multi-task learning (Le et al., 2020; Vydana et al., 2021; Tang et al., 2021b; Ye et al., 2021; Tang et al., 2021a), curriculum learning (Kano et al., 2017; Wang et al., 2020c), and meta-learning (Indurthi et al., 2020) have been applied. Recent work has introduced mixup to machine translation (Zhang et al., 2019b; Guo et al., 2022; Fang and Feng, 2022), sentence classification (Chen et al., 2020; Jindal et al., 2020; Sun et al., 2020), multilingual understanding, and speech recognition (Medennikov et al., 2018; Sun et al., 2021; Lam et al., 2021a; Meng et al., 2021), and obtained improvements.…”
Section: Related Work
confidence: 99%
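Mixup, mentioned in the excerpt above, forms convex combinations of pairs of training examples (and, in supervised settings, of their labels). A minimal sketch on raw feature vectors; the Beta parameter of 0.2 is a common but arbitrary choice, not taken from the cited works:

```python
import numpy as np

def mixup_pair(x1, x2, alpha: float = 0.2, rng=None):
    """Return a convex combination of two examples and the mixing weight.

    `alpha` parameterizes the Beta distribution the weight lam is drawn from;
    small alpha pushes lam toward 0 or 1, keeping mixes close to one example.
    """
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2, lam

x1 = np.ones(4)
x2 = np.zeros(4)
mixed, lam = mixup_pair(x1, x2, rng=np.random.default_rng(0))
# `mixed` lies elementwise between x2 and x1; lam is in (0, 1).
```

For speech tasks, the same combination is typically applied to acoustic feature frames or encoder inputs rather than raw waveforms.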
“…In 2017, Vaswani et al. [8] introduced the Transformer model, which achieves strong performance in machine translation and has been widely adopted in many fields [9,10]. They proposed multi-head attention to improve the feature-extraction ability of the network.…”
Section: Transformer
confidence: 99%
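Multi-head attention, as referenced above, splits the model dimension across several heads and runs scaled dot-product attention in each head independently. A self-attention sketch in NumPy; the shapes, weight initialization, and function name are illustrative, not the reference implementation from [8]:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """Self-attention over x: (seq, d_model); each weight matrix: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split the model dimension into n_heads independent heads.
    q = (x @ wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (n_heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v                       # (n_heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ wo

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 positions, d_model = 8
wq, wk, wv, wo = (rng.normal(size=(8, 8)) for _ in range(4))
y = multi_head_attention(x, wq, wk, wv, wo, n_heads=2)  # shape (5, 8)
```

Each head attends over the full sequence but only sees a d_model/n_heads slice of the representation, which is what lets different heads specialize in different relations.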