ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414159
Jointly Trained Transformers Models for Spoken Language Translation

Abstract: End-to-end and cascade (ASR-MT) spoken language translation (SLT) systems are reaching comparable performance; however, a large degradation is observed when translating ASR hypotheses compared to using oracle input text. In this work, the degradation in performance is reduced by creating an end-to-end differentiable pipeline between the ASR and MT systems: SLT systems are trained with the ASR objective as an auxiliary loss, and the two networks are connected through the neural hidden representatio…
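As a rough illustration of the joint training the abstract describes, the combined objective can be sketched as the translation loss plus a weighted auxiliary ASR loss. This is a minimal sketch; the function name and the weight value are assumptions, not the paper's actual implementation.

```python
def joint_slt_loss(mt_loss: float, asr_loss: float, aux_weight: float = 0.3) -> float:
    """Combine the translation (MT) loss with the auxiliary ASR loss.

    `aux_weight` is a hypothetical tuning knob; the abstract only states that
    the ASR objective is used as an auxiliary loss, not its exact weighting.
    """
    return mt_loss + aux_weight * asr_loss

# Example: equal batch losses, auxiliary term down-weighted.
total = joint_slt_loss(mt_loss=2.0, asr_loss=2.0, aux_weight=0.5)  # 3.0
```

Because the pipeline is end-to-end differentiable, gradients from the MT loss can also flow back through the shared hidden representation into the ASR encoder.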

Cited by 17 publications (12 citation statements)
References 12 publications (9 reference statements)
“…This is because CSLR acts as an auxiliary task [43], and the intermediate CTC loss used to optimize CSLR acts as an auxiliary loss function. Including an auxiliary task has consistently improved the performance of the main task (SLT in our case) [44], [45], [46].…”
Section: Discussion
confidence: 63%
“…End-to-end ST: To overcome the error propagation and high latency of cascaded ST systems, Bérard et al. (2016) and Duong et al. (2016) proved the potential of end-to-end ST without intermediate transcription, which has attracted much attention in recent years (Vila et al., 2018; Salesky et al., 2018, 2019; Di Gangi et al., 2019b,c; Bahar et al., 2019a; Inaguma et al., 2020). Since it is difficult to train an end-to-end ST model directly, training techniques such as pretraining (Weiss et al., 2017; Berard et al., 2018; Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020b; Dong et al., 2021a; Alinejad and Sarkar, 2020; Zheng et al., 2021b), multi-task learning (Le et al., 2020; Vydana et al., 2021; Tang et al., 2021b; Ye et al., 2021; Tang et al., 2021a), curriculum learning (Kano et al., 2017; Wang et al., 2020c), and meta-learning (Indurthi et al., 2020) have been applied. Recent work has introduced mixup to machine translation (Zhang et al., 2019b; Guo et al., 2022; Fang and Feng, 2022), sentence classification (Chen et al., 2020; Jindal et al., 2020; Sun et al., 2020), multilingual understanding, and speech recognition (Medennikov et al., 2018; Sun et al., 2021; Lam et al., 2021a; Meng et al., 2021), and obtained improvements.…”
Section: Related Work
confidence: 99%
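Mixup, mentioned in the excerpt above, forms convex combinations of pairs of training examples (and, in supervised settings, of their labels). A minimal sketch on raw feature vectors; the Beta parameter of 0.2 is a common but arbitrary choice, not taken from the cited works:

```python
import numpy as np

def mixup_pair(x1, x2, alpha: float = 0.2, rng=None):
    """Return a convex combination of two examples and the mixing weight.

    `alpha` parameterizes the Beta distribution the weight lam is drawn from;
    small alpha pushes lam toward 0 or 1, keeping mixes close to one example.
    """
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2, lam

x1 = np.ones(4)
x2 = np.zeros(4)
mixed, lam = mixup_pair(x1, x2, rng=np.random.default_rng(0))
# `mixed` lies elementwise between x2 and x1; lam is in (0, 1).
```

For speech tasks, the same combination is typically applied to acoustic feature frames or encoder inputs rather than raw waveforms.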
“…In 2017, Vaswani et al. [8] introduced the Transformer model, which achieves strong performance in machine translation and has been widely adopted in many fields [9,10]. They proposed multi-head attention to improve the feature-extraction ability of the network.…”
Section: Transformer
confidence: 99%
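Multi-head attention, as referenced above, splits the model dimension across several heads and runs scaled dot-product attention in each head independently. A self-attention sketch in NumPy; the shapes, weight initialization, and function name are illustrative, not the reference implementation from [8]:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """Self-attention over x: (seq, d_model); each weight matrix: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split the model dimension into n_heads independent heads.
    q = (x @ wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (n_heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v                       # (n_heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ wo

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # 5 positions, d_model = 8
wq, wk, wv, wo = (rng.normal(size=(8, 8)) for _ in range(4))
y = multi_head_attention(x, wq, wk, wv, wo, n_heads=2)  # shape (5, 8)
```

Each head attends over the full sequence but only sees a d_model/n_heads slice of the representation, which is what lets different heads specialize in different relations.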