Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), 2021
DOI: 10.18653/v1/2021.iwslt-1.13
KIT’s IWSLT 2021 Offline Speech Translation System

Abstract: This paper describes KIT's submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both the cascaded condition and the end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year's neural machine translation model was reused. In the end-to-end condition, we improved ou…
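As a rough orientation for readers, the cascade condition summarized in the abstract can be pictured as three sequential stages: ASR, sentence segmentation, and MT. The Python sketch below is purely illustrative; every function is a hypothetical placeholder, not the actual KIT components, which are neural models trained as described in the paper.

```python
# A minimal sketch of the cascade described in the abstract
# (ASR -> sentence segmentation -> MT).  All three stages are
# placeholders and every function name is hypothetical.

from typing import List


def asr(audio: bytes) -> str:
    # Hypothetical end-to-end ASR: returns unpunctuated, unsegmented text.
    return "hello everyone welcome to the talk today we discuss speech translation"


def segment(text: str, max_words: int = 8) -> List[str]:
    # Stand-in for the small transformer-based segmenter: here we simply
    # cut the stream into fixed-length chunks instead of predicting boundaries.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def translate(sentences: List[str]) -> List[str]:
    # Stand-in for the NMT module (the system re-used the previous year's model).
    return [f"<translation of: {s}>" for s in sentences]


def cascaded_st(audio: bytes) -> List[str]:
    return translate(segment(asr(audio)))


print(cascaded_st(b"\x00" * 16000))
```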

Cited by 6 publications (9 citation statements). References 17 publications.
“…Zeng et al. (2023) used the shrink embedding gradient technique (Ding et al., 2021). In this study, we proved that applying layer normalization to the embedding layer prevents the exploding gradients around layer normalizations in internal layers when we use the scaled initialization, which is a widely used initialization method for LLMs (Nguyen & Salazar, 2019; Shoeybi et al., 2020), and thus, it stabilizes the pre-training. In this study, we indicated that an initialization method affects the LLM pre-training dynamics.…”
Section: B Related Work
confidence: 69%
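The quoted passage combines two ingredients: scaled initialization of the residual output projections and layer normalization applied directly to the embedding output. The PyTorch sketch below illustrates both under my own naming; the module and the `scaled_init_` helper are illustrative, not the cited papers' code.

```python
# Illustration of (1) layer normalization on the embedding output and
# (2) "scaled initialization", where residual output projections use a
# std divided by sqrt(2 * num_layers).  Names and values are assumptions.

import math
import torch
import torch.nn as nn


class EmbeddingWithLayerNorm(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # LayerNorm applied to the embedding output, the stabilizing step
        # discussed in the quoted passage.
        self.embed_ln = nn.LayerNorm(d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed_ln(self.embed(token_ids))


def scaled_init_(linear: nn.Linear, num_layers: int, base_std: float = 0.02) -> None:
    # Shrink the init std of residual-branch output projections by 1/sqrt(2N).
    nn.init.normal_(linear.weight, mean=0.0, std=base_std / math.sqrt(2 * num_layers))
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)


# Usage sketch
emb = EmbeddingWithLayerNorm(vocab_size=32000, d_model=512)
out_proj = nn.Linear(512, 512)
scaled_init_(out_proj, num_layers=24)
print(emb(torch.randint(0, 32000, (2, 16))).shape)  # torch.Size([2, 16, 512])
```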
“…Stability: To stabilize training of Transformer-based neural language models, there have been various discussions on the architecture (Xiong et al., 2020; Liu et al., 2020; Zeng et al., 2023; Zhai et al., 2023), initialization method (Nguyen & Salazar, 2019; Zhang et al., 2019b; Huang et al., 2020; Wang et al., 2022), training strategy (Zhang et al., 2022; Li et al., 2022), and loss function (Chowdhery et al., 2022; Wortsman et al., 2023). Xiong et al. (2020) theoretically analyzed the gradient scales of each part of the Transformer and indicated that the Pre-LN Transformer is more stable than the Post-LN Transformer, that is, the original Transformer architecture (Vaswani et al., 2017).…”
Section: B Related Work
confidence: 99%
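The Pre-LN versus Post-LN distinction in the quoted passage comes down to where the LayerNorm sits relative to the residual connection. The sketch below shows the two orderings side by side; the feed-forward sublayer is a placeholder, and the code is only a compact illustration, not any cited implementation.

```python
# Pre-LN vs. Post-LN residual block ordering (attention omitted; the
# sublayer here is a placeholder feed-forward network).

import torch
import torch.nn as nn


def ffn(d_model: int) -> nn.Module:
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                         nn.Linear(4 * d_model, d_model))


class PostLNBlock(nn.Module):
    """Original Transformer ordering: x -> sublayer -> add -> LayerNorm."""
    def __init__(self, d_model: int):
        super().__init__()
        self.sublayer = ffn(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.sublayer(x))


class PreLNBlock(nn.Module):
    """Pre-LN ordering: x -> LayerNorm -> sublayer -> add (reported as more stable)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.sublayer = ffn(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sublayer(self.norm(x))


x = torch.randn(2, 16, 512)
print(PostLNBlock(512)(x).shape, PreLNBlock(512)(x).shape)
```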
“…Previous, well-performing systems submitted to the IWSLT offline and low-resource speech translation tracks made use of various methods to improve the performance of their cascade systems. For the ASR component, many submissions used a combination of transformer and conformer models (Zhang et al., 2022; Li et al., 2022; Nguyen et al., 2021) or fine-tuned existing models (Zhang and Ao, 2022; Zanon Boito et al., 2022; Denisov et al., 2021). They managed to increase ASR performance by voice activity detection for segmentation (Zhang et al., 2022; Ding and Tao, 2021), training the ASR on synthetic data with added punctuation, noise filtering and domain-specific fine-tuning (Zhang and Ao, 2022; Li et al., 2022), or adding an intermediate model that cleans the ASR output in terms of casing and punctuation (Nguyen et al., 2021).…”
Section: Previous IWSLT Approaches For
confidence: 99%
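One of the techniques mentioned in this quote is voice-activity-detection-based segmentation of the audio before ASR. The toy sketch below uses a simple short-term-energy gate to cut a waveform at long silences; real systems use trained VAD models, and all thresholds and parameter values here are made up for illustration.

```python
# Toy energy-threshold VAD segmentation: split an audio stream wherever
# the per-frame energy stays below a threshold for long enough.

import numpy as np


def energy_vad_segments(wave: np.ndarray, sr: int = 16000, frame_ms: int = 30,
                        threshold: float = 1e-3, min_pause_frames: int = 10):
    frame_len = sr * frame_ms // 1000
    n_frames = len(wave) // frame_len
    frames = wave[:n_frames * frame_len].reshape(n_frames, frame_len)
    voiced = (frames ** 2).mean(axis=1) > threshold  # per-frame energy gate

    segments, start, pause = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            start = i if start is None else start
            pause = 0
        elif start is not None:
            pause += 1
            if pause >= min_pause_frames:  # silence long enough -> cut here
                segments.append((start * frame_len, (i - pause + 1) * frame_len))
                start, pause = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments  # list of (start_sample, end_sample) pairs


# Usage: two bursts of noise separated by half a second of silence.
audio = np.concatenate([np.random.randn(16000) * 0.1,
                        np.zeros(8000),
                        np.random.randn(16000) * 0.1])
print(energy_vad_segments(audio))
```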