In a similar trend, 'expert' modules have been added to (non-modular) pre-trained LMs post-hoc, predominantly referred to as adapters (Rebuffi et al., 2017, 2018; Houlsby et al., 2019). Besides being extremely parameter-efficient (Houlsby et al., 2019; Mahabadi et al., 2021a; He et al., 2022) and training-efficient (Pfeiffer et al., 2020a), these modular approaches allow models to be extended to new data settings (Chen et al., 2019), where newly learned knowledge can be combined (Stickland and Murray, 2019; Wang et al., 2021a; Pfeiffer et al., 2021a; Lauscher et al., 2020a; Mahabadi et al., 2021b; Poth et al., 2021), or stacked for combinatorial cross-lingual transfer (Pfeiffer et al., 2020b; Üstün et al., 2020; Vidoni et al., 2020; Ansell et al., 2021a,b; Wang et al., 2021b) as well as NMT scenarios (Bapna and Firat, 2019; Philip et al., 2020; Chronopoulou et al., 2020; Le et al., 2021; Üstün et al., 2021; Stickland et al., 2021; Garcia et al., 2021).
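To make the adapter idea concrete, below is a minimal sketch of the bottleneck design popularized by Houlsby et al. (2019): a small down-projection, nonlinearity, and up-projection with a residual connection, inserted into an otherwise frozen pre-trained model. The class name, bottleneck size, and choice of GELU are illustrative assumptions, not any specific paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter sketch (after Houlsby et al., 2019).

    Inserted into a frozen pre-trained transformer layer; only the
    adapter's parameters are trained, which is why such modules are
    parameter- and training-efficient.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        # Project down to a small bottleneck, then back up.
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter is near-identity at
        # initialization, so the frozen backbone's behavior is preserved.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Because only the down/up projections are updated during fine-tuning, the number of trainable parameters is on the order of 2 × hidden_dim × bottleneck_dim per layer, a small fraction of the backbone.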