Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2021
DOI: 10.18653/v1/2021.acl-short.103

Lightweight Adapter Tuning for Multilingual Speech Translation

Abstract: Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists in freezing the pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained…
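A minimal sketch of the adapter idea the abstract describes: a small bottleneck module with a residual connection, inserted after a frozen pretrained layer so that only the adapter's parameters are trained. The class names, the bottleneck dimension of 64, and the PyTorch framing are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-projection, non-linearity, up-projection, residual."""

    def __init__(self, d_model: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter only learns a small correction
        # on top of the frozen backbone's representation.
        return hidden + self.up(self.act(self.down(hidden)))


class LayerWithAdapter(nn.Module):
    """Wraps a frozen pretrained layer and applies a trainable adapter after it."""

    def __init__(self, pretrained_layer: nn.Module, d_model: int):
        super().__init__()
        self.layer = pretrained_layer
        for p in self.layer.parameters():
            p.requires_grad = False      # freeze the pretrained weights
        self.adapter = Adapter(d_model)  # only these parameters are trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.layer(x))
```

In a multilingual setting, one such adapter can be kept per target language (or per language pair) while the shared backbone stays frozen; this is the sense in which the tuning is "lightweight".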

Cited by 33 publications (16 citation statements) · References 22 publications
“…In a similar trend, 'expert' modules have been added to (non-modular) pre-trained LMs post-hoc, predominantly referred to as adapters (Rebuffi et al., 2017, 2018; Houlsby et al., 2019). Next to being extremely parameter-efficient (Houlsby et al., 2019; Mahabadi et al., 2021a; He et al., 2022) and training-efficient (Pfeiffer et al., 2020a), these modular approaches allow models to be extended to new data settings (Chen et al., 2019), where newly learned knowledge can be combined (Stickland and Murray, 2019; Wang et al., 2021a; Pfeiffer et al., 2021a; Lauscher et al., 2020a; Mahabadi et al., 2021b; Poth et al., 2021), or stacked for combinatory cross-lingual (Pfeiffer et al., 2020b; Üstün et al., 2020; Vidoni et al., 2020; Ansell et al., 2021a,b; Wang et al., 2021b) as well as NMT scenarios (Bapna and Firat, 2019; Philip et al., 2020; Chronopoulou et al., 2020; Le et al., 2021; Üstün et al., 2021; Stickland et al., 2021; Garcia et al., 2021).…”
Section: Modular Language Models
confidence: 99%
“…To address the inefficiency and overfitting issues in low-resource abstractive summarization, Chen et al. [25] inserted adapters into both the encoder and decoder of PLMs by restricting the number of trainable parameters and layers. Besides, many studies have shown that adapters can be used to help PLMs efficiently capture some input characteristics for generating more accurate output text with a low extra cost in terms of parameters [93, 158]. For example, Ribeiro et al. [158] utilized adapters to effectively model the input graph structure when fine-tuning PLMs, which are usually pretrained on natural language and not structured data.…”
Section: 14
confidence: 99%
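The passage above hinges on restricting the number of trainable parameters. A hedged sketch of how one might check that budget after freezing a backbone and attaching per-layer adapters; the 12-layer, 512-dimensional backbone and the 64-unit bottleneck are hypothetical stand-ins, not figures from the paper or the citing survey.

```python
import torch.nn as nn


def trainable_vs_total(model: nn.Module) -> tuple[int, int]:
    """Count trainable vs. total parameters (a quick budget check)."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Hypothetical frozen backbone standing in for a pretrained encoder/decoder stack.
backbone = nn.Sequential(*(nn.Linear(512, 512) for _ in range(12)))
for p in backbone.parameters():
    p.requires_grad = False

# One small bottleneck adapter (512 -> 64 -> 512) per layer stays trainable.
adapters = nn.ModuleList(
    nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 512))
    for _ in range(12)
)

model = nn.ModuleDict({"backbone": backbone, "adapters": adapters})
trainable, total = trainable_vs_total(model)
print(f"trainable: {trainable:,} of {total:,} parameters")
# With a real Transformer backbone the adapter share is far smaller,
# since attention and feed-forward blocks dominate the total count.
```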
“…Baseline Models In Table 1, we compared our method with end-to-end baseline models whose audio inputs are 80-channel log Mel-filterbank features, including: FairseqST (Wang et al., 2020a), NeurST (Zhao et al., 2021a), ESPnet-ST (Inaguma et al., 2020), Dual-decoder Transformer (Le et al., 2020), SATE, Speechformer (Papi et al., 2021), the self-training and mutual-learning method (Zhao et al., 2021b), STAST, bi-KD (Inaguma et al., 2021), the MLT method (Tang et al., 2021b), Lightweight Adapter (Le et al., 2021), JT-S-MT (Tang et al., 2021a), FAT-ST, TaskAware (Indurthi et al., 2021), and STPT (Tang et al., 2022). We also compare our method to baseline models that have pretrained Wav2vec2.0 as a module, including:…”
Section: B Experimental Details
confidence: 99%