2021
DOI: 10.48550/arxiv.2109.12683
Preprint

On the Prunability of Attention Heads in Multilingual BERT

Abstract: Large multilingual models, such as mBERT, have shown promise in cross-lingual transfer. In this work, we employ pruning to quantify the robustness and interpret layer-wise importance of mBERT. On four GLUE tasks, the relative drops in accuracy due to pruning are almost identical for mBERT and BERT, suggesting that the reduced attention capacity of the multilingual model does not affect its robustness to pruning. For the cross-lingual task XNLI, we report higher drops in accuracy with pruning, indicating lower…
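The abstract describes pruning attention heads of mBERT and measuring the resulting accuracy drop. As a rough illustration of the mechanics only, the sketch below uses the Hugging Face Transformers prune_heads API on bert-base-multilingual-cased; the layer and head indices are arbitrary placeholders, not the importance-ranked heads pruned in the paper, and the paper additionally evaluates the pruned model on GLUE and XNLI tasks, which is not shown here.

```python
# Minimal sketch, assuming the Hugging Face Transformers library is installed.
# Demonstrates structured attention-head pruning on mBERT; the head choices
# below are arbitrary placeholders, not the selection procedure from the paper.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-multilingual-cased")
print("Parameters before pruning:", sum(p.numel() for p in model.parameters()))

# Map of {layer index: [head indices to remove]}. The paper ranks heads by
# importance before pruning; here the indices are chosen only for illustration.
heads_to_prune = {0: [0, 1], 5: [2], 11: [3, 7]}
model.prune_heads(heads_to_prune)

print("Parameters after pruning:", sum(p.numel() for p in model.parameters()))
```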

Cited by 1 publication (1 citation statement)
References 16 publications
“…Frankle and Carbin (2018) showed that subnetworks can be found through pruning methods (Han et al., 2015; Li et al., 2016) that match the performance of the full model. Since then, it has been shown that such subnetworks exist within BERT models (Prasanna et al., 2020; Budhraja et al., 2021), and that both language-neutral and language-specific subnetworks can be found in multilingual LMs (Foroutan et al., 2022). Hence, sparse training gained popularity in multilingual NLP: Nooralahzadeh and Sennrich (2023) show that training task-specific subnetworks can help in cross-lingual transfer, Lin et al. (2021) use language-pair-specific subnetworks for neural machine translation, and Hendy et al. (2022) use domain-specific subnetworks.…”
Section: Subnetwork and SFT
confidence: 99%