2021
DOI: 10.48550/arxiv.2109.12683
Preprint

On the Prunability of Attention Heads in Multilingual BERT

Abstract: Large multilingual models, such as mBERT, have shown promise in cross-lingual transfer. In this work, we employ pruning to quantify the robustness and interpret layer-wise importance of mBERT. On four GLUE tasks, the relative drops in accuracy due to pruning are almost identical for mBERT and BERT, suggesting that the reduced attention capacity of the multilingual model does not affect its robustness to pruning. For the cross-lingual task XNLI, we report higher drops in accuracy with pruning, indicating lower…
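The abstract describes pruning attention heads of mBERT and measuring the resulting accuracy drop. As a rough illustration of the mechanics only, the sketch below uses the Hugging Face Transformers prune_heads API on bert-base-multilingual-cased; the layer and head indices are arbitrary placeholders, not the importance-ranked heads pruned in the paper, and the paper additionally evaluates the pruned model on GLUE and XNLI tasks, which is not shown here.

```python
# Minimal sketch, assuming the Hugging Face Transformers library is installed.
# Demonstrates structured attention-head pruning on mBERT; the head choices
# below are arbitrary placeholders, not the selection procedure from the paper.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-multilingual-cased")
print("Parameters before pruning:", sum(p.numel() for p in model.parameters()))

# Map of {layer index: [head indices to remove]}. The paper ranks heads by
# importance before pruning; here the indices are chosen only for illustration.
heads_to_prune = {0: [0, 1], 5: [2], 11: [3, 7]}
model.prune_heads(heads_to_prune)

print("Parameters after pruning:", sum(p.numel() for p in model.parameters()))
```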

Cited by 1 publication (1 citation statement)
References 16 publications
“…Frankle and Carbin (2018) showed that subnetworks can be found through pruning methods (Han et al., 2015; Li et al., 2016) that match the performance of the full model. Since then, it has been shown that such subnetworks exist within BERT models (Prasanna et al., 2020; Budhraja et al., 2021), and that both language-neutral and language-specific subnetworks can be found in multilingual LMs (Foroutan et al., 2022). Hence, sparse training gained popularity in multilingual NLP: Nooralahzadeh and Sennrich (2023) show that training task-specific subnetworks can help in cross-lingual transfer, Lin et al. (2021) use language-pair-specific subnetworks for neural machine translation, and Hendy et al. (2022) use domain-specific subnetworks.…”
Section: Subnetwork and SFT
confidence: 99%