2019
DOI: 10.48550/arxiv.1906.01502
Preprint

How multilingual is Multilingual BERT?

Cited by 119 publications (102 citation statements)
References 11 publications
“…over the years and have decided to finalize Bi-LSTM-CNN [3], Bi-GRU-CNN [3], Transformer [12], char-CNN [21] and mBERT [15] based architectures for demonstration of the model-agnostic nature of our adversarial attack technique. The maximum input sequence length, vocabulary size, and learning rate for these experiments were set at 25, 17k, and 0.001 respectively.…”
Section: Experiments and Results (mentioning)
confidence: 99%
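The experiment settings quoted above (sequence length 25, vocabulary size 17k, learning rate 0.001) can be collected in a small configuration stub. A minimal sketch in Python; the dataclass and field names are hypothetical, and only the values and model names come from the citation:

```python
from dataclasses import dataclass

@dataclass
class AttackExperimentConfig:
    """Hypothetical config stub; values taken from the quoted citation."""
    max_seq_length: int = 25       # maximum input sequence length
    vocab_size: int = 17_000       # vocabulary size ("17k")
    learning_rate: float = 1e-3    # learning rate
    # Target architectures named in the citing paper.
    target_models: tuple = ("Bi-LSTM-CNN", "Bi-GRU-CNN", "Transformer", "char-CNN", "mBERT")

if __name__ == "__main__":
    print(AttackExperimentConfig())
```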
“…We introduce a three-step attack strategy that can be used for generating adversarial examples using minimal resources for any type of code-mixed data (with and without transliteration). We have used our framework to evaluate the success of adversarial attacks on a few sentiment classification models [14,15,21] that have been shown to be effective on code-mixed data. Research on adversarial techniques has become an important aspect, especially for security-critical applications, as it helps us both in analyzing the fallacies of the models and in making them more robust.…”
Section: Introduction (mentioning)
confidence: 99%
“…Oversampling of low-resource languages is done to overcome data imbalance. It has shown great results on zero-shot transfer learning for various downstream tasks and has also helped in code-switched data tasks [21].…”
Section: Multilingual-BERT (M-BERT) (mentioning)
confidence: 99%
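The oversampling this citation describes is typically implemented by exponentially smoothing the per-language sampling probabilities during multilingual pretraining, so low-resource languages are sampled more often than their raw share of the corpus. A minimal sketch, assuming hypothetical corpus sizes and the commonly used smoothing exponent of 0.7:

```python
import numpy as np

def smoothed_sampling_probs(corpus_sizes, alpha=0.7):
    """Exponentially smooth per-language sampling probabilities.

    alpha < 1 boosts low-resource languages above their raw corpus share,
    which is how multilingual pretraining (e.g. mBERT) counters data imbalance.
    """
    sizes = np.asarray(corpus_sizes, dtype=float)
    raw = sizes / sizes.sum()          # probability proportional to corpus size
    smoothed = raw ** alpha            # exponential smoothing
    return smoothed / smoothed.sum()   # renormalize to a valid distribution

# Hypothetical corpus sizes (in sentences) for a high- and a low-resource language.
sizes = {"en": 1_000_000, "sw": 10_000}
raw = np.array(list(sizes.values()), dtype=float)
raw /= raw.sum()
for lang, p_raw, p_smooth in zip(sizes, raw, smoothed_sampling_probs(list(sizes.values()))):
    print(f"{lang}: raw={p_raw:.4f}  smoothed={p_smooth:.4f}")
```

With alpha = 0.7, the low-resource language's sampling probability rises from roughly 1% to roughly 4% in this toy setup, which is the oversampling effect the quote refers to.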
“…Another set of studies has identified ways to make these models more efficient through methods such as pruning (McCarley, 2019; Gordon et al., 2020; Sajjad et al., 2020; Budhraja et al., 2020). A third set of studies shows that multilingual extensions of these models, such as Multilingual BERT (Devlin et al., 2019), have surprisingly high cross-lingual transfer (Pires et al., 2019; Wu and Dredze, 2019).…”
Section: Introduction (mentioning)
confidence: 99%