2021
DOI: 10.48550/arxiv.2110.08875
Preprint

Predicting the Performance of Multilingual NLP Models

Abstract: Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages. The languages that these models are evaluated on, however, are very few in number, and it is unlikely that evaluation datasets will cover all the languages that these models support. Potential solutions to the costly problem of dataset creation are to translate datasets to new languages or use template-filling based techniques for creation. This paper proposes an alternate solution for evaluating a model across…

Cited by 6 publications (18 citation statements)
References 25 publications (42 reference statements)
“…However, in practice, one would like to understand the trade-offs before collecting the data. Recently, Srinivasan et al (2021) showed that it is possible to predict the zero-shot and few-shot performance of MMLMs for different languages using linguistic properties and their representation in the pre-training corpus. Understanding whether the performance trade-offs show a similar dependence on the linguistic properties of different languages can help us generalize our framework to new languages without the need for explicit data collection.…”
Section: Discussion
confidence: 99%
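The setup these citing papers build on reduces to a small regression problem: each language becomes one feature vector (its typological properties and its representation in the pre-training corpus), and the model's observed task score is the regression target. The sketch below illustrates that framing under assumptions: the feature names, numbers, and the scikit-learn regressor are all placeholders for illustration, not the cited papers' pipeline.

```python
# Minimal sketch of performance prediction as regression.
# All feature names and values below are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# One row per language with evaluation data:
# [log pre-training tokens, syntactic sim., phonological sim.]
X_train = np.array([
    [18.2, 0.91, 0.87],   # e.g. German
    [15.6, 0.74, 0.69],   # e.g. Hindi
    [13.1, 0.55, 0.62],   # e.g. Swahili
])
y_train = np.array([0.78, 0.66, 0.54])  # observed zero-shot task scores

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Estimate the score for a language that has no evaluation set.
x_new = np.array([[14.3, 0.61, 0.70]])
print(model.predict(x_new))
```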
“…We consider two different regression models to estimate the performance in our experiments. i) XGBoost: We use the popular Tree Boosting algorithm XGBoost for solving the regression problem, which has been previously shown to achieve impressive results on the task (Xia et al, 2020; Srinivasan et al, 2021).…”
Section: Performance Predictors
confidence: 99%
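A hedged sketch of such an XGBoost predictor follows; the placeholder data, hyperparameters, and the leave-one-language-out evaluation loop are illustrative assumptions, not the cited papers' configuration.

```python
# Sketch: XGBoost regression for performance prediction,
# evaluated with leave-one-language-out cross-validation.
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((20, 5))   # placeholder: 20 languages, 5 features each
y = rng.random(20)        # placeholder: observed task scores

preds = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1)
    model.fit(X[train_idx], y[train_idx])
    preds.append(model.predict(X[test_idx])[0])

print("MAE of the predictor:", mean_absolute_error(y, preds))
```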
“…Xia et al (2020) showed that it is possible to build regression models that can accurately predict evaluation scores of NLP models under different experimental settings using various linguistic and dataset-specific features. Srinivasan et al (2021) …”
[Figure (c): Number of multilingual tasks containing test data for each of the 106 languages supported by the MMLMs (mBERT, XLMR); bars shaded according to the class taxonomy proposed by Joshi et al (2020).]
Section: Introduction
confidence: 99%
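As a rough illustration of what "linguistic and dataset-specific features" might look like as regressor inputs, the following sketch assembles a hypothetical feature table; every column name and value is invented for the example.

```python
# Hypothetical design matrix for a score-prediction regressor.
import pandas as pd

features = pd.DataFrame({
    "target_lang":         ["deu", "hin", "swa"],
    "log_pretrain_tokens": [18.2, 15.6, 13.1],    # corpus representation
    "syntactic_distance":  [0.39, 0.55, 0.67],    # distance to pivot language
    "train_set_size":      [10000, 10000, 10000], # dataset-specific feature
})
targets = pd.Series([0.78, 0.66, 0.54], name="task_score")
print(features.join(targets))
```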
“…Lauscher et al (2020) recently showed that it is possible to predict the zero-shot performance of mBERT and XLM-R on different languages by formulating it as a regression problem, with pretraining data size and typological similarities between the pivot and target languages as the input features, and the performance on the downstream task as the prediction target. Along similar lines, Srinivasan et al (2021) and Dolicki and Spanakis (2021) explore zero-shot performance prediction with a larger set of features and different regression techniques.…”
Section: Introduction
confidence: 99%
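Typological similarities of the kind referenced here are commonly derived from URIEL vectors via the lang2vec package; the sketch below assumes that route, and both the "syntax_knn" feature set and the use of cosine similarity are illustrative choices rather than these papers' exact recipe.

```python
# Sketch: pivot-target typological similarity from lang2vec
# (pip install lang2vec); languages are given as ISO 639-3 codes.
import numpy as np
import lang2vec.lang2vec as l2v

pivot, target = "eng", "hin"
feats = l2v.get_features([pivot, target], "syntax_knn")  # knn-imputed vectors

a = np.array(feats[pivot], dtype=float)
b = np.array(feats[target], dtype=float)
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"syntactic similarity({pivot}, {target}) = {cos_sim:.3f}")
```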