Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.539
Local Languages, Third Spaces, and other High-Resource Scenarios

Abstract: How can language technology address the diverse situations of the world's languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world's language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of 'under-resourced languages' even though they have distinct functions and prospects.…

Cited by 12 publications (16 citation statements). References 29 publications.
“…To our knowledge, the term language modelling bias has not been used so far in any way similar to ours. Many of the underlying exploitative mechanisms have, however, been pointed out, in particular in relation to the most disempowered social groups, namely small indigenous speaker communities (Bird, 2022; Schwartz, 2022). In terms of actual bias in AI systems and data, the research closest to ours concerns inductive bias in language models towards certain morphological and syntactic structures (Ravfogel et al., 2019; White and Cotterell, 2021).…”
Section: Linguistic Diversity and Language Modelling Bias
confidence: 94%
“…He observes the importance of vehicular or trade languages in addressing local vernaculars: beyond Spanish, French, or English, languages such as Arabic, Persian, Hindi, Urdu, Amharic, Hausa, and Swahili are also widely used trade languages. In Bird (2022), a multipolar model is proposed for working with language communities, where trade languages function as bridges or pivots across local languages and vernaculars. In a similar spirit, Masakhane adopts a research methodology it calls participatory, which ensures that human agents are from local communities or, where this is not entirely possible, that knowledge transfer at least takes place (Nekoto et al., 2020).…”
Section: Methodology as a Source of Bias
confidence: 99%
“…Indeed, the findings of the European Language Equality project (https://europeanlanguage-equality.eu/, accessed on 14 December 2022) over the past two years demonstrate a very sorry state of affairs: despite the obvious improvements in language technology since the implementation of methods based on neural networks, language barriers still hamper cross-lingual communication and the free flow of knowledge across borders, and many languages are endangered or on the edge of extinction [2,3]. On a global scale, the situation is far worse, of course, especially for languages that do not have a written tradition [4].…”
Section: Introduction
confidence: 99%
“…The definition of low-resource actually differs greatly between works. One definition, by Bird (2022), advocates reserving the term for (would-be) standardized languages with a large number of speakers and a written tradition, but a lack of resources for language technologies. Another way is a task-dependent definition: for dependency parsing, Müller-Eberstein et al. (2021) define low-resource as providing fewer than 5000 annotated sentences in the Universal Dependencies treebank. Hedderich et al. (2021)…”
confidence: 99%
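
As an illustration of the task-dependent threshold quoted in the last statement, the following is a minimal sketch, not taken from any of the cited papers, that counts annotated sentences in a CoNLL-U treebank file and compares the count against a 5000-sentence cut-off. The file name and helper function are hypothetical; only the CoNLL-U convention of blank-line-separated sentences and "#"-prefixed comment lines is assumed.

# Minimal illustrative sketch (assumption: plain CoNLL-U input; file name is hypothetical).
# Counts annotated sentences in a Universal Dependencies treebank and applies a
# task-dependent "low-resource" cut-off such as the 5000-sentence threshold quoted above.

def count_conllu_sentences(path: str) -> int:
    """Sentences in CoNLL-U are blocks of token lines separated by blank lines."""
    count, in_sentence = 0, False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():            # blank line ends a sentence block
                count += in_sentence
                in_sentence = False
            elif not line.startswith("#"):  # skip comment/metadata lines
                in_sentence = True
    return count + in_sentence              # handle a file with no trailing blank line

if __name__ == "__main__":
    THRESHOLD = 5000  # cut-off discussed for dependency parsing
    n = count_conllu_sentences("treebank-train.conllu")  # hypothetical path
    status = "low-resource" if n < THRESHOLD else "not low-resource"
    print(f"{n} annotated sentences -> {status} under this definition")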