Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.539
Local Languages, Third Spaces, and other High-Resource Scenarios

Abstract: How can language technology address the diverse situations of the world's languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world's language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of 'under-resourced languages' even though they have distinct functions and prospects.…

Cited by 12 publications (16 citation statements). References 29 publications.
“…To our knowledge, the term language modelling bias has not been used so far in any way similar to ours. Many of the underlying exploitative mechanisms have, however, been pointed out, in particular in relation to the most disempowered social groups, namely small indigenous speaker communities (Bird, 2022; Schwartz, 2022). In terms of actual bias in AI systems and data, the research closest to ours concerns inductive bias in language models towards certain morphological and syntactic structures (Ravfogel et al., 2019; White and Cotterell, 2021).…”
Section: Linguistic Diversity and Language Modelling Bias
confidence: 94%
“…He observes the importance of vehicular or trade languages in addressing local vernaculars: beyond Spanish, French, or English, languages such as Arabic, Persian, Hindi, Urdu, Amharic, Hausa, and Swahili are also widely used trade languages. In Bird (2022), a multipolar model is proposed for working with language communities, where trade languages function as bridges or pivots across local languages and vernaculars. In a similar spirit, Masakhane adopts a research methodology it calls participatory, which ensures that human agents are from local communities or, where this is not entirely possible, that knowledge transfer at least takes place (Nekoto et al., 2020).…”
Section: Methodology as a Source of Bias
confidence: 99%
“…Indeed, the findings of the European Language Equality project (https://europeanlanguage-equality.eu/, accessed on 14 December 2022) over the past two years demonstrate a very sorry state of affairs: despite the obvious improvements in language technology since the implementation of methods based on neural networks, language barriers still hamper cross-lingual communication and the free flow of knowledge across borders, and many languages are endangered or on the edge of extinction [2,3]. On a global scale, the situation is far worse, of course, especially for languages that do not have a written tradition [4].…”
Section: Introduction
confidence: 99%
“…The definition of low-resource actually differs greatly between works. One definition, by Bird (2022), advocates reserving the term for (would-be) standardized languages with a large number of speakers and a written tradition, but a lack of resources for language technologies. Another way is a task-dependent definition: for dependency parsing, Müller-Eberstein et al. (2021) define low-resource as providing fewer than 5000 annotated sentences in the Universal Dependencies treebank. Hedderich et al. (2021)…”
confidence: 99%
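
As an illustration of the task-dependent threshold quoted in the last statement, the following is a minimal sketch, not taken from any of the cited papers, that counts annotated sentences in a CoNLL-U treebank file and compares the count against a 5000-sentence cut-off. The file name and helper function are hypothetical; only the CoNLL-U convention of blank-line-separated sentences and "#"-prefixed comment lines is assumed.

# Minimal illustrative sketch (assumption: plain CoNLL-U input; file name is hypothetical).
# Counts annotated sentences in a Universal Dependencies treebank and applies a
# task-dependent "low-resource" cut-off such as the 5000-sentence threshold quoted above.

def count_conllu_sentences(path: str) -> int:
    """Sentences in CoNLL-U are blocks of token lines separated by blank lines."""
    count, in_sentence = 0, False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():            # blank line ends a sentence block
                count += in_sentence
                in_sentence = False
            elif not line.startswith("#"):  # skip comment/metadata lines
                in_sentence = True
    return count + in_sentence              # handle a file with no trailing blank line

if __name__ == "__main__":
    THRESHOLD = 5000  # cut-off discussed for dependency parsing
    n = count_conllu_sentences("treebank-train.conllu")  # hypothetical path
    status = "low-resource" if n < THRESHOLD else "not low-resource"
    print(f"{n} annotated sentences -> {status} under this definition")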