Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.814
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

Abstract: Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align), a technique to automatically mine cross-lingual entity lexica from mined web data. We demonstrate LSP-Align outpe…
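The abstract describes LSP-Align as combining lexical, semantic, and phonetic signals to align entity mentions across languages. As a rough illustration only, the sketch below scores a candidate source/target entity pair by mixing three such signals; the helper functions, the consonant-skeleton phonetic proxy, the dummy embeddings, and the weights are all assumptions made for this example and are not the authors' LSP-Align implementation.

```python
# Illustrative sketch (not the paper's method): combine lexical, semantic,
# and phonetic similarity into one alignment score for an entity pair.
from difflib import SequenceMatcher
import math

def lexical_sim(src: str, tgt: str) -> float:
    """Surface-string similarity (character overlap) in [0, 1]."""
    return SequenceMatcher(None, src.lower(), tgt.lower()).ratio()

def phonetic_sim(src: str, tgt: str) -> float:
    """Crude phonetic proxy: compare consonant skeletons of both strings."""
    def skeleton(s: str) -> str:
        return "".join(c for c in s.lower() if c.isalpha() and c not in "aeiou")
    return SequenceMatcher(None, skeleton(src), skeleton(tgt)).ratio()

def semantic_sim(src_vec, tgt_vec) -> float:
    """Cosine similarity between pre-computed cross-lingual embeddings."""
    dot = sum(a * b for a, b in zip(src_vec, tgt_vec))
    norm = math.sqrt(sum(a * a for a in src_vec)) * math.sqrt(sum(b * b for b in tgt_vec))
    return dot / norm if norm else 0.0

def lsp_score(src, tgt, src_vec, tgt_vec,
              w_lex=0.3, w_sem=0.4, w_phon=0.3) -> float:
    """Weighted mix of the three signals (weights are illustrative, not tuned)."""
    return (w_lex * lexical_sim(src, tgt)
            + w_sem * semantic_sim(src_vec, tgt_vec)
            + w_phon * phonetic_sim(src, tgt))

# Toy usage: English "London" vs. Spanish "Londres", with dummy embeddings.
print(round(lsp_score("London", "Londres", [0.2, 0.9, 0.1], [0.25, 0.85, 0.15]), 3))
```

In practice the semantic component would come from cross-lingual word or entity embeddings and the phonetic component from a proper transliteration or phoneme model; the stand-ins above only show how the three signals could be weighted into a single score.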

Cited by 5 publications (1 citation statement)
References 22 publications
“…For each corpus, a native speaker of each included language variety was asked to label a random sample of 50 texts (or parallel texts, in CCAligned and WikiMatrix) according to the labeling scheme and guidelines presented in Kreutzer et al. (2022). (Footnote 14: https://iso639-3.sil.org/request/2008-040. Footnote 15: We do not include XLEnt (El-Kishky et al., 2021) since it comprises cross-lingual named entities rather than texts.)…”
Section: Little Attention To Representativeness (citation type: mentioning)
confidence: 99%