Proceedings of the 25th Conference on Computational Natural Language Learning 2021
DOI: 10.18653/v1/2021.conll-1.5

On Language Models for Creoles

Abstract: Creole languages such as Nigerian Pidgin English and Haitian Creole are under-resourced and largely ignored in the NLP literature. Creoles typically result from the fusion of a foreign language with multiple local languages, and the transfer of grammatical and lexical features to the creole is a complex process (Sessarego, 2020). While creoles are generally stable, some features may be considerably more prominent among certain demographics or in certain linguistic situations (Winford, 1999; Patrick, 1999).…

Cited by 5 publications (8 citation statements). References 29 publications (34 reference statements).
“…We additionally contribute to the body of work seeking to characterize and adapt neural model performance on rare or novel examples and classes (Horn & Perona, 2017; Bengio, 2015). In the context of language modeling, Lent et al (2021) explored performance on under-resourced languages, whereas Oren et al (2019) did so on under-represented domains in training corpora. McCoy et al (2021) introduced analyses to assess sequential and syntactic novelty in LMs.…”
Section: Related Work (mentioning, confidence: 99%)
“…Meister & Cotterell (2021), for example, investigated the statistical tendencies of the distribution defined by neural LMs, whereas Kulikov et al (2021) explored whether they adequately capture the modes of the distribution they attempt to model. At the same time, increased focus has been given to performance on rare or novel events in the data distribution, both for models of natural language (McCoy et al, 2021; Lent et al, 2021; Dudy & Bedrick, 2020; Oren et al, 2019) and for neural models more generally (see, for example, Sagawa et al, 2020; D'souza et al, 2021; Blevins & Zettlemoyer, 2020; Czarnowska et al, 2019; Horn & Perona, 2017; Ouyang et al, 2016; Bengio, 2015; Zhu et al, 2014). Neither of these branches of work, however, has explored instance-level LM performance on rare sequences in the distribution.…”
Section: Introduction (mentioning, confidence: 99%)
“…Hagemeijer et al (2014) presents an extensive overview of Creole data resources through 2014 for a wide variety of Creoles, many of which are more traditional corpora (e.g., transcriptions of conversations made by linguists with formal training, or scans of documents originally written in the Creole language), though these may lack the annotations relevant to common NLP tasks. Lent et al (2021) also provides a thorough overview of existing NLP datasets for Haitian Kreyol, Singaporean Colloquial English (Singlish), and Nigerian Pidgin English. In this work, we set about the task of manually verifying each dataset presented by Hagemeijer et al (2014) and Lent et al (2021), as well as searching for additional resources.…”
Section: Creole Data and Creole NLP (mentioning, confidence: 99%)
“…Lent et al (2021) also provides a thorough overview of existing NLP datasets for Haitian Kreyol, Singaporean Colloquial English (Singlish), and Nigerian Pidgin English. In this work, we set about the task of manually verifying each dataset presented by Hagemeijer et al (2014) and Lent et al (2021), as well as searching for additional resources. We present all "verified" datasets in Table 1.…”
Section: Creole Data and Creole NLP (mentioning, confidence: 99%)