Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages 2017
DOI: 10.18653/v1/w17-0122

Cross-language forced alignment to assist community-based linguistics for low resource languages

Abstract: In community-based linguistics, community members become involved in the analysis of their own language. This insider perspective can radically increase the speed and accuracy of phonological analysis, e.g. providing rapid identification of phonemic contrasts. However, due to the nature of these community-based sessions, much of the phonetic data is left undocumented. Rather than going back to traditional fieldwork, this paper argues that corpus phonetics can be applied to recordings of the community-based ana…

Cited by 3 publications
(2 citation statements)
References 6 publications
“…An example of this would be using English ARPABET to transcribe words from Mixtec, and then treating the Mixtec words as new English words, which would be handled by a pre‐existing English acoustic model (DiCanio et al., 2013). This technique has been used to align data from languages like Swedish (Young & McGarrah, 2021); Tongan (Johnson et al., 2018), Chatino (Ćavar et al., 2016) and Triqui (Hatcher & DiCanio, 2019) from Mexico, Bribri (Coto‐Solano & Flores‐Solórzano, 2017), Malecu and Cabécar from Costa Rica (Coto‐Solano & Flores‐Solórzano, 2016); Nikyob from Nigeria (Kempton, 2017), Matukar Panau from Papua New Guinea (Barth et al., 2020); Yidiny from Australia (Babinsky et al., 2019); and North Australian Kriol (Jones et al., 2017).…”
Section: Extracting Linguistic Data From Aligned Transcriptions
Citation type: mentioning
confidence: 99%
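
The cross-language approach described in the statement above (transcribing words of the target language with English ARPABET symbols so that an existing English acoustic model can treat them as new English words) can be sketched as a small dictionary-building step. This is a minimal illustration under stated assumptions, not the method of any cited paper: the phoneme table `IPA_TO_ARPABET`, the helpers `to_arpabet` and `dictionary_entry`, and the example word are all hypothetical, and the stress digits that CMUdict attaches to vowels are omitted for brevity.

```python
# Illustrative IPA-to-ARPABET table (an assumption, not taken from any cited paper).
# Each target-language phoneme is mapped to the closest English ARPABET symbol.
IPA_TO_ARPABET = {
    "a": "AA", "e": "EH", "i": "IY", "o": "OW", "u": "UW",
    "k": "K", "t": "T", "n": "N", "s": "S", "m": "M",
    "ʃ": "SH",
    "x": "HH",  # English has no velar fricative; HH is a rough stand-in
}


def to_arpabet(phonemes):
    """Map a list of IPA phonemes to their closest ARPABET symbols."""
    missing = [p for p in phonemes if p not in IPA_TO_ARPABET]
    if missing:
        raise KeyError(f"no ARPABET mapping for: {missing}")
    return [IPA_TO_ARPABET[p] for p in phonemes]


def dictionary_entry(orthography, phonemes):
    """Format one CMUdict-style line: WORD, two spaces, phone sequence."""
    return f"{orthography.upper()}  {' '.join(to_arpabet(phonemes))}"


# Hypothetical word, invented purely for illustration.
print(dictionary_entry("kuxi", ["k", "u", "x", "i"]))
```

Entries produced this way would be appended to the pronunciation dictionary used by an English forced aligner. How well the substituted symbols fit the target-language sounds depends on the language pair, so the mapping table is the part that would need the most care in practice.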
“…Searching for the term "community" in the ACL Anthology [footnote 2: accessed on April 30th, 2021] returns 100 papers. However, by manually inspecting each of them, we discovered that only 9 present some sort of engagement with a community of speakers (Garcia et al, 2008; Levin, 2009; Bird et al, 2014; Everson et al, 2019; Kempton, 2017; Susarla and Challa, 2019; Conforti et al, 2020; Griscom, 2020; Le Ferrand et al, 2020). These works target endangered languages and propose technological solutions to an array of problems (e.g., archiving, documenting, or tooling).…”
Section: Data and Communities Are Not Separate Things
Citation type: mentioning
confidence: 99%