This paper presents the results of the 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas. The shared task featured two independent tracks, and participants submitted machine translation systems for up to 10 indigenous languages. Overall, 8 teams participated, submitting a total of 214 systems. We provided training sets consisting of data collected from various sources, as well as manually translated sentences for the development and test sets. An official baseline trained on this data was also provided. Team submissions featured a variety of architectures, including both statistical and neural models, and for the majority of languages, many teams were able to improve considerably over the baseline. The best-performing systems scored 12.97 ChrF points higher than the baseline, averaged across languages.
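ChrF, the shared task's evaluation metric, is a character n-gram F-score. As a rough illustration only (not the official sacreBLEU implementation used for shared-task scoring), a minimal sentence-level sketch with the standard defaults (n-grams up to order 6, beta = 2) might look like:

```python
from collections import Counter

def char_ngrams(text, n):
    # Extract character n-grams; spaces are removed, as in standard chrF.
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF: character n-gram F_beta,
    averaged over n-gram orders 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # sentence too short for this n-gram order
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        # F_beta weights recall beta^2 times as much as precision.
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

The corpus-level metric used for ranking aggregates n-gram counts over all sentences rather than averaging sentence scores, so this sketch is illustrative only.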
This paper presents a neural machine translation model and dataset for the Chibchan language Bribri, with an average performance of BLEU 16.9±1.7. The model was trained on an extremely small dataset (5923 Bribri-Spanish pairs), providing evidence for the applicability of NMT in extremely low-resource environments. We discuss the challenges entailed in managing training input from languages without standard orthographies, we provide evidence of successful learning of Bribri grammar, and we also examine the translations of structures that are infrequent in major Indo-European languages, such as positional verbs, ergative markers, numerical classifiers and complex demonstrative systems. In addition, we perform an experiment augmenting the dataset through iterative back-translation (Sennrich et al., 2016a; Hoang et al., 2018), using Spanish sentences to create synthetic Bribri sentences. This improves the score by an average of 1.0 BLEU, but only when the new Spanish sentences belong to the same domain as the other Spanish examples. This contributes to the small but growing body of research on Chibchan NLP.
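The back-translation data augmentation described above can be sketched as follows. The word-lookup "model" and all names here (`TOY_ES_TO_BRI`, `back_translate`, `augment`) are illustrative stand-ins for a trained Spanish-to-Bribri reverse model, not the authors' actual pipeline:

```python
# One round of back-translation: a reverse-direction model turns
# monolingual Spanish sentences into synthetic Bribri sources, which are
# paired with the original Spanish targets and appended to the parallel
# training data for the forward (Bribri -> Spanish) model.

# Placeholder lexicon standing in for reverse-model inference.
TOY_ES_TO_BRI = {"yo": "ye'", "como": "chakè", "arroz": "arroz"}

def back_translate(spanish_sentence):
    # Stand-in for Spanish -> Bribri inference; unknown words pass through.
    return " ".join(TOY_ES_TO_BRI.get(w, w) for w in spanish_sentence.split())

def augment(parallel_pairs, monolingual_spanish):
    """parallel_pairs: list of (bribri, spanish) tuples.
    Returns the original pairs plus synthetic (bribri, spanish) pairs."""
    synthetic = [(back_translate(es), es) for es in monolingual_spanish]
    return parallel_pairs + synthetic

train = [("ye' chakè arroz", "yo como arroz")]
augmented = augment(train, ["yo como arroz", "como arroz"])
```

In the iterative variant, the forward and reverse models are retrained on the augmented data and the process repeats; the paper's finding is that this only helps when the monolingual Spanish comes from the same domain as the existing parallel data.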
Machine learning has revolutionised speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers. This paper describes the development of Elpis, a pipeline which language documentation workers with minimal computational experience can use to build their own speech recognition models, resulting in models being built for 16 languages from the Asia-Pacific region. Elpis puts machine learning speech technologies within reach of people working with languages with scarce data, in a scalable way. This is impactful since it enables language communities to cross the digital divide, and speeds up language documentation. Complete automation of the process is not feasible for languages with small quantities of data and potentially large vocabularies. Hence our goal is not full automation, but rather to make a practical and effective workflow that integrates machine learning technologies.
Forced alignment provides drastic time savings when aligning speech recordings and is particularly useful for the study of Indigenous languages, which are severely under-resourced in corpora and models. Here we compare two forced alignment systems, FAVE-align and EasyAlign, to determine which provides more precision when processing running speech in the Chibchan language Bribri. We aligned a segment of a story narrated in Bribri and compared the errors in locating word centers and phoneme edges against a manual correction. FAVE-align performed better: it showed a 7% error when locating word centers, compared to 24% for EasyAlign, and errors of 22~24 ms when locating phoneme edges, compared to 86~130 ms for EasyAlign. In addition, EasyAlign failed to detect 7% of phonemes while inserting 58 spurious phones into the transcription. Future research includes verifying these results for other genres and other Chibchan languages. Finally, these results provide additional evidence for the applicability of natural language processing methods to Chibchan languages and point to future work such as the construction of corpora and the training of automatic speech recognition systems.
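The error measures reported above reduce to simple comparisons between the automatic alignment and the manual correction. A minimal sketch, assuming both alignments provide the same boundaries in seconds (function names are illustrative, not from either toolkit):

```python
def boundary_error_ms(predicted, gold):
    """Mean absolute error in milliseconds between predicted and gold
    boundary times (one entry per phoneme edge, in seconds)."""
    if len(predicted) != len(gold):
        raise ValueError("alignments must cover the same phoneme edges")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold) * 1000

def word_center_error_pct(pred_start, pred_end, gold_start, gold_end):
    """Offset of the predicted word center from the gold word center,
    expressed as a percentage of the gold word duration."""
    pred_center = (pred_start + pred_end) / 2
    gold_center = (gold_start + gold_end) / 2
    return abs(pred_center - gold_center) / (gold_end - gold_start) * 100
```

For example, predicted phoneme edges at 0.10 s and 0.25 s against gold edges at 0.12 s and 0.24 s give a mean boundary error of 15 ms.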
Forced alignment provides drastic time savings when segmenting speech recordings. This is particularly useful for indigenous languages, which lack resources for their study in computational linguistics. This article presents a method for aligning recordings in Bribri, Cabécar, and Malecu using acoustic models trained for English and French. The FAVE-align and EasyAlign systems were used to produce Praat TextGrids, yielding errors of 2~3 milliseconds for word centers in Bribri and Malecu (8~13% of word duration) and 7 milliseconds for Cabécar (37% of word duration). Phonemes also showed adequate performance: for Bribri and Malecu, 40% of phonemes were aligned with an error of 1 millisecond or less, compared with 24% for Cabécar. Cabécar's lower performance may be due to its recording containing more ambient noise. These forced alignment systems can aid the automated study of the languages of Costa Rica by generating aligned corpora that can be used for phonetic studies and for training acoustic and speech recognition models.