“…Although previous studies have consistently shown that fine-tuning L1 pre-trained models with L2 data improves the accuracy of both tokenization and POS tagging for L2 data (Berzak et al., 2016; Kyle et al., 2022; Sung and Shin, 2023), two key questions remain unresolved with respect to developing L2 domain-specific models. First, it is unclear how such models perform in zero-shot scenarios with unseen L2 data (i.e., L2 test sets not sourced from the same origin as the L2 training data), which is a crucial factor for enhancing a model's reliability and robustness (Choi and Palmer, 2012).…”