“…Although previous studies have consistently shown that fine-tuning L1 pre-trained models with L2 data improves the accuracy of both tokenization and POS tagging for L2 data (Berzak et al., 2016; Kyle et al., 2022; Sung and Shin, 2023), two key questions remain unresolved with respect to developing L2 domain-specific models. First, it is unclear how such models perform in zero-shot scenarios with unseen L2 data (i.e., L2 test sets not sourced from the same origin as the L2 training data), which is a crucial factor for enhancing a model's reliability and robustness (Choi and Palmer, 2012).…”