Masato Hagiwara scite author profile

We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.

show abstract

Machine Learning–Driven Language Assessment

Settles¹,

LaFlair²,

Hagiwara³

2020

Transactions of the Association for Computational Linguistics

View full text Add to dashboard Cite

We describe a method for rapidly creating language proficiency assessments, and provide experimental evidence that such tests can be valid, reliable, and secure. Our approach is the first to use machine learning and natural language processing to induce proficiency scales based on a given standard, and then use linguistic models to estimate item difficulty directly for computer-adaptive testing. This alleviates the need for expensive pilot testing with human subjects. We used these methods to develop an online proficiency exam called the Duolingo English Test, and demonstrate that its scores align significantly with other high-stakes English assessments. Furthermore, our approach produces test scores that are highly reliable, while generating item banks large enough to satisfy security requirements.

show abstract

Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora

Zirikly

Hagiwara

2015

View full text Add to dashboard Cite

We propose an approach to cross-lingual named entity recognition model transfer without the use of parallel corpora. In addition to global de-lexicalized features, we introduce multilingual gazetteers that are generated using graph propagation, and cross-lingual word representation mappings without the use of parallel data. We target the e-commerce domain, which is challenging due to its unstructured and noisy nature. The experiments have shown that our approaches beat the strong MT baseline, where the English model is transferred to two languages: Spanish and Chinese.

show abstract

A supervised learning approach to automatic synonym identification based on distributional features

Hagiwara

2008

View full text Add to dashboard Cite

Distributional similarity has been widely used to capture the semantic relatedness of words in many NLP tasks. However, various parameters such as similarity measures must be handtuned to make it work effectively. Instead, we propose a novel approach to synonym identification based on supervised learning and distributional features, which correspond to the commonality of individual context types shared by word pairs. Considering the integration with pattern-based features, we have built and compared five synonym classifiers. The evaluation experiment has shown a dramatic performance increase of over 120% on the F-1 measure basis, compared to the conventional similarity-based classification. On the other hand, the pattern-based features have appeared almost redundant.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Masato Hagiwara

Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system

Second Language Acquisition Modeling

Machine Learning–Driven Language Assessment

Cross-lingual Transfer of Named Entity Recognizers without Parallel Corpora

A supervised learning approach to automatic synonym identification based on distributional features

Contact Info

Product

Resources

About