Erin Gustafson scite author profile

Second Language Acquisition Modeling

We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed at studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.

A Machine Learning Algorithm for Identifying Atopic Dermatitis in Adults from Electronic Health Records

Gustafson, Pacheco, Wehbe et al. 2017

The current work aims to identify patients with atopic dermatitis for inclusion in genome-wide association studies (GWAS). Here we describe a machine learning-based phenotype algorithm. Using the electronic health record (EHR), we combined coded information with information extracted from encounter notes as features in a lasso logistic regression. Our algorithm achieves high positive predictive value (PPV) and sensitivity, improving on previous algorithms with low sensitivity. These results demonstrate the utility of natural language processing (NLP) and machine learning for EHR-based phenotyping.
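The two metrics named in this abstract can be made concrete with a short sketch. It only shows how positive predictive value and sensitivity are computed from binary labels; the function name and the toy labels are assumptions for illustration and are not taken from the paper or its data.

```python
def ppv_and_sensitivity(y_true, y_pred):
    """Return (PPV, sensitivity) for binary labels, where 1 marks a case.

    PPV         = TP / (TP + FP): share of flagged patients who are true cases.
    Sensitivity = TP / (TP + FN): share of true cases that were flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


# Toy labels (hypothetical, not from the study): 3 true cases, 3 non-cases.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
ppv, sensitivity = ppv_and_sensitivity(y_true, y_pred)
```

An algorithm that flags very few patients can reach a high PPV while missing most true cases, which is why the abstract reports both values.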

Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production

et al. 2016

Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When “pig” is mispronounced as “big,” the resulting /b/ sound differs from correct productions of “big,” moving towards intended “pig”—revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing.

The influence of lexical selection disruptions on articulation.

Goldrick, McClain, Cibelli et al. 2019

Journal of Experimental Psychology: Learning, Memory, and Cognition

Interactive models of language production predict that it should be possible to observe long-distance interactions: effects that arise at one level of processing influence multiple subsequent stages of representation and processing. We examine the hypothesis that disruptions arising in nonform-based levels of planning (specifically, lexical selection) should modulate articulatory processing. A novel automatic phonetic analysis method was used to examine productions in a paradigm yielding both general disruptions to formulation processes and, more specifically, overt errors during lexical selection. This analysis method allowed us to examine articulatory disruptions at multiple levels of analysis, from whole words to individual segments. Baseline performance by young adults was contrasted with young speakers' performance under time pressure (which previous work has argued increases interaction between planning and articulation) and performance by older adults (who may have difficulties inhibiting nontarget representations, leading to heightened interactive effects). The results revealed the presence of interactive effects. Our new analysis techniques revealed these effects were strongest in initial portions of responses, suggesting that speech is initiated as soon as the first segment has been planned. Interactive effects did not increase under response pressure, suggesting interaction between planning and articulation is relatively fixed. Unexpectedly, lexical selection disruptions appeared to yield some degree of facilitation in articulatory processing (possibly reflecting semantic facilitation of target retrieval) and older adults showed weaker, not stronger, interactive effects (possibly reflecting weakened connections between lexical and form-level representations).

Adherence to US Preventive Services Task Force recommendations for breast and cervical cancer screening for women who have a spinal cord injury

Mann, Hardin et al. 2016

The Journal of Spinal Cord Medicine

Automatic measurement of vowel duration via structured prediction

Adi, Keshet, Cibelli et al. 2016

A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually annotated data were used to train a model that takes as input an arbitrary-length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants and outputs the duration of the vowel. The model is based on the structured prediction framework. The input signal and a hypothesized set of a vowel's onset and offset are mapped to an abstract vector space by a set of acoustic feature functions. The learning algorithm is trained in this space to minimize the difference in expectations between predicted and manually measured vowel durations. The trained model can then automatically estimate vowel durations without phonetic or orthographic transcription. Results comparing the model to three sets of manually annotated data suggest it outperformed the current gold standard for duration measurement, a hidden Markov model-based forced aligner (which requires orthographic or phonetic transcription as an input).
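The inference step of a structured predictor of this kind can be sketched as a search over candidate (onset, offset) pairs, scoring each with a weighted sum of feature functions and keeping the best. Everything below is an assumption for illustration: the three features, the hand-set weights, the per-frame energy input, and the 10 ms frame length are hypothetical and are not the paper's acoustic features or its trained model.

```python
def features(energy, onset, offset):
    """Hypothetical feature map phi(x, onset, offset) over per-frame energy."""
    inside = energy[onset:offset]
    outside = energy[:onset] + energy[offset:]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return [
        mean_in - mean_out,                    # vowel louder than its context
        energy[onset] - energy[onset - 1],     # energy rise at the onset
        energy[offset - 1] - energy[offset],   # energy fall at the offset
    ]


def predict_boundaries(energy, weights):
    """Return the (onset, offset) pair maximizing the dot product w . phi."""
    n = len(energy)
    # The vowel is flanked by consonants, so keep one frame on each side.
    candidates = [
        (onset, offset)
        for onset in range(1, n - 1)
        for offset in range(onset + 1, n)
    ]

    def score(pair):
        phi = features(energy, pair[0], pair[1])
        return sum(w * f for w, f in zip(weights, phi))

    return max(candidates, key=score)


# Toy signal (hypothetical): low-energy consonants around a high-energy vowel.
energy = [0.1, 0.1, 0.9, 1.0, 0.9, 0.1, 0.1]
weights = [1.0, 1.0, 1.0]        # hand-set here; a real model would learn them
onset, offset = predict_boundaries(energy, weights)
frame_ms = 10                    # assumed frame length
duration_ms = (offset - onset) * frame_ms
```

In the approach the abstract describes, the weights would be learned from manually annotated durations rather than set by hand, and exhaustive search over all pairs is used here only because the toy signal is short.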

Evaluating the reading and listening outcomes of beginning‐level Duolingo courses

Jiang, Rollinson, Plonsky et al. 2021

Foreign Language Annals

Phonetic processing of non-native speech in semantic vs non-semantic tasks

Gustafson, Engstler, Goldrick 2013

Research with speakers with acquired production difficulties has suggested phonetic processing is more difficult in tasks that require semantic processing. The current research examined whether similar effects are found in bilingual phonetic processing. English-French bilinguals' productions in picture naming (which requires semantic processing) were compared to those elicited by repetition (which does not require semantic processing). Picture naming elicited slower, more accented speech than repetition. These results provide additional support for theories integrating cognitive and phonetic processes in speech production and suggest that bilingual speech research must take cognitive factors into account when assessing the structure of non-native sound systems.
