This report presents the results of a research and development effort for SpeechRater℠ Version 1.0 (v1.0), an automated scoring system for the spontaneous speech of English language learners used operationally in the Test of English as a Foreign Language™ (TOEFL®) Practice Online assessment (TPO), which is used by prospective test takers to prepare for the official TOEFL iBT test. The report includes a summary of the validity considerations and analyses that drive both the development and the evaluation of the quality of automated scoring. These considerations include perspectives on the construct of interest, the context of use, and the empirical performance of SpeechRater in relation to both the human scores and the intended use of the scores. The outcomes of this work have implications for short- and long-term goals for iterative improvements to SpeechRater scoring. This study reports the development and validation of the system for low-stakes practice purposes. The process we followed to build this system represented a principled approach to maximizing two essential qualities: substantive meaningfulness and technical soundness. In developing and evaluating the features and the scoring models used to predict human-assigned scores, we actively engaged both content and technical experts to ensure the construct representation and technical soundness of the system. We primarily compared two alternative methodologies for building scoring models, multiple regression and classification trees, in terms of their construct representation and empirical performance in predicting human scores. Based on the evaluation results, we concluded that a multiple regression model with feature weights determined by content experts was superior to the other competing models evaluated. We then used an argument-based approach to integrate and evaluate the existing evidence to support the use of SpeechRater v1.0 in a low-stakes practice environment.
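The comparison of scoring models described above can be sketched in simplified form. The snippet below is a minimal illustration, not the actual SpeechRater pipeline: it uses simulated data, a hypothetical three-feature set (fluency, pronunciation, vocabulary), and made-up expert weights, and it contrasts a multiple regression model whose weights are fit by least squares with a model whose weights are fixed a priori by content experts, judging each by its correlation with the human scores it tries to predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for SpeechRater-style features (the names are
# illustrative, not the actual v1.0 feature set): fluency,
# pronunciation, vocabulary.
n = 500
features = rng.normal(size=(n, 3))
true_w = np.array([0.5, 0.3, 0.2])          # unknown "true" relationship
human = features @ true_w + rng.normal(scale=0.5, size=n)

# Model 1: weights estimated by multiple regression (least squares),
# with an intercept column appended to the feature matrix.
X = np.column_stack([features, np.ones(n)])
w_ols, *_ = np.linalg.lstsq(X, human, rcond=None)

# Model 2: weights fixed in advance by content experts
# (hypothetical values chosen for construct coverage, not fit).
w_expert = np.array([0.4, 0.4, 0.2])

pred_ols = X @ w_ols
pred_expert = features @ w_expert

def corr(a, b):
    """Pearson correlation between two score vectors."""
    return np.corrcoef(a, b)[0, 1]

print(f"regression-weight model r with human scores: {corr(pred_ols, human):.3f}")
print(f"expert-weight model r with human scores:     {corr(pred_expert, human):.3f}")
```

In this toy setting the expert-weight model gives up only a little predictive accuracy relative to the fitted model, which mirrors the report's rationale: when features are correlated, expert weights can be chosen for construct representation at a small empirical cost.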
The argument-based approach provided a mechanism for us to articulate the strengths and weaknesses in the validity argument for using SpeechRater v1.0 and to put forward a transparent argument for its use in a low-stakes practice environment. In particular, the construct representation of the multiple regression model with expert weights was sufficiently broad to justify its use in a low-stakes application. While some higher-order aspects of the speaking construct (such as content and organization) are missing, more basic aspects of the construct (such as pronunciation and fluency) are richly represented. In addition, these different parts of the speaking construct tend to be highly correlated, so that the absence of higher-order factors is not as detrimental to the model's agreement with human raters as it otherwise might be. The model's agreement with human raters was not sufficiently high to support high-stakes decisions but was still suitable for use in low-stakes applications. The correlation of the 6-item aggregate score with human raters was .57 and was deemed acceptable given the lo...
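The role of aggregation in the .57 figure can be illustrated with a small simulation. The snippet below uses entirely simulated machine and human item scores (not TPO data, and the noise levels are arbitrary assumptions) to show why a correlation computed on a 6-item aggregate is typically higher than the item-level correlations: summing over items averages out item-specific noise while preserving the shared ability signal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated per-item machine and human scores for 200 test takers on
# 6 speaking items. Both score sources share a latent "ability" signal
# plus independent item-level noise (all values are made up).
n_takers, n_items = 200, 6
ability = rng.normal(size=(n_takers, 1))
human = ability + rng.normal(scale=1.0, size=(n_takers, n_items))
machine = ability + rng.normal(scale=1.0, size=(n_takers, n_items))

# Mean machine-human correlation at the single-item level.
item_r = np.mean([np.corrcoef(machine[:, j], human[:, j])[0, 1]
                  for j in range(n_items)])

# Correlation of the 6-item aggregate scores: noise partially cancels,
# so the aggregate correlation exceeds the item-level average.
agg_r = np.corrcoef(machine.sum(axis=1), human.sum(axis=1))[0, 1]

print(f"mean item-level r:  {item_r:.2f}")
print(f"6-item aggregate r: {agg_r:.2f}")
```

The same mechanism means an aggregate-level correlation such as .57 should be interpreted at the level at which it was computed, not read as the per-response agreement of the scoring model.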