This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, which variables are adequate appears to depend on the specific type of speech material investigated and the specific task performed by the speaker.
To determine whether expert fluency ratings of read speech can be predicted on the basis of automatically calculated temporal measures of speech quality, an experiment was conducted with read speech of 20 native and 60 non-native speakers of Dutch. The speech material was scored for fluency by nine experts and was then analyzed by means of an automatic speech recognizer in terms of quantitative measures such as speech rate, articulation rate, number and length of pauses, number of dysfluencies, mean length of runs, and phonation/time ratio. The results show that expert ratings of fluency in read speech are reliable (Cronbach's α varies between 0.90 and 0.96) and that these ratings can be predicted on the basis of quantitative measures: for six automatic measures the magnitude of the correlations with the fluency scores varies between 0.81 and 0.93. Rate of speech appears to be the best predictor: correlations vary between 0.90 and 0.93. Two other important determinants of reading fluency are the rate at which speakers articulate the sounds and the number of pauses they make. Apparently, rate of speech is such a good predictor of perceived fluency because it incorporates these two aspects.
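The temporal measures named in this abstract can be computed directly from a recognizer's segmentation of the signal into speech and pause intervals. The sketch below illustrates this under assumed inputs: a hypothetical list of (kind, seconds) segments plus a syllable count for the fragment; the variable names and input format are illustrative, not those of the paper's system.

```python
# Illustrative computation of common temporal fluency measures from a
# hypothetical ASR segmentation. Input: a list of ("speech" | "pause",
# duration-in-seconds) segments and the number of syllables produced.

def temporal_measures(segments, n_syllables):
    total_time = sum(dur for _, dur in segments)
    phonation_time = sum(dur for kind, dur in segments if kind == "speech")
    pauses = [dur for kind, dur in segments if kind == "pause"]
    runs = [dur for kind, dur in segments if kind == "speech"]
    return {
        # rate of speech: syllables per second of total time, pauses included
        "speech_rate": n_syllables / total_time,
        # articulation rate: syllables per second of speaking time only
        "articulation_rate": n_syllables / phonation_time,
        "n_pauses": len(pauses),
        "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
        # mean length of runs: average duration of uninterrupted speech
        "mean_length_of_runs": sum(runs) / len(runs),
        # phonation/time ratio: fraction of total time spent speaking
        "phonation_time_ratio": phonation_time / total_time,
    }

# Example: two speech runs separated by pauses, 14 syllables in total.
segments = [("speech", 2.0), ("pause", 0.5), ("speech", 1.5), ("pause", 1.0)]
m = temporal_measures(segments, n_syllables=14)
```

Note how speech rate folds both articulation rate and pausing into a single number, which is consistent with the abstract's observation that rate of speech is the strongest single predictor of perceived fluency.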
In this paper, we examine the relationship between pedagogy and technology in Computer Assisted Pronunciation Training (CAPT) courseware. First, we will analyse the available literature on second language pronunciation teaching and learning in order to derive some general guidelines for effective training. Second, we will present an appraisal of various CAPT systems with a view to establishing whether they meet pedagogical requirements. In this respect, we will show that many commercial systems tend to privilege technological novelty at the expense of pedagogical criteria that could benefit the learner more. While examining the limitations of today's technology, we will consider possible ways to deal with these shortcomings. Finally, we will combine the information thus gathered to suggest some recommendations for future CAPT.
Although the success of automatic speech recognition (ASR)-based Computer Assisted Pronunciation Training (CAPT) systems is increasing, little is known about the pedagogical effectiveness of these systems. This is particularly regrettable because ASR technology still suffers from limitations that may result in the provision of erroneous feedback, possibly leading to learning breakdowns. To study the effectiveness of ASR-based feedback for improving pronunciation, we developed and tested a CAPT system providing automatic feedback on Dutch phonemes that are problematic for adult learners of Dutch. Thirty immigrants studying Dutch were assigned to one of three groups: one used the ASR-based CAPT system with automatic feedback, one a CAPT system without feedback, and one no CAPT system. Pronunciation quality was assessed for each participant before and after the training by human experts who evaluated overall segmental quality and the quality of the phonemes addressed in the training. The participants' impressions of the CAPT system used were also studied through anonymous questionnaires. The results on global segmental quality show that the group receiving ASR-based feedback made the largest mean improvement, but the groups' mean improvements did not differ significantly. For the segmental quality of the problematic phonemes targeted in the training, the group receiving ASR-based feedback showed a significantly larger improvement than the no-feedback group.
The current emphasis in second language teaching lies in the achievement of communicative effectiveness. In line with this approach, pronunciation training is nowadays geared towards helping learners avoid serious pronunciation errors, rather than eradicating the finest traces of foreign accent. However, to devise optimal pronunciation training programmes, systematic information on these pronunciation problems is needed, especially in the case of the development of Computer Assisted Pronunciation Training systems. The research reported on in this paper is aimed at obtaining systematic information on segmental pronunciation errors made by learners of Dutch with different mother tongues. In particular, we aimed at identifying errors that are frequent, perceptually salient, persistent, and potentially hampering to communication. To achieve this goal we conducted analyses on different corpora of speech produced by L2 learners under different conditions. This resulted in a robust inventory of pronunciation errors that can be used for designing efficient pronunciation training programmes.
One of the biggest challenges in designing computer assisted language learning (CALL) applications that provide automatic feedback on pronunciation errors consists in reliably detecting the pronunciation errors at such a detailed level that the information provided can be useful to learners. In our research we investigate pronunciation errors frequently made by foreigners learning Dutch as a second language. In the present paper we focus on the velar fricative /x/ and the velar plosive /k/. We compare four types of classifiers that can be used to detect erroneous pronunciations of these phones: two acoustic-phonetic classifiers (one of which employs linear discriminant analysis (LDA)), a classifier based on cepstral coefficients in combination with LDA, and one based on confidence measures (the so-called Goodness Of Pronunciation score). The best results were obtained for the two LDA classifiers, which produced accuracy levels of about 85-93%.
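The Goodness Of Pronunciation (GOP) score mentioned above is, in its classic Witt & Young form, a duration-normalized log-likelihood ratio between the target phone and the best-scoring competing phone over the frames aligned to that phone. The sketch below illustrates the idea under assumed inputs: per-frame phone log-likelihoods are taken as given (in practice they come from the recognizer's acoustic models), and the input format is an assumption for illustration, not the paper's implementation.

```python
# Minimal sketch of a GOP-style confidence measure: for each frame aligned
# to the target phone, compare the target's log-likelihood against the best
# competing phone, then normalize by the number of frames.

def gop_score(frames, target_phone):
    """frames: list of {phone_label: log_likelihood} dicts for the frames
    aligned to target_phone. Returns a score near 0 for a pronunciation
    close to the target model and strongly negative for a poor one, so a
    simple threshold on the score can flag likely mispronunciations."""
    ratio = sum(f[target_phone] - max(f.values()) for f in frames)
    return ratio / len(frames)

# Hypothetical two-frame example with two candidate phone labels.
frames = [
    {"x": -4.0, "k": -6.0},  # target "x" wins this frame -> contributes 0
    {"x": -3.5, "k": -3.0},  # "k" wins -> negative contribution
]
score = gop_score(frames, "x")
```

In a full error detector, the threshold on this score would be tuned per phone against annotated learner speech, which is one reason the paper compares it empirically against the LDA-based classifiers.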