i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition

Behravan, Hamid; Hautamäki, Ville; Siniscalchi, Sabato Marco; Kinnunen, Tomi; Lee, Chin‐Hui

doi:10.1109/taslp.2015.2489558

Cited by 31 publications

(31 citation statements)

References 36 publications

(62 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In this regard, the use of a simple approach based solely on mean and standard deviation of the short time features has shown good performance compared to the more complex BoAW. More complex approaches using contextual information such as ivectors [39] could also be explored. However, the particular context (continuous, smartphone-based monitoring with low battery consumption) would not benefit from this approach.…”

Section: Discussionmentioning

confidence: 99%

A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

Monge-Álvarez

Hoyos-Barceló

San‐José‐Revuelta

et al. 2019

IEEE Trans. Biomed. Eng.

View full text Add to dashboard Cite

Section: Discussionmentioning

confidence: 99%

A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

Monge-Álvarez

Hoyos-Barceló

San‐José‐Revuelta

et al. 2019

IEEE Trans. Biomed. Eng.

View full text Add to dashboard Cite

“…current version of GLAFF-IT counts 37, 320 lemmas for 457, 702 wordforms and includes nouns, verbs, adjectives and adverbs. 2 Each entry of the lexicon includes a wordform, a tag in MULTEXT-GRACE format [13] specifying the main syntactic category and inflection features, a lemma and API phonological transcriptions with the stress placement when present in GLAW-IT. An extract of GLAFF-IT is reported in Figure 2.…”

Section: From Glaw-it To Glaff-itmentioning

confidence: 99%

“…We use a model that has learned the phonological contexts for stressed and unstressed Italian vowels. Orthographic and phonological context-based approaches have been extensively used in the text-to-speech domain for stress detection [2,7] and for accenting unknown words in a specialised language [18]. The rationale behind our approach is that the exploitation of the phonological neighbourhood of a vowel helps estimate its probability of being stressed or unstressed.…”

Section: Machine Learningmentioning

confidence: 99%

Hybrid Method for Stress Prediction Applied to GLAFF-IT, a Large-Scale Italian Lexicon

Calderone

Pascoli

Sajous

et al. 2017

Lecture Notes in Computer Science

View full text Add to dashboard Cite

International audienceThis paper presents a hybrid method for automatic stress prediction that we apply to GLAFF-IT, a large-scale Italian lexicon we extracted from GLAW-IT, a Machine-Readable Dictionary grounded on Wikizionario. Our approach combines heuristic rules and a logistic model trained on the words' sets of phono-logical features. This model reaches a 98.1% accuracy. The resulting resource is a large lexicon for the Italian language that we release under a free licence. It includes morphological and phonological information for each of its 457, 702 entries. As of today, it is the only Italian lexicon featuring both large coverage and indication of stress position

show abstract

“…Approaches employed so far in the literature for L1 classification include i-vector modelling [14], GMMs trained on MFCCs [15], and prosodic [16] approaches, with varying degrees of success. The approach investigated in this paper predicts, from recordings of spontaneous speech, the speaker's native language (L1) from among 21 different languages and, in the case of Spanish speakers, their country of origin from among three countries.…”

Section: Introductionmentioning

confidence: 99%

Automatic Characterisation of the Pronunciation of Non-native English Speakers using Phone Distance Features

Kyriakopoulos

Gales

Knill

2017

7th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2017)

View full text Add to dashboard Cite

The distances between and relative movements of phones in acoustic space in language learners have been shown to be indicative of the speaker's proficiency, in a way that is compact and independent of bias-inducing voice qualities. Typically these features are based on known transcriptions, "read aloud" style tasks. This paper examines the information that can be extracted about speakers from phone distance features (PDFs) when the transcription is unknown. Here, phone distances are obtained by measuring the relative entropy between a distribution trained on the speaker's manner of pronunciation of each of the phones of the English language and distributions trained on each of the other phones. These features are extracted from untranscribed audio and so rely on automatic speech recognition (ASR) output. The ASR can have high word error rates, as spontaneous, non-native speech is being recognised. Two forms of speaker characterisation are examined using these features: first, the use of PDFs to predict the speaker's proficiency and second, their use in classifying the mother tongue (L1) of the speaker. For both tasks, recorded answers to sections of the BULATS English Speaking test were used. Using only PDFs for predicting the grade within a Gaussian Process based grader showed performance comparable to using a range of standard fluency style features. This indicates the robustness of PDFs to errors in ASR output. Additionally, the same PDF features can detect with high accuracy the L1 of the speakers from among 21 L1s using a deep neural network based classifier. Experiments on South American Spanish show that it is further possible to discriminate between the speakers' countries of origin.

show abstract

i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition

Cited by 31 publications

References 36 publications

A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features

Hybrid Method for Stress Prediction Applied to GLAFF-IT, a Large-Scale Italian Lexicon

Automatic Characterisation of the Pronunciation of Non-native English Speakers using Phone Distance Features

Contact Info

Product

Resources

About