Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection

Huang, Hao; Xu, Haihua; Wang, Xianhui; Silamu, Wushour

doi:10.1109/taslp.2015.2409733

Cited by 100 publications

(46 citation statements)

References 32 publications


“…In this paper, we explore learning the DNN-HMM based acoustic models, as well as the decision function, with a discriminative objective that is directly linked to the ultimate evaluation metric of mispronunciation detection. Here we take the F1-score for investigation, since it was frequently adopted as the evaluation metric in previous work on mispronunciation detection [22][23][24]. Further, in this paper, the parameters of the decision function are set to be either phone- or senone-dependent when the phone-level (cf.…”

Section: Maximum F1-score Criterion Training (mentioning)

confidence: 99%
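The quoted passage describes training toward the F1-score directly rather than toward a surrogate. One common way to make F1 differentiable, offered here as an assumption and not as the paper's exact formulation, is to replace hard accept/reject decisions with logistic sigmoid outputs and compute "soft" true-positive, false-positive, and false-negative counts. The minimal sketch below does this for a single global decision function and optimizes only its two parameters by gradient ascent. The names `soft_f1`, `maximize_soft_f1`, `alpha`, and `beta` are hypothetical, the scores are assumed to rise with the likelihood of mispronunciation, and the DNN-HMM acoustic model training mentioned in the quote is not shown.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_f1(scores: Sequence[float], labels: Sequence[int],
            alpha: float, beta: float) -> Tuple[float, float, float]:
    """Return (soft F1, dF1/dalpha, dF1/dbeta) for a sigmoid decision function.

    scores: detection scores (assumption: higher = more likely mispronounced).
    labels: 1 if the phone was actually mispronounced, else 0.
    """
    probs: List[float] = [sigmoid(alpha * s + beta) for s in scores]
    tp = sum(p * y for p, y in zip(probs, labels))  # soft true positives
    denom = sum(probs) + sum(labels)                 # soft (2TP + FP + FN)
    if denom == 0:                                   # only for empty input
        return 0.0, 0.0, 0.0
    f1 = 2.0 * tp / denom
    g_alpha = g_beta = 0.0
    for p, y, s in zip(probs, labels, scores):
        df1_dp = (2.0 * y - f1) / denom  # derivative of 2TP / (P + Y) w.r.t. p
        dp_dz = p * (1.0 - p)            # derivative of the sigmoid
        g_alpha += df1_dp * dp_dz * s
        g_beta += df1_dp * dp_dz
    return f1, g_alpha, g_beta


def maximize_soft_f1(scores: Sequence[float], labels: Sequence[int],
                     alpha: float = 1.0, beta: float = 0.0,
                     lr: float = 0.5, steps: int = 200) -> Tuple[float, float]:
    """Plain gradient ascent on the decision-function parameters only."""
    for _ in range(steps):
        _, g_alpha, g_beta = soft_f1(scores, labels, alpha, beta)
        alpha += lr * g_alpha
        beta += lr * g_beta
    return alpha, beta


if __name__ == "__main__":
    # Toy data: three mispronounced phones (label 1), two correct ones (label 0).
    scores = [2.0, 1.5, 0.3, -0.5, -1.0]
    labels = [1, 1, 1, 0, 0]
    before = soft_f1(scores, labels, 1.0, 0.0)[0]
    a, b = maximize_soft_f1(scores, labels)
    after = soft_f1(scores, labels, a, b)[0]
    print(f"soft F1: {before:.3f} -> {after:.3f}")
```

Because the soft counts satisfy 2TP + FP + FN = Σp + Σy, the derivative of F1 with respect to each sigmoid output reduces to (2y - F1) / (Σp + Σy), which the loop then chains through the sigmoid to reach `alpha` and `beta`.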

“…Yet there is still a wide array of studies that capitalize on various acoustic and prosodic cues, confidence measures and speaking-style information, to name just a few, for use in mispronunciation detection. Interested readers may also refer to [13][14][15][16][17] for comprehensive and enjoyable overviews of state-of-the-art methods that have been successfully developed and applied to various mispronunciation detection tasks.…”

Section: Introduction (mentioning)

confidence: 99%


Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions

et al. 2016


Mispronunciation detection is an integral part of a computer-assisted pronunciation training (CAPT) system, helping second-language (L2) learners pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper continues this general line of research, and its major contributions are twofold. First, we present an effective training approach that estimates the deep neural network (DNN) based acoustic models used in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Second, in the same vein, two distinct logistic sigmoid based decision functions, with either phone- or senone-dependent parameterization, are also inferred and used for enhanced mispronunciation detection. A series of experiments on a Mandarin mispronunciation detection task seems to show the performance merits of the proposed method.
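The abstract mentions logistic sigmoid decision functions with phone- or senone-dependent parameterization. A minimal sketch of what such a parameterization could look like, assuming each unit simply owns its own (alpha, beta) pair and falls back to a shared default when a unit has no trained parameters, is shown below. The class name `UnitDependentSigmoid`, the 0.5 threshold, and the example phone labels and parameter values are hypothetical illustrations, not values from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class UnitDependentSigmoid:
    """Logistic decision function with one (alpha, beta) pair per unit.

    A unit may be a phone label or a senone ID; both are just dictionary keys here.
    """
    default: Tuple[float, float] = (1.0, 0.0)
    params: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def prob(self, unit: str, score: float) -> float:
        # Look up unit-specific parameters, falling back to the shared default.
        alpha, beta = self.params.get(unit, self.default)
        z = alpha * score + beta
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def is_mispronounced(self, unit: str, score: float,
                         threshold: float = 0.5) -> bool:
        # Flag the unit when the sigmoid output exceeds the threshold.
        return self.prob(unit, score) > threshold


if __name__ == "__main__":
    decide = UnitDependentSigmoid(params={"zh": (2.0, -1.0), "sh": (1.0, 0.5)})
    for unit in ("zh", "sh", "a"):
        print(unit, decide.is_mispronounced(unit, 0.3))
```

With these example parameters the same score of 0.3 is flagged for `sh` but not for `zh`, which is the kind of per-unit behavior a phone-dependent parameterization allows. A senone-dependent variant would use senone IDs as keys instead of phone labels.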


“…Precision and Recall are measures that originated in Information Retrieval and are used in classification when working with imbalanced classes. Precision is the percentage of instances correctly classified as positive among all instances classified as positive, while Recall is the percentage of instances correctly classified as positive among those that actually are positive; the F1-score is the harmonic mean of precision and recall [26]. The advantage of the F1-score is that it summarizes performance in a single quality metric, making results easier for end users to understand.…”

Section: Assessment of Classification Results: Confusion Matrix (mentioning)

confidence: 99%
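The definitions in the quoted passage translate directly into arithmetic on confusion-matrix counts. The sketch below is a minimal illustration with hypothetical function names; it treats label 1 as the positive class and returns 0.0 where a ratio would otherwise divide by zero, which is one convention among several.

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[int],
                     actual: Sequence[int]) -> Tuple[int, int, int]:
    """Count (TP, FP, FN), treating label 1 as the positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # 30 true positives, 10 false positives, 20 false negatives.
    p, r, f = precision_recall_f1(30, 10, 20)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
    # Equivalent closed form: F1 = 2TP / (2TP + FP + FN).
    print(2 * 30 / (2 * 30 + 10 + 20))
```

True negatives never enter the formula, so a large, easy negative class cannot inflate F1; this is why it suits the imbalanced-class setting the quote describes.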

Pattern Recognition in Cattle Brand using Bag of Visual Words and Support Vector Machines Multi-Class

Silva; Welfer; Dornelles (2018)


Automatic recognition of cattle brand images is a necessity for the government agencies responsible for this activity. To support this process, this work presents a method that uses Bag of Visual Words to extract features from cattle brand images and multi-class Support Vector Machines for classification. The method consists of six stages: a) select an image database; b) extract points of interest (SURF); c) create a vocabulary (K-means); d) create a feature vector for each image (visual words); e) train and classify images (SVM); f) evaluate the classification results. The method was tested on a database from a municipal city hall, where it achieved satisfactory results: 86.02% accuracy and a processing time of 56.705 seconds.
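The abstract lists the pipeline stages without code. Below is a minimal plain-Python sketch of stages c) and d) only, under stated assumptions: descriptors are toy 2-D vectors standing in for SURF descriptors, the vocabulary is built with a basic Lloyd's k-means initialized from the first k points (a simplification; the paper's K-means settings are not given here), and each image becomes an L1-normalized histogram of visual-word assignments. The function names are hypothetical, and the SVM training of stage e) is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(desc: Vector, vocab: Sequence[Vector]) -> int:
    """Index of the closest visual word (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, word in enumerate(vocab):
        d = sum((a - b) ** 2 for a, b in zip(desc, word))
        if d < best_d:
            best, best_d = i, d
    return best


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means with naive initialization from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_word(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(descriptors: Sequence[Vector],
                   vocab: Sequence[Vector]) -> List[float]:
    """L1-normalized histogram of visual-word assignments for one image."""
    counts = [0.0] * len(vocab)
    for d in descriptors:
        counts[nearest_word(d, vocab)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


if __name__ == "__main__":
    # Training descriptors pooled from all images (toy 2-D stand-ins).
    pooled = [[0, 0], [10, 10], [1, 0], [0, 1], [9, 10], [10, 9]]
    vocab = kmeans(pooled, k=2)
    # One image's descriptors mapped to a fixed-length feature vector.
    print(bovw_histogram([[1, 0], [0, 1], [9, 9]], vocab))
```

The histogram has one bin per visual word regardless of how many interest points an image yields, which is what lets images with different numbers of SURF points be fed to the same fixed-input classifier.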


“…The number of selected features was constant: N = 100. To compare our method with the old one, we used the F1 score [11] of an SVM classifier.…”

Section: Methods (mentioning)

confidence: 99%

MeLiF+: Optimization of Filter Ensemble Algorithm with Parallel Computing

Isaev; Smetannikov (2016)

IFIP Advances in Information and Communication Technology

