Broad phonetic class definition driven by phone confusions

Lopes, Carla; Perdigão, Fernando

doi:10.1186/1687-6180-2012-158

Cited by 10 publications

(4 citation statements)

References 19 publications

“…The integration of a discriminative classifier with the template-based traditional HMM approach results in better performance for phoneme recognition over the generative HMM-based model (Kaewtip & Alwan, 2019). A hybrid MLP/HMM system (Lopes & Perdigão, 2009) was developed and tested on the TIMIT database, wherein probabilities for each broad group were estimated. The model enhanced the performance by 5% compared to the baseline system.…”

Section: Review Of Hierarchical Approach (mentioning)

confidence: 99%

A hierarchical automatic phoneme recognition model for Hindi-Devanagari consonants using machine learning technique

Malakar

Keskar

Zadgaonkar

2023

Expert Systems

A phoneme is perceptually the smallest distinct sound unit distinguished among words in a particular language. Every language has its own set of phonemes, and all the words are ordered sequences of phonemes. Therefore, phoneme recognition is essential to automatic speech recognition (ASR) systems. Phonemes of a language can be classified together using a single machine learning (ML) model through the direct classification (also known as the baseline or flat classification) approach. However, it is observed that the performance of such phoneme recognition degrades with the increase in the number of phoneme classes. The challenge is pronounced in languages with a larger number of phoneme classes, like Hindi, which has 48 phonemes. In this paper, we propose a speaker-independent hierarchical classification approach for 33 Hindi-Devanagari consonants/phonemes using cepstral features with ML techniques like support vector machine (SVM), random forest (RF) and fully connected deep neural network (DNN). In this hierarchical approach, a given phoneme is classified into successive subgroups until the particular phoneme class is identified. To perform the classification task, a binary or multi-class classifier is invoked for each internal (non-leaf) node in the hierarchy tree. Our model identified pairs of Optimal Feature Sets (based on mutual information) and the best suitable ML classifier for each internal decision node in the hierarchy using 10-fold cross-validation to help in efficient classification. Our proposed hierarchical model leads to better accuracy and 57% improved performance for phoneme recognition compared with the non-hierarchical, that is, the direct classification approach.
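
The hierarchical scheme described in this abstract, where a classifier at each internal node routes a sample to a subgroup until a single phoneme remains, can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the `Node` class, the `predict` helper, the two-level grouping, the labels, and the threshold rules standing in for trained SVM/RF/DNN classifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union


@dataclass
class Node:
    """Internal decision node: its classifier names the child to follow."""
    classify: Callable[[Sequence[float]], str]
    children: Dict[str, Union["Node", str]]  # a subtree, or a leaf phoneme label


def predict(node: Union[Node, str], features: Sequence[float]) -> str:
    """Descend from the root until a leaf (a phoneme label) is reached."""
    while isinstance(node, Node):
        node = node.children[node.classify(features)]
    return node


# Toy hierarchy (assumption): threshold rules replace trained classifiers,
# and the feature vector is a made-up 2-D point, not real cepstral features.
voiced = Node(lambda x: "ga" if x[1] > 0.5 else "da", {"ga": "ga", "da": "da"})
unvoiced = Node(lambda x: "ka" if x[1] > 0.5 else "ta", {"ka": "ka", "ta": "ta"})
root = Node(
    lambda x: "voiced" if x[0] > 0.5 else "unvoiced",
    {"voiced": voiced, "unvoiced": unvoiced},
)

print(predict(root, [0.9, 0.1]))
print(predict(root, [0.2, 0.8]))
```

Each node only has to separate its own children, so a different feature set and classifier type can be chosen per node, which is the design freedom the abstract exploits.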

“…In contrast to the knowledge-based criteria, the data-driven criterion conducts phoneme clustering through the phoneme similarity measured by an ASR. Based on [30], the confusion matrix, M, contains information about the similarities between each pair of phonemes, where the entry Mij denotes the number of the event for phoneme i being mistakenly recognized as phoneme j. A symmetric similarity matrix, S, can be computed from the confusion matrix, M, where Sij, the similarity between phonemes i and j, is computed by (1):…”

Section: Data-driven Criterion (mentioning)

confidence: 99%

“…This process repeats until the cluster number meets our expectations. In Table 1, we list the clustering results (9 clusters, which are recommended in [30]) obtained by the data-driven criterion on the TIMIT dataset.…”

Section: Data-driven Criterion (mentioning)

confidence: 99%

Incorporating Broad Phonetic Information for Speech Enhancement

Liao

et al. 2020

Preprint

In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve pure speech signals. Previous studies have also confirmed the benefits of incorporating phonetic information in a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, we usually prepare a phoneme-based acoustic model, which is trained using speech waveforms and phoneme labels. Although this approach performs well in normal noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes to incorporate broad phonetic class (BPC) information into the SE process. We have investigated three criteria to build the BPCs: two knowledge-based criteria (place and manner of articulation) and one data-driven criterion. Moreover, the recognition accuracies of BPCs are much higher than those of phonemes, thus providing more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results demonstrate that the proposed SE framework with BPC information can achieve notable performance improvements over the baseline system and an SE system using monophonic information in terms of both speech quality and intelligibility on the TIMIT dataset.
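
The data-driven criterion quoted above (a confusion matrix M turned into a symmetric similarity matrix S, then repeated merging until the desired number of clusters is reached) might look roughly like the sketch below. The quoted statement elides equation (1), so the similarity used here, the average of the two row-normalized confusion rates, is an assumption and may differ from the cited formula. The average-linkage merge rule, the function names, the four phone labels, and the counts are likewise made up for illustration.

```python
def similarity_from_confusions(M):
    """Symmetrize a confusion-count matrix into a similarity matrix.

    Assumption: each row is normalized to confusion rates and the two
    directions are averaged; the cited equation (1) may be different.
    """
    n = len(M)
    rates = []
    for row in M:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return [[0.5 * (rates[i][j] + rates[j][i]) for j in range(n)] for i in range(n)]


def cluster_phones(S, labels, n_clusters):
    """Merge the two most similar clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(S[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]


# Toy confusion counts (fabricated for illustration): M[i][j] is how often
# phone i was recognized as phone j.
labels = ["p", "b", "s", "z"]
M = [
    [80, 15, 3, 2],
    [12, 82, 2, 4],
    [2, 3, 75, 20],
    [1, 4, 18, 77],
]
groups = cluster_phones(similarity_from_confusions(M), labels, n_clusters=2)
print(groups)
```

With these toy counts the stops and the fricatives end up in separate groups, because each pair is confused with its partner far more often than with the other pair.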

“…A standard tool used in spoken term detection and speech recognition for quantifying variation is the phone confusion matrix [14] [16][17][18][19] which captures the confusion statistics between phones thus providing a way of defining commonalities or groups [20][21][22]. However, a confusion matrix can suffer from data sparseness due to the fact that although some phones may be phonetically similar, only a small number of confusions may be found with one or more other phones.…”

Section: Introduction (mentioning)

confidence: 99%

Enhancing Data-Driven Phone Confusions Using Restricted Recognition

Kane

Carson-Berndsen

2016

Interspeech 2016

This paper presents a novel approach to address data sparseness in standard confusion matrices and demonstrates how enhanced matrices, which capture additional similarities, can impact the performance of spoken term detection. Using the same training data as for the standard phone confusion matrix, an enhanced confusion matrix is created by iteratively restricting the recognition process to exclude one acoustic model per iteration. Since this results in a greater amount of confusion data for each phone, the enhanced confusion matrix encodes more similarities. The enhanced phone confusion matrices perform demonstrably better than standard confusion matrices on a spoken term detection task which uses both HMMs and DNNs.
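
One way to picture the restricted-recognition idea in this abstract is the toy sketch below: recognition is rerun once per phone with that phone's model excluded, and the resulting confusions are accumulated. This is a loose illustration under strong assumptions, not the authors' procedure. Each segment is reduced to a dictionary of per-model scores, recognition is a simple argmax, and the function names and data are hypothetical.

```python
from collections import defaultdict


def confusion_counts(segments, phones, excluded=None):
    """Count (true, recognized) pairs with the `excluded` model removed."""
    counts = defaultdict(int)
    active = [p for p in phones if p != excluded]
    for true_phone, scores in segments:
        hyp = max(active, key=lambda p: scores[p])  # best remaining model
        counts[(true_phone, hyp)] += 1
    return counts


def enhanced_confusions(segments, phones):
    """Accumulate confusions over one restricted pass per excluded model."""
    total = defaultdict(int)
    for excluded in phones:
        for pair, c in confusion_counts(segments, phones, excluded).items():
            total[pair] += c
    return total


# Toy scores (fabricated): each segment has a true label and one score per model.
phones = ["p", "b", "s"]
segments = [
    ("p", {"p": 0.9, "b": 0.6, "s": 0.1}),
    ("b", {"p": 0.4, "b": 0.8, "s": 0.2}),
    ("s", {"p": 0.1, "b": 0.2, "s": 0.95}),
]
standard = confusion_counts(segments, phones)
enhanced = enhanced_confusions(segments, phones)
print(dict(standard))
print(dict(enhanced))
```

In this toy data the unrestricted pass recognizes every segment correctly, so the standard matrix has no off-diagonal entries at all. The restricted passes force each phone onto its next-best model, which is how the extra similarity evidence arises.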
