2012
DOI: 10.1186/1687-6180-2012-158

Broad phonetic class definition driven by phone confusions

Abstract: Intermediate representations between the speech signal and phones may be used to improve discrimination among phones that are often confused. These representations are usually found according to broad phonetic classes, which are defined by a phonetician. This article proposes an alternative data-driven method to generate these classes. Phone confusion information from the analysis of the output of a phone recognition system is used to find clusters at high risk of mutual confusion. A metric is defined to compu…

Cited by 10 publications
(4 citation statements)
References 19 publications
“…The integration of a discriminative classifier with the template-based traditional HMM approach results in better performance for phoneme recognition over the generative HMM-based model (Kaewtip & Alwan, 2019). A hybrid MLP/HMM system (Lopes & Perdigão, 2009) was developed and tested on the TIMIT database, wherein probabilities for each broad group were estimated. The model enhanced the performance by 5% compared to the baseline system.…”
Section: Review Of Hierarchical Approach (mentioning)
confidence: 99%
“…In contrast to the knowledge-based criteria, the data-driven criterion conducts phoneme clustering through the phoneme similarity measured by an ASR. Based on [30], the confusion matrix, M, contains information about the similarities between each pair of phonemes, where the entry Mij denotes the number of the event for phoneme i being mistakenly recognized as phoneme j. A symmetric similarity matrix, S, can be computed from the confusion matrix, M, where Sij, the similarity between phonemes i and j, is computed by (1):…”
Section: Data-driven Criterion (mentioning)
confidence: 99%
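The statement above describes turning a confusion matrix M into a symmetric similarity matrix S, but the excerpt cuts off before equation (1) is shown. The Python sketch below is illustrative only: it assumes one plausible symmetrization (row-normalize the counts, then average the two directions), which may differ from the formula used in the citing paper or in the cited article. The function name and the toy counts are hypothetical.

```python
import numpy as np

def similarity_from_confusions(M):
    """Build a symmetric similarity matrix from confusion counts.

    M[i, j] is the number of times phone i was recognized as phone j.
    """
    M = np.asarray(M, dtype=float)
    # Row-normalize so P[i, j] estimates P(recognized as j | true phone i).
    row_totals = M.sum(axis=1, keepdims=True)
    P = np.divide(M, row_totals, out=np.zeros_like(M), where=row_totals > 0)
    # ASSUMPTION: average both confusion directions to obtain symmetry.
    # This stands in for the elided equation (1); it is not that equation.
    return 0.5 * (P + P.T)

# Toy counts for three hypothetical phones; phones 0 and 1 are often confused.
M = np.array([[8, 2, 0],
              [1, 9, 0],
              [0, 0, 10]])
S = similarity_from_confusions(M)
print(np.round(S, 3))
```

Row normalization keeps frequent phones from dominating the similarity simply because they occur more often; other normalizations are equally defensible.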
“…This process repeats until the cluster number meets our expectations. In Table 1, we list the clustering results (9 clusters, which are recommended in [30]) obtained by the data-driven criterion on the TIMIT dataset.…”
Section: Data-driven Criterion (mentioning)
confidence: 99%
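The repeat-until-done clustering described in the statement above can be sketched as a greedy agglomerative merge over a similarity matrix. This is a minimal illustration under stated assumptions: the excerpt does not say how cluster-to-cluster similarity is scored, so average linkage is assumed here, and the function name and the toy matrix are hypothetical.

```python
import numpy as np

def cluster_phones(S, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    S is a symmetric phone-by-phone similarity matrix.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of phones")
    clusters = [[i] for i in range(n)]  # start with one phone per cluster
    while len(clusters) > n_clusters:
        best_score, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # ASSUMPTION: average linkage, i.e. mean similarity over
                # all cross-cluster phone pairs.
                score = S[np.ix_(clusters[a], clusters[b])].mean()
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy similarities for four hypothetical phones.
S = np.array([[1.00, 0.40, 0.05, 0.00],
              [0.40, 1.00, 0.00, 0.05],
              [0.05, 0.00, 1.00, 0.30],
              [0.00, 0.05, 0.30, 1.00]])
result = cluster_phones(S, n_clusters=2)
print(result)
```

With the toy matrix, phones 0 and 1 merge first because they share the highest similarity, then phones 2 and 3, leaving two clusters.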
“…A standard tool used in spoken term detection and speech recognition for quantifying variation is the phone confusion matrix [14] [16][17][18][19] which captures the confusion statistics between phones thus providing a way of defining commonalities or groups [20][21][22]. However, a confusion matrix can suffer from data sparseness due to the fact that although some phones may be phonetically similar, only a small number of confusions may be found with one or more other phones.…”
Section: Introduction (mentioning)
confidence: 99%