Interspeech 2016
DOI: 10.21437/interspeech.2016-489

Enhancing Data-Driven Phone Confusions Using Restricted Recognition

Abstract: This paper presents a novel approach to address data sparseness in standard confusion matrices and demonstrates how enhanced matrices, which capture additional similarities, can impact the performance of spoken term detection. Using the same training data as for the standard phone confusion matrix, an enhanced confusion matrix is created by iteratively restricting the recognition process to exclude one acoustic model per iteration. Since this results in a greater amount of confusion data for each phone, the en…
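The restriction loop described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: `enhanced_confusions` and `toy_recognize` are hypothetical names, the recognizer is reduced to picking the best-scoring phone among the models still allowed, and adding the restricted-pass counts to the unrestricted ones is an assumption about how the two are combined.

```python
from collections import Counter, defaultdict


def toy_recognize(scores, allowed):
    """Hypothetical stand-in for a phone recognizer: return the allowed
    phone with the highest score for this segment."""
    return max(allowed, key=lambda phone: scores[phone])


def enhanced_confusions(samples, phones, recognize):
    """Accumulate confusion counts over one unrestricted pass plus one
    restricted pass per phone, each excluding a single acoustic model.

    samples   -- iterable of (reference_phone, segment_scores)
    phones    -- list of phone labels (one acoustic model each)
    recognize -- callable(segment_scores, allowed_phones) -> phone
    """
    counts = defaultdict(Counter)
    for reference, scores in samples:
        # Standard pass: every acoustic model is available.
        counts[reference][recognize(scores, set(phones))] += 1
        # Restricted passes: exclude one acoustic model per iteration.
        for excluded in phones:
            allowed = set(phones) - {excluded}
            counts[reference][recognize(scores, allowed)] += 1
    return counts


# Toy data: one segment whose reference phone is "p".
samples = [("p", {"p": 0.6, "b": 0.3, "t": 0.1})]
counts = enhanced_confusions(samples, ["p", "b", "t"], toy_recognize)
```

In this toy run the unrestricted pass only ever yields "p", whereas the pass that excludes the "p" model forces the recognizer to its next-best choice, "b". That extra confusion is the kind of additional data the abstract refers to.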

Cited by 3 publications (2 citation statements)
References: 23 publications
“…O'Neill and Carson-Berndsen (2019) demonstrate that embeddings derived purely from text using a grapheme-to-phoneme mapping and applying a word2vec approach exhibit similarity between phoneme classes. These phoneme embeddings were subsequently integrated with the data-driven acoustic similarities of Kane and Carson-Berndsen (2016) to generate a similarity matrix for use in phonemically driven spell checking (O'Neill et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…The costs of these operations are taken from a phoneme distance matrix that models the similarity between phonemes. Since similarity can be considered a function of confusability (Gallagher and Graff, 2012), this distance matrix was generated based on the confusability of phonemes both acoustically and distributionally (Kane and Carson-Berndsen, 2016; O'Neill and Carson-Berndsen, 2019). If two phonemes are likely to be confused, then they are considered highly similar and thus have a low distance score and low substitution cost.…”
Section: S-capade's Phoneme Distance Matrix (mentioning)
confidence: 99%
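
The relationship in the last statement, where high confusability means low distance and therefore low substitution cost, can be sketched as a weighted edit distance. This is a minimal illustration, not the cited system's method: the function names, the way counts are turned into distances, and the fixed insertion/deletion cost are all assumptions made for the example.

```python
def confusion_to_distance(counts, phones):
    """Assumed conversion: distance = 1 - (share of observations in which
    the two phones were confused with each other), symmetric, in [0, 1]."""
    dist = {}
    for a in phones:
        for b in phones:
            if a == b:
                dist[a, b] = 0.0
                continue
            total = sum(counts[a].values()) + sum(counts[b].values())
            confused = counts[a].get(b, 0) + counts[b].get(a, 0)
            dist[a, b] = 1.0 - confused / total if total else 1.0
    return dist


def weighted_edit_distance(source, target, dist, indel=1.0):
    """Levenshtein distance over phone sequences in which the substitution
    cost comes from `dist`; `indel` is an assumed fixed cost."""
    prev = [j * indel for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        cur = [i * indel]
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + indel,            # deletion
                           cur[j - 1] + indel,         # insertion
                           prev[j - 1] + dist[s, t]))  # substitution
        prev = cur
    return prev[-1]


# Made-up counts: "p" and "b" are often confused, "t" never is.
counts = {"p": {"p": 6, "b": 4}, "b": {"b": 8, "p": 2}, "t": {"t": 10}}
dist = confusion_to_distance(counts, ["p", "b", "t"])
close = weighted_edit_distance(["p", "t"], ["b", "t"], dist)
far = weighted_edit_distance(["p", "t"], ["t", "t"], dist)
```

With these made-up counts, replacing "p" with "b" costs less than replacing "p" with "t", so `close` is smaller than `far`.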