2021
DOI: 10.1109/taslp.2020.3042016

Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load

Abstract: The human perception of phonemes is biased against speech sounds. The lack of correspondence between perceptual phonemes and acoustic signals poses a major challenge in designing unsupervised algorithms to distinguish phonemes from sound. We propose the DPGMM-RNN hybrid model, which improves phoneme categorization by relieving the fragmentation problem. We also merge segments with low functional load, which is the work done by segment contrasts to differentiate between utterances, just like humans who convert unam…

Cited by 1 publication (15 citation statements)
References 56 publications
“…Such sensitivity makes DPGMM clustering uncertain for assigning clusters to frames and creates small, random cluster segments inside a phoneme. This is DPGMM's "fragmentation problem" [49].…”
Section: Modeling Unsupervised Empirical Adaptation By DPGMM-RNN Hybr... (citation type: mentioning)
confidence: 99%
“…In unsupervised phoneme discovery, DPGMM tends to suffer from a fragmentation problem when the model encounters the frames from such acoustically complex phonemes as a fricative with noise-like high frequencies or a vowel with rapid formant transitions [49], [50]. DPGMM tends to generate more clusters than the number of phonemes in any human language [30], [50] when it struggles to discriminate between complex phonemes with higher resolution.…”
Section: Modeling Unsupervised Empirical Adaptation By DPGMM-RNN Hybr... (citation type: mentioning)
confidence: 99%
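
The fragmentation described in the statements above can be pictured on a toy sequence of frame-level cluster labels. The sketch below is only an illustration and is not the method of the cited paper or of the citing authors: the helper names `segments` and `fragmentation_rate`, the `min_frames` threshold, and the label values are all hypothetical. It collapses consecutive identical labels into segments and reports the share of segments that are shorter than the threshold, which is one simple way to quantify "small, random cluster segments inside a phoneme".

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse consecutive identical frame labels into (label, length) runs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]


def fragmentation_rate(frame_labels, min_frames=3):
    """Fraction of segments shorter than min_frames.

    min_frames is a hypothetical threshold chosen for illustration only.
    """
    runs = segments(frame_labels)
    if not runs:
        return 0.0
    short = sum(1 for _, length in runs if length < min_frames)
    return short / len(runs)


# Toy cluster IDs for the frames of a single phoneme (made-up values):
# cluster 7 dominates, while clusters 2 and 9 intrude briefly.
labels = [7, 7, 7, 7, 2, 7, 7, 9, 9, 7, 7, 7]

print(segments(labels))
print(fragmentation_rate(labels))
```

On this toy input, three of the five segments fall below the three-frame threshold, so a single phoneme would be split into several short pieces if every label change were treated as a segment boundary.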