Interspeech 2017
DOI: 10.21437/interspeech.2017-339
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing

Abstract: Zero-resource speech processing (ZS) systems aim to learn structural representations of speech without access to labeled data. A starting point for these systems is the extraction of syllable tokens utilizing the rhythmic structure of a speech signal. Several recent ZS systems have therefore focused on clustering such syllable tokens into linguistically meaningful units. These systems have so far used a heuristically set number of clusters, which can, however, be highly dataset-dependent and cannot be optimized …
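The abstract's key point, inferring the number of syllable clusters from data instead of fixing it by hand, is exactly what non-parametric Bayesian mixtures such as Dirichlet-process GMMs provide. Below is a minimal sketch of that idea using scikit-learn's BayesianGaussianMixture with a Dirichlet-process prior; the random input features, the 13-dimensional embedding size, and the 50-component truncation bound are illustrative assumptions, not the paper's actual models or features.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # Stand-in for per-token syllable embeddings (hypothetical: 500 tokens x 13 dims).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 13))

    dpgmm = BayesianGaussianMixture(
        n_components=50,  # truncation level: an upper bound, not the cluster count
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag",
        max_iter=500,
        random_state=0,
    )
    labels = dpgmm.fit_predict(X)

    # Components with negligible posterior weight receive no tokens; the ones
    # that survive act as the inferred cluster inventory.
    print("effective clusters:", np.unique(labels).size, "of", dpgmm.n_components)

On real syllable embeddings the surviving components play the role of the discovered unit inventory, so the cluster count becomes a quantity inferred from the data rather than a tuned hyperparameter.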

Cited by 3 publications (2 citation statements). References 23 publications.
“…Figure 2: NED/COV curves on ZeroSpeech corpora. Our method is represented with a line of blue dots and each of the other competitors as grey points: Garcia-Granada (A [33]), Jansen (B [6]), Räsänen (L,M,N [7], D [8], and G,H [9]), Kamper (O [4] and P [34]), Lyzinski (I,L,J [10]), Bhati (E,F [11]).…”
Section: Models Architectures (mentioning; confidence: 99%)
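The NED/COV axes in the quoted figure are the ZeroSpeech spoken term discovery metrics: NED is the Levenshtein distance between the phoneme transcriptions of the two fragments in a discovered pair, normalized by the longer transcription and averaged over all pairs (lower is better), while COV measures how much of the corpus the discovered fragments cover. A minimal sketch of the NED computation; the fragment pairs below are hypothetical.

    def levenshtein(a, b):
        # Classic dynamic-programming edit distance between two sequences.
        prev = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            cur = [i] + [0] * len(b)
            for j in range(1, len(b) + 1):
                cur[j] = min(prev[j] + 1,                           # deletion
                             cur[j - 1] + 1,                        # insertion
                             prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
        return prev[-1]

    def ned(pairs):
        # Mean normalized edit distance over discovered fragment pairs.
        return sum(levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)

    # Hypothetical pairs of discovered fragments, given as phoneme sequences.
    pairs = [(["s", "ih", "t"], ["s", "ae", "t"]),
             (["k", "ae", "t"], ["k", "ae", "t"])]
    print(round(ned(pairs), 3))  # 0.167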
“…However, such representations are still not good enough for other downstream tasks that need to handle larger word-like units, which typically span larger time windows (100–1000 ms) and have varying duration depending on speech rate. These tasks include word segmentation [4,5], term discovery [6,7,8,9,10,11], and unsupervised alignment of text and speech [12,13]. Despite recent research efforts into developing word embeddings, also known as speech sequence embeddings (SSE) [14,15,16,17,18], their performance is low and difficult to compare for lack of a common evaluation benchmark.…”
Section: Introduction (mentioning; confidence: 99%)