Interspeech 2017
DOI: 10.21437/interspeech.2017-339
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing

Abstract: Zero-resource speech processing (ZS) systems aim to learn structural representations of speech without access to labeled data. A starting point for these systems is the extraction of syllable tokens utilizing the rhythmic structure of a speech signal. Several recent ZS systems have therefore focused on clustering such syllable tokens into linguistically meaningful units. These systems have so far used a heuristically set number of clusters, which can, however, be highly dataset-dependent and cannot be optimized …
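The abstract's key point, inferring the number of syllable clusters from data instead of fixing it by hand, is exactly what non-parametric Bayesian mixtures such as Dirichlet-process GMMs provide. Below is a minimal sketch of that idea using scikit-learn's BayesianGaussianMixture with a Dirichlet-process prior; the random input features, the 13-dimensional embedding size, and the 50-component truncation bound are illustrative assumptions, not the paper's actual models or features.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # Stand-in for per-token syllable embeddings (hypothetical: 500 tokens x 13 dims).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 13))

    dpgmm = BayesianGaussianMixture(
        n_components=50,  # truncation level: an upper bound, not the cluster count
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag",
        max_iter=500,
        random_state=0,
    )
    labels = dpgmm.fit_predict(X)

    # Components with negligible posterior weight receive no tokens; the ones
    # that survive act as the inferred cluster inventory.
    print("effective clusters:", np.unique(labels).size, "of", dpgmm.n_components)

On real syllable embeddings the surviving components play the role of the discovered unit inventory, so the cluster count becomes a quantity inferred from the data rather than a tuned hyperparameter.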

Cited by 3 publications (2 citation statements). References 23 publications.
“…Figure 2: NED/COV curves on ZeroSpeech corpora. Our method is represented with a line of blue dots and each of the other competitors as grey points: Garcia-Granada (A [33]), Jansen (B [6]), Räsänen (L,M,N [7], D [8], and G,H [9]), Kamper (O [4] and P [34]), Lyzinski (I,L,J [10]), Bhati (E,F [11]).…”
Section: Models Architectures (mentioning; confidence: 99%)
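The NED/COV axes in the quoted figure are the ZeroSpeech spoken term discovery metrics: NED is the Levenshtein distance between the phoneme transcriptions of the two fragments in a discovered pair, normalized by the longer transcription and averaged over all pairs (lower is better), while COV measures how much of the corpus the discovered fragments cover. A minimal sketch of the NED computation; the fragment pairs below are hypothetical.

    def levenshtein(a, b):
        # Classic dynamic-programming edit distance between two sequences.
        prev = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            cur = [i] + [0] * len(b)
            for j in range(1, len(b) + 1):
                cur[j] = min(prev[j] + 1,                           # deletion
                             cur[j - 1] + 1,                        # insertion
                             prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
        return prev[-1]

    def ned(pairs):
        # Mean normalized edit distance over discovered fragment pairs.
        return sum(levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)

    # Hypothetical pairs of discovered fragments, given as phoneme sequences.
    pairs = [(["s", "ih", "t"], ["s", "ae", "t"]),
             (["k", "ae", "t"], ["k", "ae", "t"])]
    print(round(ned(pairs), 3))  # 0.167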
“…However, such representations are still not good enough for other downstream tasks that need to handle larger word-like units, which typically span larger time windows (100–1000 ms) and have varying duration depending on speech rate. These tasks include word segmentation [4,5], term discovery [6,7,8,9,10,11], and unsupervised alignment of text and speech [12,13]. Despite recent research efforts into developing word embeddings, also known as speech sequence embeddings (SSE) [14,15,16,17,18], their performance is low and difficult to compare for lack of a common evaluation benchmark.…”
Section: Introduction (mentioning; confidence: 99%)