2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461545
Abstract: Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of informative priors.

Cited by 15 publications (14 citation statements) · References 14 publications (34 reference statements)
“…There are two main research strands in UAM. The first strand formulates the problem as discovering a finite set of phoneme-like speech units [5], [6], [12]. This is often referred to as acoustic unit/model discovery (AUD) [5], [8].…”
Section: Introduction
confidence: 99%
“…The base measure p(η) defines a prior probability that a sound, represented by an HMM with parameters η, is an acoustic unit. Earlier works on Bayesian AUD [8,9,15,16] use exponential family distributions as the base measure. These distributions, while mathematically convenient since they form conjugate priors, do not incorporate any knowledge about phones.…”
Section: Problem Definition
confidence: 99%
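To illustrate why exponential-family base measures are mathematically convenient as conjugate priors, here is a minimal sketch (not the cited papers' code) of a Normal-Gamma conjugate update for the mean and precision of a one-dimensional Gaussian emission density; the hyperparameter names and defaults are assumptions for illustration:

```python
import numpy as np

def normal_gamma_posterior(x, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Conjugate update of a Normal-Gamma prior over the mean and
    precision of a 1-D Gaussian emission, given observations x.
    Returns the posterior hyperparameters (m, k, a, b)."""
    n = len(x)
    xbar = np.mean(x)
    ss = np.sum((x - xbar) ** 2)
    k = k0 + n
    m = (k0 * m0 + n * xbar) / k          # posterior mean pulls toward data
    a = a0 + n / 2.0
    b = b0 + 0.5 * ss + 0.5 * k0 * n * (xbar - m0) ** 2 / k
    return m, k, a, b

m, k, a, b = normal_gamma_posterior(np.array([1.0, 1.2, 0.8, 1.1]))
```

Because the posterior stays in the same family as the prior, updates like this reduce to closed-form hyperparameter arithmetic, which is exactly the convenience (and the phonetic blindness) the quote points out.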
“…2 Toplines and baselines. A baseline system is provided, consisting of a pipeline with a nonparametric Bayesian acoustic unit discovery system [6,7], and a parametric speech synthesizer based on Merlin [8]. As linguistic features, we use contextual information (leading and preceding phones, number of preceding and following phones in current sentence), but no features related to prosody, articulatory features (vowel, nasal, and so on), or part-of-speech (noun, verb, adjective, and so on).…”
Section: Unsupervised Unit Discovery For Speech Synthesis
confidence: 99%
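The contextual linguistic features described in that baseline (neighbouring phones plus positional counts within the sentence) can be sketched as follows; the function name and window parameter are hypothetical, not the challenge baseline's actual code:

```python
def context_features(phones, i, width=1):
    """Toy contextual features for the phone at index i in a sentence:
    the `width` preceding and following phones, plus the number of
    phones before and after it in the current sentence."""
    pad = ["<s>"] * width
    seq = pad + list(phones) + ["</s>"] * width
    j = i + width  # index of phone i in the padded sequence
    return {
        "prev": seq[j - width:j],
        "next": seq[j + 1:j + 1 + width],
        "n_before": i,
        "n_after": len(phones) - i - 1,
    }

feats = context_features(["a", "b", "c"], 1)
```

Note that, as the quote states, such features carry no prosodic, articulatory, or part-of-speech information.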
“…Methods that imitate child language acquisition often begin by finding recurring patterns in audio [1,4]. Non-parametric Bayesian hidden Markov models (HMMs) have been widely used in word-unit discovery and various other clustering problems with audio, e.g., a latent Dirichlet process with HMM acoustic models can be used to jointly segment and cluster raw audio into sub-word units [5,6], or the HMM can be regularized using an L-p norm as a sparsity constraint to encourage purer clusters [2]. Using word embeddings as features, it is possible to perform automatic word discovery by modeling each word as a Gaussian mixture model with a Dirichlet prior on its parameters; the model can be trained using expectation maximization (EM), or using a weighted K-means algorithm [3].…”
Section: Introduction
confidence: 99%
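The weighted K-means alternative mentioned in the quote can be sketched in a few lines of numpy; this is a generic illustration under the assumption that each embedding carries a scalar weight, not the cited paper's implementation:

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=20, seed=0):
    """Minimal weighted K-means: each point X[i] carries weight w[i].
    Centroids are recomputed as weighted means of assigned points."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # squared Euclidean distance of every point to every centroid
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)  # hard assignment
        for j in range(k):
            mask = z == j
            if mask.any():
                C[j] = np.average(X[mask], axis=0, weights=w[mask])
    return z, C

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
z, C = weighted_kmeans(X, np.ones(4), 2)
```

With uniform weights this reduces to standard K-means; non-uniform weights let frequent word tokens pull centroids harder, which is the intuition behind weighting in embedding-based word discovery.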
“…The same task has been attempted [14] using NMT with attention [15] to align speech or phone sequences to the word labels of the high-resourced language, with modifications of the attention mechanism to ensure coverage and richer context. If the true phone sequence in the under-resourced language is unknown, pseudo-phone labels generated by an unsupervised non-parametric Bayesian model [6] can be used as input to the NMT [16].…”
Section: Introduction
confidence: 99%
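The soft alignment that attention provides in such NMT systems can be illustrated with generic dot-product attention; this is a minimal sketch of the weight computation only, not the cited systems' modified mechanism:

```python
import numpy as np

def attention_weights(query, keys):
    """Dot-product attention: returns a probability distribution over
    encoder positions (e.g., phone states) for one decoder query."""
    scores = keys @ query
    e = np.exp(scores - scores.max())  # stable softmax
    return e / e.sum()

# three encoder states, one decoder query aligned with the first axis
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = attention_weights(np.array([1.0, 0.0]), keys)
```

Each decoder step thus distributes its attention mass over all phone positions; coverage-style modifications, as mentioned in the quote, additionally penalize positions that never receive mass.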