2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461545
Abstract: Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of informative priors.

Cited by 15 publications (14 citation statements) · References 14 publications (34 reference statements)
“…There are two main research strands in UAM. The first strand formulates the problem as discovering a finite set of phoneme-like speech units [5], [6], [12]. This is often referred to as acoustic unit/model discovery (AUD) [5], [8].…”
Section: Introduction
confidence: 99%
“…The base measure p(η) defines a prior probability that a sound, represented by an HMM with parameters η, is an acoustic unit. Earlier works on Bayesian AUD [8,9,15,16] use exponential family distributions as the base measure. These distributions, while mathematically convenient since they form conjugate priors, do not incorporate any knowledge about phones.…”
Section: Problem Definition
confidence: 99%
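To illustrate why exponential-family base measures are mathematically convenient as conjugate priors, here is a minimal sketch (not the cited papers' code) of a Normal-Gamma conjugate update for the mean and precision of a one-dimensional Gaussian emission density; the hyperparameter names and defaults are assumptions for illustration:

```python
import numpy as np

def normal_gamma_posterior(x, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Conjugate update of a Normal-Gamma prior over the mean and
    precision of a 1-D Gaussian emission, given observations x.
    Returns the posterior hyperparameters (m, k, a, b)."""
    n = len(x)
    xbar = np.mean(x)
    ss = np.sum((x - xbar) ** 2)
    k = k0 + n
    m = (k0 * m0 + n * xbar) / k          # posterior mean pulls toward data
    a = a0 + n / 2.0
    b = b0 + 0.5 * ss + 0.5 * k0 * n * (xbar - m0) ** 2 / k
    return m, k, a, b

m, k, a, b = normal_gamma_posterior(np.array([1.0, 1.2, 0.8, 1.1]))
```

Because the posterior stays in the same family as the prior, updates like this reduce to closed-form hyperparameter arithmetic, which is exactly the convenience (and the phonetic blindness) the quote points out.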
“…2 Toplines and baselines. A baseline system is provided, consisting of a pipeline with a nonparametric Bayesian acoustic unit discovery system [6,7], and a parametric speech synthesizer based on Merlin [8]. As linguistic features, we use contextual information (leading and preceding phones, number of preceding and following phones in current sentence), but no features related to prosody, articulatory features (vowel, nasal, and so on), or part-of-speech (noun, verb, adjective, and so on).…”
Section: Unsupervised Unit Discovery For Speech Synthesis
confidence: 99%
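The contextual linguistic features described in that baseline (neighbouring phones plus positional counts within the sentence) can be sketched as follows; the function name and window parameter are hypothetical, not the challenge baseline's actual code:

```python
def context_features(phones, i, width=1):
    """Toy contextual features for the phone at index i in a sentence:
    the `width` preceding and following phones, plus the number of
    phones before and after it in the current sentence."""
    pad = ["<s>"] * width
    seq = pad + list(phones) + ["</s>"] * width
    j = i + width  # index of phone i in the padded sequence
    return {
        "prev": seq[j - width:j],
        "next": seq[j + 1:j + 1 + width],
        "n_before": i,
        "n_after": len(phones) - i - 1,
    }

feats = context_features(["a", "b", "c"], 1)
```

Note that, as the quote states, such features carry no prosodic, articulatory, or part-of-speech information.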
“…Methods that imitate child language acquisition often begin by finding recurring patterns in audio [1,4]. Non-parametric Bayesian hidden Markov models (HMMs) have been widely used in word-unit discovery and various other clustering problems with audio, e.g., a latent Dirichlet process with HMM acoustic models can be used to jointly segment and cluster raw audio into sub-word units [5,6], or the HMM can be regularized using an L-p norm as a sparsity constraint to encourage purer clusters [2]. Using word embeddings as features, it is possible to perform automatic word discovery by modeling each word as a Gaussian mixture model with a Dirichlet prior on its parameters; the model can be trained using expectation maximization (EM), or using a weighted K-means algorithm [3].…”
Section: Introduction
confidence: 99%
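The weighted K-means alternative mentioned in the quote can be sketched in a few lines of numpy; this is a generic illustration under the assumption that each embedding carries a scalar weight, not the cited paper's implementation:

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=20, seed=0):
    """Minimal weighted K-means: each point X[i] carries weight w[i].
    Centroids are recomputed as weighted means of assigned points."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # squared Euclidean distance of every point to every centroid
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)  # hard assignment
        for j in range(k):
            mask = z == j
            if mask.any():
                C[j] = np.average(X[mask], axis=0, weights=w[mask])
    return z, C

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
z, C = weighted_kmeans(X, np.ones(4), 2)
```

With uniform weights this reduces to standard K-means; non-uniform weights let frequent word tokens pull centroids harder, which is the intuition behind weighting in embedding-based word discovery.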
“…The same task has been attempted [14] using NMT with attention [15] to align speech or phone sequences to the word labels of the high-resourced language, with modifications of the attention mechanism to ensure coverage and richer context. If the true phone sequence in the under-resourced language is unknown, pseudo-phone labels generated by an unsupervised non-parametric Bayesian model [6] can be used as input to the NMT [16].…”
Section: Introduction
confidence: 99%
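The soft alignment that attention provides in such NMT systems can be illustrated with generic dot-product attention; this is a minimal sketch of the weight computation only, not the cited systems' modified mechanism:

```python
import numpy as np

def attention_weights(query, keys):
    """Dot-product attention: returns a probability distribution over
    encoder positions (e.g., phone states) for one decoder query."""
    scores = keys @ query
    e = np.exp(scores - scores.max())  # stable softmax
    return e / e.sum()

# three encoder states, one decoder query aligned with the first axis
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = attention_weights(np.array([1.0, 0.0]), keys)
```

Each decoder step thus distributes its attention mass over all phone positions; coverage-style modifications, as mentioned in the quote, additionally penalize positions that never receive mass.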