Interspeech 2021
DOI: 10.21437/interspeech.2021-1664
Abstract: This paper tackles automatic discovery of phone-like acoustic units (AUD) from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the first stage learns a subword-discriminative feature representation, and the second stage applies clustering to the learned representation and obtains phone-like clusters as the discovered acoustic units. In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing…
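The second stage described above — clustering a learned frame-level representation so that cluster indices act as discovered phone-like units — can be sketched as follows. Everything here is an illustrative assumption (synthetic features, the cluster count, a plain k-means routine), not the paper's actual setup.

```python
import numpy as np

# Illustrative stand-in for first-stage output: frame-level features drawn
# from three synthetic "phone" distributions (8-dim, well separated).
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(loc=c, scale=0.3, size=(50, 8))
                        for c in (-2.0, 0.0, 2.0)])

def kmeans(X, k, iters=20):
    """Minimal k-means; centers seeded from points spread along dim 0."""
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k, dtype=int)]].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Each frame's cluster index plays the role of a discovered acoustic unit.
units = kmeans(feats, k=3)
```

On well-separated features the three synthetic "phones" come back as three nearly pure clusters; with real speech features the separation is of course far weaker, which is why the quality of the first-stage representation matters.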

Cited by 6 publications (4 citation statements) | References 27 publications
“…While APC is not a model of infant learning per se, it is one of the simplest and hence perhaps the easiest to understand of modern machine learning methods for unsupervised learning. The latent representations APC learns have been shown to be applicable to a range of speech tasks, including phone classification (Chung et al., 2019) and subword modeling (Feng, Żelasko, Moro-Velázquez, & Scharenborg, 2021; see also Yang et al. (2021)). Central to modeling purposes, the APC learning criterion aligns with the idea of predictive processing in human perception (e.g., Friston, 2010; Rao & Ballard, 1999; Babineau, Havron, Dautriche, de Carvalho, & Christophe, 2022), and a similar learning mechanism can thereby be assumed to be available to any mammalian learner.…”
Section: Methods (mentioning; confidence: 99%)
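As a rough illustration of the APC (autoregressive predictive coding) idea referenced in the quote — train a model to predict a future acoustic frame from past frames — the sketch below fits a linear predictor by least squares on a synthetic AR(1) "feature" sequence. The real APC model is a neural autoregressive network; the shapes, the AR(1) data, and the linear stand-in are all assumptions for illustration only.

```python
import numpy as np

# Synthetic frame sequence with temporal structure: an AR(1) process,
# x_t = 0.9 * x_{t-1} + noise, standing in for acoustic features.
rng = np.random.default_rng(0)
T, D, k = 300, 13, 3                     # frames, feature dim, prediction shift
frames = np.zeros((T, D))
for t in range(1, T):
    frames[t] = 0.9 * frames[t - 1] + 0.3 * rng.normal(size=D)

# APC trains a model to predict frame t+k from the history up to frame t.
# A linear map fitted by least squares is a minimal stand-in for that model.
W, *_ = np.linalg.lstsq(frames[:-k], frames[k:], rcond=None)
l1_model = np.abs(frames[:-k] @ W - frames[k:]).mean()  # APC-style L1 loss
l1_zero = np.abs(frames[k:]).mean()                     # predict-nothing baseline
```

Because the sequence is temporally predictable, the fitted predictor beats the trivial zero predictor under the L1 criterion — the property that APC's predictive training objective exploits to learn useful representations.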
“…Finally, several recent studies have attempted a combination of Kempton's approach with the unsupervised clustering approach: cross-lingual ASR is used to annotate the articulatory features of an unknown language, which are then clustered to form unsupervised phone-like units [56,57]. To our knowledge, only two of these papers [71,72] directly evaluated phone inventory NMI or F1; using oracle cluster combination strategies that are standard in the field of unsupervised phone discovery, [72] achieved F1 = 64.14% for cross-lingual automatic phone inventory estimation.…”
Section: Related Work (mentioning; confidence: 99%)
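The evaluation the quote refers to compares discovered units against reference phone labels with metrics such as NMI. A self-contained version of normalized mutual information between two labelings (the toy frame-level labels below are hypothetical) could look like this:

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information, 2*I(a;b) / (H(a) + H(b))."""
    a, b = np.asarray(a), np.asarray(b)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    joint = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(joint, (ia, ib), 1.0)          # joint label-count table
    joint /= joint.sum()                     # joint distribution
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / (pa[:, None] * pb[None, :])[nz])).sum()
    h = lambda p: -(p[p > 0] * np.log(p[p > 0])).sum()
    return 2.0 * mi / (h(pa) + h(pb))

# Discovered unit labels vs. reference phone labels (toy example):
score = nmi([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])  # a perfect relabeling
```

A one-to-one relabeling scores 1.0 and statistically independent labelings score 0, which is what makes NMI convenient here: discovered unit IDs are arbitrary, so only the correspondence structure is scored.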
“…In the last decade, unsupervised learning of acoustic models for ASR has gained increasing research interest [2]-[4]. It aims at discovering [2], [5], [6] (also referred to as acoustic unit discovery; AUD) or modeling [3], [4], [7] (also referred to as unsupervised subword modeling; USM) a set of basic speech units that represents all the sounds in a language in a zero-resource scenario, i.e., with only untranscribed data available. This research field aims to pave the way toward developing high-performance ASR systems for languages that have very limited or no transcribed data.…”
Section: Introduction (mentioning; confidence: 99%)