ICASSP 2021: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp39728.2021.9414899

A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

Abstract: In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simultaneously specify the subspace itself as an embedding on a hyper-subspace. We train the hyper-subspace on a set of transcribed languages and transfer it to the target language. In the target language, we infer both the language and unit embeddings in an unsupervised manner, and in so doing, we simultaneously learn …
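The two-level structure in the abstract can be pictured as nested maps: a language embedding selects a phonetic subspace, and a unit embedding picks a point in that subspace, which gives the unit's parameters. The sketch below is illustrative only. It assumes both levels are plain affine maps, and the names `subspace_from_language` and `unit_params` and the toy dimensions are hypothetical. None of this is the paper's notation or exact parameterization.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute A @ x + b using plain Python lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def subspace_from_language(lang_emb: Vector, hyper_W: Matrix, hyper_b: Vector,
                           param_dim: int, unit_dim: int) -> Tuple[Matrix, Vector]:
    """Hyper-subspace (assumed affine): map a language embedding to the
    (W, b) that define that language's phonetic subspace.

    Assumption: hyper_W has param_dim * unit_dim + param_dim rows,
    one per entry of W (row-major) followed by one per entry of b.
    """
    expected = param_dim * unit_dim + param_dim
    if len(hyper_W) != expected or len(hyper_b) != expected:
        raise ValueError(f"hyper-subspace must produce {expected} values")
    flat = affine(hyper_W, lang_emb, hyper_b)
    W = [flat[i * unit_dim:(i + 1) * unit_dim] for i in range(param_dim)]
    b = flat[param_dim * unit_dim:]
    return W, b


def unit_params(unit_emb: Vector, W: Matrix, b: Vector) -> Vector:
    """Phonetic subspace (assumed affine): map a unit embedding to that
    unit's parameter vector."""
    return affine(W, unit_emb, b)


if __name__ == "__main__":
    # Toy sizes: 1-d language embedding, 1-d unit embedding, 2 unit parameters.
    hyper_W = [[1.0], [2.0], [3.0], [4.0]]
    hyper_b = [0.0, 0.0, 0.0, 0.0]
    W, b = subspace_from_language([1.0], hyper_W, hyper_b, param_dim=2, unit_dim=1)
    print(unit_params([2.0], W, b))  # [5.0, 8.0]
```

This mirrors the transfer step the abstract describes. `hyper_W` and `hyper_b` would be trained on the transcribed languages and then held fixed. In the target language, only the language embedding and the unit embeddings would be inferred.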

Cited by 6 publications (13 citation statements)
References: 18 publications

“…Finally, several recent studies have attempted a combination of Kempton's approach with the unsupervised clustering approach: cross-lingual ASR is used to annotate the articulatory features of an unknown language, which are then clustered to form unsupervised phonelike units [56,57]. To our knowledge, only two of these papers [71,72] directly evaluated phone inventory NMI or F1; using oracle cluster combination strategies that are standard in the field of unsupervised phone discovery, [72] achieved F1=64.14% for cross-lingual automatic phone inventory estimation.…”
Section: Related Work (mentioning)

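The statement above refers to NMI as an evaluation metric. Below is a minimal sketch of normalized mutual information between two labelings of the same items, such as reference phones and discovered units assigned to the same frames. Papers normalize NMI in different ways. This sketch assumes the arithmetic mean of the two entropies. It also assumes a score of 1.0 when both labelings contain a single cluster. Neither choice is taken from the cited works.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (nats) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> float:
    """Empirical mutual information (nats) between two aligned labelings."""
    n = len(ref)
    joint = Counter(zip(ref, hyp))
    p_ref, p_hyp = Counter(ref), Counter(hyp)
    # p(r,h) / (p(r) p(h)) simplifies to c * n / (count(r) * count(h)).
    return sum((c / n) * math.log(c * n / (p_ref[r] * p_hyp[h]))
               for (r, h), c in joint.items())


def nmi(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> float:
    """NMI normalized by the arithmetic mean of the two entropies."""
    if len(ref) != len(hyp) or not ref:
        raise ValueError("labelings must be non-empty and of equal length")
    h_ref, h_hyp = entropy(ref), entropy(hyp)
    if h_ref + h_hyp == 0.0:
        return 1.0  # convention: two single-cluster labelings agree trivially
    return 2.0 * mutual_information(ref, hyp) / (h_ref + h_hyp)


if __name__ == "__main__":
    phones = ["a", "a", "b", "b"]
    print(nmi(phones, [1, 1, 2, 2]))  # units match phones up to relabeling
    print(nmi(phones, [1, 2, 1, 2]))  # units independent of phones
```

NMI does not depend on how the discovered units are named, so no mapping from units to phones is needed. It does not produce the cluster-combination step that the statement mentions for F1.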
“…In [21], a multilingual AUD system is constructed which defines a subspace of AUs which is learned in a supervised way from multilingual data in an attempt to capture the commonalities on what an AU is across different languages. They aim at providing a better prior for the AU learning, while we are concerned with removing speaker dependence.…”
Section: Related Work (mentioning)

“…In the case of TIMIT, the proper phone-level transcriptions are used. For Yoruba and Mboshi, forced alignments were provided by the authors of [21]. The databases for AUD are deliberately chosen as in [21] and [2] to provide comparability.…”
Section: Speech Databases (mentioning)