Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization

Chung, Chi-Ming; Chan, Chin-Feng; Lee, Lin-Shan

doi:10.1109/icassp.2013.6639239

Cited by 24 publications

(56 citation statements)

References 32 publications

“…Take inner product as the similarity measure, (24) Similar to (15), the computation of Laplacian matrices in (18) becomes, (25) where, (26) with . Then problem (20) is rewritten as,…”

Section: Multiview Segment Clustering (mentioning)

confidence: 99%

“…With this approach, the number of speech units can be estimated automatically. In [26], a three-stage approach involving word-level pattern construction and word-level decoding was proposed. In [6] [27], the problem of unsupervised acoustic modeling was tackled by first discovering large-size units (e.g., words), and performing Gaussian component clustering with top-down constraints.…”

Section: A. Unsupervised Acoustic Modeling Techniques (mentioning)

confidence: 99%

“…In [9]-[26], segment labeling is performed by vector quantization (VQ). Each segment is represented by the mean of frame-level feature vectors, and the k-means algorithm is used for clustering of the mean feature vectors.…”

Section: B. Segment Labeling (mentioning)

confidence: 99%
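
The VQ-style labeling described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names `segment_means` and `kmeans`, the squared Euclidean distance, the random initialization, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def segment_means(frames, boundaries):
    """Represent each segment by the mean of its frame-level feature vectors.

    frames: (num_frames, dim) array; boundaries: list of (start, end) frame
    indices with `end` exclusive. Both names are hypothetical.
    """
    return np.stack([frames[s:e].mean(axis=0) for s, e in boundaries])

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over the segment mean vectors."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Each returned label serves as the segment's unit label for the later modeling stage.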

Acoustic Segment Modeling with Spectral Clustering Methods

Wang, Lee, Leung, et al. 2015

IEEE/ACM Trans. Audio Speech Lang. Process.

This paper presents a study of spectral clustering-based approaches to acoustic segment modeling (ASM). ASM aims at finding the underlying phoneme-like speech units and building the corresponding acoustic models in the unsupervised setting, where no prior linguistic knowledge and manual transcriptions are available. A typical ASM process involves three stages, namely initial segmentation, segment labeling, and iterative modeling. This work focuses on the improvement of segment labeling. Specifically, we use posterior features as the segment representations, and apply spectral clustering algorithms on the posterior representations. We propose a Gaussian component clustering (GCC) approach and a segment clustering (SC) approach. GCC applies spectral clustering on a set of Gaussian components, and SC applies spectral clustering on a large number of speech segments. Moreover, to exploit the complementary information of different posterior representations, a multiview segment clustering (MSC) approach is proposed. MSC simultaneously utilizes multiple posterior representations to cluster speech segments. To address the computational problem of spectral clustering in dealing with large numbers of speech segments, we use an inner-product similarity graph and make reformulations to avoid the explicit computation of the affinity matrix and Laplacian matrix. We carried out two sets of experiments for evaluation. First, we evaluated the ASM accuracy on the OGI-MTS dataset, and it was shown that our approach could yield 18.7% relative purity improvement and 15.1% relative NMI improvement compared with the baseline approach. Second, we examined the performances of our approaches in the real application of zero-resource query-by-example spoken term detection on the SWS2012 dataset, and it was shown that our approaches could provide consistent improvement on four different testing scenarios with three evaluation metrics.

Index Terms: acoustic segment modeling, multiview segment clustering, sub-word unit discovery, unsupervised training, zero-resource query-by-example spoken term detection.
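
The abstract does not spell out its reformulation, so the sketch below shows only a generic, well-known way to exploit an inner-product similarity graph and should not be read as the paper's method. It assumes the affinity is W = X Xᵀ for a non-negative feature matrix X (for example, posterior vectors), so that the normalized affinity D^{-1/2} W D^{-1/2} equals Z Zᵀ with Z = D^{-1/2} X; its leading eigenvectors are then the left singular vectors of the much smaller matrix Z. The function name and arguments are hypothetical.

```python
import numpy as np

def spectral_embedding_inner_product(X, k):
    """Top-k eigenpairs of D^{-1/2} (X X^T) D^{-1/2} without forming X X^T.

    X: (n, d) array with non-negative rows and strictly positive degrees
    (an assumption; posterior vectors satisfy it).
    """
    col_sum = X.sum(axis=0)             # X^T 1, shape (d,)
    degrees = X @ col_sum               # row sums of W = X X^T, shape (n,)
    Z = X / np.sqrt(degrees)[:, None]   # D^{-1/2} X, shape (n, d)
    # Left singular vectors of Z are eigenvectors of Z Z^T; the squared
    # singular values are the corresponding eigenvalues (sorted descending).
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k], s[:k] ** 2
```

The cost scales with n·d² rather than n², which is the point of avoiding the n×n affinity matrix. In a typical spectral clustering pipeline the rows of the returned embedding would then be clustered, for instance with k-means.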

“…Alternately, the lexicon development process is weakly-supervised similar to acoustic model development in an ASR system. More recently, in the context of "zero-resourced" ASR system development, there are efforts towards developing methods that are fully unsupervised (Chung et al., 2013; Lee et al., 2015). Such methods are at very early stages and are out of the scope of this paper.…”

Section: Literature Survey On ASWU Derivation and Pronunciation Gener (mentioning)

confidence: 99%

Towards weakly supervised acoustic subword unit discovery and lexicon development using hidden Markov models

Razavi, Rasipuram, Magimai.-Doss 2018

Speech Communication

State-of-the-art automatic speech recognition and text-to-speech systems are based on subword units, typically phonemes. This necessitates a lexicon that maps each word to a sequence of subword units. Development of a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not always be readily available, particularly for under-resourced languages. In such scenarios, an alternative approach is to use a lexicon based on units such as graphemes or subword units automatically derived from the acoustic data. This article focuses on automatic subword unit based lexicon development using methods that are employed for development of grapheme-based systems. Specifically, we present a novel hidden Markov model (HMM) based formalism for automatic derivation of subword units and pronunciation generation using only transcribed speech data. In this approach, the subword units are derived from the clustered context-dependent units in a grapheme based system using the maximum-likelihood criterion. The subword unit based pronunciations are then generated by learning either a deterministic or a probabilistic relationship between the graphemes and the acoustic subword units (ASWUs). In this article, we first establish the proposed framework on a well resourced language by comparing it against related approaches in the literature and investigating the transferability of the derived subword units to other domains. We then show the scalability of the proposed approach on real under-resourced scenarios by conducting studies on Scottish Gaelic, a genuinely under-resourced language, and comparing the approach against state-of-the-art grapheme-based ASR approaches. Our experimental studies on English show that the derived subword units can not only lead to better ASR systems compared to graphemes, but can also be transferred across domains. The experimental studies on Scottish Gaelic show that the proposed ASWU-based lexicon development approach scales without any language specific considerations and leads to better ASR systems compared to a grapheme-based lexicon, including the case where ASR system performance is boosted through the use of acoustic models built with multilingual resources from resource-rich languages.

“…which is actually parallel to (11). The superscript n indicates the n-th training utterance and d indicates the number of dimensions of $x_t^n$.…”

Section: Baseline: Recurrent Predictor Model (mentioning)

confidence: 99%

Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries

2017

Self Cite
