Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery

Siu, Man-Hung; Gish, H.; Chan, Arthur; Belfield, W.; Lowe, Steve

doi:10.1016/j.csl.2013.05.002

Cited by 73 publications

(50 citation statements)

References 5 publications

“…Each speech segment is represented by the mean of frame-level spectral feature vectors (e.g., MFCC), and k-means is used to cluster the mean feature vectors. Segmental GMM (SGMM) [22] is another segment labeling method that is commonly used in the literature [7]-[19]. SGMM explicitly models the dynamic trajectory of the spectral features within each segment with a polynomial function of time.…”

Section: A. Unsupervised Acoustic Modeling Techniques (mentioning)

confidence: 99%
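
The mean-vector labeling step described in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function names `segment_means` and `kmeans`, the 13-dimensional toy features, and the fixed segment boundaries are all hypothetical, and a plain Lloyd-style k-means stands in for whatever clustering setup the papers used.

```python
import numpy as np

def segment_means(frames, boundaries):
    """Represent each segment by the mean of its frame-level feature vectors."""
    return np.stack([frames[start:end].mean(axis=0) for start, end in boundaries])

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd-style k-means on the rows of x (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (fancy indexing copies them).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

# Toy data (assumption): 80 frames of 13-dim features drawn from two sources.
rng = np.random.default_rng(0)
frames = np.concatenate([
    rng.normal(0.0, 0.1, (40, 13)),
    rng.normal(5.0, 0.1, (40, 13)),
])
# Hypothetical fixed-length segmentation: 8 segments of 10 frames each.
boundaries = [(i, i + 10) for i in range(0, 80, 10)]

means = segment_means(frames, boundaries)
labels, centers = kmeans(means, k=2)
```

Each entry of `labels` is the cluster index assigned to one segment, which plays the role of an initial unit label in this sketch.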

“…In [7], [19], [20], an approach similar to ASM was investigated, and the discovered speech units were referred to as self-organized units (SOUs). As discussed in Section I, the ASM framework consists of three stages.…”

Section: A. Unsupervised Acoustic Modeling Techniques (mentioning)

confidence: 99%

“…Although supervised training of acoustic models has attained great success for many resource-rich languages (e.g., English, Mandarin), it is not straightforwardly applicable to other languages for which manual transcriptions and linguistic knowledge are difficult to be acquired or even completely absent. In recent years there is an increasing research interest in designing acoustic modeling methods that are less reliant on well-organized training resources [1]-[7].…”

Section: Introduction (mentioning)

confidence: 99%

Acoustic Segment Modeling with Spectral Clustering Methods

Wang, Lee, Leung, et al. 2015

IEEE/ACM Trans. Audio Speech Lang. Process.

This paper presents a study of spectral clustering-based approaches to acoustic segment modeling (ASM). ASM aims at finding the underlying phoneme-like speech units and building the corresponding acoustic models in the unsupervised setting, where no prior linguistic knowledge and manual transcriptions are available. A typical ASM process involves three stages, namely initial segmentation, segment labeling, and iterative modeling. This work focuses on the improvement of segment labeling. Specifically, we use posterior features as the segment representations, and apply spectral clustering algorithms on the posterior representations. We propose a Gaussian component clustering (GCC) approach and a segment clustering (SC) approach. GCC applies spectral clustering on a set of Gaussian components, and SC applies spectral clustering on a large number of speech segments. Moreover, to exploit the complementary information of different posterior representations, a multiview segment clustering (MSC) approach is proposed. MSC simultaneously utilizes multiple posterior representations to cluster speech segments. To address the computational problem of spectral clustering in dealing with large numbers of speech segments, we use inner product similarity graph and make reformulations to avoid the explicit computation of the affinity matrix and Laplacian matrix. We carried out two sets of experiments for evaluation. First, we evaluated the ASM accuracy on the OGI-MTS dataset, and it was shown that our approach could yield 18.7% relative purity improvement and 15.1% relative NMI improvement compared with the baseline approach. Second, we examined the performances of our approaches in the real application of zero-resource query-by-example spoken term detection on SWS2012 dataset, and it was shown that our approaches could provide consistent improvement on four different testing scenarios with three evaluation metrics.

Index Terms: Acoustic segment modeling, multiview segment clustering, sub-word unit discovery, unsupervised training, zero-resource query-by-example spoken term detection.
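
The abstract states that an inner product similarity graph lets spectral clustering avoid forming the affinity and Laplacian matrices explicitly. One standard way to get that effect is sketched below; it is an assumption-laden illustration and not necessarily the reformulation used in the paper. With nonnegative features X and affinity W = X Xᵀ, the degrees are X (Xᵀ 1), and the normalized affinity D^(-1/2) W D^(-1/2) equals Z Zᵀ with Z = D^(-1/2) X, so its leading eigenvectors are the left singular vectors of the thin matrix Z. The function name and the Dirichlet toy posteriors are hypothetical.

```python
import numpy as np

def spectral_embedding_inner_product(x, k):
    """Top-k spectral embedding for the affinity W = x @ x.T without forming W.

    x: (n, d) nonnegative features (e.g. posterior vectors), with d much smaller than n.
    Returns the row-normalized embedding and the top-k eigenvalues of
    D^{-1/2} W D^{-1/2}.
    """
    # Degrees of W: row sums of x @ x.T, computed as x @ (column sums of x).
    degrees = x @ x.sum(axis=0)
    # Z = D^{-1/2} x, so that Z @ Z.T is the normalized affinity matrix.
    z = x / np.sqrt(degrees)[:, None]
    # Thin SVD of the (n, d) matrix replaces an (n, n) eigendecomposition.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    emb = u[:, :k]
    # Row-normalize the embedding before clustering it (guard against zero rows).
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return emb / norms, s[:k] ** 2

# Toy posteriors (assumption): 50 segments over 6 classes, rows sum to 1.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(6), size=50)
embedding, eigvals = spectral_embedding_inner_product(posteriors, k=3)
```

The rows of `embedding` would then be clustered, for example with k-means. The cost is dominated by the SVD of an n-by-d matrix, which avoids storing the n-by-n affinity when the number of segments n is large.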

“…Figure 6 shows an example of a two level hierarchical representation of a speech signal. On the first hierarchical level the aim is to discover the acoustic building blocks of speech, the phonemes, and to learn a statistical model for each of them, the acoustic model [11,56,53,47]. In speech recognition, the acoustic model usually consists of Hidden Markov Models (HMMs), where each HMM emits a time series of vectors of cepstral coefficients.…”

Section: Representation Learning From Sequential Data (mentioning)

confidence: 99%

Autonomous Learning of Representations

et al. 2015

Besides the core learning algorithm itself, one major question in machine learning is how to best encode given training data such that the learning technology can efficiently learn based thereon and generalize to novel data. While classical approaches often rely on a hand coded data representation, the topic of autonomous representation or feature learning plays a major role in modern learning architectures. The goal of this contribution is to give an overview about different principles of autonomous feature learning, and to exemplify two principles based on two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language, respectively.

“…A similar model but, without constraints on the topology of the HMMs was studied in [12]. Siu et al [16] first use a segmental GMM (SGMM) to generate a transcription of the data and then iteratively train a standard HMM to improve the transcriptions. Note that the number of allowed states are here defined in advance.…”

Section: Introduction (mentioning)

confidence: 99%

Partitioning of Posteriorgrams Using Siamese Models for Unsupervised Acoustic Modelling

Myrman, Salvi 2017

GLU 2017 International Workshop on Grounding Language Understanding

Unsupervised methods tend to discover highly speaker-specific representations of speech. We propose a method for improving the quality of posteriorgrams generated from an unsupervised model through partitioning of the latent classes. We do this by training a sparse siamese model to find a linear transformation of the input posteriorgrams to lower-dimensional posteriorgrams. The siamese model makes use of same-category and different-category speech fragment pairs obtained by unsupervised term discovery. After training, the model is converted into an exact partitioning of the posteriorgrams. We evaluate the model on the minimal-pair ABX task in the context of the Zero Resource Speech Challenge. We are able to demonstrate that our method significantly reduces the dimensionality of standard Gaussian mixture model posteriorgrams, while still making them more robust to speaker variations. This suggests that the model may be viable as a general post-processing step to improve probabilistic acoustic features obtained by unsupervised learning.
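
To make the idea of an exact partitioning of a posteriorgram concrete, the sketch below merges input classes into fewer output classes by summing their posteriors. It only illustrates what applying a partition means; the class-to-group `assignment` is hand-written and hypothetical, whereas the abstract describes obtaining the partition from a trained sparse siamese model, which is not reproduced here.

```python
import numpy as np

def partition_posteriorgram(post, assignment, n_out):
    """Merge input classes into n_out groups by summing their posteriors.

    post: (n_frames, n_in) posteriorgram whose rows sum to 1.
    assignment: length-n_in sequence; assignment[j] is the output group of class j.
    """
    out = np.zeros((post.shape[0], n_out))
    for j, group in enumerate(assignment):
        # Every input class contributes to exactly one output class.
        out[:, group] += post[:, j]
    return out

# Toy posteriorgram (assumption): 5 frames over 6 latent classes.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(6), size=5)
# Hypothetical partition of the 6 input classes into 3 output classes.
assignment = [0, 0, 1, 2, 2, 1]
reduced = partition_posteriorgram(post, assignment, n_out=3)
```

Because each input class maps to exactly one group, every row of `reduced` still sums to 1, so the lower-dimensional output remains a valid posteriorgram.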
