Chapter 1 Introduction 1
  1.1 Problems with Very Large Codebook Discrete Systems 3
  1.2 Contributions of the Thesis 4
  1.3 Thesis Outline 6

Chapter 2 Introduction to Automatic Speech Recognition (ASR) 8
  2.1 Definition of ASR 8
    2.1.1 Statistical ASR 9
    2.1.2 ASR System Performance Evaluation Criterion
  2.2 Hidden Markov Model (HMM) for ASR
    2.2.1 Dynamic Features for HMM
    2.2.2 Training of and Recognition with HMM in ASR
  2.3 Different Types of HMM in ASR
    2.3.1 Continuous Density HMM (CDHMM)
    2.3.2 Semi-continuous HMM (SCHMM)
    2.3.3 Discrete HMM (DHMM)
    2.3.4 Multiple-stream HMM

Chapter 3 Related Work in DHMM
  3.1 Different Types of Codebooks
    3.1.1 Codebook by Unsupervised Construction
    3.1.2 Codebook by Supervised Construction

LIST OF FIGURES

2.1 Phone HMM /i/: first-order 3-state left-to-right topology
2.2 Word HMM "it": constructed from phone HMMs /i/ and /t/
2.3 Different types of HMM
2.4 Examples of 2-stream HMM systems
3.1 Codebooks produced by unsupervised and supervised construction. Each method partitions the acoustic space into 2 parts. The dots represent training samples in codebook construction, with colors representing different classes.
4.1 SVQ codebooks, where x_t is a d-dimensional feature vector at frame t, and VQ(x_t) is its corresponding full-space VQ codeword constructed from SVQ or SQ codewords. Note that when L = d, they become SQ codebooks.
4.2 An example of a 2-stream system with SVQ codebooks, where each stream is further split into 2 subvectors with 2 SVQ codebooks.
4.3 Relationship between model size and recognition error rate for HDDHMM with SQ codebooks
4.4 One-stream SHDDHMM architecture overview. The shaded ellipse represents the global pool of bases spanning the subspace. Each state output discrete distribution table (i.e., b_i and b_j) lies in the subspace. The state-dependent weights (the dotted lines) and the global pool of bases (the shaded ellipse) are temporary parameters that exist only during model training; the final model stores only b_i and b_j.
4.5 An example of a 2-stream SHDDHMM, where each stream is further split into 2 subvectors. A global pool of bases is stored independently for each stream, and the number of bases in a pool is 3.
4.6 Stream weight estimation results for SHDDHMM with iterative linear programming
4.7 Operating characteristics of various SI-84 models (codeword-finding time is included for discrete models)
4.8 An example of the "smoothing by adding 1" technique
5.1 An example of computer access control using speaker verification technology
5.2 GMM-UBM based speaker verification system
5.3 SV operating characteristics of various models, when the UBM and speaker models are pre-loaded into memory
5.4 SV operating characteristics of various models, when the UBM and speaker models are loaded on-the-fly

LIST OF TABLES

4.1 Experimental settings for different ASR systems 29
4.2 Bit allocation for each stream for HDDHMM in [53] 38
4.3 Baseline model performance on WSJ SI-84 42
4.4 Effect of codebook size for 4s-HDDHMM with SQ codebooks 44
4.5 Comparison of different inte...