Yiu-Pong Lai scite author profile

Yiu-Pong Lai

4 Publications

1 Citation Statement Received

15 Citation Statements Given


Affiliations

Hong Kong University of Science and Technology, University of Hong Kong

Publications


Joint Optimization of the Frequency-Domain and Time-Domain Transformations in Deriving Generalized Static and Dynamic MFCCs

Lai, Siu, Mak. 2006. IEEE Signal Process. Lett.


Traditionally, static mel-frequency cepstral coefficients (MFCCs) are derived by discrete cosine transformation (DCT), and dynamic MFCCs are derived by linear regression. Their derivation may be generalized as a frequency-domain transformation of the log filter-bank energies (FBEs) followed by a time-domain transformation. In the past, these two transformations have usually been estimated or optimized separately. In this paper, we consider sequences of log FBEs as a set of spectrogram images and investigate an image compression technique that jointly optimizes the two transformations so that the reconstruction error of the spectrogram images is minimized; an efficient algorithm solves the optimization problem. The framework also allows extension to other optimization costs.
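As a rough illustration of the traditional pipeline this abstract starts from (not the paper's jointly optimized method), the sketch below applies a DCT across filter-bank channels to get static coefficients, then a linear-regression slope across neighbouring frames to get dynamic ones. The function names, the unnormalized DCT-II basis, the window half-width of 2, and the edge-frame padding are all assumptions made for this example.

```python
import numpy as np

def dct_matrix(num_ceps, num_filters):
    """Unnormalized DCT-II basis: rows are cepstral indices, columns are channels."""
    k = np.arange(num_ceps)[:, None]
    n = np.arange(num_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / num_filters)

def delta_weights(half_window):
    """Linear-regression slope weights over 2 * half_window + 1 frames."""
    taus = np.arange(-half_window, half_window + 1)
    return taus / np.sum(taus ** 2)

def static_and_dynamic(log_fbe, num_ceps=13, half_window=2):
    """log_fbe: (num_frames, num_filters) array of log filter-bank energies."""
    # Frequency-domain transformation: one matrix multiply applied to every frame.
    static = log_fbe @ dct_matrix(num_ceps, log_fbe.shape[1]).T
    # Time-domain transformation: regression slope over neighbouring frames.
    # Edge frames are repeated as padding (an assumption of this sketch).
    padded = np.pad(static, ((half_window, half_window), (0, 0)), mode="edge")
    w = delta_weights(half_window)
    dynamic = sum(w[i] * padded[i:i + len(static)] for i in range(len(w)))
    return static, dynamic
```

Both steps are fixed linear maps here, one acting along frequency and one along time, which is the structure the abstract proposes to generalize and optimize jointly.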


High-density discrete HMM with the use of scalar quantization indexing

Mak, Yeung, Lai, et al. 2005


With the advance in semiconductor memory and the availability of very large speech corpora (of hundreds to thousands of hours of speech), we would like to revisit the use of the discrete hidden Markov model (DHMM) in automatic speech recognition. To estimate the discrete density in a DHMM state, the acoustic space is divided into bins and one simply counts the relative number of observations falling into each bin. With a very large speech corpus, we believe that the number of bins may be greatly increased to get a much higher density than before, and we call the new model the high-density discrete hidden Markov model (HDDHMM). Our HDDHMM differs from the traditional DHMM in two respects: first, the codebook has a size in the thousands or even tens of thousands; second, we propose a method based on scalar quantization indexing so that, for a d-dimensional acoustic vector, the discrete codeword can be determined in O(d) time. During recognition, the state probability is reduced to an O(1) table look-up. The new HDDHMM was tested on WSJ0 with a 5K vocabulary. Compared with a baseline 4-stream continuous density HMM system, which has a WER of 9.71%, a 4-stream HDDHMM system converted from the former achieves a WER of 11.60%, with no distance or Gaussian computation.
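One way to picture the indexing idea is sketched below. It assumes each dimension is quantized independently against its own bin boundaries and that the per-dimension bin numbers are combined into a single mixed-radix codeword; the abstract does not give the actual scheme, so the class name, the boundary layout, and the combination rule are hypothetical. With a small fixed number of bins per dimension, the codeword costs O(d), and the state probability is a single array look-up.

```python
import numpy as np

class ScalarQuantIndexer:
    """Hypothetical indexer: per-dimension scalar quantization, mixed-radix codeword."""

    def __init__(self, edges_per_dim):
        # edges_per_dim[i] holds the sorted interior bin boundaries of dimension i.
        self.edges = [np.asarray(e, dtype=float) for e in edges_per_dim]
        self.bins = [len(e) + 1 for e in self.edges]
        self.codebook_size = int(np.prod(self.bins))

    def codeword(self, x):
        index = 0
        for value, edges, nbins in zip(x, self.edges, self.bins):
            # Bin search per dimension; constant cost for a fixed, small bin count.
            b = int(np.searchsorted(edges, value, side="right"))
            index = index * nbins + b
        return index

def estimate_state_table(indexer, observations):
    """Relative frequency of observations per codeword for one state."""
    counts = np.zeros(indexer.codebook_size)
    for x in observations:
        counts[indexer.codeword(x)] += 1
    return counts / counts.sum()

def state_probability(table, indexer, x):
    """Table look-up: no distance or Gaussian evaluation is involved."""
    return table[indexer.codeword(x)]
```

A real system would presumably smooth the counts so that unseen codewords do not get zero probability; that step is left out here.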


Maximum likelihood normalization for robust speech recognition

Lai, Siu

Maximum likelihood normalization for robust speech recognition

Lai, Siu. 2003

