D. O’Shaughnessy scite author profile

The focus of a continuous speech recognition process is to match an input signal with a set of words or sentences according to some optimality criteria. The first step of this process is parameterization, whose major task is data reduction by converting the input signal into parameters while preserving virtually all of the speech signal information dealing with the text message. This contribution presents a detailed analysis of a widely used set of parameters, the mel frequency cepstral coefficients (MFCC's), and suggests a new parameterization approach taking into account the whole energy zone in the spectrum. Results obtained with the proposed new coefficients give a confidence interval about their use in a large-vocabulary speaker-independent continuous-speech recognition system.

Linear predictive coding

O’Shaughnessy

1988

IEEE Potentials

172

Linear predictive coding (LPC) is a digital method for encoding an analog signal in which each value is predicted by a linear function of the past values of the signal. It was first proposed as a method for encoding human speech by the United States Department of Defense in Federal Standard 1015, published in 1984. Human speech is produced in the vocal tract, which can be approximated as a tube of varying diameter, and the LPC model is based on a mathematical approximation of that tube. At a particular time t, the speech sample s(t) is represented as a linear sum of the p previous samples. The central component of LPC is the linear predictive filter, which determines the value of the next sample from a linear combination of previous samples. Under normal circumstances, speech is sampled at 8000 samples/second with 8 bits used to represent each sample, which gives a rate of 64000 bits/second. Linear predictive coding reduces this to 2400 bits/second. At this reduced rate the speech has a distinctive synthetic sound and there is a noticeable loss of quality, but it remains audible and easily understood. Because information is lost, linear predictive coding is a lossy form of compression.
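The prediction step described in this abstract can be illustrated with a short sketch. It is not taken from the article or from Federal Standard 1015: it assumes the common autocorrelation method with the Levinson-Durbin recursion to estimate the p predictor weights, and the function names (`autocorrelation`, `lpc_coefficients`, `predict`) are hypothetical. The last lines redo the bit-rate arithmetic quoted in the abstract.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over t of s[t] * s[t - lag]."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Estimate weights a[1..p] so that s[t] is approximately sum_k a[k] * s[t - k].

    Assumption: autocorrelation method solved with the Levinson-Durbin recursion.
    """
    r = autocorrelation(signal, order)
    if r[0] == 0:
        raise ValueError("signal has zero energy; no predictor can be estimated")
    a = [0.0] * (order + 1)   # a[0] is unused; a[k] weights s[t - k]
    error = r[0]              # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        # Reflection coefficient for extending the model from order i-1 to i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def predict(signal, coeffs, t):
    """Predict s[t] as a linear sum of the len(coeffs) previous samples."""
    return sum(c * signal[t - k] for k, c in enumerate(coeffs, start=1))


# Toy input (an assumption for illustration): a decaying exponential,
# for which each sample is 0.9 times the previous one.
samples = [0.9 ** t for t in range(200)]
coeffs = lpc_coefficients(samples, order=2)
estimate = predict(samples, coeffs, t=50)

# Bit-rate arithmetic from the abstract.
PCM_BITS_PER_SECOND = 8000 * 8          # 8000 samples/second, 8 bits each
LPC_BITS_PER_SECOND = 2400
compression_ratio = PCM_BITS_PER_SECOND / LPC_BITS_PER_SECOND
```

On the toy input the first weight comes out close to 0.9 and the second close to 0, matching how the samples were generated. Real speech coders estimate the weights frame by frame over short windows, which this sketch does not attempt.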

Developments and directions in speech recognition and understanding, Part 1 [DSP Education]

Baker, Deng, Glass, et al.

2009

IEEE Signal Process. Mag.

181

Invited paper: Automatic speech recognition: History, methods and challenges

O’Shaughnessy

2008

Pattern Recognition

165

Interacting with computers by voice: automatic speech recognition and synthesis

O’Shaughnessy

2003

Proc. IEEE

D. O’Shaughnessy

Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition
