Abstract. This paper summarizes statistical modeling approaches for using prosody (the rhythm and melody of speech) in automatic speech recognition and understanding. We outline methods for prosodic feature extraction, model architectures, and techniques for combining prosodic with lexical (word-based) information. We then survey several applications of the framework and report results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition.

Key words. Prosody, speech recognition and understanding, hidden Markov models.
Introduction.

Prosody has long been studied as an important knowledge source for speech understanding. In recent years, a large amount of computational work has been aimed at prosodic modeling for automatic speech recognition and understanding.¹ Whereas most current approaches to speech processing model only the words, prosody provides an additional knowledge source that is inherent in, and exclusive to, spoken language. It can therefore provide information that is not directly available from text alone, and it also serves as a partially redundant knowledge source that may help compensate for errors resulting from faulty word recognition.

In this paper, we summarize recent work at SRI International in the area of computational prosody modeling, and we present results from several recognition tasks in which prosodic knowledge proved helpful. We present only a high-level perspective and summary of our research; for details, the reader is referred to the publications cited.