Dialog act classification with the help of prosody

Mast, Marion; Kompe, Ralf; Harbeck, Stefan; Kießling, Andreas; Niemann, Heinrich; Nöth, Elmar; Schukat-Talamazzini, Ernst Günter; Warnke, Volker

doi:10.1109/icslp.1996.607962

Cited by 38 publications

(36 citation statements)

References 8 publications

“…A crucial aspect of our work, as well as that of some other researchers [6,5], is that the dependence between prosodic features and target classes (e.g., dialog acts, phrase boundaries) is modeled directly in a statistical classifier, without the use of intermediate abstract phonological categories, such as pitch accent or boundary tone labels. This bypasses the need to hand-annotate such labels for training purposes, avoids problems of annotation reliability, and allows the model to choose the level of granularity of the representation that is best suited for the task [2].…”

Section: Direct Modeling Of Target Classes (mentioning)

confidence: 99%

Prosody Modeling for Automatic Speech Recognition and Understanding

Shriberg

Stolcke

2004

Mathematical Foundations of Speech and Language Processing

Abstract. This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition.

Key words. Prosody, speech recognition and understanding, hidden Markov models.

Introduction. Prosody has long been studied as an important knowledge source for speech understanding. In recent years there has been a large amount of computational work aimed at prosodic modeling for automatic speech recognition and understanding. Whereas most current approaches to speech processing model only the words, prosody provides an additional knowledge source that is inherent in, and exclusive to, spoken language. It can therefore provide additional information that is not directly available from text alone, and also serves as a partially redundant knowledge source that may help overcome the errors resulting from faulty word recognition.

In this paper, we summarize recent work at SRI International in the area of computational prosody modeling, and results from several recognition tasks where prosodic knowledge proved to be of help. We present only a high-level perspective and summary of our research; for details the reader is referred to publications cited.

“…For example, House carried out extensive studies on Swedish (e.g., [14]), extending them with some multimodal aspects (e.g., [15]). Much hope was risen by early works on the prosodic properties of the realisations of selected dialogue acts [16,17,18].…”

Section: Dialogue Acts and Prosody (mentioning)

confidence: 99%

Preliminary Prosodic and Gestural Characteristics of Instructing Acts in Polish Task-Oriented Dialogues

Karpiński

2009

Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions

Abstract. In the present study, selected properties of multimodal instructing acts are discussed. Realisations of the instructing acts extracted from a corpus of task-oriented dialogues are analysed in terms of their syntactic structure, prosodic properties and accompanying gestures. The syntactic structures found in the material are similar to those found in earlier studies on map task dialogues. Deictic vocabulary is more frequent in gesture-supported instructions. The mean relative pitch range is similar to the values obtained for instructions in earlier studies and different from the values for syntactically similar questions. As opposed to verbally ill-formed instructions, the well-formed ones tend to contain at least one gestural stroke. It is shown that the relative range of pitch frequency is higher in the gesture-accompanied instructing acts. It is also noticed that prosody and gesture may play similar roles in utterances.

Keywords: task-oriented dialogue, instructing acts, gesture, intonation

Multimodal Dialogue Analysis and Dialogue Acts

The aim of the present study is to formulate a preliminary description of multimodal instructing dialogue acts in Polish task-oriented dialogues. The analysis is focused on the intonational features of utterances and hand gestures but it also includes some aspects of syntactic and lexical realisation. This research forms a part of the DiaGest project [1] devoted to the study of gestural, prosodic, grammatical and lexical components of task-oriented dialogues. It is meant as a step towards an application-oriented holistic, multidisciplinary study of dialogue.

The idea of a comprehensive approach to the studies on interpersonal communication, in which all of its relevant aspects would be paid the attention they deserve, is not new. However, it gained more influence in the twentieth century with the contributions of great philosophers of language, psycho- and sociolinguistics and other researchers interested both in interpersonal and man-machine communication. Also the impact of technology cannot be overestimated. It created demand for formal models of human behaviour, simultaneously providing means for capturing and analysing it.

“…In [46], prosody is used to segment utterances. The duration, pause, F0-contour and energy features are used in [13] and [47].…”

Section: Related Work (mentioning)

confidence: 99%

Automatic dialogue act recognition with syntactic features

Král

Cerisara

2014

Lang Resources & Evaluation

This work studies the usefulness of syntactic information in the context of automatic dialogue act recognition in Czech. Several pieces of evidence are presented in this work that support our claim that syntax might bring valuable information for dialogue act recognition. In particular, a parallel is drawn with the related domain of automatic punctuation generation and a set of syntactic features derived from a deep parse tree is further proposed and successfully used in a Czech dialogue act recognition system based on Conditional Random Fields. We finally discuss the possible reasons why so few works have exploited this type of information before and propose future research directions to further progress in this area.
