Marie Meteer scite author profile

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DIS-AGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.

show abstract

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

Shriberg

et al. 1998

View full text Add to dashboard Cite

Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information--either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.

show abstract

Automatic detection of discourse structure for speech recognition and understanding

Jurafsky

Bates

Coccaro

et al.

View full text Add to dashboard Cite

, Carol Van Ess-Dykema (DoD) We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 'Dialog Acts' (DAs), (question, answer, backchannel, agreement, disagreement, apology, etc). We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of human-to-human telephone conversations with these 42 types and trained a Dialog Act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act, prosodic features which characterize a dialog act, and a statistical Discourse Grammar. Our combined detector, although still in preliminary stages, already achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72% accuracy based on word transcripts. Using this detector to switch among the 42 Dialog-Act-Specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error.

show abstract

Analysis of Errors in Dictated Clinical Documents Assisted by Speech Recognition Software and Professional Transcriptionists

et al. 2018

View full text Add to dashboard Cite

Key Points Question How accurate are dictated clinical documents created by speech recognition software, edited by professional medical transcriptionists, and reviewed and signed by physicians? Findings Among 217 clinical notes randomly selected from 2 health care organizations, the error rate was 7.4% in the version generated by speech recognition software, 0.4% after transcriptionist review, and 0.3% in the final version signed by physicians. Among the errors at each stage, 15.8%, 26.9%, and 25.9% involved clinical information, and 5.7%, 8.9%, and 6.4% were clinically significant, respectively. Meaning An observed error rate of more than 7% in speech recognition–generated clinical documents demonstrates the importance of manual editing and review.

show abstract

A clinician survey of using speech recognition for clinical documentation in the electronic health record

Goss

Blackley

Ortega

et al. 2019

International Journal of Medical Informatics

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Marie Meteer

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

Automatic detection of discourse structure for speech recognition and understanding

Analysis of Errors in Dictated Clinical Documents Assisted by Speech Recognition Software and Professional Transcriptionists

A clinician survey of using speech recognition for clinical documentation in the electronic health record

Contact Info

Product

Resources

About