A Spoken Dialog System for Chat-Like Conversations Considering Response Timing

Nishimura, Ryota; Kitaoka, Norihide; Nakagawa, Seiichi

doi:10.1007/978-3-540-74628-7_77

Cited by 23 publications

(20 citation statements)

References 14 publications

“…Nishimura et al [31] present a unimodal decision-tree approach for producing backchannels based on prosodic features. The system analyzes speech in 100 ms intervals and generates backchannels as well as other paralinguistic cues (e.g., turn taking) as a function of pitch and power contours.…”

Section: Previous Work (mentioning)

confidence: 99%

A probabilistic multimodal approach for predicting listener backchannels

Morency

Kok

Gratch

2009

Auton Agent Multi-Agent Syst

121

During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.

Keywords: Listener backchannel feedback · Nonverbal behavior prediction · Sequential probabilistic model · Conditional random field · Head nod · Multimodal

“…Kitaoka et al used first-order regression coefficients of pitch and power contours to describe patterns and generate response timing [29]. Nishimura et al pointed out that both the last short regions and the longer ones contained information which triggered backchannel responses [30]. Thus, first-order regression coefficients of pitch and power contours in both the last 90ms and the last 500ms of the utterances are adopted to represent the changing trend of prosody.…”

Section: Prosodic Features (mentioning)

confidence: 99%

Backchannel Prediction for Mandarin Human-Computer Interaction

Mao

Peng

Xue

et al. 2015

IEICE Trans. Inf. & Syst.

Summary: In recent years, researchers have tried to create unhindered human-computer interaction by giving virtual agents human-like conversational skills. Predicting backchannel feedback for agent listeners has become a novel research hot-spot. The main goal of this paper is to identify appropriate features and methods for backchannel prediction in Mandarin conversations. Firstly, multimodal Mandarin conversations are recorded for the analysis of backchannel behaviors. In order to eliminate individual differences in the original face-to-face conversations, more backchannels from different listeners are gathered together. These data confirm that backchannels occurring in the speakers' pauses form a vast majority in Mandarin conversations. Both prosodic and visual features are used in backchannel prediction. Four types of models based on the speakers' pauses are built by using support vector machine classifiers. An evaluation of the pause-based prediction model has shown relatively high accuracy in consideration of the optional nature of backchannel feedback. Finally, the results of the subjective evaluation validate that conversations between humans and virtual listeners using backchannels predicted by the proposed models are more unhindered than those using other backchannel prediction methods.

Key words: human-computer interaction, virtual agent, backchannel, Mandarin, support vector machine
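
The citation statement for this entry describes prosody by first-order regression coefficients of the pitch and power contours over the last 90 ms and the last 500 ms of an utterance. A first-order regression coefficient is the slope of a least-squares line, so the feature can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the 10 ms frame shift, and the assumption that the pitch contour has already been interpolated over unvoiced frames are all hypothetical.

```python
from typing import Dict, Sequence


def regression_slope(values: Sequence[float], frame_shift_s: float) -> float:
    """Least-squares slope (units per second) of a uniformly sampled contour."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames to fit a line")
    times = [i * frame_shift_s for i in range(n)]
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def prosodic_trend_features(
    pitch: Sequence[float],
    power: Sequence[float],
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift (not from the source)
    short_s: float = 0.09,        # last 90 ms, as in the citation statement
    long_s: float = 0.5,          # last 500 ms, as in the citation statement
) -> Dict[str, float]:
    """Slopes of the pitch and power contours over the short and long windows."""
    if len(pitch) != len(power):
        raise ValueError("pitch and power contours must have the same length")
    short_n = round(short_s / frame_shift_s)
    long_n = round(long_s / frame_shift_s)
    if len(pitch) < long_n:
        raise ValueError("utterance is shorter than the long window")
    return {
        "pitch_short": regression_slope(pitch[-short_n:], frame_shift_s),
        "pitch_long": regression_slope(pitch[-long_n:], frame_shift_s),
        "power_short": regression_slope(power[-short_n:], frame_shift_s),
        "power_long": regression_slope(power[-long_n:], frame_shift_s),
    }
```

The four slopes would then be passed, together with any other features, to whatever classifier is in use; the abstract above mentions support vector machines for that step.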

“…The following features are used to implement the process above [26]: duration from the start of the user's preceding utterance; elapsed time from the end of the previous user utterance; elapsed time from the end of the previous system utterance; pitch/energy contour of the last 100 ms (consisting of three values); pitch/energy contour of the last 500 ms (consisting of five values); attribute of the last word in the last recognition results (or current intermediate hypothesis).…”

Section: Response Timing Generation (mentioning)

confidence: 99%

A spoken dialog system for spontaneous conversations considering response timing and response type

Nishimura

Nakagawa

2010

IEEJ Transactions Elec Engng

Self Cite

If a spoken dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a spoken dialog system that emulates human behavior in a dialog. The proposed system makes use of a decision tree to generate responses at the appropriate times. These responses include 'aizuchi' (back-channel), 'repetition', 'collaborative completion', etc. At each time interval, the decision tree generates the response timing features referring to the pitch and energy contours, recognition hypotheses, and the preparation status of the response generator. A subjective evaluation shows that there is a high degree of naturalness in the timing of ordinary responses and aizuchi, and that the spoken dialog system exhibits user-friendly behavior. The recorded voice system was preferred to a text-to-speech system (synthesized speech), and almost all subjects felt familiarity with the aizuchi.
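
The feature list in the citation statement for this entry, and the per-interval decision described in the abstract, can be pictured with the sketch below. It is only an illustration under stated assumptions: the field names, the reading of "three values" and "five values" as applying to pitch and energy separately, the action labels, and every threshold are hypothetical, and the hand-written rules merely stand in for a decision tree that the paper learns from data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimingFeatures:
    """One snapshot of the listed features, taken at a single time interval."""
    utterance_duration_s: float   # from the start of the user's preceding utterance
    since_user_end_s: float       # elapsed time since the previous user utterance ended
    since_system_end_s: float     # elapsed time since the previous system utterance ended
    pitch_contour_100ms: Tuple[float, float, float]
    energy_contour_100ms: Tuple[float, float, float]
    pitch_contour_500ms: Tuple[float, float, float, float, float]
    energy_contour_500ms: Tuple[float, float, float, float, float]
    last_word_attribute: str      # attribute of the last recognized word


def toy_decision(f: TimingFeatures) -> str:
    """Hand-written stand-in for a learned decision tree (placeholder thresholds)."""
    pitch_falling = f.pitch_contour_100ms[-1] < f.pitch_contour_100ms[0]
    if f.since_system_end_s < 0.5:
        return "wait"      # the system has only just finished speaking
    if pitch_falling and f.since_user_end_s >= 0.2:
        return "respond"   # falling pitch followed by a pause
    if pitch_falling and f.utterance_duration_s >= 1.0:
        return "aizuchi"   # falling pitch inside a long utterance
    return "wait"
```

In a running system a function like `toy_decision` would be called once per time interval with a fresh snapshot, and any result other than "wait" would trigger the corresponding response.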
