Text, Speech and Dialogue
DOI: 10.1007/978-3-540-74628-7_77

A Spoken Dialog System for Chat-Like Conversations Considering Response Timing

Cited by 23 publications (20 citation statements)
References 14 publications
“…Nishimura et al [31] present a unimodal decision-tree approach for producing backchannels based on prosodic features. The system analyzes speech in 100 ms intervals and generates backchannels as well as other paralinguistic cues (e.g., turn taking) as a function of pitch and power contours.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Kitaoka et al used first-order regression coefficients of pitch and power contours to describe patterns and generate response timing [29]. Nishimura et al pointed out that both the last short regions and the longer ones contained information which triggered backchannel responses [30]. Thus, first-order regression coefficients of pitch and power contours in both the last 90ms and the last 500ms of the utterances are adopted to represent the changing trend of prosody.…”
Section: Prosodic Features (mentioning)
confidence: 99%
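
The statement above describes prosodic trend as first-order regression coefficients of the pitch and power contours over the last 90 ms and the last 500 ms of an utterance. A minimal sketch of that computation follows. It treats the first-order regression coefficient as the least-squares slope of the contour against time, and it assumes a fixed 10 ms frame period and a gap-free contour (no unvoiced frames). The function names and the frame period are illustrative assumptions and are not taken from the cited papers.

```python
from typing import Dict, Sequence


def regression_slope(values: Sequence[float], frame_ms: float) -> float:
    """Least-squares slope of the values against time, in units per millisecond."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames to fit a line")
    times = [i * frame_ms for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def tail_slope(contour: Sequence[float], window_ms: float,
               frame_ms: float = 10.0) -> float:
    """Slope over the final window_ms of a contour sampled every frame_ms."""
    # Assumption: the window is rounded to a whole number of frames.
    frames = max(2, round(window_ms / frame_ms))
    return regression_slope(contour[-frames:], frame_ms)


def prosodic_trend_features(pitch: Sequence[float], power: Sequence[float],
                            frame_ms: float = 10.0) -> Dict[str, float]:
    """Short-window and long-window slopes for both contours (hypothetical layout)."""
    return {
        "pitch_slope_90ms": tail_slope(pitch, 90, frame_ms),
        "pitch_slope_500ms": tail_slope(pitch, 500, frame_ms),
        "power_slope_90ms": tail_slope(power, 90, frame_ms),
        "power_slope_500ms": tail_slope(power, 500, frame_ms),
    }
```

Calling `prosodic_trend_features(pitch, power)` on two equal-rate contours returns four slopes. A real front end would also need to decide how to handle unvoiced frames, which this sketch leaves out.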
“…The following features are used to implement the process above 26. Duration from the start of the user's preceding utterance; Elapsed time from the end of the previous user utterance; Elapsed time from the end of the previous system utterance; Pitch/energy contour of the last 100 ms (consisting of three values); Pitch/energy contour of the last 500 ms (consisting of five values); Attribute of the last word in the last recognition results (or current intermediate hypothesis). …”
Section: Response Timing Generation (mentioning)
confidence: 99%
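
The feature list in the statement above can be pictured as a single record assembled at each decision point. The sketch below is one possible layout, not the cited system's implementation. The statement does not say how a contour is reduced to three or five values, so the sketch assumes the mean of equal-length sub-windows and a 10 ms frame period. All names (`TimingFeatures`, `summarize_tail`, `build_timing_features`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


def summarize_tail(contour: Sequence[float], window_ms: float, n_values: int,
                   frame_ms: float = 10.0) -> List[float]:
    """Reduce the last window_ms of a contour to n_values segment means.

    Assumption: segment means stand in for the unspecified reduction.
    """
    frames = max(n_values, round(window_ms / frame_ms))
    tail = list(contour[-frames:])
    if len(tail) < n_values:
        raise ValueError("contour is shorter than the number of requested values")
    # Integer boundaries that split the tail into n_values non-empty segments.
    bounds = [k * len(tail) // n_values for k in range(n_values + 1)]
    return [sum(tail[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


@dataclass
class TimingFeatures:
    """One feature record, mirroring the list in the quoted statement."""
    since_user_start_ms: float       # duration from start of user's preceding utterance
    since_user_end_ms: float         # elapsed time from end of previous user utterance
    since_system_end_ms: float       # elapsed time from end of previous system utterance
    pitch_100ms: List[float]         # three values
    energy_100ms: List[float]        # three values
    pitch_500ms: List[float]         # five values
    energy_500ms: List[float]        # five values
    last_word_attribute: str         # from the latest (or intermediate) recognition result


def build_timing_features(pitch: Sequence[float], energy: Sequence[float],
                          since_user_start_ms: float, since_user_end_ms: float,
                          since_system_end_ms: float,
                          last_word_attribute: str) -> TimingFeatures:
    """Assemble the record from raw contours and timing values."""
    return TimingFeatures(
        since_user_start_ms=since_user_start_ms,
        since_user_end_ms=since_user_end_ms,
        since_system_end_ms=since_system_end_ms,
        pitch_100ms=summarize_tail(pitch, 100, 3),
        energy_100ms=summarize_tail(energy, 100, 3),
        pitch_500ms=summarize_tail(pitch, 500, 5),
        energy_500ms=summarize_tail(energy, 500, 5),
        last_word_attribute=last_word_attribute,
    )
```

A record like this could then be passed to whatever classifier decides the response timing. The choice of classifier and the encoding of the last-word attribute are left open here because the statement does not specify them.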