Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue - SIGdial '08 2008
DOI: 10.3115/1622064.1622066

Optimizing endpointing thresholds using dialogue features in a spoken dialogue system

Abstract: This paper describes a novel algorithm to dynamically set endpointing thresholds based on a rich set of dialogue features to detect the end of user utterances in a dialogue system. By analyzing the relationship between silences in user's speech to a spoken dialogue system and a wide range of automatically extracted features from discourse, semantics, prosody, timing and speaker characteristics, we found that all features correlate with pause duration and with whether a silence indicates the end of the turn, wi…

Cited by 58 publications (62 citation statements)
References 21 publications
“…In other words, the system bases its turn-taking decisions on a combination of ASR, prosody and silence thresholds, where the length of the threshold differs for different prosodic signals, and where reactions are planned already during the silence. (This is in contrast to Raux & Eskenazi (2008), where context-dependent thresholds are used as well, but only simple end-pointing is performed.)…”
Section: Prosodic Analysis (mentioning)
confidence: 99%
“…Speakers appear to use other knowledge sources, such as prosody, syntax and semantics to detect or even project the end of the utterance. Attempts have been made to incorporate such knowledge sources for turn-taking decisions in spoken dialogue systems (e.g., Ferrer et al, 2002; Raux & Eskenazi, 2008). To do so, incremental dialogue processing is clearly needed.…”
Section: Motivations and Related Work (mentioning)
confidence: 99%
“…While this model simplifies processing, it fails to account for many aspects of human-human interaction such as hesitations, turn-taking with very short gaps or brief overlaps and backchannels in the middle of utterances (Heldner & Edlund, 2010). More advanced models for turn-taking have been presented, where the system interprets syntactic and prosodic cues to make continuous decisions on when to take the turn or give feedback, resulting in both faster response time and less interruptions (Raux & Eskenazi, 2008; Skantze & Schlangen, 2009; Meena et al, 2014).…”
Section: Turn-taking In Dialogue Systems (mentioning)
confidence: 99%
“…The end of a request is determined by a condition EndTurnCond. It can be a long enough silence (Raux and Eskenazi, 2008; Wlodarczak and Wagner, 2013) in the case of vocal services or a carriage return for text systems. A dialogue turn is the time interval during which the user sends a request to the system and gets a response.…”
Section: The Traditional Architecture (mentioning)
confidence: 99%
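
The silence-based end-of-turn condition described in the last statement can be sketched roughly as follows. This is an illustration only, not the method of any cited paper: the function name `end_turn_cond`, the input representation (one boolean per fixed-length audio frame), the 10 ms frame length and the 700 ms default threshold are all assumptions made for the example.

```python
from typing import Optional, Sequence


def end_turn_cond(
    is_speech: Sequence[bool],
    threshold_ms: float = 700.0,  # assumed value, not taken from the source
    frame_ms: float = 10.0,       # assumed frame length
) -> Optional[int]:
    """Return the index of the frame at which the turn is declared over.

    The turn ends once at least one speech frame has been seen and the
    silence following the latest speech reaches `threshold_ms`.
    Returns None if that never happens within the given frames.
    """
    silence_ms = 0.0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_ms = 0.0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= threshold_ms:
                return i
    return None
```

A context-dependent variant, in the spirit of the statements above, would call the same function with a different `threshold_ms` depending on the dialogue state; how that value is chosen is outside the scope of this sketch.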