Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Com 2009
DOI: 10.3115/1620754.1620846

A finite-state turn-taking model for spoken dialog systems

Abstract: This paper introduces the Finite-State Turn-Taking Machine (FSTTM), a new model to control the turn-taking behavior of conversational agents. Based on a non-deterministic finite-state machine, the FSTTM uses a cost matrix and decision-theoretic principles to select a turn-taking action at any time. We show how the model can be applied to the problem of end-of-turn detection. Evaluation results on a deployed spoken dialog system show that the FSTTM provides significantly higher responsiveness than previous appro…
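
To make the abstract's "cost matrix and decision-theoretic principles" concrete, here is a minimal sketch of expected-cost action selection. It is not the paper's actual model: the two floor states, the two actions, and every cost value below are hypothetical and chosen only for illustration. The idea it shows is general: given a belief over who holds the floor, pick the action whose expected cost is lowest.

```python
from typing import Dict

# Hypothetical action names; the abstract does not list the paper's
# actual states or actions.
ACTIONS = ("wait", "grab")

# Hypothetical cost matrix, COST[action][state]. Grabbing the floor while
# the user is still speaking is costly (an interruption); waiting while
# the floor is free is mildly costly (added latency).
COST: Dict[str, Dict[str, float]] = {
    "wait": {"user_speaking": 0.0, "floor_free": 1.0},
    "grab": {"user_speaking": 5.0, "floor_free": 0.0},
}


def expected_cost(action: str, belief: Dict[str, float]) -> float:
    """Expected cost of an action under a probability distribution over states."""
    return sum(p * COST[action][state] for state, p in belief.items())


def select_action(belief: Dict[str, float]) -> str:
    """Return the action with the lowest expected cost."""
    return min(ACTIONS, key=lambda a: expected_cost(a, belief))


if __name__ == "__main__":
    # Mid-utterance pause: the user probably still holds the floor.
    print(select_action({"user_speaking": 0.9, "floor_free": 0.1}))
    # Long silence: the floor is probably free.
    print(select_action({"user_speaking": 0.1, "floor_free": 0.9}))
```

With these illustrative costs, the system waits while it believes the user is probably still speaking and grabs the floor once that belief drops far enough, which is the kind of trade-off between interruptions and latency that end-of-turn detection has to make.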

Citations: cited by 84 publications (75 citation statements)
References: 16 publications
“…More recent work on engagement with virtual agents uses more elaborate turn-taking models and supports multiparty conversation (Bohus & Horvitz, 2010). Research in spoken dialog systems also attempts to control the timing of turn-taking over the single modality of speech (Raux & Eskenazi, 2009). Although some results on cue usage in unembodied systems can generalize to robots, the timing of controlling actions on embodied machines differs substantially from that of virtual systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…The work in (Raux & Eskenazi, 2009) and (Nakano et al, 2005) are examples of dialogue systems in which speech interruptions in particular are supported. Interruption has also been addressed more indirectly through an approach of behavior switching (Kanda, Ishiguro, Imai, & Ono, 2004).…”
Section: Action Atomicity In Reciprocal Interaction (mentioning)
confidence: 99%
“…Experiments show that contours of loudness, approximated by normalized per-frame log-energy, should be concatenated with speech activity trajectories in feature space rather than in model space (as in [6]), in order to give models the opportunity to leverage cross-stream correlations; it appears that the most relevant information is found in audio frames which are both speech and very quiet. The absolute reduction in average cross entropy obtained using this approach, on unseen data consisting of 200 telephone conversations, is 0.031 bits per 100 ms frame of audio, a large improvement when compared to past research [7,8].…”
Section: What Is the Likely Impact Of The Observed Average Cross En… (mentioning)
confidence: 99%
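
The statement above reports model quality as average cross entropy in bits per 100 ms frame. As a rough illustration of that unit only, and not of the cited authors' models or data, the sketch below computes the average number of bits a model spends per frame, given the probability it assigned to what was actually observed in each frame. The function name and the input values are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy_bits_per_frame(probs_of_observed: Sequence[float]) -> float:
    """Average negative log2 probability the model gave to each observed frame.

    probs_of_observed[i] is the (hypothetical) probability the model assigned
    to the outcome that actually occurred in frame i, e.g. speech vs. silence.
    """
    if not probs_of_observed:
        raise ValueError("need at least one frame")
    return -sum(math.log2(p) for p in probs_of_observed) / len(probs_of_observed)


if __name__ == "__main__":
    # A coin-flip model (p = 0.5 for every frame) costs exactly 1 bit per frame.
    print(cross_entropy_bits_per_frame([0.5, 0.5, 0.5]))
    # A more confident, correct model costs fewer bits per frame.
    print(cross_entropy_bits_per_frame([0.9, 0.8, 0.95]))
```

Lower values mean the model predicts the observed frames better.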
“…Although studied for many decades [2,3,4,5,6,7,8], these models continue to exhibit an important limitation: their implementation as N-grams circumscribes their direct applicability to only discrete-valued representations of conditioning context. This limitation has made it hard to study the impact of quantities which are continuous-valued (e.g., loudness or pitch), independently of higher-level linguistic landmarks or assumptions.…”
Section: Introduction (mentioning)
confidence: 99%
“…More recently, however, work on incremental systems has shown that processing smaller 'chunks' of user input can improve the user experience by providing faster responses and allow more flexibility in turn-taking (Purver and Otsuka, 2003; Skantze and Hjalmarsson, 2010; Raux and Eskenazi, 2009; Dethlefs et al, 2012b). Incrementality in spoken dialogue systems enables the system designer to model several dialogue phenomena that play a vital role in human conversation (Levelt, 1989), but have so far been absent from most systems.…”
Section: Introduction (mentioning)
confidence: 99%