“…These works and others often focus on a robot's ability to handle a specific aspect of multi-party interaction: receiving and responding to multiple requests [15,26], group detection [28,29], speech recognition [10,11], gesture generation [18], body orientation generation [30], gaze generation [4], etc. Relevant studies in multi-party turn-taking [3,14] use hand-crafted features (e.g., whether someone is speaking, head pose, prosody) to decide when the robot should take a turn, but do not incorporate the content of the speech. The closest multi-party work to ours, [15], uses manually labeled human-human and human-robot data to learn low-level submodules governing how a bartender robot should interact with multiple customers (e.g., classifying user engagement or producing pre-defined utterances).…”