One key challenge in creating believable embodied conversational agents (ECAs) is producing engaging behavior, and feedbacks (short verbal, vocal, and gestural reactions produced while listening to the main speaker) play an important role. In this paper we propose a machine-learning-based model of multimodal feedbacks. The goal is to learn, from a corpus of human-human interactions, when a virtual agent should display a feedback and of which type. For the approach to be feasible, it must also operate in real time, using reliably extractable features. To this end, we trained random forests on different feature sets, using annotated corpora of task-oriented interactions. Our case study is the training of doctors in breaking bad news to a patient (played by an actor or by the ECA). The performance of the method highlights its capacity to predict verbal and non-verbal feedbacks from a small number of features characterizing temporal information, in particular silence and the position of the last feedback.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Intelligent agents; Feature selection.
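As a rough illustration only (this is not the authors' pipeline, features, or data), the following sketch shows how a random-forest feedback predictor of this kind could be set up with scikit-learn, assuming two hypothetical temporal features (current silence duration and time since the agent's last feedback) and synthetic feedback-type labels:

```python
# Minimal sketch, not the paper's actual code: a random forest predicting
# whether and which feedback to produce from two temporal cues of the kind
# the abstract names (silence and position of the last feedback).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training data: one row per dialogue time step.
# Columns: [silence_duration_s, time_since_last_feedback_s]
X = rng.uniform(0.0, 5.0, size=(1000, 2))
# Illustrative labels: 0 = no feedback, 1 = verbal, 2 = non-verbal.
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
# Feature importances indicate which temporal cue the forest relies on most.
print("feature importances:", clf.feature_importances_)
```

At run time, the same two features can be recomputed at each time step and fed to `clf.predict`, which is what makes a small, temporal feature set attractive for real-time use.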