Interspeech 2016
DOI: 10.21437/interspeech.2016-118

Prediction and Generation of Backchannel Form for Attentive Listening Systems

Abstract: In human-human dialogue, especially in attentive listening such as counseling, backchannels are important not only for smooth communication but also for establishing rapport. Despite several studies on when to backchannel, most of the current spoken dialogue systems generate the same pattern of backchannels, giving monotonous impressions to users. In this work, we investigate generation of a variety of backchannel forms according to the dialogue context. We first show the feasibility of choosing appropriate ba…

Cited by 50 publications
(32 citation statements)
References 13 publications
“…It is well-known that backchannels and fillers are related with the turn-taking behavior [1, 2], and prediction of these events, particularly backchannels, has been intensively studied using many kinds of features and machine learning techniques [3,4]. Recently, neural network models such as LSTM are introduced to the problem of turn-taking [5,6].…”
Section: Take a Turn By Using Fillers (mentioning)
confidence: 99%
“…Although not directly comparable, the performance of the method proposed outperforms the existing ones, in simpler settings of classification. For example, Kawahara et al [9] reported precision and recall values of 0.643 to predict, using more complex linguistic and prosodic features for five classes. Meena et al [11] obtained 84.64% of accuracy in a binary classification (feedback or not) in an artificial task, using a large set of prosodic, syntactic and contextual features.…”
Section: Discussion (mentioning)
confidence: 99%
“…"mhm" or head nodes). In [9] logistic regression was applied to predict verbal feedback in the context of simulations of counseling sessions (n=8), using prosody and linguistic features from the dialogues, in a 4 binary-classes approach after the end of each IPU (accuracy: 64.3%, precision, recall, and F1-score: 0.643), with a low recall for verbal feedbacks. Ruede et al [17] applied LSTM networks to detect feedbacks based on acoustic features (power and pitch), in different time windows, in the context of telephone conversations (n=2348), with best results of precision=0.305 and recall=0.488 (F1-score: 0.375).…”
Section: Introduction (mentioning)
confidence: 99%
“…In the context of neuropsychological interviewing, the choice of lexical backchannel items and their frequencies, as well as the prosodic contour, have been shown to relate to the perceived interviewee's performance [12]. Similarly, an effect on naturalness, empathy, and understanding has been found when considering dialogue context and form of backchannels [13].…”
Section: Feedback Tokens (mentioning)
confidence: 98%
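
Several of the statements above quote precision, recall, and F1-score together. As a quick sanity check on those figures, the sketch below recomputes F1 as the harmonic mean of precision and recall, using only the numbers quoted in the statements. The function name `f1_score` is a hypothetical helper written for this illustration; it is not taken from any of the cited papers.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted for Ruede et al. in the Introduction statement above.
ruede_f1 = f1_score(0.305, 0.488)
print(round(ruede_f1, 3))  # 0.375, matching the quoted F1-score

# When precision and recall are equal, F1 equals that same value,
# which is consistent with the 0.643 figures quoted for Kawahara et al.
kawahara_f1 = f1_score(0.643, 0.643)
print(round(kawahara_f1, 3))  # 0.643
```

The check only confirms that the quoted numbers are internally consistent; it says nothing about how each paper averaged its scores across classes.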