Interspeech 2017
DOI: 10.21437/interspeech.2017-1606

Enhancing Backchannel Prediction Using Word Embeddings

Abstract: Backchannel responses like "uh-huh", "yeah", "right" are used by the listener in a social dialog as a way to provide feedback to the speaker. In the context of human-computer interaction, these responses can be used by an artificial agent to build rapport in conversations with users. In the past, multiple approaches have been proposed to detect backchannel cues and to predict the most natural timing to place those backchannel utterances. Most of these are based on manually optimized fixed rules, which may fail…

Cited by 20 publications (38 citation statements)
References 10 publications (13 reference statements)
“…It included 246 Cantonese conversations between trained assessors and older adult participants (171 males and 75 females), each being approximately 30 minutes in duration, and with hand transcriptions aligned at the word-level with speech. The scale of this dataset was comparable with some of the largest datasets used for backchanneling studies, such as SwDA [65,77,78]. Participants in this MoCA dataset were aged between 77 and 94, with an average age of 82.9.…”
Section: Discussion (mentioning)
confidence: 99%
“…Previous studies formulated the problem in different ways, e.g., focusing on prediction and choosing the same backchannels [34]; bundling multiple binary classifiers to predict different types of backchannels and giving corresponding actions [35]; training a multi-class classifier to predict and act at the same time [35]. Various machine learning algorithms have been applied, such as locally weighted linear regression [87], Hidden Markov Model (HMM) [58,59], Support Vector Machines [52], Long Short-Term Memory networks [28,34,77,78] and hybrid time-delay neural network (TDNN)/HMM system [65]. Feature engineering is another important component for ML-based methods.…”
Section: Model-based Backchanneling Using Machine Learning Algorithms (mentioning)
confidence: 99%
“…We now refined this approach by evaluating different methods to add timed word embeddings via word2vec. Comparing the performance using various feature combinations, we observed that adding linguistic features improved the performance over a prediction system that only uses acoustic features [8]. The most commonly used acoustic features in related research are fast and slow voice pitch slopes and pauses of varying lengths.…”
Section: Non-verbal Cues As Feedback For the User (mentioning)
confidence: 91%
“…Prior work predicting initiation points uses prosodic features like pitch and frequency variation with bag-of-embeddings to predict backchannels (Ruede et al., 2017a) and turn-completion (Skantze, 2017), and more recently, Ekstedt and Skantze (2021) finetuned GPT-2 on dialogue datasets to predict turn-completion using only word features.…”
[Figure 1 caption, interleaved in the excerpt: Humans produce overlapping speech with small gaps. By predicting lead to initiation, virtual agents can respond without long waiting periods.]
Section: Introduction (mentioning)
confidence: 99%