Interspeech 2016
DOI: 10.21437/interspeech.2016-1124

At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech

Abstract: Recognition of natural emotion in speech is a challenging task. Different methods have been proposed to tackle this complex task, such as acoustic feature brute-forcing or even end-to-end learning. Recently, bag-of-audio-words (BoAW) representations of acoustic low-level descriptors (LLDs) have been employed successfully in the domain of acoustic event classification and other audio recognition tasks. In this approach, feature vectors of acoustic LLDs are quantised according to a learnt codebook of audio words.…
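
The quantisation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract is truncated here, so several details are assumptions: the codebook is learnt with a few k-means iterations, each LLD frame is assigned to its single nearest audio word, and the assignments are counted into a histogram, as is usual for bag-of-words representations. The function names (`learn_codebook`, `nearest_word`, `bag_of_audio_words`, `random_frames`) are hypothetical, and the frames are random stand-ins for real LLDs.

```python
import random


def nearest_word(frame, codebook):
    """Return the index of the audio word closest to one LLD frame."""
    def sq_dist(word):
        # Squared Euclidean distance between the frame and one audio word.
        return sum((f - w) ** 2 for f, w in zip(frame, word))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def learn_codebook(frames, num_words, num_iters=10, seed=0):
    """Learn audio words with a few k-means iterations (an assumed choice)."""
    rng = random.Random(seed)
    # Initialise the audio words from randomly chosen training frames.
    codebook = [list(f) for f in rng.sample(frames, num_words)]
    for _ in range(num_iters):
        clusters = [[] for _ in range(num_words)]
        for frame in frames:
            clusters[nearest_word(frame, codebook)].append(frame)
        for k, members in enumerate(clusters):
            if members:  # keep the old word if its cluster is empty
                codebook[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def bag_of_audio_words(frames, codebook):
    """Count how often each audio word is assigned within one utterance."""
    histogram = [0] * len(codebook)
    for frame in frames:
        histogram[nearest_word(frame, codebook)] += 1
    return histogram


def random_frames(rng, num_frames, dim):
    """Generate toy frames; real input would be extracted acoustic LLDs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]


rng = random.Random(42)
training_frames = random_frames(rng, 200, 13)   # 200 frames, 13 dimensions
codebook = learn_codebook(training_frames, num_words=8)

utterance_frames = random_frames(rng, 50, 13)   # one utterance of 50 frames
histogram = bag_of_audio_words(utterance_frames, codebook)
print(len(histogram), sum(histogram))
```

The histogram has one bin per audio word and its counts sum to the number of frames in the utterance, so utterances of different lengths map to vectors of the same dimension. In practice the counts are often normalised before being passed to a classifier; whether and how the paper does so is not visible in the truncated abstract.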

Cited by 115 publications (95 citation statements)
References 18 publications (19 reference statements)
“…The bag-of-words (BoW) approach is known from natural language processing [30], which can be referred to the early description in [12]. Particularly, in the application of speech emotion recognition [21,26], and the area of health care [13,25], the BoAW approach achieved numerous excellent results. Motivated by the success of the BoAW method in aforementioned studies, we propose the bag-of-behaviour-words (BoBW) approach.…”
Section: Bag-of-behaviour-words Approach
type: mentioning
confidence: 99%
“…Emotion recognition from audiovisual signals usually relies on feature sets whose extraction is based on expertise gained over several decades of research in the domains of speech processing, e.g., Mel Frequency Cepstral Coefficients (MFCCs), and vision computing, e.g., Facial Action Units (FAUs). However, recent advances in the field of representation learning, whose objective is to learn representations of data that are best suited for the recognition task [6], have shown that efficient representations of audiovisual signals can be learnt in the context of emotion [2,59,71].…”
Section: Baseline Features
type: mentioning
confidence: 99%
“…Further, we will look into the data imbalance effects of the database and how this could possibly improve robustness. Moreover, we will combine LSTM and GRU networks on the recently proposed Bag-of-Audio-Words approach [30]. Finally, we also plan to do a full end-to-end training of the combined feature and posterior models and examine other network architectures, such as variants of the LSTM models or Convolutional Neural Networks.…”
Section: Discussion
type: mentioning
confidence: 99%