ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413876

Compact Graph Architecture for Speech Emotion Recognition

Abstract: We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we propose to model speech signal as a cycle graph or a line graph. Such graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution in contrast to the approximate convolution used in standard GCNs. We evaluated the perf…
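The abstract's central idea, treating the frames of an utterance as nodes of a cycle or line graph so that graph convolution can be computed exactly in the spectral domain, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cycle_adjacency`, `line_adjacency`, `spectral_filter`), the unnormalized Laplacian L = D - A, the reading of "line graph" as a chain of consecutive frames, and the example low-pass response are all assumptions made for illustration only.

```python
import numpy as np


def cycle_adjacency(n):
    """Adjacency of an undirected cycle: frame i links to i-1 and i+1 (mod n)."""
    if n < 3:
        raise ValueError("a cycle graph needs at least 3 nodes")
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1.0
    adj[(idx + 1) % n, idx] = 1.0
    return adj


def line_adjacency(n):
    """Adjacency of an undirected chain: frame i links to i+1, with no wrap-around."""
    if n < 2:
        raise ValueError("a line graph needs at least 2 nodes")
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A (an illustrative choice)."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_filter(adj, x, response):
    """Exact spectral graph convolution: U diag(h(lambda)) U^T x.

    adj      : (n, n) symmetric adjacency matrix
    x        : (n, d) node features, one row per frame
    response : callable mapping the eigenvalue array to filter gains
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian(adj))
    x_hat = eigvecs.T @ x                      # graph Fourier transform
    return eigvecs @ (response(eigvals)[:, None] * x_hat)


# Hypothetical usage: 8 frames with 4 features each, smoothed by a low-pass response.
frames = np.random.default_rng(0).normal(size=(8, 4))
smoothed = spectral_filter(cycle_adjacency(8), frames, lambda lam: np.exp(-lam))
```

Because the graph structure is fixed and simple, the eigendecomposition depends only on the number of frames, so it can be computed once and reused; that is the sense in which the convolution here is exact rather than a polynomial approximation.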

Citations: cited by 38 publications (40 citation statements)
References: 25 publications
“…Works that use graphs to learn audio representation is limited, but steadily increasing. In a recent work, we have shown that graphs can be used to model audio samples effectively, leading to light-weight yet accurate models for emotion recognition in speech [18]. This work used a simple cycle and line graph to describe a given audio data sample.…”
Section: B Graph Neural Network In Audio (mentioning)
confidence: 99%
“…
Ours w/o SSL      70.5   66.7
SegCNN [41]       64.5   -
GA-GRU [42]       63.8   55.4
CNNattn [43]      66.7   -
WADAN [44]        64.5   -
SpeechGCN [5]     62.3   57.8
…”
Section: Fully Supervised (mentioning)
confidence: 99%
“…Considering each audio sample as a node in a graph, we cast audio classification as a node labeling task. The motivation behind adopting a graph approach is two-fold: (i) It leads to compact models as compared to commonly used recurrent speech models as noted in recent works [5], [6]; (ii) A graph structure, if properly constructed, can efficiently capture the relationship between the small number of available labeled nodes and a larger number of unlabeled nodes. Extensive experiments with standard benchmarks brings out the advantages [A. Shirian is with the Department of Computer Science, University of Warwick, UK.]…”
Section: Introduction (mentioning)
confidence: 99%
“…However, works that use graph approach to learning audio representation is limited. We are aware of only one recent work where an audio sequence has been considered as a line graph to exploit graph signal processing theory to achieve accurate spectral graph convolution [5].…”
Section: Introduction (mentioning)
confidence: 99%