2021
DOI: 10.48550/arxiv.2110.03435
Preprint
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Abstract: Detecting emotions directly from a speech signal plays an important role in effective human-computer interactions. Existing speech emotion recognition models require massive computational and storage resources, making them hard to implement concurrently with other machine-interactive tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various fe…

Cited by 3 publications (2 citation statements)
References 17 publications (29 reference statements)
“…The most common algorithmic framework for automatic SER involves the extraction of features from audio data, in turn used to train classifiers. Despite various issues, including the scarcity of datasets or the presence of inter-individual differences, most studies involve either “traditional” ML classifiers (Support Vector Machines (SVM)) [ 18 , 19 , 20 , 21 , 22 , 23 ], neural networks (Multi-Layer Perceptrons (MLP)) [ 24 ], Long Short Term Memory (LSTM) networks, Deep Belief Networks (DBN) [ 22 ] or Convolutional Neural Networks (CNN) [ 25 , 26 , 27 ], and probabilistic models (Hidden Markov Models (HMM)) [ 6 , 28 ]. Table 1 outlines an overview of some representative works in the field of SER, along with the datasets and emotions used and classification accuracy, showing the prevalence of neural networks and SVM, often favored in other speech-based ML tasks as well [ 29 ].…”
Section: Introduction
confidence: 99%
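The excerpt above describes the standard SER pipeline: extract features from audio, then train a classifier on them. As a purely illustrative sketch of that two-stage structure (not the method of Light-SERNet or of any cited work), the toy example below substitutes coarse FFT band energies for MFCC-style features and a nearest-centroid rule for an SVM, classifying synthetic "utterances" that differ only in dominant frequency. All function names and parameters here are invented for illustration.

```python
import numpy as np

def band_energy_features(signal, frame_len=256, hop=128, n_bands=8):
    """Frame the signal, take FFT magnitudes, and pool them into coarse
    frequency bands -- a toy stand-in for MFCC-style feature extraction."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames) * np.hanning(frame_len), axis=1))
    bands = np.array_split(spec, n_bands, axis=1)
    # One log-energy per band, averaged over time -> one vector per utterance
    return np.array([np.log(b.mean() + 1e-9) for b in bands])

class NearestCentroid:
    """Minimal classifier: assign each utterance to the closest class mean."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = {c: X[np.array(y) == c].mean(axis=0)
                          for c in self.labels}
        return self

    def predict(self, X):
        return [min(self.labels,
                    key=lambda c: np.linalg.norm(x - self.centroids[c]))
                for x in X]

# Synthetic "utterances": two classes distinguished by dominant frequency
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0

def utterance(freq):
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(t.size)

X = np.stack([band_energy_features(utterance(f))
              for f in (200, 220, 3000, 3200)])
y = ["low", "low", "high", "high"]
clf = NearestCentroid().fit(X, y)
print(clf.predict(X))  # → ['low', 'low', 'high', 'high']
```

Real systems replace both stages with far stronger components (e.g., MFCCs or learned spectrogram features, and SVMs or CNNs as in the works cited), but the control flow — features in, class labels out — is the same.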
“…Other than being a reliable means to non-empirically quantify voice impairment in diseases that affect phonatory production, voice analysis is also a completely non-invasive, low-cost and pseudo-real-time solution for deploying telemedicine assessments. Voice-based AI solutions have been successfully experimentally investigated and employed in other medical fields such as dysphonia [ 31 , 32 , 33 ], COVID-19 and pulmonary diseases [ 20 , 22 , 34 , 35 ], and even emotion and stress recognition [ 24 , 36 ].…”
Section: Introduction
confidence: 99%