Speech emotion recognition has many applications in daily life, such as conversational agents, human-robot interaction, and call centres. However, recognizing emotion from a speech signal is not trivial, owing to the difficulty of determining an effective feature set that accurately captures the emotion conveyed within the signal. In this paper, image processing techniques are exploited to address the speech emotion recognition problem. After converting the signal into a 2D spectrogram image representation, four variants of the Extended Local Binary Pattern (ELBP) are generated to serve as the source for the feature extraction stage. The histograms of multiple blocks from the ELBP variants are computed and fed to a Deep Belief Network (DBN) for classification. Different tests were performed on the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the results show that combined MELBP vectors give the best accuracy, 72.14%, outperforming state-of-the-art results on the same database.
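The pipeline described above can be sketched in simplified form. The snippet below is a minimal illustration, not the paper's implementation: it computes a magnitude spectrogram with NumPy, applies the *basic* 8-neighbour Local Binary Pattern (the paper's ELBP variants extend this operator), and builds block-wise histograms as a feature vector; the block grid and frame sizes are assumed values, and the DBN classifier stage is omitted.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    # Magnitude spectrogram via short-time FFT with a Hann window.
    # frame_len and hop are illustrative choices, not the paper's settings.
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1)).T

def lbp_image(img):
    # Basic 8-neighbour LBP: each pixel gets an 8-bit code, one bit per
    # neighbour, set when the neighbour is >= the centre value.
    padded = np.pad(img, 1, mode='edge')
    center = padded[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:padded.shape[0] - 1 + dy,
                           1 + dx:padded.shape[1] - 1 + dx]
        codes |= ((neighbour >= center).astype(np.uint8) << bit)
    return codes

def block_histograms(codes, blocks=(4, 4)):
    # Concatenate 256-bin histograms of non-overlapping blocks; in the
    # paper such histograms form the vectors fed to the DBN classifier.
    h, w = codes.shape
    bh, bw = h // blocks[0], w // blocks[1]
    feats = []
    for by in range(blocks[0]):
        for bx in range(blocks[1]):
            block = codes[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
sig = rng.standard_normal(16000)          # stand-in for a speech signal
spec = spectrogram(sig)
feat = block_histograms(lbp_image(spec))
print(feat.shape)  # (4096,) for a 4x4 grid of 256-bin histograms
```

In practice, the feature vector would be computed per utterance and passed to a trained classifier; swapping `lbp_image` for an ELBP variant changes only the texture-coding step while the block-histogram stage stays the same.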