Automatic Music Emotion Classification Using Artificial Neural Network Based on Vocal and Instrumental Sound Timbres

Mokhsin, Mudiana; Rosli, Nurlaila Binti; Zambri, Suzana; Ahmad, Nor Diana; Hamidi, Saidatul Rahah

doi:10.3844/jcssp.2014.2584.2592

Cited by 16 publications

(5 citation statements)

References 14 publications

(15 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…As a result, we are led to extract features related to the spectrum to enable the classification of diverse snoring sounds. Using the spectrum, eight features are calculated, namely, spectral centroid, spectral spread, spectral flatness, spectral decay point, spectral skewness, spectral slope, spectral entropy, and PR800 [20][21][22][23][24][25][26]. The first seven features are all derived from the spectrum obtained via the Fast Fourier Transform.…”

Section: Frequency-domain Feature Extractionmentioning

confidence: 99%

Improving OSAHS Prevention Based on Multidimensional Feature Analysis of Snoring

Fang,

Liu,

Zhao

et al. 2023

Electronics

View full text Add to dashboard Cite

Obstructive Sleep Apnea–Hypopnea Syndrome (OSAHS), a severe respiratory sleep disorder, presents a significant threat to human health and even endangers life. As snoring is the most noticeable symptom of OSAHS, identifying OSAHS via snoring sound analysis is vital. This study aims to analyze the time-domain and frequency-domain characteristics of snoring sounds to detect OSAHS and its severity. The snoring sounds are extracted and scrutinized from nighttime acoustic signals, with spectral energy ratio features being applied, calculated via the snore detection frequency division method. A variety of time and frequency-domain features are derived from the snoring sounds. A novel Snore Detection Cepstral Coefficient (SDCC) is proposed, based on Mel Frequency Cepstral Coefficients (MFCCs) and snore detection frequency division. Relief-F feature screening is then applied to SDCC and MFCC. Canonical Correlation Analysis (CCA) is utilized on the fusion features obtained as a result, and the results indicate the highest accuracy (97.8%) with Subspace KNN. The optimal classifier with feature combination is used for the snore model of OSASH early warnings all night, effectively recognizing and assessing OSAHS and reflecting the severity of its disease. This result, achieving high accuracy and low computational complexity, shows that the proposed method holds significant promise for developing portable sleep health detection devices.

show abstract

Section: Frequency-domain Feature Extractionmentioning

confidence: 99%

Improving OSAHS Prevention Based on Multidimensional Feature Analysis of Snoring

Fang,

Liu,

Zhao

et al. 2023

Electronics

View full text Add to dashboard Cite

show abstract

“…Some studies tried to evaluate the timbre difference by quantitative methods [75,76]. The major feature is a spectral centroid [77,78]. We calculated spectral centroids and 'microwave ding' was the most distant from the basic sounds (e.g.…”

Section: Natural Sound Selectionmentioning

confidence: 99%

Simultaneous multiple-stimulus auditory brain–computer interface with semi-supervised learning and prior probability distribution tuning

Ogino¹,

Hamada²,

Mitsukura³

2022

J. Neural Eng.

View full text Add to dashboard Cite

Objective. Auditory brain–computer interfaces (BCIs) enable users to select commands based on the brain activity elicited by auditory stimuli. However,existing auditory BCI paradigms cannot increase the number of available commands without decreasing the selection speed, because each stimulus needs to be presented independently and sequentially under the standard oddball paradigm. To solve this problem, we propose a double-stimulus paradigm that simultaneously presents multiple auditory stimuli. Approach. For addition to an existing auditory BCI paradigm, the best discriminable sound was chosen following a subjective assessment. The new sound was located on the right-hand side and presented simultaneously with an existing sound from the left-hand side. A total of six sounds were used for implementing the auditory BCI with a 6 × 6 letter matrix. We employ semi-supervised learning (SSL) and prior probability distribution tuning to improve the accuracy of the paradigm. The SSL method involved updating of the classifier weights, and their prior probability distributions were adjusted using the following three types of distributions: uniform, empirical, and extended empirical (e-empirical). The performance was evaluated based on the BCI accuracy and information transfer rate (ITR). Main results. The double-stimulus paradigm resulted in a BCI accuracy of 67.89 ±11.46% and an ITR of 2.67 ± 1.09 bits/min, in the absence of SSL and with uniform distribution. The proposed combination of SSL with e-empirical distribution improved the BCI accuracy and ITR to 74.59 ± 12.12% and 3.37 ± 1.27 bits/min, respectively. The ERP analysis revealed that contralateral and right-hemispheric dominances contributed to the BCI performance improvement. Significance. Our study demonstrated that a BCI based on multiple simultaneous auditory stimuli, incorporating SSL and e-empirical prior distribution, can increase the number of commands without sacrificing typing speed beyond the acceptable level of accuracy.

show abstract

“…In various research related to MIR, it used various kinds of data mining method for grouping including data classification and clustering such as C4.5 [4], decission tree [5], Support Vector Model [1], Artificial Neural Network [6], Self Organization [7], K-Means [8], [9], etc. Classification process by using this data mining algorithm was initiated with pre-processing stage.…”

Section: Introductionmentioning

confidence: 99%

“…In this stage music part to be used is the refrain, which the part with frequent words and notes repetition and this in the part that most determine mood type include in the music [10]. This part with 30 seconds duration [1], [6] with format of *.wav mono channel is furthermore processed by using signal processing of Fast Fourier Transform (FFT) and nine types…”

Section: Introductionmentioning

confidence: 99%

Design and Analysis System of KNN and ID3 Algorithm for Music Classification based on Mood Feature Extraction

Sudarma¹,

Harsemadi²

2017

IJECE

View full text Add to dashboard Cite

Each of music which has been created, has its own mood which is emitted, therefore, there has been many researches in Music Information Retrieval (MIR) field that has been done for recognition of mood to music. This research produced software to classify music to the mood by using K-Nearest Neighbor and ID3 algorithm. In this research accuracy performance comparison and measurement of average classification time is carried out which is obtained based on the value produced from music feature extraction process. For music feature extraction process it uses 9 types of spectral analysis, consists of 400 practicing data and 400 testing data. The system produced outcome as classification label of mood type those are contentment, exuberance, depression and anxious. Classification by using algorithm of KNN is good enough that is 86.55% at k value = 3 and average processing time is 0.01021. Whereas by using ID3 it results accuracy of 59.33% and average of processing time is 0.05091 second. Keyword:Clasification ID3 KNN Mood Music

show abstract

Automatic Music Emotion Classification Using Artificial Neural Network Based on Vocal and Instrumental Sound Timbres

Cited by 16 publications

References 14 publications

Improving OSAHS Prevention Based on Multidimensional Feature Analysis of Snoring

Improving OSAHS Prevention Based on Multidimensional Feature Analysis of Snoring

Simultaneous multiple-stimulus auditory brain–computer interface with semi-supervised learning and prior probability distribution tuning

Design and Analysis System of KNN and ID3 Algorithm for Music Classification based on Mood Feature Extraction

Contact Info

Product

Resources

About