Deep Neural Network for Musical Instrument Recognition Using MFCCs

Mahanta, Saranga Kingkor; Khilji, Abdullah Faiz Ur Rahman

doi:10.13053/cys-25-2-3946

Cited by 9 publications

(4 citation statements)

References 0 publications


“…These nonlinear RP embedding features are compared with two of the most popular linear spectral features, the spectrogram and Mel Frequency Cepstral Coefficients (MFCC) [68,69]. While spectrograms of size 432 × 288 are computed from the CMT dataset with 3 s sliding windows, 39-dimensional (13 static, 13 delta and 13 double-delta) MFCC are computed using a sliding window of 25 ms with an overlap of 10 ms.…”

Section: Choice of Embedding Dimension and Delay Parameter for RP (mentioning)

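The 39-dimensional vector in the quote stacks 13 static MFCCs with their first (delta) and second (double-delta) time derivatives. A minimal pure-Python sketch of that stacking, assuming the common regression formula over ±N neighbouring frames with N = 2 and edge frames repeated at the boundaries; the quoted paper does not state N or its edge handling, and the function names are hypothetical.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], n: int = 2) -> List[Frame]:
    """Regression-based time derivative of a sequence of feature frames.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    where frames beyond either edge are replaced by the nearest edge frame
    (an assumed padding rule).
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, num_frames - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_39(static: List[Frame]) -> List[Frame]:
    """Concatenate static, delta and double-delta coefficients frame by frame."""
    d1 = deltas(static)   # first derivative (delta)
    d2 = deltas(d1)       # second derivative (double delta)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

Feeding 13 static coefficients per frame yields 13 + 13 + 13 = 39 values per frame; for a coefficient that rises linearly over time, the interior delta equals the slope and the interior double delta is zero.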

“…These speaker identification experiments use a Deep Neural Network (DNN) on the 39-dim MFCC features derived from the CMT dataset for each of the three modes of speech. The architecture of the DNN used here with ReLU activation function (R) and dropout layers is 512R-1024R-512R-dropout(0.3)-128R-64R-dropout(0.2)-20S, where S is the final softmax layer [69]. The results are given in Table 4.…”

Section: Unimodal Systems with MFCC (mentioning)

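The layer string in the quote can be read mechanically. A small sketch that parses it and counts trainable parameters, assuming every numbered layer is fully connected (dense) with a bias and that the input is the 39-dimensional MFCC vector; the parser and the `R`/`S`/`dropout(p)` token grammar it accepts are hypothetical conveniences, not part of the cited work.

```python
import re
from typing import List, Tuple, Union

# ("dense_relu", units), ("dense_softmax", units) or ("dropout", rate)
Layer = Tuple[str, Union[int, float]]


def parse_arch(spec: str) -> List[Layer]:
    """Parse a string such as '512R-1024R-dropout(0.3)-20S' into layer tuples."""
    layers: List[Layer] = []
    for token in spec.split("-"):
        m = re.fullmatch(r"(\d+)([RS])", token)
        if m:
            kind = "dense_relu" if m.group(2) == "R" else "dense_softmax"
            layers.append((kind, int(m.group(1))))
            continue
        m = re.fullmatch(r"dropout\(([0-9.]+)\)", token)
        if m:
            layers.append(("dropout", float(m.group(1))))
            continue
        raise ValueError(f"unrecognised layer token: {token!r}")
    return layers


def count_params(layers: List[Layer], input_dim: int) -> int:
    """Weights plus biases of every dense layer; dropout adds no parameters."""
    total, fan_in = 0, input_dim
    for kind, value in layers:
        if kind == "dropout":
            continue
        units = int(value)
        total += fan_in * units + units
        fan_in = units
    return total


ARCH = "512R-1024R-512R-dropout(0.3)-128R-64R-dropout(0.2)-20S"
```

Under the dense-layer assumption, a 39-dimensional input gives 1,145,812 trainable parameters, over 90% of which sit in the two 512 × 1024 connections around the widest layer; the final 20-way softmax contributes only 1,300.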


Recurrence plot embeddings as short segment nonlinear features for multimodal speaker identification using air, bone and throat microphones

Nawas, Shahina, Balachandar et al. (2024), Sci Rep


Speech is produced by a nonlinear, dynamical Vocal Tract (VT) system, and is transmitted through multiple (air, bone and skin conduction) modes, as captured by the air, bone and throat microphones respectively. Speaker-specific characteristics that capture this nonlinearity are rarely used as stand-alone features for speaker modeling, and at best have been used in tandem with well-known linear spectral features to produce tangible results. This paper proposes Recurrence Plot (RP) embeddings as stand-alone, nonlinear speaker-discriminating features. Two datasets, the continuous multimodal TIMIT speech corpus and the consonant-vowel unimodal syllable dataset, are used in this study for conducting closed-set speaker identification experiments. Experiments with unimodal speaker recognition systems show that RP embeddings capture the nonlinear dynamics of the VT system which are unique to every speaker, in all the modes of speech. The Air (A), Bone (B) and Throat (T) microphone systems, trained purely on RP embeddings, perform with accuracies of 95.81%, 98.18% and 99.74%, respectively. Experiments using the joint feature space of combined RP embeddings for bimodal (A–T, A–B, B–T) and trimodal (A–B–T) systems show that the best trimodal system (99.84% accuracy) performs on par with trimodal systems using spectrogram (99.45%) and MFCC (99.98%). The 98.84% performance of the B–T bimodal system shows the efficacy of a speaker recognition system based entirely on alternate (bone and throat) speech, in the absence of the standard (air) speech. The results underscore the significance of the RP embedding as a nonlinear feature representation of the dynamical VT system that can act independently for speaker recognition. It is envisaged that speech recognition too will benefit from this nonlinear feature.
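A recurrence plot is built from a time-delay embedding of the signal, which is where an embedding dimension m and a delay τ enter: each point is x_i = (s_i, s_{i+τ}, …, s_{i+(m−1)τ}), and R_ij = 1 when ‖x_i − x_j‖ ≤ ε. A minimal sketch, assuming Euclidean distance and a fixed threshold ε; the parameter values are illustrative, and how the paper chooses m and τ or turns these plots into embeddings is not described in this summary.

```python
import math
from typing import List, Sequence


def delay_embed(signal: Sequence[float], dim: int, delay: int) -> List[List[float]]:
    """Time-delay embedding: x_i = (s_i, s_{i+delay}, ..., s_{i+(dim-1)*delay})."""
    n_vectors = len(signal) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for this dimension and delay")
    return [[signal[i + k * delay] for k in range(dim)] for i in range(n_vectors)]


def recurrence_plot(signal: Sequence[float], dim: int, delay: int,
                    eps: float) -> List[List[int]]:
    """Binary recurrence matrix: R[i][j] = 1 if ||x_i - x_j|| <= eps, else 0."""
    vecs = delay_embed(signal, dim, delay)
    n = len(vecs)
    return [[1 if math.dist(vecs[i], vecs[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]
```

For a periodic signal the matrix shows diagonal lines offset by the period, since the embedded trajectory revisits the same states; the plot is always symmetric with ones on the main diagonal.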


“…MFCC is a feature derived by modelling the auditory characteristics of the human ear [12], and it performs well in speech recognition [13]. It is extracted in the following way.…”

Section: Instrument Feature Extraction (mentioning)

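The quote stops before listing the extraction steps. A conventional MFCC pipeline, which may differ in detail from the cited paper's, is: power spectrum of a windowed frame, triangular filters spaced on the mel scale mel(f) = 2595 log10(1 + f/700), logarithm of each filter energy, then a DCT-II. A dependency-free sketch for a single frame; the Hamming window, 26 filters, 13 coefficients, the 1e-10 energy floor and the naive O(N²) DFT are illustrative assumptions chosen for readability rather than speed.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT|^2 / N of a Hamming-windowed frame for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        real = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(win))
        imag = sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(win))
        spec.append((real * real + imag * imag) / n)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters whose edges are equally spaced in mel from 0 Hz to sr/2."""
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_pts]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            row[k] = (right - k) / (right - centre)
        fbank.append(row)
    return fbank


def mfcc(frame: Sequence[float], sr: int, n_filters: int = 26,
         n_coeffs: int = 13) -> List[float]:
    """Static MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    fbank = mel_filterbank(n_filters, len(frame), sr)
    # Floor the filter energies so that log() stays defined for silent frames.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in fbank]
    # Unnormalised DCT-II of the log filter-bank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

The DCT decorrelates the log filter-bank energies, so the first 13 coefficients summarise the spectral envelope; a silent frame therefore produces only a nonzero c0.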

Research on Neural Network-based Automatic Music Multi-Instrument Classification Approach

Guo (2024), IJACSA


The automatic classification of multiple instruments plays a crucial role in music retrieval and recommendation services. This paper focuses on automatic multi-instrument classification. First, instrument features were analyzed, and Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive coefficients (PLPC) were extracted from instrument signals. Features were selected using the entropy weight method. The optimal initial weights and thresholds of a back-propagation neural network (BPNN) were obtained with the sparrow search algorithm (SSA), yielding an SSA-BPNN classifier. Experiments were conducted on the IRMAS dataset. The results showed that the combination of MFCC and PLPC selected through the entropy weight method achieved the best performance in automatic multi-instrument classification, with a precision (P) of 0.72, a recall of 0.71 and an F1 score of 0.71. Moreover, it outperformed other algorithms such as support vector machine and XGBoost. These results confirm the reliability of the proposed automatic multi-instrument classification method and its suitability for practical applications.
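The entropy weight method named in the abstract gives each feature a weight that grows as its values become more unevenly distributed across samples: p_ij = x_ij / Σ_i x_ij, e_j = −(1/ln n) Σ_i p_ij ln p_ij, d_j = 1 − e_j, w_j = d_j / Σ_j d_j. A sketch of that textbook formulation, assuming non-negative feature values already scaled to a common range and at least two samples; how the cited paper normalises MFCC/PLPC values and turns weights into a selection is not described here, so `top_k_features` is a hypothetical illustration.

```python
import math
from typing import List, Sequence


def entropy_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight of each column of an n_samples x n_features matrix (n_samples >= 2)."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # an all-zero column carries no information
            continue
        # 0 * ln(0) is taken as 0, so zero entries are skipped.
        entropy = -k * sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:
        return [1.0 / m] * m  # every column is uniform: fall back to equal weights
    return [d / s for d in divergences]


def top_k_features(matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k columns with the largest entropy weights."""
    w = entropy_weights(matrix)
    return sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:k]
```

A column that is constant across samples has maximal entropy and receives zero weight, while a column concentrated on a few samples receives the largest weight.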


“…Although this view is not entirely correct, it highlights the importance of harmonic amplitudes to the sound quality of musical instruments. On using computers to synthesize piano sound and to improve its quality through processing and simulation, [10] pointed out that harmonics are an important factor in sound quality, whereas this paper discusses how to improve piano sound quality from the perspective of harmonic amplitude and phase varying over time. The study in [11] holds that harmonic amplitude is an important component of piano sound quality, and reasonably adjusts the piano's frequency spectrum by simulating multiple strings in order to improve the sound of the piano.…”

Section: Related Work (mentioning)

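The related work quoted above treats sound quality as a function of harmonic amplitudes and, in [10], of how amplitude and phase evolve over time. A minimal additive-synthesis sketch makes that model concrete: a tone is a sum of harmonics k·f0, each with its own amplitude, exponential decay rate and phase. The 1/k amplitude profile and decay rates in the example are arbitrary illustrative assumptions, not values from [10] or [11].

```python
import math
from typing import List, Sequence


def additive_tone(f0: float, amps: Sequence[float], decays: Sequence[float],
                  phases: Sequence[float], sr: int, duration: float) -> List[float]:
    """Sum of harmonics: sum_k a_k * exp(-d_k * t) * sin(2*pi*k*f0*t + phi_k), k = 1, 2, ...

    Harmonics above sr / 2 would alias, so keep k * f0 below the Nyquist frequency.
    """
    n = int(sr * duration)
    out = []
    for i in range(n):
        t = i / sr
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * (k + 1) * f0 * t + p)
                       for k, (a, d, p) in enumerate(zip(amps, decays, phases))))
    return out


# Example: eight harmonics of middle C with 1/k amplitudes; higher harmonics decay faster.
example = additive_tone(261.63, [1 / k for k in range(1, 9)],
                        [3.0 + k for k in range(8)], [0.0] * 8, 8000, 0.1)
```

Changing only the amplitude and decay lists while keeping f0 fixed alters the timbre without changing the perceived pitch, which is the relationship the quoted studies exploit.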

A Piano Single Tone Recognition and Classification Method Based on CNN Model

Geng, He, Zhou (2023), IJACSA


To improve the recognition and classification of single piano tones, this paper builds a piano single-tone recognition and classification model based on a convolutional neural network (CNN), and equalizes the uniformly irradiated parabolic tone transmission hardware. An analytic method is used to calculate the directional pattern of the tone transmission hardware, yielding an analytical expression for its gain. The paper also gives the calculation and analytical expression of the hardware gain in the main lobe, and derives the relative position of two pieces of tone transmission hardware from the conversion between global and local coordinates. Finally, it gives the variation of the received power with the azimuth/elevation angle of the receiving tone transmission hardware and with the frequency of the incident high-power microwave. The experimental study shows that the proposed CNN-based method can play an important role in piano single-tone recognition. The article also improves the note recognition algorithm for piano music by combining note features with the frequency spectrum to obtain a note spectrum, which improves the accuracy of audio classification and recognition.
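A basic building block of single-tone recognition is mapping a spectral peak to a piano key. On a standard 88-key piano, key n sounds at 440 · 2^((n − 49)/12) Hz, so n = 49 + 12 log2(f/440). The sketch below is not the CNN method of the cited paper: it simply picks the strongest bin of a Hann-windowed DFT and rounds to the nearest key. The frame length, sample rate and naive DFT are illustrative assumptions, and the helper names are hypothetical.

```python
import math
from typing import Sequence

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_key(freq: float) -> int:
    """Nearest 88-key piano key number (A4 = 440 Hz is key 49)."""
    return round(49 + 12 * math.log2(freq / 440.0))


def key_to_name(key: int) -> str:
    """Scientific pitch name; key 49 (A4) corresponds to MIDI note 69."""
    midi = key + 20
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def peak_frequency(frame: Sequence[float], sr: int) -> float:
    """Frequency of the largest-magnitude bin (1..N/2) of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    win = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        real = sum(w * math.cos(2 * math.pi * k * i / n) for i, w in enumerate(win))
        imag = sum(w * math.sin(2 * math.pi * k * i / n) for i, w in enumerate(win))
        mag = real * real + imag * imag
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sr / n


def recognise_tone(frame: Sequence[float], sr: int) -> str:
    """Name of the piano key closest to the frame's dominant frequency."""
    return key_to_name(freq_to_key(peak_frequency(frame, sr)))
```

With 512 samples at 8 kHz the bin spacing is 15.625 Hz, fine enough to resolve A4 but coarse for the lowest octaves, where adjacent keys are only a few hertz apart; that limitation is one reason learned spectral features are attractive for this task.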
