Singer Identification Using MFCC and LPC Coefficients from Indian Video Songs

Ratanpara, Tushar V.; Patel, Neelam

doi:10.1007/978-3-319-13728-5_31

Cited by 8 publications

(3 citation statements)

References 9 publications


“…After experimenting with different combinations of these features, we have come to the conclusion that the combination of MFCC and LPC works best for the task of language identification. This claim is also supported in [32][33] where this combination has proved to be very effective in improving the performance of the model. Moreover, we have found from our experiments that this combination outperforms some new feature extraction techniques like i-vector, x-vector, fusion of DWT and MFCC feature warping and combination of MFCC with GFCC.…”

Section: B Feature Extraction (supporting)

confidence: 55%

A Hybrid Meta-Heuristic Feature Selection Method for Identification of Indian Spoken Languages From Audio Signals

et al. 2020


With the recent advancements in the fields of machine learning and artificial intelligence, spoken language identification-based applications have been increasing in terms of the impact they have on the day-to-day lives of common people. Western countries have been enjoying the privilege of spoken language recognition-based applications for a while now, however, they have not gained much popularity in multi-lingual countries like India owing to various complexities. In this paper, we have addressed this issue by attempting to identify different Indian languages based on various well-known features like Mel-Frequency Cepstral Coefficient (MFCC), Linear Prediction Coefficient (LPC), Discrete Wavelet Transform (DWT), Gammatone Frequency Cepstral Coefficient (GFCC) as well as a few deep learning architecture based features like i-vector and x-vector extracted from the audio signals. After comparing the initial results, it is observed that the combination of MFCC and LPC produces the best results. Then we have developed a new nature-inspired feature selection (FS) algorithm by hybridizing Binary Bat Algorithm (BBA) with Late Acceptance Hill-Climbing (LAHC) to select the optimal subset from the said feature vectors in order to reduce the model complexity and help it train faster. Using Random Forest (RF) classifier, we have achieved an accuracy of 92.35% on Indic TTS database developed by IIT-Madras, and an accuracy of 100%
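The abstract above names LPC as one half of the best-performing feature pair but does not say how the coefficients are computed. The following is a minimal sketch of one common approach, the autocorrelation method solved with the Levinson-Durbin recursion. It is not taken from the cited paper: the function names `autocorrelation` and `lpc`, the sign convention, and the absence of windowing or pre-emphasis are all assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[lag] = sum of frame[i] * frame[i + lag]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate predictor coefficients a[1..order] for one frame.

    Convention (an assumption of this sketch): the prediction is
    x_hat[n] = a[1]*x[n-1] + ... + a[order]*x[n-order].
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        # Silent frame: nothing to predict.
        return [0.0] * order

    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at order 0
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
        if err <= 0.0:
            # Numerically exhausted: higher-order terms stay at zero.
            break
    return a[1:]


# Synthetic first-order decay x[n] = 0.9 * x[n-1]; hypothetical test signal.
signal = [0.9 ** n for n in range(200)]
coeffs = lpc(signal, order=2)
```

For this synthetic signal the first coefficient comes out close to 0.9 and the second close to zero, which matches the way the signal was generated. In a real pipeline each frame's LPC vector would be concatenated with that frame's MFCC vector to form the combined feature.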


“…It can be seen from the picture that each person's voice is very different in the feature expression. The MFCC is widely used as a token of speech information because it conforms to the auditory awareness of the human ear [2]. …”

Section: MFCC Feature Extraction (mentioning)

confidence: 99%

Design of BP Speaker Recognition System Based on KPCA-MFCC Parameter Optimization

Miao, Sun, Tao

et al. 2018

Proceedings of the 2018 International Conference on Mechanical, Electrical, Electronic Engineering & Science (MEEES 2018)


Abstract. The recognition of the speaker through machine learning algorithm has become a hot spot of research. On the basis of speaker recognition based on BP and traditional MFCC characteristic parameters, the feature parameters of MFCC are reduced by KPCA algorithm, and the BP neural network algorithm is used as the back-end recognition model to classify the speaker. The improved algorithm is simulated on the MATLAB platform and compared with the traditional PCA algorithm. The experimental results show that the improved algorithm has a great improvement in recognition efficiency and recognition accuracy and has a good research value.


“…The pre‐processing should be able to reveal the key‐features of phonemes, in order to exploit the capabilities of the classification phase [1]. The most widely used features for speech recognition, and also applied for different tasks involving speech and music signals, are the mel‐frequency cepstral coefficients (MFCCs) [2]. The MFCC are based on the linear model of voice production and a psycho‐acoustic frequency mapping according to the mel scale [1].…”

Section: Introduction (mentioning)

confidence: 99%
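The statement above refers to a psycho-acoustic frequency mapping according to the mel scale without giving a formula. As a rough illustration, the sketch below uses one widely used variant, mel = 2595 * log10(1 + f / 700). The cited papers do not say which variant they use, so the constants are an assumption, and the helper names `hz_to_mel`, `mel_to_hz` and `mel_filter_centres` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (2595 / 700 variant, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of n_filters bands spaced evenly in mels."""
    lo = hz_to_mel(f_min_hz)
    hi = hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Ten bands between 0 Hz and 8 kHz; the band count and range are illustrative.
centres = mel_filter_centres(10, 0.0, 8000.0)
```

Because the bands are equally spaced in mels, their centres crowd together at low frequencies and spread out at high frequencies. This is the sense in which MFCCs follow the ear's frequency resolution.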

Multi‐objective optimisation of wavelet features for phoneme recognition

Vignolo, Rufiner, Milone

2016

IET Signal Process.


One of the most important issues in speech applications involves the preprocessing stage, which is meant to produce a manageable set of significant features, exploiting the capabilities of the classification phase [5]. The most widely used features for speech recognition, and also applied for different tasks involving speech and music signals, are the mel-frequency cepstral coefficients (MFCCs) [1,5]. These are based on the linear model of voice production and a psychoacoustic scale [5]. Even though MFCCs provide acceptable performance under laboratory conditions, recognition rates degrade significantly in presence of noise. This has motivated many advances in the development of robust feature extraction approaches, like perceptual linear prediction (PLP) and relative spectra [1]. More recently, speech processing techniques based on computational intelligence tools have been developed. For example, several approaches based on evolutionary computation have been proposed for the search of optimal speech representations [8]. Wavelet based processing provides useful tools for the analysis of nonstationary signals, which have been found suitable for speech feature extraction [6]. In order to build a representation based on the wavelet packet transform (WPT), frequently a particular orthogonal basis is selected among all the available basis [6]. However, for speech recognition there is no evidence showing the convenience of the use of orthogonal basis. Therefore, removing the orthogonality restriction the complete WPT decomposition offers a highly redundant set of coefficients, some of which can be selected to build an optimal representation. The optimisation of wavelet decompositions for feature extraction has been studied in many different ways, though it is still an open challenge in speech processing. For example, the optimisation of wavelet decompositions by means of evolutionary algorithms was proposed for image watermarking [4] and for signal denoising [2].
In [9] we proposed a novel approach for the optimisation of over-complete decompositions from a WPT dictionary based on a multi-objective genetic algorithm (MOGA). The MOGA allows to maximise the classification accuracy while minimising the number of features. For the purpose of obtaining appropriate features for state of the art speech recognizers, a classifier based on hidden Markov models (HMM) is used to estimate the capability of candidate solutions, using on a set of English phonemes. The proposed method, which we refer to as evolutionary wavelet packets (EWP), exploits the benefits provided
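The abstract describes a multi-objective search that maximises classification accuracy while minimising the number of features. The core comparison behind such a search is Pareto dominance, which can be sketched as below. This is only an illustration of the dominance test, not the authors' MOGA: the function names are hypothetical and the candidate values are invented for the example, not results from the paper.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on at least one.

    Each candidate is an (accuracy, n_features) pair: accuracy is to be
    maximised and n_features minimised.
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Invented (accuracy, n_features) pairs, for illustration only.
candidates = [(0.90, 40), (0.88, 20), (0.85, 30), (0.90, 50), (0.70, 5)]
front = pareto_front(candidates)
```

Here `(0.85, 30)` drops out because `(0.88, 20)` is both more accurate and smaller, and `(0.90, 50)` drops out because `(0.90, 40)` matches its accuracy with fewer features. The remaining three candidates are different trade-offs between the two objectives, and a multi-objective algorithm returns that set rather than a single winner.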
