Mel-frequency cepstral coefficient analysis in speech recognition

On, Chin Kim; Pandiyan, P.; Yaacob, Sazali; Saudi, Azali

doi:10.1109/icoci.2006.5276486

Cited by 41 publications

(21 citation statements)

References 1 publication

“…In this paper, we use ResNet [26] as our CNN model, the input of the CNN model is mel-frequency cepstral coefficient spectrum (MFCC) [19] of songs, including 500 frames in the time dimension and 12 frequency-bins in the frequency dimension. The output vectors are the 20-dimensional predicted latent feature vector of songs.…”

Section: Music Feature Extraction (mentioning)

confidence: 99%

A Hybrid Recommendation for Music Based on Reinforcement Learning

Wang

2020

Advances in Knowledge Discovery and Data Mining

The key to personalized recommendation system is the prediction of users' preferences. However, almost all existing music recommendation approaches only learn listeners' preferences based on their historical records or explicit feedback, without considering the simulation of interaction process which can capture the minor changes of listeners' preferences sensitively. In this paper, we propose a personalized hybrid recommendation algorithm for music based on reinforcement learning (PHRR) to recommend song sequences that match listeners' preferences better. We firstly use weighted matrix factorization (WMF) and convolutional neural network (CNN) to learn and extract the song feature vectors. In order to capture the changes of listeners' preferences sensitively, we innovatively enhance simulating interaction process of listeners and update the model continuously based on their preferences both for songs and song transitions. The extensive experiments on real-world datasets validate the effectiveness of the proposed PHRR on song sequence recommendation compared with the state-of-the-art recommendation approaches.

“…Human ear has been proven to resolve frequencies non-linearly across the audio spectrum, thus filter bank analysis is more desirable because it is spectrally based method [8]. Since MFCC fully simulate the human auditory characteristics without any assumptions, it has been widely used in the field of speech recognition.…”

Section: A Selection Of Speech Features (mentioning)

confidence: 99%

“…Backpropagation neural networks have broad application in classification, approximation, prediction, and control aspects. Based on biological analogy, neural networks try to emulate the human brain's ability to learn from examples or incomplete data and especially to generalize concepts [8]. For the excellent ability to classification of BP NN, it is considered as the classifier in this paper.…”

Section: Endpoint Detection Algorithm Based On BP NN (mentioning)

confidence: 99%

An endpoint detection algorithm based on MFCC and spectral entropy using BP NN

Zhang

2010

2010 2nd International Conference on Signal Processing Systems

Endpoint detection is the preliminary job of speech signal processing, it is vital to speech recognition. Most of recent endpoint detection algorithms will give a satisfied result at high SNRs (signal-to-noise ratio), while they might fail in occasion where the noise level is too excessive. In this paper, a novel endpoint detection algorithm based on 12-order MFCC and spectral entropy in the framework of BP NN is presented. It can be shown by the experiments that the proposed method is more reliable and efficient than the traditional ones based on short-term energy at low SNRs.
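The abstract above names spectral entropy as one of the two features fed to the classifier but does not define it. As a rough illustration only, the sketch below uses the common textbook definition: the Shannon entropy of a frame's normalized power spectrum. The function names, the naive DFT, and the `eps` guard are assumptions made for this example and are not taken from the cited paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N//2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (bits) of the normalized power spectrum.

    Assumed definition for illustration; eps avoids division by zero
    on an all-zero (silent) frame.
    """
    power = power_spectrum(frame)
    total = sum(power) + eps
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    n = 64
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
    impulse = [1.0] + [0.0] * (n - 1)
    print("tone entropy:", spectral_entropy(tone))
    print("impulse entropy:", spectral_entropy(impulse))
```

Under this definition a frame dominated by a single frequency gives an entropy close to zero, while a frame with a flat spectrum gives the maximum value, which is the contrast an entropy-based endpoint detector would rely on.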

“…Mel-Frequency Cepstral coefficients: MFC analysis has been a popular signal representation method used in many audio classification tasks, especially in speech recognition systems [18]. The basis for the mel-frequency scale is derived from the human perceptual system.…”

Section: Feature Extraction (mentioning)

confidence: 99%

The SVM Binary Tree Classification Using MRMR and F-Score Feature Selection Algorithms

Vavrek, Juhár, Čižmár

2014

AEI

The discrimination between various types of speech and non-speech signals in audio data stream is the fundamental step for further indexing and retrieving. This paper considers some of the basic problems in audio content classification which is the key component in automatic audio retrieval system. It illustrates a potential use of statistical learning algorithm called support vector machine (SVM) for broadcast news (BN) audio classification task. The overall classification architecture uses binary tree SVM (BT-SVM) decision scheme in combination with well known audio features such as, MFCCs and low level MPEG-7 audio descriptors. The important step in creating such classification system is to define the optimal features for each binary SVM classifier. There exist various feature selection algorithms that help to create such feature set. Therefore we decided to implement F-score and Minimum Redundancy Maximum Relevance (MRMR) feature selection algorithms, as an effective search algorithms used in many pattern recognition tasks.
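The citation statements on this page refer to the mel-frequency scale without giving a formula. One widely used approximation is mel = 2595 · log10(1 + f / 700); the sketch below applies it to place filter centers evenly on the mel axis. This particular formula, the function names, and the example parameters are assumptions for illustration, since none of the cited works states which mel variant it uses.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f/700) form."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    # Hypothetical setup: 12 filters between 0 Hz and 4 kHz.
    for center in mel_center_frequencies(12, 0.0, 4000.0):
        print(round(center, 1))
```

Because the spacing is uniform in mel, the centers sit close together at low frequencies and spread out at high frequencies, which mirrors the non-linear frequency resolution the quoted passages attribute to human hearing.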
