On desensitizing the Mel-Cepstrum to spurious spectral components for Robust Speech Recognition

Tyagi, Vivek; Wellekens, C.

doi:10.1109/icassp.2005.1415167

Cited by 42 publications

(22 citation statements)

References 6 publications


“…The MFCCs are non-parametric representations of the audio signals and are used to model the human auditory perception system [9]. Therefore, MFCCs are useful for audio recognition [14]. This method had made important contributions in music retrieval to date.…”

Section: A) Music Content Representation (mentioning)

confidence: 99%

Robust and efficient content-based music retrieval system

Lee, Chiang, Lin, et al. 2016

SIP


This work proposes a query-by-singing (QBS) content-based music retrieval (CBMR) system that uses …

I. INTRODUCTION

Digital music data on the Internet are explosively growing. Therefore, applications of content-based music retrieval (CBMR) system are more and more popular. Searching music by a particular melody of a song directly is more convenient than by a name of a song for people. Moreover, according to the survey from the United Nations [1], the 21st century will witness even more rapid population ageing than did the century just past; therefore, it is important to develop an efficient and accurate way to retrieve the music data.

A CBMR method is a more effective approach for a music retrieval system than the text-based method. A CBMR system aims to retrieve and query music by acoustic features of music, while a text-based music retrieval system only takes names, lyrics, and ID3 tags of songs into consideration.

Query-by-singing (QBS) is a popular method in CBMR. Many approaches based on QBS have been developed currently. Huang [2] proposed a QBS system by extracting the pitches and the volumes of the music. The data …


“…Although the MFCC is known to be very efficient in characterizing the human auditory system, the MFCC values are not very robust in the actual environments, and so some researchers have proposed modifications to the basic MFCC algorithm. Especially, Tyagi and Wellekens suggested a method to desensitize the MFCC coefficients to spurious low-energy spectral perturbation and reported enhanced performance for speech recognition [5]. In this regard, it has been observed in the literature that no weights are applied to the MFCCs in all mel-filter bank indexes without taking full consideration of the relative importance of the MFCCs for gender identification [6,7].…”

Section: Introduction (mentioning)

confidence: 99%

“…Note that direct application of the steepest descent technique can not be allowed due to the constraints on the weights as specified in (5). Hence, we consider the following parameter transformation w…”

Section: MFCC Weight Optimization Using MCE Training (mentioning)

confidence: 99%

Discriminative weight training-based optimally weighted MFCC for gender identification

Kang, Chang 2009

IEICE Electron. Express


Abstract: In this paper, we apply a discriminative weight training to a support vector machine (SVM) based gender identification. In our approach, the gender decision rule is derived by the SVM incorporating the optimally weighted mel-frequency cepstral coefficient (MFCC) based on a minimum classification error (MCE) method which is different from the previous works in that optimal weights are differently assigned to each MFCC which is considered more realistic. According to the experimental results, the proposed approach is found to be effective for gender identification based on the SVM.

Keywords: gender identification, MFCC, MCE

Classification: Science and engineering for electronics


“…In this paper, we set SNR at 40 dB. Then the training samples are processed to extract log Mel-filterbank (LMFB) [36] features followed by mean normalization. We also augment the LMFB with pitch-related features [37].…”

Section: System Overview (mentioning)

confidence: 99%

Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition

Gao et al. 2016

EURASIP J. Adv. Signal Process.


We explore joint training strategies of DNNs for simultaneous dereverberation and acoustic modeling to improve the performance of distant speech recognition. There are two key contributions. First, a new DNN structure incorporating both dereverberated and original reverberant features is shown to effectively improve recognition accuracy over the conventional one using only dereverberated features as the input. Second, in most of the simulated reverberant environments for training data collection and DNN-based dereverberation, the resource data and learning targets are high-quality clean speech. With our joint training strategy, we can relax this constraint by using large-scale diversified real close-talking data as the targets which are easy to be collected via many speech-enabled applications from mobile internet users, and find the scenario even more effective. Our experiments on a Mandarin speech recognition task with 2000-h training data show that the proposed framework achieves relative word error rate reductions of 9.7 and 8.6 % over the multi-condition training systems for the cases of single-channel and multi-channel with beamforming, respectively. Furthermore, significant gains are consistently observed over the pre-processing approach using simply DNN-based dereverberation.
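The citation statement from Gao et al. above mentions extracting log Mel-filterbank (LMFB) features followed by mean normalization. As a rough illustration of what such a front end might look like, here is a minimal NumPy sketch. It is not the citing authors' implementation: the function names, the frame length, hop size, FFT size, number of filters, the Hamming window, and the choice of per-utterance mean normalization over time are all assumptions made for this example, and the pitch-related features mentioned in the quote are left out.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumed choice; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (hypothetical helper)."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Each filter is the positive part of the smaller of the two slopes.
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def lmfb_mean_norm(signal, sample_rate=16000, frame_len=400, hop=160,
                   n_fft=512, n_filters=40):
    """Log mel-filterbank features with per-utterance mean normalization.

    All default parameter values are illustrative assumptions.
    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small floor avoids log(0)
    # Mean normalization: subtract each filter channel's mean over time.
    return log_mel - log_mel.mean(axis=0, keepdims=True)
```

Calling `lmfb_mean_norm` on a one-dimensional array of samples returns a matrix with one row per frame and one column per mel filter, where each column has approximately zero mean over the utterance.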
