Use of Micro-Modulation Features in Large Vocabulary Continuous Speech Recognition Tasks

Dimitriadis, Dimitrios; Bocchieri, Enrico

doi:10.1109/taslp.2015.2430815

Cited by 10 publications

(13 citation statements)

References 43 publications

“…It was demonstrated that the joined framework, regardless of the low precision of the various levelled TDNN, accomplishes a WRR reduction of 15% according to cutting-edge HMM framework. The author in [7] investigated the exhibition of SR system with the customary Cepstral features when utilizing the linear feature transforms. This combination of features is used to model the DNN-HMM system.…”

Section: Literature Survey (mentioning)

confidence: 99%

Creation and Instigation of Triphone based Big-Lexicon Speaker-Independent Continuous Speech Recognition Framework for Kannada Language

2019

IJITEE

This paper proposes a framework for accurate continuous speech recognition (CSR) of the Kannada language based on triphone modelling. Features are extracted from Kannada speech data files using Mel-frequency cepstral coefficients (MFCC) and their transformations, namely linear discriminant analysis (LDA) and maximum likelihood linear transforms (MLLT). The system is then trained to estimate the hidden Markov model (HMM) parameters for continuous speech (CS) data. The continuous Kannada speech data were gathered from 2600 speakers (1560 men and 1040 women) aged 14 to 80 years, in different geographical regions of the state of Karnataka (one of the 29 states of India, situated in the south of the country), under degraded conditions. The corpus comprises 21,551 words and spans 30 locales. Monophone and triphone models are evaluated in terms of word error rate (WER), and the results are compared with those on standard databases such as TIMIT and Aurora-4. Triphone models yield a significant reduction in WER. The speech recognition (SR) rate is verified in both offline and online recognition modes for all speakers. The results show a better recognition rate (RR) for the Kannada speech corpus than for the existing state-of-the-art databases.

“…Here, we have used a linearly-scaled Gabor filterbank to obtain the subband filtered signals. The AM-FM modulation features corresponding to the i-th subband filtered signal are extracted from instantaneous frequency f_i(t) and amplitude envelope a_i(t), where i = 1, 2, ..., L, and L is the number of subband filtered signals [31], i.e.,…”

Section: Proposed Feature Extraction (mentioning)

confidence: 99%

Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection

Kamble; Patil

2018

Interspeech 2018

Voice-based speaker authentication or Automatic Speaker Verification (ASV) system is now becoming practical reality after several decades of research. However, still this technology is very much susceptible to various spoofing attacks. Among various spoofing attacks, replay is the most challenging attack. In this paper, we propose a novel feature set based on our recently introduced Variable length Energy Separation Algorithm (VESA) during INTERSPEECH 2017. The key idea of this paper is to capture the Instantaneous Amplitude (IA) obtained from the instantaneous energy fluctuations. The replay speech is affected by acoustic environment and distortions of intermediate device. Thus, the noise added in replayed speech is important to detect. The Amplitude Modulations (AM) are more susceptible to noise and multipath interferences that may result due to replay mechanism. The experiments are performed on various dependency index (DI) and lower EER of 6.12 % and 11.94 % is found on dev and eval set, respectively, of ASV Spoof 2017 Challenge database. Furthermore, we compare our results with CQCC, LFCC, MFCC, and VESA-IFCC feature sets. The score-level fusion VESA-IFCC and proposed feature set further reduced the EER to 0.19 % and 7.11 % on dev and eval set, respectively.

“…More recently, some of these approaches were revised in the framework of Deep Neural Networks (DNNs) where non-linear modeling is feasible. Networks are trained to extract bottleneck features [5], and combine channels [12], achieving similar or better results compared to beamforming. However, training DNNs on multi-style and multi-channel data [20] is the…”

Section: Introduction (mentioning)

confidence: 99%

“…Their fusion exhibits robustness in noise and mismatch training/testing conditions (e.g., in Aurora-4 task), as indicated by the single-channel ASR results in recent works [5], [16]. However, only a few works [19], [15] examine their performance in reverberant environments.…”

Section: Introduction (mentioning)

confidence: 99%

On the improvement of modulation features using multi-microphone energy tracking for robust distant speech recognition

Rodomagoulakis; Maragos

2017

2017 25th European Signal Processing Conference (EUSIPCO)

In this work, we investigate robust speech energy estimation and tracking schemes aiming at improved energy-based multiband speech demodulation and feature extraction for multi-microphone distant speech recognition. Based on the spatial diversity of the speech and noise recordings of a multi-microphone setup, the proposed Multichannel, Multiband Demodulation (MMD) scheme includes: 1) energy selection across the microphones that are less affected by noise and 2) cross-signal energy estimation based on the cross-Teager energy operator. Instantaneous modulations of speech resonances are estimated on the denoised energies. Second-order frequency modulation features are measured and combined with MFCCs achieving improved distant speech recognition on simulated and real data recorded in noisy and reverberant domestic environments.
