2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178855

Robust excitation-based features for Automatic Speech Recognition

Abstract: In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the usually considered vocal tract based features for Automatic Speech Recognition (ASR). The proposed Excitation-based Features (EBF) are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation features expand the set of periodicity features previously considered for ASR, expecting that these features h…

Cited by 6 publications (4 citation statements)
References 32 publications
“…Furthermore, the improvements in ASR performance are consistently seen across all the noisy test conditions and with a sophisticated RNN-LM. In addition, the performance achieved is also considerably better than the results such as excitation based features (EB) reported by [33].…”
Section: Experiments and Results (mentioning)
confidence: 62%
“…For each dataset, we compare the ASR performance of the proposed approach of learning acoustic representation from raw waveform with acoustic FB (A) with relevance weighting (A-R) and modulation FB (M) with relevance weighting (M-R) denoted as (A-R,M-R), traditional log mel filterbank energy (MFB) features (80 dimension), power normalized filterbank energy (PFB) features [31], mean Hilbert envelope (MHE) features [32], and excitation based (EB) features [33]. We also compare performance with the SincNet method proposed in [11].…”
Section: Experiments and Results (mentioning)
confidence: 99%
“…At the same time it is also noticed that under certain noise conditions, while vocal tract information is lost, fundamental frequency information is still retained. Pitch or fundamental frequency is an excitation source information and excitation source features are robust to noisy conditions [4]. In a study conducted on Japanese Newspaper Article Sentences database, it was noticed that inclusion of F0 information resulted in an absolute improvement of about 9% in noisy speech recognition [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…In case of noisy speech conditions, the vocal tract information gets severely affected, yet the fundamental frequency (F0) information is largely retained. Drugman et al [8] reported that The data involved in this tonal analysis is described in [10,11].…”
Section: Introduction (mentioning)
confidence: 99%