1998
DOI: 10.1109/89.650316

Speech analysis and recognition using interval statistics generated from a composite auditory model

Abstract: A modeling approach to auditory speech analysis and recognition is proposed and evaluated, where a composite auditory model is used to generate parallel sets of auditory-nerve instantaneous firing rates (IFR's) along the spatial dimension, followed by a processing stage that constructs from the IFR's an interval statistics in a form called the interpeak interval histogram (IPIH). A speech preprocessor is designed that performs transformation on the auditory IPIH's and interfaces the IPIH-based auditory represe…

Cited by 22 publications
(9 citation statements)
References 30 publications
“…Our study confirms previous reports that timing features derived from an auditory model give a substantial performance benefit compared to MFCCs in Gaussian noises (Kim et al, 1999; Ali et al, 2002; Sheikhzadeh and Deng, 1998; Gajic and Paliwal, 2006). Additionally, we find that the performance benefit is lost in babble.…”
Section: Results (supporting)
confidence: 94%
“…The reason for this might be that their robustness varies in different types of background noise. Many studies have evaluated timing representations using a white noise interferer only (e.g., Ali et al, 2002; Sheikhzadeh and Deng, 1998). The robustness of timing features to more ecologically relevant noise backgrounds, such as multi-talker babble, is less clear, with the few studies that employ babble reporting relatively poor results (e.g., Gajic and Paliwal, 2006).…”
Section: Introduction (mentioning)
confidence: 98%
“…For example, improvements could be gained by encoding the speech signal with spike timing information, instead of (or in addition to) firing rate. A number of studies (e.g., Kim et al, 1999; Sheikhzadeh and Deng, 1998; Brown et al, 2011) have shown that speech is coded more robustly by timing information in noisy conditions than by average firing rate, or other spectrally based features such as Mel frequency cepstral coefficients. If these principles are investigated further it might be possible to move the speech in noise performance of the model closer to human levels and to develop deeper insights into the underlying mechanisms.…”
Section: Discussion (mentioning)
confidence: 99%
“…A number of auditory-based approaches for noise-robust ASR have been proposed by different groups. They include models of particular psychoacoustic or physiological effects (e.g., temporal masking [11]), more comprehensive models of "effective" sound processing [15], [30], and complex, physiologically-based inner ear models [31], [14]. We chose to model a single effect, synaptic adaptation.…”
Section: Discussion (mentioning)
confidence: 99%