2005
DOI: 10.1155/asp.2005.1374
A Physiologically Inspired Method for Audio Classification

Abstract: We explore the use of physiologically inspired auditory features with both physiologically motivated and statistical audio classification methods. We use features derived from a biophysically defensible model of the early auditory system for audio classification using a neural network classifier. We also use a Gaussian-mixture-model (GMM)-based classifier for the purpose of comparison and show that the neural-network-based approach works better. Further, we use features from a more advanced model of the audito…

Cited by 8 publications (6 citation statements) · References 10 publications
“…• Eigenspace-based features: in this category we find rate-scale-frequency (RSF) features, which describe modulation components present in certain frequency bands of the auditory spectrum; they are based on the same human auditory model that underlies the noise-robust audio features (NRAF), as described in the work by Ravindran et al [228] (see Section 5.6.1). RSF features are a compact and decorrelated representation (derived by applying a Principal Component Analysis stage) of the two-dimensional wavelet transform of the auditory spectrum; • Electroencephalogram-based features: EEG-based features for short, these find application in human-centered favorite music estimation, as introduced by Sawata et al [244].…”
Section: Other Domains
confidence: 99%
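The RSF pipeline described in the quote above (a two-dimensional transform of the auditory spectrum, compacted and decorrelated with PCA) can be sketched roughly as follows. This is a minimal sketch, not the cited implementation: the 2-D FFT magnitude is an assumed stand-in for the wavelet analysis, and all shapes and names are illustrative.

```python
import numpy as np

def rsf_features(auditory_spectrograms, n_components=8):
    """Sketch of RSF-style features: a 2-D modulation analysis of each
    auditory spectrogram, then PCA for a compact, decorrelated code.
    The 2-D FFT magnitude below is an assumed stand-in for the wavelet
    transform mentioned in the quote."""
    # One flattened rate-scale (modulation) vector per clip
    X = np.stack([np.abs(np.fft.fft2(s)).ravel()
                  for s in auditory_spectrograms])

    # PCA stage: center, then project onto the leading principal axes
    # (rows of Vt from the SVD of the centered data matrix)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T  # compact, decorrelated features
```

Because the projection uses orthogonal principal axes, the resulting feature columns are mutually decorrelated, which is the property the quote attributes to RSF.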
“…• Noise-robust audio features: NRAF for short, these features incorporate a specific human auditory model built as a three-stage process: a first stage of filtering in the cochlea; transduction of mechanical displacement into electrical activity, with log compression, in the hair-cell stage; and a reduction stage using decorrelation that mimics the lateral inhibitory network in the cochlear nucleus (see Ravindran et al [228]). …”
confidence: 99%
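The three-stage NRAF structure summarized above (cochlear filtering, hair-cell log compression, decorrelation mimicking lateral inhibition) can be sketched with crude stand-ins. The overlapping-band filterbank and the DCT below are illustrative assumptions, not the biophysical model of the paper.

```python
import numpy as np

def nraf_sketch(frame, n_bands=32, n_coeffs=13):
    """Hedged sketch of a three-stage NRAF-style pipeline; each stage
    is a simple stand-in for the component named in the quote."""
    # Power spectrum of a single analysis frame
    spectrum = np.abs(np.fft.rfft(frame)) ** 2

    # Stage 1: cochlear filtering, approximated by crude overlapping bands
    edges = np.linspace(0, spectrum.size, n_bands + 2).astype(int)
    bands = np.array([spectrum[edges[i]:edges[i + 2]].sum() + 1e-10
                      for i in range(n_bands)])

    # Stage 2: hair-cell transduction, modeled as log compression
    log_bands = np.log(bands)

    # Stage 3: decorrelation via a DCT-II, standing in for the lateral
    # inhibitory network of the cochlear nucleus
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bands)[None, :]
    dct = np.cos(np.pi / n_bands * (n + 0.5) * k)
    return dct @ log_bands
```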
“…For this work, the first eight coefficients are calculated over two successive 250-ms samples (10-ms frame, 31-ms window) and averaged together. Then the first coefficient is discarded to build a seven-coefficient vector describing the sample (Ravindran 2006). A sound source class is an average of vectors created from samples known to contain the sound source.…”
Section: Classification Algorithm
confidence: 99%
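The vector-building and class-template procedure in the quote above can be sketched as follows. The nearest-mean decision rule is an assumption (the cited work may use a different distance or classifier), and the input arrays are synthetic placeholders for per-frame coefficients.

```python
import numpy as np

def sample_vector(coeffs_a, coeffs_b):
    """Average the first eight coefficients over two successive 250-ms
    samples, then discard the first coefficient, giving a 7-dim vector.
    Inputs are (n_frames, 8) arrays of per-frame coefficients."""
    avg = (coeffs_a.mean(axis=0) + coeffs_b.mean(axis=0)) / 2.0
    return avg[1:]  # drop coefficient 0

def train_class_means(labeled_vectors):
    """A sound source class is the average of the vectors built from
    samples known to contain that source, as stated in the quote."""
    return {label: np.mean(vecs, axis=0)
            for label, vecs in labeled_vectors.items()}

def classify(vec, class_means):
    """Assumed decision rule: assign to the nearest class mean
    in Euclidean distance."""
    return min(class_means, key=lambda c: np.linalg.norm(vec - class_means[c]))
```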
“…Training data are deconstructed into spectro-temporal acoustic features, as they would be in real time in the device, ranging from simple (e.g., overall level or level within a frequency channel) to complex feature sets, including those based on perceptual models of human hearing (e.g., modulation frequency and depth; mel-frequency cepstral coefficients, etc.; Ravindran et al., 2005). For example, complex scenes with speech are often classified based on their spectral profile and temporal envelope (Chen et al., 2014; Feldbusch, 1998; Kates, 1995), their statistical amplitude distribution (Wagener et al., 2008), or their characteristic temporal and/or spectral modulation frequencies (Nordqvist & Leijon, 2004; Ostendorf et al., 1998).…”
Section: Introduction
confidence: 99%