2016
DOI: 10.1007/s40595-016-0071-3
Speech classification using SIFT features on spectrogram images

Cited by 19 publications (5 citation statements)
References 38 publications
“…Many speech applications, such as emotion classification [36][37][38][39][40][41][42], speech classification [43], sound event classification [44,45], speaker recognition [46,47], and acoustic scene classification [48], use spectrograms as inputs from which secondary features are derived. SIFT features extracted from spectrograms have been used to perform speech classification [43]. Ren et al. [44] extracted local binary patterns (LBP) from the logarithm of a Gammatone-like spectrogram for sound event classification.…”
Section: Discriminative Features
confidence: 99%
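
As a rough illustration of the idea in this excerpt (a sketch, not the cited paper's exact pipeline), the following Python snippet renders a log-magnitude spectrogram as an 8-bit grayscale image and runs OpenCV's SIFT detector on it. The file name, sampling rate, and STFT parameters are illustrative assumptions.

    import cv2
    import librosa
    import numpy as np

    # Load speech and compute a magnitude spectrogram ("speech.wav" is a placeholder).
    y, sr = librosa.load("speech.wav", sr=16000)
    S = np.abs(librosa.stft(y, n_fft=512, hop_length=160))

    # Log-scale and quantize to an 8-bit grayscale image so SIFT can consume it.
    S_db = librosa.amplitude_to_db(S, ref=np.max)
    img = cv2.normalize(S_db, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Detect SIFT keypoints and compute 128-dimensional descriptors; these would
    # then feed a classifier (e.g., pooled into a bag-of-visual-words histogram).
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    print(len(keypoints), None if descriptors is None else descriptors.shape)
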
“…They are nowadays widely used as features in image/audio classification and separation systems [44], [45]. When tested on sound event or speech classification tasks, they have been shown to provide a significant improvement in classification performance when supplemented with traditional audio features such as MFCCs and LPCCs [46], [47]. In this paper, we obtain the spectrum of an audio signal by calculating the short-time Fourier transform (STFT) of the audio input.…”
Section: Spectrogram
confidence: 99%
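
Since this excerpt obtains the spectrum via the STFT, here is a minimal sketch of that step using SciPy; the sampling rate, window, and hop sizes are assumed values, not those of the cited work.

    import numpy as np
    from scipy.signal import stft

    fs = 16000                                   # assumed sampling rate
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)              # 1 s synthetic tone as stand-in audio

    # 25 ms Hann windows with a 10 ms hop: nperseg=400, noverlap=240 at 16 kHz.
    f, frames, Zxx = stft(x, fs=fs, window="hann", nperseg=400, noverlap=240)
    spectrogram = np.abs(Zxx)                    # magnitude spectrogram
    print(spectrogram.shape)                     # (frequency bins, time frames)
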
“…Then, features were extracted from these broad phonetic categories and fed into a back-propagation neural network (BPNN) for classification. Gaussian mixture models (GMMs) have also been regarded as the most powerful model for estimating the probability distribution of the speech signal associated with each HMM state [4]. Meanwhile, generative training of GMM-HMMs based on the expectation-maximization (EM) algorithm has been considered for speech recognition [5].…”
Section: Introduction
confidence: 99%
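
To make the GMM-HMM remark concrete: in such systems a GMM trained with EM models the emission distribution of each HMM state. The sketch below fits one such mixture to synthetic frame-level features with scikit-learn; the feature dimensionality and mixture size are assumptions, and random data stand in for real MFCC frames.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(1000, 13))         # stand-in for 13-dim MFCC frames

    # Diagonal-covariance GMM trained with the EM algorithm, as commonly used
    # to model the emission distribution of a single HMM state.
    gmm = GaussianMixture(n_components=8, covariance_type="diag", max_iter=100)
    gmm.fit(frames)
    print(gmm.score(frames))                     # average per-frame log-likelihood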