Proceedings of the 2nd ACM International Workshop on Multimedia Databases 2004
DOI: 10.1145/1032604.1032620

Automatic classification of speech and music using neural networks

Abstract: The importance of automatic discrimination between speech signals and music signals has evolved as a research topic over recent years. The need to classify audio into categories such as speech or music is an important aspect of many multimedia document retrieval systems. Several approaches have been previously used to discriminate between speech and music data. In this paper, we propose the use of the mean and variance of the discrete wavelet transform in addition to other features that have been used previous…

Cited by 13 publications (7 citation statements)
References 10 publications (9 reference statements)
“…LPC have also been applied in audio segmentation and general-purpose audio retrieval, as in the works by Khan et al. [68,69].…”
Section: Autoregression-based Frequency Features (mentioning)
confidence: 99%
“…It measures how quickly the power spectrum changes, and it can be used to determine the timbre of an audio signal. This feature has been used for speech/music discrimination (as in Jiang et al. [60] or Khan et al. [68,69]), musical instrument classification (Benetos et al. [10]), music genre classification (Li et al. [40], Lu et al. [12], Tzanetakis and Cook [28], Wang et al. [9]) and environmental sound recognition (see Peltonen et al. [18]). • Spectral peaks: this feature was defined by Wang [8] as constellation maps that show the most relevant energy bin components in the time-frequency signal representation.…”
Section: STFT-based Frequency Features (mentioning)
confidence: 99%
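The quantity described in the statement above — how quickly the power spectrum changes between frames — is commonly computed as spectral flux. A minimal sketch, assuming numpy; the function name, frame length, hop size, and per-frame normalization are illustrative choices, not parameters taken from any of the cited works:

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Frame-wise spectral flux: squared change of the (normalized)
    magnitude spectrum between consecutive frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    # Normalize each spectrum so flux reflects spectral-shape change,
    # not overall loudness.
    mags /= np.maximum(mags.sum(axis=1, keepdims=True), 1e-12)
    return np.sum(np.diff(mags, axis=0) ** 2, axis=1)
```

Speech, with its alternation of voiced and unvoiced segments, tends to produce a more variable flux sequence than steady music, which is why summary statistics of this curve are useful discriminators.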
“…Earlier, Khan et al. [22] proposed the wavelet parameterization for speech/music detection, but used only two values per frame to perform speech/music classification: the mean and the variance of the discrete wavelet transform coefficients.…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
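The two-value-per-frame parameterization described in this statement can be sketched as follows. A one-level Haar transform is used here as a self-contained stand-in; this is an assumption, since the statement does not say which wavelet or decomposition depth the original work used:

```python
import numpy as np

def haar_dwt(frame):
    """One-level Haar discrete wavelet transform of an even-length
    frame: approximation coefficients followed by detail coefficients."""
    even, odd = frame[0::2], frame[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return np.concatenate([approx, detail])

def dwt_mean_var(frame):
    """The two per-frame features described above: mean and variance
    of the DWT coefficients."""
    coeffs = haar_dwt(np.asarray(frame, dtype=float))
    return coeffs.mean(), coeffs.var()
```

Feeding just these two numbers per frame to a classifier keeps the feature vector very small, which is the point the citing author is making.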
“…Nevertheless, some systems use other speech/music classifiers, such as Multi-Layer Perceptron [22], [24], Maximum A Posteriori classifier [42], k-Nearest Neighbors [42], and different hybrid systems: MLP/SVM (Support Vector Machine) [14], MLP/HMM [1].…”
Section: Introduction (mentioning)
confidence: 99%
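At classification time, a Multi-Layer Perceptron of the kind listed above reduces to a forward pass from a feature vector to class scores. A minimal numpy sketch with a sigmoid hidden layer and a softmax over {speech, music}; the architecture and the assumption of pre-trained weights are illustrative, not the configuration of any cited system:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer perceptron forward pass:
    feature vector -> probability over {speech, music}."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # sigmoid hidden layer
    z = W2 @ h + b2                           # output logits
    e = np.exp(z - z.max())                   # numerically stable softmax
    return e / e.sum()
```

In practice the weights W1, b1, W2, b2 would be learned by backpropagation on labeled speech/music frames; hybrid systems such as MLP/HMM then smooth these frame-level posteriors over time.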
“…Although they are relatively simple to calculate, they can be representative of the feature sequence. In addition to the mean and variance, which are of high importance (see [30,31]), we also make use of three percentiles. These reflect the value below which a certain percentage of observations falls.…”
Section: Computation of Short-term Statistics (mentioning)
confidence: 99%
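The short-term statistics described above can be sketched in a few lines of numpy. The choice of the 25th, 50th, and 75th percentiles is an assumption for illustration; the statement says three percentiles are used but does not name them:

```python
import numpy as np

def short_term_statistics(feature_seq, percentiles=(25, 50, 75)):
    """Summarize a per-frame feature sequence with the statistics
    discussed above: mean, variance, and a few percentiles."""
    seq = np.asarray(feature_seq, dtype=float)
    stats = [seq.mean(), seq.var()]
    stats.extend(np.percentile(seq, p) for p in percentiles)
    return np.array(stats)
```

Applying this to, e.g., a per-frame spectral-flux or DWT-coefficient sequence turns a variable-length sequence into a fixed-length vector suitable for any of the classifiers mentioned earlier.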