Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conferenc
DOI: 10.1109/icics.2003.1292679

A fast and robust speech/music discrimination approach

Cited by 34 publications (19 citation statements)
References: 9 publications
“…• Energy features, such as root mean square (RMS) of short time energy [25], 4 Hz modulation energy [5,27], percentage of "low-energy" frames [5,27,30], silence frame ratio (SFR) [16], noise frame ratio (NFR) [22] and subband energy distribution (SED) [23], etc.;…”
Section: Introduction (mentioning)
confidence: 99%
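
Two of the energy features named in the statement above, short-time RMS and the percentage of "low-energy" frames, can be sketched as follows. This is a minimal illustration and not the method of the cited paper: the function names, the frame and hop lengths, and the 0.5 threshold (a frame counts as low-energy when its RMS falls below half the mean RMS) are assumptions made for the example.

```python
import numpy as np

def frame_rms(signal, frame_len, hop):
    """Short-time RMS of each frame (frame_len and hop are in samples)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return np.sqrt(np.mean(frames ** 2, axis=1))

def low_energy_ratio(rms, threshold=0.5):
    """Fraction of frames whose RMS is below threshold * mean RMS.

    The 0.5 default is an assumed, illustrative threshold.
    """
    rms = np.asarray(rms, dtype=float)
    return float(np.mean(rms < threshold * rms.mean()))
```

A signal that alternates between silence and activity gives a high ratio, while a signal with steady energy gives a ratio near zero.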
“…The most popular classifiers include Gaussian likelihood ratio (GLR) [27], Bayesian information criterion (BIC) [24], Bayesian MAP classifier [30], Gaussian mixture model (GMM) [22,27], hidden Markov model (HMM) [18], multi-layer perceptron (MLP) [18,21] and other neural networks [18], K-nearest neighbor (KNN) [21,23] and support vector machines (SVM) [6,16,20,21,23]. Pikrakis et al. [26] proposed a multi-stage speech/music discriminator based on dynamic programming and Bayesian networks with a high accuracy for speech/music discrimination in radio recordings.…”
Section: Introduction (mentioning)
confidence: 99%
“…The values considered here for the length of the analysis and texture windows (23.22 ms and 1 s, respectively) are widely used in other related works dealing with SMD (El-Maleh et al., 2000; Tzanetakis and Cook, 2002; Burred and Lerch, 2004; Wang et al., 2003). Although other values are possible, the chosen values provide a good trade-off between accuracy, complexity and delay.…”
Section: Classical Features for Speech/Music Discrimination (mentioning)
confidence: 96%
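
To make the window lengths in the statement above concrete, the sketch below converts them to sample counts. The sampling rate is not given in the quoted text; 44.1 kHz is assumed here because 23.22 ms then corresponds to 1024 samples. Non-overlapping analysis frames are also an assumption, and `window_sizes` is a hypothetical helper name.

```python
SAMPLE_RATE = 44100   # Hz; assumed, not stated in the quoted text
ANALYSIS_MS = 23.22   # analysis window length from the quoted statement
TEXTURE_S = 1.0       # texture window length from the quoted statement

def window_sizes(sample_rate=SAMPLE_RATE,
                 analysis_ms=ANALYSIS_MS,
                 texture_s=TEXTURE_S):
    """Return (analysis samples, texture samples, frames per texture window).

    Frames are assumed not to overlap; with overlap the frame count
    per texture window would be higher.
    """
    analysis_len = round(analysis_ms * 1e-3 * sample_rate)
    texture_len = round(texture_s * sample_rate)
    frames_per_texture = texture_len // analysis_len
    return analysis_len, texture_len, frames_per_texture

print(window_sizes())
```

Under these assumptions each 1 s texture window summarizes 43 whole analysis frames of 1024 samples each.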
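“…Recently, research has been under way on assigning audio signals that contain various kinds of sound, such as speech, music, background music and sound effects, to a specific category (audio discrimination) or classifying them into several categories (audio classification) [1][2][3][4].…”
Section: Introduction (unclassified)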