A fast and robust speech/music discrimination approach

Wang, W.Q.; Gao, Wei; Ying, Dongwen

doi:10.1109/icics.2003.1292679

Cited by 34 publications

(19 citation statements)

References 9 publications


“…• Energy features, such as root mean square (RMS) of short time energy [25], 4 Hz modulation energy [5,27], percentage of "low-energy" frames [5,27,30], silence frame ratio (SFR) [16], noise frame ratio (NFR) [22] and subband energy distribution (SED) [23], etc.;…”

Section: Introduction (mentioning)

confidence: 99%

“…The most popular classifiers include Gaussian likelihood ratio (GLR) [27], Bayesian information criterion (BIC) [24], Bayesian MAP classifier [30], Gaussian mixture model (GMM) [22,27], hidden Markov model (HMM) [18], multi-layer perceptron (MLP) [18,21] and other neural networks [18], K-nearest neighbor (KNN) [21,23] and support vector machines (SVM) [6,16,20,21,23]. Pikrakis et al. [26] proposed a multi-stage speech/music discriminator based on dynamic programming and Bayesian networks with a high accuracy for speech/music discrimination in radio recordings.…”

Section: Introduction (mentioning)

confidence: 99%


Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

et al. 2010


Audio classification is an essential task in multimedia content analysis and a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features achieve accuracy comparable to that of popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we achieve a high average accuracy of 94.2% in the five-class audio classification task.


“…The values here considered for the length of the analysis and texture windows (23.22 ms and 1 s, respectively) are widely used in other related works dealing with SMD (El-Maleh et al., 2000; Tzanetakis and Cook, 2002; Burred and Lerch, 2004; Wang et al., 2003). Although other different values are possible, the chosen values provide a good trade-off between accuracy, complexity and delay.…”

Section: Classical Features for Speech/Music Discrimination (mentioning)

confidence: 96%
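
The window lengths quoted above can be made concrete with a short sketch. It assumes a 22,050 Hz sample rate (not stated in the quoted passage), under which a 512-sample frame lasts about 23.22 ms, and it assumes non-overlapping frames. The function names and the choice of RMS as the per-frame feature are illustrative only and are not taken from any of the cited papers.

```python
import numpy as np

SAMPLE_RATE = 22050   # assumption: 512 / 22050 s is about 23.22 ms
ANALYSIS_LEN = 512    # analysis window length in samples (assumed)
TEXTURE_SEC = 1.0     # texture window length in seconds (from the quote)


def frame_rms(signal, frame_len=ANALYSIS_LEN):
    """RMS of each non-overlapping analysis frame; a trailing partial frame is dropped."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def texture_stats(rms, sample_rate=SAMPLE_RATE, frame_len=ANALYSIS_LEN,
                  texture_sec=TEXTURE_SEC):
    """Mean and variance of the frame-level feature over each texture window."""
    rms = np.asarray(rms, dtype=float)
    # Number of whole analysis frames that fit in one texture window.
    frames_per_texture = int(texture_sec * sample_rate) // frame_len
    n_tex = len(rms) // frames_per_texture
    grouped = rms[: n_tex * frames_per_texture].reshape(n_tex, frames_per_texture)
    return grouped.mean(axis=1), grouped.var(axis=1)


if __name__ == "__main__":
    three_seconds = np.ones(3 * SAMPLE_RATE)
    means, variances = texture_stats(frame_rms(three_seconds))
    print(len(means), means, variances)
```

Under these assumptions each 1 s texture window summarizes 43 analysis frames, so a classifier decision is available once per second while the underlying features are still computed on short frames.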

Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination

Ruiz-Reyes, García-Galán, Muñoz 2010

Engineering Applications of Artificial Intelligence


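“…Recently, research has been conducted on discriminating audio signals that contain various kinds of sound, such as speech, music, background music and sound effects, into a specific category (audio discrimination) or classifying them into several categories (audio classification) [1][2][3][4].…”

Section: Introduction (unclassified)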

Implementation of Music Signals Discrimination System for FM Broadcasting

Kang 2009

The KIPS Transactions: Part B


This paper proposes a Gaussian mixture model (GMM)-based music discrimination system for FM broadcasting. The objective of the system is to automatically archive music signals from audio broadcasting programs, which normally mix human voices, songs, commercial music and other sounds. To improve the system's performance and robustness, and to accurately cut the starting and ending points of the recording, we also added a post-processing module. Experimental results on various input signals of FM radio programs under PC environments show excellent performance of the proposed system. The fixed-point simulation shows the same results under 3 MIPS of computational power.
