2010
DOI: 10.1007/s00530-010-0205-x

Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

Abstract: Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hiera…

Cited by 33 publications (23 citation statements)
References 26 publications
“…Frame-based features have also been proposed for segmenting and classifying BN audio into broad classes. As an example, two pitch-density-based features are proposed in [23], the authors use short-time energy (STE) in [1,24,25], and harmonic features are used in [26][27][28]. The frame-based features can be directly used in the classifier.…”
Section: General Description Of Audio Segmentation Systems (mentioning)
confidence: 99%
“…More precisely, these features with other extended sets of features have been proposed for segmenting and classifying BN audio into broad classes. Among others, two pitch-density-based features are proposed in [11], short-time energy (STE) is used in [12][13][14], and harmonic features are used in [15][16][17]. The previously mentioned features are short-term characteristics because they are extracted within short periods of time (between 10 and 30 ms), usually known in the literature as frame-based features.…”
Section: Introduction (mentioning)
confidence: 99%
“…These story boundaries are certainly sentence boundaries. Therefore, we use an SVM binary tree approach [13] to detect music regions and whether …”
Section: Speaker Turn and Music (mentioning)
confidence: 99%