Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

Xie, Lei; Zhang, Fu; Feng, Wei; Luo, Yong

doi:10.1007/s00530-010-0205-x

Cited by 33 publications

(23 citation statements)

References 26 publications

“…Frame-based features have also been proposed for segmenting and classifying BN audio into broad classes. As an example, two pitch-density-based features are proposed in [23], the authors use short-time energy (STE) in [1,24,25], and harmonic features are used in [26][27][28]. The frame-based features can be directly used in the classifier.…”

Section: General Description Of Audio Segmentation Systems (mentioning)

confidence: 99%

Albayzín-2014 evaluation: audio segmentation and classification in broadcast news domains

Castán, Tavarez, López-Otero, et al. 2015. J AUDIO SPEECH MUSIC PROC.

Audio segmentation is important as a pre-processing task that improves the performance of many speech technology tasks and is therefore of clear research interest. This paper describes the database, the metric, the systems and the results for the Albayzín-2014 audio segmentation campaign. In contrast to previous evaluations, where the task was the segmentation of non-overlapping classes, the Albayzín-2014 evaluation proposes delimiting the presence of speech, music and/or noise, which can occur simultaneously. The database used in the evaluation was created by fusing different media and noises in order to increase the difficulty of the task. Seven segmentation systems from four different research groups were evaluated and combined. Their experimental results were analyzed and compared with the aim of providing a benchmark and identifying promising directions in this field.

“…More precisely, these features with other extended sets of features have been proposed for segmenting and classifying BN audio into broad classes. Among others, two pitch-density-based features are proposed in [11], short-time energy (STE) is used in [12][13][14], and harmonic features are used in [15][16][17]. The previously mentioned features are short-term characteristics because they are extracted within short periods of time (between 10 and 30 ms), usually known in the literature as frame-based features.…”

Section: Introduction (mentioning)

confidence: 99%

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Castán, Giménez, Miguel, et al. 2014. J AUDIO SPEECH MUSIC PROC.

This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates the within-class variability by using class-dependent factor loading matrices and obtains the scores by computing the log-likelihood ratio for the class model to a non-class model over fixed-length windows. Afterwards, these scores are smoothed to yield longer contiguous segments of the same class by means of different back-end systems. Unlike previous solutions, our proposal does not make use of specific acoustic features and does not need a hierarchical structure. The proposed method is applied to segment and classify audios coming from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. The technique is compared to a hierarchical system with specific acoustic features achieving a significant error reduction.
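The citation statement for this entry describes frame-based features as short-term characteristics extracted over periods of 10 to 30 ms, with short-time energy (STE) as one example. A minimal sketch of that idea follows. The function name `short_time_energy`, the 20 ms frame, the 10 ms hop, and the mean-of-squares definition are assumptions chosen for illustration; they are not taken from any of the cited papers.

```python
def short_time_energy(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return the mean squared amplitude of each full frame.

    Assumed for illustration: rectangular window, frames that do not fit
    entirely inside the signal are dropped.
    """
    # Convert frame and hop durations from milliseconds to sample counts.
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    hop_len = max(1, int(round(sample_rate * hop_ms / 1000.0)))

    energies = []
    # Slide the frame across the signal, one hop at a time.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# Toy signal at 1 kHz: 40 ms of constant amplitude followed by 20 ms of silence.
signal = [1.0] * 40 + [0.0] * 20
ste = short_time_energy(signal, sample_rate=1000)
```

Each value in `ste` is one frame-level feature; a sequence of such values could be passed to a classifier directly or summarized over longer windows.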

“…These story boundaries are certainly sentence boundaries. Therefore, we use an SVM binary tree approach [13] to detect music regions and whether …”

Section: Speaker Turn and Music (mentioning)

confidence: 99%

Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features

Xie 2014. 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP)

Self Cite

In this paper, we explore the use of prosodic features for sentence boundary detection in Chinese broadcast news. The prosodic features include speaker turn, music, pause duration, pitch, energy and speaking rate. Specifically, considering the effect of Chinese tones on the pitch trajectory, we propose to use tone-normalized pitch features. Experiments using decision trees demonstrate that the tone-normalized pitch features show superior performance in sentence boundary detection in Chinese broadcast news. Furthermore, feature combination achieves a clear performance improvement through intuitive feature interaction rules formed in the decision tree. Pause duration and a tone-normalized pitch feature account for most of the feature usage in the best-performing decision tree.
