Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features

Doğan, Ebru Apaydın; Sert, Mustafa; Yazıcı, Adnan

doi:10.1109/mmedia.2009.35

Cited by 11 publications

(13 citation statements)

References 10 publications

“…[1-9] Concerning the audio data, the automatic analysis of the audio signals can offer the users useful information. In the case of broadcast news, automatic processing is related to tasks such as sound recognition [10,11], speaker recognition [12], anchor detection [13], role detection [14-16], story boundary detection [2,17,18], summary construction from anchor talking [9,19], channel's quality detection [20], sound event detection [21,22], non-linguistic human-produced sounds detection [5,6,23-25], audio type segmentation in sport games [4,26,27], highlight scene extraction from sports games [3], violence scene detection [28], music characteristics classification [29,30], jingle detection [1], commercial block detection [8], voice activity detection [31], language recognition [32], emotion recognition [33] and speech recognition [34]. Sound recognition is the cornerstone of analysis as typically precedes the other stages.…”

Section: Introduction (mentioning)

confidence: 99%

“…The most commonly used are the Gaussian mixture models and the hidden Markov models [10,11,14,26,37,40]. Also widely used are the support vector machines [11,14,38,39,41], the artificial neural networks [10], the k-nearest neighbor algorithm [14,38], the decision trees [10,38], the genetic algorithms [2], the fuzzy logic [42] and boosting techniques [41,43]. Related architectures incorporate fusion frameworks among recognition models [28,44] and combination of model-based and distance-based algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

“…[13,26,27,39,40] Postprocessing schemes can improve the overall recognition accuracy. Among the postprocessing schemes are (i) transformation of the feature matrix [23,44-46], (ii) correction of logical errors based on empirical rules [11], (iii) isolation of the segments of interest in cases where the post-processing is focused on specific classes [10,11,13,38,40,47] and (iv) merging of sound events and separation of them in a post-processing stage [28]. The structure of the analysis of sounds categorizes the task to different classes.…”

Section: Introduction (mentioning)

confidence: 99%

Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News

Theodorou; Mporas; Lazaridis; et al. (2017). Int. J. Artif. Intell. Tools

Aiming at an automatic sound recognizer for radio broadcasting events, this work investigates a methodology for clustering the audio feature space using the discrimination ability of the audio descriptors as a criterion. From a given, closed set of audio events commonly found in broadcast news transmissions, a large set of audio descriptors is extracted and their data-driven relevance rankings are clustered, providing more robust feature selection. The clusters of the feature space feed machine learning algorithms implemented as classification models during the experimental evaluation. The methodology showed that support vector machines achieve notably good accuracy, owing to their ability to cope well with high-dimensional experimental conditions.

“…SVM was used also in [8], where it was applied to transform domain indexing by using a non-standard audio codec in a music genre classification application. As a popular classifier, SVM was also used, along with the Hidden Markov Models (HMMs), in [9] to classify audio content into five non-silent classes. In [9], a unique HMM-model is trained for each non-silent class using MPEG-7 features.…”

Section: Introduction (mentioning)

confidence: 99%

“…As a popular classifier, SVM was also used, along with the Hidden Markov Models (HMMs), in [9] to classify audio content into five non-silent classes. In [9], a unique HMM-model is trained for each non-silent class using MPEG-7 features. Training set encapsulated 50% of the entire dataset in achieving the reported accuracy rates, which are, however, highly dependent on the selection of the SVM parameters, which is a well-known fact in the field.…”

Section: Introduction (mentioning)

confidence: 99%

Content-based audio classification using collective network of binary classifiers

Mäkinen; Kıranyaz; Gabbouj (2011). 2011 IEEE Workshop on Evolving and Adaptive Intelligent Systems (EAIS)

In this paper, a novel collective network of binary classifiers (CNBC) framework is presented for content-based audio classification. The topic has been studied in several publications before, but in many cases the number of different classification categories is quite limited and needs to be fixed a priori. We focus our efforts on increasing both the classification accuracy and the number of classes, as well as on creating a scalable network design, which allows introducing new audio classes incrementally. The approach is based on dividing a major classification problem into several networks of binary classifiers (NBCs), where each NBC adapts its internal topology according to the classification problem at hand, by using evolutionary Artificial Neural Networks (ANNs). In the current work, feedforward ANNs, or the so-called Multilayer Perceptrons (MLPs), are evolved within an architecture space, where a stochastic optimization is applied to seek the optimal classifier configuration and parameters. The performance evaluations of the proposed framework over an 8-class benchmark audio database demonstrate its scalability and notable potential, as classification error rates of less than 9% are achieved.

A flexible and scalable audio information retrieval system for mixed-type audio signals

Doğan; Sert; Yazıcı (2011). Int. J. Intell. Syst.

Self Cite

The content-based classification and retrieval of real-world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied in the last two decades, most of the current retrieval systems cannot provide flexible querying of audio clips due to the mixed-type form (e.g., speech over music and speech over environmental sound) of audio information in the real world. We present here a complete, scalable, and extensible content-based classification and retrieval system for mixed-type audio clips. The system gives users an opportunity for flexible querying of audio data semantically by providing four alternative ways, namely, querying by mixed-type audio classes, querying by domain-based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). In order to reduce the retrieval time, a hash-based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results obtained from our experiments demonstrate that the Audio Spectrum Flatness feature in the MPEG-7 standard performs better in music audio samples compared to other kinds of audio samples and the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.
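For readers unfamiliar with the Audio Spectrum Flatness feature named in the abstract above, the following is a minimal sketch of the underlying idea: spectral flatness as the ratio of the geometric mean to the arithmetic mean of power spectrum values. It is a simplification, not the cited system's implementation. The MPEG-7 descriptor is computed per frequency band, whereas this sketch treats the whole spectrum as a single band. The function names, the naive DFT, and the `floor` parameter are illustrative assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_flatness(power, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power values.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. `floor` (an assumption of this
    sketch) keeps the logarithm defined for empty bins.
    """
    values = [max(p, floor) for p in power]
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean) / (sum(values) / len(values))


# An impulse has equal power in every bin; a pure tone has one dominant bin.
impulse = [1.0] + [0.0] * 63
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
flat_impulse = spectral_flatness(power_spectrum(impulse))
flat_tone = spectral_flatness(power_spectrum(tone))
print(flat_impulse, flat_tone)
```

The impulse frame yields a flatness close to 1, while the pure tone yields a value close to 0. This illustrates why a flatness measure can help separate tonal material such as music from noise-like material.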
