Content-based classification and retrieval of audio

Zhang, Tong; Kuo, C.‐C. Jay

doi:10.1117/12.325703

Cited by 41 publications

(24 citation statements)

References 0 publications

“…Finally, concluding remarks and future plans are described in Section 6. For more details about contents covered by Sections 3 and 4, we refer to [3] and [4], respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Audio-guided audiovisual data segmentation, indexing, and retrieval

Zhang

Kuo

1998

SPIE Proceedings

Self Cite

While current approaches for video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The accompanying audio signal of audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based on morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes such as applause, explosions, and bird sounds. This fine-level classification and indexing step is based on time-frequency analysis of audio signals and the use of a hidden Markov model (HMM) as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90% for the coarse-level classification, and higher than 85% for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.

“…Usually, music data is given in the form of - possibly compressed - wave records, the audio data. Hence, feature extraction from audio data has become a hot topic recently (Liu, Wang, & Chen, 1998; Zhang & Kuo, 1998; Guo & Li, 2003; Tzanetakis, 2002). Several specialized extraction methods have shown their performance on some task and data set.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Feature Extraction for Classifying Audio Data

2005

Today, many private households as well as broadcasting or film companies own large collections of digital music plays. These are time series that differ from, e.g., weather reports or stock market data. The task is normally that of classification, not prediction of the next value or recognizing a shape or motif. New methods for extracting features that make it possible to classify audio data have been developed. However, the development of appropriate feature extraction methods is a tedious effort, particularly because every new classification task requires tailoring the feature set anew. This paper presents a unifying framework for feature extraction from value series. Operators of this framework can be combined into feature extraction methods automatically, using a genetic programming approach. The construction of features is guided by the performance of the learning classifier which uses the features. Our approach to automatic feature extraction requires a balance between the completeness of the methods on one side and the tractability of searching for appropriate methods on the other side. In this paper, some theoretical considerations illustrate the trade-off. After the feature extraction, a second process learns a classifier from the transformed data. The practical use of the methods is shown by two types of experiments: classification of genres and classification according to user preferences.

“…In addition to traditional timbre methods that apply only to isolated musical instrument notes, MPEG-7 also represents noise textures, environmental sounds, music recordings, melodic sequences, vocal utterances and sounds containing mixtures of sources. For some recent work in the area of sound indexing and retrieval, see Wold, Blum, Keislar and Wheaton (1996), Boreczky and Wilcox (1998), Martin and Kim (1998) and Zhang and Kuo (1998).…”

Section: Introduction (mentioning)

confidence: 99%

General sound classification and similarity in MPEG-7

Casey

2001

Org. Sound

We introduce a system for generalised sound classification and similarity using a machine-learning framework. Applications of the system include automatic classification of environmental sounds, musical instruments, music genre and human speakers. In addition to classification, the system may also be used for computing similarity metrics between a target sound and other sounds in a database. We discuss the use of hidden Markov models for representing the temporal evolution of audio spectra and present results of testing the system on classification and retrieval tasks. The system has been incorporated into the MPEG-7 international standard for multimedia content description and is therefore publicly available in the form of a set of standardised interfaces and software reference tools for developers and researchers.
