Abstract-Although there is a wide variety of low-level audio features for content-based audio indexing and retrieval, these features may lack the discrimination power needed to describe the aural content accurately, leading to poor content-based retrieval performance. Furthermore, manual selection of features from a vast collection may easily lead to sub-optimal solutions. In this paper, we propose an evolutionary feature synthesis technique that works alongside a feature selection scheme. The synthesis process searches a pre-defined search space for the optimal linear/nonlinear operators and feature weights, so as to synthesize a highly discriminative set of new (artificial) features from the set of selected features. The evolutionary search in the multi-dimensional solution space is based on the multi-dimensional particle swarm optimization (MD PSO) algorithm, along with a fractional global best formation (FGBF) technique. Unlike in many existing feature generation approaches found in the literature, the dimension of the synthesized feature vector is also optimized during the process. The features synthesized by the proposed approach are compared with the original audio descriptors in an extensive set of retrieval tasks. The experimental results demonstrate an improvement of up to 15-25% in retrieval performance. Moreover, the proposed synthesis technique outperforms artificial neural networks in retrieving accurate audio content.
Index Terms-Content-based retrieval, evolutionary computation, feature extraction, particle swarm optimization.
I. INTRODUCTION
The features can be numerical or nominal scalars or vectors describing some specific characteristics of the data, such as tonality or fundamental frequency in the case of audio signals. The accuracy of content-based audio retrieval depends strongly on the description quality and discrimination capability of the features. Unfortunately, despite the enormous number of feature extraction methods available in the literature, most of them have significant limitations and drawbacks in describing the audio content, so that current audio content search techniques cannot match the capabilities of the human perceptual auditory system. This lack of semantic representation, also known as the "semantic gap", has led researchers to explore several promising ideas for improving the discrimination power of the low-level (primitive) features.
Another reason is the so-called curse of dimensionality phenomenon [2], which basically states that in high dimensions the data becomes too sparse for any decent statistical or structural analysis. In the feature selection scheme, the feature vector dimension is lowered by selectively choosing an expressive and compact set of features from a possibly much larger original set. Evolutionary algorithms, such as genetic algorithms (GA) [3] and genetic programming (GP) [4], are popular among many other feature selection approaches...
978-1-4673-2821-0/13/$31.00 ©2013 IEEE
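The sparsity argument behind the curse of dimensionality can be illustrated with a small numerical experiment. The sketch below is not part of the proposed method; it is a minimal illustration under assumed conditions (points drawn uniformly from the unit hypercube, Euclidean distance, and a hypothetical helper named relative_contrast). It measures how much farther the farthest point is than the nearest one, relative to the nearest distance. As the dimension grows, this contrast shrinks, so nearest and farthest neighbors become hard to tell apart, which is one reason compact feature vectors are preferred for retrieval.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min over the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly from
    the unit hypercube [0, 1]^dim (an assumed, purely illustrative setup)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The printed contrast decreases as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

The uniform distribution and the sample size are arbitrary choices; real audio feature vectors are not uniformly distributed, so the experiment only indicates the general trend.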