Music listeners frequently use words to describe music. Personalized music recommendation systems such as Last.fm and Pandora rely on manual annotations (tags) as a mechanism for querying and navigating large music collections. A well-known issue in such systems is the cold-start problem: new songs/tracks cannot be recommended until they have been manually annotated. Automatic tag annotation based on content analysis is a potential solution to this problem and has recently been gaining attention. We describe how stacked generalization can be used to improve the performance of a state-of-the-art automatic tag annotation system for music based on audio content analysis, and report results on two publicly available datasets.
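Stacked generalization combines several base classifiers by training a meta-learner on their outputs. The following is a minimal numpy sketch on a synthetic binary "tag" problem, not the paper's system; the base classifiers, features, and meta-learner here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary "tag" problem: two noisy views of the same underlying label
# (hypothetical stand-ins for audio features).
n = 400
y = rng.integers(0, 2, n)
x1 = y + rng.normal(0, 0.8, n)  # feature seen by base learner 1
x2 = y + rng.normal(0, 0.8, n)  # feature seen by base learner 2

def base_prob(x, thresh=0.5):
    # A crude base "classifier": squash distance to a threshold into (0, 1).
    return 1.0 / (1.0 + np.exp(-(x - thresh) * 4.0))

# Level 0: each base learner emits tag probabilities.
train, test = slice(0, 300), slice(300, None)
P_train = np.column_stack([base_prob(x1[train]), base_prob(x2[train])])
P_test = np.column_stack([base_prob(x1[test]), base_prob(x2[test])])

# Level 1: a logistic-regression meta-learner trained on the base outputs
# (plain gradient descent on the log-loss).
w = np.zeros(3)
X = np.column_stack([np.ones(P_train.shape[0]), P_train])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y[train]) / len(p)

Xt = np.column_stack([np.ones(P_test.shape[0]), P_test])
stacked = (1.0 / (1.0 + np.exp(-Xt @ w)) > 0.5).astype(int)
acc = (stacked == y[test]).mean()
```

On this toy problem the stacked prediction beats either noisy view alone, which is the intuition behind applying it to an ensemble of per-tag classifiers.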
This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements dynamic thresholding followed by a fusion scheme, and finally applies the BIC. The second is a real-time, metric-based method that employs line spectral pairs and uses the BIC to validate potential speaker change points. The third method consists of three modules: in the first, a measure based on second-order statistics is used; in the second, the Euclidean distance and Hotelling's T² statistic are applied; and in the third, the BIC is utilized. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, referred to as the TIMIT data set. The performance of the three systems is compared using t-statistics.
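In BIC-based change detection, two adjacent segments are modeled either as one Gaussian or as two, and a speaker change is hypothesized when the two-Gaussian model wins after a model-complexity penalty. A minimal numpy sketch of the standard ΔBIC test, assuming full-covariance Gaussians and synthetic feature frames (the feature choice and λ are illustrative, not those of the systems above):

```python
import numpy as np

def delta_bic(X1, X2, lam=1.0):
    """ΔBIC for two segments of d-dimensional feature frames.

    Positive values favour placing a speaker change at the boundary.
    """
    X = np.vstack([X1, X2])
    n1, n2, n = len(X1), len(X2), len(X1) + len(X2)
    d = X.shape[1]
    logdet = lambda A: np.linalg.slogdet(np.cov(A, rowvar=False))[1]
    # Penalty for the extra parameters of the two-Gaussian model.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet(X)
            - 0.5 * n1 * logdet(X1)
            - 0.5 * n2 * logdet(X2)
            - penalty)

rng = np.random.default_rng(1)
# Same "speaker" on both sides vs. a shifted distribution on the right.
same = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(0, 1, (200, 4)))
diff = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (200, 4)))
```

Here `same` comes out negative (no change detected) and `diff` positive, which is the decision rule a sliding-window detector applies at each candidate boundary.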
The predominant melodic source, frequently the singing voice, is an important component of musical signals. In this paper we describe a method for extracting the predominant source and corresponding melody from "real-world" polyphonic music. The proposed method is inspired by ideas from Computational Auditory Scene Analysis. We formulate predominant melodic source tracking and formation as a graph partitioning problem and solve it using the normalized cut, a global criterion for segmenting graphs that has been used in Computer Vision. Sinusoidal modeling is used as the underlying representation. A novel harmonicity cue, which we term Harmonically Wrapped Peak Similarity, is introduced, and experimental results supporting its use are presented. In addition, we show results for automatic melody extraction using the proposed approach.
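The normalized cut partitions a weighted graph by relaxing the discrete cut into an eigenvector problem on the normalized Laplacian. A minimal numpy sketch on a toy point cloud standing in for sinusoidal peaks; the Gaussian similarity here is an illustrative assumption, not the harmonicity cue introduced in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated clusters of "peaks" (toy stand-in for a peak graph).
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

# Edge weights: Gaussian similarity on squared distance.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)

# Spectral relaxation of the normalized cut (Shi & Malik): the second
# eigenvector of D^{-1/2} (D - W) D^{-1/2} yields the bipartition.
deg = W.sum(1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
vals, vecs = np.linalg.eigh(L_sym)
fiedler = D_inv_sqrt @ vecs[:, 1]  # map back to the random-walk eigenvector
labels = (fiedler > 0).astype(int)
```

Thresholding the second eigenvector at zero splits the graph into the two groups with minimal normalized cut, which is the same machinery applied above to group sinusoidal peaks into a predominant source.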
The leading voice is an important feature of a musical piece and can often be considered the dominant harmonic source. In this paper we propose a new scheme for efficient dominant harmonic source separation. This is achieved by introducing a new harmonicity cue, which is first compared with state-of-the-art cues using a generic evaluation methodology. The proposed separation scheme is then compared to a generic Computational Auditory Scene Analysis framework. Computational speed-up and separation performance are assessed using source separation and music information retrieval tasks.