In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data. We first describe a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data. Our model trains efficiently on audio recordings of variable length and is therefore well suited for transfer learning. We then propose methods to learn representations with this model that can be used effectively to solve the target task. We study both transductive and inductive transfer learning tasks, showing the effectiveness of our methods for both domain and task adaptation. The representations learned with the proposed CNN model generalize well enough to reach human-level accuracy on the ESC-50 sound event dataset and set state-of-the-art results on that dataset. We further apply them to acoustic scene classification and again show that our proposed approaches are well suited to this task. We also show that our methods help capture semantic meanings and relations. Moreover, in the process we set state-of-the-art results on the AudioSet dataset using the balanced training set.
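The inductive transfer setup described above, training only a lightweight classifier on top of frozen representations from a pretrained model, can be sketched as follows. This is an illustrative toy, not the paper's model: the "pretrained" CNN is replaced by a fixed random-projection embedding, and `embed` and `train_linear` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a network trained on weakly labeled web audio: here just a
# fixed random projection followed by a ReLU (illustrative, not the real CNN).
W_pretrained = rng.normal(size=(64, 16))

def embed(x):
    """Frozen representation from the 'pretrained' model."""
    return np.maximum(x @ W_pretrained, 0.0)

def train_linear(X, y, steps=300, lr=0.1):
    """Train a logistic-regression head on frozen embeddings (target task)."""
    Z = embed(X)
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        # clip logits for numerical safety before the sigmoid
        p = 1.0 / (1.0 + np.exp(-np.clip(Z @ w, -30.0, 30.0)))
        w -= lr * Z.T @ (p - y) / len(y)   # gradient of the log-loss
    return w
```

Only the head `w` is updated; the representation itself stays fixed, which mirrors using the learned audio representations directly for a new target task.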
This paper focuses on the automatic extraction of beat structure from a musical piece. A novel statistical approach to modeling beat sequences based on Hidden Markov Models (HMMs) is introduced. The resulting beat labels are obtained by running the Viterbi decoder and subsequent lattice rescoring. For the observation vectors, we propose a new feature set based on the impulsive and harmonic components of the reassigned spectrogram. Different components of the observation vectors have been investigated for their efficiency. The main advantage of the proposed approach is the absence of imposed deterministic rules: all the parameters are learned from the training data, and the experimental results show the efficiency of the proposed scheme.
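The decoding step in such an HMM-based beat tracker is standard Viterbi search over hidden beat states. A minimal sketch of Viterbi decoding in log space follows; the feature extraction and lattice rescoring from the abstract are omitted, and all names are illustrative.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely hidden-state path through an HMM.

    log_init:  (S,)   log initial state probabilities
    log_trans: (S, S) log transition probs, log_trans[i, j] = log P(j | i)
    log_obs:   (T, S) log observation likelihoods per frame and state
    """
    T, S = log_obs.shape
    delta = log_init + log_obs[0]           # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans # (S, S): previous -> current state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    # backtrace from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In a beat tracker, the states would encode beat/no-beat (or tempo-phase) hypotheses and the observation likelihoods would come from the reassigned-spectrogram features, with transition probabilities learned from training data rather than hand-imposed rules.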
This paper addresses feature extraction for automatic chord recognition systems. Most chord recognition systems use chroma features as a front end together with some kind of classifier (HMM, SVM, or template matching). The vast majority of feature extraction approaches map frequency bins from the spectrum or constant-Q spectrum to chroma bins. In this work, a set of new chroma features based on the time-frequency reassignment (TFR) technique is investigated. The proposed feature set was evaluated on the commonly used Beatles dataset and proved efficient for the chord recognition task, outperforming standard chroma features.
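The baseline mapping from spectral bins to chroma bins mentioned above can be sketched in a few lines: each frequency is converted to a pitch class and its energy is folded into one of 12 chroma bins. This is the standard chroma computation, not the TFR-based variant the paper proposes, and the function name is illustrative.

```python
import numpy as np

def chroma_from_spectrum(magnitudes, freqs, a4=440.0):
    """Fold spectral energy into 12 pitch classes (chroma bins).

    magnitudes: (N,) magnitude spectrum
    freqs:      (N,) center frequency of each bin in Hz
    """
    chroma = np.zeros(12)
    valid = freqs > 0
    # MIDI note number of each bin's frequency, then modulo 12 for pitch class
    midi = 69 + 12 * np.log2(freqs[valid] / a4)
    pitch_class = np.round(midi).astype(int) % 12
    np.add.at(chroma, pitch_class, magnitudes[valid])
    return chroma
```

The TFR variant differs in where the energy is placed: reassignment sharpens each bin's time-frequency location before the fold, so energy lands in the correct pitch class more often than with the blurred short-time spectrum.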