Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthroughs are hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they can be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that, combined with these techniques, our approach is robust against musical variations existing in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
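The idea of pooling songs of different lengths into a fixed-dimensional representation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `temporal_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the plain-list feature layout are all assumptions made for illustration.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Max-pool a variable-length sequence of feature vectors to a fixed size.

    frames: list of T feature vectors, each a list of C floats (T >= 1).
    levels: number of temporal bins per pyramid level (assumed values).
    Returns a flat list of length C * sum(levels), independent of T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t, c = len(frames), len(frames[0])
    pooled = []
    for n_bins in levels:
        for b in range(n_bins):
            # Integer bin boundaries (floor for the start, ceiling for the end).
            # Bins may overlap when T < n_bins, but no bin is ever empty.
            start = (b * t) // n_bins
            end = ((b + 1) * t + n_bins - 1) // n_bins
            segment = frames[start:end]
            # Keep the maximum of each channel within this temporal bin.
            pooled.extend(max(f[ch] for f in segment) for ch in range(c))
    return pooled


# Two "songs" of different lengths with 2 channels each (made-up values).
short_song = [[1.0, 2.0], [3.0, 0.0], [5.0, 1.0]]
long_song = [[float(i), float(-i)] for i in range(50)]
short_vec = temporal_pyramid_pool(short_song)
long_vec = temporal_pyramid_pool(long_song)
```

Both outputs have length 2 * (1 + 2 + 4) = 14 regardless of input length; the coarse level summarizes the whole song while the finer levels retain some temporal ordering.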
Background: The field of psychiatry has seen significant progress in recent years due to worldwide contributions. However, the productivity of individual nations in the field is still unclear. In our study, we investigated the contributions of individual nations to the field of psychiatry. Methods: The Web of Science was used to perform a search from 2011 to 2015 on the subject category “psychiatry”. The total number of articles, citations and the per capita numbers were obtained to analyze the contributions of different countries. Results: In psychiatry journals from 2011 to 2015, 84,760 articles were published worldwide. The most productive world areas were North America, East Asia, Europe and Oceania. High-income countries published 87.77% of the articles, middle-income countries 12.07%, and low-income countries 0.16%. Most articles were published by the United States (32.68%), followed by the United Kingdom (8.59%), Germany (6.77%), Australia (5.87%), and Canada (4.9%). The country with the highest number of citations (243,394) was the United States. A positive correlation was found between the population/GDP and the number of publications (P < 0.01). Australia ranked highest when normalized to population size, followed by the Netherlands and Norway. When adjusted for GDP, the Netherlands ranked highest, followed by Israel and Australia. Conclusions: Most psychiatry articles were authored in high-income countries, and few papers came from low-income countries. The most productive country was the United States. However, when normalized to population size and GDP, some European and Oceanian countries were the most productive. Electronic supplementary material: The online version of this article (doi:10.1186/s13033-017-0127-5) contains supplementary material, which is available to authorized users.
Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs have been achieved with neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representations for cover song identification. A scheme is designed to train models that are robust to tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, with the largest improvement on the large dataset.
Mood annotation of music is challenging as it concerns not only audio content but also extra-musical information. It is a representative research topic on how to bridge the well-known semantic gap. In this paper, we propose a new music-mood-specific ontology. Novel ontology-based semantic reasoning methods are applied to effectively bridge content-based information with web-based resources. In addition, the system can automatically discover closely relevant semantics for music mood, and thus a novel weighting method is proposed for mood propagation. Experiments show that the proposed method outperforms purely content-based methods and significantly enhances the mood prediction accuracy. Furthermore, evaluations show that the system's accuracy could be increased further with the enrichment of metadata.
We present ByteCover, a new feature learning method for cover song identification (CSI). ByteCover is built on the classical ResNet model, with two major improvements designed to further enhance the model's capability for CSI. First, we integrate instance normalization (IN) and batch normalization (BN) to build IBN blocks, which are the major components of our ResNet-IBN model. With the help of the IBN blocks, our CSI model can learn features that are invariant to changes in musical attributes such as key, tempo, timbre and genre, while preserving the version information. Second, we employ the BN-Neck method to allow multi-loss training and encourage our method to jointly optimize a classification loss and a triplet loss, so that inter-class discrimination and intra-class compactness of cover songs are ensured at the same time. A set of experiments demonstrated the effectiveness and efficiency of ByteCover on multiple datasets, and on the Da-TACOS dataset, ByteCover outperformed the best competing system by 18.0%.
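The joint optimization of a classification loss and a triplet loss can be sketched as follows. This is an illustrative sketch only, not ByteCover's implementation: the abstract does not state the distance metric, the margin, or how the two losses are weighted, so the Euclidean distance, the `margin=0.3` default, the unweighted sum, and all function names below are assumptions.

```python
from math import exp, log, sqrt


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stabilized)."""
    m = max(logits)
    log_sum_exp = m + log(sum(exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: pull versions of the same song together
    and push different songs at least `margin` further away (assumed margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(logits, label, anchor, positive, negative, margin=0.3):
    """Unweighted sum of the two losses (the weighting is an assumption)."""
    return cross_entropy(logits, label) + triplet_loss(anchor, positive, negative, margin)


# Made-up embeddings: the positive is close to the anchor, the negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
easy = triplet_loss(anchor, positive, negative)   # margin already satisfied
hard = triplet_loss(anchor, negative, positive)   # margin violated
total = joint_loss([2.0, 0.0, 0.0], 0, anchor, positive, negative)
```

The classification term separates song classes from one another, while the triplet term penalizes embeddings of the same song that sit farther apart than embeddings of different songs, which corresponds to the inter-class discrimination and intra-class compactness mentioned above.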