Abstract—Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, and can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. We train our network on fixed-length music excerpts, each labeled with a single predominant instrument, and estimate an arbitrary number of predominant instruments from an audio signal of variable length. To obtain the excerpt-wise result, we aggregate multiple outputs from sliding windows over the test audio. In doing so, we investigate two aggregation methods: one takes the average for each instrument, and the other takes the instrument-wise sum followed by normalization. In addition, we conduct extensive experiments on several important factors that affect performance, including the analysis window size, the identification threshold, and the activation functions of the neural networks, to find the optimal set of parameters. Using a dataset of 10k audio excerpts from 11 instruments for evaluation, we found that convolutional neural networks are more robust than conventional methods that exploit spectral features and source separation with support vector machines. Experimental results show that the proposed convolutional network architecture obtains micro and macro F1 measures of 0.602 and 0.503, respectively, improvements of 19.6% and 16.4% over other state-of-the-art algorithms.
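The two aggregation strategies described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `window_probs` and `threshold` are hypothetical stand-ins for the CNN's per-window outputs and the tuned identification threshold, and normalizing by the maximum is an assumption, since the abstract does not specify the normalization.

```python
import numpy as np

def aggregate_average(window_probs):
    # Strategy 1: average each instrument's activation across all sliding windows.
    return window_probs.mean(axis=0)

def aggregate_sum_normalize(window_probs):
    # Strategy 2: sum each instrument's activation across windows, then
    # normalize (here, by the maximum) so the top instrument scores 1.0.
    summed = window_probs.sum(axis=0)
    return summed / summed.max()

# Hypothetical CNN outputs for 5 sliding windows over 3 instruments.
window_probs = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.2],
    [0.7, 0.6, 0.1],
    [0.9, 0.5, 0.3],
    [0.6, 0.4, 0.2],
])

threshold = 0.5  # identification threshold: a tuned hyperparameter in the paper
scores = aggregate_sum_normalize(window_probs)
predominant = np.where(scores >= threshold)[0]  # indices of predicted instruments
print(scores, predominant)
```

Because the number of windows varies with the length of the test audio, both strategies reduce a variable number of per-window predictions to one fixed-length score vector per excerpt, which the threshold then converts into a set of predominant instruments.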
Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on a sparse feature learning algorithm for musical instrument identification and, in particular, focuses on the effects of frame sampling techniques for dictionary learning and of pooling methods for feature aggregation. To this end, two frame sampling techniques are examined: fixed and proportional random sampling. Furthermore, the effect of using onset frames is analyzed for both proposed sampling methods. To summarize the feature activations, a standard deviation pooling method is used and compared with the commonly used max- and average-pooling techniques. Using more than 47,000 recordings of 24 instruments covering various performers, playing styles, and dynamics, a number of tuning parameters are examined, including the analysis frame size, the dictionary size, and the type of frequency scaling, as well as the different sampling and pooling methods. The results show that the combination of proportional sampling and standard deviation pooling achieves the best overall performance of 95.62%, while the optimal parameter set varies among the instrument classes.
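As an illustration of the three pooling methods being compared, here is a minimal NumPy sketch under stated assumptions: `activations` is a hypothetical frames-by-atoms matrix standing in for the sparse feature activations produced by the learned dictionary.

```python
import numpy as np

def pool_features(activations, method="std"):
    """Aggregate frame-wise feature activations (frames x atoms) into one vector."""
    if method == "max":
        return activations.max(axis=0)   # max-pooling: strongest activation per atom
    if method == "mean":
        return activations.mean(axis=0)  # average-pooling: mean activation per atom
    if method == "std":
        return activations.std(axis=0)   # standard deviation pooling: spread per atom
    raise ValueError(f"unknown pooling method: {method}")

# Hypothetical sparse feature activations: 100 frames, 64 dictionary atoms.
rng = np.random.default_rng(0)
activations = rng.random((100, 64))
feature_vector = pool_features(activations, method="std")  # input to the classifier
```

Intuitively, standard deviation pooling captures how much each dictionary atom's activation fluctuates over the recording rather than only its peak or average level, which is one plausible reading of why it pairs well with proportional sampling in the reported results.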
Objectives The current study sought to evaluate whether nursing narratives can be used to predict postoperative length of hospital stay (LOS) following curative surgery for ovarian cancer. Methods A total of 33 patients, aged over 65 years, underwent curative surgery for newly diagnosed ovarian cancer between 2008 and 2012. Based on the median postoperative LOS, patients were divided into two groups: long-stay (>12 days; n = 13) and short-stay (≤12 days; n = 20). Patterns in the nursing narratives were examined and compared through a quantitative analysis. Specifically, the total number (TN) of narratives pertaining to care and the standardized number (SN), calculated by dividing the TN by the LOS, were compared. Experts evaluated the relevance of the extracted phrases. LOS was then predicted using machine learning techniques. Results The median postoperative LOS was 18 days (interquartile range [IQR]: 16–24 days) in the long-stay group and 9.5 days (IQR: 8–11.25 days) in the short-stay group. In the long-stay group, surgery duration was longer. Overall, patients in the long-stay group produced a higher volume of nursing narratives than patients in the short-stay group (SN: 68 vs. 46, p = 0.021). Thirty-two of the most frequently used nursing narratives were selected from 998 uniquely defined nursing narratives. Multiple t-tests were used to compare the TN and the real standardized number (RSN; minimum p < 0.1). The mean (standard deviation) classification results of the long short-term memory (LSTM) recurrent neural network for long versus short stays were 0.7774 (0.105) for the F1-measure, 0.745 (0.098) for precision, 0.739 (0.107) for recall, and 0.765 (0.115) for the area under the receiver operating characteristic curve. Agreement between the differential narratives identified by the statistical methods and the expert responses was low (52.6% agreement; McNemar's test p = 0.012). Conclusions Statistical tests showed that nursing narratives containing the words “urination,” “food supply,” “bowel mobility,” or “pain” were related to hospital stay in elderly females with ovarian cancer. Additionally, machine learning effectively predicted LOS. Summary The current study sought to determine whether elements of nursing narratives could be used to predict postoperative LOS among elderly ovarian cancer patients. Results indicated that nursing narratives containing the words “urination,” “food supply,” “bowel mobility,” and “pain” significantly predicted postoperative LOS in the study population. Additionally, it was found that machine learning could effectively predict LOS based on quantitative characteristics of nursing narratives.
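The study's core quantitative step, computing SN = TN / LOS per patient and comparing the groups with t-tests, can be sketched as follows. The patient records below are fabricated placeholders for illustration only, and the unpaired `ttest_ind` call is an assumption about the exact test variant used.

```python
import numpy as np
from scipy import stats

# Hypothetical per-patient records: total number of nursing narratives (TN)
# and postoperative length of stay (LOS, days), split at the median of 12 days.
long_stay  = [{"tn": 1200, "los": 18}, {"tn": 1500, "los": 24}]   # LOS > 12
short_stay = [{"tn": 450,  "los": 9},  {"tn": 500,  "los": 11}]   # LOS <= 12

def standardized_number(records):
    # SN = TN / LOS, i.e., narratives recorded per hospital day.
    return np.array([r["tn"] / r["los"] for r in records])

# Compare the two groups with an unpaired t-test, echoing the study's
# per-narrative statistical screening.
t, p = stats.ttest_ind(standardized_number(long_stay),
                       standardized_number(short_stay))
print(f"t = {t:.3f}, p = {p:.3f}")
```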
In woodwind instruments such as the flute, producing a tone pitched higher than the standard tone by increasing the blowing pressure is called overblowing, and it allows several distinct fingerings for the same note. This article presents a method that attempts to learn acoustic features more appropriate than conventional features, such as mel-frequency cepstral coefficients (MFCCs), for detecting the fingering from a flute sound using unsupervised feature learning. To do so, we first extract a spectrogram from the audio and convert it to the mel scale. We then concatenate four consecutive mel-spectrogram frames to capture short-term temporal information and use the result as the front end for the sparse filtering algorithm. The learned features are then max-pooled, resulting in a final feature vector with extra robustness for the classifier. We demonstrate the advantages of the proposed method in two ways: we first visualize and analyze the differences between the features learned from tones generated by standard and overblown fingerings, and we then perform a quantitative evaluation through classification tasks on six selected pitches with up to five different fingerings, covering a variety of octave-related and non-octave-related fingerings. The results confirm that features learned with the proposed method significantly outperform the conventional MFCCs and the residual noise spectrum under every experimental condition for the classification tasks.
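The front end described above, up to but not including the sparse filtering step, can be sketched with librosa and NumPy. The signal-processing parameters (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions, since the abstract does not state them, and the identity mapping below is a placeholder where the paper applies sparse filtering.

```python
import numpy as np
import librosa

def flute_feature_vector(path, context=4, n_mels=128):
    # Load audio and compute a log mel-scaled spectrogram
    # (all analysis parameters here are assumptions).
    y, sr = librosa.load(path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=512, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel).T  # shape: (frames, n_mels)

    # Concatenate `context` consecutive frames so each example carries
    # short-term temporal information.
    n = log_mel.shape[0] - context + 1
    stacked = np.stack([log_mel[i:i + context].ravel() for i in range(n)])

    # In the paper, sparse filtering maps `stacked` to learned feature
    # activations; an identity mapping stands in for it here.
    activations = stacked

    # Max-pool over time for a robust, fixed-length classifier input.
    return activations.max(axis=0)
```

Max-pooling over time is what makes the final vector fixed-length regardless of the recording's duration, which is why it sits between the frame-level learned features and the classifier.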