Few adequately annotated datasets of singing recordings are available. One available dataset, VocalSet, covers a variety of vocal techniques (n = 17) and singers (m = 20) across a large number of WAV files (p = 3560). However, although the dataset includes several categories, among them technique, singer, tempo, and loudness, the files themselves are not annotated. This study therefore annotates VocalSet to make it a more powerful dataset for researchers. The annotations generated for the VocalSet audio files include the fundamental frequency contour, note onset, note offset, the transition between notes, note F0, note duration, MIDI pitch, and lyrics. This paper describes the generated dataset and explains our approaches to creating and testing the annotations. In addition, four different methods of defining the onset and offset are compared.
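To make the relationship between some of the annotation fields concrete, the following is a minimal sketch, not the procedure used to annotate VocalSet. It assumes the standard equal-tempered mapping from frequency to MIDI pitch (A4 = 440 Hz = MIDI 69) and, as a further assumption, takes the note F0 to be the median of the F0 frames between onset and offset. The function names and the dictionary layout are hypothetical.

```python
import math
from statistics import median


def f0_to_midi(f0_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI pitch number."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    return 69.0 + 12.0 * math.log2(f0_hz / a4_hz)


def summarize_note(onset_s, offset_s, f0_frames_hz):
    """Build one note record from its boundaries and its F0 frames.

    Assumption: the note F0 is the median of the voiced frames.
    """
    note_f0 = median(f0_frames_hz)
    return {
        "onset": onset_s,
        "offset": offset_s,
        "duration": offset_s - onset_s,
        "f0": note_f0,
        "midi": round(f0_to_midi(note_f0)),
    }


note = summarize_note(1.0, 1.5, [438.0, 440.0, 442.0])
print(note)
```

The median is used here only because it is robust to a few stray frames; a mean or a mode over quantized pitches would be equally plausible choices.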
This article investigates four pitch detection algorithms: pYin, Praat, a phase-locked loop (PLL)-based approach, and an Extended Complex Kalman Filter approach. The first two algorithms compute the pitch on a frame-by-frame basis, while the latter two work on a sample-by-sample basis. Only in recent years has there been a noticeable increase in the number of papers applying pitch detection techniques to sung phrases. The investigation is carried out on a dataset containing 76 files of singers. To create a ground truth from the data, an alternative approach using the Spear analysis program is applied. The algorithms are compared using a new Singing Data Analyser tool. pYin and Praat were observed to be the most reliable algorithms, while the PLL and Kalman filter algorithms are very dependent on the user-selected parameters.
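As a rough illustration of what "frame-by-frame" pitch estimation means, the sketch below estimates one F0 value for a single frame by picking the autocorrelation peak. This is a deliberately simple textbook method and is not pYin, Praat, or either of the sample-by-sample algorithms discussed above; the sample rate, frame length, and search range are assumed values chosen for the example.

```python
import math


def autocorr_f0(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic test frame: a pure 200 Hz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(1024)]
estimate = autocorr_f0(frame, SAMPLE_RATE)
print(f"estimated F0: {estimate:.1f} Hz")
```

A frame-based detector repeats this kind of computation on successive, usually overlapping, frames, so its time resolution is bounded by the hop size; a sample-by-sample tracker instead updates its estimate at every input sample.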
This paper introduces a new method for detecting onsets, offsets, and transitions of the notes in real-time solo singing performances. It identifies the onsets and offsets by finding the transitions from one note to another, based on trajectory changes in the fundamental frequency. The accuracy of our approach is compared with that of eight well-known algorithms. It was tested on two datasets containing 130 singing files, with a total duration of more than seven hours and more than 41,000 onset annotations. The analysis metrics used include the Average, the F-Measure Score, and ANOVA. The proposed algorithm was observed to determine onsets and offsets more accurately than the other algorithms. Additionally, unlike the other algorithms, the proposed algorithm can detect the transitions between notes.
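The F-measure mentioned above can be sketched as follows for onset detection. This is a generic formulation, not necessarily the exact evaluation protocol of the paper: it assumes that an estimated onset counts as correct when it lies within a tolerance window of an unmatched reference onset, and the 50 ms default tolerance is an assumed value.

```python
def onset_f_measure(reference, estimated, tolerance=0.05):
    """F-measure of estimated onsets (seconds) against reference onsets.

    Each reference onset can be matched by at most one estimate.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate precedes the window: false positive
        else:
            i += 1  # reference has no estimate in its window: miss
    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference_onsets = [0.5, 1.0, 1.5, 2.0]
estimated_onsets = [0.52, 1.2, 1.49, 2.6]
score = onset_f_measure(reference_onsets, estimated_onsets)
print(score)
```

In this example two of the four estimates fall within 50 ms of a reference onset, so precision and recall are both 0.5.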
Pitch detection is usually one of the fundamental steps in audio signal processing. However, it is common for pitch detectors to estimate a portion of the fundamental frequencies incorrectly, especially in real-time environments and when applied to singing. Therefore, the estimated pitch contour usually has errors. To remove these errors, a contour smoother algorithm should be employed. However, because none of the current contour-smoother algorithms has been explicitly designed to be applied to contours generated from singing, they are often unsuitable for this purpose. Therefore, this article aims to introduce a new smoother algorithm that rectifies this. The proposed smoother algorithm is compared with 15 other smoother algorithms over approximately 2700 pitch contours. Four metrics were used for the comparison. According to all the metrics, the proposed algorithm could smooth the contours more accurately than other algorithms. A distinct conclusion is that smoother algorithms should be designed according to the contour type and the result’s final applications.
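For orientation, the sketch below shows a generic baseline smoother, a sliding median filter, applied to a short F0 contour containing two octave errors. It is not the smoother proposed in the article, and the window length and contour values are made up for the example.

```python
from statistics import median


def median_smooth(contour, window=3):
    """Smooth an F0 contour (Hz) with a sliding median of odd length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(contour)):
        # The window is truncated at the contour edges.
        lo = max(0, i - half)
        hi = min(len(contour), i + half + 1)
        smoothed.append(median(contour[lo:hi]))
    return smoothed


# A sung note near 220 Hz with one octave-up and one octave-down error.
contour = [220.0, 221.0, 440.0, 220.5, 219.5, 110.0, 220.0, 221.0]
smoothed = median_smooth(contour, window=3)
print(smoothed)
```

A median filter removes isolated outliers like these, but it also flattens intentional pitch movement such as vibrato or fast note transitions when the window is long, which illustrates why a general-purpose smoother may be a poor fit for singing contours.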