Many onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained with a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point. This paper compares conventional event-based and segment-based criteria against the intersection-based criterion of the Polyphonic Sound Detection Score (PSDS), over a selection of systems from DCASE 2020 Challenge Task 4. It shows that, by relying on collars, the conventional event-based criterion introduces different strictness levels depending on the length of the sound events, and that the segment-based criterion may lack precision and be application dependent. In contrast, PSDS's intersection-based criterion overcomes the dependency of the evaluation on sound event duration and provides robustness to labelling subjectivity by allowing valid detections of interrupted events. Furthermore, PSDS enhances the comparison of SED systems by measuring sound event modelling performance independently of the systems' operating points.
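The dependence of a collar-based criterion on event length can be illustrated with a minimal sketch. The code below is not the PSDS or DCASE implementation: the function names are hypothetical, the 200 ms collar and 20% offset ratio are assumed values, and the intersection measure is a simplified stand-in for an intersection-based criterion (it only measures how much of the reference event is covered).

```python
def collar_match(ref, est, collar=0.2, offset_ratio=0.2):
    """Collar-based check (assumed parameters): the onset must fall within
    `collar` seconds, the offset within max(collar, offset_ratio * length)."""
    ref_on, ref_off = ref
    est_on, est_off = est
    offset_tol = max(collar, offset_ratio * (ref_off - ref_on))
    return abs(est_on - ref_on) <= collar and abs(est_off - ref_off) <= offset_tol


def intersection_ratio(ref, est):
    """Fraction of the reference event covered by the estimated event."""
    overlap = min(ref[1], est[1]) - max(ref[0], est[0])
    return max(0.0, overlap) / (ref[1] - ref[0])


# Short event: the estimate is shifted by half the event length.
ref_short, est_short = (0.0, 0.25), (0.125, 0.375)
# Long event: the estimate starts 0.25 s late but covers almost everything.
ref_long, est_long = (0.0, 10.0), (0.25, 10.0)

for name, ref, est in [("short", ref_short, est_short), ("long", ref_long, est_long)]:
    print(name, collar_match(ref, est), round(intersection_ratio(ref, est), 3))
```

Under these assumed settings, the short event passes the collar check although only half of it is covered, while the long event fails despite being almost fully covered. This is the kind of length-dependent strictness the abstract describes.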
Novelty detection consists in recognising events that deviate from normality. This paper presents the implementation of a real-time statistical novelty detector on the BeagleBoard-xM. The application processes an incoming audio signal, extracts Power Normalized Cepstral Coefficients and determines whether a novel sound is present based on a statistical model of normality. The novelty detector has been implemented as a standalone graphical application capable of running in real time on the BeagleBoard-xM platform. Experiments have been conducted to assess the solution in terms of both detection performance and real-time capability. The results demonstrate that the system is able to operate in real time on the BeagleBoard-xM with a real-time factor of 8.10% and an F-measure of 77.41%.
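The two reported figures can be read with the following minimal sketch. It assumes the usual definitions, which the abstract itself does not spell out: the real-time factor as processing time divided by audio duration, and the F-measure as the harmonic mean of precision and recall. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Assumed definition: time spent processing divided by audio duration.
    Values below 1.0 mean the system runs faster than real time."""
    return processing_seconds / audio_seconds


def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from detection counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative numbers only: 0.81 s of computation for 10 s of audio.
print(f"RTF = {real_time_factor(0.81, 10.0):.2%}")
print(f"F   = {f_measure(true_pos=3, false_pos=1, false_neg=1):.2%}")
```

Under this definition, a real-time factor of 8.10% would mean that processing takes roughly 8% of the duration of the audio being analysed.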