This paper presents a computationally efficient method for polyphonic pitch estimation. The method employs the Fast Resonator Time-Frequency Image (RTFI) as the basic time-frequency analysis tool. The approach is composed of two main stages. First, a preliminary pitch estimation is obtained by means of a simple peak-picking procedure in the pitch energy spectrum. Such spectrum is calculated from the original RTFI energy spectrum according to harmonic grouping principles. Then the incorrect estimations are removed according to spectral irregularity and knowledge of the harmonic structures of the music notes played on commonly used music instruments. The new approach is compared with a variety of other frame-based polyphonic pitch estimation methods, and results demonstrate the high performance and computational efficiency of the approach.
This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to recognize the notes of the piano preliminarily, and note verification for the specific individual is conducted in the second stage. The note recognition stage is independent of piano individual, in which one CNN is used to detect onsets and another is used to estimate the probabilities of pitches at each detected onset. Hence, candidate pitches at candidate onsets are obtained in the first stage. During the note verification, templates for the specific piano are generated to model the attack of note per pitch. Then, the spectrogram of the segment around candidate onset is factorized using attack templates of candidate pitches. In this way, not only the pitches are picked up by note activations, but the onsets are revised. Experiments show that CNN outperforms other types of neural networks in both onset detection and pitch estimation, and the combination of two CNNs yields better performance than a single CNN in note recognition. We also observe that note verification further improves the performance of transcription. In the transcription of a specific piano, the proposed system achieves 82% on note-wise F-measure, which outperforms the state-of-the-art.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.