Polyphonic piano note transcription with recurrent neural networks

Böck, Sebastian; Schedl, Markus

doi:10.1109/icassp.2012.6287832

Cited by 103 publications

(90 citation statements)

References 8 publications

“…Other onset detection methods that have performed well in MIREX evaluations include the use of psychoacoustically motivated features [26], transient peak classification [114] and pitch-based features [129]. A data-driven approach using supervised learning, where various neural network architectures have been utilised, has given the best results in several MIREX evaluations, including the most recent one (2012) [17,47,79]. Finally, Degara et al [31] exploit rhythmic regularity in music using a probabilistic framework to improve onset detection, showing that the integration of onset detection with higher-level rhythmic processing is advantageous.…”

Section: Other Transcription Subtasks (mentioning)

confidence: 99%

Automatic music transcription: challenges and future directions

et al. 2013


Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

“…A moving average or a moving median is usually preferred over a fixed threshold as it can follow the dynamics of a sound (Duxbury et al, 2003;Böck et al, 2012). Additionally, some methods for controlling the salience of a peak are often applied (Dixon, 2006).…”

Section: 1 (mentioning)

confidence: 99%
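
The adaptive thresholding described in this statement can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the function name `pick_onsets`, the window length, the offset `delta`, and the example onset detection function values are all assumptions made for the example.

```python
from statistics import median


def pick_onsets(odf, window=5, delta=0.1):
    """Return frame indices of local maxima of `odf` that exceed
    a moving-median threshold (median of a centred window plus `delta`).

    `window` and `delta` are illustrative values, not taken from the source.
    """
    half = window // 2
    onsets = []
    for i in range(1, len(odf) - 1):
        # Centred window, truncated at the signal boundaries.
        lo, hi = max(0, i - half), min(len(odf), i + half + 1)
        threshold = median(odf[lo:hi]) + delta
        is_peak = odf[i] > odf[i - 1] and odf[i] >= odf[i + 1]
        if is_peak and odf[i] > threshold:
            onsets.append(i)
    return onsets


# Hypothetical onset detection function: two clear onsets (frames 2 and 10)
# and a sustained loud passage (frames 6 to 9) with a small bump at frame 7.
odf = [0.0, 0.1, 0.9, 0.2, 0.1, 0.1, 0.6, 0.65, 0.6, 0.5, 1.5, 0.5]
print(pick_onsets(odf))  # prints [2, 10]
```

A fixed threshold of 0.6 would also report the bump at frame 7, whereas the moving median rises with the loud passage and rejects it. This is the sense in which the threshold "follows the dynamics" of the signal.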

“…The opposite outcome (low recall and high precision) is expected for too high a threshold value, overshooting many relevant peaks. The harmonic mean of precision and recall, known as the F-measure, is therefore often reported as a "balanced" result of the onset detection procedure (Dixon, 2006;Böck et al, 2012).…”

Section: 1 (mentioning)

confidence: 99%
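
As a small worked example of the F-measure mentioned in this statement, the sketch below computes precision, recall, and their harmonic mean from counts of correct, spurious, and missed detections. The function name and the example counts are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-measure) for an onset detection result.

    F is the harmonic mean of precision and recall; empty denominators
    are mapped to 0.0 to avoid division by zero.
    """
    detected = true_positives + false_positives
    annotated = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / annotated if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts for a threshold set too high: few false alarms,
# but half of the annotated onsets are missed.
precision, recall, f = f_measure(true_positives=8, false_positives=2, false_negatives=8)
print(round(precision, 3), round(recall, 3), round(f, 3))  # prints 0.8 0.5 0.615
```

Because the harmonic mean is pulled toward the smaller of its two arguments, a detector cannot score well by maximising precision or recall alone.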

“…The popularity of machine learning applications for onset detection is growing rapidly with some excellent results reported in recent research. Neural networks are the tool of choice (Lacoste and Eck, 2007;Böck et al, 2012), although other data-driven techniques have also been used (Davy and Godsill, 2002). The input data usually consist of a time-frequency representation of the sound signal, mapped non-linearly in the frequency domain according to a perceptual model.…”

Section: 3 (mentioning)

confidence: 99%

“…The input data usually consist of a time-frequency representation of the sound signal, mapped non-linearly in the frequency domain according to a perceptual model. Böck et al (2012) used a bank of triangular filters positioned at critical bands of the Bark scale to filter the STFT magnitude spectra, computed with three different window lengths in parallel. In this way, the redundancy resulting from unnecessarily high frequency resolution of the STFT in the upper frequency range may be avoided.…”

Section: 3 (mentioning)

confidence: 99%
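
The filtering step described in this statement can be sketched roughly as below. The statement does not give the Bark formula, the number of filters, or the window lengths, so everything specific here is an assumption: the sketch uses Traunmüller's Bark approximation, spaces the filter edges equally on the Bark scale, and picks 24 bands and three FFT sizes purely for illustration.

```python
def hz_to_bark(f):
    """Traunmüller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def triangular_filterbank(n_fft, sample_rate, n_bands=24):
    """Build `n_bands` triangular filters over the n_fft // 2 + 1 STFT bins,
    with edges equally spaced on the Bark scale between 0 Hz and Nyquist."""
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    z_min, z_max = hz_to_bark(0.0), hz_to_bark(sample_rate / 2)
    step = (z_max - z_min) / (n_bands + 1)
    edges = [bark_to_hz(z_min + i * step) for i in range(n_bands + 2)]
    bank = []
    for b in range(n_bands):
        left, centre, right = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, magnitudes):
    """Reduce one STFT magnitude frame to one value per band."""
    return [sum(w * m for w, m in zip(row, magnitudes)) for row in bank]


# Three window lengths in parallel (illustrative sizes): each yields its own
# band vector, and the three vectors are concatenated into one feature frame.
features = []
for n_fft in (1024, 2048, 4096):
    bank = triangular_filterbank(n_fft, sample_rate=44100)
    flat_spectrum = [1.0] * (n_fft // 2 + 1)  # placeholder magnitude frame
    features.extend(apply_filterbank(bank, flat_spectrum))
print(len(features))  # prints 72
```

Each frame shrinks from hundreds or thousands of STFT bins to a fixed number of bands per window length, which is how the redundant resolution in the upper frequency range is discarded.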

Note onset detection in musical signals via neural–network–based multi–ODF fusion

Stasiak; Mońko; Niewiadomski

2016

International Journal of Applied Mathematics and Computer Science


The problem of note onset detection in musical signals is considered. The proposed solution is based on known approaches in which an onset detection function is defined on the basis of spectral characteristics of audio data. In our approach, several onset detection functions are used simultaneously to form an input vector for a multi-layer non-linear perceptron, which learns to detect onsets in the training data. This is in contrast to standard methods based on thresholding the onset detection functions with a moving average or a moving median. Our approach is also different from most of the current machine-learning-based solutions in that we explicitly use the onset detection functions as an intermediate representation, which may therefore be easily replaced with a different one, e.g., to match the characteristics of a particular audio data source. The results obtained for a database containing annotated onsets for 17 different instruments and ensembles are compared with state-of-the-art solutions.

Tonal Analysis

2022

An Introduction to Audio Content Analysis

