2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2012
DOI: 10.1109/icassp.2012.6287832

Polyphonic piano note transcription with recurrent neural networks

Abstract: In this paper a new approach for polyphonic piano note onset transcription is presented. It is based on a recurrent neural network to simultaneously detect the onsets and the pitches of the notes from spectral features. Long Short-Term Memory units are used in a bidirectional neural network to model the context of the notes. The use of a single regression output layer instead of the often used one-versus-all classification approach enables the system to significantly lower the number of erroneous note detectio…

Citations: cited by 103 publications (90 citation statements)
References: 8 publications
“…Other onset detection methods that have performed well in MIREX evaluations include the use of psychoacoustically motivated features [26], transient peak classification [114] and pitch-based features [129]. A data-driven approach using supervised learning, where various neural network architectures have been utilised, has given the best results in several MIREX evaluations, including the most recent one (2012) [17,47,79]. Finally, Degara et al [31] exploit rhythmic regularity in music using a probabilistic framework to improve onset detection, showing that the integration of onset detection with higher-level rhythmic processing is advantageous.…”
Section: Other Transcription Subtasks (mentioning)
confidence: 99%
“…A moving average or a moving median is usually preferred over a fixed threshold as it can follow the dynamics of a sound (Duxbury et al, 2003; Böck et al, 2012). Additionally, some methods for controlling the salience of a peak are often applied (Dixon, 2006).…”
Section: 1 (mentioning)
confidence: 99%
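
The moving-median thresholding described in the statement above can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited papers: the function name `pick_onsets`, the window length `window`, and the offset `delta` are assumptions chosen for the example.

```python
import numpy as np

def pick_onsets(activation, window=7, delta=0.1):
    """Return frame indices of local maxima above a moving-median threshold.

    `window` (odd number of frames) and `delta` (offset added to the median)
    are illustrative parameters, not values taken from the cited papers.
    """
    activation = np.asarray(activation, dtype=float)
    half = window // 2
    # Pad with edge values so every frame has a full window around it.
    padded = np.pad(activation, half, mode="edge")
    # Adaptive threshold: median of the surrounding window plus a fixed offset.
    threshold = np.array(
        [np.median(padded[i:i + window]) for i in range(len(activation))]
    ) + delta
    onsets = []
    for i in range(1, len(activation) - 1):
        is_peak = activation[i] > activation[i - 1] and activation[i] >= activation[i + 1]
        if is_peak and activation[i] > threshold[i]:
            onsets.append(i)
    return onsets

# Toy detection function: two strong peaks and one weak bump in between.
activation = [0.0, 0.1, 0.9, 0.1, 0.0, 0.2, 0.25, 0.2, 0.0, 0.8, 0.1]
print(pick_onsets(activation, window=5, delta=0.3))  # [2, 9]
```

Because the threshold is recomputed per frame from the local median, it rises in loud passages and falls in quiet ones, which is the property a fixed threshold lacks. The weak bump at frame 6 is a local maximum but stays below its local threshold and is rejected.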
“…The opposite outcome (low recall and high precision) is expected for too high a threshold value, overshooting many relevant peaks. The harmonic mean of precision and recall, known as the F-measure, is therefore often reported as a "balanced" result of the onset detection procedure (Dixon, 2006;Böck et al, 2012).…”
Section: 1 (mentioning)
confidence: 99%
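
As a worked illustration of the F-measure mentioned in the statement above, the sketch below matches detected onsets to reference onsets and reports precision, recall, and their harmonic mean. The function name `f_measure`, the greedy one-to-one matching, and the 50 ms `tolerance` are assumptions made for this example; evaluation protocols differ in how they match onsets and which tolerance they use.

```python
def f_measure(detected, reference, tolerance=0.05):
    """Return (precision, recall, F-measure) for onset times given in seconds.

    A detection counts as a true positive if it lies within `tolerance`
    seconds of a not-yet-matched reference onset (assumed matching rule).
    """
    detected = sorted(detected)
    reference = sorted(reference)
    tp = 0
    i = j = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            tp += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # detection precedes any reference: false positive
        else:
            j += 1           # reference has no detection: false negative
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# 2 of 4 detections match, 2 of 3 reference onsets are found.
p, r, f = f_measure([0.10, 0.52, 0.90, 1.50], [0.11, 0.50, 1.00])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Lowering the detection threshold typically adds detections, raising recall at the cost of precision; the harmonic mean penalizes whichever of the two is lower, which is why it is reported as a balanced score.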