ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683582

Deep Polyphonic ADSR Piano Note Transcription

Abstract: We investigate a late-fusion approach to piano transcription, combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network architecture under consideration is compact in terms of its number of parameters and easy to train with gradient descent. The network outputs are fused over time in the final stage to obtain note segmentations, with an HMM whose transition probabilities are chosen based on a model of attack, decay, sustain, release (ADSR) envelopes, commonly use…
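As a rough illustration of the idea, the temporal prior can be written as a small per-note HMM whose states follow the ADSR stages, with hand-set transition probabilities, and note segmentations recovered by Viterbi decoding of the framewise network outputs. The transition values and the two-class emission mapping below are illustrative assumptions of this sketch, not the paper's parameterization:

```python
import numpy as np

# Hypothetical per-note HMM with ADSR-inspired states; the transition
# probabilities below are illustrative placeholders, not the paper's values.
STATES = ["off", "attack", "decay", "sustain", "release"]  # indices 0..4

# Row i -> column j: P(state_j at t+1 | state_i at t).  Rows sum to 1.
A = np.array([
    [0.98, 0.02, 0.00, 0.00, 0.00],   # off -> off / attack
    [0.00, 0.50, 0.50, 0.00, 0.00],   # attack -> attack / decay
    [0.00, 0.00, 0.70, 0.25, 0.05],   # decay -> decay / sustain / release
    [0.00, 0.00, 0.00, 0.90, 0.10],   # sustain -> sustain / release
    [0.20, 0.00, 0.00, 0.00, 0.80],   # release -> off / release
])
pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # notes start in "off"

def viterbi(log_obs, log_A, log_pi):
    """Most likely state sequence given per-frame log-likelihoods (T x S)."""
    T, S = log_obs.shape
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # S x S
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy framewise network output p(note active) for a single pitch.
p_active = np.array([0.05, 0.1, 0.9, 0.95, 0.8, 0.7, 0.6, 0.3, 0.1, 0.05])
# Map the two-class network output onto the five HMM states: "off" emits
# (1 - p), all sounding states emit p.  This mapping is an assumption.
obs = np.stack([1.0 - p_active] + [p_active] * 4, axis=1)
path = viterbi(np.log(obs + 1e-12), np.log(A + 1e-12), np.log(pi + 1e-12))
print([STATES[s] for s in path])
```

In the paper the fusion is applied over the framewise network outputs per pitch; here a single toy activation curve stands in for one pitch.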

Citations: cited by 40 publications (42 citation statements)
References: 14 publications (19 reference statements)
“…Most hyperparameters, such as minibatch size (64), choice of optimizer (gradient descent with momentum (0.9) and Nesterov correction [16]), parameter initialization (drawn from a uniform distribution according to [17], colloquially called "Glorot, Uniform") as well as kernel sizes were chosen based on prior knowledge that they lead to acceptable performance for the smaller model described in [10]. The number of feature maps for the scaled up version of the model was chosen as a trade off between capacity and training time.…”
Section: Methods (mentioning)
confidence: 99%
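A minimal sketch of that training setup, assuming PyTorch (the framework, learning rate, and the toy network below are assumptions of this sketch; only the minibatch size of 64, Nesterov momentum 0.9, and Glorot uniform initialization come from the quote):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the transcription network; the real architecture
# follows the cited papers, only the quoted training setup is shown here.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 88),   # 88 piano pitches
)

def glorot_uniform(module):
    """'Glorot, Uniform' initialization, as described in the quote."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(glorot_uniform)

# Gradient descent with momentum 0.9 and Nesterov correction; the learning
# rate is a placeholder, not taken from the quoted papers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)

# One training step on a dummy minibatch of size 64 (spectrogram excerpts).
x = torch.randn(64, 1, 5, 229)            # batch, channel, frames, bins
y = torch.randint(0, 2, (64, 88)).float()
loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
loss.backward()
optimizer.step()
```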
“…The latter is conditioned on the onset predictions from the first network as an additional input, but there is no shared representation and there are no shared parameters for the two tasks. The approaches in [9], [10] introduce an additional note offset prediction task, but differ in their parameter sharing strategy. No parameters are shared in [9] (Figure 1a), and three separate networks are trained to predict onsets, intermediate note frames and offsets respectively.…”
Section: Related Work (mentioning)
confidence: 99%
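To make the contrast concrete, here is a schematic sketch (my own illustration in PyTorch, not code from the cited papers) of the two ends of the parameter-sharing spectrum discussed in the quote: three fully separate networks for onsets, frames, and offsets versus a single shared trunk with task-specific heads:

```python
import torch.nn as nn

def make_trunk():
    """Hypothetical acoustic front end; the details differ per cited paper."""
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def make_head():
    return nn.Linear(32, 88)   # one output per piano pitch

# Variant (a): no parameter sharing -- three separate networks, one each for
# onsets, intermediate note frames, and offsets (as attributed to [9]).
separate = nn.ModuleDict({
    task: nn.Sequential(make_trunk(), make_head())
    for task in ("onset", "frame", "offset")
})

# Variant (b): a shared representation with task-specific heads -- the other
# end of the parameter-sharing spectrum.
class SharedTrunkModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = make_trunk()
        self.heads = nn.ModuleDict(
            {task: make_head() for task in ("onset", "frame", "offset")}
        )

    def forward(self, x):
        h = self.trunk(x)
        return {task: head(h) for task, head in self.heads.items()}
```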
“…To obtain data to develop our model, we create a synthesized dataset 2 using scores collected from the MuseScore website 3. We do this as a starting point and because there is a lack of AMT datasets that provide score ground truth on both physical and musical time.…”
Section: Data (mentioning)
confidence: 99%
“…A large part of work in AMT falls under the tasks of multi-pitch detection and onset/offset detection (e.g. [2,3]), which are often referred to jointly as note tracking.…”
Section: Introduction (mentioning)
confidence: 99%