2021
DOI: 10.3390/electronics10070851

Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks

Abstract: In this work, we propose considering the information from a polyphony for multi-pitch estimation (MPE) in piano music recordings. To that aim, we propose a method for local polyphony estimation (LPE), which is based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony. We investigate two feature representations as inputs to our method, in particular, the Constant-Q Transform (CQT) and its recent extension Folded-CQT (F-CQT). To evaluate the perfo…

Cited by 6 publications (4 citation statements)
References 20 publications
“…We draw inspiration from [22], [41], where the bottleneck is trained with an auxiliary task: predicting the activity (voicing) in monophonic pitch estimation scenarios. Since inactive frames are rare in our multi-pitch scenario, we propose an alternative: As an auxiliary task, we predict the local degree of polyphony, i.e., the number of active pitches in the center frame (0-23), which turned out to be useful side information for MPE [13], [45]. We derive this information from the bottleneck of the U-net, post-processed by a small, two-layer CNN with 24-class softmax output.…”
Section: D+e Golden and Navy Paths)
mentioning
confidence: 99%
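The auxiliary target described in this statement can be illustrated with a minimal sketch. It assumes a binary piano roll as the annotation format and shows only how a 24-class polyphony label and its softmax cross-entropy could be computed; it does not reproduce the citing authors' U-net or their two-layer CNN. The names `polyphony_label`, `softmax`, and `cross_entropy` are hypothetical, and clipping counts above 23 is an assumption of this sketch.

```python
import math

NUM_CLASSES = 24  # polyphony degrees 0..23, as stated in the quoted passage


def polyphony_label(piano_roll, frame_index, max_polyphony=NUM_CLASSES - 1):
    """Count active pitches in one frame of a binary piano roll.

    piano_roll: list of frames, each a list of 0/1 pitch activations.
    Counts above max_polyphony are clipped (an assumption of this sketch).
    """
    active = sum(1 for value in piano_roll[frame_index] if value)
    return min(active, max_polyphony)


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true polyphony class."""
    return -math.log(softmax(logits)[label])


# Toy excerpt: three frames, four pitches; the center frame has two active pitches.
roll = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
center = len(roll) // 2
label = polyphony_label(roll, center)

# Hypothetical classifier output that favors the correct class.
logits = [0.0] * NUM_CLASSES
logits[label] = 5.0
loss = cross_entropy(logits, label)
```

Treating polyphony as a classification target rather than a regression target matches the 24-class softmax output mentioned in the statement.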
“…On the validation set V[v > rangespeed[1]] = rangespeed[1] Most likely label x + y For j = 1: popsize The algorithm evaluates g(x − x′) If newvalue_max(j) > value_max(j)…”
Section: Example Application and Analysis
mentioning
confidence: 99%
“…With the rapid growth of the number of digital music, the piano playing pitch recognition algorithm performs personalized identification by analyzing the historical behavior of pitch-on-demand music [1]. As a hotspot of development in the new century, the neural network has gained more and more attention and application depending on its advantages in nonlinearity, self-learning, robustness, and self-adaptation [2–4].…”
Section: Introduction
mentioning
confidence: 99%
“…They investigate several architectural choices of a U-net deep neural network architecture and outperform existing methods for bass transcription with careful adapting parameters and training methodology. For the task of multi-pitch estimation, the transcription of all fundamental frequencies in a music recording, Taenzer et al. [5] show that an estimate of polyphony, i.e., the number of concurrently playing voices, can refine the predictions of a multi-pitch estimator. Hernandez-Olivan et al. [6] also work on music transcription and not only quantify the impact of timbre and onset envelope on the transcription accuracy but also present a model with improved performance by taking into account this extra information.…”
mentioning
confidence: 99%
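One simple way a polyphony estimate could refine frame-wise multi-pitch predictions is sketched below: keep only as many pitches as the estimated polyphony allows, chosen by activation strength. This is an illustrative assumption, not necessarily the refinement used in the cited work; the function `refine_with_polyphony` and the toy scores are hypothetical.

```python
def refine_with_polyphony(activations, polyphony):
    """Return the indices of the `polyphony` strongest pitch activations.

    activations: per-pitch scores for a single frame (higher = more likely active).
    polyphony: estimated number of concurrently sounding pitches in that frame.
    """
    if polyphony <= 0:
        return []
    ranked = sorted(
        range(len(activations)),
        key=lambda pitch: activations[pitch],
        reverse=True,
    )
    # Slicing tolerates polyphony values larger than the number of pitches.
    return sorted(ranked[:polyphony])


# Toy frame with five candidate pitches and an estimated polyphony of 2.
frame_scores = [0.10, 0.80, 0.05, 0.65, 0.40]
estimated_polyphony = 2
active_pitches = refine_with_polyphony(frame_scores, estimated_polyphony)
```

Compared with a fixed activation threshold, this top-k selection adapts the number of reported pitches to each frame, which is where a reliable polyphony estimate would matter.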