2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461914
Towards Complete Polyphonic Music Transcription: Integrating Multi-Pitch Detection and Rhythm Quantization

Abstract: Most work on automatic transcription produces "piano roll" data with no musical interpretation of the rhythm or pitches. We present a polyphonic transcription method that converts a music audio signal into a human-readable musical score, by integrating multi-pitch detection and rhythm quantization methods. This integration is made difficult by the fact that the multi-pitch detection produces erroneous notes such as extra notes and introduces timing errors that are added to temporal deviations due to musical ex…

Cited by 43 publications (44 citation statements)
References 17 publications
“…A dataset that best fits this task is the recently published ASAP dataset [18], which will be investigated as future work. We collect scores in MusicXML format, convert them to MIDI files, and synthesize audio files with four piano models using the Native Instruments Kontakt Player 4 . The scores we collect cover various key and time signatures, tempos, modes and polyphony levels, but do not contain grace notes, triplets, arpeggios, trios or other complex playing techniques.…”
Section: Data
confidence: 99%
“…The recent literature has mainly focused on two approaches for complete transcription: 1) traditional methods transcribe music audio step by step in the order of subtasks [4,5], and 2) end-to-end …” (Footnote: L. Liu is a research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music, supported jointly by the China Scholarship Council and Queen Mary University of London.)
Section: Introduction
confidence: 99%
“…To evaluate the proposed method, we calculated the pitch error rate Ep, the extra note rate Ee, the missing note rate Em, the onset-time error rate Eon, the offset-time error rate Eoff, and the overall error rate Eall [17] by comparing transcribed and corrected sequences with the ground-truth sequences. The musical naturalness was evaluated in terms of the rate of diatonic notes Rdn, because the majority of notes should be on a scale.…”
Section: Experimental Conditions
confidence: 99%
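The quoted passage lists note-level error rates (pitch, extra, missing) computed by aligning transcribed notes against ground truth. A minimal sketch of such rates is below; the greedy onset matching, the `onset_tol` value, and the exact normalization are assumptions for illustration, not the definitions used in [17].

```python
# Simplified note-level error rates, loosely in the spirit of the quoted
# evaluation (NOT the exact definitions in [17]).
# A note is a pair (onset_seconds, midi_pitch).

def note_error_rates(est, ref, onset_tol=0.05):
    """Greedily match each estimated note to the nearest unmatched
    reference note within `onset_tol` seconds, then compute error rates
    normalized by the number of reference notes."""
    ref_free = set(range(len(ref)))
    matched = []  # (est_index, ref_index) pairs
    for i, (on_e, _p_e) in enumerate(est):
        cand = [j for j in ref_free if abs(ref[j][0] - on_e) <= onset_tol]
        if cand:
            j = min(cand, key=lambda j: abs(ref[j][0] - on_e))
            ref_free.remove(j)
            matched.append((i, j))
    n_ref = len(ref)
    pitch_err = sum(1 for i, j in matched if est[i][1] != ref[j][1])
    extra = len(est) - len(matched)   # estimated notes with no match
    missing = len(ref_free)           # reference notes left unmatched
    return {
        "E_p": pitch_err / n_ref,  # pitch error rate
        "E_e": extra / n_ref,      # extra note rate
        "E_m": missing / n_ref,    # missing note rate
    }

ref = [(0.00, 60), (0.50, 64), (1.00, 67)]
est = [(0.01, 60), (0.52, 63), (1.30, 72)]  # wrong pitch, extra, miss
print(note_error_rates(est, ref))
```

Onset- and offset-time error rates (Eon, Eoff) would be computed analogously over the matched pairs, counting matches whose timing deviation exceeds a tolerance.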
“…[23,24]. Especially in piano transcription, results of multi-pitch detection contain a significant amount of spurious notes (false positives), which often make the transcription results unplayable [25]. By integrating the present piano-score model and an acoustic model (instead of the edit model) and applying the method for optimization developed in this study, one can impose constraints on performance difficulty of transcription results and reduce these spurious notes.…”
Section: Conclusion
confidence: 99%
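The quoted conclusion suggests pruning spurious (false-positive) notes by imposing performance-difficulty constraints on the transcription. A hypothetical, much-simplified illustration of that idea: discard notes that would stretch a simultaneous chord beyond what two hands can cover. The 14-semitone hand span and the greedy pruning rule are invented for this sketch, not taken from the cited work.

```python
# Hypothetical playability filter: prune pitches from a chord until it
# can be split into a left-hand and a right-hand group, each spanning
# at most `span` semitones (assumed hand span: 14 semitones).

def prune_unplayable(chord, span=14):
    """chord: list of MIDI pitches sounding simultaneously.
    Returns a playable subset (a crude stand-in for a real
    performance-difficulty model)."""
    chord = sorted(chord)

    def playable(c):
        # try every split point k: c[:k] is left hand, c[k:] is right hand
        return any(
            (c[k - 1] - c[0] <= span if k else True)
            and (c[-1] - c[k] <= span if k < len(c) else True)
            for k in range(len(c) + 1)
        )

    while len(chord) > 1 and not playable(chord):
        # drop whichever extreme pitch lies farther from the middle
        mid = chord[len(chord) // 2]
        if mid - chord[0] >= chord[-1] - mid:
            chord = chord[1:]   # drop lowest
        else:
            chord = chord[:-1]  # drop highest
    return chord

# A stray very high note (100) is pruned; the rest fits in two hands.
print(prune_unplayable([40, 60, 64, 67, 100]))
```

In the cited approach the constraint would come from a learned piano-score model rather than a fixed span, but the effect is the same: spurious detections that make the result unplayable are suppressed.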