2018
DOI: 10.5334/tismir.12
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Abstract: This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given …
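A minimal sketch of the idea the abstract describes, assuming a PyTorch implementation: two small CNN encoders map audio spectrogram excerpts and sheet-image snippets into a shared embedding space, trained so that matching pairs score higher than mismatched ones under a pairwise ranking loss. The layer sizes, input shapes, and margin below are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """CNN mapping a single-channel 2-D input (spectrogram excerpt or
    sheet-image snippet) to an L2-normalized embedding vector."""
    def __init__(self, embed_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 32, 1, 1)
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)  # unit-length embeddings

def pairwise_ranking_loss(a, s, margin=0.7):
    """Matching audio/sheet pairs (the diagonal) should beat every
    mismatched pair in the batch by at least `margin` in cosine similarity."""
    sim = a @ s.t()                           # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)             # positives on the diagonal
    cost = (margin - pos + sim).clamp(min=0)  # hinge over the negatives
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    return (cost * mask).mean()               # exclude the positives

# Toy usage with random tensors standing in for real excerpts (the shapes
# are hypothetical). Once trained, both retrieval tasks reduce to
# nearest-neighbor search in the shared embedding space.
audio_enc, sheet_enc = ConvEncoder(), ConvEncoder()
audio = torch.randn(8, 1, 92, 42)    # e.g. log-spectrogram excerpts
sheets = torch.randn(8, 1, 160, 180) # e.g. sheet-music image snippets
loss = pairwise_ranking_loss(audio_enc(audio), sheet_enc(sheets))
loss.backward()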

Cited by 53 publications (77 citation statements) · References 16 publications

Citation statements (ordered by relevance):
“…The approach presented by Dorfer et al (2017a) and further extended in the present article goes beyond that of Dorfer et al (2016) in several respects. Most importantly, the original network required both sheet music and audio as input at the same time, in order to then decide which location in the sheet image best matches the current audio excerpt.…”
Section: Introduction (mentioning)
Confidence: 59%
“…In the present work we continue the work of Dorfer et al (2017a) and extend it with the following new contributions, which we hope will greatly facilitate and accelerate future music alignment and retrieval research in the MIR community.…”
Section: Introduction (mentioning)
Confidence: 77%
“…Concerning the third typical case of music stakeholders, many datasets exist for MIR tasks, but they often lack the ability to interoperate. For instance, [16] contains source-separated and mixed audio and video tracks, MIDI scores, and frame-level transcriptions; in [12] a dataset containing audio recordings, music scores, and sheet music images is used; another interesting multimodal dataset containing time-aligned notes, audio, and lyrics is presented in [18]; audio recordings, notes, and expressive markings were recently collected in [15]. To date, each of these datasets has used its own format for representing the synchronization of music across the various modalities.…”
Section: Applicability to Digital Libraries, Repositories, and Datasets (mentioning)
Confidence: 99%