2018
DOI: 10.5334/tismir.12
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

Abstract: This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given …
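A minimal sketch of the idea the abstract describes, assuming a PyTorch implementation: two small CNN encoders map audio spectrogram excerpts and sheet-image snippets into a shared embedding space, trained so that matching pairs score higher than mismatched ones under a pairwise ranking loss. The layer sizes, input shapes, and margin below are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """CNN mapping a single-channel 2-D input (spectrogram excerpt or
    sheet-image snippet) to an L2-normalized embedding vector."""
    def __init__(self, embed_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 32, 1, 1)
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)  # unit-length embeddings

def pairwise_ranking_loss(a, s, margin=0.7):
    """Matching audio/sheet pairs (the diagonal) should beat every
    mismatched pair in the batch by at least `margin` in cosine similarity."""
    sim = a @ s.t()                           # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)             # positives on the diagonal
    cost = (margin - pos + sim).clamp(min=0)  # hinge over the negatives
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    return (cost * mask).mean()               # exclude the positives

# Toy usage with random tensors standing in for real excerpts (the shapes
# are hypothetical). Once trained, both retrieval tasks reduce to
# nearest-neighbor search in the shared embedding space.
audio_enc, sheet_enc = ConvEncoder(), ConvEncoder()
audio = torch.randn(8, 1, 92, 42)    # e.g. log-spectrogram excerpts
sheets = torch.randn(8, 1, 160, 180) # e.g. sheet-music image snippets
loss = pairwise_ranking_loss(audio_enc(audio), sheet_enc(sheets))
loss.backward()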

Cited by 53 publications (77 citation statements) · References 16 publications

Citation statements (ordered by relevance):
“…The approach presented by Dorfer et al (2017a) and further extended in the present article goes beyond that of Dorfer et al (2016) in several respects. Most importantly, the original network required both sheet music and audio as input at the same time, in order to then decide which location in the sheet image best matches the current audio excerpt.…”
Section: Introduction (mentioning)
Confidence: 59%
“…In the present work we continue the work of Dorfer et al (2017a) and extend it with the following new contributions, which we hope will greatly facilitate and accelerate future music alignment and retrieval research in the MIR community.…”
Section: Introduction (mentioning)
Confidence: 77%
“…Concerning the third typical case of music stakeholders, many datasets exist for MIR tasks, but they often lack the ability to interoperate. For instance, [16] contains source-separated and mixed audio and video tracks, MIDI scores, and frame-level transcriptions; in [12] a dataset containing audio recordings, music scores, and sheet music images is used; another interesting multimodal dataset containing time-aligned notes, audio, and lyrics is presented in [18]; audio recordings, notes, and expressive markings were recently collected in [15]. To date, each of these datasets has used its own format for representing the synchronization of music across the various modalities.…”
Section: Applicability to Digital Libraries, Repositories, and Datasets (mentioning)
Confidence: 99%