This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio-sheet music dataset which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio, which was synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet music of high complexity (e.g., nearly the complete solo piano works by Frédéric Chopin) and commercial recordings by famous concert pianists. Our results suggest that the proposed method, in combination with the large-scale dataset, yields retrieval models that successfully generalize to data well beyond the synthetic training data used for model building.
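The retrieval step described in this abstract can be illustrated with a minimal sketch: once both modalities are mapped into a shared embedding space, cross-modal retrieval reduces to nearest-neighbor search by cosine similarity. The encoder below is a hypothetical stand-in (a fixed random projection), not the paper's trained CNN; only the retrieval logic is the point.

```python
import numpy as np

def embed_stub(x, dim=32, seed=0):
    """Hypothetical stand-in for a learned multimodal encoder:
    deterministically maps an input vector to a unit-length embedding.
    (The paper uses trained CNNs; this is only for illustration.)"""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((dim, len(x)))
    v = proj @ np.asarray(x, dtype=float)
    return v / np.linalg.norm(v)

def retrieve(query_emb, db_embs):
    """Rank database entries by cosine similarity to the query.
    Rows of db_embs are unit vectors, so the dot product is the cosine."""
    sims = db_embs @ query_emb
    return np.argsort(-sims)  # indices of best matches first
```

In the paper's setting, the database would hold embeddings of sheet-music snippets and the query would be an embedded audio excerpt (or vice versa for the performance-retrieval task).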
Did Ludwig van Beethoven (1770-1827) re-use material when composing his piano sonatas? What repeated patterns are distinctive of Beethoven's piano sonatas compared, say, to those of Frédéric Chopin (1810-1849)? Traditionally, in preparation for essays on topics such as these, music analysts have undertaken inter-opus pattern discovery, informally or systematically: the task of identifying two or more related note collections (or phenomena derived from those collections, such as chord sequences) that occur in at least two different movements or pieces of music. More recently, computational methods have emerged for tackling the inter-opus pattern discovery task, but often they make simplifying and problematic assumptions about the nature of music. Thus a gulf exists between the flexibility music analysts employ when considering two note collections to be related, and what algorithmic methods can achieve. By unifying contributions from the two main approaches to computational pattern discovery (viewpoints and the geometric method) via the technique of symbolic fingerprinting, the current chapter seeks to reduce this gulf. Results from six experiments are summarized that investigate questions related to borrowing, resemblance, and distinctiveness across 21 Beethoven piano sonata movements. Among these results, we found 2-3 bars of material that occurred across two sonatas, an andante theme that appears varied in an imitative minuet, patterns with leaps that are distinctive of Beethoven compared to Chopin, and two potentially new examples of what Meyer and Gjerdingen call schemata. The chapter does not solve the problem of inter-opus pattern discovery, but it can act as a platform for research that will further reduce the gap between what music informaticians do and what musicologists find interesting.
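The symbolic fingerprinting this abstract refers to can be sketched in a few lines. A common formulation hashes triples of notes into tokens built from pitch intervals and a ratio of inter-onset intervals, making the tokens invariant to transposition and tempo; shared tokens between two pieces then hint at related material. The exact hashing scheme below is illustrative, assumed for this sketch rather than taken from the chapter.

```python
from itertools import combinations

def fingerprints(notes, max_span=4.0):
    """Hash triples of (onset, pitch) notes into tokens of two pitch
    intervals plus a quantized ratio of inter-onset intervals.
    Tokens are transposition- and tempo-invariant by construction.
    (Illustrative scheme; not the chapter's exact method.)"""
    notes = sorted(notes)
    tokens = set()
    for (t1, p1), (t2, p2), (t3, p3) in combinations(notes, 3):
        if t3 - t1 > max_span or t2 == t1 or t3 == t2:
            continue  # skip over-long spans and simultaneous onsets
        ratio = round((t3 - t2) / (t2 - t1), 1)
        tokens.add((p2 - p1, p3 - p2, ratio))
    return tokens

def shared(notes_a, notes_b):
    """Fingerprint tokens common to two note collections: a crude
    proxy for inter-opus resemblance."""
    return fingerprints(notes_a) & fingerprints(notes_b)
```

For example, a melody and its transposed, twice-as-fast restatement produce identical tokens, so `shared` finds the match even though no note is literally repeated.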