2019
DOI: 10.1109/tmm.2018.2856090

Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications

Abstract: We introduce a dataset for facilitating audio-visual analysis of music performances. The dataset comprises 44 simple multi-instrument classical music pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. We describe our metho…
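The abstract distinguishes frame-level from note-level transcriptions. As a rough illustration of how the two views relate, the following sketch expands note-level events into per-frame lists of active pitches. It assumes notes given as hypothetical `(onset_seconds, offset_seconds, midi_pitch)` tuples and a hypothetical 10 ms hop; this is not the dataset's annotation file format, nor how its frame-level files were produced.

```python
from typing import List, Set, Tuple

# Hypothetical note layout for illustration: (onset_s, offset_s, midi_pitch).
Note = Tuple[float, float, int]


def notes_to_frames(notes: List[Note], hop: float = 0.01) -> List[List[int]]:
    """Expand note-level events into a frame-level view.

    Frame i covers [i * hop, (i + 1) * hop); the returned list holds the
    sorted MIDI pitches sounding in each frame.
    """
    if not notes:
        return []
    end = max(offset for _, offset, _ in notes)
    n_frames = int(round(end / hop))
    frames: List[Set[int]] = [set() for _ in range(n_frames)]
    for onset, offset, pitch in notes:
        start = int(round(onset / hop))
        stop = int(round(offset / hop))
        # Mark every frame from the onset frame up to (not including) the offset frame.
        for i in range(max(start, 0), min(stop, n_frames)):
            frames[i].add(pitch)
    return [sorted(f) for f in frames]
```

For two overlapping notes, `notes_to_frames([(0.0, 0.03, 60), (0.02, 0.05, 64)])` yields five 10 ms frames in which the third frame contains both pitches.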

Cited by 124 publications (129 citation statements). The paper references 43 publications.
“…2. Results in terms of SDR, SIR, and SAR averaged and reported by the number of instruments in the testing set of URMP [17] dataset.…”
Section: Results; type: mentioning; confidence: 99%
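The statement above reports SDR, SIR, and SAR averaged per number of instruments. A minimal sketch of that reporting step follows, under a loud simplification: it computes only a plain energy-ratio SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), not the BSS Eval SDR, which first decomposes the estimate into target, interference, and artifact components (and is what yields SIR and SAR). For the full metrics, `mir_eval.separation.bss_eval_sources` is the usual choice. The function names and the `(n_instruments, score)` result layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def simple_sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified stand-in only; NOT the BSS Eval SDR.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(s * s for s in reference)
    if signal == 0:
        raise ValueError("reference must not be silent")
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / error)


def average_by_instrument_count(results: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average per-piece scores, grouped by the number of instruments in the piece."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for n_instruments, score in results:
        groups[n_instruments].append(score)
    return {n: sum(v) / len(v) for n, v in sorted(groups.items())}
```

Grouping by instrument count makes the expected trend visible: separation generally gets harder as more sources share the mixture.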
“…Therefore, we compose two novel datasets to train and evaluate our models, and they are a Subset of URMP (Sub-URMP) dataset and a ImageNet Image-Sound (INIS) dataset. Sub-URMP dataset is composed from the original URMP dataset [11]. It contains 13 music instrument categories.…”
Section: Datasets; type: mentioning; confidence: 99%
“…To explore this new problem space, we compose two datasets, e.g., Sub-URMP and INIS. The Sub-URMP dataset consists of paired images and sounds extracted from 107 single-instrument musical performance videos of 13 kinds of instruments in the University of Rochester Musical Performance (URMP) dataset [11]. In total 17,555 images are extracted and each image is paired with a half-second long sound clip.…”
Section: Introduction; type: mentioning; confidence: 99%
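The statement above says each extracted image is paired with a half-second sound clip. The quote does not say how the clip is aligned to the frame, so the following sketch assumes a clip centred on the frame's timestamp; the frame rate, sample rate, and function names are hypothetical parameters, not values from URMP or Sub-URMP.

```python
from typing import List, Sequence, Tuple


def clip_bounds(frame_index: int, fps: float, sample_rate: int,
                clip_seconds: float = 0.5) -> Tuple[int, int]:
    """Sample range [start, stop) of a clip centred on a video frame's timestamp."""
    t = frame_index / fps
    start = int(round((t - clip_seconds / 2) * sample_rate))
    stop = start + int(round(clip_seconds * sample_rate))
    return start, stop


def pair_frames_with_clips(frame_indices: Sequence[int], audio: Sequence[float],
                           fps: float, sample_rate: int,
                           clip_seconds: float = 0.5) -> List[Tuple[int, Sequence[float]]]:
    """Pair each frame index with the audio samples around it (assumed centring)."""
    pairs = []
    for idx in frame_indices:
        start, stop = clip_bounds(idx, fps, sample_rate, clip_seconds)
        if start < 0 or stop > len(audio):
            continue  # skip frames whose clip would fall outside the recording
        pairs.append((idx, audio[start:stop]))
    return pairs
```

With `fps=30.0` and `sample_rate=1000`, frame 30 (t = 1.0 s) maps to samples 750 to 1249, while frame 0 is skipped because its clip would start before the recording.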
“…There are some similar works that generate images condition on sounds, such as [19] [20]. In these works, they use different dataset called Sub-URMP [19] [21] which is composed of sounds of musical performances with monotonous background and similar composition in images. By using different training scenario, they achieve the goal of generating images which depict a single person with an instrument correspond to input sound.…”
Section: Related Work; type: mentioning; confidence: 99%