Proceedings of the 19th ACM International Conference on Multimedia 2011
DOI: 10.1145/2072298.2071949
Document dependent fusion in multimodal music retrieval

Abstract: In this paper, we propose a novel multimodal fusion framework, document dependent fusion (DDF), which derives the optimal combination strategy for each individual document in the fusion process. For each document, we derive a document weight vector by estimating the descriptive abilities of its different modalities. The document weight vector also enables our framework to be easily integrated with existing multimodal fusion schemes, and achieve a better combination strategy for each document given a query. Exp…
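The fusion rule described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical Python sketch of document-dependent weighted fusion: each document carries its own weight vector over modalities, and the fused query-document score is the weighted sum of per-modality similarity scores. The weight-estimation step is a placeholder; the paper's actual estimator of each modality's descriptive ability is not given in this excerpt.

```python
import numpy as np

def estimate_document_weights(descriptiveness: np.ndarray) -> np.ndarray:
    """Placeholder: map per-modality descriptiveness estimates for one
    document to a normalized weight vector (softmax here is illustrative,
    not the paper's estimator)."""
    exp = np.exp(descriptiveness - descriptiveness.max())
    return exp / exp.sum()

def ddf_score(query_doc_scores: np.ndarray, doc_weights: np.ndarray) -> float:
    """Fused score for one (query, document) pair.
    query_doc_scores[m]: similarity of query and document under modality m.
    doc_weights[m]:      this document's weight for modality m."""
    return float(np.dot(doc_weights, query_doc_scores))

# Example with assumed audio, lyrics, and metadata modalities.
descriptiveness = np.array([1.2, 0.3, 0.8])   # assumed per-modality estimates
w_d = estimate_document_weights(descriptiveness)
scores = np.array([0.7, 0.1, 0.5])            # per-modality query-document similarities
print(ddf_score(scores, w_d))
```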

Citations: cited by 2 publications (1 citation statement)
References: 6 publications (10 reference statements)
“…The first category of methods utilized Long Short‐Term Memory networks (LSTMs), mainly due to their capabilities in modeling sequential data and capturing the temporal dependencies. For instance, Li et al [LMD18] developed a deep neural network system that translates MIDI note data and metric structures into a real‐time skeleton sequence of a pianist playing a keyboard instrument. Their approach combined Convolutional Neural Networks (CNNs) and LSTMs to generate human‐like piano performances.…”
Section: Musical Performance Synthesis
confidence: 99%
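The citation statement describes a CNN+LSTM pipeline that maps MIDI note data and metric structure to a pianist's skeleton sequence. A hedged Python sketch of that kind of architecture is below; all layer sizes, feature dimensions, and frame rates are assumptions for illustration, not the configuration of Li et al. [LMD18].

```python
import torch
import torch.nn as nn

class Midi2Skeleton(nn.Module):
    """Illustrative CNN+LSTM regressor from MIDI-derived features to per-frame
    3-D joint coordinates (shapes are hypothetical)."""
    def __init__(self, midi_feat=88, hidden=256, joints=21):
        super().__init__()
        # 1-D convolution over time encodes local note/metric context.
        self.cnn = nn.Sequential(
            nn.Conv1d(midi_feat, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM captures longer-range temporal dependencies in the performance.
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        # Per-frame regression head to joint coordinates.
        self.head = nn.Linear(hidden, joints * 3)

    def forward(self, midi):                 # midi: (batch, time, midi_feat)
        x = self.cnn(midi.transpose(1, 2))   # (batch, 128, time)
        x, _ = self.lstm(x.transpose(1, 2))  # (batch, time, hidden)
        return self.head(x)                  # (batch, time, joints * 3)

# Example: a 2-second clip at 30 fps with piano-roll-style input features.
model = Midi2Skeleton()
out = model(torch.randn(1, 60, 88))
print(out.shape)  # torch.Size([1, 60, 63])
```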