Proceedings of the 19th ACM International Conference on Multimedia 2011
DOI: 10.1145/2072298.2071949
Document dependent fusion in multimodal music retrieval

Abstract: In this paper, we propose a novel multimodal fusion framework, document dependent fusion (DDF), which derives the optimal combination strategy for each individual document in the fusion process. For each document, we derive a document weight vector by estimating the descriptive abilities of its different modalities. The document weight vector also enables our framework to be easily integrated with existing multimodal fusion schemes, and achieve a better combination strategy for each document given a query. Exp…
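The fusion rule described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical Python sketch of document-dependent weighted fusion: each document carries its own weight vector over modalities, and the fused query-document score is the weighted sum of per-modality similarity scores. The weight-estimation step is a placeholder; the paper's actual estimator of each modality's descriptive ability is not given in this excerpt.

```python
import numpy as np

def estimate_document_weights(descriptiveness: np.ndarray) -> np.ndarray:
    """Placeholder: map per-modality descriptiveness estimates for one
    document to a normalized weight vector (softmax here is illustrative,
    not the paper's estimator)."""
    exp = np.exp(descriptiveness - descriptiveness.max())
    return exp / exp.sum()

def ddf_score(query_doc_scores: np.ndarray, doc_weights: np.ndarray) -> float:
    """Fused score for one (query, document) pair.
    query_doc_scores[m]: similarity of query and document under modality m.
    doc_weights[m]:      this document's weight for modality m."""
    return float(np.dot(doc_weights, query_doc_scores))

# Example with assumed audio, lyrics, and metadata modalities.
descriptiveness = np.array([1.2, 0.3, 0.8])   # assumed per-modality estimates
w_d = estimate_document_weights(descriptiveness)
scores = np.array([0.7, 0.1, 0.5])            # per-modality query-document similarities
print(ddf_score(scores, w_d))
```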

Citations: cited by 2 publications (1 citation statement)
References: 6 publications (10 reference statements)
“…The first category of methods utilized Long Short‐Term Memory networks (LSTMs), mainly due to their capabilities in modeling sequential data and capturing the temporal dependencies. For instance, Li et al [LMD18] developed a deep neural network system that translates MIDI note data and metric structures into a real‐time skeleton sequence of a pianist playing a keyboard instrument. Their approach combined Convolutional Neural Networks (CNNs) and LSTMs to generate human‐like piano performances.…”
Section: Musical Performance Synthesis
confidence: 99%
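The citation statement describes a CNN+LSTM pipeline that maps MIDI note data and metric structure to a pianist's skeleton sequence. A hedged Python sketch of that kind of architecture is below; all layer sizes, feature dimensions, and frame rates are assumptions for illustration, not the configuration of Li et al. [LMD18].

```python
import torch
import torch.nn as nn

class Midi2Skeleton(nn.Module):
    """Illustrative CNN+LSTM regressor from MIDI-derived features to per-frame
    3-D joint coordinates (shapes are hypothetical)."""
    def __init__(self, midi_feat=88, hidden=256, joints=21):
        super().__init__()
        # 1-D convolution over time encodes local note/metric context.
        self.cnn = nn.Sequential(
            nn.Conv1d(midi_feat, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM captures longer-range temporal dependencies in the performance.
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        # Per-frame regression head to joint coordinates.
        self.head = nn.Linear(hidden, joints * 3)

    def forward(self, midi):                 # midi: (batch, time, midi_feat)
        x = self.cnn(midi.transpose(1, 2))   # (batch, 128, time)
        x, _ = self.lstm(x.transpose(1, 2))  # (batch, time, hidden)
        return self.head(x)                  # (batch, time, joints * 3)

# Example: a 2-second clip at 30 fps with piano-roll-style input features.
model = Midi2Skeleton()
out = model(torch.randn(1, 60, 88))
print(out.shape)  # torch.Size([1, 60, 63])
```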