Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413776

Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space

Abstract: Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states, which cannot well reflect the complexity and subtlety of emotions, or train the matching model using an impractical multi-stage pipeline. In this paper, we study end-to-end matching between image and mu…
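The abstract is truncated above. Purely as an illustrative sketch of what end-to-end matching in valence-arousal (VA) space could look like, the snippet below uses two small encoders that map each modality to a 2D (valence, arousal) point and ranks candidates by distance. All module names, feature dimensions, and the ranking rule are assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of emotion-based image-music matching in
# valence-arousal (VA) space; names and shapes are assumptions.
import torch
import torch.nn as nn

class VAEncoder(nn.Module):
    """Maps a feature vector (image or music) to a 2D (valence, arousal) point."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 2), nn.Tanh(),  # VA values scaled to [-1, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

image_encoder = VAEncoder(in_dim=2048)   # e.g. pooled CNN features
music_encoder = VAEncoder(in_dim=128)    # e.g. pooled spectrogram features

# Rank candidate music clips for one image by VA-space distance.
image_feat = torch.randn(1, 2048)
music_feats = torch.randn(10, 128)
img_va = image_encoder(image_feat)             # (1, 2)
mus_va = music_encoder(music_feats)            # (10, 2)
dist = torch.cdist(img_va, mus_va).squeeze(0)  # (10,)
best = torch.argsort(dist)[:3]                 # top-3 emotionally closest clips
print(best.tolist())
```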

Cited by 21 publications (8 citation statements)
References 77 publications
“…Similarly, in [4] the music mood is denoted as a 2D Valence-Arousal vector, which is then mapped to a specific RGB value on a color wheel. In [5], [6], and [7], the moods of both music and images are extracted and compared. Therefore, they allow for the selection of the most relevant music-image pairs within a finite library of music and images.…”
Section: Music Mood Visualization
Mentioning confidence: 99%
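As a hedged sketch of the color-wheel idea attributed to [4] above: the VA vector's angle could select a hue and its magnitude a saturation. The exact angle-to-hue convention and the fixed brightness are assumptions for illustration only.

```python
# Hypothetical mapping of a 2D valence-arousal vector to an RGB color
# via a color wheel, in the spirit of [4]; the angle-to-hue convention
# and saturation/value choices are assumptions.
import colorsys
import math

def va_to_rgb(valence: float, arousal: float) -> tuple[int, int, int]:
    # Angle of the VA vector selects the hue; magnitude drives saturation.
    hue = (math.atan2(arousal, valence) % (2 * math.pi)) / (2 * math.pi)
    sat = min(1.0, math.hypot(valence, arousal))
    r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
    return int(r * 255), int(g * 255), int(b * 255)

print(va_to_rgb(0.8, 0.3))   # positive valence, low arousal
print(va_to_rgb(-0.6, 0.7))  # negative valence, high arousal
```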
“…Inspired by this encoder-decoder-based structure, we also provided a similar solution in [9]. We re-implemented the image-music mapping model proposed in [6] to construct a dataset with corresponding music and landscape pairs. Following that, we implemented a similar encoder-decoder structure with ResNet50 [13] and StyleGAN3 [14] pre-trained on a landscape dataset [15].…”
Section: Music Mood Visualization
Mentioning confidence: 99%
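A rough sketch of the encoder-decoder idea described in this statement: a ResNet50 backbone encodes an image into a latent code intended for a pre-trained generator (StyleGAN3 in the cited work). The latent dimensionality and the commented-out generator call are placeholders, not the cited implementation.

```python
# Hypothetical encoder half of a ResNet50 + generator pipeline; the
# generator is a stub, since StyleGAN3 loading depends on its own repo.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class LatentEncoder(nn.Module):
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, latent_dim)
        self.backbone = backbone

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.backbone(images)  # (N, latent_dim)

encoder = LatentEncoder()
latent = encoder(torch.randn(2, 3, 224, 224))
# generator = load_pretrained_landscape_generator(...)  # placeholder, not a real API
# fake_images = generator(latent)
print(latent.shape)  # torch.Size([2, 512])
```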
“…The system can retrieve between Chinese folk music and Chinese folk images based on the emotions they involve. Chen et al. [172] and Zhao et al. [173] designed a system that computes the emotional similarity between music and images. With this system, users can generate mood-aware music slide shows from their personal album photos.…”
Section: Entertainment Assistant
Mentioning confidence: 99%
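As an illustrative sketch of the mood-aware slideshow idea: given precomputed valence-arousal coordinates for photos and music tracks, each photo could be paired with the emotionally closest track. All numbers and the nearest-neighbour rule below are assumptions.

```python
# Hypothetical mood-aware slideshow pairing: each photo gets the music
# track whose valence-arousal point is closest; values are made up.
import numpy as np

photo_va = np.array([[0.7, 0.2], [-0.5, 0.6]])               # (num_photos, 2)
track_va = np.array([[0.6, 0.1], [-0.4, 0.7], [0.0, 0.0]])   # (num_tracks, 2)

# Pairwise Euclidean distance in VA space, then nearest track per photo.
dist = np.linalg.norm(photo_va[:, None, :] - track_va[None, :, :], axis=-1)
assignment = dist.argmin(axis=1)
print(assignment)  # track index chosen for each photo, e.g. [0 1]
```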
“…Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger [28]. One team proposes to musicalize images based on their emotions: they extract visual features inspired by the concept of principles-of-art to recognize image emotions.…”
Section: Photo and Music With Emotion
Mentioning confidence: 99%