Topic-Oriented Text Features Can Match Visual Deep Models of Video Memorability

Kleinlein, Ricardo; Jiménez, Cristina Luna; Arias-Cuadrado, David; Ferreiros, Javier; Fernández-Martínez, Fernando

doi:10.3390/app11167406

Cited by 5 publications

(6 citation statements)

References 27 publications


“…1. Game Zone: Among the mainstream video divisions, the game partition exhibits the highest indicator weighting results in the creator and platform dimensions, suggesting that game videos in this category can generate more tangible benefits and value for creators and platforms [49]. However, compared to other partitions, the weighted results of indicators in the user dimension are average, indicating a relatively lower value contribution of game short videos to users.…”

Section: Discussion (mentioning)

confidence: 99%

Value Assessment of UGC Short Videos through Element Mining and Data Analysis

Fang, Ni, Zhang. 2023. Applied Sciences.


UGC short videos play a crucial role in sharing information and disseminating content in the era of new information technology. Accurately assessing the value of UGC short videos is highly significant for the sustainable development of self-media platforms and the secure governance of cyberspace. This study proposes a method for assessing the value of UGC short videos from the perspective of element mining and data analysis. The method involves three steps. Firstly, the text clustering algorithm and topic mapping visualization technology are utilized to identify elements for assessing the value of UGC short videos and construct an assessment index system. Secondly, structured data indexes are quantified using platform data statistics, while unstructured data indexes are quantified using the LSTM fine-grained sentiment analysis model. Lastly, the VIKOR model, incorporating an improved gray correlation coefficient, is employed to effectively evaluate the value of UGC short videos. The empirical results indicate that the value of current domestic UGC short videos is primarily associated with three dimensions: the creators, the platforms, and the users. It encompasses 11 value elements, including fan popularity, economic returns of creation, and frequency of interaction. Additionally, we assess the value of short videos within the mainstream partitions of the Bilibili platform and generate a value radar chart. Our findings reveal that short videos in game partitions generate higher revenue for creators and platforms but may neglect users’ needs for knowledge, culture, and other content. Conversely, short videos in the knowledge, food, and music partitions demonstrate specific distinctions in fulfilling users’ requirements. Ultimately, we offer personalized recommendations for the future development of high-value UGC short videos within the mainstream partitions.


“…For instance, Opal (Liu et al, 2022c) enables structured search for visual concepts, Generative Disco (Liu et al, 2023a) facilitates text-to-video generation for music visualisation, and ReelFramer (Wang et al, 2023) aids in transforming written news stories into engaging video narratives for journalists. Nonetheless, despite their success at generating creative imagery, they still struggle to visualise figurative language effectively (Kleinlein et al, 2022; Chakrabarty et al, 2023; Akula et al, 2023). Furthermore, research by Chakrabarty et al (2023); Akula et al (2023) reveals that DALL·E 2 outperforms Stable Diffusion in representing figurative language.…”

Section: Text-to-image Generation (mentioning)

confidence: 99%

“…In advertising, they frequently serve as persuasive tools to evoke positive attitudes (Phillips and McQuarrie, 2004; McQuarrie and Mick, 1999; Jahameh and Zibin, 2023). While humans effortlessly interpret images with metaphorical content (Yosef et al, 2023), state-of-the-art text-to-image models such as DALL·E 2 (Ramesh et al, 2022) and Stable Diffusion (Rombach et al, 2022) still struggle to synthesise meaningful images for such abstract and figurative expressions (Kleinlein et al, 2022; Chakrabarty et al, 2023; Akula et al, 2023).…”

Section: Introduction (mentioning)

confidence: 99%

ViPE: Visualise Pretty-much Everything

Shahmohammadi, Ghosh, Lensch. 2023. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.


Figurative and non-literal expressions are profoundly integrated in human communication. Visualising such expressions allows us to convey our creative thoughts and evoke nuanced emotions. Recent text-to-image models like Stable Diffusion, on the other hand, struggle to depict non-literal expressions. Recent works primarily deal with this issue by compiling human-annotated datasets on a small scale, which not only demands specialised expertise but also proves highly inefficient. To address this issue, we introduce ViPE: Visualise Pretty-much Everything. ViPE offers a series of lightweight and robust language models that have been trained on a large-scale set of lyrics with noisy visual descriptions that represent their implicit meaning. The synthetic visual descriptions are generated by GPT3.5 relying on neither human annotations nor images. ViPE effectively expresses any arbitrary piece of text into a visualisable description, enabling meaningful and high-quality image generation. We provide compelling evidence that ViPE is more robust than GPT3.5 in synthesising visual elaborations. ViPE also exhibits an understanding of figurative expressions comparable to human experts, providing a powerful and open-source backbone to many downstream applications such as music video and caption generation.


“…Lately, the emphasis has been put on understanding the connection between the global semantics of an image (its visual constituent elements) and memorability. It has been shown that there exists a close correlation between certain topics and average memorability scores [12]. Therefore, even if many factors contribute to the memorability of a given sample, it seems that the main topic of a video (its semantic unit), extracted from text-based sources like captions, may be used as a proxy material to estimate its semantics and tackle the task of predicting memorability.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recent studies from psychology and neurosciences seem to disagree with the idea that memory is an entirely subjective appraisal, instead suggesting that there are indeed visual elements that are more likely to be stored in memory for later recall [8,15,25]. Memorability is an observer-independent aspect of the visual medium, greatly influenced by the semantics of the scenes it represents [3], which motivates the use of alternative sources to analyse it beyond the purely visual domain, for instance, employing text-based captions that describe a scene [12].…”

Section: Introduction (mentioning)

confidence: 99%

Video Memorability Prediction From Jointly-learnt Semantic and Visual Features

Martín-Fernández, Kleinlein, Luna-Jiménez et al. 2023. 20th International Conference on Content-Based Multimedia Indexing. Self-citation.


The memorability of a video is defined as an intrinsic property of its visual features that dictates the fraction of people who recall having watched it on a second viewing within a memory game. Still, unravelling what are the key features to predict memorability remains an obscure matter. This challenge is addressed here by fine-tuning text and image encoders using a cross-modal strategy known as Contrastive Language-Image Pre-training (CLIP). The resulting video-level data representations learned include semantics and topic-descriptive information as observed from both modalities, hence enhancing the predictive power of our algorithms. Our proposal achieves in the text domain a significantly greater Spearman Rank Correlation Coefficient (SRCC) than a default pre-trained text encoder (0.575 ± 0.007 and 0.538 ± 0.007, respectively) over the Memento10K dataset. A similar trend, although less pronounced, can be noticed in the visual domain. We believe these findings signal the potential benefits that cross-modal predictive systems can extract from being fine-tuned to the specific issue of media memorability.

CCS Concepts: Information systems → Multimedia information systems.
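The SRCC mentioned in the abstract above compares how two score lists rank the same items rather than their raw values. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function names and the example scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical memorability scores for four videos (illustrative values only).
ground_truth = [0.62, 0.91, 0.75, 0.88]
predicted = [0.55, 0.80, 0.70, 0.78]
srcc = spearman(ground_truth, predicted)
```

Because only the ordering matters, a model can reach a high SRCC even when its predicted scores are systematically offset from the ground truth, as in the hypothetical values above.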
