Image memorability prediction is a recent topic in computer science. First attempts have shown that it is possible to computationally infer from the intrinsic properties of an image the extent to which it is memorable. In this paper, we introduce a fine-tuned deep learning-based computational model for image memorability prediction. This model significantly outperforms previous work, obtaining a 32.78% relative increase over the best-performing state-of-the-art model on the same dataset. We also investigate how our model generalizes to a new dataset of 150 images, for which memorability and affective scores were collected from 50 participants. The prediction performance is weaker on this new dataset, which highlights the issue of dataset representativeness. In particular, the model obtains a higher predictive performance for arousing negative pictures than for neutral or arousing positive ones, a reminder of how important it is for a memorability dataset to consist of images that are appropriately distributed within the emotional space. CCS Concepts • Computing methodologies → Computer vision; Scene understanding; • Human-centered computing → Human computer interaction (HCI);
Memorability can be regarded as a useful metric of video importance to help choose between competing videos. Research on the computational understanding of video memorability is, however, in its early stages. There is no available dataset for modelling purposes, and the few previous attempts provided protocols to collect video memorability data that would be difficult to generalize. Furthermore, the computational features needed to build a robust memorability predictor remain largely undiscovered. In this article, we propose a new protocol to collect long-term video memorability annotations. We measure the memory performance of 104 participants from weeks to years after memorization to build a dataset of 660 videos for video memorability prediction. This dataset is made available to the research community. We then analyze the collected data in order to better understand video memorability, in particular the effects of response time, duration of memory retention and repetition of visualization on video memorability. We finally investigate the use of various types of audio and visual features and build a computational model for video memorability prediction. We conclude that high-level visual semantics help better predict the memorability of videos.
Delivering the same digital image to several users does not necessarily provide them with the same experience. In this study, we focused on how different affective experiences impact the memorability of an image. Forty-nine participants took part in an experiment in which they saw a stream of images conveying various emotions. One day later, they had to recognize the images displayed the day before and rate them according to the positivity/negativity of the emotional experience the images induced. In order to better appreciate the underlying idiosyncratic factors that affect the experience under test, prior to the test session we collected not only personal information but also results of psychological tests to characterize individuals according to their dominant personality in terms of masculinity-femininity (Bem Sex Role Inventory) and to measure their emotional state. The results show that the way an emotional experience is rated depends on personality rather than biological sex, suggesting that personality could be a mediator in the well-established differences in how males and females experience emotional material. From the collected data, we derive a model including individual factors relevant to characterizing the memorability of the images, in particular through the emotional experience they induced.
Adversarial examples mainly exploit changes to input pixels to which humans are not sensitive, and arise from the fact that models make decisions based on uninterpretable features. Interestingly, cognitive science reports that the process of interpretability for human classification decisions relies predominantly on low spatial frequency components. In this paper, we investigate the robustness to adversarial perturbations of models constrained during training to leverage information corresponding to different spatial frequency ranges. We show that this robustness is tightly linked to the spatial frequency characteristics of the data at stake. Indeed, depending on the data set, the same constraint may result in very different levels of robustness (up to 0.41 adversarial accuracy difference). To explain this phenomenon, we conduct several experiments to shed light on influential factors such as the level of sensitivity to high frequencies, and the transferability of adversarial perturbations between original and low-pass filtered inputs.
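To make the notion of a low-pass filtered input concrete, here is a minimal sketch of one common way to remove high spatial frequencies from a grayscale image: transform it to the Fourier domain, zero every coefficient beyond a radial cutoff, and transform back. This is an illustration only, not the filtering procedure used by the authors, which the abstract does not specify. The function names and the `cutoff` parameter are hypothetical, and the naive transform is meant for small inputs.

```python
import cmath
import math


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(total / n if inverse else total)
    return out


def dft_2d(image, inverse=False):
    """2-D DFT: transform every row, then every column."""
    rows = [dft_1d(row, inverse) for row in image]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*cols)]


def low_pass_filter(image, cutoff):
    """Zero every frequency whose radial index exceeds `cutoff`.

    `cutoff` is a hypothetical parameter chosen for this sketch.
    """
    h, w = len(image), len(image[0])
    spectrum = dft_2d(image)
    for u in range(h):
        for v in range(w):
            # Indices above n/2 represent negative frequencies; fold them back.
            fu = min(u, h - u)
            fv = min(v, w - v)
            if math.hypot(fu, fv) > cutoff:
                spectrum[u][v] = 0j
    filtered = dft_2d(spectrum, inverse=True)
    # The mask is symmetric, so the imaginary parts are only rounding noise.
    return [[value.real for value in row] for row in filtered]
```

With `cutoff` set to 0 only the mean intensity survives, so a checkerboard collapses to a uniform gray; with a cutoff larger than the image size nothing is removed and the input is returned unchanged up to rounding.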