Image perception can vary considerably between subjects, yet some sights are judged aesthetically pleasing more often than others because of their specific visual content; this is particularly true in tourism-related applications. We introduce the ESITUR project, oriented towards the development of 'smart tourism' solutions aimed at improving the tourist experience. The idea is to convert conventional tourist showcases into fully interactive information points accessible from any smartphone, enriched with content automatically extracted from the analysis of public photos uploaded to social networks by other visitors. Our baseline, knowledge-driven system reaches a classification accuracy of 64.84 ± 4.22% in distinguishing images suitable for a tourism guide application from unsuitable ones. As an alternative, we adopt a data-driven Mixture of Experts (MEX) approach, in which multiple learners specialize in partitions of the problem space. In our case, a location tag attached to every picture provides a criterion by which to segment the data, and the MEX model defined accordingly achieves an accuracy of 85.08 ± 2.23%. We conclude that ours is a successful approach in environments where some kind of data segmentation can be applied, such as tourist photographs.
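The segmentation-driven MEX idea described above can be illustrated with a minimal sketch: one expert is trained per location tag and queries are routed by that tag. The majority-class "experts", the example tags, and the labels are all hypothetical stand-ins for the real learners and data used in the paper.

```python
from collections import Counter, defaultdict

def train_mex(samples):
    """Train one 'expert' per location tag.

    samples: list of (location_tag, label) pairs. Each expert here is a
    trivial majority-class predictor standing in for a real classifier
    trained on that partition of the data.
    """
    by_location = defaultdict(list)
    for tag, label in samples:
        by_location[tag].append(label)
    return {tag: Counter(labels).most_common(1)[0][0]
            for tag, labels in by_location.items()}

def predict(experts, tag, fallback="unsuitable"):
    # Route the query to the expert responsible for its location tag;
    # fall back to a default for unseen tags.
    return experts.get(tag, fallback)

# Hypothetical toy data: (location tag, suitability label) pairs.
data = [("museum", "suitable"), ("museum", "suitable"),
        ("museum", "unsuitable"), ("street", "unsuitable"),
        ("street", "unsuitable")]
experts = train_mex(data)
print(predict(experts, "museum"))  # suitable
print(predict(experts, "street"))  # unsuitable
```

The key design point is that the gating function is not learned: the location tag itself partitions the problem space, so each expert only ever sees, and specializes in, one segment of the data.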
Electrodermal activity (EDA) is a psychophysiological indicator that can be considered a somatic marker of subjects' emotional and attentional reaction to stimuli. EDA measurements are not biased by the cognitive process of giving an opinion or a score to characterize subjective perception, and group-level EDA recordings integrate the reaction of the whole audience, thus reducing signal noise. This paper contributes to the field of affective video content analysis, extending previous novel work on the use of EDA as ground truth for prediction algorithms. Here, we label short video clips according to the audience's emotion (high vs. low) and attention (increasing vs. decreasing), derived from EDA records. Then, we propose a set of low-level audiovisual descriptors and train binary classifiers that predict emotion and attention with 75% and 80% accuracy, respectively. These results, along with those of previous works, reinforce the usefulness of such low-level audiovisual descriptors for modeling video in terms of the induced affective response.
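The classification setup above can be sketched with a minimal example: a one-dimensional threshold classifier fit on a scalar audiovisual descriptor per clip. The descriptor values, labels, and the threshold-search rule are hypothetical illustrations, not the actual features or classifiers used in the paper.

```python
def train_threshold_classifier(features, labels):
    """Fit a one-dimensional threshold classifier.

    features: one scalar descriptor per clip (e.g., a hypothetical mean
    audio-energy value); labels: 1 for 'high emotion', 0 for 'low'.
    Returns the threshold that maximizes training accuracy, predicting
    1 whenever the descriptor is at or above the threshold.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(features)):
        preds = [1 if f >= t else 0 for f in features]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Hypothetical descriptor values for four clips and their EDA-derived labels.
feats = [0.2, 0.3, 0.8, 0.9]
labs = [0, 0, 1, 1]
print(train_threshold_classifier(feats, labs))  # 0.8
```

In practice the paper combines many such low-level descriptors and a learned classifier; the point of the sketch is only the pipeline shape: per-clip descriptors in, EDA-derived binary labels as ground truth, a trained binary predictor out.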