2022
DOI: 10.1016/j.cag.2022.06.002
SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360 videos

Cited by 13 publications (6 citation statements)
References 42 publications
“…in video games or other interactive environments, in order to improve user experience and performance; this can be done, for instance, by helping the user conduct the specific task if they are taking too long. Further, existing approaches that focus on modeling and predicting visual behavior in immersive environments [6,28,46] are primarily trained on datasets containing viewing data from free exploration tasks, making them less effective in modeling behaviors related to other tasks. Looking ahead, we believe that different gaze prediction models could leverage our insights and incorporate behavioral priors, be fine-tuned, or be directly trained with task-dependent gaze data.…”
Section: Discussion
confidence: 99%
“…Since then, several studies have gathered large datasets of free-viewing behavior [9,54]. Subsequent works have leveraged them to model visual attention, usually based on mechanisms such as visual saliency [6,47] or scanpath prediction [2,46]. More recently, these models have also incorporated auditory cues to account for multimodal attention [10,66].…”
Section: Related Work
confidence: 99%
“…However, as VR environments are often dynamic, these models may not be sufficient for certain applications. To address this, some recent works have focused on attention prediction in 360° videos [3,9,12]. Nevertheless, all these models only take visual stimuli as input, and therefore they do not take into account the potential influence of sound in VR environments [30].…”
Section: Analyzing and Predicting Viewing Behavior in VR
confidence: 99%
“…We showcase this application scenario by evaluating the performance of recent state-of-the-art audiovisual 360° saliency predictors on D-SAV360, specifically those proposed by Cokelek et al [10] and Chao et al [8]. We chose to implement Cokelek et al's method on top of SST-Sal [3], as it requires a video saliency predictor as a base architecture. Please refer to Section S.8 in the supplementary for implementation details.…”
Section: Applications of Our Dataset
confidence: 99%
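The citation above describes evaluating saliency predictors against ground-truth data. A standard way to score a predicted saliency map against a ground-truth map is Pearson's correlation coefficient (CC). The sketch below is purely illustrative of that metric and is not taken from any of the cited implementations; the function name and tolerance constant are assumptions.

```python
import numpy as np

def saliency_cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson correlation coefficient (CC) between a predicted
    saliency map and a ground-truth map. Both maps are z-normalized
    so the score is invariant to their scale and offset."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    # Mean of the element-wise product of two z-scored maps is their CC.
    return float((p * g).mean())
```

CC ranges from -1 to 1; identical maps score close to 1, and a map compared against its negation scores close to -1.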
“…The third degree of freedom can be adjusted to place content of interest at the front of the panorama. This can be done manually, with the user adjusting the rotation according to their personal interests, or even automatically, using techniques that identify "interesting" regions in the spherical image via visual saliency [Bernal-Berdun et al. 2022].…”
Section: Orientation Correction