2023
DOI: 10.1109/tmm.2021.3139743

Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Abstract: This work focuses on the persistent monitoring problem, where a set of targets moving based on an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target's position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. In doing so, the main challenge is to strike a balance between exploitation, i.e., re-visiting previously-located targets…

Cited by 17 publications (18 citation statements)
References 94 publications (50 reference statements)
“…By implementing psychophysically uncovered mechanisms of attentional and oculomotor control, ScanDy allows to generate sequences of eye movements for any visual scene. Recent years have shown a growing interest in the simulation of time-ordered fixation sequences for static scenes (Tatler, Brockmole, and Carpenter, 2017; Malem-Shinitski et al, 2020; Schwetlick, Rothkegel, et al, 2020; Schwetlick, Backhaus, and Engbert, 2022; Kucharsky et al, 2021; Kümmerer, Bethge, and Wallis, 2022), as well as the frame-wise prediction of where humans tend to look on average when observing a dynamic scene (Molin, Etienne-Cummings, and Niebur, 2015; Min and Corso, 2019; Droste, Jiao, and Noble, 2020; Wang, Liu, et al, 2021). We are currently not aware of another computational model that is able to simulate time-resolved gaze positions for the full duration of dynamic scenes, analogous to human eye tracking data.…”
Section: Discussion (mentioning)
confidence: 99%
“…With the availability of larger datasets in recent years (Marszalek, Laptev, and Schmid, 2009; Wang, Shen, et al, 2018), video saliency detection has also become a popular task in computer vision. Deep neural network (DNN) architectures, which include the temporal information in videos either through temporal recurrence (Linardos et al, 2019; Droste, Jiao, and Noble, 2020) or by using 3D convolutional networks (Min and Corso, 2019; Jain et al, 2021; Wang, Liu, et al, 2021), clearly outperform mechanistic models from computational neuroscience and psychology in predicting where humans tend to look. This boost in performance can be explained by the capabilities of these networks not just to encode information on low-level features like color or edges.…”
Section: Introduction (mentioning)
confidence: 99%
“…By implementing psychophysically uncovered mechanisms of attentional and oculomotor control, ScanDy allows to generate sequences of eye movements for any visual scene. Recent years have shown a growing interest in the simulation of time-ordered fixation sequences for static scenes [14–17, 22, 23], as well as the frame-wise prediction of where humans tend to look on average when observing a dynamic scene [33, 37, 38, 40]. We are currently not aware of another computational model that is able to simulate time-resolved gaze positions for the full duration of dynamic scenes, analogous to human eye tracking data.…”
Section: Discussion (mentioning)
confidence: 99%
“…With the availability of larger datasets in recent years [34, 35], video saliency detection has also become a popular task in computer vision. Deep neural network (DNN) architectures, which include the temporal information in videos either through temporal recurrence [36, 37] or by using 3D convolutional networks [38–40], clearly outperform mechanistic models from computational neuroscience and psychology in predicting where humans tend to look. This boost in performance can be explained by the capabilities of these networks not just to encode information on low-level features like color or edges.…”
Section: Introduction (mentioning)
confidence: 99%
“…This model jointly modified the GATs and the self-attention mechanism that fully dynamically focused and integrated spatial, temporal and periodic correlations. Wang et al [30] proposed a novel spatial-temporal self-attention 3D network (STSANet) for video prediction, which integrated self-attention into 3D convolutional network to perceive contextual contents in semantic and spatiotemporal subspaces and narrows semantic and spatiotemporal gaps during saliency feature fusion. Chaabane et al [31] used an adapted self-attention convolutional neural network to highlight the temporal evolution of land cover areas through the construction of a spatiotemporal map.…”
Section: B Attention Mechanism In Time Series Data Prediction (mentioning)
confidence: 99%