2021
DOI: 10.1109/tpami.2019.2924417
Revisiting Video Saliency Prediction in the Deep Learning Era

Abstract: Predicting where people look in static scenes, a.k.a. visual saliency, has received significant research interest recently. However, relatively less effort has been spent on understanding and modeling visual attention over dynamic scenes. This work makes three contributions to video saliency research. First, we introduce a new benchmark, called DHF1K (Dynamic Human Fixation 1K), for predicting fixations during dynamic scene free-viewing, a long-standing need in this field. DHF1K consists of 1K high-quality…
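Benchmarks of this kind score predicted saliency maps against recorded human fixations using standard metrics such as Normalized Scanpath Saliency (NSS). Below is a minimal NumPy sketch of NSS; the function name, epsilon guard, and toy inputs are our own illustration, not code from the paper:

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """Normalized Scanpath Saliency: mean of the z-scored predicted
    saliency values sampled at human fixation locations (higher is better)."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(s[fixation_map.astype(bool)].mean())

# Toy example: a random prediction scored against 20 random fixations.
rng = np.random.default_rng(0)
pred = rng.random((360, 640))
fix = np.zeros((360, 640), dtype=bool)
fix[rng.integers(0, 360, 20), rng.integers(0, 640, 20)] = True
print(f"NSS: {nss(pred, fix):.3f}")  # near 0 for an uninformative prediction
```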

Cited by 240 publications (116 citation statements)
References 90 publications (123 reference statements)
“…Accordingly, we defined two visual tasks participants had to perform: the first condition was a free-viewing (FV) task, while the second was a surveillance viewing task (Task). The former is rather common in eye-tracking tests [32,37,43,45,50,71]. Observers were simply asked to observe visual video stimuli without performing any task.…”
Section: Visual Tasks To Perform
confidence: 99%
“…While it is now rather easy to find eye-tracking data on typical images [35, 37–45] or videos [46–50], and there are many UAV content datasets [7, 51–62], it turns out to be extremely difficult to find eye-tracking data on UAV content. This is even truer when we consider dynamic salience, which refers to salience for video content.…”
confidence: 99%
“…In a technical context, a pattern can involve repeating sequences of data over time, and patterns can be utilized to predict trends and specific featural configurations in images to recognize objects. Many recognition approaches have been developed involving the support vector machine (SVM) (Junoh et al., 2012), the artificial neural network (ANN) (Petrosino & Maddalena, 2012), deep learning (Wang et al., 2019), and other rule-based classification systems. Performing classification with an ANN is a practical supervised strategy that has achieved satisfactory results in many classification tasks.…”
Section: Introduction
confidence: 99%
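As a concrete illustration of the supervised ANN classification strategy this excerpt names, here is a minimal sketch using scikit-learn's MLPClassifier on a toy digits dataset; the dataset, network size, and hyperparameters are illustrative assumptions, not details from the cited work:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy supervised classification task: 8x8 handwritten digit images.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A small multilayer perceptron; hyperparameters are illustrative only.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```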
“…Rather than simply remapping the color histogram or normalizing an image for nearby luminance, automatic mechanisms are thought to depend on factors such as co-orientation, co-linearity, co-circularity, co-planarity, junctions, feature grouping, and transparency issues, such as smoke and rain (Adelson, 2000; Anderson, 1997; Li, Song, Xu, Hu, Roe, & Li, 2019; Zemach & Rudd, 2007; Zucker, David, Dobbins, & Iverson, 1988). Biologically driven models are increasingly capable of explaining visual illusions (Blakeslee & McCourt, 2004; Li, 2011) and predicting gaze patterns based on saliency (Kümmerer, Wallis, Gatys, & Bethge, 2017; Wang, Shen, Xie, Cheng, Ling, & Borji, 2019), but they are data-limited to SDR images and require HDR experimentation to extend their generalizability to real-world vision.…”
Section: Introduction
confidence: 99%