2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)
DOI: 10.1109/cvpr.2018.00559

Gaze Prediction in Dynamic 360° Immersive Videos

Cited by 218 publications (127 citation statements) · References 24 publications
“…However, these approaches in [4], [5] use naive models and ignore the relation between video content and future movement, and are thus less accurate. Other existing works, such as [6]-[10], combine video content features with the HMD orientation to predict future head movement. In [6], the authors use a pre-trained saliency model to predict head movement.…”
Section: Related Work
Confidence: 99%
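To make the saliency-only baseline concrete, the sketch below takes the most salient point of an equirectangular frame as the predicted gaze direction. This is a minimal illustration, not the method of [6]: the random map stands in for the output of a pre-trained saliency model, and the function name and map resolution are assumptions.

```python
import numpy as np

def predict_gaze_from_saliency(saliency_map: np.ndarray) -> tuple:
    """Pick the most salient pixel of an equirectangular frame as the
    predicted gaze target (a content-only baseline; real systems also
    smooth over time and fuse head-orientation history)."""
    h, w = saliency_map.shape
    row, col = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
    # Convert pixel coordinates to spherical angles for a 360-degree frame.
    lon = (col / w) * 360.0 - 180.0   # longitude in [-180, 180)
    lat = 90.0 - (row / h) * 180.0    # latitude in [-90, 90]
    return lon, lat

# Placeholder saliency map; in practice this would come from a
# pre-trained saliency model.
saliency = np.random.rand(90, 180)
print(predict_gaze_from_saliency(saliency))
```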
“…Although the works in [6]-[10] use both video saliency and the history of head orientation to predict future head movement, they do not treat the video frame content in much detail, even though VR videos contain a wide variety of scenes and each scene has different regions of interest for users.…”
Section: Related Work
Confidence: 99%
“…Compared with our setting, their prediction horizon is very short: only 100∼500 ms. [3] proposed a fixation prediction network that jointly leverages past FoV locations and video content features to predict the FoV trajectory, or tile-based viewing probability maps, for the next n frames. In [4], an LSTM encodes the history of the FoV scan path, and its hidden-state features are combined with visual features to predict up to 1 second ahead. A more recent work [5] proposed two deep reinforcement learning models: an offline model first estimates a heatmap of the potential FoV for each frame from visual features alone, and an online model then predicts head movement from the past observed head locations together with the offline model's heatmaps.…”
Section: Related Work
Confidence: 99%
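The design described for [4] can be sketched as follows. This is a minimal, untrained PyTorch illustration: the layer sizes, feature dimensions, and class name are assumptions for the example, not values taken from the paper.

```python
import torch
import torch.nn as nn

class ScanPathPredictor(nn.Module):
    """Sketch of the scheme described for [4]: an LSTM encodes the past
    FoV scan path, its last hidden state is fused with visual features
    of the frame, and an MLP regresses the future FoV center."""

    def __init__(self, vis_dim=512, hidden_dim=128):
        super().__init__()
        # Each scan-path step is a 2-D FoV center (longitude, latitude).
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_dim,
                            batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + vis_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),   # predicted future (longitude, latitude)
        )

    def forward(self, scan_path, vis_feat):
        # scan_path: (B, T, 2) past FoV centers; vis_feat: (B, vis_dim)
        _, (h_n, _) = self.lstm(scan_path)
        fused = torch.cat([h_n[-1], vis_feat], dim=-1)
        return self.head(fused)

# Usage: 30 past frames (roughly 1 s at 30 fps) plus CNN frame features.
model = ScanPathPredictor()
pred = model(torch.randn(4, 30, 2), torch.randn(4, 512))
print(pred.shape)  # torch.Size([4, 2])
```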
“…In related works such as [2], [4], future FoVs are predicted by unrolling a single trained LSTM model. However, such a single-LSTM model is appropriate only if the input data types and data distribution in the past are the same as those in the future.…”
Section: Prediction Based on User's Own Past
Confidence: 99%
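The following minimal PyTorch sketch shows what such unrolling looks like: the LSTM is warmed up on the observed scan path, then each predicted FoV is fed back as the next input. All sizes and names are illustrative assumptions; the point is that predicted inputs replace observed ones, which is exactly where a mismatch between past and future data distributions hurts.

```python
import torch
import torch.nn as nn

def unroll_fov_lstm(lstm: nn.LSTM, proj: nn.Linear,
                    past_fovs: torch.Tensor, n_future: int) -> torch.Tensor:
    """Autoregressive unrolling of a single trained LSTM, as in [2], [4]:
    encode the observed FoV history, then repeatedly feed each prediction
    back in as the next input."""
    out, state = lstm(past_fovs)        # warm up on observed history
    x = proj(out[:, -1:, :])            # first predicted FoV step
    preds = [x]
    for _ in range(n_future - 1):
        out, state = lstm(x, state)     # the prediction re-enters as input
        x = proj(out)
        preds.append(x)
    return torch.cat(preds, dim=1)      # (B, n_future, 2)

# Usage with illustrative sizes: 2-D FoV centers, 64 hidden units,
# 30 observed steps, 10 predicted steps.
lstm = nn.LSTM(input_size=2, hidden_size=64, batch_first=True)
proj = nn.Linear(64, 2)
future = unroll_fov_lstm(lstm, proj, torch.randn(4, 30, 2), n_future=10)
print(future.shape)  # torch.Size([4, 10, 2])
```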