2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00699

Dynamic Face Video Segmentation via Reinforcement Learning

Cited by 26 publications (17 citation statements)
References 37 publications
“…It is unnecessary to re-compute the whole current frame given that we have already obtained the features of previous frames: we can use the features in the previous frames to accelerate the analysis of the current frame. Such techniques have been studied in the field of video segmentation, and we have adopted the Deep Feature Flow [9] framework following [20], [16] to address temporal redundancy. Specifically, if a current frame is determined to be a non-key frame with low resolution, we use FlowNet [8] to obtain the optical flow between the current frame and the last key frame, and the computed optical flow is used to propagate features from the last key frame into the current one, such that the performance drop caused by low-resolution frames can be compensated for.…”
Section: B. Dynamics Analysis (mentioning)
confidence: 99%
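
The quoted pipeline is the Deep Feature Flow idea: rather than re-running the backbone on a non-key frame, features computed at the last key frame are warped to the current frame with FlowNet-style optical flow. Below is a minimal PyTorch sketch of that propagation step; the tensor shapes and the assumption that the flow has already been resized and rescaled to the feature resolution are illustrative, not taken from either paper.

    import torch
    import torch.nn.functional as F

    def propagate_features(key_feats, flow):
        # key_feats: (N, C, H, W) features from the last key frame.
        # flow: (N, 2, H, W) flow from the current frame back to the
        # key frame, in pixels, already resized to feature resolution.
        n, _, h, w = key_feats.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, dtype=torch.float32),
            torch.arange(w, dtype=torch.float32),
            indexing="ij",
        )
        base = torch.stack((xs, ys)).unsqueeze(0).expand(n, -1, -1, -1)
        # Each current-frame location samples the key frame at its
        # flow-displaced position.
        coords = base + flow
        # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)  # (N, H, W, 2)
        return F.grid_sample(key_feats, grid, align_corners=True)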
“…[algorithm excerpt: if a_t = a_1 then … (key action); Y_t ← {b_t, c_t, s_t, d_t} (put together current predictions); Output: energy-efficient video analytics results {Y_0, Y_1, ..., Y_N}] The power in the P_sensor,idle state is 141.8 mW, and that in P_sensor,active is 8.27 mW/MP · R + 17.364 mW + 113.03 mW. We use a T_exp of 20 ms in the following study.…”
Section: B. Experimental Settings, 1) Evaluation Platform (mentioning)
confidence: 99%
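
The quoted sensor power model is straightforward to evaluate: idle power is a constant, while active power has one term that scales with the frame resolution R (in megapixels) plus two fixed terms. A small sketch under that reading follows; the per-megapixel interpretation of "8.27 mW/MP · R" is my assumption.

    def sensor_power_mw(resolution_mp: float, active: bool) -> float:
        # Idle sensor power is a constant 141.8 mW.
        if not active:
            return 141.8
        # Active power: resolution-dependent term plus two fixed terms.
        return 8.27 * resolution_mp + 17.364 + 113.03

    print(sensor_power_mw(2.0, active=True))   # 146.934 mW at 2 MP
    print(sensor_power_mw(2.0, active=False))  # 141.8 mW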
“…This is inspired by [3], a work showing that statistical eye information such as pupil size can help to improve emotion recognition accuracy. Denoting the pupil size information as ps_t ∈ R^2, we treat ps_t as expert information and, following [60, 68], concatenate this expert information ps_t with f_t^e, which can be written as…”
Section: Extracting Eye Features (mentioning)
confidence: 99%
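
The expert-information fusion described in the quote is a single concatenation; a hypothetical PyTorch fragment, where the 128-D size of f_t^e is an assumption rather than a value from the paper:

    import torch

    f_e = torch.randn(1, 128)  # eye features f_t^e (dimensionality assumed)
    ps_t = torch.randn(1, 2)   # pupil size information ps_t in R^2
    fused = torch.cat([f_e, ps_t], dim=1)  # expert info appended -> (1, 130)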
“…A subsequent fully-connected layer FC0 gradually reduces the channels to two, representing the probabilities of the binary action. In addition, inspired by the work of Wang et al. [46], we append the likelihood L of historical eye movement types to the input tensor of the FC0 layer to supervise the learning during training and help with the inference during testing.…”
Section: Pipeline of the TVA Network (mentioning)
confidence: 99%
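
A sketch of such a head, with the historical eye-movement likelihood L appended to the FC0 input as the quote describes; the feature dimension, hidden size, and number of movement types are placeholders, not values from the paper.

    import torch
    import torch.nn as nn

    class BinaryActionHead(nn.Module):
        def __init__(self, feat_dim=256, num_types=3):
            super().__init__()
            # FC0 gradually reduces channels to two action logits.
            self.fc0 = nn.Sequential(
                nn.Linear(feat_dim + num_types, 64),
                nn.ReLU(),
                nn.Linear(64, 2),
            )

        def forward(self, feats, likelihood):
            # Append the likelihood L of historical eye-movement types.
            x = torch.cat([feats, likelihood], dim=1)
            # Two channels: probabilities of the binary action.
            return torch.softmax(self.fc0(x), dim=1)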