2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00735

VRSTC: Occlusion-Free Video Person Re-Identification

Abstract: Video person re-identification (re-ID) plays an important role in surveillance video analysis. However, the performance of video re-ID degrades severely under partial occlusion. In this paper, we propose a novel network, called the Spatio-Temporal Completion network (STCnet), to explicitly handle the partial occlusion problem. Unlike most previous works, which discard the occluded frames, STCnet can recover the appearance of the occluded parts. For one thing, the spatial structure of a pedestrian frame can be…

Cited by 210 publications (123 citation statements)
References 38 publications (64 reference statements)
“…They divided the pedestrian region into five areas based on the empirical ratio [7], and estimated the occlusion status of each area using the micro neural network built into the proposed unit. Hou et al. [13] observed that occlusion usually occurs in successive frames, and that occluders have semantic features different from those of the original body parts. They therefore computed the cosine similarity between each frame region feature and the corresponding video region feature as a criterion score; a region is considered occluded when its score falls below a prescribed threshold.…”
Section: Related Work
confidence: 99%
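The occlusion criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature shapes, the similarity computation per body region, and the threshold value of 0.5 are all assumptions made here for clarity.

```python
import numpy as np

def occlusion_mask(frame_feats, video_feats, threshold=0.5):
    """Flag occluded body regions of one frame.

    frame_feats: (R, D) array, one feature vector per body region of the frame.
    video_feats: (R, D) array, video-level feature vector per region.
    threshold: hypothetical cutoff; the paper's setting may differ.
    Returns a boolean array: True where the region is judged occluded.
    """
    # cosine similarity per region (row-wise), with a small epsilon for stability
    num = np.sum(frame_feats * video_feats, axis=1)
    denom = (np.linalg.norm(frame_feats, axis=1) *
             np.linalg.norm(video_feats, axis=1) + 1e-8)
    sims = num / denom
    # a region counts as occluded when its criterion score falls below the threshold
    return sims < threshold
```

A region whose frame feature points in the same direction as the video-level feature scores near 1 and is kept; a region dominated by an occluder scores low and is flagged.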
“…Before feature extraction, we resize each video frame to 128 × 171 pixels. Limited to the 4 GB RAM of the NVIDIA GTX 1050Ti GPU, the …

Method                           Params    Accuracy
[29]                             139.59M   63.3%
DenseNet-121 [15]                  6.95M   64.9%
Zhang et al. [45]                  0.04M   44.5%
Hou et al. [13]                   23.51M   38.0%
R(2+1)D [33]                      33.18M   57.4%
C3D (single output) [32]         107.36M   74.8%
C3D (multiple output) [32]       107.36M   77.7%
Our Method (without correction)   59.64M   82.1%
Our Method (with correction)      59.64M   84.0%…”
Section: Experiments 5.1 Implementation Details
confidence: 99%
“…For small-scale person re-identification, there are relatively few deep learning methods, most of which focus on how to expand the training data [36][37][38][39]. Chen et al [40] proposed a cross-domain architecture that could use an auxiliary training set.…”
Section: Related Work
confidence: 99%
“…This leads to a first class of problems since, as is well known, visual features have many weaknesses, including illumination changes, shadows, direction of light, and many others. Another class of problems concerns background clutter [28,29] and occlusions [30,31], which, in uncontrolled environments, tend to lower system performance in terms of accuracy. A final class of problems, very important from a practical point of view, concerns long-term re-identification and camouflage [32,33].…”
Section: Introduction
confidence: 99%