2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00735

VRSTC: Occlusion-Free Video Person Re-Identification

Abstract: Video person re-identification (re-ID) plays an important role in surveillance video analysis. However, the performance of video re-ID degrades severely under partial occlusion. In this paper, we propose a novel network, called the Spatio-Temporal Completion network (STCnet), to explicitly handle the partial occlusion problem. Unlike most previous works, which discard the occluded frames, STCnet can recover the appearance of the occluded parts. For one thing, the spatial structure of a pedestrian frame can be…

Cited by 210 publications (123 citation statements)
References 38 publications (64 reference statements)
“…They divided the pedestrian region into five areas based on the empirical ratio [7], and estimated the occlusion status of each area using the micro neural network built into the proposed unit. Hou et al. [13] observed that occlusion usually occurs in successive frames, and that occluders have semantic features different from those of the original body parts. They therefore computed the cosine similarity between each frame region feature and the corresponding video region feature as a criterion score; a region is considered occluded when its score falls below a prescribed threshold.…”
Section: Related Work
confidence: 99%
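The occlusion criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature shapes, the similarity computation per body region, and the threshold value of 0.5 are all assumptions made here for clarity.

```python
import numpy as np

def occlusion_mask(frame_feats, video_feats, threshold=0.5):
    """Flag occluded body regions of one frame.

    frame_feats: (R, D) array, one feature vector per body region of the frame.
    video_feats: (R, D) array, video-level feature vector per region.
    threshold: hypothetical cutoff; the paper's setting may differ.
    Returns a boolean array: True where the region is judged occluded.
    """
    # cosine similarity per region (row-wise), with a small epsilon for stability
    num = np.sum(frame_feats * video_feats, axis=1)
    denom = (np.linalg.norm(frame_feats, axis=1) *
             np.linalg.norm(video_feats, axis=1) + 1e-8)
    sims = num / denom
    # a region counts as occluded when its criterion score falls below the threshold
    return sims < threshold
```

A region whose frame feature points in the same direction as the video-level feature scores near 1 and is kept; a region dominated by an occluder scores low and is flagged.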
“…Before feature extraction, we resize each video frame to 128 × 171 pixels. Limited to the 4 GB RAM of the NVIDIA GTX 1050Ti GPU, the …

Method                           Params    Accuracy
[29]                             139.59M   63.3%
DenseNet-121 [15]                  6.95M   64.9%
Zhang et al. [45]                  0.04M   44.5%
Hou et al. [13]                   23.51M   38.0%
R(2+1)D [33]                      33.18M   57.4%
C3D (single output) [32]         107.36M   74.8%
C3D (multiple output) [32]       107.36M   77.7%
Our Method (without correction)   59.64M   82.1%
Our Method (with correction)      59.64M   84.0%…”
Section: Experiments 5.1 Implementation Details
confidence: 99%
“…For small-scale person re-identification, there are relatively few deep learning methods, most of which focus on how to expand the training data [36][37][38][39]. Chen et al [40] proposed a cross-domain architecture that could use an auxiliary training set.…”
Section: Related Work
confidence: 99%
“…This leads to a first class of problems since, as is well known, visual features have many weaknesses, including illumination changes, shadows, direction of light, and many others. Another class of problems concerns background clutter [28,29] and occlusions [30,31], which, in uncontrolled environments, tend to lower system performance in terms of accuracy. A final class of problems, very important from a practical point of view, concerns long-term re-identification and camouflage [32,33].…”
Section: Introduction
confidence: 99%