Multi-glimpse LSTM with color-depth feature fusion for human detection

Li, Hengduo; Liu, Jun; Zhang, Guyue; Gao, Yuan; Wu, Yirui

doi:10.1109/icip.2017.8296412

Cited by 12 publications (10 citation statements)

References 20 publications (26 reference statements)


“…However, the approach introduced in this work does not hinge on scene-specific a priori knowledge and provides an approximation to the full posterior distribution. In contrast to recent data-driven CNN architectures [22], [28]-[30], [33], our method requires no training data and the detection confidence can be quantified more precisely by approximating the posterior distribution. To the best of our knowledge, variational mean-field inference in combination with a generative scene model has not yet been applied to the problem of people detection in overlapping depth images.…”

Section: Discussion (mentioning)

confidence: 99%

“…In contrast to our proposed method, those approaches focus on integrated systems counting the number of persons crossing a certain virtual line, providing people detection only implicitly and in a rather small area. Recent CNN architectures [28]-[30] are successfully applied to single-view depth image people detection, leveraging many labeled images for training. Since in our top-view setup position changes of people lead to drastically varying appearances (compared to the classical frontal or profile view), those approaches need to be re-trained with a domain-specific large-scale data set.…”

Section: B Depth-based Approaches (mentioning)

confidence: 99%


Joint Probabilistic People Detection in Overlapping Depth Images

2020


Privacy-preserving high-quality people detection is a vital computer vision task for various indoor scenarios, e.g., people counting, customer behavior analysis, ambient assisted living or smart homes. In this work a novel approach for people detection in multiple overlapping depth images is proposed. We present a probabilistic framework utilizing a generative scene model to jointly exploit the multi-view image evidence, allowing us to detect people from arbitrary viewpoints. Our approach makes use of mean-field variational inference to not only estimate the maximum a posteriori (MAP) state but to also approximate the posterior probability distribution of people present in the scene. Evaluation shows state-of-the-art results on a novel data set for indoor people detection and tracking in depth images from the top-view with high perspective distortions. Furthermore, it can be demonstrated that our approach (compared to the mono-view setup) successfully exploits the multi-view image evidence and robustly converges in only a few iterations.

INDEX TERMS: Depth sensor indoor surveillance, depth sensor networks, generative scene model, joint multi-view person detection, mean-field variational inference, multi-camera person detection, people detection in top-view, vertical top-view pedestrian detection.


“…Only a few detectors have considered the use of both RGB and depth (RGB-D) images as inputs to their networks [4,5], which are more robust against illumination and texture variations. In [4], a ResNet detector was used to detect upper body parts in an operating room.…”

Section: Person Detection Using Deep Learning Approaches With RGB and… (mentioning)

confidence: 99%

“…In [5], a long short-term memory (LSTM) network was used to detect head-tops. The first layer employed the head-top detection technique presented in [18], where for each possible head-top pixel, a set of bounding boxes was generated from both RGB and depth images.…”

Section: Person Detection Using Deep Learning Approaches With RGB and… (mentioning)

confidence: 99%

“…Deep networks have the potential to be used in USAR to autonomously extract features directly from sensory data. While they have been applied to human body part detection in structured environments, such as operating rooms [4], office buildings [5], and outdoor urban settings [6][7][8][9], they have not been considered for cluttered USAR environments. In USAR, victim identification needs to take place in environments that are unknown, without any a priori information available regarding victim locations.…”

Section: Introduction (mentioning)

confidence: 99%


Using Deep Learning to Find Victims in Unknown Cluttered Urban Search and Rescue Environments

et al. 2020


Purpose of Review: We investigate the first use of deep networks for victim identification in Urban Search and Rescue (USAR). Moreover, we provide the first experimental comparison of single-stage and two-stage networks for body part detection, for cases of partial occlusions and varying illumination, on an RGB-D dataset obtained by a mobile robot navigating cluttered USAR-like environments.

Recent Findings: We considered the single-stage detectors Single Shot Multi-box Detector, You Only Look Once, and RetinaNet and the two-stage Feature Pyramid Network detector. Experimental results show that RetinaNet has the highest mean average precision (77.66%) and recall (86.98%) for detecting victims with body part occlusions in different lighting conditions.

Summary: End-to-end deep networks can be used for finding victims in USAR by autonomously extracting RGB-D image features from sensory data. We show that RetinaNet using RGB-D is robust to body part occlusions and low-lighting conditions and outperforms other detectors regardless of the image input type.
