Exploiting fusion architectures for multispectral pedestrian detection and segmentation

Guan, Dayan; Cao, Yanan; Yang, Jiangxin; Tisse, Christel-Loïc

doi:10.1364/ao.57.00d108

Cited by 26 publications

(19 citation statements)

References 9 publications

“…3 Our method achieves significantly higher detection accuracy compared with the state-of-the-art multispectral pedestrian detectors [27,24,16,15,31]. Moreover, this efficient framework can process more than 30 images per second on a single NVIDIA Geforce Titan X GPU to facilitate real-time applications in autonomous vehicles.…”

Section: Introduction (mentioning)

confidence: 92%

“…However, the use of anchor boxes will cause severe imbalance between positive and negative training samples [37] and involve complex hyperparameter settings (e.g., box size, aspect ratio, stride, and intersection-over-union threshold) [29]. Our method is very different from the existing anchor box based multispectral pedestrian detectors [27,24,32,16,15,31] in two major aspects. Firstly, we make use of the ground truth bounding boxes (manually annotated) to generate coarse box-level segmentation masks, which are utilized to replace the anchor bounding boxes for the training of two-stream deep neural networks to learn human-relative characteristic features.…”

Section: Related Work (mentioning)

confidence: 98%

“…Recently, researchers explore illumination information of a scene and proposed illumination-aware weighting mechanism to boost multispectral pedestrian detection performances [16,32]. Guan et al [15] presented a unified multispectral fusion framework for joint training of semantic segmentation and target detection. More accurate detection results were obtained by infusing the multispectral semantic segmentation masks as supervision for learning human-related features.…”

Section: Related Work (mentioning)

confidence: 99%

Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection

Cao, Guan, et al. 2019

ISPRS Journal of Photogrammetry and Remote Sensing

Self Cite

Effective fusion of complementary information captured by multi-modal sensors (visible and infrared cameras) enables robust pedestrian detection under various surveillance situations (e.g. daytime and nighttime). In this paper, we present a novel box-level segmentation supervised learning framework for accurate and real-time multispectral pedestrian detection by incorporating features extracted in visible and infrared channels. Specifically, our method takes pairs of aligned visible and infrared images with easily obtained bounding box annotations as input and estimates accurate prediction maps to highlight the existence of pedestrians. It offers two major advantages over the existing anchor box based multispectral detection methods. Firstly, it overcomes the hyperparameter setting problem occurred during the training phase of anchor box based detectors and can obtain more accurate detection results, especially for small and occluded pedestrian instances. Secondly, it is capable of generating accurate detection results using small-size input images, leading to improvement of computational efficiency for real-time autonomous driving applications. Experimental results on KAIST multispectral dataset show that our proposed method outperforms state-of-the-art approaches in terms of both accuracy and speed.
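The citing paper describes turning manually annotated ground-truth bounding boxes into coarse box-level segmentation masks that supervise training in place of anchor boxes. A minimal sketch of that rasterization step is shown here for orientation only. It is not the authors' implementation: the function name `boxes_to_mask`, the `(x, y, w, h)` pixel box format, and the image size in the usage line are all assumptions made for illustration.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterize bounding boxes into a binary box-level mask.

    Assumption: each box is (x, y, w, h) in pixels, with (x, y) the
    top-left corner. This format is hypothetical, not taken from the paper.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x, y, w, h in boxes:
        # Clip to the image bounds so partially visible boxes stay valid.
        x0, y0 = max(0, int(x)), max(0, int(y))
        x1, y1 = min(width, int(x + w)), min(height, int(y + h))
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 1  # every pixel inside the box is "pedestrian"
    return mask


# Hypothetical usage: one 3x4 box on a 10x8 image.
mask = boxes_to_mask([(2, 1, 3, 4)], height=8, width=10)
```

The mask is deliberately coarse: every pixel inside a box is labelled as pedestrian, so no pixel-accurate annotation is needed beyond the boxes themselves.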

“…Researchers also paid attention to the main difference between visible and infrared images, and proposed illumination-aware weighting mechanism to give extra information to detectors [10,17]. Guan et al [9] presented a unified multispectral fusion framework, which infuses the multispectral semantic segmentation masks as supervision for learning human-related features, getting more accurate detection results. Li et al [16] designed a cascaded multispectral classification network to distinguish hard negatives sample from pedestrian and human-like instances.…”

Section: Related Work (mentioning)

confidence: 99%

“…Inspired by the multi-task framework for joint training of multispectral pedestrian detection and semantic segmentation [9], we combine the visible and thermal pedestrian detection supervision module with the box-level segmentation supervised deep neural networks [3] to build multispectral pedestrian detector, as illustrated in Fig. 3.…”

Section: Multispectral Pedestrian Detector (mentioning)

confidence: 99%

Unsupervised Domain Adaptation for Multispectral Pedestrian Detection

Guan, Luo, Cao, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Self Cite

Multimodal information (e.g., visible and thermal) can generate robust pedestrian detections to facilitate around-the-clock computer vision applications, such as autonomous driving and video surveillance. However, it still remains a crucial challenge to train a reliable detector working well in different multispectral pedestrian datasets without manual annotations. In this paper, we propose a novel unsupervised domain adaptation framework for multispectral pedestrian detection, by iteratively generating pseudo annotations and updating the parameters of our designed multispectral pedestrian detector on target domain. Pseudo annotations are generated using the detector trained on source domain, and then updated by fixing the parameters of detector and minimizing the cross entropy loss without back-propagation. Training labels are generated using the pseudo annotations by considering the characteristics of similarity and complementarity between well-aligned visible and infrared image pairs. The parameters of detector are updated using the generated labels by minimizing our defined multi-detection loss function with back-propagation. The optimal parameters of detector can be obtained after iteratively updating the pseudo annotations and parameters. Experimental results show that our proposed unsupervised multimodal domain adaptation method achieves significantly higher detection performance than the approach without domain adaptation, and is competitive with the supervised multispectral pedestrian detectors.

Deep Visible and Thermal Image Fusion with Cross-Modality Feature Selection for Pedestrian Detection

Shao, Shi, et al. 2021

Lecture Notes in Computer Science
