Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator for Static Video Surveillance (2018)
DOI: 10.1007/s11263-018-1077-3

Cited by 33 publications (14 citation statements: 0 supporting, 14 mentioning, 0 contrasting)
References: 63 publications
“…The PETS 2006 dataset includes multi-view camera sequences containing left-luggage scenarios at a train station in which the scene complexity increases. To evaluate pedestrian detection performance, we used only a single viewpoint so that the evaluation would be performed under the same conditions as those for the comparison algorithms [36].…”
Section: Experimental Results
Citation type: mentioning, confidence: 99%
“…To verify the effectiveness of the soft target training scheme, we compared its performance with that of six state-of-the-art methods: (1) DPM [8]; (2) the Faster R-CNN approach, which shares full-image convolutional features with the detection network [12]; (3) the scene pose CNN network (SPN), which generates a scene-specific pedestrian detector and pose estimator [36]; (4) YOLO9000, which is a real-time CNN-based object detection system covering over 9000 object categories [14]; (5) a teacher RF consisting of 300 trees (teacher RF); and (6) the proposed S-RF consisting of 50 trees (proposed S-RF). Faster R-CNN and YOLO9000 used pretrained model parameters without fine-tuning.…”
Section: B. Detection Comparison on PETS 2006 Dataset
Citation type: mentioning, confidence: 99%
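The "soft target training scheme" referenced in this statement compresses a large teacher random forest into a compact student forest by training the student on the teacher's class-probability outputs rather than hard labels. A minimal sketch of that general distillation idea, assuming scikit-learn and synthetic data (neither is from the cited paper), might look like this:

```python
# Minimal sketch of the soft-target (teacher/student) training idea named
# above: a small "student" forest is fit to the class probabilities of a
# large "teacher" forest instead of hard labels. The synthetic data, the
# scikit-learn models, and the regressor-on-probabilities formulation are
# all illustrative assumptions, not the cited paper's actual pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Toy stand-in for pedestrian / non-pedestrian feature vectors.
X, y = make_classification(n_samples=2000, n_features=32, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Teacher: a large forest (300 trees, mirroring the comparison above).
teacher = RandomForestClassifier(n_estimators=300, random_state=0)
teacher.fit(X_train, y_train)

# Soft targets: the teacher's per-class probabilities, which carry more
# information about ambiguous samples than 0/1 labels do.
soft_targets = teacher.predict_proba(X_train)

# Student: a much smaller forest (50 trees) regressed onto the soft targets.
student = RandomForestRegressor(n_estimators=50, random_state=0)
student.fit(X_train, soft_targets)

# The student outputs probability vectors; argmax recovers hard labels.
student_labels = student.predict(X_test).argmax(axis=1)
print("student accuracy:", (student_labels == y_test).mean())
```

The 300-tree/50-tree sizes mirror the teacher RF and S-RF quoted above; every other detail is hypothetical.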
“…From there, the use of synthetic visual data generated from virtual environments has kept growing. We found works using synthetic data for object detection/recognition [66][67][68][69], object viewpoint recognition [70], re-identification [71], and human pose estimation [72]; building synthetic cities for autonomous driving tasks such as semantic segmentation [44,73], place recognition [74], object tracking [45,75], object detection [76,77], stixel computation [78], and benchmarking different on-board computer vision tasks [47]; building indoor scenes for semantic segmentation [79], as well as normal and depth estimation [80]; generating GT for optical flow, scene flow, and disparity [81,82]; generating augmented-reality images to support object detection [83]; simulating adverse atmospheric conditions such as rain or fog [84,85]; and even performing procedural generation of videos for human action recognition [86,87]. Moreover, since robotics and autonomous driving rely on sensorimotor models that must be trained and tested dynamically, in recent years the use of simulators has intensified beyond datasets [48,49,88,89].…”
Section: Related Work
Citation type: mentioning, confidence: 99%
“…Dhome et al. used synthetic models to recognize objects from a single image [11]. For pedestrian detection, computer-generated pedestrian images were used to train classifiers [5]. 3D simulation has been used for multi-view car detection [1] [31] [6].…”
Section: Related Work
Citation type: mentioning, confidence: 99%