In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. At inference time, the network's outputs are used by a real-time, robust pose estimation algorithm for fine orientation estimation and 3D vehicle localization. We show in experiments that our method outperforms state-of-the-art monocular approaches on vehicle detection, orientation and 3D localization tasks on the very challenging KITTI benchmark.
In this paper, we present a method for computing the localization of a mobile robot with reference to a learning video sequence. The robot is first guided along a path by a human while the camera records a monocular learning sequence. A 3D reconstruction of the path and the environment is then computed offline from the learning sequence. This 3D reconstruction is subsequently used to compute the pose of the robot in real time (30 Hz) during autonomous navigation. Results from our localization method are compared to ground truth measured with a differential GPS.
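The core of such a localization step is recovering the camera pose from correspondences between reconstructed 3D map points and their 2D projections in the live image. As a minimal sketch (not the paper's actual algorithm), assuming a pinhole camera and a known rotation so that only the translation remains to be solved, the pose can be recovered linearly from the cross-product constraint p × K(RX + t) = 0; the function name `estimate_translation` is hypothetical:

```python
import numpy as np

def estimate_translation(K, R, pts3d, pts2d):
    """Recover camera translation t from a known rotation R, pinhole
    intrinsics K, 3D map points, and their observed 2D projections.
    Each correspondence gives skew(p) @ K @ (R X + t) = 0, linear in t."""
    A_rows, b_rows = [], []
    for X, p in zip(pts3d, pts2d):
        ph = np.array([p[0], p[1], 1.0])
        # skew-symmetric matrix: S @ v == cross(ph, v)
        S = np.array([[0.0, -ph[2], ph[1]],
                      [ph[2], 0.0, -ph[0]],
                      [-ph[1], ph[0], 0.0]])
        A_rows.append(S @ K)
        b_rows.append(-S @ K @ (R @ X))
    A = np.vstack(A_rows)
    b = np.concatenate(b_rows)
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t
```

A full system would estimate the complete 6-DoF pose (e.g. with a PnP solver inside a robust RANSAC loop); this sketch only illustrates the geometric constraint linking map points to image observations.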
In this paper, we tackle the problem of domain adaptation for object classification and detection tasks in video surveillance, starting from a generic pre-trained detector. Specifically, we put forward a new transductive transfer learning framework based on a sequential Monte Carlo filter to specialize a generic classifier towards a specific scene. The proposed algorithm iteratively approximates the target distribution as a set of samples (selected from both source and target domains) that feed the learning step of a specialized classifier. The output classifier is applied to pedestrian detection in a traffic scene. Extensive experiments on the CUHK Square Dataset and the MIT Traffic Dataset demonstrate that the specialized classifier outperforms the generic classifier and that the suggested algorithm yields encouraging results.
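The propose/weight/resample structure of such a specialization loop can be illustrated with a deliberately simplified toy (not the paper's actual filter): train a classifier on the current sample set, score the unlabeled target samples, and keep only the most confident ones as pseudo-labeled samples for the next round. The function name `specialize` and the least-squares linear classifier are assumptions made for brevity:

```python
import numpy as np

def specialize(source_X, source_y, target_X, iters=5):
    """Toy transductive specialization loop (hypothetical simplification
    of a sequential Monte Carlo specialization filter):
    1) fit a linear classifier on the current sample set,
    2) score the unlabeled target samples,
    3) retain the most confident target samples, pseudo-labeled,
       as the sample set for the next iteration."""
    X, y = source_X.copy(), source_y.copy()
    w = None
    for _ in range(iters):
        # least-squares linear classifier on +/-1 targets
        Xb = np.hstack([X, np.ones((len(X), 1))])
        w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
        scores = np.hstack([target_X, np.ones((len(target_X), 1))]) @ w
        conf = np.abs(scores)
        keep = conf >= np.median(conf)  # "resample" the confident samples
        pseudo_y = (scores[keep] > 0).astype(float)
        X = np.vstack([source_X, target_X[keep]])
        y = np.concatenate([source_y, pseudo_y])
    return w
```

A real instantiation would use a proper detector (e.g. a boosted cascade or CNN) and a principled importance-weighting scheme; the sketch only shows how target samples progressively enter the training set.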
Due to their ability to learn complex behaviors in high-dimensional state-action spaces, deep reinforcement learning algorithms have attracted much interest in the robotics community. For a practical reinforcement learning implementation, reward signals have to be informative, in the sense that they must discriminate between close states, and they must not be too noisy. To address the first issue, prior information, e.g. in the form of a geometric model, or human supervision is often assumed. This paper proposes a method to learn binocular fixations without such prior information. Instead, it uses an informative reward requiring little supervised information. The reward computation is based on an anomaly detection mechanism which uses convolutional autoencoders. These detectors estimate, in a weakly supervised way, an object's pixel position. This position estimate is affected by noise, which makes the reward signal noisy. We first show that this affects both the learning speed and the resulting policy. Then, we propose a method to partially remove the noise using regression on the detection change given sensor data. The binocular fixation task is learned in a simulated environment on an object training set with various shapes and colors. The learned policy is compared with another one learned with a highly informative and noiseless reward signal. The tests are carried out on the training set and on a test set of new objects. We observe similar performances, showing that the environment-encoding step can replace the prior information.
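The idea of turning anomaly detection into a position estimate can be sketched as follows: regions the model fails to reconstruct are assumed to belong to the object, so the peak of the reconstruction-error map gives the object's pixel position. In this minimal sketch, a stored background mean stands in for the trained convolutional autoencoder, and the function name `object_position` is an assumption:

```python
import numpy as np

def object_position(frame, background_model):
    """Estimate an object's pixel position as the peak of the
    reconstruction-error map. What the model cannot reconstruct
    (the anomaly) is assumed to be the object. A background-mean
    model stands in here for a trained convolutional autoencoder."""
    error = (frame - background_model) ** 2
    idx = np.unravel_index(np.argmax(error), error.shape)
    return idx  # (row, col) of the strongest anomaly
```

Because the error map is noisy in practice, the raw argmax fluctuates between frames, which is precisely the reward noise the abstract proposes to attenuate by regressing the detection change on sensor data.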
Pedestrian safety is a primary traffic issue in urban environments. This article deals with the detection of pedestrians by means of a laser sensor. This sensor, mounted on the front of a vehicle, collects range measurements distributed across 4 laser planes. Like a vehicle, a pedestrian constitutes an obstacle in the vehicle's environment that must be detected, located, then identified and tracked if necessary. In order to improve the robustness of pedestrian detection using a single laser sensor, we propose here a detection system based on the fusion of information located in the 4 laser planes. In this paper, we propose a Parzen kernel method that first isolates the "pedestrian objects" in each plane and then carries out a decentralized fusion across the 4 laser planes. Finally, to improve our pedestrian detection algorithm, we use an MCMC-based particle filter method allowing a closer observation of pedestrians' random movement dynamics. Many experimental results validate and show the relevance of our pedestrian detection algorithm compared with a method using only a single-row laser-range scanner.
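The per-plane Parzen estimate and its decentralized fusion can be sketched as follows: each plane's scan points yield a Gaussian kernel density estimate, and fusion averages the per-plane densities so that a candidate supported by several planes scores higher. The function names, the kernel bandwidth, and the averaging rule are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def parzen_density(points, query, h=0.3):
    """Gaussian Parzen (kernel density) estimate at the query locations:
    p(x) = (1/n) * sum_i K_h(x - x_i), from one plane's 2D scan points."""
    diffs = query[None, :, :] - points[:, None, :]        # (n, m, 2)
    k = np.exp(-np.sum(diffs ** 2, axis=2) / (2.0 * h ** 2))
    return k.mean(axis=0) / (2.0 * np.pi * h ** 2)

def fuse_planes(planes, query, h=0.3):
    """Decentralized fusion: average the densities estimated
    independently in each of the laser planes."""
    return np.mean([parzen_density(p, query, h) for p in planes], axis=0)
```

In a full pipeline, the fused density map would be thresholded to extract "pedestrian object" candidates, which then seed the MCMC-based particle filter for tracking.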