2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9196716

Any Motion Detector: Learning Class-agnostic Scene Dynamics from a Sequence of LiDAR Point Clouds

Cited by 22 publications (14 citation statements). References 15 publications.
“…It is extended to a multi-task network in order to additionally predict semantic classes and relies on different input data. Filatov et al. [9] introduce a recurrent network architecture for the prediction of a velocity grid based on a sequence of lidar point clouds. First, the lidar data is processed with a voxel feature encoding layer [10] to obtain bird's eye view representations, which are aggregated with a convolutional recurrent network layer.…”
Section: Related Work (mentioning)
confidence: 99%
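A minimal sketch of the quoted pipeline, assuming PyTorch: the simplified `VoxelFeatureEncoder` and `ConvGRUCell` below are illustrative stand-ins for the voxel feature encoding layer of [10] and the convolutional recurrent aggregation, with placeholder dimensions, not the exact architecture of [9].

```python
import torch
import torch.nn as nn

class VoxelFeatureEncoder(nn.Module):
    """Simplified voxel feature encoding (VFE) in the spirit of [10]:
    a shared MLP over the points in each voxel followed by max-pooling,
    yielding one feature vector per occupied bird's-eye-view (BEV) cell."""
    def __init__(self, in_dim=4, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, voxels):  # voxels: (num_voxels, max_points, in_dim)
        point_feats = self.mlp(voxels)        # (V, P, C) per-point features
        return point_feats.max(dim=1).values  # (V, C) one feature per voxel

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell that aggregates a time-ordered sequence
    of BEV feature maps into a single hidden-state grid."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x, h):  # x, h: (B, C, H, W)
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)             # update and reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde
```

Running the cell over the sequence of BEV maps leaves the last hidden state as the aggregated scene representation.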
“…These feature maps are then processed in a feature pyramid and a flow network to predict the scene flow in a 2D grid. Compared to dynamic occupancy grid maps, the works in [9] and [11] focus on velocity estimation but do not model free space and occlusions. Wu et al. [13] introduce a spatio-temporal network architecture to predict a 2D grid encoding motion and semantics for each cell, based on 3D lidar point clouds.…”
Section: Related Work (mentioning)
confidence: 99%
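As a rough illustration of such grid-based flow prediction, assuming PyTorch: a small convolutional head that regresses one 2D displacement per BEV cell from backbone features. The `FlowHead` name and layer sizes are hypothetical, not taken from [9], [11], or [13].

```python
import torch.nn as nn

class FlowHead(nn.Module):
    """Regresses a 2D displacement vector (dx, dy) for every BEV grid cell."""
    def __init__(self, in_channels=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 1),  # 2 output channels: per-cell flow in x and y
        )

    def forward(self, feats):    # feats: (B, C, H, W) BEV feature map
        return self.head(feats)  # (B, 2, H, W) scene flow grid
```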
“…In this work, we further develop this approach and propose our method for scenarios with a moving ego-vehicle. In a concurrent approach, Filatov et al. [17] propose a novel architecture to estimate class-agnostic scene dynamics as a grid representation, using a sequence of lidar point clouds as input data, which is first processed in voxel feature encoding layers [18]. Afterwards, these features are aggregated in a convolutional recurrent network layer with ego-motion compensation, and the last hidden state is processed in a ResNet18-FPN backbone network to finally predict a segmentation and velocity grid.…”
Section: Related Work (mentioning)
confidence: 99%
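The final stage described in the quote, two predictions from one shared backbone output, might look roughly like the sketch below, assuming PyTorch; `DualHead` and its channel sizes are hypothetical, and the ResNet18-FPN backbone itself is omitted.

```python
import torch.nn as nn

class DualHead(nn.Module):
    """Parallel heads predicting a per-cell motion segmentation and a
    2D velocity grid from shared backbone features (channel sizes are
    illustrative placeholders)."""
    def __init__(self, in_channels=256, num_classes=2):
        super().__init__()
        self.seg_head = nn.Conv2d(in_channels, num_classes, 1)  # static/dynamic logits
        self.vel_head = nn.Conv2d(in_channels, 2, 1)            # (vx, vy) per cell

    def forward(self, feats):  # feats: (B, C, H, W) from the backbone, e.g. a FPN
        return self.seg_head(feats), self.vel_head(feats)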
“…input placement and recurrent state shifting are not sufficient on their own to achieve ego-motion compensation for our architecture. Shifting the recurrent states alone, as applied in [16], [17], is only applicable if all recurrent layers have the same grid cell size. We argue that this is a strong limitation for the insertion of recurrent layers into fully convolutional network architectures, as most of them use network layers with different scales, e.g.…”
Section: Dynamic Grid Mapping With Moving Ego-vehicle (mentioning)
confidence: 99%
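A sketch of the recurrent-state shifting being discussed, assuming PyTorch; `shift_hidden_state` is a hypothetical helper that resamples the hidden state under the translational part of the ego-motion, given in grid cells (a rotation could be folded into the same affine matrix). As the quote argues, a single shift like this only works when all recurrent layers share one grid cell size.

```python
import torch
import torch.nn.functional as F

def shift_hidden_state(h, dx_cells, dy_cells):
    """Translate a recurrent hidden-state grid h (B, C, H, W) by the
    ego-motion expressed in grid cells. Assumes the state shares the
    input's cell size; with multi-scale recurrent layers the same metric
    ego-motion would require a different cell offset per layer."""
    B, C, H, W = h.shape
    # Affine translation in the normalized [-1, 1] coordinates of grid_sample.
    theta = torch.tensor([[1.0, 0.0, 2.0 * dx_cells / W],
                          [0.0, 1.0, 2.0 * dy_cells / H]],
                         device=h.device).repeat(B, 1, 1)
    grid = F.affine_grid(theta, size=h.shape, align_corners=False)
    # Regions revealed by the shift are zero-filled (unknown state).
    return F.grid_sample(h, grid, align_corners=False, padding_mode='zeros')
```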
“…Another possibility to represent and estimate motion is based on the bird's eye view (BEV). In this way, a point cloud is discretized into grid cells, and motion information is described by encoding each cell with a 2D displacement vector indicating the cell's future position on the ground plane [8,17,39]. This compact representation successfully simplifies scene motion, since motion on the ground plane is the primary concern for autonomous driving, while motion in the vertical direction is less important.…”
Section: Introduction (mentioning)
confidence: 99%
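A minimal sketch of this BEV discretization, in Python with NumPy; the ranges and cell size are placeholder values, and a motion grid would simply add two channels holding the per-cell displacement.

```python
import numpy as np

def points_to_bev(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0), cell=0.2):
    """Discretize an (N, 3+) point cloud into a BEV occupancy grid on the
    ground plane; vertical structure is collapsed, matching the compact
    representation discussed above."""
    W = int((x_range[1] - x_range[0]) / cell)
    H = int((y_range[1] - y_range[0]) / cell)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    grid = np.zeros((H, W), dtype=np.float32)
    grid[iy[valid], ix[valid]] = 1.0  # mark occupied cells; an (H, W, 2) array
                                      # would hold the per-cell 2D displacement
    return grid
```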