2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00319
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

Abstract: In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, these 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms in many cases. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors [8, 9] and propose the generali…
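The core idea in the abstract — a convolution defined only on the occupied coordinates of a sparse 4D (x, y, z, t) tensor — can be illustrated with a minimal NumPy sketch. This is not the paper's MinkowskiEngine implementation; the function name, the dict-of-offsets weight layout, and the stride-1 output-on-input-coordinates convention are illustrative assumptions.

```python
import numpy as np
from itertools import product

def sparse_conv_4d(coords, feats, weights, kernel_size=3):
    """Toy generalized sparse convolution over 4D (x, y, z, t) coordinates.

    coords : (N, 4) int array of occupied coordinates
    feats  : (N, C_in) feature array, one row per occupied coordinate
    weights: dict mapping a 4D kernel-offset tuple -> (C_in, C_out) matrix
    The output lives on the same coordinate set as the input (stride 1),
    and only occupied neighbors contribute -- empty space is skipped.
    """
    # Hash map from coordinate to row index: the sparse-tensor lookup table.
    index = {tuple(c): i for i, c in enumerate(coords)}
    c_out = next(iter(weights.values())).shape[1]
    out = np.zeros((len(coords), c_out))
    r = kernel_size // 2
    for i, c in enumerate(coords):
        # Enumerate the 4D kernel neighborhood (kernel_size**4 offsets).
        for off in product(range(-r, r + 1), repeat=4):
            j = index.get(tuple(np.add(c, off)))
            if j is not None:  # accumulate only where the input is occupied
                out[i] += feats[j] @ weights[off]
    return out
```

A real implementation replaces the Python loops with hashed coordinate maps and batched GEMMs on GPU, but the sparsity pattern — iterating kernel offsets and skipping unoccupied sites — is the same.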

Cited by 1,337 publications (1,185 citation statements)
References 22 publications
“…These proposals are then refined and further annotated by humans. We use a pretrained Minkowski indoor semantic segmentation model [42] to predict per-voxel semantic labels ( Fig. 3.2).…”
Section: B Interactive Gibson Assetsmentioning
confidence: 99%
“…Fast-and-Furious [15] proposed to view the vertical dimension as feature channels and apply 3D convolutions on the remaining three dimensions. MinkowskiNet [4] explicitly used sparse 4D convolution on a 4D occupancy grid. Instead of quantizing the raw point clouds into an occupancy grid, our method directly processes point clouds.…”
Section: Related Workmentioning
confidence: 99%
“…We used a toy version of two recent grid-based deep architectures for dynamic 3D scenes, FaF [15] and MinkNet [4], as well as our MeteorNet. We allow only three layers of neurons in each of the three network architectures and convolution kernel sizes no larger than 3.…”
Section: Grids Versus Point Clouds: a Toy Examplementioning
confidence: 99%