Stefan Milz scite author profile

Lidar-based 3D object detection is indispensable for autonomous driving because it links directly to environmental understanding and therefore forms the basis for prediction and motion planning. Inference on highly sparse 3D data in real time is an ill-posed problem in many other application areas besides automated vehicles, e.g. augmented reality, personal robotics or industrial automation. We introduce Complex-YOLO, a state-of-the-art real-time 3D object detection network that operates on point clouds only. In this work, we describe a network that expands YOLOv2, a fast standard 2D object detector for RGB images, with a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. To this end, we propose a specific Euler-Region-Proposal Network (E-RPN) that estimates the pose of the object by adding an imaginary and a real fraction to the regression network. This results in a closed complex space and avoids the singularities that occur with single-angle estimation. The E-RPN helps the network generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection, specifically in terms of efficiency. We achieve state-of-the-art results for cars, pedestrians and cyclists while being more than five times faster than the fastest competitor. Further, our model is capable of estimating all eight KITTI classes, including vans, trucks and sitting pedestrians, simultaneously with high accuracy.
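The abstract's point about regressing an imaginary and a real component instead of a single angle can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names (`encode_yaw`, `decode_yaw`, `component_distance`) are hypothetical and not taken from the authors' code, and the heading is assumed to be recovered with a two-argument arctangent. It shows why two nearly identical headings on either side of the ±π wrap-around look far apart as raw angles but close together as (imaginary, real) pairs.

```python
import math

def encode_yaw(yaw):
    """Map a heading angle (radians) to an (imaginary, real) pair on the unit circle."""
    return math.sin(yaw), math.cos(yaw)

def decode_yaw(t_im, t_re):
    """Recover the heading angle in [-pi, pi] from the two regressed components."""
    return math.atan2(t_im, t_re)

def component_distance(yaw_a, yaw_b):
    """Euclidean distance between two headings in the (imaginary, real) plane."""
    im_a, re_a = encode_yaw(yaw_a)
    im_b, re_b = encode_yaw(yaw_b)
    return math.hypot(im_a - im_b, re_a - re_b)

# Two nearly identical headings on either side of the +/-pi wrap-around.
a, b = math.pi - 0.01, -math.pi + 0.01
raw_gap = abs(a - b)                    # large: the raw angle jumps by almost 2*pi
complex_gap = component_distance(a, b)  # small: the two points are neighbours on the circle
```

A loss defined on the raw angle would penalize `a` versus `b` heavily even though the boxes are almost aligned, whereas a loss on the two components stays small, which is one way to read the abstract's claim about avoiding singularities.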


WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving

Yogamani

et al. 2019


Figure 1: We introduce WoodScape, the first fisheye image dataset dedicated to autonomous driving. It contains four cameras covering 360°, accompanied by an HD laser scanner, IMU and GNSS. Annotations are made available for nine tasks, notably 3D object detection, depth estimation (overlaid on front camera) and semantic segmentation, as illustrated here.

Abstract: Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and in particular automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for other tasks are provided for over 100,000 images. We would like to encourage the community to adapt computer vision models for fisheye cameras instead of relying on naïve rectification.


Substance P and Prostaglandin E2 Release After Shock Wave Application to the Rabbit Femur

Maier

Averbeck

Milz

et al. 2003

Clinical Orthopaedics and Related Research



Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds

et al. 2019


Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of a neural-network-based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparing object detections, which speeds up our inference by up to 20% and halves training time. On top of this, we apply state-of-the-art online multi-target feature tracking to the object measurements to further increase accuracy and robustness using temporal information. Our experiments on KITTI show that we achieve results on par with the state of the art in all related categories while maintaining the trade-off between performance and accuracy and still running in real time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection.


Lateral ankle ligaments and tibiofibular syndesmosis: 13-MHz high-frequency sonography and MRI compared in 20 patients

Milz

Milz

Steinborn

et al. 1998

Acta Orthopaedica Scandinavica


Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning

Milz

Arbeiter

Witt

et al. 2018


FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving

Kumar

Hiremath

Bach

et al. 2020


Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view (> 180°). However, they come at the cost of strong non-linear distortion, which requires more complex algorithms. In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform a piece-wise linear approximation of the fisheye projection surface and apply standard rectilinear models, this has its own set of issues, such as re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release this dataset as part of our WoodScape project [1]. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance.


UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models

Kumar

Yogamani

Bach

et al. 2020



Stefan Milz

Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds
