Joshua Manela scite author profile

Figure 1: Illustration of on-demand depth sensing with a coarse-to-fine hierarchy on the proposed dataset. Our method (HSM) captures the coarse layout of the scene in 91 ms, finds the far-away car (shown in the red box) in 175 ms, and recovers the details of the car given an extra 255 ms.

Abstract: We explore the problem of real-time stereo matching on high-res imagery. Many state-of-the-art (SOTA) methods struggle to process high-res imagery because of memory constraints or speed limitations. To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-res stereo datasets are relatively rare, we introduce a dataset with high-res stereo pairs for both training and evaluation. Our approach achieved SOTA performance on Middlebury-v3 and KITTI-15 while running significantly faster than its competitors. The hierarchical design also naturally allows for anytime on-demand reports of disparity by capping intermediate coarse results, allowing us to accurately predict disparity for near-range structures with low latency (30 ms). We demonstrate that the performance-vs-speed tradeoff afforded by on-demand hierarchies may address sensing needs for time-critical applications such as autonomous driving.


Heterogeneous Multisensor Fusion for Mobile Platform Three-Dimensional Pose Estimation

Deilamsalehy

Havens

Manela

2017


Precise, robust, and consistent localization is an important subject in many areas of science such as vision-based control, path planning, and simultaneous localization and mapping (SLAM). To estimate the pose of a platform, sensors such as inertial measurement units (IMUs), global positioning system (GPS) receivers, and cameras are commonly employed. Each of these sensors has its own strengths and weaknesses. Sensor fusion is a known approach that combines the data measured by different sensors to achieve a more accurate or complete pose estimate and to cope with sensor outages. In this paper, a three-dimensional (3D) pose estimation algorithm is presented for an unmanned aerial vehicle (UAV) in an unknown GPS-denied environment. A UAV can be fully localized by three position coordinates and three orientation angles. The proposed algorithm fuses the data from an IMU, a camera, and a two-dimensional (2D) light detection and ranging (LiDAR) sensor using an extended Kalman filter (EKF) to achieve accurate localization. Among the employed sensors, LiDAR has not received proper attention in the past, mostly because a 2D LiDAR can only provide pose estimation in its scanning plane and thus cannot obtain a full pose estimate in a 3D environment. A novel method is introduced in this paper that employs a 2D LiDAR to improve the accuracy of the full 3D pose estimate acquired from an IMU and a camera, and it is shown that this method can significantly improve the precision of the localization algorithm. The proposed approach is evaluated and justified through simulation and real-world experiments.


Hierarchical Deep Stereo Matching on High-resolution Images

Yang,

Manela,

Happold

et al. 2019

Preprint


CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection

Hwang,

Kretzschmar,

Manela

et al. 2022


Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, as each of the sensors lacks information along a perpendicular axis, that is, depth is unknown to camera and elevation is unknown to radar. We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.


