Srikanth Malla scite author profile

The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes

3D multi-object detection and tracking are crucial for traffic scene understanding. However, the community pays less attention to these areas due to the lack of a standardized benchmark dataset to advance the field. Moreover, existing datasets (e.g., KITTI [1]) do not provide sufficient data and labels to tackle challenging scenes where highly interactive and occluded traffic participants are present. To address these issues, we present the Honda Research Institute 3D Dataset (H3D), a large-scale full-surround 3D multi-object detection and tracking dataset collected using a 3D LiDAR scanner. H3D comprises 160 crowded and highly interactive traffic scenes with a total of 1 million labeled instances in 27,721 frames. With its unique dataset size, rich annotations, and complex scenes, H3D is gathered to stimulate research on full-surround 3D multi-object detection and tracking. To effectively and efficiently annotate a large-scale 3D point cloud dataset, we propose a labeling methodology to speed up the overall annotation cycle. A standardized benchmark is created to evaluate full-surround 3D multi-object detection and tracking algorithms. 3D object detection and tracking algorithms are trained and tested on H3D. Finally, sources of errors are discussed for the development of future algorithms.

TITAN: Future Forecast Using Action Priors

Malla, Dariush, Choi, 2020

LOKI: Long Term and Key Intentions for Trajectory Prediction

Girase, Gang, Malla et al. 2021

DROGON: A Trajectory Prediction Model based on Intention-Conditioned Behavior Reasoning

Choi, Malla, Patil et al. 2019

Preprint

RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

Yang et al. 2021

Shared Cross-Modal Trajectory Prediction for Autonomous Driving

Choi et al. 2021

NEMO: Future Object Localization Using Noisy Ego Priors

Malla, Dwivedi, Dariush et al. 2019

Preprint

Predictive models for forecasting the future behavior of road agents should consider the multi-modal nature of that behavior and be aware of the uncertainty of their predictions. Particularly from the egocentric view, where the motion of other agents is captured with respect to the ego-motion, the uncertainty of ego-motion prediction is critical to determining their interactive reactions and behaviors. Along this line, we propose NEMO (Noisy Ego MOtion priors for future object localization) for forecasting the future of road agents in the egocentric view. A predictive distribution of the future forecast is jointly modeled with the uncertainty of predictions. For this, we divide the problem into two tasks: future ego-motion prediction and future object localization. We first model the multi-modal distribution of future ego-motion with uncertainty estimates. The resulting distribution of ego-behavior is used to sample multiple modes of future ego-motion. Then, each mode is used as a prior to understand the interactions between the ego-vehicle and the target agent. We predict the multi-modal future locations of the target from individual modes of the ego-vehicle, modeling the uncertainty of the target's behavior. To this end, we extensively evaluate the proposed framework using the publicly available benchmark dataset (HEV-I), augmented with Inertial Measurement Unit (IMU) data.

Shared Cross-Modal Trajectory Prediction for Autonomous Driving

Choi, Choi, Malla et al. 2020

Preprint

We propose a framework for predicting future trajectories of traffic agents in highly interactive environments. Because autonomous driving vehicles are equipped with various types of sensors (e.g., LiDAR scanner, RGB camera, etc.), our work aims to benefit from multiple input modalities that are complementary to each other. The proposed approach is composed of two stages: (i) feature encoding, where we discover the motion behavior of the target agent with respect to other directly and indirectly observable influences, extracting such behaviors from multiple perspectives such as top-down and frontal views; and (ii) cross-modal embedding, where we embed a set of learned behavior representations into a single cross-modal latent space. We construct a generative model and formulate the objective functions with an additional regularizer specifically designed for future prediction. An extensive evaluation is conducted to show the efficacy of the proposed framework using two benchmark driving datasets.

The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes
