Suvam Patra scite author profile

In this paper, we address the problem of road segmentation and free space detection in the context of autonomous driving. Traditional methods use either 3-dimensional (3D) cues, such as point clouds obtained from LIDAR, RADAR or stereo cameras, or 2-dimensional (2D) cues, such as lane markings, road boundaries and object detection. Typical 3D point clouds do not have enough resolution to detect fine differences in height, such as between road and pavement. Image-based 2D cues fail when encountering uneven road textures caused by shadows, potholes, lane markings or road restoration. We propose a novel free road space detection technique combining both 2D and 3D cues. In particular, we use CNN-based road segmentation from 2D images and plane/box fitting on sparse depth data obtained from SLAM as priors to formulate an energy minimization using a conditional random field (CRF) for road pixel classification. While the CNN learns the road texture and is unaffected by depth boundaries, the 3D information helps in overcoming texture-based classification failures. Finally, we use the obtained road segmentation with the 3D depth data from monocular SLAM to detect the free space for navigation purposes. Our experiments on the KITTI odometry dataset [12], the CamVid dataset [7], and videos captured by us validate the superiority of the proposed approach over the state of the art.
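The energy formulation described in this abstract can be illustrated with a minimal sketch. It assumes a per-pixel road probability from a CNN (`p_cnn`) and a per-pixel road probability derived from a 3D plane fit (`p_3d`). These arrays, the weights `w_cnn`, `w_3d` and `w_pair`, and the use of iterated conditional modes (ICM) as the minimizer are all assumptions made for illustration, not the authors' implementation. The energy is a sum of negative log-likelihood unary terms plus a Potts pairwise term over 4-connected neighbors.

```python
import numpy as np

def unary_costs(p_cnn, p_3d, w_cnn=1.0, w_3d=1.0, eps=1e-6):
    """Return an (H, W, 2) array of costs for labels 0 (not road) and 1 (road)."""
    p_cnn = np.clip(p_cnn, eps, 1.0 - eps)
    p_3d = np.clip(p_3d, eps, 1.0 - eps)
    cost_road = -(w_cnn * np.log(p_cnn) + w_3d * np.log(p_3d))
    cost_bg = -(w_cnn * np.log(1.0 - p_cnn) + w_3d * np.log(1.0 - p_3d))
    return np.stack([cost_bg, cost_road], axis=-1)

def icm_segment(p_cnn, p_3d, w_pair=0.5, n_iters=5):
    """Greedy ICM minimization of unary + Potts energy (hypothetical solver choice)."""
    unary = unary_costs(p_cnn, p_3d)
    labels = np.argmin(unary, axis=-1)  # start from the unary-only labeling
    h, w = labels.shape
    for _ in range(n_iters):
        changed = False
        for y in range(h):
            for x in range(w):
                best_label, best_cost = labels[y, x], np.inf
                for lab in (0, 1):
                    cost = unary[y, x, lab]
                    # Potts term: pay w_pair for each neighbor with a different label.
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] != lab:
                            cost += w_pair
                    if cost < best_cost:
                        best_label, best_cost = lab, cost
                if best_label != labels[y, x]:
                    labels[y, x] = best_label
                    changed = True
        if not changed:
            break
    return labels

# Toy example: the CNN is fooled at one pixel (e.g. by a shadow),
# while the 3D prior mildly supports "road" everywhere.
p_cnn = np.full((3, 3), 0.9)
p_cnn[1, 1] = 0.1
p_3d = np.full((3, 3), 0.6)

unary_only = np.argmin(unary_costs(p_cnn, p_3d), axis=-1)
labels = icm_segment(p_cnn, p_3d)
```

In this toy case the unary terms alone label the center pixel as not road, and the pairwise term pulls it back to road because all four neighbors are road. A real system would use a proper CRF solver and learned weights.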


Deep CNN with color lines model for unmarked road segmentation

Yadav, Patra, Arora et al. 2017


EGO-SLAM: A Robust Monocular SLAM for Egocentric Videos

Patra, Gupta, Ahmad et al. 2019


Despite tremendous progress, a truly general-purpose pipeline for Simultaneous Localization and Mapping (SLAM) remains a challenge. We investigate the reported failure of state-of-the-art (SOTA) SLAM techniques on egocentric videos [24,40,42]. We find that dominant 3D rotations, low parallax between successive frames, and primarily forward motion are the most common causes of failure in egocentric videos. The incremental nature of SOTA SLAM, in the presence of unreliable pose and 3D estimates in egocentric videos and with no opportunities for global loop closures, generates drift and eventually causes such techniques to fail. Taking inspiration from batch-mode Structure from Motion (SFM) techniques [4,55], we propose to solve SLAM as an SFM problem over sliding temporal windows. This makes the problem well constrained. Further, as suggested in [4], we propose to initialize the camera poses using 2D rotation averaging, followed by translation averaging, before structure estimation using bundle adjustment. This helps in stabilizing the camera poses when 3D estimates are not reliable. We show that the proposed SLAM technique, incorporating these two key ideas, works successfully for long, shaky egocentric videos where other SOTA techniques have been reported to fail. Qualitative and quantitative comparisons on publicly available egocentric video datasets validate our results.
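The abstract does not specify which rotation averaging algorithm is used, so the following is only a minimal sketch of one common building block: averaging several noisy estimates of a single rotation under the chordal (Frobenius) distance by projecting their arithmetic mean onto SO(3) with an SVD. The function names `rot_z` and `chordal_mean` are hypothetical. A full pipeline would instead average relative rotations over a view graph to recover all camera orientations in a window.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z-axis by `angle` radians (helper for the example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def chordal_mean(rotations):
    """Average rotation matrices under the chordal (Frobenius) distance."""
    m = np.mean(np.asarray(rotations), axis=0)
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.diag([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])
    return u @ d @ vt

# Three noisy estimates of the same camera rotation, symmetric about 0.10 rad.
estimates = [rot_z(0.08), rot_z(0.10), rot_z(0.12)]
R = chordal_mean(estimates)
angle = np.arctan2(R[1, 0], R[0, 0])
```

Because the computation uses only rotation matrices, it needs no depth or 3D structure estimates, which is the property the abstract relies on when 3D estimates are unreliable.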


Divide and conquer: A hierarchical approach to large-scale structure-from-motion

Bhowmick, Patra, Chatterjee et al. 2017

Computer Vision and Image Understanding


Computing Egomotion with Local Loop Closures for Egocentric Videos

Patra, Aggarwal, Arora et al. 2017


Finding the camera pose is an important step in many egocentric video applications. It has been widely reported that state-of-the-art SLAM algorithms fail on egocentric videos [1,2,3,4]. In this paper, we propose a robust method for camera pose estimation, designed specifically for egocentric videos. In an egocentric video, the camera views the same scene point multiple times as the wearer's head sweeps back and forth. We use this specific motion profile to perform short loop closures aligned with the wearer's footsteps. For egocentric videos, depth estimation is usually noisy. In an important departure, we use 2D computations for rotation averaging, which do not rely upon depth estimates. The two modifications result in a much more stable algorithm, as is evident from our experiments on various egocentric video datasets for different egocentric applications. The proposed algorithm resolves a long-standing problem in egocentric vision and unlocks new usage scenarios for future applications.


Underwater Moving Object Detection using an End-to-End Encoder-Decoder Architecture and GraphSage with Aggregator and Refactoring

Kapoor, Patra, Subudhi et al. 2023


High Resolution Point Cloud Generation From Kinect and HD Cameras Using Graph Cut

Patra, Bhowmick, Banerjee et al. 2012



Suvam Patra

Divide and Conquer: Efficient Large-Scale Structure from Motion Using Graph Partitioning

A Joint 3D-2D Based Method for Free Space Detection on Roads
