Farzad Husain scite author profile

Abstract: Scene understanding is a necessary prerequisite for robots acting autonomously in complex environments. Low-cost RGB-D cameras such as Microsoft Kinect enabled new methods for analyzing indoor scenes and are now ubiquitously used in indoor robotics. We investigate strategies for efficient pixelwise object class labeling of indoor scenes that combine both pretrained semantic features transferred from a large color image dataset and geometric features, computed relative to the room structures, including a novel distance-from-wall feature, which encodes the proximity of scene points to a detected major wall of the room. We evaluate our approach on the popular NYU v2 dataset. Several deep learning models are tested, which are designed to exploit different characteristics of the data. This includes feature learning with two different pooling sizes. Our results indicate that combining semantic and geometric features yields significantly improved results for the task of object class segmentation.
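
The distance-from-wall feature mentioned in this abstract can be illustrated with a point-to-plane distance. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes the major wall has already been detected and is given as a plane n·x + d = 0, and the names `distance_from_wall`, `wall_normal`, and `wall_offset` are hypothetical. How the paper detects the wall or normalizes the feature is not stated here.

```python
import math

def distance_from_wall(points, wall_normal, wall_offset):
    """Unsigned distance of each 3-D point to the plane n.x + d = 0.

    The plane parameters are assumed to come from a separate wall
    detection step that is not shown here.
    """
    nx, ny, nz = wall_normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("wall_normal must be non-zero")
    # Project each point onto the normal and normalize by its length.
    return [abs(nx * x + ny * y + nz * z + wall_offset) / norm
            for x, y, z in points]

# Hypothetical example: the wall is the plane x = 0.
points = [(0.5, 1.0, 2.0), (2.0, 0.0, 1.0), (0.0, 3.0, 0.5)]
distances = distance_from_wall(points, (1.0, 0.0, 0.0), 0.0)
```

In a per-pixel labeling setting, one such value per scene point could be stacked alongside the other geometric features as an extra input channel.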

Consistent Depth Video Segmentation Using Adaptive Surface Models

Husain, Dellen, Torras. 2015. IEEE Trans. Cybern.

Abstract: We propose a new approach for the segmentation of 3-D point clouds into geometric surfaces using adaptive surface models. Starting from an initial configuration, the algorithm converges to a stable segmentation through a new iterative split-and-merge procedure, which includes an adaptive mechanism for the creation and removal of segments. This allows the segmentation to adjust to changing input data over the course of the video, leading to stable, temporally coherent, and traceable segments. We tested the method on a large variety of data acquired with different range imaging devices, including a structured-light sensor and a time-of-flight camera, and successfully segmented the videos into surface segments. We further demonstrated the feasibility of the approach using quantitative evaluations based on ground-truth data.

Realtime tracking and grasping of a moving object from range video

Husain, Colomé, Dellen, et al. 2014

Abstract: In this paper we present an automated system that is able to track and grasp a moving object within the workspace of a manipulator using range images acquired with a Microsoft Kinect sensor. Realtime tracking is achieved by a geometric particle filter on the affine group. Based on the tracked output, the pose of a 7-DoF WAM robotic arm is continuously updated using dynamic motor primitives until a distance measure between the tracked object and the gripper mounted on the arm is below a threshold. The gripper then closes its three fingers and grasps the object. The tracker works in realtime and is robust to noise and partial occlusions. Using only the depth data makes our tracker independent of texture, which is one of the key design goals in our approach. An experimental evaluation is provided along with a comparison of the proposed tracker with state-of-the-art approaches, including the OpenNI tracker. The developed system is integrated with ROS and made available as part of IRI's ROS stack.

Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain

Husain, Dellen, Torras. 2016. IEEE Robot. Autom. Lett.

Abstract: Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmarking video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge on data capture (e.g., camera motion estimation), which makes our approach more general and flexible than other approaches. Our implementation is made available.

Scene Understanding Using Deep Learning

Husain, Dellen, Torras. 2017

Deep learning is a type of machine perception method that attempts to model high-level abstractions in data and encode them into a compact and robust representation. Such representations have found immense usage in applications related to computer vision. In this chapter we introduce two such applications, i.e., semantic segmentation of images and action recognition in videos. These applications are of fundamental importance for human-centered environment perception.

Combining Semantic and Geometric Features for Object Class Segmentation of Indoor Scenes
