Abstract: Scene understanding is a necessary prerequisite for robots acting autonomously in complex environments. Low-cost RGB-D cameras such as the Microsoft Kinect have enabled new methods for analyzing indoor scenes and are now ubiquitous in indoor robotics. We investigate strategies for efficient pixelwise object class labeling of indoor scenes that combine pretrained semantic features, transferred from a large color image dataset, with geometric features computed relative to the room structure, including a novel distance-from-wall feature that encodes the proximity of scene points to a detected major wall of the room. We evaluate our approach on the popular NYU v2 dataset, testing several deep learning models designed to exploit different characteristics of the data, including feature learning with two different pooling sizes. Our results indicate that combining semantic and geometric features significantly improves object class segmentation.
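As a minimal sketch of how such a distance-from-wall feature could be computed, assume the dominant wall has already been detected as a plane (e.g., by RANSAC plane fitting); the function name and plane-detection step are illustrative, not taken from the paper:

```python
import numpy as np

def distance_from_wall(points, plane):
    """Distance of each 3-D point to a detected wall plane.

    points : (N, 3) array of scene points in room coordinates.
    plane  : (a, b, c, d) coefficients of the wall plane ax + by + cz + d = 0,
             with (a, b, c) normalized to unit length.
    """
    normal = np.asarray(plane[:3])
    d = plane[3]
    # Point-to-plane distance; the absolute value encodes proximity only.
    return np.abs(points @ normal + d)

# Hypothetical usage: points from a depth image, wall from RANSAC fitting.
points = np.random.rand(1000, 3)            # stand-in for real scene points
wall = (1.0, 0.0, 0.0, -2.5)                # stand-in for a detected wall plane
feature = distance_from_wall(points, wall)  # one scalar feature per point
```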
Abstract: We propose a new approach for the segmentation of 3-D point clouds into geometric surfaces using adaptive surface models. Starting from an initial configuration, the algorithm converges to a stable segmentation through a new iterative split-and-merge procedure, which includes an adaptive mechanism for the creation and removal of segments. This allows the segmentation to adjust to changing input data over the course of the video, leading to stable, temporally coherent, and traceable segments. We tested the method on a large variety of data acquired with different range imaging devices, including a structured-light sensor and a time-of-flight camera, and successfully segmented the videos into surface segments. We further demonstrated the feasibility of the approach with quantitative evaluations based on ground-truth data.
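The following is a simplified sketch of one split-and-merge iteration over point segments, using plane fitting as a stand-in for the paper's adaptive surface models; the split heuristic, thresholds, and adjacency assumption (consecutive list order) are all illustrative assumptions:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit; returns (centroid, normal, rms residual)."""
    centroid = points.mean(axis=0)
    _, s, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    rms = np.sqrt(s[-1] ** 2 / len(points))
    return centroid, normal, rms

def split_and_merge(segments, split_tol=0.01, merge_tol=0.01):
    """One iteration of a generic split-and-merge pass.

    segments : list of (N_i, 3) point arrays. Segments with a poor surface
    fit are split in two; pairs whose joint fit is good are merged.
    Iterating this to a fixed point yields a stable segmentation.
    """
    # Split phase: bisect segments whose fit residual is too large.
    out = []
    for seg in segments:
        _, _, rms = fit_plane(seg)
        if rms > split_tol and len(seg) > 10:
            # Illustrative split along the axis of largest spread.
            axis = np.argmax(np.ptp(seg, axis=0))
            median = np.median(seg[:, axis])
            out += [seg[seg[:, axis] <= median], seg[seg[:, axis] > median]]
        else:
            out.append(seg)
    # Merge phase: greedily merge neighbours that one model explains well.
    merged, i = [], 0
    while i < len(out):
        if i + 1 < len(out):
            joint = np.vstack([out[i], out[i + 1]])
            if fit_plane(joint)[2] < merge_tol:
                merged.append(joint)
                i += 2
                continue
        merged.append(out[i])
        i += 1
    return merged
```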
Abstract: In this paper we present an automated system that is able to track and grasp a moving object within the workspace of a manipulator using range images acquired with a Microsoft Kinect sensor. Real-time tracking is achieved by a geometric particle filter on the affine group. Based on the tracked output, the pose of a 7-DoF WAM robotic arm is continuously updated using dynamic motor primitives until a distance measure between the tracked object and the gripper mounted on the arm falls below a threshold, at which point the gripper closes its three fingers and grasps the object. The tracker works in real time and is robust to noise and partial occlusions. Using only depth data makes our tracker independent of texture, which is one of the key design goals of our approach. An experimental evaluation is provided, along with a comparison of the proposed tracker with state-of-the-art approaches, including the OpenNI tracker. The developed system is integrated with ROS and made available as part of IRI's ROS stack.
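A sketch of the track-then-grasp control loop could look as follows; `tracker`, `arm`, `gripper`, and `get_depth_frame` are hypothetical interfaces standing in for the particle-filter tracker, the motor-primitive-driven arm, the three-finger gripper, and the Kinect depth stream, and the threshold value is invented for illustration:

```python
import numpy as np

GRASP_DISTANCE = 0.03  # metres; illustrative threshold, not from the paper

def track_and_grasp(tracker, arm, gripper, get_depth_frame):
    """Servo the arm toward a tracked object and grasp on proximity."""
    while True:
        frame = get_depth_frame()
        object_pose = tracker.update(frame)   # particle-filter pose estimate
        arm.move_toward(object_pose)          # motor-primitive pose update
        # Distance between tracked object position and gripper position.
        gap = np.linalg.norm(object_pose[:3, 3] - arm.gripper_position())
        if gap < GRASP_DISTANCE:
            gripper.close()                   # three fingers close: grasp
            break
```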
Abstract: Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple yet robust 2D convolutional neural network, extended to a concatenated 3D network, that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor, one that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time, without requiring any preprocessing (e.g., optic flow) or a priori knowledge of data capture (e.g., camera motion estimation), which makes our approach more general and flexible than others. Our implementation is made available.
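The general pattern of describing each frame with a pretrained 2D CNN and fusing the per-frame features temporally can be sketched as below; VGG-16 (a top ILSVRC-2014 entry) is used here as a stand-in descriptor, and all layer sizes and the temporal head are illustrative rather than the paper's architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoStageVideoNet(nn.Module):
    """Sketch of a 2D-then-temporal network: a frozen pretrained 2D CNN
    describes each frame; a small temporal network fuses frames over time."""
    def __init__(self, num_classes=101):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = backbone.features       # frozen 2D frame descriptor
        for p in self.features.parameters():
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.temporal = nn.Sequential(           # fuses frames over time
            nn.Conv1d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, video):                    # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)             # (B*T, 3, H, W)
        feats = self.pool(self.features(frames))           # (B*T, 512, 1, 1)
        feats = feats.view(b, t, 512).transpose(1, 2)      # (B, 512, T)
        return self.temporal(feats)              # class scores per clip
```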
Deep learning is a machine learning method that attempts to model high-level abstractions in data and encode them into a compact and robust representation. Such representations have found immense use in computer vision applications. In this chapter we introduce two such applications: semantic segmentation of images and action recognition in videos. Both are of fundamental importance for human-centered environment perception.
Abstract: Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robot applications. Semantic segmentation methods for such scenes have made huge advances in recent years and can provide useful prior information for object discovery by removing false positives and delineating object boundaries. We propose a novel method that combines bottom-up object discovery and semantic priors to produce generic object candidates in RGB-D images. We use a deep learning method for semantic segmentation to classify colour and depth superpixels into meaningful categories. Separately for each category, we use saliency to estimate the location and scale of objects, and superpixels to find their precise boundaries. Finally, object candidates from all categories are combined and ranked. We evaluate our approach on the NYU Depth V2 dataset and show that we outperform other state-of-the-art object discovery methods in terms of recall.
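A heavily simplified sketch of the per-category gate-then-rank idea follows: semantic probabilities gate where to look and saliency scores the candidates. Connected components stand in for the superpixel-based boundary refinement, and the threshold is an assumed placeholder:

```python
import numpy as np
from scipy import ndimage

def object_candidates(semantic_probs, saliency, prob_thresh=0.5):
    """Per-category candidate generation.

    semantic_probs : (C, H, W) per-pixel class probabilities.
    saliency       : (H, W) saliency map used for ranking.
    Returns a list of (score, mask) candidates sorted by score.
    """
    candidates = []
    for c in range(semantic_probs.shape[0]):
        mask = semantic_probs[c] > prob_thresh
        labels, n = ndimage.label(mask)           # split category into blobs
        for k in range(1, n + 1):
            blob = labels == k
            score = float(saliency[blob].mean())  # rank by mean saliency
            candidates.append((score, blob))
    return sorted(candidates, key=lambda sc: sc[0], reverse=True)
```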
Abstract: Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Owing to the availability of efficient 3D sensing devices such as the Kinect, methods for recognizing objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields, a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary size, and shows a significant benefit in computational speed during prediction compared to a state-of-the-art approach based on constrained optimization.
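To make the model class concrete, here is a minimal pairwise CRF energy of the kind described, with a unary data term from a local classifier and a Potts smoothness term over neighbouring segments; the paper's actual potentials encode richer features and contextual relations than this sketch:

```python
import numpy as np

def crf_energy(labels, unary, edges, pairwise_weight=1.0):
    """Energy of a labeling under a simple pairwise CRF.

    labels : (N,) integer label per point/segment.
    unary  : (N, C) negative log-probabilities from a local classifier.
    edges  : list of (i, j) index pairs linking neighbouring segments.
    """
    e = unary[np.arange(len(labels)), labels].sum()  # data term
    for i, j in edges:                               # Potts smoothness term
        e += pairwise_weight * (labels[i] != labels[j])
    return e
```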
Abstract: A novel framework for joint segmentation and tracking of object surfaces in depth videos is presented. Initially, the 3D colored point cloud obtained with the Kinect camera is used to segment the scene into surface patches, each defined by a quadratic function. The computed segments, together with their functional descriptions, are then used to partition the depth image of the subsequent frame consistently with the preceding frame. This way, solutions established in previous frames can be reused, which improves the efficiency of the algorithm and the coherency of the segmentations over the course of the video. The algorithm is tested on scenes showing human and robot manipulations of objects. We demonstrate that the method can successfully segment and track the human/robot arm and object surfaces throughout the manipulations. The performance is evaluated quantitatively by measuring the temporal coherency of the segmentations and the segmentation covering against ground truth. The method provides a visual front-end designed for robotic applications and can potentially be used in the context of manipulation recognition, visual servoing, and robot grasping tasks.
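A minimal sketch of the quadratic functional descriptions follows, assuming each patch is modeled as a height field z = f(x, y) fit by least squares; the specific parameterization is an assumption, and the residual function illustrates how new-frame points could be assigned to the existing segment whose model explains them best:

```python
import numpy as np

def fit_quadratic_patch(points):
    """Least-squares fit of z = ax^2 + by^2 + cxy + dx + ey + f
    to a segment's (N, 3) points; returns the six coefficients."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x * x, y * y, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def patch_residual(points, coeffs):
    """Per-point |z - f(x, y)| under a fitted patch model."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x * x, y * y, x * y, x, y, np.ones_like(x)])
    return np.abs(A @ coeffs - z)
```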