Alexey Abramov scite author profile

Recognizing manipulations performed by a human, and transferring them to a robot for execution, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scene in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this, we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing different manipulations. Employing simple sub-string search algorithms, semantic event chains can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
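
The idea of a semantic event chain can be illustrated with a minimal Python sketch. It is not the authors' implementation: the relation codes ("N" for not touching, "T" for touching, "A" for absent), the function names, and the scoring rule (normalized longest common substring, best match per row) are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def event_chain(relation_frames):
    """Compress per-frame segment relations into an event chain.

    relation_frames: list of dicts mapping a segment pair, e.g.
    ("hand", "cup"), to a relation code. A frame is kept only if it
    differs from the previously kept frame, i.e. at a topological
    transition. Returns {pair: string of codes over the kept frames}.
    """
    key_frames = []
    for frame in relation_frames:
        if not key_frames or frame != key_frames[-1]:
            key_frames.append(frame)
    pairs = sorted({pair for frame in key_frames for pair in frame})
    return {p: "".join(f.get(p, "A") for f in key_frames) for p in pairs}


def longest_common_substring(a, b):
    """Length of the longest contiguous block shared by strings a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def similarity(chain_a, chain_b):
    """Average best row-to-row match; segment names are ignored on purpose."""
    rows_a, rows_b = list(chain_a.values()), list(chain_b.values())
    if not rows_a or not rows_b:
        return 0.0
    total = 0.0
    for ra in rows_a:
        total += max(
            longest_common_substring(ra, rb) / max(len(ra), len(rb))
            for rb in rows_b
        )
    return total / len(rows_a)


# Two "push" demonstrations with different objects and durations.
push_a = event_chain([
    {("hand", "cup"): "N", ("cup", "table"): "T"},
    {("hand", "cup"): "N", ("cup", "table"): "T"},
    {("hand", "cup"): "T", ("cup", "table"): "T"},
    {("hand", "cup"): "N", ("cup", "table"): "T"},
])
push_b = event_chain([
    {("finger", "box"): "N", ("box", "desk"): "T"},
    {("finger", "box"): "T", ("box", "desk"): "T"},
    {("finger", "box"): "T", ("box", "desk"): "T"},
    {("finger", "box"): "N", ("box", "desk"): "T"},
])
# A "pick up" demonstration: the object leaves the table.
pick = event_chain([
    {("hand", "cup"): "N", ("cup", "table"): "T"},
    {("hand", "cup"): "T", ("cup", "table"): "T"},
    {("hand", "cup"): "T", ("cup", "table"): "N"},
])

print(similarity(push_a, push_b), similarity(push_a, pick))
```

Because rows are compared without reference to segment names, the two push demonstrations yield identical chains even though the objects and frame counts differ, while the pick-up chain scores lower.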

Categorizing object-action relations from semantic scene graphs

Aksoy

Abramov

Wörgötter

et al. 2010

In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting object-action relations. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and needs no a priori representation of either objects or actions. Essentially, actions are recognized without requiring prior object knowledge, and objects are categorized solely based on their exhibited role within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics and provides a way forward for trial-and-error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks, for example in imitation learning in developmental and cognitive robotics.

Modeling leaf growth of rosette plants using infrared stereo image sequences

Aksoy

Abramov

Wörgötter

et al. 2015

Computers and Electronics in Agriculture

In this paper, we present a novel multi-level procedure for finding and tracking leaves of a rosette plant, in our case tobacco plants up to 3 weeks old, during early growth from infrared-image sequences. This allows measuring important plant parameters, e.g. leaf growth rates, in an automatic and non-invasive manner. The procedure consists of three main stages: preprocessing, leaf segmentation, and leaf tracking. Leaf-shape models are applied to improve leaf segmentation, and further used for measuring leaf sizes and handling occlusions. Leaves typically grow radially away from the stem, a property that is exploited in our method, reducing the dimensionality of the tracking task. We successfully tested the method on infrared image sequences showing the growth of tobacco-plant seedlings up to an age of about 30 days, which allows measuring relevant plant growth parameters such as leaf growth rate. By robustly fitting a suitably modified autocatalytic growth model to all growth curves from plants under the same treatment, average plant growth models could be derived. Future applications of the method include plant-growth monitoring for optimizing plant production in greenhouses or plant phenotyping for plant research.
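
To make the growth-model step concrete, the sketch below fits a standard autocatalytic (logistic) curve A(t) = A_max / (1 + exp(-k (t - t0))) to a leaf-area series. The abstract does not say how the model was modified or how the robust fit was performed, so this uses the unmodified curve and ordinary least squares on the linearized form, with A_max assumed known; all names and numbers are hypothetical.

```python
import math


def logistic(t, a_max, k, t0):
    """Standard autocatalytic (logistic) growth curve."""
    return a_max / (1.0 + math.exp(-k * (t - t0)))


def fit_logistic(times, areas, a_max):
    """Estimate rate k and midpoint t0 for a fixed, assumed a_max.

    Uses the linearization ln(A / (a_max - A)) = k * t - k * t0 and a
    closed-form least-squares line fit. Requires 0 < A < a_max.
    """
    ys = [math.log(a / (a_max - a)) for a in areas]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    k = sxy / sxx
    intercept = mean_y - k * mean_t  # equals -k * t0
    return k, -intercept / k


# Synthetic, noise-free leaf areas over 30 days (illustrative values only).
days = list(range(30))
leaf_area = [logistic(d, a_max=10.0, k=0.3, t0=15.0) for d in days]
k_hat, t0_hat = fit_logistic(days, leaf_area, a_max=10.0)
print(k_hat, t0_hat)
```

For this curve the growth rate follows as dA/dt = k A (1 - A / A_max), which peaks at t = t0. Real measurements are noisy, so an ordinary least-squares fit like this one would be more sensitive to outliers than the robust fit the abstract mentions.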

Depth-supported real-time video segmentation with the Kinect

Abramov

Pauwels

Papon

et al. 2012

We present a real-time technique for the spatiotemporal segmentation of color/depth movies. Images are segmented using a parallel Metropolis algorithm implemented on a GPU, utilizing both color and depth information acquired with the Microsoft Kinect. Segments represent the equilibrium states of a Potts model. Tracking of segments is achieved by warping the obtained segment labels to the next frame using real-time optical flow, which reduces the number of iterations required for the Metropolis method to reach the new equilibrium state. By including depth information in the framework, true object boundaries can be found more easily, which also improves the temporal coherence of the method. The algorithm has been tested on medium-resolution videos showing human manipulations of objects. The framework provides an inexpensive visual front end for preprocessing of videos in industrial settings and robot labs, which can potentially be used in various applications.
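
The Metropolis update for a Potts model can be sketched sequentially in plain Python. This is a toy CPU version under stated assumptions, not the parallel GPU method of the paper: the per-pixel feature is an assumed (color, depth) tuple, and the coupling rule (1 minus a scaled feature distance) is a hypothetical choice that rewards equal labels on similar neighbors and penalizes them across strong edges.

```python
import math
import random


def coupling(p, q, scale):
    """Positive for similar pixels, negative for dissimilar ones (assumed form)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return 1.0 - dist / scale


def neighbors(r, c, rows, cols):
    """4-connected neighbors inside the image."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def local_energy(labels, feats, r, c, label, scale):
    """Energy contribution of pixel (r, c) if it carried `label`."""
    rows, cols = len(labels), len(labels[0])
    return -sum(
        coupling(feats[r][c], feats[rr][cc], scale)
        for rr, cc in neighbors(r, c, rows, cols)
        if labels[rr][cc] == label
    )


def total_energy(labels, feats, scale):
    """Potts energy: minus the summed coupling over equal-label neighbor pairs."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r + 1, c), (r, c + 1)):  # count each pair once
                if rr < rows and cc < cols and labels[r][c] == labels[rr][cc]:
                    energy -= coupling(feats[r][c], feats[rr][cc], scale)
    return energy


def metropolis_sweep(labels, feats, n_labels, temperature, rng, scale):
    """One in-place pass over all pixels; temperature must be > 0."""
    for r in range(len(labels)):
        for c in range(len(labels[0])):
            old = labels[r][c]
            new = rng.randrange(n_labels)
            if new == old:
                continue
            delta = (local_energy(labels, feats, r, c, new, scale)
                     - local_energy(labels, feats, r, c, old, scale))
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                labels[r][c] = new


# Toy 4x4 frame: left half and right half differ in both color and depth.
feats = [[(0.0, 0.0)] * 2 + [(1.0, 1.0)] * 2 for _ in range(4)]
rng = random.Random(0)
labels = [[rng.randrange(3) for _ in range(4)] for _ in range(4)]
for _ in range(50):
    metropolis_sweep(labels, feats, 3, 0.1, rng, 0.5)
print(total_energy(labels, feats, 0.5))
```

In the tracking setting described above, `labels` would be initialized from the previous frame's labels warped by optical flow rather than at random, so fewer sweeps would be needed to settle.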

Real-Time Segmentation of Stereo Videos on a Portable System With a Mobile GPU

Abramov

Pauwels

Papon

et al. 2012

IEEE Trans. Circuits Syst. Video Technol.

In mobile robotic applications, visual information needs to be processed fast despite resource limitations of the mobile system. Here a novel real-time framework for model-free spatio-temporal segmentation of stereo videos is presented. It combines real-time optical flow and stereo with image segmentation and runs on a portable system with an integrated mobile GPU. The system performs on-line, automatic and dense segmentation of stereo videos and serves as a visual front end for preprocessing in mobile robots, providing a condensed representation of the scene which can potentially be utilized in various applications, e.g., object manipulation, manipulation recognition, and visual servoing. The method was tested on real-world sequences with arbitrary motions, including videos acquired with a moving camera.

Multi-lane perception using feature fusion based on GraphSLAM

Abramov

Bayer

Heller

et al. 2016

Extensive, precise and robust recognition and modeling of the environment is a key factor for the next generation of Advanced Driver Assistance Systems and the development of autonomous vehicles. In this paper, a real-time approach for the perception of multiple lanes on highways is proposed. Lane markings detected by camera systems and observations of other traffic participants provide the input data for the algorithm. The information is accumulated and fused using GraphSLAM, and the result constitutes the basis for a multilane clothoid model. To allow incorporation of additional information sources, input data is processed in a generic format. Evaluation of the method is performed by comparing real data, collected with an experimental vehicle on highways, to a ground truth map. The results show that ego and adjacent lanes are robustly detected with high quality up to a distance of 120 m. In comparison to serial lane detection, an increase in the detection range of the ego lane and a continuous perception of neighboring lanes is achieved. The method can potentially be utilized for the longitudinal and lateral control of self-driving vehicles.
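
A clothoid lane model is commonly evaluated through its third-order polynomial approximation, sketched below. The abstract does not give the paper's parametrization, so the parameter names, the small-angle approximation, and the treatment of neighboring lanes as constant lateral shifts of the ego lane are all assumptions; the GraphSLAM fusion step is not shown.

```python
def lateral_offset(x, y0, heading, c0, c1):
    """Approximate lateral position of a clothoid lane center at distance x.

    y0:      lateral offset at the vehicle (m)
    heading: angle between vehicle axis and lane (rad, small-angle assumed)
    c0:      curvature at the vehicle (1/m)
    c1:      curvature change rate (1/m^2)
    """
    return y0 + heading * x + 0.5 * c0 * x ** 2 + c1 * x ** 3 / 6.0


def lane_offsets(x, y0, heading, c0, c1, lane_width, lane_indices=(-1, 0, 1)):
    """Ego lane (index 0) and neighbors, assumed parallel at a fixed width."""
    center = lateral_offset(x, y0, heading, c0, c1)
    return {i: center + i * lane_width for i in lane_indices}


# Gentle curve (radius 1000 m) evaluated at the 120 m range from the abstract.
lanes = lane_offsets(120.0, y0=0.0, heading=0.0, c0=1e-3, c1=0.0, lane_width=3.5)
print(lanes)
```

With these illustrative numbers the ego lane center lies about 7.2 m to the side at 120 m, which shows why curvature terms matter at long detection ranges.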

A modular system architecture for online parallel vision pipelines

Papon

Abramov

Aksoy

et al. 2012

We present an architecture for real-time, online vision systems which enables development and use of complex vision pipelines integrating any number of algorithms. Individual algorithms are implemented using modular plug-ins, allowing integration of independently developed algorithms and rapid testing of new vision pipeline configurations. The architecture exploits the parallelization of graphics processing units (GPUs) and multi-core systems to speed processing and achieve real-time performance. Additionally, the use of a global memory management system for frame buffering permits complex algorithmic flow (e.g. feedback loops) in online processing setups, while maintaining the benefits of threaded asynchronous operation of separate algorithms. To demonstrate the system, a typical real-time system setup is described which incorporates plug-ins for video and depth acquisition, GPU-based segmentation and optical flow, semantic graph generation, and online visualization of output. Performance numbers are shown which demonstrate the insignificant overhead cost of the architecture as well as speed-up over strictly CPU and single-threaded implementations.
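
The combination of threaded plug-ins and a shared frame buffer can be sketched in a few lines of Python. This is a hypothetical toy, not the architecture's actual API: the class names, the `(stream, offset)` input convention, and the stand-in "algorithms" are all invented for illustration, and the buffer never evicts old frames.

```python
import threading


class FrameBuffer:
    """Shared store keyed by (stream, frame index); readers block until ready."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def put(self, stream, index, value):
        with self._cond:
            self._data[(stream, index)] = value
            self._cond.notify_all()

    def get(self, stream, index, timeout=5.0):
        with self._cond:
            ready = self._cond.wait_for(
                lambda: (stream, index) in self._data, timeout)
            if not ready:
                raise TimeoutError(f"{stream}[{index}] never arrived")
            return self._data[(stream, index)]


class Plugin(threading.Thread):
    """One algorithm running in its own thread.

    inputs: list of (stream, offset); offset 0 reads the current frame,
    offset -1 reads the previous frame, which allows feedback loops.
    """

    def __init__(self, stream, inputs, func, buffer, n_frames):
        super().__init__(daemon=True)
        self.stream, self.inputs, self.func = stream, inputs, func
        self.buffer, self.n_frames = buffer, n_frames

    def run(self):
        for i in range(self.n_frames):
            args = [self.buffer.get(s, i + off) if i + off >= 0 else None
                    for s, off in self.inputs]
            self.buffer.put(self.stream, i, self.func(i, *args))


def run_demo(n_frames):
    buf = FrameBuffer()
    plugins = [
        # Consumer is listed first: start order does not matter.
        Plugin("graph", [("segment", 0)],
               lambda i, seg: f"graph({seg})", buf, n_frames),
        # Feedback loop: segmentation reuses its own previous result.
        Plugin("segment", [("video", 0), ("segment", -1)],
               lambda i, frame, prev: frame + (prev or 0), buf, n_frames),
        Plugin("video", [], lambda i: i * 10, buf, n_frames),
    ]
    for p in plugins:
        p.start()
    for p in plugins:
        p.join()
    return buf


buf = run_demo(3)
print([buf.get("graph", i) for i in range(3)])
```

Routing all data through one buffer keyed by frame index is what lets a plug-in depend on an earlier output of itself or of a downstream stage without the threads calling each other directly.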

Alexey Abramov

Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds

Learning the semantics of object–action relations by observation
