Road modeling is the first step towards environment perception in video-based driver assistance systems. Typically, lane modeling enables applications such as lane departure warning or detection of lane invasion by other vehicles. In this paper, a new monocular image processing strategy that achieves a robust multiple-lane model is proposed. Multiple lanes are identified by first detecting the vehicle's own lane and estimating its geometry under perspective distortion. The perspective analysis and curve fitting make it possible to hypothesize adjacent lanes, assuming some a priori knowledge about the road. These hypotheses are then verified through a confidence-level analysis. Several types of sequences have been tested, with different illumination conditions, presence of shadows, and significant curvature, all processed in real time. Results show the robustness of the system, which delivers accurate multiple-lane road models in most situations.
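The abstract does not specify the lane model, but the combination of curve fitting and adjacent-lane hypothesis generation can be illustrated with a minimal sketch: fit a second-order polynomial to detected lane-boundary points and hypothesize a neighboring lane by a lateral offset. The polynomial degree, the image-space (rather than ground-plane) fitting, and the fixed pixel lane width are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def fit_lane_and_hypothesize(points, lane_width_px=300.0):
    """Fit a second-order polynomial x = f(y) to detected lane-boundary
    points and hypothesize an adjacent lane by a lateral shift.

    `points` is an (N, 2) array of (x, y) image coordinates of the own
    lane boundary; `lane_width_px` is an assumed lane width in pixels.
    """
    points = np.asarray(points, dtype=float)
    # Model the boundary as x = a*y^2 + b*y + c (y grows downwards).
    own = np.polyfit(points[:, 1], points[:, 0], deg=2)
    # Adjacent-lane hypothesis: same curvature, shifted laterally.
    adjacent = own.copy()
    adjacent[2] += lane_width_px
    return own, adjacent

# Synthetic boundary sampled from x = 0.001*y^2 + 0.1*y + 200.
ys = np.linspace(0.0, 400.0, 50)
xs = 0.001 * ys**2 + 0.1 * ys + 200.0
own, adj = fit_lane_and_hypothesize(np.column_stack([xs, ys]))
```

In a real system the hypothesized curve would then be checked against image evidence, which is the role of the confidence-level analysis described above.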
A more natural, intuitive, user-friendly, and less intrusive human-computer interface for controlling an application through hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches every frame of a video sequence for potential hand poses, using a binary Support Vector Machine (SVM) classifier with Local Binary Patterns as feature vectors. These detections are fed to a tracker, which generates a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories and computes a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform gesture recognition. The VS-LBP, one of the paper's main contributions, is a novel video descriptor that provides much richer spatio-temporal information than existing approaches at a manageable computational cost. Excellent results have been obtained, outperforming other state-of-the-art approaches.
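The detection stage relies on Local Binary Patterns as features. A minimal sketch of the standard 8-neighbor LBP histogram (the generic descriptor, not the paper's VS-LBP variant) shows the kind of feature vector such a binary SVM would consume; the bin count and normalization are illustrative choices.

```python
import numpy as np

def lbp_histogram(gray, bins=256):
    """Normalized histogram of basic 8-neighbour Local Binary Pattern
    codes for a grayscale image window: a texture feature vector of the
    kind fed to a binary SVM hand-pose detector."""
    g = np.asarray(gray, dtype=int)
    c = g[1:-1, 1:-1]                       # interior pixels (centers)
    code = np.zeros_like(c)
    # Eight neighbours, each comparison weighted by a power of two.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(shifts):
        n = g[1 + dy: g.shape[0] - 1 + dy, 1 + dx: g.shape[1] - 1 + dx]
        code += (n >= c) << k               # bit k set if neighbour >= center
    hist, _ = np.histogram(code, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)        # L1-normalized feature vector
```

Each sliding window of a frame would be described this way and scored by the SVM to propose hand-pose detections.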
Recently, broadcast 3D video content has reached households with the first generation of 3DTV. However, few studies have analyzed the Quality of Experience (QoE) perceived by end-users in this scenario. This paper studies the impact of transmission errors in 3DTV, considering that the video is delivered in side-by-side format over a conventional packet-based network. For this purpose, a novel evaluation methodology based on standard single-stimulus methods has been proposed, designed to keep the viewing conditions as close as possible to those of the home environment. The effects of packet losses on monoscopic and stereoscopic videos are compared using the results of subjective assessment tests. Other aspects specific to 3D content, such as naturalness, sense of presence, and visual fatigue, were also measured. The results show that although the final perceived QoE is acceptable, some errors cause significant binocular rivalry and, therefore, substantial visual discomfort.
Hand gestures are one of the main alternatives for human-computer interaction. For this reason, a hand gesture recognition system using near-infrared imagery acquired by a Leap Motion sensor is proposed. The system characterizes the hand gesture directly by computing a global image descriptor, called Depth Spatiograms of Quantized Patterns, without any hand segmentation stage. To deal with the high dimensionality of the image descriptor, a Compressive Sensing framework is applied, obtaining a manageable image feature vector that almost preserves the original information. Finally, the resulting reduced image descriptors are analyzed by a set of Support Vector Machines to identify the performed gesture independently of the precise hand location in the image. Promising results have been achieved using a new hand-based near-infrared database.
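The Compressive Sensing reduction step can be sketched as a random projection: a high-dimensional, approximately sparse descriptor x is mapped to a much shorter measurement vector y = Φx with a random Gaussian sensing matrix, which nearly preserves distances between sparse signals. The dimensions and the Gaussian choice of Φ here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def compress_descriptor(x, m, seed=0):
    """Reduce an n-dimensional descriptor x to m measurements
    y = Phi @ x using a random Gaussian sensing matrix Phi
    (a standard Compressive Sensing dimensionality reduction)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Rows scaled by 1/sqrt(m) so norms of sparse signals are
    # approximately preserved.
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return phi @ x

# Illustrative: a sparse 4096-dim descriptor reduced to 256 measurements.
x = np.zeros(4096)
x[[10, 500, 3000]] = [1.0, -2.0, 0.5]
y = compress_descriptor(x, m=256)
print(y.shape)  # (256,)
```

The shortened vector y is what a downstream SVM would classify, at a fraction of the original cost.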
A depth-based face recognition algorithm specially adapted to the high range resolution data acquired by the new Microsoft Kinect 2 sensor is presented. A novel descriptor, called the Depth Local Quantized Pattern, has been designed to exploit the extended range resolution of the new sensor. This descriptor is a substantial modification of the popular Local Binary Pattern algorithm; one of its main contributions is the introduction of a quantization step, which increases its capacity to distinguish different depth patterns. The proposed descriptor has been used to train and test a Support Vector Machine classifier, which has proven able to accurately recognize different people's faces across a wide range of poses. In addition, a new depth-based face database acquired with the Kinect 2 sensor has been created and made public to evaluate the proposed face recognition system.
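The core idea of replacing LBP's binary comparison with a quantization step can be sketched as follows: each neighbor-center depth difference is quantized into one of several levels, so the local code distinguishes more depth patterns than a binary threshold. The number of levels, the threshold values, and the base-levels encoding below are illustrative assumptions; the paper's actual Depth Local Quantized Pattern may differ.

```python
import numpy as np

def quantized_depth_pattern(depth, thresholds=(-20.0, 20.0)):
    """Quantized local depth pattern sketch: each neighbour-center depth
    difference is mapped to one of len(thresholds)+1 levels (instead of
    LBP's binary comparison), and the eight levels are packed into one
    code per pixel with a base-`levels` positional encoding."""
    d = np.asarray(depth, dtype=float)
    c = d[1:-1, 1:-1]                       # interior pixels (centers)
    levels = len(thresholds) + 1
    code = np.zeros(c.shape, dtype=int)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(shifts):
        n = d[1 + dy: d.shape[0] - 1 + dy, 1 + dx: d.shape[1] - 1 + dx]
        q = np.digitize(n - c, thresholds)  # quantized level 0..levels-1
        code += q * levels**k
    return code
```

A histogram of these codes over face regions would then serve as the feature vector for the SVM classifier.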