Alexander Vakhitov scite author profile

Abstract-Low textured scenes are well known to be one of the main Achilles heels of geometric computer vision algorithms relying on point correspondences, and in particular for visual SLAM. Yet, there are many environments in which, despite being low textured, one can still reliably estimate line-based geometric primitives, for instance in city and indoor scenes, or in the so-called "Manhattan worlds", where structured edges are predominant. In this paper we propose a solution to handle these situations. Specifically, we build upon ORB-SLAM, presumably the current state-of-the-art solution both in terms of accuracy as efficiency, and extend its formulation to simultaneously handle both point and line correspondences. We propose a solution that can even work when most of the points are vanished out from the input images, and, interestingly it can be initialized from solely the detection of line correspondences in three consecutive frames. We thoroughly evaluate our approach and the new initialization strategy on the TUM RGB-D benchmark and demonstrate that the use of lines does not only improve the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequence frames combining points and lines, without compromising the efficiency.

show abstract

Textured Neural Avatars

Shysheya

et al. 2019

View full text Add to dashboard Cite

Figure 1: We propose a new model for neural rendering of humans. The model is trained for a single person and can produce renderings of this person from novel viewpoints (top) or in the new body pose (bottom) unseen during training. To improve generalization, our model retains explicit texture representation, which is learned alongside the rendering neural network. AbstractWe present a system for learning full-body neural avatars, i.e. deep networks that produce full-body renderings of a person for varying body pose and camera position. Our system takes the middle path between the classical graphics pipeline and the recent deep learning approaches that generate images of humans using image-to-image translation. In particular, our system estimates an explicit twodimensional texture map of the model surface. At the same time, it abstains from explicit shape modeling in 3D. Instead, at test time, the system uses a fully-convolutional network to directly map the configuration of body feature points w.r.t. the camera to the 2D texture coordinates of individual pixels in the image frame. We show that such a system is capable of learning to generate realistic renderings while being trained on videos annotated with 3D poses and foreground masks. We also demonstrate that maintaining an explicit texture representation helps our system to achieve better generalization compared to systems that use direct image-to-image translation.

show abstract

Coordinate-Based Texture Inpainting for Pose-Guided Human Image Generation

et al. 2019

View full text Add to dashboard Cite

We present a new deep learning approach to pose-guided resynthesis of human photographs. At the heart of the new approach is the estimation of the complete body surface texture based on a single photograph. Since the input photograph always observes only a part of the surface, we suggest a new inpainting method that completes the texture of the human body. Rather than working directly with colors of texture elements, the inpainting network estimates an appropriate source location in the input image for each element of the body surface. This correspondence field between the input image and the texture is then further warped into the target image coordinate frame based on the desired pose, effectively establishing the correspondence between the source and the target view even when the pose change is drastic. The final convolutional network then uses the established correspondence and all other available information to synthesize the output image. A fully-convolutional architecture with deformable skip connections guided by the estimated correspondence field is used. We show state-ofthe-art result for pose-guided image synthesis. Additionally, we demonstrate the performance of our system for garment transfer and pose-guided face resynthesis.

show abstract

Accurate and Linear Time Pose Estimation from Points and Lines

Vakhitov

Funke

Moreno-Noguer

2016

View full text Add to dashboard Cite

Abstract. The Perspective-n-Point (PnP) problem seeks to estimate the pose of a calibrated camera from n 3D-to-2D point correspondences. There are situations, though, where PnP solutions are prone to fail because feature point correspondences cannot be reliably estimated (e.g. scenes with repetitive patterns or with low texture). In such scenarios, one can still exploit alternative geometric entities, such as lines, yielding the so-called Perspective-n-Line (PnL) algorithms. Unfortunately, existing PnL solutions are not as accurate and efficient as their point-based counterparts. In this paper we propose a novel approach to introduce 3D-to-2D line correspondences into a PnP formulation, allowing to simultaneously process points and lines. For this purpose we introduce an algebraic line error that can be formulated as linear constraints on the line endpoints, even when these are not directly observable. These constraints can then be naturally integrated within the linear formulations of two state-of-the-art point-based algorithms, the OPnP [45] and the EPnP [24], allowing them to indistinctly handle points, lines, or a combination of them. Exhaustive experiments show that the proposed formulation brings remarkable boost in performance compared to only point or only line based solutions, with a negligible computational overhead compared to the original OPnP and EPnP.

show abstract

StylePeople: A Generative Model of Fullbody Human Avatars

Grigorev

Iskakov

Ianina

et al. 2021

View full text Add to dashboard Cite

Learnable Line Segment Descriptor for Visual SLAM

Vakhitov

Lempitsky

2019

IEEE Access

View full text Add to dashboard Cite

Traditionally, the indirect visual motion estimation and simultaneous localization and mapping (SLAM) systems were based on point features. In recent years, several SLAM systems that use lines as primitives were suggested. Despite the extra robustness and accuracy brought by the line segment matching, the line segment descriptors used in such systems were hand-crafted, and therefore sub-optimal. In this paper, we suggest applying descriptor learning to construct line segment descriptors optimized for matching tasks. We show how such descriptors can be constructed on top of a deep yet lightweight fully-convolutional neural network. The coefficients of this network are trained using an automatically collected dataset of matching and non-matching line segments. The use of the fully-convolutional network ensures that the bulk of the computations needed to compute descriptors is shared among the multiple line segments in the same image, enabling efficient implementation. We show that the learned line segment descriptors outperform the previously suggested hand-crafted line segment descriptors both in isolation (i.e., for the subtask of distinguishing matching and non-matching line segments), but also when built into the SLAM system. We construct a new line based SLAM pipeline built upon a state-of-the-art point-only system. We demonstrate generalization of the learned parameters of the descriptor network between two well-known datasets for autonomous driving and indoor micro aerial vehicle navigation.

show abstract

Stereo Relative Pose from Line and Point Feature Triplets

Vakhitov

Lempitsky

Zheng

2018

View full text Add to dashboard Cite

Stereo relative pose problem lies at the core of stereo visual odometry systems that are used in many applications. In this work we present two minimal solvers for the stereo relative pose. We specifically consider the case when a minimal set consists of three point or line features and each of them has three known projections on two stereo cameras. We validate the importance of this formulation for practical purposes in our experiments with motion estimation. We then present a complete classification of minimal cases with three point or line correspondences each having three projections, and present two new solvers that can handle all such cases. We demonstrate a considerable effect from the integration of the new solvers into a visual SLAM system.

show abstract

Adaptive control of siso plant with time-varying coefficients based on random test perturbation

Vakhitov

Vlasov

Granichin

2010

View full text Add to dashboard Cite

12 3 4 5

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.