Figure 1: A few results from our VRN-Guided method, across a full range of poses, including large expressions.

Abstract: 3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges, such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general, these methods require complex and inefficient pipelines for model building and fitting. In this work, we propose to address many of these limitations by training a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. Our CNN works with just a single 2D facial image, requires neither accurate alignment nor dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face), bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model. We achieve this via a simple CNN architecture that performs direct regression of a volumetric representation of the 3D facial geometry from a single 2D image. We also demonstrate how the related task of facial landmark localization can be incorporated into the proposed framework and help improve reconstruction quality, especially for the cases of large poses and facial expressions. Code and models will be made available at
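The volumetric representation that the CNN regresses can be illustrated with a small sketch. The snippet below is not the authors' code; it is a minimal NumPy illustration, using a hypothetical `voxelize` helper and assuming surface points normalized to the unit cube, of how a 3D facial surface becomes a binary occupancy volume that a network could regress:

```python
import numpy as np

def voxelize(points, grid=(192, 192, 200)):
    """Convert 3D surface points (N, 3), with coordinates in [0, 1),
    into a binary occupancy volume of shape `grid`.

    This mirrors the idea of a volumetric regression target: voxels
    touched by the facial surface are 1, the rest are 0 (a full method
    would also fill the interior; we mark only surface voxels here).
    """
    dims = np.array(grid)
    vol = np.zeros(grid, dtype=np.float32)
    # map continuous coordinates to integer voxel indices, clamped to the grid
    idx = np.clip((points * (dims - 1)).astype(int), 0, dims - 1)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vol

# 500 random "surface" points stand in for a reconstructed facial surface
pts = np.random.rand(500, 3)
v = voxelize(pts)
print(v.shape)  # (192, 192, 200)
```

A network trained against such a target can treat each depth slice as a 2D map and recover the full (including self-occluded) geometry by thresholding the predicted volume.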
In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bi-directional recurrent networks such as BiLSTMs combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple self-attention-based network for video summarization which performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.
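The soft self-attention at the heart of such a network can be sketched independently of any particular architecture. The following is a minimal NumPy sketch (not the authors' implementation) of a single-head self-attention pass over per-frame features, where queries, keys, and values are all the frame descriptors themselves:

```python
import numpy as np

def self_attention(features, d_k=None):
    """Single-head soft self-attention over a sequence of frame features.

    features: (T, D) array, one D-dimensional descriptor per frame.
    Returns the attended features (T, D) and attention weights (T, T).
    """
    T, D = features.shape
    d_k = d_k or D
    # pairwise similarities, scaled as in scaled dot-product attention
    scores = features @ features.T / np.sqrt(d_k)          # (T, T)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)          # row-wise softmax
    attended = weights @ features                          # weighted mixture
    return attended, weights

X = np.random.randn(10, 16)          # 10 frames, 16-dim descriptors
out, w = self_attention(X)           # each row of w sums to 1
```

Because the whole transformation is a few matrix multiplications, one forward and one backward pass suffice during training, in contrast to the step-by-step recurrence of a BiLSTM.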
We present a robust FFT-based approach to scale-invariant image registration. Our method relies on FFT-based correlation twice: once in the log-polar Fourier domain to estimate the scaling and rotation, and once in the spatial domain to recover the residual translation. Previous methods based on the same principles are not robust. To equip our scheme with robustness and accuracy, we introduce modifications which tailor the method to the nature of images. First, we derive efficient log-polar Fourier representations by replacing image functions with complex gray-level edge maps. We show that this representation both captures the structure of salient image features and circumvents problems related to the low-pass nature of images, interpolation errors, border effects, and aliasing. Second, to recover the unknown parameters, we introduce the normalized gradient correlation. We show that, by using image gradients to perform correlation, the errors induced by outliers are mapped to a uniform distribution, for which our normalized gradient correlation exhibits robust performance. Exhaustive experimentation with real images showed that, unlike other Fourier-based correlation techniques, the proposed method was able to estimate translations, arbitrary rotations, and scale factors up to 6.
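The translation-recovery stage rests on FFT-based correlation; plain phase correlation illustrates the principle. The sketch below is a simplified NumPy version of that idea, not the paper's normalized gradient correlation:

```python
import numpy as np

def phase_correlate(a, b):
    """Recover the integer translation taking image b to image a.

    Computes the normalized cross-power spectrum; its inverse FFT is a
    delta-like peak at the translation.
    """
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    F /= np.abs(F) + 1e-12                       # keep phase only
    corr = np.real(np.fft.ifft2(F))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the half-size wrap around to negative shifts
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

img = np.zeros((64, 64))
img[20:30, 25:35] = 1.0
moved = np.roll(np.roll(img, 5, axis=0), -7, axis=1)
print(phase_correlate(moved, img))  # (5, -7)
```

Running the same correlation on log-polar resampled Fourier magnitudes turns rotation and scaling into translations, which is how the first stage of such schemes estimates them.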
Unmanned aerial vehicles (UAVs) have enormous potential to enable new applications in areas ranging from military, security, medicine, and surveillance to traffic monitoring. Lately, there has been heavy investment in the development of UAVs and multi-UAV systems that can collaborate and complete missions more efficiently and economically. Emerging technologies such as 4G/5G networks can enable UAVs equipped with cameras, sensors, and GPS receivers to deliver Internet of Things (IoT) services from great heights, creating an airborne domain of the IoT. However, many issues must be resolved before UAVs can be used effectively, including security, privacy, and management. As such, in this paper we review new UAV application areas enabled by IoT and 5G technologies, analyze the sensor requirements, and survey solutions to the challenges of fleet management, aerial networking, privacy, and security. Finally, we propose a framework that supports and enables these technologies on UAVs. The introduced framework provisions a holistic IoT architecture that enables the protection of UAVs as “flying” things in a collaborative networked environment.
In this paper we present a new database suitable for both 2D and 3D face recognition based on photometric stereo: the Photoface database. It was collected using a custom-made four-source photometric stereo device that can be easily deployed in commercial settings. Unlike other publicly available databases, the level of cooperation required between subjects and the capture mechanism was minimal. The proposed device may also be used to capture expressive 3D faces. Apart from describing the device and the Photoface database, we present baseline face recognition and verification experiments using albedo, surface normals, and the recovered depth maps. Finally, we conducted experiments to demonstrate how different methods in the photometric stereo pipeline (i.e., normal field computation and depth map reconstruction) affect recognition/verification performance.
In this paper we present the design and evaluation of an end-to-end trainable deep neural network with a visual attention mechanism for memorability estimation in still images. We analyze the suitability of transferring deep models from image classification to the memorability task. We further study the impact of the attention mechanism on memorability estimation and evaluate our network on the SUN Memorability and LaMem datasets. Our network outperforms existing state-of-the-art models on both datasets in terms of Spearman's rank correlation as well as mean squared error, closely matching human consistency.
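Spearman's rank correlation, the evaluation metric mentioned above, is simply the Pearson correlation computed on ranks. A minimal NumPy sketch follows (assuming no tied values; a library routine such as `scipy.stats.spearmanr` handles ties properly):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation for tie-free samples.

    Ranks each sample via argsort-of-argsort, then computes the
    Pearson correlation of the centered ranks.
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

a = np.array([1.0, 2.0, 3.0, 4.0])
print(spearman_rho(a, a ** 3))  # a monotone map preserves ranks -> 1.0
```

Because it depends only on rank order, the metric rewards predicting the correct ordering of images by memorability rather than exact score values, which is why it is paired with mean squared error.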