In this work, we present a novel approach for training Generative Adversarial Networks (GANs). Using the attention maps produced by a Teacher-Network, we are able to improve the quality of the generated images as well as perform weak object localization on the generated images. To this end, we generate images of HEp-2 cells captured with Indirect Immunofluorescence (IIF) and study the ability of our network to perform a weak localization of the cell. Firstly, we demonstrate that whilst GANs can learn the mapping between the input domain and the target distribution efficiently, the discriminator network is not able to detect the regions of interest. Secondly, we present a novel attention transfer mechanism which allows us to force the discriminator to put emphasis on the regions of interest via transfer learning. Thirdly, we show that this leads to more realistic images, as the discriminator learns to put emphasis on the area of interest. Fourthly, the proposed method allows one to generate both images and attention maps, which can be useful for data annotation, e.g., in object detection.
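The abstract does not state how the attention maps are computed or compared. As a minimal sketch of one common form of attention transfer, assume an attention map is the channel-wise sum of squared activations, flattened and L2-normalized, and that the transfer loss is the squared distance between the teacher and student (here, discriminator) maps. The function names, tensor shapes, and random inputs are all hypothetical.

```python
import numpy as np


def attention_map(features, eps=1e-8):
    """Collapse a (C, H, W) activation tensor into a unit-norm vector of length H*W.

    Assumption: attention = sum over channels of squared activations.
    """
    amap = np.sum(features ** 2, axis=0).reshape(-1)
    return amap / (np.linalg.norm(amap) + eps)


def attention_transfer_loss(student_features, teacher_features):
    """Squared L2 distance between normalized student and teacher attention maps.

    The two tensors may differ in channel count but must share H and W.
    """
    diff = attention_map(student_features) - attention_map(teacher_features)
    return float(np.sum(diff ** 2))


# Hypothetical activations: a 16-channel teacher layer and a 4-channel student layer.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((16, 8, 8))
student = rng.standard_normal((4, 8, 8))

loss_same = attention_transfer_loss(teacher, teacher)  # identical maps give zero loss
loss_diff = attention_transfer_loss(student, teacher)  # differing maps give a positive loss
```

In a training loop, a term of this kind would be added to the discriminator objective so that its spatial emphasis is pulled toward that of the teacher; the weighting of that term is not given in the abstract.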
In this work, a feature extraction method for offline signature verification is presented that harnesses the power of sparse representation in order to deliver state-of-the-art verification performance on several signature datasets such as CEDAR, MCYT-75, GPDS and UTSIG. Beyond the accuracy improvements, several major parameters associated with sparse representation, such as the selected configuration, dictionary size, sparsity level and positivity priors, are investigated. In addition, it is shown that the second-order statistics of the sparse codes are a powerful pooling function for the formation of the global signature descriptor. Also, a thorough evaluation of the effects of preprocessing is conducted through an automated algorithm that selects the optimum thinning level. Finally, a segmentation strategy which employs a special form of spatial pyramid tailored to the problem of sparse representation is presented, along with an enhancement of the produced descriptor on meaningful areas of the signature, as identified by the BRISK key-point detection mechanism. The obtained state-of-the-art results on the most challenging signature datasets provide a strong indication of the benefits of learned features, even in writer-dependent (WD) scenarios with a unique model for each writer and only a few available reference samples per writer.
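The abstract does not define the second-order pooling function. One plausible reading, sketched here purely as an assumption, is to pool the per-patch sparse codes by their mean squared coefficient for each dictionary atom and then L2-normalize the result. The function name and the toy code matrix are hypothetical.

```python
import numpy as np


def second_order_pool(codes):
    """Pool an (N patches, K atoms) matrix of sparse codes into a (K,) descriptor.

    Assumption: the second-order statistic is the mean squared coefficient
    per atom, followed by L2 normalization of the pooled vector.
    """
    descriptor = np.mean(codes ** 2, axis=0)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Toy sparse codes: 4 patches over a 3-atom dictionary; atom 0 is never used.
codes = np.array([
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
])
descriptor = second_order_pool(codes)
```

Unlike average pooling, a squared statistic ignores the sign of each coefficient and weights strongly activated atoms more heavily, which is one reason such statistics are often considered for global descriptors.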
Lip reading (LR) is the task of predicting speech using only the visual information of the speaker. In this work, for the first time, the benefits of alternating between spatiotemporal and spatial convolutions for learning effective features from LR sequences are studied. In this context, a new learnable module named ALSOS (Alternating Spatiotemporal and Spatial Convolutions) is introduced in the proposed LR system. The ALSOS module consists of spatiotemporal (3D) and spatial (2D) convolutions along with two conversion components (3D-to-2D and 2D-to-3D) providing a sequence-to-sequence mapping. The designed LR system utilizes the ALSOS module in between ResNet blocks, as well as Temporal Convolutional Networks (TCNs) in the backend for classification. The whole framework is composed of feedforward convolutional and residual layers and can be trained end-to-end directly from the image sequences in the word-level LR problem. The ALSOS module can capture spatiotemporal dynamics and can be advantageous in the task of LR when combined with the ResNet topology. Experiments with different combinations of ALSOS with ResNet are performed on a dataset in the Greek language simulating a medical support application scenario and on the popular large-scale LRW-500 dataset of English words. Results indicate that the proposed ALSOS module can improve the performance of an LR system. Overall, the insertion of the ALSOS module into the ResNet architecture yields higher classification accuracy, since it incorporates the contribution of the temporal information captured at different spatial scales of the framework.
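The abstract names the 3D-to-2D and 2D-to-3D conversion components without describing them. A minimal sketch, assuming the common convention of folding the time axis into the batch axis so that 2D convolutions see individual frames, is shown below. The function names and the (batch, channels, time, height, width) layout are assumptions, and the convolutions themselves are omitted.

```python
import numpy as np


def to_2d(x):
    """(B, C, T, H, W) -> (B*T, C, H, W): fold time into the batch axis."""
    b, c, t, h, w = x.shape
    return x.transpose(0, 2, 1, 3, 4).reshape(b * t, c, h, w)


def to_3d(x, batch_size):
    """(B*T, C, H, W) -> (B, C, T, H, W): restore the time axis."""
    bt, c, h, w = x.shape
    t = bt // batch_size
    return x.reshape(batch_size, t, c, h, w).transpose(0, 2, 1, 3, 4)


# Hypothetical clip batch: 2 sequences, 3 channels, 4 frames of 5x5 pixels.
clip = np.arange(2 * 3 * 4 * 5 * 5, dtype=np.float32).reshape(2, 3, 4, 5, 5)
frames = to_2d(clip)                    # each row is one frame, ready for 2D layers
restored = to_3d(frames, batch_size=2)  # back to a sequence for 3D layers
```

Because the two conversions are exact inverses, a stack can alternate between 3D and 2D layers while still mapping an input sequence to an output sequence of the same length.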