Clothing retrieval is a challenging problem in computer vision. With the advance of Convolutional Neural Networks (CNNs), the accuracy of clothing retrieval has improved significantly. FashionNet [1], a recent study, employs a set of hand-defined features in the form of landmarks, which are shown to be helpful for retrieval. However, the landmark detection module is trained with strong supervision, and the required annotations take considerable effort to obtain. In this paper, we propose a self-learning Visual Attention Model (VAM) to extract attention maps from clothing images. The VAM is connected to a global network to form an end-to-end network structure through an Impdrop connection, which randomly applies Dropout to the feature maps with probabilities given by the attention map. Extensive experiments on several widely used benchmark clothing retrieval datasets demonstrate the promise of the proposed method. We also show that, compared to the trivial Product connection, the Impdrop connection makes the network structure more robust when training sets of limited size are used.
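The difference between the Product and Impdrop connections can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the code assumes that the attention value at each location is the probability of keeping that location, that one mask is shared across all channels, and that the Product form is used outside training. The function names and the list-based feature layout (channels × height × width) are hypothetical.

```python
import random


def product_connection(features, attention):
    """Deterministic baseline: scale each location by its attention value."""
    return [
        [[v * a for v, a in zip(row, att_row)]
         for row, att_row in zip(channel, attention)]
        for channel in features
    ]


def impdrop_connection(features, attention, rng, training=True):
    """Attention-guided dropout (sketch under the assumptions stated above).

    Assumption: attention[i][j] in [0, 1] is the KEEP probability of
    location (i, j), and the sampled mask is shared by every channel.
    """
    if not training:
        # Assumption: fall back to the expected value of the mask.
        return product_connection(features, attention)
    # Sample one Bernoulli keep/drop decision per spatial location.
    mask = [
        [1.0 if rng.random() < a else 0.0 for a in att_row]
        for att_row in attention
    ]
    return [
        [[v * m for v, m in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in features
    ]


# One channel, 2x2 feature map; attention of 1.0 always keeps, 0.0 always drops.
features = [[[1.0, 2.0], [3.0, 4.0]]]
attention = [[1.0, 0.0], [1.0, 0.0]]
print(impdrop_connection(features, attention, random.Random(0)))
# prints [[[1.0, 0.0], [3.0, 0.0]]]
```

With intermediate attention values the training-time output varies from call to call, which is the source of the regularization effect the abstract attributes to Impdrop; the Product connection always produces the same scaled map.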
Understanding human visual attention is important for multimedia applications. Many studies have attempted to build saliency prediction models for natural images. However, limited effort has been devoted to saliency prediction for Web pages, which are characterized by diverse content elements and spatial layouts. In this paper, we propose a novel end-to-end deep generative saliency model for Web pages. To capture position biases introduced by page layouts, a Position Prior Learning (PPL) sub-network is proposed, which models the position biases with a variational auto-encoder. To model different elements of a Web page, a Multi Discriminative Region Detection (MDRD) branch and a Text Region Detection (TRD) branch are introduced, which extract discriminative localizations and prominent text regions, respectively. We validate the proposed model on the public Web-page dataset 'FIWI' and show that it outperforms state-of-the-art models for Web-page saliency prediction.