Matching images and sentences demands a fine-grained understanding of both modalities. In this article, we propose a new system that discriminatively embeds images and text into a shared visual-textual space. Most existing works in this field apply the ranking loss to pull positive image/text pairs together and push negative pairs apart. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning of training. A naive use of the ranking loss may therefore hinder the network from learning inter-modal relationships. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on the unsupervised assumption that each image/text group can be viewed as a class, so the network can learn fine-grained distinctions from every image/text group. Experiments show that the instance loss offers a better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. In addition, existing works usually apply off-the-shelf features, i.e., word2vec and fixed visual features. As a minor contribution, this article therefore constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to learn directly from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
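The two objectives named in this abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names, the cosine similarity, the margin value of 0.2, and the use of one classifier weight vector per image/text group are all assumptions made for illustration. The ranking loss is written as a bidirectional hinge loss over a batch of matched pairs, and the instance loss as a softmax cross-entropy in which each group is its own class.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(img, txt, margin=0.2):
    """Bidirectional hinge ranking loss; img[i] and txt[i] form a positive pair.

    The margin value is an assumed hyperparameter, not taken from the abstract.
    """
    n = len(img)
    sim = [[cosine(i_feat, t_feat) for t_feat in txt] for i_feat in img]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should be closer to its own text than to text j.
            total += max(0.0, margin + sim[i][j] - sim[i][i])
            # Text i should be closer to its own image than to image j.
            total += max(0.0, margin + sim[j][i] - sim[i][i])
    return total / n


def instance_loss(feats, group_ids, weights):
    """Softmax cross-entropy where every image/text group is one class.

    weights[k] is a hypothetical classifier vector for group k.
    """
    total = 0.0
    for x, y in zip(feats, group_ids):
        logits = [sum(a * b for a, b in zip(x, w)) for w in weights]
        top = max(logits)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[y]
    return total / len(feats)


if __name__ == "__main__":
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(ranking_loss(feats, feats))
    print(instance_loss(feats, [0, 1, 2], [[0.0] * 3 for _ in range(3)]))
```

In this sketch the instance loss needs no triplets at all, which is one way to read the abstract's claim that it gives the ranking loss a better starting point.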
Interfacial electron transfer between cocatalyst and photosensitizer is key in heterogeneous photocatalysis, yet the underlying mechanism remains subtle and unclear. Surfactant coated on the metal cocatalysts, greatly modulating the microenvironment of catalytic sites, is largely ignored. Herein, a series of Pt co-catalysts with modulated microenvironments,
This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Experimental results on various datasets demonstrate the effectiveness of our approach.
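The difference between purely local patch matching and matching under a global constraint can be sketched as follows. The abstract does not specify the form of the constraint, so this example assumes a one-to-one assignment between probe and gallery patches and solves it by brute force over permutations, which is only practical for a handful of patches. The names `sim` (patch similarities) and `prob` (matching probabilities from a learned correspondence structure) are hypothetical.

```python
from itertools import permutations


def weighted(sim, prob):
    """Weight each patch similarity by its matching probability."""
    return [[s * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(sim, prob)]


def local_score(sim, prob):
    """Each probe patch independently picks its best gallery patch."""
    return sum(max(row) for row in weighted(sim, prob))


def global_score(sim, prob):
    """Best total score when every gallery patch is used at most once.

    Assumes a square matrix and a one-to-one constraint (an assumption,
    not a detail stated in the abstract). Brute force, for illustration only.
    """
    w = weighted(sim, prob)
    n = len(w)
    best_perm = max(permutations(range(n)),
                    key=lambda perm: sum(w[i][perm[i]] for i in range(n)))
    return sum(w[i][best_perm[i]] for i in range(n)), best_perm


if __name__ == "__main__":
    sim = [[0.9, 0.8], [0.9, 0.1]]
    prob = [[1.0, 1.0], [1.0, 1.0]]
    print(local_score(sim, prob))
    print(global_score(sim, prob))
```

In the toy input both probe patches prefer the same gallery patch when matched locally; the one-to-one constraint forces them onto different patches, which is the kind of misalignment a global constraint is meant to rule out.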
Hexagonal boron nitride (h-BN) has recently been reported to be a highly selective catalyst for the oxidative dehydrogenation of propane (ODHP) to olefins. In addition to propene, ethylene also forms, with much higher overall selectivities to C2-products than to C1-products. In this work, we report that the reaction pathways over the h-BN catalyst differ from those over V-based catalysts in ODHP. Oxidative coupling of methyl, an intermediate from the cleavage of the C─C bond of propane, contributes to the high selectivities to C2-products, leading to more C2-products than C1-products over the h-BN catalyst. This work not only provides insight into the reaction mechanisms involved in ODHP over boron-based catalysts but also sheds light on the selective oxidation of alkanes, such as the direct oxidative upgrading of methane to ethylene or CHxOy on boron-based catalysts.