Mingliang Xu scite author profile

Matching images and sentences demands a fine-grained understanding of both modalities. In this article, we propose a new system that discriminatively embeds images and text into a shared visual-textual space. Most existing works in this field apply the ranking loss to pull positive image/text pairs close and push negative pairs apart. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning of training. The naive use of the ranking loss may therefore keep the network from learning inter-modal relationships. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image/text group can be viewed as a class, so the network can learn fine-grained distinctions from every image/text group. Experiments show that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. In addition, existing works usually apply off-the-shelf features, i.e., word2vec and fixed visual features. As a minor contribution, this article therefore constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to learn directly from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
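The two losses named in the abstract above can be illustrated with a minimal sketch. This is not the authors' released code. It assumes cosine similarity for the ranking loss, an arbitrary margin of 0.2, and a single classifier weight matrix applied to both image and text embeddings. The function names (`cosine`, `ranking_loss`, `instance_loss`) and the toy vectors are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(anchor, positive, negative, margin=0.2):
    """Hinge ranking loss for one triplet (margin value is an assumption).

    The loss is zero once the positive pair is more similar than the
    negative pair by at least `margin`.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def instance_loss(embedding, class_weights, group_id):
    """Softmax cross-entropy where each image/text group is its own class.

    `class_weights` holds one weight row per group; using the same rows for
    image and text embeddings is an assumption of this sketch.
    """
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in class_weights]
    top = max(logits)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[group_id]


# Toy data: three image/text groups embedded in a 2-D shared space.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
image_embedding = [0.9, 0.1]  # hypothetical image from group 0
text_embedding = [0.8, 0.2]   # hypothetical caption from group 0

total = (
    instance_loss(image_embedding, weights, 0)
    + instance_loss(text_embedding, weights, 0)
    + ranking_loss(image_embedding, text_embedding, [0.0, 1.0])
)
```

Because the instance loss needs no triplets, it can be minimized first and the ranking term added afterwards, which is one way to read the abstract's remark about better weight initialization.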


Attention Scaling for Crowd Counting

Jiang

Zhang

et al. 2020

220

107


Interfacial Microenvironment Modulation Boosting Electron Transfer between Metal Nanoparticles and MOFs for Enhanced Photocatalysis

Sun

et al. 2021

Angew Chem Int Ed

174


Person Re-Identification with Correspondence Structure Learning

et al. 2015


This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Experimental results on various datasets demonstrate the effectiveness of our approach.


Propane oxidative dehydrogenation over highly selective hexagonal boron nitride catalysts: The role of oxidative coupling of methyl

Tian

Tan

et al. 2019

Sci. Adv.


Hexagonal boron nitride (h-BN) catalyst has recently been reported to be highly selective in oxidative dehydrogenation of propane (ODHP) for olefin production. In addition to propene, ethylene also forms, with much higher overall selectivities to C2-products than to C1-products. In this work, we report that the reaction pathways over the h-BN catalyst differ from those over V-based catalysts in ODHP. Oxidative coupling of methyl, an intermediate from the cleavage of the C─C bond of propane, contributes to the high selectivities to C2-products, leading to more C2-products than C1-products over the h-BN catalyst. This work not only provides insight into the reaction mechanisms involved in ODHP over boron-based catalysts but also sheds light on the selective oxidation of alkanes, such as the direct oxidative upgrading of methane to ethylene or CHxOy on boron-based catalysts.


A cloud image detection method based on SVM vector machine

et al. 2015


DerainCycleGAN: Rain Attentive CycleGAN for Single Image Deraining and Rainmaking

Wei

Zhang

Wang

et al. 2021

IEEE Trans. on Image Process.

125



Mingliang Xu

Attention-Guided Hierarchical Structure Aggregation for Image Matting

Dual-path Convolutional Image-Text Embeddings with Instance Loss
