Xiangxi Shi scite author profile

Image manipulation with natural language, which aims to manipulate images under the guidance of language descriptions, has been a challenging problem in the fields of computer vision and natural language processing (NLP). A number of efforts have been made on this task, but their performance is still far from generating realistic manipulated images that conform to the text. Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), in which a set of memories learned from images is introduced to synthesize texture information under the guidance of the textual description. We propose a two-stage network with an additional reconstruction stage to learn the latent memories efficiently. To avoid unnecessary background changes, we propose a Target Localization Unit (TLU) that focuses the manipulation on the region mentioned by the text. Moreover, to learn a robust memory, we further propose a novel randomized memory training loss. Experiments on four popular datasets show that our method outperforms existing ones.

CCS Concepts: General and reference → Document types; Computing methodologies → Computer vision.


Video captioning with boundary-aware hierarchical language decoding and joint video prediction

Shi, Cai, et al. 2020, Neurocomputing


Learning Meta-class Memory for Few-Shot Semantic Segmentation

Shi, Lin, et al. 2021, Preprint


Currently, state-of-the-art methods treat the few-shot semantic segmentation task as a conditional foreground-background segmentation problem, assuming each class is independent. In this paper, we introduce the concept of meta-class, which is the meta information (e.g., certain middle-level features) shareable among all classes. To explicitly learn meta-class representations in the few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), in which we introduce a set of learnable memory embeddings to memorize the meta-class information during base-class training and transfer it to novel classes during the inference stage. Moreover, for the k-shot scenario, we propose a novel image quality measurement module to select images from the set of support images. A high-quality class prototype can then be obtained as the weighted sum of support image features based on the quality measure. Experiments on both the PASCAL-5i and COCO datasets show that our proposed method achieves state-of-the-art results in both 1-shot and 5-shot settings. In particular, our proposed MM-Net achieves 37.5% mIoU on the COCO dataset in the 1-shot setting, which is 5.1% higher than the previous state-of-the-art.
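The quality-weighted prototype mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `weighted_prototype`, the use of plain Python lists as feature vectors, and the softmax normalization of the quality scores are all assumptions made for illustration. The abstract only states that the prototype is a weighted sum of support image features based on a quality measure.

```python
import math


def weighted_prototype(support_features, quality_scores):
    """Combine k support feature vectors into a single class prototype.

    support_features: list of k equal-length lists of floats
        (hypothetical pooled features, one per support image).
    quality_scores: list of k floats (hypothetical quality measures,
        higher meaning a more useful support image).
    Returns (weights, prototype).
    """
    if not support_features or len(support_features) != len(quality_scores):
        raise ValueError("need exactly one quality score per support feature vector")

    # Assumption: scores are turned into weights with a softmax.
    # Subtracting the maximum keeps the exponentials numerically stable.
    top = max(quality_scores)
    exps = [math.exp(s - top) for s in quality_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(support_features[0])
    prototype = [0.0] * dim
    for w, feat in zip(weights, support_features):
        if len(feat) != dim:
            raise ValueError("all support feature vectors must have the same length")
        # Accumulate the weighted sum of support features.
        for i, value in enumerate(feat):
            prototype[i] += w * value
    return weights, prototype


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = [0.2, 1.5, 0.9]
    w, proto = weighted_prototype(feats, scores)
    print("weights:", w)
    print("prototype:", proto)
```

With equal quality scores the sketch reduces to a plain average of the support features, which is the usual unweighted k-shot prototype; unequal scores shift the prototype toward the higher-scored support images.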


An iterative method for optical flow estimation with motion blur

Shi, Kang, Cao 2016


Watch It Twice

Shi, Cai, Joty, et al. 2019


Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Shi, et al. 2020, Preprint


Change captioning is a task that aims to describe the difference between images in natural language. Most existing methods treat this problem as a difference judgment in the absence of distractors such as viewpoint changes. However, in practice, viewpoint changes happen often and can overwhelm the semantic difference to be described. In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task. Moreover, we simulate the attention preference of humans and propose a novel reinforcement learning process to fine-tune the attention directly with language evaluation rewards. Extensive experimental results show that our method outperforms state-of-the-art approaches by a large margin on both the Spot-the-Diff and CLEVR-Change datasets.





Remember What You have drawn: Semantic Image Manipulation with Memory
