Tao Xu scite author profile

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-art methods on benchmark datasets demonstrate that the proposed method achieves significant improvements in generating photo-realistic images conditioned on text descriptions.
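The abstract above names Conditioning Augmentation but does not spell out its mechanism. A common reading is that the conditioning vector is sampled from a Gaussian whose mean and variance are predicted from the text embedding, with a KL penalty pulling that Gaussian toward a standard normal. The sketch below illustrates only that reading. The function name, the use of a log-variance parameterization, and the plain-list inputs are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample a conditioning vector and compute a KL regularizer.

    mu, log_var: per-dimension mean and log-variance (assumed to come
    from a learned layer applied to the text embedding; hypothetical here).
    rng: a random.Random instance, passed in so sampling is reproducible.
    """
    # Reparameterized sample: c = mu + sigma * eps, with eps ~ N(0, 1).
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    c = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

    # KL( N(mu, sigma^2) || N(0, I) ) summed over dimensions.
    kl = 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )
    return c, kl
```

Under this reading, sampling a slightly different conditioning vector for the same caption on each draw is what would add diversity, while the KL term keeps nearby captions mapped to overlapping regions of the conditioning space.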

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Zhang, Huang, et al. 2018

1,317

1,548

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different subregions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. It shows for the first time that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Zhang et al. 2019, IEEE Trans. Pattern Anal. Mach. Intell.

853

758

Although Generative Adversarial Networks (GANs) have shown remarkable success in various tasks, they still face challenges in generating high-quality images. In this paper, we propose Stacked Generative Adversarial Networks (StackGANs) aimed at generating high-resolution photo-realistic images. First, we propose a two-stage generative adversarial network architecture, StackGAN-v1, for text-to-image synthesis. The Stage-I GAN sketches the primitive shape and colors of a scene based on a given text description, yielding low-resolution images. The Stage-II GAN takes Stage-I results and the text description as inputs, and generates high-resolution images with photo-realistic details. Second, an advanced multi-stage generative adversarial network architecture, StackGAN-v2, is proposed for both conditional and unconditional generative tasks. Our StackGAN-v2 consists of multiple generators and multiple discriminators arranged in a tree-like structure; images at multiple scales corresponding to the same scene are generated from different branches of the tree. StackGAN-v2 shows more stable training behavior than StackGAN-v1 by jointly approximating multiple distributions. Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.

SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation

et al. 2018

Inspired by classic Generative Adversarial Networks (GANs), we propose a novel end-to-end adversarial neural network, called SegAN, for the task of medical image segmentation. Since image segmentation requires dense, pixel-level labeling, the single scalar real/fake output of a classic GAN's discriminator may be ineffective in producing stable and sufficient gradient feedback to the networks. Instead, we use a fully convolutional neural network as the segmentor to generate segmentation label maps, and propose a novel adversarial critic network with a multi-scale L1 loss function to force the critic and segmentor to learn both global and local features that capture long- and short-range spatial relationships between pixels. In our SegAN framework, the segmentor and critic networks are trained in an alternating fashion in a min-max game: the critic is trained by maximizing a multi-scale loss function, while the segmentor is trained with only gradients passed along by the critic, with the aim of minimizing the multi-scale loss function. We show that such a SegAN framework is more effective and stable for the segmentation task, and it leads to better performance than the state-of-the-art U-net segmentation method. We tested our SegAN method using datasets from the MICCAI BRATS brain tumor segmentation challenge. Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013, SegAN gives performance comparable to the state of the art for whole tumor and tumor core segmentation while achieving better precision and sensitivity for Gd-enhanced tumor core segmentation; on BRATS 2015, SegAN achieves better performance than the state of the art in both dice score and precision.
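The multi-scale L1 loss described above can be pictured as a mean absolute difference between critic features at several layers, averaged across layers. The sketch below is one plausible form under that assumption. The function name, the per-scale normalization, and the representation of features as flat lists of floats are illustrative choices, not details taken from the abstract.

```python
def multi_scale_l1(features_pred, features_true):
    """Average, over scales, of the mean absolute feature difference.

    features_pred / features_true: one flat list of floats per scale,
    assumed to be critic features computed from the predicted and the
    ground-truth segmentation respectively (hypothetical inputs).
    """
    if len(features_pred) != len(features_true):
        raise ValueError("both inputs must have the same number of scales")

    per_scale = []
    for f_pred, f_true in zip(features_pred, features_true):
        if len(f_pred) != len(f_true):
            raise ValueError("feature sizes must match at every scale")
        # Mean absolute error at this scale.
        mae = sum(abs(a - b) for a, b in zip(f_pred, f_true)) / len(f_pred)
        per_scale.append(mae)

    # Equal weight for every scale.
    return sum(per_scale) / len(per_scale)
```

In the min-max game from the abstract, the critic would update its parameters to increase this value and the segmentor would update its parameters to decrease it, so a single quantity drives both networks.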

SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition

et al. 2016

Chinese experts’ consensus on the Internet of Things-aided diagnosis and treatment of coronavirus disease 2019 (COVID-19)

et al. 2020

RSPO2–LGR5 signaling has tumour-suppressive activity in colorectal cancer

Qiu et al. 2014, Nat Commun

R-spondins are a family of secreted Wnt agonists. One of the family members, R-spondin 2 (RSPO2), has an important role in embryonic development, bone formation and myogenic differentiation; however, its role in human cancers remains largely unknown. Here we show that RSPO2 expression is downregulated in human colorectal cancers (CRCs) due to promoter hypermethylation, and that the RSPO2 reduction correlates with tumour differentiation, size and metastasis. Overexpression of RSPO2 suppresses CRC cell proliferation and tumorigenicity, whereas the depletion of RSPO2 enhances tumour cell growth. RSPO2 has an inhibitory effect on Wnt/β-catenin signaling in the CRC cells that show suppressed cell proliferation. In human CRC cells, the RSPO2-induced inhibition of Wnt signaling depends on leucine-rich repeat-containing G-protein-coupled receptor 5 (LGR5); RSPO2 interacts with LGR5 to stabilize the membrane-associated zinc and ring finger 3 (ZNRF3). Our data suggest that RSPO2 functions as a tumour suppressor in human CRCs, and these data reveal an RSPO2-induced, LGR5-dependent Wnt signaling-negative feedback loop that exerts a net growth-suppressive effect on CRC cells.

Multimodal Recurrent Model with Attention for Automated Radiology Report Generation

Xue, Long, et al. 2018

121

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
