Realistic Image Generation using Region-phrase Attention

Huang, Wanming; Xu, Yida; Oppermann, Ian

doi:10.48550/arxiv.1902.05395

Cited by 2 publications (3 citation statements)

References 13 publications (22 reference statements)

“…In [17], pedestrians are edited into predefined scenes using pix2pix, with spatial pyramid pooling in the discriminator for direct scrutiny. Pre-trained R-CNN object detection systems have been used to propose regions during GAN training in [8], or as a feature extractor in object-driven GANs [12]. In contrast, we modify the GAN discriminator itself, and leverage automatically derived RoI data.…”

Section: Related Work (mentioning)

confidence: 99%

Region-Guided CycleGANs for Stain Transfer in Whole Slide Images

Boyd, Villa, Mathieu, et al. 2022

Lecture Notes in Computer Science

In whole slide imaging, commonly used staining techniques based on hematoxylin and eosin (H&E) and immunohistochemistry (IHC) stains accentuate different aspects of the tissue landscape. In the case of detecting metastases, IHC provides a distinct readout that is readily interpretable by pathologists. IHC, however, is a more expensive approach and not available at all medical centers. Virtually generating IHC images from H&E using deep neural networks thus becomes an attractive alternative. Deep generative models such as CycleGANs learn a semantically-consistent mapping between two image domains, while emulating the textural properties of each domain. They are therefore a suitable choice for stain transfer applications. However, they remain fully unsupervised, and possess no mechanism for enforcing biological consistency in stain transfer. In this paper, we propose an extension to CycleGANs in the form of a region of interest discriminator. This allows the CycleGAN to learn from unpaired datasets where, in addition, there is a partial annotation of objects for which one wishes to enforce consistency. We present a use case on whole slide images, where an IHC stain provides an experimentally generated signal for metastatic cells. We demonstrate the superiority of our approach over prior art in stain transfer on histopathology tiles over two datasets. Our code and model are available at https://github.com/jcboyd/miccai2022-roigan.

“…The work of Huang et al. [129] improved the DAMSM loss by introducing true-grid regions inside every bounding box with word phrases, where attention weights depend on the bounding box and phrase information. So, this mechanism extends the regular grid-based attention that utilizes additional phrase features through parts-of-speech tagging besides sentence and word features.…”

Section: Direct T2I (mentioning)

confidence: 99%

“…Similar to [129], Dynamic Aspect-awarE GAN (DAE-GAN) [136] refers to the importance of aspect in the input text. The model represents text information from multiple granularities of sentence-level, word-level, and aspect-level, for which, besides other attention mechanisms, the aspect-aware dynamic re-drawer (ADR) module is employed.…”

Section: Direct T2I (mentioning)

confidence: 99%

A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

Ullah, Lee, et al. 2022

Sensors

For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images and videos) are two distinct data domains with extensive research in the past. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural nets has witnessed a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements have also solved future challenges for researchers. Moreover, the connection between these two domains is mainly subjected to GAN, thus limiting the horizons of this field. This review analyzes Text-to-Image (T2I) synthesis as a broader picture, Text-guided Visual-output (T2Vo), with the primary goal being to highlight the gaps by proposing a more comprehensive taxonomy. We broadly categorize text-guided visual output into three main divisions and meaningful subdivisions by critically examining an extensive body of literature from top-tier computer vision venues and closely related fields, such as machine learning and human–computer interaction, aiming at state-of-the-art models with a comparative analysis. This study successively follows previous surveys on T2I, adding value by analogously evaluating the diverse range of existing methods, including different generative models, several types of visual output, critical examination of various approaches, and highlighting the shortcomings, suggesting the future direction of research.
