Adversarial text-to-image synthesis: A review

Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, J.J. van; Dengel, Andreas

doi:10.1016/j.neunet.2021.07.019

Cited by 124 publications

(55 citation statements)

References 102 publications

“…These metrics do not take ground truth data into account and use a classifier pretrained on ImageNet [16] that mostly contains single-object images. Therefore, they are likely not well suited for more complex datasets [20]. To measure image-text alignment, metrics based on retrieval, captioning and object detection models have been proposed.…”

Section: Related Work (mentioning)

confidence: 99%

“…Semantic object accuracy (SOA) [27] measures whether an object detector can detect an object described in the text from a generated image. R-precision and image captioning based evaluation can fail when many different captions correctly describe the same image [20,27]. SOA only focuses on the existence of objects, which makes it not well suited to evaluate object attributes and relation between objects [20,27].…”

Section: Related Work (mentioning)

confidence: 99%

“…Generating images from textual descriptions based on machine learning is an active research area [20]. The ability to visualize sentences suggests that a model can understand language and ground abstract concepts to objects in the real world.…”

Section: Introduction (mentioning)

confidence: 99%

“…A comprehensive evaluation of a text-to-image generation can provide a better understanding of what models can and cannot do, help users to decide when to and when not to use them for real-world applications, and inspire novel ideas on how to improve them. Most works have only evaluated their text-to-image generation models with two types of automated metrics [20]: 1) image-text alignment [27,29,62] - whether the generated images align with the semantics of the text descriptions; 2) image quality [26,49] - whether the generated images look similar to images from training data. However, these automated evaluation metrics are not designed to capture visual reasoning capabilities (e.g., understanding the count of objects or the spatial relations between objects).…”

Section: Introduction (mentioning)

confidence: 99%

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

Cho, Zala, Bansal

2022

Preprint

Figure 1. Overview of our evaluation process for text-to-image models. We propose to evaluate models in four ways: visual reasoning skills (Sec. 4.1), image-text alignment (Sec. 4.2), image quality (Sec. 4.3), and social biases (Sec. 4.4). Images in the figure are generated using ruDALL-E-XL. We also conduct human evaluation to verify our model-based visual reasoning, image-text alignment, and social bias evaluations.

“…Although much of the patient’s narrative may be told separately through text, imaging, and omics modalities,[63] there is tremendous potential to integrate semantic information contained in pathologist notes with imaging and omics modalities to capture a more holistic perspective of the patient’s health and integrate potentially useful information that could otherwise be overlooked. For instance, the semantic information contained in a report may highlight specific morphological and macro-architectural features in the correspondent biopsy specimen that an image-based deep learning model might struggle to identify without additional information.…”

Section: Discussion (mentioning)

confidence: 99%

Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports

Levy, Haudenschild et al. 2022

Journal of Pathology Informatics

Background: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. Methods: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representation from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. Results: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. Conclusions: Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist c...

Metaverse Wearables for Immersive Digital Healthcare: A Review

Kim, Yang, Lee et al. 2023

Advanced Science

The recent exponential growth of metaverse technology has been instrumental in reshaping a myriad of sectors, not least digital healthcare. This comprehensive review critically examines the landscape and future applications of metaverse wearables toward immersive digital healthcare. The key technologies and advancements that have spearheaded the metamorphosis of metaverse wearables are categorized, encapsulating all‐encompassed extended reality, such as virtual reality, augmented reality, mixed reality, and other haptic feedback systems. Moreover, the fundamentals of their deployment in assistive healthcare (especially for rehabilitation), medical and nursing education, and remote patient management and treatment are investigated. The potential benefits of integrating metaverse wearables into healthcare paradigms are multifold, encompassing improved patient prognosis, enhanced accessibility to high‐quality care, and high standards of practitioner instruction. Nevertheless, these technologies are not without their inherent challenges and untapped opportunities, which span privacy protection, data safeguarding, and innovation in artificial intelligence. In summary, future research trajectories and potential advancements to circumvent these hurdles are also discussed, further augmenting the incorporation of metaverse wearables within healthcare infrastructures in the post‐pandemic era.
