A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

Ullah, Ubaid; Lee, Jeong-Sik; An, Chang-Hyeon; Lee, Hyeonjin; Park, Su-Yeong; Baek, Rock‐Hyun; Choi, Hyun‐Chul

doi:10.3390/s22186816

Cited by 4 publications

(4 citation statements)

References 451 publications

(462 reference statements)

Section: Text Evaluation Methods (mentioning)

confidence: 99%

“…Evaluation methods and metrics are needed to determine the validity of auto-generated captions [63,67]. Popular evaluation metrics are shown in Table 3, but more extensive reviews currently exist in the literature [63,87]. The MS COCO Dataset Challenge uses BLEU, ROUGE, METEOR, CIDEr, and SPICE to evaluate performance, so these have become the status quo for evaluating the similarity between texts [74].…”

Section: Image Retrieval and Visual GAI (mentioning)

confidence: 99%

“…As GAI focuses on using AI to generate a new creation, visual GAI focuses on the translation between text and visualization [63]. The flow of translation can occur in either direction, either by taking text and transforming it into an image or by taking an image and deriving a description or caption [63][64][65][66][67]. Previous similar studies include [67,68]; however, we differentiate ourselves by utilizing different image-to-text and text-to-image generators, text prompts, and evaluation metrics.…”

Uncertainty in Visual Generative AI

Combs, Moyer, Bihl. 2024. Algorithms.

Recently, generative artificial intelligence (GAI) has impressed the world with its ability to create text, images, and videos. However, there are still areas in which GAI produces undesirable or unintended results due to being “uncertain”. Before wider use of AI-generated content, it is important to identify concepts where GAI is uncertain to ensure the usage thereof is ethical and to direct efforts for improvement. This study proposes a general pipeline to automatically quantify uncertainty within GAI. To measure uncertainty, the textual prompt to a text-to-image model is compared to captions supplied by four image-to-text models (GIT, BLIP, BLIP-2, and InstructBLIP). Its evaluation is based on machine translation metrics (BLEU, ROUGE, METEOR, and SPICE) and word embedding’s cosine similarity (Word2Vec, GloVe, FastText, DistilRoBERTa, MiniLM-6, and MiniLM-12). The generative AI models performed consistently across the metrics; however, the vector space models yielded the highest average similarity, close to 80%, which suggests more ideal and “certain” results. Suggested future work includes identifying metrics that best align with a human baseline to ensure quality and consideration for more GAI models. The work within can be used to automatically identify concepts in which GAI is “uncertain” to drive research aimed at increasing confidence in these areas.
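The comparison step described in this abstract (scoring a text prompt against captions generated from the resulting image) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it substitutes simple bag-of-words count vectors for the learned embeddings named in the abstract (Word2Vec, GloVe, and so on), and the function names `bow_vector`, `cosine_similarity`, and `prompt_caption_similarity` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words count vector; an assumed stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def prompt_caption_similarity(prompt, captions):
    """Average similarity between the prompt and each generated caption."""
    prompt_vec = bow_vector(prompt)
    scores = [cosine_similarity(prompt_vec, bow_vector(c)) for c in captions]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical prompt and captions, for illustration only.
    prompt = "a red car parked on a street"
    captions = ["a red car on a street", "a vehicle beside a road"]
    print(prompt_caption_similarity(prompt, captions))
```

Under this sketch, a low average score for a prompt would flag a concept where the generated image drifted from the request. Swapping `bow_vector` for a real embedding model would bring it closer to the vector-space comparison the abstract describes.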

Exploring the Role of Mathematical Modelling in Automatic Scene Generation amidst Rapid Technological Advances

Kaur, Khurana. 2023. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI).

A Multi-Modal Story Generation Framework with AI-Driven Storyline Guidance

Kim, Heo, et al. 2023. Electronics.

An automatic story generation system continuously generates stories with a natural plot. The major challenge of automatic story generation is to maintain coherence between consecutive generated stories without the need for human intervention. To address this, we propose a novel multi-modal story generation framework that includes automated storyline decision-making capabilities. Our framework consists of three independent models: a transformer encoder-based storyline guidance model, which predicts a storyline using a multiple-choice question-answering problem; a transformer decoder-based story generation model that creates a story that describes the storyline determined by the guidance model; and a diffusion-based story visualization model that generates a representative image visually describing a scene to help readers better understand the story flow. Our proposed framework was extensively evaluated through both automatic and human evaluations, which demonstrate that our model outperforms the previous approach, suggesting the effectiveness of our storyline guidance model in making proper plans.
