2022
DOI: 10.48550/arxiv.2207.00056
Preprint

MultiViz: Towards Visualizing and Understanding Multimodal Models

Abstract: The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understand their internal mechanics. How can we visualize the internal modeling of multimodal interactions in these models? …


Cited by 2 publications (2 citation statements)
References 56 publications (143 reference statements)
“…It uses four modality attention heads: language-to-vision attention, vision-to-language attention, language-to-language attention, and vision-to-vision attention, allowing it to look at interactions within and between modalities. MULTIVIZ (Liang et al. 2022) is another method to analyze multimodal models, interpreting unimodal interactions, cross-modal interactions, multimodal representations, and multimodal prediction. gScoreCAM (Chen et al. 2022) studied the CLIP (Radford et al. 2021) model specifically to understand large multimodal models.…”
Section: Gradient-based and Visualization-based Methods
mentioning; confidence: 99%
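To make the four attention patterns named above concrete, here is a minimal, hypothetical sketch (not code from the cited paper) of scaled dot-product attention applied in each direction between language token embeddings L and vision patch embeddings V; the token counts and embedding dimension are illustrative assumptions.

import torch
import torch.nn.functional as F

def attention(query, key, value):
    # Standard scaled dot-product attention.
    d = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ value

dim = 64
L = torch.randn(10, dim)   # hypothetical language token embeddings
V = torch.randn(49, dim)   # hypothetical vision patch embeddings

lang_to_vision   = attention(L, V, V)  # language queries attend to vision
vision_to_lang   = attention(V, L, L)  # vision queries attend to language
lang_to_lang     = attention(L, L, L)  # intra-language self-attention
vision_to_vision = attention(V, V, V)  # intra-vision self-attention

Each call differs only in which modality supplies the queries and which supplies the keys and values, which is what lets the first two heads capture cross-modal interactions and the last two capture intra-modal ones.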
“…While the experiments performed involve only the visual and text modalities, due to the high computational cost of the method, user evaluations show that DIME can help researchers determine which unimodal or multimodal contributions are the dominant factors behind the model's prediction. An extension of DIME aimed at improving its scalability is introduced in MULTIVIZ [183], a tool for analyzing the behavior of multimodal models that scaffolds the problem of interpretability into unimodal importance, cross-modal interactions, multimodal representations, and multimodal predictions.…”
Section: A Post-model XAI Applied on Discrete Sets of Unimodal Inputs
mentioning; confidence: 99%
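Since this survey groups MULTIVIZ with gradient-based and visualization-based methods, a minimal sketch of one such technique, gradient-times-input attribution for the unimodal-importance stage, may help; the late-fusion model below is a hypothetical stand-in, not the actual architecture of DIME or MULTIVIZ.

import torch

def fusion_model(text_emb, image_emb):
    # Hypothetical two-modality scorer: mean-pool each modality,
    # then combine via a dot product. A stand-in for illustration only.
    return text_emb.mean(dim=0) @ image_emb.mean(dim=0)

# Illustrative embeddings: 10 text tokens and 49 image patches, 64-dim.
text_emb = torch.randn(10, 64, requires_grad=True)
image_emb = torch.randn(49, 64, requires_grad=True)

score = fusion_model(text_emb, image_emb)
score.backward()

# Gradient-times-input, summed over features, yields one importance
# score per text token and per image patch (unimodal importance).
text_importance = (text_emb.grad * text_emb.detach()).sum(dim=-1)
image_importance = (image_emb.grad * image_emb.detach()).sum(dim=-1)

The resulting per-token and per-patch scores indicate which unimodal inputs most influenced the prediction, the kind of signal the cross-modal and representation-level stages then build on.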