Proceedings of the ACL 2017 Student Research Workshop
DOI: 10.18653/v1/p17-3008
Segmentation Guided Attention Networks for Visual Question Answering

Abstract: In this paper we propose to solve the problem of Visual Question Answering by using a novel segmentation-guided, attention-based network which we call SegAttendNet. We use image segmentation maps, generated by a fully convolutional deep neural network, to refine our attention maps, and use these refined attention maps to make the model focus on the relevant parts of the image to answer a question. The refined attention maps are used by the LSTM network to learn to produce the answer. We presently train our model …
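The abstract's core idea — masking a spatial attention map with a segmentation map before pooling image features — can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names (`refine_attention`, `attend`), the binary-mask assumption, and the fallback when the mask suppresses all attention are my assumptions for the sketch.

```python
import numpy as np

def refine_attention(raw_attention, seg_mask, eps=1e-8):
    """Refine a spatial attention map with a segmentation mask.

    raw_attention: (H, W) non-negative attention scores over image regions.
    seg_mask:      (H, W) binary map from a segmentation network
                   (1 = pixel belongs to a segmented object).
    Returns an (H, W) map that sums to 1 and is concentrated on
    segmented regions.
    """
    refined = raw_attention * seg_mask      # suppress background regions
    total = refined.sum()
    if total < eps:                         # mask removed everything:
        refined = raw_attention             # fall back to the raw attention
        total = refined.sum() + eps
    return refined / total

def attend(features, refined_attention):
    """Pool (H, W, C) image features with an (H, W) attention map
    into a single (C,) vector, e.g. to feed an answer-producing LSTM."""
    return (features * refined_attention[..., None]).sum(axis=(0, 1))
```

Under this sketch, background pixels get zero weight, so the pooled feature vector is driven only by the segmented object regions that are presumably relevant to the question.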

Cited by 7 publications (5 citation statements). References 13 publications.
“…Recently, the use of ML models trained on multimodal data has gained traction, particularly the combination of image and text data modalities. Several papers have shown that multimodal models may provide some resilience against attacks [328], but other papers show that multimodal models themselves could be vulnerable to attacks mounted on all modalities at the same time [63,261,326]. See Section 4.6 for additional discussion.…”
Section: Cybersecurity
confidence: 99%
“…Without such an effort, single modality attacks can be effective and compromise multimodal models across a wide range of multimodal tasks despite the information contained in the remaining unperturbed modalities [328,335]. Moreover, researchers have devised efficient mechanisms for constructing simultaneous attacks on multiple modalities, which suggests that multimodal models might not be more robust against adversarial attacks despite improved performance [63,261,326].…”
Section: Tradeoffs Between the Attributes of Trustworthy AI
confidence: 99%
“…The field of multi-modal learning has made significant progress [2,16,19,18,17] in recent years for cross-modal understanding. The line of work on Trojan attacks on multi-modal models [20,5,27] is usually limited to a single modality, investigating the robustness to a single-modality Trojan and how the presence of such a Trojan would affect the multimodal model's performance. For instance, Attend and Attack [20] generates adversarial visual inputs to fool a visual question answering (VQA) model [30,31,3,7,21] through a compromised attention map.…”
Section: Related Work
confidence: 99%
“…Chaturvedi et al [5] presented a targeted adversarial attack on VQA using adversarial background noise in the vision input.…”
Section: Related Work
confidence: 99%