2018
DOI: 10.1007/978-3-030-01228-1_48

Visual Reasoning with Multi-hop Feature Modulation

Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rat…
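As a rough illustration of the mechanism described in the abstract, the sketch below shows a per-channel FiLM layer and a generator that produces FiLM parameters one convolutional block ("hop") at a time rather than all at once. This is not the authors' implementation: the use of PyTorch, the GRU cell as the hop controller, and the names FiLM, MultiHopFiLMGenerator, and channels_per_block are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Feature-wise Linear Modulation: per-channel scaling and shifting."""

    def forward(self, feats, gamma, beta):
        # feats: (B, C, H, W); gamma, beta: (B, C)
        return gamma[..., None, None] * feats + beta[..., None, None]


class MultiHopFiLMGenerator(nn.Module):
    """Produces FiLM parameters for each conv block in successive hops,
    updating a context vector between hops instead of predicting every
    (gamma, beta) pair from the language embedding in one shot."""

    def __init__(self, lang_dim, channels_per_block):
        super().__init__()
        self.rnn = nn.GRUCell(lang_dim, lang_dim)   # hop controller (illustrative choice)
        self.heads = nn.ModuleList(
            [nn.Linear(lang_dim, 2 * c) for c in channels_per_block]
        )

    def forward(self, lang_emb):
        context = torch.zeros_like(lang_emb)
        films = []
        for head in self.heads:                     # one hop per conv block
            context = self.rnn(lang_emb, context)
            gamma, beta = head(context).chunk(2, dim=-1)
            films.append((gamma, beta))
        return films


# Toy usage: modulate the first conv block's features.
lang = torch.randn(4, 128)                                      # language embedding, batch of 4
generator = MultiHopFiLMGenerator(128, channels_per_block=[64, 128, 256])
(gamma0, beta0), *_ = generator(lang)
feat0 = torch.randn(4, 64, 14, 14)                              # first block's feature map
modulated = FiLM()(feat0, gamma0, beta0)                        # shape: (4, 64, 14, 14)
```

In a single-hop setup, all gamma/beta vectors would be predicted from the language embedding at once; the recurrent context in this sketch is what lets later hops depend on earlier ones.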

Cited by 22 publications (15 citation statements)
References 36 publications (65 reference statements)
“…Referring expression grounding, also known as referring expression comprehension, is often formulated as an object retrieval task [11,26]. [39,23,41] explored context information in images, and [31] proposed multi-step reasoning via multi-hop Feature-wise Linear Modulation. Hu et al. [10] proposed compositional modular networks, composed of a localization module and a relationship module, to identify subjects, objects, and their relationships.…”
Section: Related Work
confidence: 99%
“…To the best of our knowledge, all existing works use the same baseline Oracle [8] except [32]. We compare the performance of the baseline Oracles with the proposed VilBERT-Oracle.…”
Section: The Oracle Model
confidence: 99%
“…The Guesser model is evaluated by classification error rate. Two baseline models [6] (HRED, HRED-VGG), three attention-based models (PLAN [28], A-ATT [7], HACAN [25]), and two Feature-wise Linear Modulation (FiLM) models (single-hop FiLM [14], multi-hop FiLM [23]) are compared. Table 3 compares the test error of the Guesser models.…”
Section: Evaluation Metric and Comparison Models
confidence: 99%