Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1092
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?

Abstract: We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualization…
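The abstract describes comparing model-generated attention maps against human attention maps. A standard way to quantify such agreement is a rank-correlation measure over the flattened maps. The sketch below is a minimal, self-contained illustration of Spearman rank correlation; the toy maps and helper functions are assumptions for illustration, not the paper's actual evaluation code or data.

```python
# Illustrative sketch: Spearman rank correlation between a human
# attention map and a model attention map, both flattened row-major.
# Toy data only -- not real VQA-HAT annotations.

def ranks(values):
    """Average 1-based ranks of a flat list, ties averaged."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation of two equally sized flat maps."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Toy 2x3 attention maps, flattened.
human = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
model = [0.02, 0.08, 0.50, 0.25, 0.10, 0.05]
print(round(spearman(human, model), 3))  # → 0.971
```

A correlation near 1 means the model ranks image regions by importance much as humans do; values near 0 indicate no agreement in the ordering.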

Cited by 225 publications (376 citation statements)
References 16 publications
“…We evaluate the proposed GCA methods and provide both quantitative and qualitative analysis. The former includes: i) ablation analysis of the proposed models (Section-VII-B1), ii) analysis of the effect of uncertainty on answer predictions (Figure-7 (a,b)), iii) differences of top-2 softmax scores for answers for some representative questions (Figure-7 (c,d)), and iv) comparison of the attention map of our proposed uncertainty model against other variants using Rank Correlation (RC) and Earth Mover's Distance (EMD) [45], as shown in Table-IV for VQA-HAT [34] and in Table-III for VQA-X [46]. Finally, we compare PGCA with state-of-the-art methods, as mentioned in Section-VII-D.…”
Section: Methods (mentioning, confidence: 99%)
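The citation above compares attention maps using Rank Correlation and Earth Mover's Distance (EMD). As a minimal illustration of the EMD side, the sketch below computes the closed-form 1D EMD between two flattened maps; true EMD over 2D attention maps requires an optimal-transport solver (e.g. the POT library), and the data and function names here are illustrative assumptions, not the cited paper's implementation.

```python
# Illustrative sketch: 1D Earth Mover's Distance between two attention
# maps, treating the flattened pixels as unit-spaced points on a line.
# For unit spacing, EMD equals the sum of absolute differences of the
# cumulative distribution functions. Toy data only.

def emd_1d(p, q):
    """1D EMD between two distributions over positions 0..n-1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]  # normalize to probability mass
    q = [x / sq for x in q]
    cdf_p = cdf_q = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        total += abs(cdf_p - cdf_q)
    return total

human = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]
model = [0.02, 0.08, 0.50, 0.25, 0.10, 0.05]
print(round(emd_1d(human, model), 4))  # → 0.13
```

Unlike rank correlation, EMD is sensitive to *where* attention mass sits: moving mass a long distance across the map costs more than a small local shift, which is why the two metrics are often reported together.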
“…In recent years, some researchers have proposed augmenting RNNs with a memory module or an attention module. Proposals include the Neural Turing Machine [131] and the Attention Network [132], which have yielded excellent performance on standard question-answering and video-storytelling tasks [133,134].…”
Section: An Overview of Deep Learning (mentioning, confidence: 99%)
“…For example, though neural networks were previously thought by many to be inscrutable [16], new research suggests interpreting them may actually be possible at some point [12,49]. If successful, this might give rise to the ability to interpret networks learned by neuromorphic chips.…”
Section: Leveraging the Distinctiveness of HPC as an Opportunity (mentioning, confidence: 99%)