2021
DOI: 10.1109/tmm.2020.3007321

Understanding More About Human and Machine Attention in Deep Neural Networks

Cited by 49 publications (26 citation statements)
References 67 publications
“…In the block, SA layer is utilized to make the important information more distinguishable by mimicking the human vision system (HVS) [15], which is composed of two group convolutional layers, one ReLU activation and one Sigmoid activation. Figure 3 shows the design of SA layer.…”
Section: B. Residual SR Block (mentioning)
confidence: 99%
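The SA layer quoted above (two group convolutional layers, a ReLU, and a Sigmoid that gates the input) can be sketched as follows. This is a hedged NumPy illustration, not the cited paper's implementation: it assumes 1×1 group convolutions with weights passed in as plain arrays and a default of two groups, since the excerpt does not specify kernel sizes or group counts.

```python
import numpy as np

def group_conv1x1(x, w, groups):
    """1x1 group convolution: channels are split into groups and
    mixed only within each group. x: (C_in, H, W); w: (C_out, C_in // groups)."""
    c_in, height, width = x.shape
    c_out = w.shape[0]
    gin, gout = c_in // groups, c_out // groups
    out = np.zeros((c_out, height, width))
    for g in range(groups):
        xs = x[g * gin:(g + 1) * gin]          # this group's input channels
        ws = w[g * gout:(g + 1) * gout]        # this group's filters
        # Mix the group's channels at every spatial position.
        out[g * gout:(g + 1) * gout] = np.tensordot(ws, xs, axes=([1], [0]))
    return out

def spatial_attention(x, w1, w2, groups=2):
    """SA-layer sketch: group conv -> ReLU -> group conv -> Sigmoid,
    then rescale the input by the resulting attention mask."""
    h = np.maximum(group_conv1x1(x, w1, groups), 0.0)            # ReLU
    mask = 1.0 / (1.0 + np.exp(-group_conv1x1(h, w2, groups)))   # Sigmoid, in (0, 1)
    return x * mask                                              # gate the features
```

Because the Sigmoid mask lies in (0, 1), the layer can only attenuate features, making the positions it leaves near 1.0 stand out — the "more distinguishable" important information the quote refers to.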
“…The key issue of NR-IQA is to build a metric that in consistence with the human vision system (HVS). According to the HVS, different areas of the images hold different importance for visual perception [15]. However, recent NR-IQA methods usually neglect to distinguish the visual sensitive information in the image, which restricts the effectiveness of prediction.…”
Section: Introduction (mentioning)
confidence: 99%
“…are more likely to be followed by the human gaze. Inspired by a biological mechanism known as human attention [17], the UVOS system should have remarkable motion perception capabilities to quickly orient gaze to moving objects in dynamic scenes. We argue that the primary object(s) in a video should be (i) the most distinguishable in a single frame, (ii) repeatedly appearing throughout the video sequence, and (iii) moving objects in the video.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, a natural question is whether ANNs select information in the same way, and in particular whether they attend to the same visual regions as humans when extracting information for visual object recognition and localization. While prior work has developed ANNs trained explicitly to predict human visual gaze [14], and even incorporated simulated foveated systems into the model design [15], comparatively little work comparing human attention to computational attention [16,17,18] has attempted a comprehensive examination of how ANNs compare to humans using a variety of human visual selectivity measures as well as the wide range of interpretability techniques that are currently available to probe what visual information ANNs use.…”
Section: Introduction (mentioning)
confidence: 99%