An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Zhu, Xizhou; Cheng, Dazhi; Zhang, Zheng; Lin, Stephen; Dai, Jifeng

doi:10.1109/iccv.2019.00679

Cited by 368 publications

(183 citation statements)

References 55 publications

Supporting

Mentioning

145

Contrasting

“…At the beginning this approach was introduced to work in conjunction with recurrent neural network models, in order to combine the information extracted at different time stamps [35]. Successively, attention mechanisms were applied on 2D images [36] as well as to manage weak supervision and bag level classification [37], [27].…”

Section: B Attentive Aggregation Step (mentioning)

confidence: 99%

Weakly Supervised Learning for Land Cover Mapping of Satellite Image Time Series via Attention-Based CNN

et al. 2020

The unprecedented possibility to acquire high resolution Satellite Image Time Series (SITS) data is opening new opportunities to monitor the different aspects of the Earth's surface but, at the same time, it is raising new challenges in terms of suitable methods to analyze and exploit such a huge amount of rich image data. One of the main tasks associated with SITS data analysis is land cover mapping. Due to operational constraints, the collected label information is often limited in volume and obtained at a coarse granularity level, carrying inexact and weak knowledge that can affect the whole process. To cope with such issues, in the context of object-based SITS land cover mapping, we propose a new deep learning framework, named TASSEL (aTtentive weAkly Supervised Satellite image time sEries cLassifier), to deal with the weak supervision provided by the coarse granularity labels. Our framework exploits the multifaceted information conveyed by the object-based representation, considering object components instead of aggregated object statistics. Furthermore, our framework also produces an additional outcome that supports model interpretability. Quantitative and qualitative experimental evaluations are carried out on two real-world scenarios. Results indicate that TASSEL not only outperforms the competing approaches in terms of predictive performance, but also produces valuable extra information that can be practically exploited to interpret model decisions.

“…While the sentence embedding model considers self-attention for concatenated output vectors of the forward and backward RNNs, we consider self-attention for output vectors of the forward and backward RNNs, independently, and additionally use the features from the self-attention for the forward and backward RNNs to estimate alleles for unobserved variants. We consider a simplified version of Transformer attention in [24,25] as the model…”

Section: Plos Computational Biology (mentioning)

confidence: 99%

A genotype imputation method for de-identified haplotype reference information by using recurrent neural network

et al. 2020

Genotype imputation estimates the genotypes of unobserved variants using the genotype data of other observed variants based on a collection of haplotypes for thousands of individuals, which is known as a haplotype reference panel. In general, more accurate imputation results are obtained with a larger haplotype reference panel. Most of the existing genotype imputation methods explicitly require the haplotype reference panel in precise form, but the accessibility of haplotype data is often limited, due to the requirement of agreements from the donors. Since de-identified information such as summary statistics or model parameters can be used publicly, imputation methods using de-identified haplotype reference information might be useful to enhance the quality of imputation results when access to the haplotype data is limited. In this study, we proposed a novel imputation method that handles the reference panel as its model parameters by using a bidirectional recurrent neural network (RNN). The model parameters are presented in the form of de-identified information from which the restoration of the genotype data at the individual level is almost impossible. We demonstrated that the proposed method provides comparable imputation accuracy when compared with the existing imputation methods using haplotype datasets from the 1000 Genomes Project (1KGP) and the Haplotype Reference Consortium. We also considered a scenario where a subset of haplotypes is made available only in de-identified form for the haplotype reference panel. In the evaluation using the 1KGP dataset under this scenario, the imputation accuracy of the proposed method is much higher than that of the existing imputation methods. We therefore conclude that our RNN-based method is quite promising to further promote the sharing of sensitive genome data under the recent movement for the protection of individuals' privacy.

“…Attention mechanisms. As argued in [76], spatial deformation modeling methods [28,37,10,48], including VTNs, can be viewed as hard attention mechanisms, in that they localize and attend to the discriminative image parts. Attention mechanisms in neural networks have quickly gained popularity in diverse computer vision and natural language processing tasks, such as relational reasoning among objects [4,52], image captioning [67], neural machine translation [3,61], image generation [68,71], and image recognition [23,63].…”

Section: Related Work (mentioning)

confidence: 99%

Volumetric Transformer Networks

Kim, Süsstrunk, Salzmann 2020

Computer Vision – ECCV 2020

Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN. Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
