2022
DOI: 10.1109/jstars.2022.3173349

Mapping Coastal Wetlands Using Transformer in Transformer Deep Network on China ZY1-02D Hyperspectral Satellite Images

Abstract: Coastal wetlands mapping is a big challenge in remote sensing fields because of similar spectrum of different ground objects and their severe fragmentation and spatial heterogeneity. In this paper, we propose a Hyperspectral Image Transformer iN Transformer (HSI-TNT) method for mapping coastal wetlands on ZY1-02D hyperspectral images, which uses two transformer deep networks to fuse local and global features. First, we put forward the idea that each hyperspectral pixel can be considered as a superpixel in spec…

citations
Cited by 39 publications
(6 citation statements)
references
References 49 publications
(49 reference statements)
“…In terms of feature extraction, 3D-CNN is the most commonly used structure in the state-of-the-art hyperspectral classification method to integrally mine the spatial-spectral discrimination features [43]. However, the locally sliding convolution operation of CNN with fixed kernels performs better at characterizing spatial context information and is not suitable for the extraction of spectral sequence features, which will lead to information loss and reduce model efficiency [44]. While the characteristic of Transformer to extract sequence features can make up for this.…”
Section: Overview Of the Scstin (mentioning)
confidence: 99%
“…Considering the above, significant contributions have been made for HSIC, for instance, the Simple and Effective Spatial-Spectral (SESS) approach [31] involves the selection of representative spectral bands, the extraction of spatial characteristics at multiple scales, and the fusion of spatial-spectral features while preserving neighborhood information. In a similar vein, the work by Liu et al [32] introduces HSI transformers, a network that incorporates two transformers to handle both spatial and spectral information. This approach treats HSI pixels as superpixels and maintains spatial information through position encodings.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the past two years, a transformer model combining self-attention has been proposed. In [46], a hyperspectral image transformer in transformer (HSI-TNT) method was proposed. In this method, two transformer deep networks were used to fuse local and global features, and the effectiveness of the method was finally demonstrated through a large number of experiments.…”
Section: Introduction (mentioning)
confidence: 99%
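
The citation statements above describe the cited method only in outline: two transformers, one fusing local and one fusing global features, with pixels treated as superpixels and position encodings preserving order. As a rough illustration of that general idea, and not the HSI-TNT architecture itself, the sketch below nests two self-attention passes in plain Python. Every name here (`self_attention`, `add_position`, `pixel_to_segments`, `tnt_features`) is hypothetical, and the simplifications are assumptions: a single attention head, identity query/key/value projections (no learned weights), and a simplified sinusoidal-style position encoding.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so there is nothing to train here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def add_position(tokens):
    """Add a simplified sinusoidal-style position encoding to each token."""
    d = len(tokens[0])
    return [
        [x + math.sin(pos / (10000 ** (j / d))) for j, x in enumerate(tok)]
        for pos, tok in enumerate(tokens)
    ]


def pixel_to_segments(spectrum, seg_len):
    """Split one pixel's spectrum into consecutive band segments.

    Trailing bands that do not fill a whole segment are dropped.
    """
    return [spectrum[i:i + seg_len]
            for i in range(0, len(spectrum) - seg_len + 1, seg_len)]


def tnt_features(patch, seg_len):
    """Nested attention over a patch given as a list of pixel spectra.

    Inner pass: attention across band segments inside each pixel (local).
    Outer pass: attention across the pixels of the patch (global).
    """
    outer_tokens = []
    for spectrum in patch:
        inner = self_attention(add_position(pixel_to_segments(spectrum, seg_len)))
        # Flatten the refined segments back into one pixel-level token.
        outer_tokens.append([x for seg in inner for x in seg])
    return self_attention(add_position(outer_tokens))


# Toy input: a 3x3 patch flattened to 9 pixels, 8 synthetic bands each.
patch = [[0.1 * ((p + b) % 5) for b in range(8)] for p in range(9)]
feats = tnt_features(patch, seg_len=4)
print(len(feats), len(feats[0]))  # prints: 9 8
```

A real model would replace the identity projections with learned linear layers, use several heads and stacked blocks, and feed the fused features to a classifier; those parts are left out here because the excerpted statements do not specify them.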