2022
DOI: 10.1109/jstars.2022.3210707

FTC-Net: Fusion of Transformer and CNN Features for Infrared Small Target Detection

Abstract: Single-frame infrared small target detection remains challenging because of complex backgrounds and the weak structural characteristics of small targets. Convolutional neural networks (CNNs) have recently been introduced to infrared small target detection and have been widely adopted owing to their strong performance. However, existing CNN-based methods mainly focus on local spatial features and ignore the long-range contextual dependencies between small targets and backgrounds. To capture the global …
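The contrast the abstract draws between local CNN features and long-range dependencies can be made concrete with a toy 1-D example. The sketch below is illustrative only and is not FTC-Net's implementation. It assumes scalar tokens, a hypothetical 3-tap kernel, and single-head self-attention in which each token serves as its own query, key, and value. Perturbing a distant "background" value leaves the convolution output at the "target" position unchanged but shifts the attention output, because every attention output is a weighted average over all positions.

```python
import math
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Odd-length 1-D convolution (cross-correlation) with zero padding; output length == input length."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(x):  # positions outside the input count as zero
                s += w * x[idx]
        out.append(s)
    return out


def self_attention_1d(x: Sequence[float]) -> List[float]:
    """Scaled dot-product self-attention over scalar tokens (query = key = value = token, d = 1)."""
    out = []
    for q in x:
        scores = [q * k for k in x]  # with d = 1 the 1/sqrt(d) scale is 1
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w / z * v for w, v in zip(weights, x)))
    return out


if __name__ == "__main__":
    x = [0.1] * 16
    x[8] = 1.0  # hypothetical bright "target" at position 8
    y = list(x)
    y[0] = 0.5  # perturb a distant "background" value
    k = [0.25, 0.5, 0.25]  # hypothetical smoothing kernel
    print(conv1d_same(x, k)[8] == conv1d_same(y, k)[8])  # True: the conv only sees positions 7..9
    print(self_attention_1d(x)[8], self_attention_1d(y)[8])  # these differ
```

Stacking convolutions widens the local window only gradually, whereas a single attention layer already connects every pair of positions. That is the motivation for pairing the two.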

Cited by 34 publications (11 citation statements)
References 42 publications (52 reference statements)

“…Furthermore, some deep learning algorithms [36][37][38][39] are also used to detect small targets. Qi et al. provided a fusion network of Transformer and a CNN (FTC-Net) [37], which extracts local detail features and global contextual features.…”
Section: The Deep Learning Methods (mentioning)
confidence: 99%
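This statement says FTC-Net combines local detail features with global contextual features, but the excerpt does not specify the fusion operator. As a hedged illustration only, the sketch below uses one common choice: channel-wise concatenation followed by a 1x1 convolution. The function names, the `[channels][height][width]` list layout, and the weights are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # layout: [channels][height][width]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps of equal spatial size along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match")
    return a + b  # list concatenation == channel concatenation


def conv1x1(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: at every pixel, out[o] = bias[o] + sum_c weights[o][c] * x[c]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for o, row in enumerate(weights):
        if len(row) != c_in:
            raise ValueError("each weight row needs one entry per input channel")
        out.append([[bias[o] + sum(row[c] * x[c][i][j] for c in range(c_in))
                     for j in range(w)] for i in range(h)])
    return out


def fuse(local_feat: FeatureMap, global_feat: FeatureMap,
         weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical fusion: concatenate CNN-branch and Transformer-branch maps, then mix with a 1x1 conv."""
    return conv1x1(concat_channels(local_feat, global_feat), weights, bias)


if __name__ == "__main__":
    local_feat = [[[1.0, 2.0], [3.0, 4.0]]]       # 1 channel, 2x2 (toy CNN-branch output)
    global_feat = [[[10.0, 20.0], [30.0, 40.0]]]  # 1 channel, 2x2 (toy Transformer-branch output)
    print(fuse(local_feat, global_feat, weights=[[0.5, 0.5]], bias=[0.0]))
    # [[[5.5, 11.0], [16.5, 22.0]]]
```

The 1x1 convolution mixes channels independently at each pixel, so it combines the two branches without blurring the spatial detail that small targets depend on. Element-wise addition or attention-based weighting would be equally plausible alternatives.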
“…This makes CNN more suitable for extracting and encoding detailed features from low-level semantic feature layers [27][28]. Therefore, in the design of the hybrid encoder for the MS-DETR model, we utilized a CNN structure in the feature extraction module to extract detailed information about weeds from the low-level semantic feature layer. When using the CNN network to extract low-level details, appropriately expanding the receptive field of the CNN network enables it to capture richer features of the target and surrounding background areas, thereby improving the quality of small target detection [29][30]. Dilation convolution, compared to regular convolution, can enlarge the receptive field, obtaining broader and richer features, which is crucial for detecting small targets of different scales [31][32].…”
Section: 1: Methodology Employed in This Study (mentioning)
confidence: 99%
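The receptive-field claim in this statement follows from simple arithmetic. A k-tap kernel with dilation d spans d(k - 1) + 1 input positions but still uses only k weights. The sketch below is pure Python, 1-D, and stride 1, and its helper names are hypothetical. It computes that span, the receptive field of a stack of such layers, and a "valid" dilated convolution.

```python
from typing import List, Sequence, Tuple


def effective_kernel_size(k: int, d: int) -> int:
    """Number of input positions spanned by a k-tap kernel with dilation d."""
    return d * (k - 1) + 1


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions given as (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel_size(k, d) - 1  # each layer widens the field by its span minus one
    return rf


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], d: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as CNN libraries compute it)."""
    span = effective_kernel_size(len(kernel), d)
    return [sum(w * x[i + j * d] for j, w in enumerate(kernel))
            for i in range(len(x) - span + 1)]


if __name__ == "__main__":
    print(receptive_field([(3, 1)] * 3))                      # 7
    print(receptive_field([(3, 1), (3, 2), (3, 4)]))          # 15
    print(dilated_conv1d(list(range(10)), [1, 0, -1], d=2))   # [-4, -4, -4, -4, -4, -4]
```

For example, three 3-tap layers with dilations 1, 2, and 4 reach a receptive field of 15 positions. Three undilated layers with the same number of weights reach only 7.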
“…Then, they adopted ViT to learn high-level information of target localization from local features. Next, Qi et al. [39] proposed a fusion network architecture of Transformer and CNN (FTC-Net), which consists of two branches. The CNN-based branch uses a U-Net with skip connections to obtain low-level local details of small targets.…”
Section: Vision Transformer (mentioning)
confidence: 99%
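The skip connection mentioned for the CNN branch can be illustrated with a single U-Net-style stage. This is a simplified sketch, not FTC-Net's actual encoder-decoder. It assumes one channel, 2x2 max pooling, nearest-neighbour upsampling, and a skip connection implemented as channel-wise concatenation, and all function names are hypothetical. With a one-pixel target, the pooled-and-upsampled path spreads the target over a 2x2 block, while the skip channel keeps its exact location.

```python
from typing import List

Grid = List[List[float]]  # single-channel map: [height][width]


def max_pool2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (height and width assumed even)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def unet_stage_with_skip(enc: Grid) -> List[Grid]:
    """One simplified U-Net stage: downsample, upsample, then concatenate the encoder map (skip connection)."""
    up = upsample2(max_pool2(enc))
    return [up, enc]  # channels: [decoder path, skip connection]


if __name__ == "__main__":
    enc = [[0.0] * 4 for _ in range(4)]
    enc[1][2] = 1.0  # hypothetical one-pixel target
    up, skip = unet_stage_with_skip(enc)
    print(sum(map(sum, up)), sum(map(sum, skip)))  # 4.0 1.0
```

The decoder path alone can no longer tell which of four pixels held the target. The concatenated encoder channel gives later layers that precise low-level detail back.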