“…Our results showed that the effect of a good training schedule was much greater than that of changing models (as long as model size was similar). Our ResNet-50 performed very similarly to our Swin models, and greatly outperformed Wu's [7] ResNet-50 from the original MTARSI paper, as well as all models by previous authors [9] [10] [8] [11] on the MTARSI dataset. We attribute the performance of our models to: 1) having chosen a good training regimen, and 2) using pretrained ImageNet weights and normalizing our dataset to ImageNet's mean and standard deviation (many previous authors used pretrained weights but did not perform this additional normalization step).…”
Section: B. Results and Problems With MTARSI Dataset Generalizability (supporting)
confidence: 53%
“…These images were then labelled into distinct aircraft models/classes for classification. Recent research (within the past 2 years) has achieved good results on the dataset using a variety of classical models, Convolutional Neural Networks (CNNs), and mixed classical/deep-learning methods [7] [8] [9] [10] [11]. The results of Azam et al.…”
Section: Datasets (mentioning)
confidence: 99%
“…Validation accuracy (%):
AlexNet [7]: 85.6
VGG [7]: 87.7
GoogLeNet [7]: 86.6
ResNet [7]: 89.6
DenseNet [7]: 89.1
EfficientNet [7]: 89.8
Zhao's method [8]: 78.1
FGATR-Net [10]: 93.8
SRARNet [11]: 93.4
LinearSVM (CNN-PCA) [9]: 96.
…aspect ratio. We then center-cropped the images to obtain 224 by 224 images.…”
Aircraft classification via remote sensing images has many commercial and military applications. The Swin-Transformer has shown great promise, recently dominating general-purpose image classification benchmarks such as ImageNet. In this manuscript, we test whether the performance of the Swin-Transformer on general-purpose image classification translates to domain-specific aircraft classification using the Multi-Type Aircraft from Remote Sensing Images (MTARSI) dataset. We also investigate the effect of training procedure versus model selection on the validation score. Our carefully trained Swin-Transformer model achieved an impressive 99.4% validation set accuracy without super-resolution, and 99.5% with super-resolution. Moreover, the generalization of models trained on the MTARSI dataset to real-world and synthetic aircraft classification is evaluated with some out-of-distribution samples. Our results demonstrated that the lack of complexity and heterogeneity of the MTARSI dataset, together with its labelling errors, results in models that struggle to achieve high accuracy on the adopted test samples despite near-perfect validation scores.
“…The proper detection of tiny, blurred airplanes in complicated airport photos is achieved by using an efficient deep belief network (DBN) [23] to rebuild high-resolution features from multiple input images, including grayscale images and two locally thresholded images. By creating high-resolution aircraft images from low-resolution remote sensing images, Tang et al. [24] proposed a joint super-resolution and aircraft recognition network (Joint-SRARNet) to enhance aircraft recognition performance. However, there is still a lack of study on aircraft pose estimation at low resolution, requiring further research.…”
The introduction of various deep neural network architectures has greatly advanced aircraft pose estimation using high-resolution images. However, realistic airport surface monitors typically capture low-resolution (LR) images due to long-range capture, and the resulting pose estimates are far from accurate enough to be acceptable. To fill this gap, we propose a new end-to-end low-resolution aircraft pose estimation network (LRF-SRNet) to address the problem of estimating the pose of aircraft in poor-quality airport surface surveillance images. The method combines pose estimation with the super-resolution (SR) technique. Specifically, a super-resolution network (SRNet) is created to reconstruct high-resolution aircraft images. In addition, an essential component termed the large receptive field block (LRF block) helps estimate the aircraft's pose: by broadening the neural network's receptive field, it enables perception of the aircraft's structure. Experimental results demonstrate that, on the airport surface surveillance dataset, our method performs significantly better than the most widely used baseline methods, with AP exceeding the baseline and HRNet by 3.1% and 4.5%, respectively.
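The two-stage structure the abstract describes (a super-resolution front end feeding a pose-estimation head) can be sketched conceptually. The modules below are simple placeholders standing in for the described stages; they are not LRF-SRNet's actual SRNet or LRF block.

```python
# Conceptual sketch only: a super-resolution stage followed by a
# pose-estimation stage, as described in the abstract. Both modules
# are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """Placeholder SR stage: 2x spatial upsampling via PixelShuffle."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 12, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # 12 channels -> 3 channels, 2x height/width
        )

    def forward(self, x):
        return self.body(x)

class TinyPoseHead(nn.Module):
    """Placeholder pose stage: predicts one heatmap per keypoint."""
    def __init__(self, keypoints=8):
        super().__init__()
        self.body = nn.Conv2d(3, keypoints, kernel_size=3, padding=1)

    def forward(self, x):
        return self.body(x)

sr = TinySRNet()
pose = TinyPoseHead(keypoints=8)
lr_img = torch.randn(1, 3, 64, 64)   # low-resolution surveillance frame
hr_img = sr(lr_img)                  # (1, 3, 128, 128) after 2x SR
heatmaps = pose(hr_img)              # (1, 8, 128, 128) keypoint heatmaps
```

The design point the abstract makes is that running pose estimation on the SR output, rather than directly on the LR frame, recovers structural detail the pose head needs.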
“…Compared with fully supervised object detection (FSOD) [1][2][3][4][5][6][7][8], the major advantage of weakly supervised object detection (WSOD) is that only image-level category annotations are necessary for training the WSOD model. Considering the low cost of data labeling, WSOD has been widely researched in recent years [9][10][11][12][13][14][15][16][17] and has been applied in scene classification [18,19], disaster detection [20,21], military [22,23], and other applications [24][25][26][27][28][29].…”
Weakly supervised object detection (WSOD) in remote sensing images (RSIs) aims to detect high-value targets using only image-level category labels; however, two problems have not been well addressed by existing methods. First, seed instances (SIs) are mined solely from the category score (CS) of each proposal, which tends to concentrate on the most salient parts of an object; moreover, the mined SIs are unreliable because the CS is not sufficiently robust, as inter-category similarity and intra-category diversity are more severe in RSIs. Second, localization accuracy is limited by the proposals generated by the selective search or edge box algorithms. To address the first problem, a segment anything model (SAM)-induced seed instance-mining (SSIM) module is proposed, which mines SIs according to an object quality score reflecting both the category characteristics and the completeness of the object. To handle the second problem, a SAM-based pseudo-ground-truth-mining (SPGTM) module is proposed to mine pseudo-ground-truth (PGT) instances, whose localization is more accurate than that of traditional proposals because it fully exploits the advantages of SAM; the object-detection heads are then trained on the PGT instances in a fully supervised manner. Ablation studies show the effectiveness of the SSIM and SPGTM modules, and comprehensive comparisons with 15 WSOD methods demonstrate the superiority of our method on two RSI datasets.