Vision Transformers for Remote Sensing Image Classification

Bazi, Yakoub; Bashmal, Laila; Rahhal, Mohamad Mahmoud Al; Dayil, Reham Al; Ajlan, Naif Al

doi:10.3390/rs13030516

Cited by 288 publications

(136 citation statements)

References 46 publications

(53 reference statements)

Supporting

Mentioning

135

Contrasting

“…From this, the preceding material, the Butterworth filter is manually calibrated, and a spectrogram is used to ensure that the waveforms have been recovered [40]. While the accuracy is increased, a downside of this approach is that other birds might still be making the same signal, which is when used alongside it [41]. But in this case, bird species have frequencies that are close to one another and appear to possess identical characteristics separating the various recordings of birds' calls to increase the success of singlelabel methods results would be very impractical so doing so would only make the success of the results of each form of detection less likely [42].…”

Section: Pre-processing (mentioning)

confidence: 99%

Birds Sound Classification Based on Machine Learning Algorithms

Mehyadin, Abdulazeez, Hasan, et al. 2021. AJRCoS

The bird classifier is a system that is equipped with an area machine learning technology and uses a machine learning method to store and classify bird calls. Bird species can be known by recording only the sound of the bird, which will make it easier for the system to manage. The system also provides species classification resources to allow automated species detection from observations that can teach a machine how to recognize whether or classify the species. Non-undesirable noises are filtered out of and sorted into data sets, where each sound is run via a noise suppression filter and a separate classification procedure so that the most useful data set can be easily processed. Mel-frequency cepstral coefficient (MFCC) is used and tested through different algorithms, namely Naïve Bayes, J4.8 and Multilayer perceptron (MLP), to classify bird species. J4.8 has the highest accuracy (78.40%) and is the best. Accuracy and elapsed time are (39.4 seconds).

“…In the first set of experiments, we tested the effect of using data augmentation on the classification results. Experimental results on single-label scene classification have shown that using a combination of data augmentation techniques can improve the classification results [25]. Therefore, we augmented the dataset with additional samples using random flipping, rotations, and cutout during training.…”

Section: Experiments 1: the Effect Of Data Augmentation (mentioning)

confidence: 99%
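
The augmentation step quoted above (random flipping, rotations, and cutout) can be sketched in plain Python as follows. This is a minimal illustration, not the citing authors' code: the function names, the restriction to 90-degree rotations, the 50% flip probability, and the cutout size are all assumptions made for the example, and images are represented as simple lists of rows.

```python
import random


def flip_horizontal(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def rotate90(image, k=1):
    """Rotate a 2-D image clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        image = [list(col) for col in zip(*image[::-1])]
    return [list(row) for row in image]


def cutout(image, top, left, size, fill=0):
    """Return a copy with a size x size square (clipped to the image) set to `fill`."""
    out = [list(row) for row in image]
    for r in range(max(top, 0), min(top + size, len(out))):
        for c in range(max(left, 0), min(left + size, len(out[r]))):
            out[r][c] = fill
    return out


def augment(image, rng, cutout_size=2):
    """Apply a random flip, a random 90-degree rotation, and one cutout square.

    The probabilities and sizes here are illustrative assumptions.
    """
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    image = rotate90(image, rng.randrange(4))
    top = rng.randrange(len(image))
    left = rng.randrange(len(image[0]))
    return cutout(image, top, left, cutout_size)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when comparing runs with and without augmentation.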

“…Meanwhile, a new type of deep-learning model known as transformers has been developed for natural language processing (NLP) and has started to gain some popularity in computer vision [24] and remote sensing communities [25,26]. A transformer is an architecture that was first introduced by Vaswani et al in 2017 for machine translation [27].…”

Section: Introduction (mentioning)

confidence: 99%

UAV Image Multi-Labeling with Data-Efficient Transformers

et al. 2021

Self Cite

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view for each image from the training set using data augmentation. Then, both the image and its augmented version were reshaped into a sequence of flattened patches and then fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On the top of the encoder, we mounted two classifiers, a token and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we considered the average of the two classifiers as the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two-centimeter demonstrated the effectiveness of the proposed model.
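
Two steps from this abstract lend themselves to a short sketch: reshaping an image into a sequence of flattened patches, and averaging the token and distiller classifier outputs at test time. The code below is a hedged illustration in plain Python. The helper names, the single-channel list-of-rows image format, and the 0.5 decision threshold for multi-label output are assumptions, not details taken from the paper.

```python
def to_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tiles.

    Tiles are returned in row-major order; H and W must be multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            sequence.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return sequence


def average_scores(token_scores, distiller_scores):
    """Average the per-class scores of the two classifier heads."""
    return [(a + b) / 2 for a, b in zip(token_scores, distiller_scores)]


def predict_labels(token_scores, distiller_scores, threshold=0.5):
    """Turn averaged scores into multi-label decisions (threshold is an assumption)."""
    averaged = average_scores(token_scores, distiller_scores)
    return [int(score >= threshold) for score in averaged]
```

For example, a 4 x 4 image with a patch size of 2 yields a sequence of four patches, each flattened to four values; in a real transformer each flattened patch would then be linearly projected to an embedding before entering the encoder.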

“…The experimental results showed that the vision transformer classification accuracy rate of remote sensing images exceeds the CNN model [15].…”

Section: Introduction (mentioning)

confidence: 96%

Method for Diagnosis of Acute Lymphoblastic Leukemia Based on ViT-CNN Ensemble Model

Jiang, Dong, Wang, et al. 2021. Computational Intelligence and Neuroscience

Acute lymphocytic leukemia (ALL) is a deadly cancer that not only affects adults but also accounts for about 25% of childhood cancers. Timely and accurate diagnosis of the cancer is an important premise for effective treatment to improve survival rate. Since the image of leukemic B-lymphoblast cells (cancer cells) under the microscope is very similar in morphology to that of normal B-lymphoid precursors (normal cells), it is difficult to distinguish between cancer cells and normal cells. Therefore, we propose the ViT-CNN ensemble model to classify cancer cells images and normal cells images to assist in the diagnosis of acute lymphoblastic leukemia. The ViT-CNN ensemble model is an ensemble model that combines the vision transformer model and convolutional neural network (CNN) model. The vision transformer model is an image classification model based entirely on the transformer structure, which has completely different feature extraction method from the CNN model. The ViT-CNN ensemble model can extract the features of cells images in two completely different ways to achieve better classification results. In addition, the data set used in this article is an unbalanced data set and has a certain amount of noise, and we propose a difference enhancement-random sampling (DERS) data enhancement method, create a new balanced data set, and use the symmetric cross-entropy loss function to reduce the impact of noise in the data set. The classification accuracy of the ViT-CNN ensemble model on the test set has reached 99.03%, and it is proved through experimental comparison that the effect is better than other models. The proposed method can accurately distinguish between cancer cells and normal cells and can be used as an effective method for computer-aided diagnosis of acute lymphoblastic leukemia.
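
The symmetric cross-entropy loss mentioned in this abstract is commonly formulated as a weighted sum of the standard cross-entropy and a reverse cross-entropy in which the roles of prediction and label are swapped. The sketch below follows that common formulation for a single sample. The weights `alpha` and `beta`, the clipping constants, and the function name are illustrative assumptions; the abstract does not state which settings the authors used.

```python
import math


def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0,
                            eps=1e-7, label_floor=1e-4):
    """Symmetric cross-entropy for one sample.

    probs: predicted class probabilities; label: index of the true class.
    All default constants are assumptions chosen for illustration.
    """
    n = len(probs)
    # Clip predictions so log(p) stays finite.
    p = [min(max(v, eps), 1.0) for v in probs]
    q = [1.0 if i == label else 0.0 for i in range(n)]

    # Standard cross-entropy: -sum_k q_k * log(p_k).
    ce = -sum(q[i] * math.log(p[i]) for i in range(n))

    # Reverse cross-entropy: -sum_k p_k * log(q_k), with the zeros of the
    # one-hot label clipped to `label_floor` so log(q_k) stays finite.
    q_clipped = [max(v, label_floor) for v in q]
    rce = -sum(p[i] * math.log(q_clipped[i]) for i in range(n))

    return alpha * ce + beta * rce
```

The reverse term is bounded for any prediction, which limits how strongly a single mislabeled sample can dominate the loss. That property is why this kind of loss is often chosen for noisy data sets.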
