2023
DOI: 10.1007/s10462-023-10595-0

A survey of the vision transformers and their CNN-transformer based variants

Asifullah Khan, Zunaira Rauf, Anabia Sohail, et al.


Cited by 28 publications (5 citation statements)
References: 220 publications
“…To validate the generalizability of the method proposed in this paper, land cover classification experiments were conducted on the Houston 2013 dataset. Our method was compared with traditional machine learning algorithms and state-of-the-art methods in the field of deep learning, including CCF [53], CoSpace [54], Co-CNN [55], FusAT-Net [56], ViT [57], S2FL [49], Spectral-Former [58], CCR-Net [2], MFT [52], and DIMNet [59]. The specific results are shown in Figure 16, where (a) displays the DSM of LiDAR data, (b) shows the heatmap, (c) represents the three-band color composite for HSI spectral information, (d) shows the train ground-truth map, (e) shows the test ground-truth map, and (f) illustrates the classification results, with good contrast post-reconstruction.…”
Section: Land Cover Classification Experiments on the Houston 2013 Dataset
confidence: 99%
“…To validate the generalizability of the method proposed in this paper, land cover classification experiments were conducted on the Houston 2013 dataset. Our method was compared with traditional machine learning algorithms and state-of-the-art methods in the field of deep learning, including CCF [53], CoSpace [54], Co-CNN [55], FusAT-Net [56], ViT [57], S2FL [49], Spectral-Former [58], CCR-Net [2], MFT [52], and DIMNet [59]. The specific results are shown in Figure 16, where […] 4 for comparison, where the top outcomes are highlighted in bold.…”
Section: Land Cover Classification Experiments on the Houston 2013 Dataset
confidence: 99%
“…Recently, transformers have succeeded in various application fields, such as natural language processing (Kalyan et al., 2022) and computer vision (Khan et al., 2023). Their attention mechanism, capable of learning connections between sequence elements, has led to the development of transformer-based models such as Autoformer (Wu et al., 2021) and PatchTST (Nie et al., 2022) for time series representation.…”
Section: Related Work
confidence: 99%
“…Recently, transformers have succeeded in various application fields, such as natural language processing (Kalyan et al., 2022) and computer vision (Khan et al., 2023).…”
Section: Related Work
confidence: 99%
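The passage above attributes the transformer's strength to an attention mechanism that learns connections between sequence elements. The following minimal PyTorch sketch shows the core single-head scaled dot-product self-attention computation; the class name, shapes, and dimensions are illustrative assumptions for this report and are not taken from the cited works.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch only;
# names and shapes are assumptions, not code from the cited papers).
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Linear maps producing queries, keys, and values from the input tokens.
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Pairwise similarity between every pair of sequence elements.
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = scores.softmax(dim=-1)   # attention weights, each row sums to 1
        return weights @ v                 # weighted mixture of value vectors

# Example: 8 tokens of dimension 64 attend to one another.
tokens = torch.randn(1, 8, 64)
out = SelfAttention(64)(tokens)            # -> shape (1, 8, 64)
```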
“…Transformer-based attention mechanisms with deep semantic features have a larger receptive field; however, a larger downsampling factor results in a loss of positional information. In addition to the transformer-based self-attention mechanism used to form a feature map that focuses on interrelationships, attention mechanisms include channel attention, pixel attention, multilevel attention, and other methods of focusing on key features [8,20].…”
Section: Channel and Spatial Attention Component (CSAC)
confidence: 99%
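As a rough illustration of the channel and spatial attention mentioned alongside transformer self-attention, here is a CBAM-style sketch on a convolutional feature map. The module names, reduction ratio, and kernel size are assumptions for this example and do not reproduce the citing paper's CSAC module.

```python
# Sketch of channel attention followed by spatial attention on a feature map
# (CBAM-style illustration under assumed hyperparameters, not the paper's CSAC).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Squeeze spatial dimensions, then re-weight each channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        w = self.mlp(x.mean(dim=(2, 3)))   # per-channel importance in [0, 1]
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate across channels, then learn a per-pixel weight map.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(1, 32, 16, 16)
refined = SpatialAttention()(ChannelAttention(32)(feat))   # same shape as feat
```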