2023
DOI: 10.3390/s23052385
Vision Transformers in Image Restoration: A Survey

Abstract: The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a long time, Convolutional Neural Networks (CNNs) predominated in most computer vision tasks. Now, both CNNs and ViTs are efficient approaches that demonstrate powerful capabilities to restore a better version of a low-quality input image. In this study, the efficiency of ViT in image restoration is studied extensively, and ViT architectures are classified for each image restoration task. Seven image re…

Cited by 38 publications (15 citation statements) · References 110 publications
“…Patch embeddings, along with the classification token and positional embeddings, are summed and fed into a transformer encoder for image recognition [44]. A visual representation is presented in Fig 3, adapted from [46].…”
Section: Methods
confidence: 99%
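The pipeline quoted above (patchify, linearly embed, prepend a class token, add positional embeddings) can be sketched in a few lines. This is an illustrative NumPy sketch, not the surveyed paper's implementation; the random projections stand in for learned weights, and `vit_tokens` is a hypothetical helper name.

```python
import numpy as np

def vit_tokens(image, patch, d_model, rng):
    """Sketch of the ViT input pipeline: split the image into patches,
    linearly embed them, prepend a class token, and add positional
    embeddings. Random matrices stand in for learned parameters."""
    h, w, c = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))          # (N, P*P*C) flattened patches
    proj = rng.standard_normal((patch * patch * c, d_model))  # patch-embedding projection
    tokens = patches @ proj                                   # (N, d_model)
    cls = rng.standard_normal((1, d_model))                   # classification token
    tokens = np.concatenate([cls, tokens], axis=0)            # (N+1, d_model)
    pos = rng.standard_normal(tokens.shape)                   # positional embeddings
    return tokens + pos                                       # summed encoder input

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))          # toy 8x8 RGB "image"
seq = vit_tokens(x, patch=4, d_model=16, rng=rng)
# 8x8 image with 4x4 patches -> 4 patch tokens + 1 class token
```

The summed sequence `seq` is what would be fed to the transformer encoder.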
“…RMSProp. Another optimization approach, root-mean-square propagation (RMSProp), adapts the learning rate for each parameter [46]. The ‘running average’ is calculated as follows (Eq 12): …”
Section: Methods
confidence: 99%
“…The RGB images captured in intelligent driving often feature complex urban lighting and a variety of targets, which is markedly different from indoor scenes or low-light datasets with no light sources or only a single weak light source. Therefore, we chose CNN [ 38 ] and Vision Transformer [ 39 ] as different baselines to address the varying complexities of degradation in low and high-frequency information. The degradation on the Illumination Map is linear grayscale attenuation, which can be processed with a smaller parameter CNN.…”
Section: Methods
confidence: 99%
“…Self-attention is a module that takes image patches as input, transforms them, and returns token values as output. A popular transformer model used for image processing is the Vision Transformer (ViT) [11].…”
Section: Restormer
confidence: 99%
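The token-in, token-out behavior described in this statement is single-head scaled dot-product self-attention. Below is a minimal sketch, assuming random projection matrices in place of learned weights; it illustrates the mechanism only, not any specific restoration model.

```python
import numpy as np

def self_attention(tokens):
    """Single-head self-attention over patch tokens: project each token
    to queries/keys/values, softmax the scaled dot-product scores over
    keys, and return each token as a weighted sum of values."""
    n, d = tokens.shape
    rng = np.random.default_rng(1)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))  # stand-in weights
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d)                       # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V                                  # attended token values

tokens = np.random.default_rng(2).standard_normal((5, 16))  # e.g. 5 patch tokens
out = self_attention(tokens)
```

Each output row mixes information from every input token, which is why transformer-based restorers can model long-range dependencies that convolutions capture only through deep stacks.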