2022
DOI: 10.1609/aaai.v36i2.20094
Can Vision Transformers Learn without Natural Images?

Abstract: Is it possible to complete Vision Transformer (ViT) pre-training without natural images and human-annotated labels? This question has become increasingly relevant because current ViT pre-training relies heavily on large collections of natural images and human-annotated labels, and this reliance has raised problems of privacy violation, inadequate fairness protection, and labor-intensive annotation. In this paper, we experimentally verify that…

Cited by 12 publications (3 citation statements) · References 27 publications
“…Instead, the model learns from images automatically generated using fractal geometry, computer graphics, and other methods. Existing studies [15,16,21,23,24] have shown that such models can effectively learn representations from fractal images, Bézier curves [25], and Perlin noise [26], improving the interpretability of the features while performing almost as well as models pre-trained on ImageNet.…”
Section: Formula-Driven Supervised Learning
confidence: 99%
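The citation statement above describes formula-driven supervised learning (FDSL), in which training images are rendered from mathematical formulas such as iterated function systems (IFS) rather than collected from the real world. As a minimal illustrative sketch (not the paper's actual FractalDB pipeline, which samples affine maps randomly and weights them by probabilities), the following generates fractal point sets with the chaos-game algorithm over a fixed IFS; the Barnsley fern parameters are used here only as a well-known example:

```python
import random

def generate_ifs_points(transforms, n_points=10000, seed=0):
    """Generate 2D points from an iterated function system (IFS) via the
    chaos game: repeatedly apply a randomly chosen affine map
    (x, y) -> (a*x + b*y + e, c*x + d*y + f) to the current point.
    Note: maps are chosen uniformly here for simplicity; FractalDB-style
    generators typically weight maps by their determinants."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points + 100):
        a, b, c, d, e, f = rng.choice(transforms)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= 100:  # discard burn-in iterations before convergence to the attractor
            points.append((x, y))
    return points

# Barnsley fern affine maps (a, b, c, d, e, f) -- a classic IFS example
FERN = [
    (0.0, 0.0, 0.0, 0.16, 0.0, 0.0),
    (0.85, 0.04, -0.04, 0.85, 0.0, 1.6),
    (0.2, -0.26, 0.23, 0.22, 0.0, 1.6),
    (-0.15, 0.28, 0.26, 0.24, 0.0, 0.44),
]

pts = generate_ifs_points(FERN, n_points=5000)
```

Rasterizing such point sets into images, with category labels defined by the IFS parameters themselves, yields a labeled dataset with no natural images and no human annotation.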
“…This finding shows that fractal geometry plays an important role in dataset construction using FDSL. Nakashima et al. [48] confirmed that the FDSL framework is effective for pre-training Vision Transformers (ViTs). More interestingly, they suggest that FDSL is more likely to benefit ViTs than CNNs.…”
Section: Introduction
confidence: 98%
“…* indicates that 5,000 epochs were used during fine-tuning. † refers to a result quoted from [48]. Underlined bold and bold scores indicate the best and second-best values, respectively.…”
confidence: 99%