Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/141

Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment

Abstract: Spatial representation capable of learning a myriad of environmental features is a significant challenge for natural spatial understanding of mobile AI agents. Deep generative models have the potential of discovering rich representations of observed 3D scenes. However, previous approaches have been mainly evaluated on simple environments, or focused only on high-resolution rendering of small-scale scenes, hampering generalization of the representations to various spatial variability. To address this, we pre…

Cited by 7 publications (12 citation statements)
References 1 publication (9 reference statements)
“…Test-time entropy minimization (Tent) [16] adapts to unlabelled online target data by minimizing the Shannon entropy [25] of the model's predictions, updating only the affine transformation parameters of batch normalization. Tent was shown to be effective when the model is a CNN with batch normalization, e.g., ResNet50 [26], while a recent study [17] has demonstrated that Tent is also applicable to ViT (see Sect. 3.1 for details).…”
Section: Test-time Adaptation
confidence: 99%
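As a minimal illustration of the Tent procedure described above, the sketch below minimizes the Shannon entropy of a toy model's predictions while updating only the batch-norm affine parameters (gamma, beta). This is a hedged NumPy toy, not the authors' implementation: the two-layer model, dimensions, learning rate, and the numerical gradient are all hypothetical choices made to keep the example self-contained.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p):
    # mean Shannon entropy of the batch predictions
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()

def forward(x, gamma, beta, W):
    # test-time batch norm: normalize with the current batch statistics
    mu, var = x.mean(0), x.var(0)
    xn = (x - mu) / np.sqrt(var + 1e-5)
    return softmax((gamma * xn + beta) @ W)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))            # unlabelled online test batch
W = rng.normal(size=(8, 3))             # frozen classifier weights
gamma, beta = np.ones(8), np.zeros(8)   # the only parameters Tent updates

def tent_step(gamma, beta, lr=0.05, eps=1e-4):
    # numerical gradient of mean prediction entropy w.r.t. (gamma, beta);
    # a real implementation would backpropagate instead
    grads = []
    for p in (gamma, beta):
        g = np.zeros_like(p)
        for i in range(p.size):
            p[i] += eps; hi = entropy(forward(x, gamma, beta, W))
            p[i] -= 2 * eps; lo = entropy(forward(x, gamma, beta, W))
            p[i] += eps
            g[i] = (hi - lo) / (2 * eps)
        grads.append(g)
    return gamma - lr * grads[0], beta - lr * grads[1]

before = entropy(forward(x, gamma, beta, W))
for _ in range(30):
    gamma, beta = tent_step(gamma, beta)
after = entropy(forward(x, gamma, beta, W))
```

In practice Tent applies an SGD or Adam step on the entropy loss over the batch-norm affine parameters of a deep network; the finite-difference gradient here only serves to keep the sketch dependency-free.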
“…3.1 for the details). One can also use different loss functions to update the parameters, such as pseudo-labels (PL) [27], diversity regularization (SHOT-IM) [28], feature alignment (TFA [29] and CFA [17]), or contrastive learning [30]. A recent study by Iwasawa et al. [31] proposed a gradient-free procedure that updates only the model's classifier parameters (T3A).…”
Section: Test-time Adaptation
confidence: 99%
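The gradient-free, classifier-only adaptation mentioned above (T3A [31]) can be sketched as prototype adjustment: pseudo-label each test feature with the current classifier, then move each class prototype toward the mean of the features assigned to it. This is a hedged NumPy toy; the feature clusters, prototype initialization, and the 0.5 mixing weight are hypothetical, and the real method additionally maintains an entropy-filtered support set per class.

```python
import numpy as np

rng = np.random.default_rng(1)
# frozen feature extractor output for a test batch (two hypothetical clusters)
feats = np.vstack([rng.normal(loc=+1.0, size=(20, 4)),
                   rng.normal(loc=-1.0, size=(20, 4))])
# class prototypes acting as the (template) classifier weights
protos = np.array([[+0.5] * 4, [-0.5] * 4], dtype=float)

# 1) pseudo-label each feature with the current prototypes (no gradients)
logits = feats @ protos.T
labels = logits.argmax(axis=1)

# 2) move each prototype toward the mean of its assigned features
for c in range(protos.shape[0]):
    if (labels == c).any():
        protos[c] = 0.5 * protos[c] + 0.5 * feats[labels == c].mean(axis=0)
```

Because the update is a closed-form average rather than a gradient step, adaptation costs one forward pass per batch and leaves the feature extractor untouched.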