2022
DOI: 10.48550/arxiv.2206.14754
Preprint

Distilling Model Failures as Directions in Latent Space

Abstract: Existing methods for isolating hard subpopulations and spurious correlations in datasets often require human intervention. This can make these methods labor-intensive and dataset-specific. To address these shortcomings, we present a scalable method for automatically distilling a model's failure modes. Specifically, we harness linear classifiers to identify consistent error patterns, and, in turn, induce a natural representation of these failure modes as directions within the feature space. We demonstrate that …
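The abstract's core idea can be illustrated with a short sketch: fit a linear classifier that separates a model's correct from incorrect predictions in its feature (embedding) space, then read the classifier's weight vector off as a candidate "failure direction". The following Python is a minimal sketch under assumed inputs (random placeholder embeddings, a LinearSVC with arbitrary hyperparameters, a simple projection-based ranking); it is not the authors' exact pipeline.

# Minimal sketch: learn a direction in latent space along which model failures concentrate.
# Embeddings, labels, and SVM settings below are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder data: embeddings of validation examples for one class,
# with a flag marking which ones the base model got wrong.
embeddings = rng.normal(size=(1000, 512))        # e.g. penultimate-layer features
model_is_wrong = rng.integers(0, 2, size=1000)   # 1 = base model misclassified

# Linear classifier predicting failure from the embedding.
svm = LinearSVC(C=0.1, max_iter=10_000)
svm.fit(embeddings, model_is_wrong)

# The unit-normalized weight vector is a direction in feature space;
# projecting examples onto it ranks them by how failure-aligned they are.
failure_direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])
failure_scores = embeddings @ failure_direction

# The highest-scoring examples form a candidate hard subpopulation to inspect.
hardest = np.argsort(failure_scores)[::-1][:20]
print("Most failure-aligned examples:", hardest)

In practice one would replace the random arrays with real embeddings and correctness labels from a held-out set; the resulting direction can then be used to surface and inspect the examples the model most consistently gets wrong.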

Cited by 4 publications (8 citation statements)
References 11 publications
“…Instead, we propose to describe the visual biases with language. Some recent works use vision-language models to analyze model failures by detecting outliers in the visual embedding (Eyuboglu et al., 2022; Jain et al., 2022a). In contrast, we directly generate descriptive captions from images instead of embeddings, and can find multiple, fine-grained biases.…”
Section: Related Work
Mentioning confidence: 99%
“…A few recent works use vision-language models to discover biases (which they call slices or model failures). Concretely, they define biased groups as outliers in the embedding space of the visual encoder, estimated by a Gaussian mixture model (Eyuboglu et al., 2022) or a support vector machine (Jain et al., 2022a). In contrast, we directly generate captions from images, which may contain more detailed information than the encoder embeddings.…”
Section: G Additional Related Work
Mentioning confidence: 99%
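To make the mechanism described in the quote above concrete, here is a rough sketch of embedding-space slice discovery: cluster the visual embeddings with a Gaussian mixture and flag components whose error rate is well above average as candidate bias slices. The feature dimensionality, number of components, and error-rate threshold are assumptions chosen for illustration, not the cited methods' actual settings.

# Rough sketch: Gaussian-mixture clustering of embeddings, flagging high-error clusters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 64))   # e.g. image-encoder embeddings (placeholder)
errors = rng.integers(0, 2, size=2000)     # 1 = model prediction was wrong (placeholder)

gmm = GaussianMixture(n_components=10, random_state=0).fit(embeddings)
cluster = gmm.predict(embeddings)

overall_err = errors.mean()
for k in range(10):
    mask = cluster == k
    if mask.sum() < 20:                    # ignore tiny clusters
        continue
    err_k = errors[mask].mean()
    if err_k > 1.5 * overall_err:          # crude "underperforming slice" criterion
        print(f"cluster {k}: {mask.sum()} examples, error rate {err_k:.2f}")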
“…Many recent works aim to understand models' systematic errors by finding subsets of inputs with similar characteristics on which the model performs significantly worse. This is referred to as slice discovery (Chung et al., 2019; Singla et al., 2021; d'Eon et al., 2022; Eyuboglu et al., 2022; Jain et al., 2022a). However, these algorithms fail to address the most fundamental challenge for slice discovery: the lack of data.…”
Section: Rectified Model Misbehaviors
Mentioning confidence: 99%
“…However, certain factors such as lighting and contrast may not be captured by the object detector, which can limit the effectiveness of our approach. To mitigate this limitation, our method can be used in conjunction with previous work that utilizes system/content metadata or discovered visual features for general failure analysis (Nushi et al., 2018; Singla et al., 2021; Chung et al., 2019; Jain et al., 2022; Eyuboglu et al., 2022). For instance, one can enrich the test data with additional metadata, such as contrast, blur, lighting, and camera angle, and apply our method as well as previous approaches to understand whether the model's performance drops under some of these conditions.…”
Section: B2 Spurious Correlation Detection
Mentioning confidence: 99%