Recently, out-of-distribution (OOD) detection has received considerable attention, because confident labels assigned to OOD examples represent a vulnerability similar to adversarial input perturbation. We are interested in models that combine the benefits of being robust to adversarial input and being able to detect OOD examples. Furthermore, we require that both in-distribution classification and OOD detection be robust to adversarial input perturbation. Several related studies apply an ad-hoc combination of design choices to achieve similar goals. Various functions over the logit or softmax layer can be used to define training objectives, OOD detection methods, and adversarial attacks. Here, we present a design space that covers such design choices, as well as a principled way of evaluating the networks. This includes a strong attack scenario where both in-distribution and OOD examples are adversarially perturbed to mislead OOD detection. We draw several interesting conclusions based on our empirical analysis of this design space. Most importantly, we argue that the key factor is not the OOD training or detection method in itself, but rather the application of matching detection and training methods.
I. INTRODUCTION

Although computer vision models have achieved remarkable performance on various recognition tasks in recent years, they are susceptible to adversarial input [1]-[3], where invisibly small but well-designed input perturbations mislead state-of-the-art models. The sensitivity of current models to adversarial input indicates that these models are not well aligned with human perception. Among the many defenses against input perturbation, adversarial training has been found to be the most effective [4], [5]. In a nutshell, adversarial training means that the model is trained on the adversarially perturbed version of the training data to improve its robustness.

Recently, robust out-of-distribution (OOD) detection has also received considerable attention [6]-[8]. Adversarially trained models are relatively robust to adversarial input, but they might assign high confidence to OOD samples. In a real-world application, this also represents a serious vulnerability [8]. Besides, OOD input is also open to adversarial perturbation,
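To make the idea of adversarial training concrete, the following is a minimal sketch on a toy logistic-regression model in numpy, not the deep-network setup studied in this paper: in each update step the inputs are first perturbed with a one-step FGSM attack (sign of the input gradient of the loss), and the gradient step is then taken on the perturbed batch. All names here (`fgsm_perturb`, the toy data, `eps`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One-step FGSM: shift each input by eps in the sign of the
    input gradient of the binary cross-entropy loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # dL/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Toy linearly separable two-class data.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w, b = np.zeros(5), 0.0
lr, eps = 0.1, 0.1
for _ in range(300):
    # Adversarial training: compute the loss gradient on perturbed inputs.
    X_adv = fgsm_perturb(X, y, w, b, eps)
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Robust accuracy: evaluated on freshly FGSM-perturbed inputs.
p_adv = sigmoid(fgsm_perturb(X, y, w, b, eps) @ w + b)
acc = np.mean((p_adv > 0.5) == (y == 1))
print(f"robust train accuracy: {acc:.2f}")
```

The same attack that defines the training perturbation is reused at evaluation time, which mirrors the standard practice of measuring robust accuracy under the threat model used during training.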