Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a misalignment between the (human-specified) notion of robustness and the inherent geometry of the data.
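For context, the theoretical framework referenced above can be paraphrased as follows (the notation and thresholds here are a paraphrase for intuition, not a verbatim statement): a feature $f$ is $\rho$-useful for a distribution $\mathcal{D}$ if it correlates with the true label in expectation,

$$\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\,y \cdot f(x)\,\big] \ge \rho,$$

and $\gamma$-robustly useful if this correlation persists under worst-case perturbations from an allowed set $\Delta(x)$,

$$\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\inf_{\delta \in \Delta(x)} y \cdot f(x+\delta)\Big] \ge \gamma.$$

Non-robust features are then those that are $\rho$-useful for some $\rho > 0$ but not $\gamma$-robustly useful for any $\gamma \ge 0$.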
When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior. However, in practice the opposite can often happen: we find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. To develop better methods for selecting data, we start by framing dataset selection as an optimization problem that we can directly solve: given target tasks, a learning algorithm, and candidate data, select the subset that maximizes model performance. This framework thus avoids handpicked notions of data quality, and instead explicitly models how the learning process uses training datapoints to predict on the target tasks. Our resulting method greatly improves language model (LM) performance on both pre-specified tasks and previously unseen tasks. Specifically, choosing target tasks representative of standard LM problems and evaluating on diverse held-out benchmarks, our selected datasets provide a 2× compute multiplier over baseline methods.
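A minimal sketch of the optimization framing described above (not the paper's implementation): all names here, such as `learn`, `evaluate`, and the brute-force search itself, are illustrative assumptions. In practice the search over subsets is intractable, so selection would be guided by a cheap proxy for how a candidate subset affects target-task loss rather than by retraining as done here.

```python
import random

def select_subset(candidates, target_tasks, learn, evaluate, budget, n_trials=100):
    """Pick a size-`budget` subset of `candidates` that maximizes performance
    on `target_tasks`, treating selection as a direct optimization problem."""
    best_subset, best_score = None, float("-inf")
    for _ in range(n_trials):                       # brute-force stand-in for the real optimizer
        subset = random.sample(candidates, budget)  # candidate training subset
        model = learn(subset)                       # run the learning algorithm on the subset
        score = evaluate(model, target_tasks)       # measure target-task performance
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset
```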
We evaluate the robustness of Adversarial Logit Pairing (ALP), a recently proposed defense against adversarial examples. We find that a network trained with Adversarial Logit Pairing achieves 0.6% correct classification rate under targeted adversarial attack, the threat model in which the defense is considered. We provide a brief overview of the defense and the threat models/claims considered, as well as a discussion of the methodology and results of our attack. Our results offer insights into the reasons underlying the vulnerability of ALP to adversarial attack, and are of general interest in evaluating and understanding adversarial defenses.

**Contributions.** In summary, the contributions of this note are as follows:
1. Robustness: Under the white-box targeted attack threat model specified in Kannan et al., we upper bound the correct classification rate of the defense at 0.6% (Table 1). We also perform targeted and untargeted attacks and show that the attacker can reach success rates of 98.6% and 99.9% respectively (Figures 1, 2).
2. Formulation: We analyze the ALP loss function and contrast it to that of Madry et al., pointing out several differences from the robust optimization objective (Section 4.1).
3. Loss landscape: We analyze the loss landscape induced by ALP by visualizing loss landscapes and adversarial attack trajectories (Section 4.2).

Furthermore, we suggest the experiments conducted in the analysis of ALP as another evaluation method for adversarial defenses.
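For illustration, a sketch of a targeted L-infinity PGD attack of the kind used to evaluate defenses like ALP; the hyperparameters below are placeholders, not the exact settings from the evaluation.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target, eps=16/255, step_size=2/255, steps=100):
    """Search for x_adv within an eps-ball of x that the model labels as `target`."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)        # loss toward the target class
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - step_size * grad.sign()    # descend: move toward target
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back to eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```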
Transfer learning is a widely-used paradigm in deep learning, where models pre-trained on standard datasets can be efficiently adapted to downstream tasks. Typically, better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of transfer learning performance. In this work, we identify another such aspect: we find that adversarially robust models, while less accurate, often perform better than their standard-trained counterparts when used for transfer learning. Specifically, we focus on adversarially robust ImageNet classifiers, and show that they yield improved accuracy on a standard suite of downstream classification tasks. Further analysis uncovers more differences between robust and standard models in the context of transfer learning. Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations. Our code and models are available at https://github.com/Microsoft/robust-models-transfer.
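A minimal sketch of the fixed-feature transfer setup discussed above, assuming a robust ResNet-50 ImageNet checkpoint; the checkpoint path, its state-dict format, and the hyperparameters are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_transfer_model(checkpoint_path, num_downstream_classes):
    backbone = models.resnet50()
    state = torch.load(checkpoint_path, map_location="cpu")  # robust ImageNet weights (assumed format)
    backbone.load_state_dict(state)
    for p in backbone.parameters():                           # freeze the pre-trained features
        p.requires_grad = False
    # Replace the classification head; the new layer is trainable by default.
    backbone.fc = nn.Linear(backbone.fc.in_features, num_downstream_classes)
    return backbone

# Only the new head's parameters would be passed to the optimizer, e.g.:
# optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
```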
Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
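For reference, a minimal NumPy version of the BatchNorm operation under discussion (training-time batch statistics only; running averages and backpropagation are omitted).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: array of shape (batch, features). Normalize each feature over the
    batch, then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero-mean, unit-variance per feature
    return gamma * x_hat + beta
```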
**Introduction and Goals.** SARS-CoV-2 is transmitted both in the community and within households. Social distancing and lockdowns reduce community transmission but do not directly address household transmission. We provide quantitative measures of household transmission based on empirical data, and estimate the contribution of households to overall spread. We highlight policy implications from our analysis of household transmission, and more generally, of changes in contact patterns under social distancing. **Methods.** We investigate the household secondary attack rate (SAR) for SARS-CoV-2, as well as R_h, which is the average number of within-household infections caused by a single index case. We identify previous works that estimated the SAR. We correct these estimates based on the false-negative rate of PCR testing and the failure to test asymptomatics. Results are pooled by a hierarchical Bayesian random-effects model to provide a meta-analysis estimate of the SAR. We estimate R_h using results from population testing in Vo', Italy and contact tracing data that we curate from Singapore. The code and data behind our analysis are publicly available at https://github.com/andrewilyas/covid-household-transmission. **Results.** We identified nine studies of the household secondary attack rate. Our modeling suggests the SAR is heterogeneous across studies. The pooled central estimate of the SAR is 30% but with a posterior 95% credible interval of (0%, 67%) reflecting this heterogeneity. This corresponds to a posterior mean for the SAR of 30% (18%, 43%) and a standard deviation of 15% (9%, 27%). If results are not corrected for false negatives and asymptomatics, the pooled central estimate for the SAR is 20% (0%, 43%). From the same nine studies, we estimate R_h to be 0.47 (0.13, 0.77). Using contact tracing data from Singapore, we infer an R_h value of 0.32 (0.22, 0.42). Population testing data from Vo' yields an R_h estimate of 0.37 (0.34, 0.40) after correcting for false negatives and asymptomatics. **Interpretation.** Our estimates of R_h suggest that household transmission was a small fraction (5%-35%) of R before social distancing but a large fraction after (30%-55%). This suggests that household transmission may be an effective target for interventions. A remaining uncertainty is whether household infections actually contribute to further community transmission or are contained within households. This can be estimated given high-quality contact tracing data. More broadly, our study points to emerging contact patterns (i.e., increased time at home relative to the community) playing a role in transmission of SARS-CoV-2. We briefly highlight another instance of this phenomenon (differences in contact between essential workers and the rest of the population), provide coarse estimates of its effect on transmission, and discuss how future data could enable a more reliable estimate.
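As a rough back-of-the-envelope illustration of how the SAR and R_h relate (a simplification for intuition, not the hierarchical Bayesian model used in the analysis): assuming only the index case transmits within the household,

$$R_h \approx \mathrm{SAR} \times (\bar{n} - 1),$$

where $\bar{n}$ is the mean household size. For example, a SAR of 30% in households with a hypothetical average of 2.5 members would give $R_h \approx 0.3 \times 1.5 = 0.45$.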
Current neural network-based image classifiers are susceptible to adversarial examples, even in the black-box setting, where the attacker is limited to query access without access to gradients. Previous methods (substitute networks and coordinate-based finite-difference methods) are either unreliable or query-inefficient, making them impractical for certain problems. We introduce a new method for reliably generating adversarial examples under more restricted, practical black-box threat models. First, we apply natural evolution strategies to perform black-box attacks using two to three orders of magnitude fewer queries than previous methods. Second, we introduce a new algorithm to perform targeted adversarial attacks in the partial-information setting, where the attacker only has access to a limited number of target classes. Using these techniques, we successfully perform the first targeted adversarial attack against a commercially deployed machine learning system, the Google Cloud Vision API, in the partial-information setting.
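A sketch of black-box gradient estimation with natural evolution strategies (NES), the core primitive behind the query-efficient attack described above; the function names and hyperparameters are illustrative, and antithetic Gaussian sampling is used to reduce variance.

```python
import numpy as np

def nes_gradient(loss_fn, x, sigma=0.001, n_samples=50):
    """Estimate the gradient of loss_fn at x using only query (function) access."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)
        # Antithetic pair: query the loss at x + sigma*u and x - sigma*u.
        grad += u * (loss_fn(x + sigma * u) - loss_fn(x - sigma * u))
    return grad / (2 * sigma * n_samples)

# An attack would then take projected steps on the estimated gradient, e.g.
# x_adv = np.clip(x_adv - lr * np.sign(nes_gradient(loss_fn, x_adv)), 0, 1).
```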