Anastasios N. Angelopoulos scite author profile

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box predictor that controls the expected loss on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in five large-scale machine learning problems: (1) classification problems where some mistakes are more costly than others; (2) multi-label classification, where each observation has multiple associated labels; (3) classification problems where the labels have a hierarchical structure; (4) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (5) protein structure prediction. Last, we discuss extensions to uncertainty quantification for ranking, metric learning, and distributionally robust learning.

show abstract

A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

Angelopoulos¹,

Bates²

2021

Preprint

View full text Add to dashboard Cite

Black-box machine learning learning methods are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Distributionfree uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional assumptions or model assumptions, with explicit guarantees with finitely many datapoints. Moreover, they adapt to the difficulty of the input; when the input example is difficult, the uncertainty intervals/sets are large, signaling that the model might be wrong. Without much work, one can use distribution-free methods on any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the ground truth with a user-specified probability, such as 90%. Indeed, the methods are easy-to-understand and general, applying to many modern prediction problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ, including conformal prediction and related methods, who is not necessarily a statistician. We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.

show abstract

Private Prediction Sets

Angelopoulos

Bates

Zrnic

et al. 2022

View full text Add to dashboard Cite

We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level for every time point, not just on average over the observed sequence.

show abstract

On Identifying and Mitigating Bias in the Estimation of the COVID-19 Case Fatality Rate

Angelopoulos

Pathak

Varma

et al. 2020

Preprint

View full text Add to dashboard Cite

The relative case fatality rates (CFRs) between groups and countries are key measures of relative risk that guide policy decisions regarding scarce medical resource allocation during the ongoing COVID-19 pandemic. In the middle of an active outbreak when surveillance data is the primary source of information, estimating these quantities involves compensating for competing biases in time series of deaths, cases, and recoveries. These include time- and severity- dependent reporting of cases as well as time lags in observed patient outcomes. In the context of COVID-19 CFR estimation, we survey such biases and their potential significance. Further, we analyze theoretically the effect of certain biases, like preferential reporting of fatal cases, on naive estimators of CFR. We provide a partially corrected estimator of these naive estimates that accounts for time lag and imperfect reporting of deaths and recoveries. We show that collection of randomized data by testing the contacts of infectious individuals regardless of the presence of symptoms would mitigate bias by limiting the covariance between diagnosis and death. Our analysis is supplemented by theoretical and numerical results and a simple and fast open-source codebase at https://github.com/aangelopoulos/cfr-covid-19 .

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.