An important problem in econometrics and marketing is to infer the causal impact that a designed market intervention has exerted on an outcome metric over time. This paper proposes to infer causal impact on the basis of a diffusion-regression state-space model that predicts the counterfactual market response in a synthetic control that would have occurred had no intervention taken place. In contrast to classical difference-in-differences schemes, state-space models make it possible to (i) infer the temporal evolution of attributable impact, (ii) incorporate empirical priors on the parameters in a fully Bayesian treatment, and (iii) flexibly accommodate multiple sources of variation, including local trends, seasonality and the time-varying influence of contemporaneous covariates. Using a Markov chain Monte Carlo algorithm for posterior inference, we illustrate the statistical properties of our approach on simulated data. We then demonstrate its practical utility by estimating the causal effect of an online advertising campaign on search-related site visits. We discuss the strengths and limitations of state-space models in enabling causal attribution in those settings where a randomised experiment is unavailable. The CausalImpact R package provides an implementation of our approach.

Published at http://dx.doi.org/10.1214/14-AOAS788 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
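The counterfactual logic behind the approach can be sketched in a few lines. The toy below uses a plain pre-period regression on a single invented control series, not the paper's diffusion-regression state-space model and not the CausalImpact API; all names and numbers are illustrative assumptions:

```python
# Toy causal-impact sketch: fit the outcome/control relationship on
# the pre-intervention period, predict the post-period counterfactual,
# and take the difference as the estimated impact.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: control series x drives target y; the intervention
# adds a lift of 5 to y in the post-period.
n_pre, n_post = 100, 30
x = rng.normal(10, 1, n_pre + n_post)
y = 2.0 * x + rng.normal(0, 0.5, n_pre + n_post)
y[n_pre:] += 5.0  # true causal effect

# Fit the relationship on the pre-intervention period only.
beta, alpha = np.polyfit(x[:n_pre], y[:n_pre], 1)

# Counterfactual: what the post-period would have looked like
# without the intervention.
counterfactual = alpha + beta * x[n_pre:]
pointwise_impact = y[n_pre:] - counterfactual
print(round(pointwise_impact.mean(), 2))  # close to the true lift of 5
```

The state-space formulation in the paper replaces this static regression with local trends, seasonality, and time-varying coefficients, and replaces the point prediction with a full posterior over the counterfactual.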
A useful definition of 'big data' is data that is too big to process comfortably on a single machine, either because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
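The consensus step is easy to sketch for a model where averaging is exact. The toy below assumes a Gaussian mean with known noise variance and a flat prior, so each shard-level "sampler" can draw directly from its shard posterior; the precision weights are a minimal illustration, not the paper's general recipe:

```python
# Toy consensus Monte Carlo: each "machine" samples from the
# posterior given its shard of the data, then draw t is combined
# across machines by a precision-weighted average.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, 100_000)   # true mean 3, known sigma 1
shards = np.array_split(data, 10)      # 10 "machines"
n_draws = 5_000

shard_draws, weights = [], []
for s in shards:
    # Flat prior: shard posterior for the mean is N(shard mean, 1/n_s).
    var = 1.0 / len(s)
    shard_draws.append(rng.normal(s.mean(), np.sqrt(var), n_draws))
    weights.append(1.0 / var)          # precision weight

# Consensus step: weighted average of draw t across machines.
draws = np.average(shard_draws, axis=0, weights=weights)
print(round(draws.mean(), 3))  # close to the overall sample mean
```

For this Gaussian case the averaged draws follow the exact full-data posterior; for non-Gaussian models the average is only approximate, which is the "depending on the model" caveat above.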
I would like to thank the editor and two referees for their comments, and especially Deepak Agarwal for a substantial, thorough, and thoughtful discussion. I mostly agree with the discussion, and I echo the sentiment that bandit problems are a fertile research area. This is partly because of the tremendous variety in the types of bandit problems that can occur in real-world applications. The discussion mentioned several ways that the content allocation problem differs from toy problems: (1) the curse of dimensionality that comes from very large numbers of arms, (2) delayed feedback as batches of successes and failures filter through logging systems, (3) payoff distribution parameters that change over time, (4) extreme heterogeneity of users, and (5) large data sets. The content optimization example also deals with an evolving population where new arms are born, grow, decay and die. Other relevant considerations can include the presence or absence of experimental factor structure, whether the action space is finite or continuous, whether the response space is discrete, continuous, or semi-continuous, and the desired richness of output for reporting.

With so many variations on the problem, it is unlikely that a single method would be the ideal solution to all of them. That is the appeal of randomized probability matching: it promises reasonable behavior in the face of a wide variety of payoff distributions. Truly massive-scale problems like Yahoo content optimization are indeed a challenge for probability matching. Bayesian posterior sampling methods written for single-processor machines do not scale well to cluster-sized problems. Work on scaling Bayesian computations is ongoing and promising, but still an area of research. Two examples are Suchard et al. [1], who have demonstrated multiple order-of-magnitude improvements in computing speed using graphics processing units, and Polson et al. [2], who show how particle-based sampling schemes can generate approximate Bayesian inferences on a single pass through the data. However, neither of these approaches can offer real competition to Agarwal et al. [3] at this time.

The article tries to emphasize complementarities between classical and sequential experiments. There are a great many industrial experiments done using classical DOE techniques that could be done far more profitably using multi-armed bandits. Likewise, the work I have seen thus far on multi-armed bandits focuses on the separable case (arms with distinct parameters) to an unhealthy degree. The point of the article is that there are order-of-magnitude gains that can be had by both camps, and probability matching is a convenient way to get at them. On modest-sized problems that are typical of most industrial experiments, posterior simulation is not that big a deal. There are several off-the-shelf methods for posterior simulation in logistic regression models, including Holmes and Held [4], Frühwirth-Schnatter and Frühwirth [5], and Frühwirth-Schnatter and Frühwirth [6], among others. There are reason...
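Randomized probability matching itself is simple to sketch for binary rewards with conjugate Beta posteriors (Thompson sampling); the arm success rates below are invented for illustration:

```python
# Toy randomized probability matching for Bernoulli arms: sample one
# draw from each arm's Beta posterior and play the arm with the
# largest draw -- i.e. play each arm with the probability that it is
# currently believed to be the best.
import numpy as np

rng = np.random.default_rng(2)
true_rates = [0.04, 0.05, 0.10]        # unknown to the algorithm
successes = np.ones(3)                 # Beta(1, 1) priors on each arm
failures = np.ones(3)

for _ in range(20_000):
    theta = rng.beta(successes, failures)     # one posterior draw per arm
    arm = int(np.argmax(theta))               # probability matching step
    reward = rng.random() < true_rates[arm]   # observe a Bernoulli reward
    successes[arm] += reward
    failures[arm] += 1 - reward

plays = successes + failures - 2
print(int(np.argmax(plays)))  # the best arm (index 2) dominates play
```

The cluster-scale difficulty discussed above is that each iteration of this loop needs a posterior draw, which is cheap for conjugate Beta updates but expensive when the posterior itself requires MCMC over massive data.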
This article describes a system for short-term forecasting based on an ensemble prediction that averages over different combinations of predictors. The system combines a structural time series model for the target series with a regression component capturing the contributions of contemporaneous search query data. A spike-and-slab prior on the regression coefficients induces sparsity, dramatically reducing the size of the regression problem. Our system averages over potential contributions from a very large set of models and gives easily digested reports of which coefficients are likely to be important. We illustrate with applications to initial claims for unemployment benefits and to retail sales. Although our exposition focuses on using search engine data to forecast economic time series, the underlying statistical methods can be applied to more general short-term forecasting with large numbers of contemporaneous predictors.
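The model-averaging and inclusion-probability idea can be sketched without MCMC when the predictor set is tiny. The toy below enumerates every subset of four invented predictors and weights each model by BIC; this substitutes crude BIC weights for the paper's spike-and-slab posterior purely to illustrate averaging over combinations of predictors:

```python
# Toy Bayesian model averaging: score every predictor subset by BIC,
# convert scores to model weights, and report per-predictor inclusion
# probabilities (the "which coefficients matter" summary).
from itertools import combinations
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 4
X = rng.normal(size=(n, p))
# Only predictors 0 and 2 actually matter in this simulation.
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(0, 1, n)

bics = {}
for k in range(p + 1):
    for subset in combinations(range(p), k):
        if subset:
            Xs = X[:, list(subset)]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            resid = y - Xs @ beta
        else:
            resid = y                       # null model: no predictors
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        bics[subset] = -2 * loglik + len(subset) * np.log(n)

# Convert BIC scores to approximate posterior model weights.
best = min(bics.values())
model_w = {s: np.exp(-0.5 * (b - best)) for s, b in bics.items()}
total = sum(model_w.values())
incl = [sum(w for s, w in model_w.items() if j in s) / total
        for j in range(p)]
print([round(q, 2) for q in incl])  # high for predictors 0 and 2
```

With thousands of search-query predictors this enumeration is infeasible, which is why the paper's system explores the model space with spike-and-slab MCMC instead.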