In this paper, we examine a data-driven optimization approach to making optimal decisions as evaluated by a trained random forest, where these decisions can be constrained by an arbitrary polyhedral set. We model this optimization problem as a mixed-integer linear program. We show that this model can be solved to optimality efficiently using Pareto-optimal Benders cuts for ensembles containing a modest number of trees. We consider a random forest approximation that samples a subset of the trees, and we prove analytical guarantees establishing that this approximation yields near-optimal solutions. In particular, for axis-aligned trees, we show that the number of trees we need to sample is sublinear in the size of the forest being approximated. Motivated by this result, we propose heuristics inspired by cross-validation that optimize over smaller forests rather than one large forest, and we assess their performance on synthetic datasets. We present two case studies, on a property investment problem and a jury selection problem. We show that this approach performs well against other benchmarks, and we provide insights into how sensitive the algorithm's performance is to different parameters of the random forest.
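The tree-sampling idea can be illustrated with a small sketch. It does not reproduce the paper's mixed-integer formulation or its Benders decomposition; instead it swaps in exhaustive search over a finite, hypothetical grid of candidate decisions, which is only practical in very low dimensions. It assumes axis-aligned trees stored as lists of leaf boxes, uniform sampling of trees without replacement, and a polyhedral feasible set `A x <= b`. All names (`tree_predict`, `optimize_sampled`, `make_forest`, and so on) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a leaf is an axis-aligned box, one half-open
# interval [lo, hi) per feature, paired with the leaf's predicted value.
Box = List[Tuple[float, float]]
Tree = List[Tuple[Box, float]]


def tree_predict(tree: Tree, x: Sequence[float]) -> float:
    """Return the value of the (unique) leaf whose box contains x."""
    for box, value in tree:
        if all(lo <= xi < hi for (lo, hi), xi in zip(box, x)):
            return value
    raise ValueError("x lies outside every leaf of the tree")


def forest_predict(forest: List[Tree], x: Sequence[float]) -> float:
    """Random forest prediction: the average of the tree predictions."""
    return sum(tree_predict(t, x) for t in forest) / len(forest)


def feasible(x: Sequence[float], A: List[List[float]], b: List[float]) -> bool:
    """Check the polyhedral constraint A x <= b."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b))


def optimize_sampled(forest, candidates, A, b, k, seed=0):
    """Pick the feasible candidate maximizing the average of k sampled trees.

    Returns the chosen decision and its value under the FULL forest."""
    sample = random.Random(seed).sample(forest, k)  # uniform, without replacement
    best = max(
        (x for x in candidates if feasible(x, A, b)),
        key=lambda x: forest_predict(sample, x),
    )
    return best, forest_predict(forest, best)


def make_forest(n_trees: int, seed: int = 1) -> List[Tree]:
    """Toy 1-D forest: each tree predicts 1 on a random interval, 0 elsewhere."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        lo, hi = rng.uniform(0.3, 0.6), rng.uniform(0.6, 0.95)
        forest.append([([(0.0, lo)], 0.0), ([(lo, hi)], 1.0), ([(hi, 1.0)], 0.0)])
    return forest


if __name__ == "__main__":
    forest = make_forest(200)
    candidates = [(i / 100,) for i in range(100)]  # grid over [0, 1)
    A, b = [[1.0]], [0.9]                          # constraint: x <= 0.9
    for k in (5, 20, 200):
        x, value = optimize_sampled(forest, candidates, A, b, k)
        print(f"k={k:3d}  x={x[0]:.2f}  full-forest value={value:.3f}")
```

Scoring the chosen decision under the full forest, rather than under the sample, is what lets one measure how much optimality the approximation gives up as `k` shrinks. The sublinear sample-size guarantee stated above concerns the paper's own setting and is not demonstrated by this toy example.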
Problem definition: We present a data-driven study of the secondary ticket market. We are primarily concerned with accurately estimating price sensitivity for listed tickets. This setting raises several challenges, including endogeneity, heterogeneity in price sensitivity across tickets, binary outcomes, and nonlinear interactions between ticket features. Our secondary goal is to show how this estimation can be integrated into a prescriptive trading strategy for buying and selling tickets in an active marketplace. Academic/practical relevance: We present a novel method for demand estimation with heterogeneous treatment effects in the presence of confounding. In practice, we embed this method within an optimization framework for ticket reselling, giving the ticket reselling platform a new framework for pricing tickets on its platform. Methodology: We introduce a general double/orthogonalized machine learning method for classification problems. This method isolates the causal effect of price on the outcome by removing the conditional effects of the ticket and market features. Furthermore, we introduce a novel loss function that can be easily incorporated into powerful off-the-shelf machine learning algorithms, including gradient boosted trees. We also show how instrumental variables can be incorporated in the presence of hidden confounding variables. Results: Using a wide range of synthetic data sets, we show that this approach outperforms state-of-the-art machine learning and causal inference approaches for estimating treatment effects in the classification setting. Furthermore, using National Basketball Association ticket listings from the 2014–2015 season, we show that probit models with instrumental variables, previously used for estimating ticket prices in the resale market, are significantly less accurate than, and potentially misspecified relative to, our proposed approach.
Through pricing simulations, we show that our proposed method can achieve an 11% return on investment by buying and selling tickets, whereas existing techniques are not profitable. Managerial implications: Knowing how to price tickets on its platform offers our collaborator a range of potential opportunities, both in understanding the sellers on its platform and in developing new products to offer them.
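The "removing the conditional effects" step can be illustrated with the classical partialling-out form of double/orthogonalized machine learning. The sketch below is not the paper's method: it assumes a continuous outcome, a single constant price effect `theta`, one confounding feature, and linear nuisance models fit with two-fold cross-fitting. The paper's classification loss, heterogeneous effects, and instrumental-variable extension are not covered. The simulated data and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def ols_1d(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def dml_partialling_out(x, p, y, seed=0):
    """Cross-fitted estimate of theta in y = theta * p + g(x) + noise."""
    n = len(x)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[: n // 2], idx[n // 2:]]
    res_p, res_y = [], []
    for k in (0, 1):
        train, test = folds[1 - k], folds[k]
        # Nuisance models E[p | x] and E[y | x], fit on the other fold.
        ap, bp = ols_1d([x[i] for i in train], [p[i] for i in train])
        ay, by = ols_1d([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_p.append(p[i] - (ap + bp * x[i]))
            res_y.append(y[i] - (ay + by * x[i]))
    # Regress outcome residuals on price residuals (no intercept).
    return sum(a * b for a, b in zip(res_p, res_y)) / sum(a * a for a in res_p)


def simulate(n=5000, theta=-1.5, seed=1):
    """Toy data in which the feature x drives both price and outcome."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    p = [2 * xi + rng.gauss(0, 1) for xi in x]  # price is confounded by x
    y = [theta * pi + 3 * xi + rng.gauss(0, 1) for xi, pi in zip(x, p)]
    return x, p, y


if __name__ == "__main__":
    x, p, y = simulate()
    print("naive slope of y on p:", round(ols_1d(p, y)[1], 3))
    print("partialling-out estimate:", round(dml_partialling_out(x, p, y), 3))
```

In this toy setup the naive regression of outcome on price is biased toward zero (its population value is -0.3 rather than -1.5) because price and outcome share the driver `x`. Residualizing both on `x` before the final regression removes that confounding path. Cross-fitting keeps each residual out of the sample used to fit its nuisance model.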