Ellen Vitercik scite author profile

Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization

A crucial problem in modern data science is data-driven algorithm design, where the goal is to choose the best algorithm, or algorithm parameters, for a specific application domain. In practice, we often optimize over a parametric algorithm family, searching for parameters with high performance on a collection of typical problem instances. While effective in practice, these procedures generally have not come with provable guarantees. A recent line of work initiated by a seminal paper of Gupta and Roughgarden [34] analyzes application-specific algorithm selection from a theoretical perspective. We advance this research direction in several important settings. We provide upper and lower bounds on regret for algorithm selection in online settings, where problems arrive sequentially and we must choose parameters online. We also consider differentially private algorithm selection, where the goal is to find good parameters for a set of problems without divulging too much sensitive information contained therein. We analyze several important parameterized families of algorithms, including SDP-rounding schemes for problems formulated as integer quadratic programs as well as greedy techniques for several canonical subset selection problems. The cost function that measures an algorithm's performance is often a volatile piecewise Lipschitz function of its parameters, since a small change to the parameters can lead to a cascade of different decisions made by the algorithm. We present general techniques for optimizing the sum or average of piecewise Lipschitz functions when the underlying functions satisfy a sufficient and general condition called dispersion.

Intuitively, a set of piecewise Lipschitz functions is dispersed if no small region contains many of the functions' discontinuities. Using dispersion, we improve over the best-known online learning regret bounds for a variety of problems, prove regret bounds for problems not previously studied, and provide matching regret lower bounds. In the private optimization setting, we show how to optimize performance while preserving privacy for several important problems, providing matching upper and lower bounds on performance loss due to privacy preservation. Though algorithm selection is our primary motivation, we believe the notion of dispersion may be of independent interest. Therefore, we present our results for the more general problem of optimizing piecewise Lipschitz functions. Finally, we uncover dispersion in domains beyond algorithm selection, namely, auction design and pricing, providing online and privacy guarantees for these problems as well.

Private algorithm configuration. Kusner et al. [41] develop private Bayesian optimization techniques for tuning algorithm parameters. Their methods implicitly assume that the utility function is differentiable. Meanwhile, the class of functions we consider has discontinuities between pieces, and it is not enough to privately optimize on each piece, since the boundaries themselves are data-dependent.

Online optimization. Prior work on ...
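The abstract states dispersion only intuitively. As a rough illustration, here is a minimal sketch of one way to make it concrete for a one-dimensional parameter, assuming each function is represented by the list of its discontinuity locations. It reports the largest number of functions that have a discontinuity inside a single interval of width w, and treats the collection as (w, k)-dispersed when that number is at most k. The names `max_functions_hit` and `is_dispersed`, the list representation, and this interval-based reading are assumptions for illustration, not the paper's exact definition or code.

```python
from typing import Sequence


def max_functions_hit(breakpoints: Sequence[Sequence[float]], w: float) -> int:
    """Largest number of functions with at least one discontinuity in some
    closed interval [a, a + w] of a one-dimensional parameter axis.

    breakpoints[i] lists the discontinuity locations of function i.
    """
    candidates = sorted(b for bs in breakpoints for b in bs)
    best = 0
    # Any interval can be slid right until its left end sits on a breakpoint
    # without losing a covered breakpoint, so these left ends suffice.
    for a in candidates:
        hit = sum(1 for bs in breakpoints if any(a <= b <= a + w for b in bs))
        best = max(best, hit)
    return best


def is_dispersed(breakpoints: Sequence[Sequence[float]], w: float, k: int) -> bool:
    """Hypothetical check: no width-w interval meets discontinuities of more than k functions."""
    return max_functions_hit(breakpoints, w) <= k


# Three hypothetical piecewise functions on [0, 1].
fns = [[0.1, 0.5], [0.12], [0.9]]
print(max_functions_hit(fns, 0.05))  # 2: the interval [0.1, 0.15] meets functions 0 and 1
print(is_dispersed(fns, 0.01, 1))    # True
```

The brute-force scan is quadratic in the number of breakpoints, which keeps the logic easy to check; a sorted sliding window would be the natural optimization for large inputs.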

A General Theory of Sample Complexity for Multi-Item Profit Maximization

Balcan, Sandholm, Vitercik (2018)

The design of profit-maximizing multi-item mechanisms is a notoriously challenging problem with tremendous real-world impact. The mechanism designer's goal is to field a mechanism with high expected profit on the distribution over buyers' values. Unfortunately, if the set of mechanisms he optimizes over is complex, a mechanism may have high empirical profit over a small set of samples but low expected profit. This raises the question: how many samples are sufficient to ensure that the empirically optimal mechanism is nearly optimal in expectation? We uncover structure shared by a myriad of pricing, auction, and lottery mechanisms that allows us to prove strong sample complexity bounds: for any set of buyers' values, profit is a piecewise linear function of the mechanism's parameters. We prove new bounds for mechanism classes not yet studied in the sample-based mechanism design literature and match or improve over the best known guarantees for many classes. The profit functions we study are significantly different from well-understood functions in machine learning, so our analysis requires a sharp understanding of the interplay between mechanism parameters and buyer values. We strengthen our main results with data-dependent bounds when the distribution over buyers' values is "well-behaved." Finally, we investigate a fundamental tradeoff in sample-based mechanism design: complex mechanisms often have higher profit than simple mechanisms, but more samples are required to ensure that empirical and expected profit are close. We provide techniques for optimizing this tradeoff.

arXiv:1705.00243v4 [cs.LG], 8 Aug 2018

* We generalize to multiple buyers in Section 2.1.2.
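To make the piecewise-linear structure concrete in the simplest possible case, consider a single buyer and a single posted price p. This is an assumption chosen for illustration; the paper covers far richer pricing, auction, and lottery classes. On a fixed set of N sampled values, the average profit p · #{i : v_i ≥ p} / N is linear in p between consecutive sample values, so an empirically optimal price can be found by checking only the sampled values (assuming all values are positive). The function names below are hypothetical.

```python
from typing import Sequence


def empirical_profit(price: float, values: Sequence[float]) -> float:
    """Average profit of posting `price` to buyers with the sampled values.

    A buyer purchases (and pays `price`) exactly when their value is at least `price`.
    """
    return sum(price for v in values if v >= price) / len(values)


def empirically_optimal_price(values: Sequence[float]) -> float:
    # On each interval (v_(j), v_(j+1)] the set of purchasing buyers is fixed,
    # so profit grows linearly in the price; the maximum over positive prices
    # is therefore attained at one of the sampled values.
    return max(sorted(set(values)), key=lambda p: empirical_profit(p, values))


samples = [1.0, 2.0, 3.0, 10.0]
print(empirical_profit(2.0, samples))      # 1.5
print(empirically_optimal_price(samples))  # 10.0
```

Here the empirically optimal price rides on a single large sample, a small reminder that empirical profit on few samples need not reflect expected profit.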

How Much Data Is Sufficient to Learn High-Performing Algorithms? Generalization Guarantees for Data-Driven Algorithm Design

Balcan, DeBlasio, Dick, et al. (2021)

Estimating Approximate Incentive Compatibility

Balcan, Sandholm, Vitercik (2019)

In practice, most mechanisms for selling, buying, matching, voting, and so on are not incentive compatible. We present techniques for estimating how far a mechanism is from being incentive compatible. Given samples from the agents' type distribution, we show how to estimate the extent to which an agent can improve his utility by misreporting his type. We do so by first measuring the maximum utility an agent can gain by misreporting his type on average over the samples, assuming his true and reported types come from a finite subset of the type space, which our technique constructs. The challenge is that by measuring utility gains over a finite subset of the type space, we might miss type pairs θ and θ̂ where an agent with type θ can greatly improve his utility by reporting type θ̂. Indeed, our primary technical contribution is proving that the maximum utility gain over this finite subset nearly matches the maximum utility gain overall, despite the volatility of the utility functions we study. We apply our tools to the single-item and combinatorial first-price auctions, the generalized second-price auction, the discriminatory auction, the uniform-price auction, and the second-price auction with spiteful bidders.
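A minimal sketch of the first step described above, under illustrative assumptions: a single-item first-price auction in which the agent wins only if its report strictly exceeds every other report and then pays its report, samples that list the other agents' reports, and a hand-picked grid standing in for the finite subset of the type space that the paper's technique constructs. The sketch returns the largest average utility gain from reporting θ̂ instead of the true type θ over all grid pairs. The function names, tie-breaking rule, and grid are hypothetical, and the paper's main contribution (bounding what such a grid-based estimate can miss) is not reproduced here.

```python
from itertools import product
from typing import Callable, Sequence

# utility(true_type, reported_type, other_reports) -> agent's utility
Utility = Callable[[float, float, Sequence[float]], float]


def first_price_utility(theta: float, report: float, others: Sequence[float]) -> float:
    """Hypothetical first-price rule: win iff report beats all other reports (ties lose)."""
    return theta - report if report > max(others) else 0.0


def estimated_ic_gap(utility: Utility, grid: Sequence[float],
                     samples: Sequence[Sequence[float]]) -> float:
    """Max over (theta, theta_hat) in grid x grid of the average, over samples,
    of utility(theta, theta_hat, s) - utility(theta, theta, s)."""
    best = 0.0  # reporting truthfully (theta_hat == theta) always gains exactly 0
    for theta, theta_hat in product(grid, repeat=2):
        gain = sum(utility(theta, theta_hat, s) - utility(theta, theta, s)
                   for s in samples)
        best = max(best, gain / len(samples))
    return best


grid = [0.0, 0.5, 1.0]
samples = [[0.2], [0.4]]  # each sample: the other bidders' reports
print(estimated_ic_gap(first_price_utility, grid, samples))  # 0.5
```

In this toy instance the largest gain comes from an agent with type 1.0 shading its report to 0.5, which still wins against both samples but pays less.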

Improved Sample Complexity Bounds for Branch-and-Cut

Balcan, Prasad, Sandholm, et al. (2021)

Preprint

Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research [6,8].
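For intuition about what a sample complexity guarantee looks like, here is the textbook baseline for a finite set C of configurations. It is much simpler than the paper's setting, which handles continuous parameters with sharper, structure-aware bounds. Assuming i.i.d. training instances and that each configuration's performance (for example, a capped search-tree size) lies in [0, H], Hoeffding's inequality plus a union bound over |C| configurations shows that N ≥ H² ln(2|C|/δ) / (2ε²) training instances suffice for every configuration's training average to be within ε of its expected performance with probability at least 1 - δ. The function name is hypothetical.

```python
import math


def finite_class_sample_size(num_configs: int, value_range: float,
                             epsilon: float, delta: float) -> int:
    """Training-set size N such that, with probability at least 1 - delta,
    every one of `num_configs` configurations has a training-set average within
    `epsilon` of its expected performance, assuming performance lies in
    [0, value_range] and instances are drawn i.i.d.

    From Hoeffding's inequality and a union bound:
        num_configs * 2 * exp(-2 N epsilon^2 / value_range^2) <= delta.
    """
    n = value_range ** 2 * math.log(2 * num_configs / delta) / (2 * epsilon ** 2)
    return math.ceil(n)


print(finite_class_sample_size(1, 1.0, 0.1, 0.05))     # 185
print(finite_class_sample_size(1000, 1.0, 0.1, 0.05))  # 530
```

The dependence on |C| is only logarithmic, but for continuous parameters |C| is infinite, so this baseline does not apply directly; guarantees like the ones described above must instead exploit structure in how performance depends on the parameters.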
