We show that many machine learning goals, such as improved fairness metrics, can be expressed as constraints on the model's predictions, which we call rate constraints. We study the problem of training non-convex models subject to these rate constraints (or to any non-convex and non-differentiable constraints). In the non-convex setting, the standard approach of Lagrange multipliers may fail. Furthermore, if the constraints are non-differentiable, then one cannot optimize the Lagrangian with gradient-based methods. To address these issues, we introduce the proxy-Lagrangian formulation. This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game, solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution of the constrained optimization problem. We then give a procedure that shrinks the randomized solution down to a mixture of at most m+1 deterministic solutions, given m constraints. This culminates in algorithms that solve non-convex constrained optimization problems, with possibly non-differentiable and non-convex constraints, with theoretical guarantees. We provide extensive experimental results enforcing a wide range of policy goals, including several fairness metrics and goals on accuracy, coverage, recall, and churn.
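As a rough illustration of the two-player dynamics (not the paper's exact algorithm), the sketch below alternates gradient steps on a proxy-Lagrangian for the model player with exponentiated-gradient updates for the multiplier player. The toy rate constraint, ramp surrogate, step sizes, and data are all assumptions for illustration.

```python
import numpy as np

# Toy setup: linear model, one rate constraint "positive-prediction rate <= b".
# The proxy constraint replaces the 0-1 indicator with a differentiable ramp
# surrogate; every problem detail here is an illustrative assumption.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 5)), rng.integers(0, 2, 500)
b = 0.5                              # assumed bound on the positive rate
eta_w, eta_lam = 0.1, 0.5            # assumed step sizes

def rate(w):                         # true (non-differentiable) constraint value
    return np.mean(X @ w > 0) - b

def proxy_rate_grad(w):              # subgradient of a ramp surrogate of the rate
    margins = X @ w
    active = (margins > -1) & (margins < 1)
    return X[active].sum(axis=0) / len(X)

def loss_grad(w):                    # gradient of a logistic training loss
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(X)

w, lam, iterates = np.zeros(5), 1.0, []
for t in range(200):
    # Player 1 (model): gradient step on the proxy-Lagrangian.
    w -= eta_w * (loss_grad(w) + lam * proxy_rate_grad(w))
    # Player 2 (multiplier): multiplicative update on the ORIGINAL constraint.
    lam = min(lam * np.exp(eta_lam * rate(w)), 100.0)
    iterates.append(w.copy())
# A stochastic classifier is obtained by sampling uniformly from `iterates`.
```

The asymmetry is what makes the game non-zero-sum: the multiplier player reacts to the original, possibly non-differentiable constraints, while the model player sees only the differentiable proxies.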
Given a classifier ensemble and a dataset, many examples may be confidently and accurately classified after only a subset of the base models in the ensemble has been evaluated. Dynamically deciding to classify early can reduce both mean latency and CPU usage without harming the accuracy of the original ensemble. To achieve such gains, we propose jointly optimizing the evaluation order of the base models and the early-stopping thresholds. The proposed objective is a combinatorial optimization problem, but we give a greedy algorithm that achieves a 4-approximation of the optimal solution under certain assumptions, which is also the best achievable polynomial-time approximation bound. Experiments on benchmark and real-world problems show that the proposed Quit When You Can (QWYC) algorithm can speed up average evaluation time by 1.8–2.7 times even on jointly trained ensembles, which are harder to speed up than independently or sequentially trained ensembles. QWYC's joint optimization of ordering and thresholds also performed better in experiments than previous fixed orderings, including gradient boosted trees' ordering.
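The early-exit mechanism itself is simple. The sketch below shows the evaluation loop for a fixed ordering with per-stage lower and upper thresholds; the greedy search that chooses the ordering and thresholds is omitted, and all names and the final decision rule are illustrative assumptions.

```python
from typing import Callable, Sequence

def qwyc_predict(x,
                 models: Sequence[Callable],   # base models, in optimized order
                 lo: Sequence[float],          # per-stage lower exit thresholds
                 hi: Sequence[float]) -> int:  # per-stage upper exit thresholds
    """Accumulate base-model scores; stop as soon as the running sum falls
    below lo[k] (predict 0) or rises above hi[k] (predict 1)."""
    score = 0.0
    for k, model in enumerate(models):
        score += model(x)
        if score <= lo[k]:
            return 0                  # early exit: confidently negative
        if score >= hi[k]:
            return 1                  # early exit: confidently positive
    return int(score >= 0.0)          # all models evaluated: full-ensemble rule
```

Jointly choosing the ordering and the (lo, hi) thresholds so that early exits rarely change the full ensemble's decision is the combinatorial problem the greedy 4-approximation addresses.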
The optimal power flow (OPF) problem is fundamental to power system planning and operation. It is a non-convex optimization problem, and a semidefinite programming (SDP) relaxation has been proposed recently. However, the SDP relaxation may give a solution that is infeasible for the original OPF problem. In this paper, we apply the alternating direction method of multipliers (ADMM) to recover a feasible solution when the solution of the SDP relaxation is infeasible for the OPF problem. Specifically, the proposed procedure alternates between a convex optimization problem and a non-convex optimization problem with a rank constraint. By exploiting the special structure of the rank constraint, we obtain a closed-form solution of the non-convex problem based on the singular value decomposition. As a result, we obtain a computationally tractable heuristic for the OPF problem. Although convergence of the algorithm is not theoretically guaranteed, our simulations show that a feasible solution can be recovered by our method.
Notation: $i$ is the imaginary unit. $W^*$ is the Hermitian transpose of $W$, $\operatorname{Tr}(W)$ is the trace of $W$, and $\|W\|_F = \sqrt{\operatorname{Tr}(W W^*)}$ is the Frobenius norm of $W$. The generalized inequality $W \succeq 0$ means that $W$ is a positive semidefinite matrix. The projection operator $\Pi_S(W) = \operatorname{argmin}_{Z \in S} \|W - Z\|_F^2$ gives the projection of $W$ onto the set $S$.
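Using the notation above, the key non-convex subproblem is a projection onto rank-constrained matrices, which admits a closed form via the eigendecomposition. The sketch below shows a rank-1 PSD projection of a Hermitian matrix; the rank target of 1 and the function name are assumptions for illustration.

```python
import numpy as np

def project_rank1_psd(W: np.ndarray) -> np.ndarray:
    """Project a Hermitian matrix W onto {Z : Z >= 0, rank(Z) <= 1} in
    Frobenius norm: keep the largest eigenvalue (if nonnegative) and its
    eigenvector, and zero out everything else."""
    vals, vecs = np.linalg.eigh(W)        # eigenvalues in ascending order
    lam, v = vals[-1], vecs[:, -1]
    if lam <= 0:                          # best rank-1 PSD approximation is 0
        return np.zeros_like(W)
    return lam * np.outer(v, v.conj())

# Inside an ADMM loop, this projection would supply the Pi_S step for the
# rank-constrained block, alternating with the convex (SDP-like) block.
W = np.array([[2.0, 0.5], [0.5, 1.0]])
print(project_rank1_psd(W))
```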
Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or meet other policy goals. We study the generalization performance of such constrained optimization problems in terms of how well the constraints are satisfied at evaluation time, given that they are satisfied at training time. To improve generalization performance, we frame the problem as a two-player game in which one player optimizes the model parameters on a training dataset, and the other player enforces the constraints on an independent validation dataset. We build on recent work in two-player constrained optimization to show that this two-dataset approach can significantly improve constraint generalization. As we illustrate experimentally, the approach works not only in theory but also in practice.
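A minimal sketch of the two-dataset game, assuming a simple Lagrangian and a single rate constraint: the model player takes gradient steps on the training set, while the multiplier player reacts to constraint violations measured on a held-out validation set. All data, the target rate, and step sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Xtr, ytr = rng.normal(size=(400, 5)), rng.integers(0, 2, 400)
Xva = rng.normal(size=(200, 5))           # independent validation features
b, eta_w, eta_lam = 0.5, 0.1, 0.2         # assumed target rate and step sizes

def loss_grad(w, X, y):                   # logistic loss gradient (training data)
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(X)

def surrogate_rate_grad(w, X):            # ramp-surrogate gradient of the rate
    m = X @ w
    return X[(m > -1) & (m < 1)].sum(axis=0) / len(X)

w, lam = np.zeros(5), 0.0
for t in range(300):
    # Player 1: minimize the Lagrangian on TRAINING data.
    w -= eta_w * (loss_grad(w, Xtr, ytr) + lam * surrogate_rate_grad(w, Xtr))
    # Player 2: enforce the constraint measured on VALIDATION data.
    violation = np.mean(Xva @ w > 0) - b
    lam = max(0.0, lam + eta_lam * violation)   # projected gradient ascent
```

Measuring the violation on data the model player never fits is what keeps the multiplier honest about how the constraint will generalize at evaluation time.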