In this paper, we propose and analyze a trust-region model-based algorithm for solving unconstrained stochastic optimization problems. Our framework utilizes random models of an objective function f(x), obtained from stochastic observations of the function or its gradient. Our method also utilizes estimates of function values to gauge the progress being made. The convergence analysis relies on the requirement that these models and estimates are sufficiently accurate with sufficiently high, but fixed, probability. Beyond these conditions, no assumptions are made on how the models and estimates are generated. Under these general conditions we show almost sure global convergence of the method to a first-order stationary point. In the second part of the paper, we present examples of generating sufficiently accurate random models under biased or unbiased noise assumptions. Lastly, we present computational results showing the benefits of the proposed method compared to existing approaches that are based on sample averaging or stochastic gradients.
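The loop described in this abstract (a random model, noisy function estimates to gauge progress, and a trust-region radius that adapts to success or failure) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the quadratic test function, the Gaussian noise, the finite-difference linear model, the fixed sample size, and the parameter names `eta1`, `eta2`, and `gamma`.

```python
import math
import random


def true_f(x):
    # Hypothetical smooth test objective (assumption): f(x) = sum of squares.
    return sum(xi * xi for xi in x)


def noisy_f(x, rng, sigma=0.01, n_samples=10):
    # Function-value estimate: true value plus averaged Gaussian noise.
    noise = sum(rng.gauss(0.0, sigma) for _ in range(n_samples)) / n_samples
    return true_f(x) + noise


def model_gradient(x, delta, rng):
    # Gradient of a linear random model, built from noisy central differences
    # with step equal to the current trust-region radius.
    g = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += delta
        x_minus[i] -= delta
        g.append((noisy_f(x_plus, rng) - noisy_f(x_minus, rng)) / (2.0 * delta))
    return g


def stochastic_trust_region(x0, delta0=1.0, delta_max=5.0, gamma=2.0,
                            eta1=0.1, eta2=1.0, iters=60, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    delta = delta0
    for _ in range(iters):
        g = model_gradient(x, delta, rng)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        success = False
        if gnorm > 0.0:
            # Minimizer of the linear model over the ball of radius delta.
            trial = [xi - delta * gi / gnorm for xi, gi in zip(x, g)]
            predicted = delta * gnorm
            # Progress is judged from fresh noisy estimates, not exact values.
            rho = (noisy_f(x, rng) - noisy_f(trial, rng)) / predicted
            success = rho >= eta1 and gnorm >= eta2 * delta
        if success:
            x = trial
            delta = min(gamma * delta, delta_max)  # expand after a good step
        else:
            delta = delta / gamma                  # shrink after a bad step
    return x, delta


x0 = [2.0, -1.5]
x_final, delta_final = stochastic_trust_region(x0)
print(true_f(x0), true_f(x_final))
```

The sketch keeps the sample size fixed for brevity; a scheme that is meant to keep the models sufficiently accurate as the radius shrinks would need to increase the number of samples accordingly.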
Dedicated to the memory of Andrew R. Conn for his inspiring enthusiasm and his many contributions to the renaissance of derivative-free optimization methods.
In many optimization problems arising from scientific, engineering and artificial intelligence applications, objective and constraint functions are available only as the output of a black-box or simulation oracle that does not provide derivative information. Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization. We provide a review and perspectives on developments in these methods, with an emphasis on highlighting recent developments and on unifying treatment of such problems in the non-linear optimization and machine learning literature. We categorize methods based on assumed properties of the black-box functions, as well as features of the methods. We first overview the primary setting of deterministic methods applied to unconstrained, non-convex optimization problems where the objective function is defined by a deterministic black-box oracle. We then discuss developments in randomized methods, methods that assume some additional structure about the objective (including convexity, separability and general non-smooth compositions), methods for problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.
We propose a novel framework for analyzing convergence rates of stochastic optimization algorithms with adaptive step sizes. This framework is based on analyzing properties of an underlying generic stochastic process, in particular by deriving a bound on the expected stopping time of this process. We utilize this framework to analyze the bounds on expected global convergence rates of a stochastic variant of a traditional trust-region method, introduced in [8]. While traditional trust-region methods rely on exact computations of the gradient, Hessian and values of the objective function, this method assumes that these values are available up to some dynamically adjusted accuracy. Moreover, this accuracy is assumed to hold only with some sufficiently large, but fixed, probability, without any additional restrictions on the variance of the errors. This setting applies, for example, to standard stochastic optimization and machine learning formulations. Improving upon the analysis in [8], we show that the stochastic process defined by the algorithm satisfies the assumptions of our proposed general framework, with the stopping time defined as reaching accuracy ‖∇f(x)‖ ≤ ε. The resulting bound for this stopping time is O(ε^{-2}), under the assumption of a sufficiently accurate stochastic gradient, and is the first global complexity bound for a stochastic trust-region method. Finally, we apply the same framework to derive a second-order complexity bound under some additional assumptions.
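The complexity statement in this abstract can be written out as follows. The symbols T_ε, x_k, and the constant C are notation assumed here for illustration; the abstract gives only the order of the bound, not the constant or its dependence on problem parameters.

```latex
% Stopping time: first iteration at which the gradient norm drops to epsilon.
T_\varepsilon \;=\; \inf\bigl\{\, k \ge 0 \;:\; \|\nabla f(x_k)\| \le \varepsilon \,\bigr\},
\qquad
% Expected number of iterations, with C an unspecified constant (assumption).
\mathbb{E}\bigl[T_\varepsilon\bigr] \;\le\; C\,\varepsilon^{-2}
\;=\; O\bigl(\varepsilon^{-2}\bigr).
```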
Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers.
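To make the idea of a combinatorial decision on a categorical feature concrete, the sketch below picks the subset of category values that gives the fewest misclassifications for a single split. It uses brute-force enumeration, not the mixed integer programming formulation of the paper, and the function names and toy data are assumptions made for illustration.

```python
from itertools import combinations


def misclassified(labels):
    # Errors made when a leaf predicts its majority label.
    if not labels:
        return 0
    return len(labels) - max(labels.count(c) for c in set(labels))


def best_subset_split(values, labels):
    # Try every nonempty proper subset of categories as the "go left" set.
    categories = sorted(set(values))
    best = None
    for size in range(1, len(categories)):
        for subset in combinations(categories, size):
            left = [y for v, y in zip(values, labels) if v in subset]
            right = [y for v, y in zip(values, labels) if v not in subset]
            errors = misclassified(left) + misclassified(right)
            if best is None or errors < best[0]:
                best = (errors, set(subset))
    return best


# Toy data (assumption): one categorical feature, binary labels.
values = ["a", "b", "c", "d", "a", "c"]
labels = [1, 0, 1, 0, 1, 1]
print(best_subset_split(values, labels))
```

Enumeration grows exponentially with the number of categories, which is one reason an optimization formulation is attractive for anything beyond toy inputs.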
We present a new algorithm, called manifold sampling, for the unconstrained minimization of a nonsmooth composite function h ∘ F. By classifying points in the domain of the nonsmooth function h into what we call manifolds, we adapt search directions within a trust-region framework based on knowledge of manifolds intersecting the current trust region. We motivate this idea through a study of ℓ1 functions, where the classification into manifolds using zero-order information about the constituent functions F_i is trivial, and give an explicit statement of a manifold sampling algorithm in that case. We prove that all cluster points of iterates generated by this algorithm are stationary in the Clarke sense. We prove a similar result for a stochastic variant of the algorithm; interestingly, the result is deterministic (not almost sure). Additionally, our algorithm can accept iterates that are points of nondifferentiability and requires only an approximation of gradients of F at the trust-region center. Numerical results presented for several variants of the algorithm show that using manifold information from additional points near the current iterate can improve practical performance. The best variants are also shown to be competitive, particularly in terms of robustness, with other nonsmooth solvers.
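For the ℓ1 case, one plausible reading of "classification into manifolds using zero-order information" is that a manifold is identified by the sign pattern of the values F_i(x), and that each pattern yields one candidate gradient of h ∘ F built from the Jacobian of F. The sketch below follows that reading; the sign-pattern interpretation, the function names, and the example map F are assumptions for illustration, and the step that combines these gradients into a search direction is omitted.

```python
from itertools import product


def sign_patterns(f_values, tol=0.0):
    # A component at (or within tol of) zero lies on a kink of |.|,
    # so both signs are treated as active there.
    choices = []
    for v in f_values:
        if v > tol:
            choices.append((1,))
        elif v < -tol:
            choices.append((-1,))
        else:
            choices.append((-1, 1))
    return list(product(*choices))


def manifold_gradients(jacobian, patterns):
    # jacobian[i][j] = dF_i/dx_j; on a manifold with signs p, the gradient of
    # sum_i |F_i(x)| is sum_i p[i] * grad F_i(x).
    n = len(jacobian[0])
    return [
        [sum(p[i] * jacobian[i][j] for i in range(len(p))) for j in range(n)]
        for p in patterns
    ]


# Hypothetical example: F(x) = (x0 - 1, x0 + x1) evaluated at x = (1, 2).
f_values = [0.0, 3.0]
jacobian = [[1.0, 0.0], [1.0, 1.0]]
patterns = sign_patterns(f_values)
print(patterns)
print(manifold_gradients(jacobian, patterns))
```

Because the first component is exactly zero at this point, two manifolds are active and two candidate gradients are produced.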