Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on a limited set of "active" variables or on an additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they inform the design of our algorithm. We demonstrate the efficacy of our algorithm on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.
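The core idea of the dropout strategy can be sketched independently of the full method. Below is a minimal illustration, assuming a "fill-in" rule that copies the unoptimized coordinates from the best point found so far; plain random search stands in for the acquisition optimization, and the function `dropout_bo` and all its parameters are hypothetical names, not the paper's implementation.

```python
import numpy as np

def dropout_bo(f, bounds, n_iter=50, subset_size=2, n_cand=64, seed=0):
    """Sketch of dropout-style high-dimensional optimization: each
    iteration optimizes only a random subset of coordinates, filling the
    remaining coordinates in from the incumbent best point. Random
    candidate sampling stands in for acquisition optimization."""
    rng = np.random.default_rng(seed)
    D = len(bounds)
    lo = np.array([b[0] for b in bounds], float)
    hi = np.array([b[1] for b in bounds], float)
    best_x = rng.uniform(lo, hi)
    best_y = float(f(best_x))
    for _ in range(n_iter):
        dims = rng.choice(D, size=subset_size, replace=False)
        # candidates vary only in the chosen subset of dimensions;
        # the other coordinates are copied from the incumbent best
        cands = np.tile(best_x, (n_cand, 1))
        cands[:, dims] = rng.uniform(lo[dims], hi[dims],
                                     size=(n_cand, subset_size))
        ys = np.apply_along_axis(f, 1, cands)
        i = ys.argmin()
        if ys[i] < best_y:
            best_x, best_y = cands[i].copy(), float(ys[i])
    return best_x, best_y

# minimize a 10-D sphere function, two coordinates at a time
x, y = dropout_bo(lambda v: float(np.sum(v ** 2)), [(-5, 5)] * 10, n_iter=200)
```

Because each low-dimensional subproblem is cheap, the per-iteration cost no longer scales with the full dimensionality of the acquisition optimization.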
Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyperparameter tuning and, more generally, for the efficient global optimization of expensive black-box functions. Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyperparameter tuning. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Still, these established techniques always require a user-defined space in which to perform optimization. This pre-defined space specifies the ranges of hyperparameter values. In many situations, however, it can be difficult to prescribe such spaces, as prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization: if a space is too large, we can miss the optimum with a limited budget; if a space is too small, it may not contain the optimum at all. The fully unknown search space problem is intractable in practice. Therefore, in this paper, we consider specifically the setting of a "weakly specified" search space for Bayesian optimization. By weakly specified space, we mean that the pre-defined space is placed in a sufficiently good region that the optimization can expand from it and reach the optimum; the pre-defined space itself need not include the global optimum. We tackle this problem by proposing a filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. We develop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications, which demonstrate the benefits of our proposed approach.
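The expansion idea can be illustrated with a deliberately simplified heuristic: search the current box, and enlarge any side whose boundary the incumbent best point approaches. This is only a sketch under assumed expansion rules; it does not reproduce the paper's filtering criterion, and the function `expanding_search` and its parameters are hypothetical names. Random search again stands in for the BO inner loop.

```python
import numpy as np

def expanding_search(f, init_lo, init_hi, rounds=20, growth=1.5,
                     per_round=64, seed=0):
    """Simplified illustration of gradually expanding a weakly specified
    search space: sample the current box each round, and if the incumbent
    lies near a boundary, enlarge the box on that side."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(init_lo, float), np.array(init_hi, float)
    best_x, best_y = None, np.inf
    for _ in range(rounds):
        X = rng.uniform(lo, hi, size=(per_round, len(lo)))
        ys = np.apply_along_axis(f, 1, X)
        i = ys.argmin()
        if ys[i] < best_y:
            best_x, best_y = X[i].copy(), float(ys[i])
        # expand any side whose boundary the incumbent approaches
        margin = 0.2 * (hi - lo)
        grow = (hi - lo) * (growth - 1.0) / 2.0
        lo = np.where(best_x - lo < margin, lo - grow, lo)
        hi = np.where(hi - best_x < margin, hi + grow, hi)
    return best_x, best_y

# the optimum at x = (8, 8) lies outside the initial box [0, 2]^2,
# so a fixed-space search could never reach it
x, y = expanding_search(lambda v: float(np.sum((v - 8.0) ** 2)),
                        [0, 0], [2, 2])
```

A fixed box [0, 2]² caps the achievable objective at 72 for this function, so any value below that confirms the search escaped the initial region.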
While biomanufacturing plays a significant role in supporting the economy and ensuring public health, it faces critical challenges, including complexity, high variability, lengthy lead times, and very limited process data, especially for personalized new cell and gene biotherapeutics. Driven by these challenges, we propose an interpretable semantic bioprocess probabilistic knowledge graph and develop game theory-based risk and sensitivity analyses for the production process to facilitate quality-by-design and stability control. Specifically, by exploring the causal relationships and interactions of critical process parameters and product quality attributes, we create a Bayesian network-based probabilistic knowledge graph characterizing the complex causal interdependencies of all factors. Then, we introduce a Shapley value-based sensitivity analysis, which can correctly quantify the variation contribution of each input factor to the outputs (i.e., productivity, product quality). Since the bioprocess model coefficients are learned from limited process observations, we derive the Bayesian posterior distribution to quantify model uncertainty and further develop the Shapley value-based sensitivity analysis to evaluate the impact of estimation uncertainty from each set of model coefficients. Therefore, the proposed bioprocess risk and sensitivity analyses can identify bottlenecks, guide reliable process specifications and the most informative data collection, and improve production stability.
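The Shapley attribution at the heart of such a sensitivity analysis can be computed exactly for a small number of factors. The sketch below assumes a toy "variance explained" characteristic function for a hypothetical model y = x1 + 2·x2 + x1·x2 with independent zero-mean, unit-variance inputs; the factor names and numbers are illustrative, not from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: player i's marginal contribution
    v(S ∪ {i}) − v(S), averaged over all coalitions S with weight
    |S|!(n−|S|−1)!/n! (i.e., over all orderings of the players)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi[i] = total
    return phi

def var_explained(S):
    """Toy characteristic function: output variance explained by the
    factor subset S for y = x1 + 2*x2 + x1*x2 with independent,
    zero-mean, unit-variance inputs (hypothetical process parameters)."""
    out = 0.0
    if "x1" in S:
        out += 1.0            # main effect of x1
    if "x2" in S:
        out += 4.0            # main effect of x2
    if {"x1", "x2"} <= S:
        out += 1.0            # interaction term x1*x2
    return out

phi = shapley_values(["x1", "x2"], var_explained)
# the interaction variance (1.0) is split equally:
# phi == {"x1": 1.5, "x2": 4.5}, summing to the total variance 6.0
```

The efficiency property of Shapley values, that the attributions sum to the total output variance, is what makes this decomposition suitable for ranking process bottlenecks.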
When model uncertainty is handled by Bayesian model averaging (BMA) or Bayesian model selection (BMS), the posterior distribution possesses a desirable "oracle property" for parametric inference if, for large enough data, it is nearly as good as the oracle posterior, obtained by unrealistically assuming that the true model is known and only the true model is used. We study oracle properties in the very general context of quasi-posteriors, which can accommodate non-regular models with cube-root asymptotics and partial identification. Our approach to proving the oracle properties is based on a unified treatment that bounds the posterior probability of model mis-selection. This theoretical framework may interest Bayesian statisticians who wish to theoretically justify new model selection or model averaging methods beyond empirical results. Furthermore, for non-regular models, we obtain nontrivial conclusions on the choice of prior penalty on model complexity, the temperature parameter of the quasi-posterior, and the advantage of BMA over BMS.