Daniel Hernández-Lobato scite author profile

Bayesian optimization (BO) is useful for optimizing functions that are expensive to evaluate, lack an analytical expression, and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. The acquisition function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, for example when some of the input variables take categorical or integer values, one has to introduce extra approximations. A common approach is to let BO suggest an input location whose coordinates take values in the real line and then, before evaluating the objective, use a one-hot encoding approximation for categorical variables or round to the closest integer for integer-valued variables. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are categorical or integer-valued. We illustrate the utility of our approach in both synthetic and real experiments; it significantly improves the results of standard BO methods using Gaussian processes on problems with categorical or integer-valued variables.
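The abstract does not spell out the proposed approach. One reading, sketched here under that assumption, is to apply the rounding and one-hot transformation inside the GP covariance function rather than only before evaluating the objective, so that the model treats all real-valued points mapping to the same valid configuration as identical. A squared-exponential kernel is assumed, and the names `transform`, `rbf_kernel` and `transformed_kernel` are hypothetical.

```python
import numpy as np

def transform(x, int_dims, cat_groups):
    """Map a real-valued point to the nearest valid configuration (hypothetical helper).

    int_dims: indices of integer-valued inputs, which are rounded.
    cat_groups: one list of indices per one-hot encoded categorical variable;
    the largest entry in each group becomes 1 and the others become 0.
    """
    z = np.array(x, dtype=float)
    z[int_dims] = np.round(z[int_dims])
    for group in cat_groups:
        block = np.zeros(len(group))
        block[np.argmax(z[group])] = 1.0
        z[group] = block
    return z

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential covariance between two input vectors.
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-0.5 * np.dot(d, d) / lengthscale**2))

def transformed_kernel(x, y, int_dims, cat_groups, lengthscale=1.0):
    # Transforming inside the kernel makes the GP prior constant between
    # points that map to the same integer/categorical configuration.
    return rbf_kernel(transform(x, int_dims, cat_groups),
                      transform(y, int_dims, cat_groups), lengthscale)
```

For example, with `int_dims=[1]` and one categorical variable one-hot encoded in positions 2 to 4, the points `[0.4, 2.6, 0.2, 0.7, 0.1]` and `[0.4, 3.3, 0.1, 0.9, 0.0]` both map to `[0.4, 3.0, 0.0, 1.0, 0.0]`, so their transformed covariance is exactly 1.0, whereas the plain kernel on the raw inputs gives a value below 1.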

Expectation propagation in linear regression models with spike-and-slab priors

Hernández-Lobato

Suárez

2014

Mach Learn

An expectation propagation (EP) algorithm is proposed for approximate inference in linear regression models with spike-and-slab priors. This EP method is applied to regression tasks in which the number of training instances is small and the number of dimensions of the feature space is large. The problems analyzed include the reconstruction of genetic networks, the recovery of sparse signals, the prediction of user sentiment from customer-written reviews and the analysis of biscuit dough constituents from NIR spectra. In most of these tasks, the proposed EP method outperforms both another EP method that ignores correlations in the posterior and a variational Bayes technique for approximate inference. Additionally, the solutions generated by EP are very close to those given by Gibbs sampling, which can be taken as the gold standard but can be much more computationally expensive. In the tasks analyzed, spike-and-slab priors generally outperform other sparsifying priors, such as Laplace, Student's t and horseshoe priors. The key to the improved predictions with respect to Laplace and Student's t priors is the superior selective shrinkage capacity of the spike-and-slab prior distribution.

Statistical Instance-Based Pruning in Ensembles of Independent Classifiers

Hernández-Lobato

Martínez-Muñoz

Suárez

2009

IEEE Trans. Pattern Anal. Mach. Intell.

The global prediction of a homogeneous ensemble of classifiers generated in independent applications of a randomized learning algorithm on a fixed training set is analyzed within a Bayesian framework. Assuming that majority voting is used, it is possible to estimate with a given confidence level the prediction of the complete ensemble by querying only a subset of classifiers. For a particular instance that needs to be classified, the polling of ensemble classifiers can be halted when the probability that the predicted class will not change when taking into account the remaining votes is above the specified confidence level. Experiments on a collection of benchmark classification problems using representative parallel ensembles, such as bagging and random forests, confirm the validity of the analysis and demonstrate the effectiveness of the instance-based ensemble pruning method proposed.
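The stopping rule described above can be sketched for the binary case. This is an illustration under stated assumptions rather than the paper's exact formulation: it assumes two classes, an odd ensemble size T, and a uniform Beta(1, 1) prior on the probability that a classifier votes for class 1, so that the number of class-1 votes among the classifiers not yet polled follows a beta-binomial distribution. The function names are hypothetical.

```python
from math import exp, lgamma

def log_beta(a, b):
    # Logarithm of the Beta function B(a, b).
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_same_majority(n1, n0, T):
    """Probability that the final majority over T votes matches the current one.

    Assumes two classes, odd T, n1 != n0, and a uniform Beta(1, 1) prior on the
    probability that a classifier votes for class 1.
    """
    m = T - (n1 + n0)              # classifiers not yet polled
    a, b = n1 + 1, n0 + 1          # Beta posterior after the observed votes
    current = 1 if n1 > n0 else 0
    total = 0.0
    for k in range(m + 1):         # k = future votes for class 1
        # Beta-binomial probability of k class-1 votes among the remaining m,
        # computed in log space to avoid overflow of the binomial coefficient.
        log_pmf = (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
                   + log_beta(k + a, m - k + b) - log_beta(a, b))
        final = 1 if n1 + k > T // 2 else 0
        if final == current:
            total += exp(log_pmf)
    return total

def poll_until_confident(votes, T, alpha=0.99):
    """Poll classifiers in order; stop once the predicted class is unlikely to change.

    votes: sequence of 0/1 predictions from the T ensemble members.
    Returns (predicted class, number of classifiers queried).
    """
    n1 = n0 = 0
    for i, v in enumerate(votes, start=1):
        n1 += v
        n0 += 1 - v
        if n1 != n0 and prob_same_majority(n1, n0, T) >= alpha:
            return (1 if n1 > n0 else 0), i
    return (1 if n1 > n0 else 0), len(votes)
```

For an instance on which every classifier votes for class 1, `poll_until_confident([1] * 101, 101, alpha=0.99)` stops after a handful of queries instead of polling all 101 classifiers, while instances with split votes keep polling longer.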

Predictive Entropy Search for Multi-objective Bayesian Optimization with Constraints

Garrido-Merchán

Hernández-Lobato

2019

Neurocomputing

Real-world problems often involve the optimization of several objectives under multiple constraints. Furthermore, we may not have an expression for each objective or constraint; they may be expensive to evaluate; and the evaluations can be noisy. These functions are referred to as black-boxes. Bayesian optimization (BO) can efficiently solve problems of this kind. For this, BO iteratively fits a model to the observations of each black-box. The models are then used to choose where to evaluate the black-boxes next, with the goal of solving the optimization problem in a few iterations. In particular, they guide the search for the problem's solution and avoid evaluations in regions of little expected utility. A limitation, however, is that current BO methods for these problems choose only one point at a time at which to evaluate the black-boxes. If the expensive evaluations can be carried out in parallel (as when a cluster of computers is available), this results in a waste of resources. Here, we introduce PPESMOC, Parallel Predictive Entropy Search for Multi-objective Optimization with Constraints, a BO strategy for solving the problems described. PPESMOC selects, at each iteration, a batch of input locations at which to evaluate the black-boxes in parallel, so as to maximally reduce the entropy of the problem's solution. To our knowledge, this is the first batch method for constrained multi-objective BO. We present empirical evidence in the form of synthetic, benchmark and real-world experiments that illustrates the effectiveness of PPESMOC.

How large should ensembles of classifiers be?

Hernández-Lobato

Martínez-Muñoz

Suárez

2013

Pattern Recognition

We propose to determine the size of a parallel ensemble by estimating the minimum number of classifiers that are required to obtain stable aggregate predictions. Assuming that majority voting is used, a statistical description of the convergence of the ensemble prediction to its asymptotic (infinite size) limit is given. The analysis of the voting process shows that for most test instances the ensemble prediction stabilizes after only a few classifiers are polled. By contrast, a small but non-negligible fraction of these instances require large numbers of classifier queries to reach stable predictions. Specifically, the fraction of instances whose stable predictions require more than T classifiers for T ≫ 1 has a universal form and is proportional to T^(-1/2). The ensemble size is determined as the minimum number of classifiers that are needed to estimate the infinite ensemble prediction at an average confidence level α, close to one. This approach differs from previous proposals, which are based on determining the size for which the prediction error (not the predictions themselves) stabilizes. In particular, it does not require estimates of the generalization performance of the ensemble, which can be unreliable. It has general validity because it is based solely on the statistical description of the convergence of majority voting to its asymptotic limit. Extensive experiments using representative parallel ensembles (bagging and random forests) illustrate the application of the proposed framework in a wide range of classification problems. These experiments show that the optimal ensemble size is very sensitive to the particular classification problem considered.
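The convergence of majority voting to its asymptotic limit can be illustrated with a simplified calculation. Unlike the paper, this sketch assumes that the probability p with which an independently trained classifier votes for class 1 is known for each instance (0 < p < 1, p ≠ 0.5). It considers two classes and odd ensemble sizes T, and picks the smallest T whose average agreement with the infinite-ensemble prediction reaches the confidence level α. The helper names are hypothetical.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    # Binomial probability mass, computed in log space to avoid overflow for large n.
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

def prob_agree(p, T):
    """P(majority of T independent votes equals the infinite-ensemble class); T odd."""
    need = T // 2 + 1                    # class-1 votes needed for a class-1 majority
    p1 = sum(binom_pmf(k, T, p) for k in range(need, T + 1))
    return p1 if p > 0.5 else 1.0 - p1   # asymptotic class is 1 iff p > 0.5

def ensemble_size(ps, alpha=0.99, T_max=2001):
    """Smallest odd T whose average agreement over the instances reaches alpha."""
    for T in range(1, T_max + 1, 2):
        if sum(prob_agree(p, T) for p in ps) / len(ps) >= alpha:
            return T
    return None                          # alpha not reached within T_max
```

For a single instance with p = 0.9, `ensemble_size([0.9], alpha=0.99)` returns 5, since the agreement probability is about 0.972 for T = 3 and about 0.991 for T = 5. An instance with p = 0.55 needs several hundred classifiers, which shows how a few hard instances, with p close to 0.5, drive the required ensemble size.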

Expectation Propagation for Bayesian Multi-task Feature Selection

Hernández-Lobato

Helleputte

et al.

2010

In this paper we propose a Bayesian model for multi-task feature selection. This model is based on a generalized spike-and-slab sparse prior distribution that enforces the selection of a common subset of features across several tasks. Since exact Bayesian inference in this model is intractable, approximate inference is performed through expectation propagation (EP). EP approximates the posterior distribution of the model using a parametric probability distribution. This posterior approximation is particularly useful to identify relevant features for prediction. We focus on problems for which the number of features d is significantly larger than the number of instances for each task. We propose an efficient parametrization of the EP algorithm that offers a computational complexity linear in d. Experiments on several multi-task datasets show that the proposed model outperforms baseline approaches for single-task learning or data pooling across all tasks, as well as two state-of-the-art multi-task learning approaches. Additional experiments confirm the stability of the proposed feature selection with respect to various sub-samplings of the training data.

Ambiguity Helps: Classification with Disagreements in Crowdsourced Annotations

Sharmanska

Hernández-Lobato

et al.

2016
