Matrix square roots and their inverses arise frequently in machine learning, e.g., when sampling from high-dimensional Gaussians $\mathcal{N}(0, K)$ or "whitening" a vector $\mathbf{b}$ against covariance matrix $K$. While existing methods typically require $\mathcal{O}(N^3)$ computation, we introduce a highly efficient quadratic-time algorithm for computing $K^{1/2}\mathbf{b}$, $K^{-1/2}\mathbf{b}$, and their derivatives through matrix-vector multiplications (MVMs). Our method combines Krylov subspace methods with a rational approximation and typically achieves 4 decimal places of accuracy with fewer than 100 MVMs. Moreover, the backward pass requires little additional computation. We demonstrate our method's applicability on matrices as large as $50{,}000 \times 50{,}000$ (well beyond the reach of traditional methods) with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy.
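To make the MVM-based idea concrete, below is a minimal sketch of computing $K^{-1/2}\mathbf{b}$ from matrix-vector products alone. It is not the paper's algorithm: as simplified stand-ins, it uses the identity $K^{-1/2} = \frac{2}{\pi}\int_0^\infty (K + t^2 I)^{-1}\,dt$ discretized with a tan-substitution Gauss-Legendre rule, and plain conjugate gradients for each shifted solve; the function names (`inv_sqrt_mvm`, `_cg`) and all parameters are illustrative. Accuracy improves with the number of quadrature nodes and degrades with the condition number of $K$; the paper's rational approximation and Krylov solver are far more efficient.

```python
import numpy as np

def _cg(matvec, b, tol=1e-10, maxiter=1000):
    """Plain conjugate gradients; touches the matrix only through matvec."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inv_sqrt_mvm(matvec, b, n_quad=20):
    """Approximate K^{-1/2} b using only MVMs with K.

    Discretizes K^{-1/2} = (2/pi) * int_0^inf (K + t^2 I)^{-1} dt with
    Gauss-Legendre quadrature after the substitution t = tan(theta),
    solving each shifted system (K + t_j^2 I) x = b by CG.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    theta = (nodes + 1.0) * (np.pi / 4.0)    # map [-1, 1] -> [0, pi/2]
    w = weights * (np.pi / 4.0)
    out = np.zeros_like(b)
    for theta_j, w_j in zip(theta, w):
        shift = np.tan(theta_j) ** 2          # t_j^2
        jac = 1.0 / np.cos(theta_j) ** 2      # sec^2(theta_j) from dt = sec^2(theta) d(theta)
        x = _cg(lambda v: matvec(v) + shift * v, b)
        out += w_j * jac * x
    return (2.0 / np.pi) * out

if __name__ == "__main__":
    # Hypothetical check on a well-conditioned SPD matrix (eigenvalues roughly in [1, 5]).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500))
    K = A @ A.T / 500 + np.eye(500)
    b = rng.standard_normal(500)

    approx = inv_sqrt_mvm(lambda v: K @ v, b, n_quad=20)
    lam, V = np.linalg.eigh(K)                # O(N^3) reference, for validation only
    exact = V @ ((V.T @ b) / np.sqrt(lam))
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
    # K^{1/2} b then costs one extra MVM, since K^{1/2} b = K (K^{-1/2} b).
```

Note the cost structure this sketch shares with the paper's method: every operation touches $K$ only through `matvec`, so for structured or kernel matrices with fast MVMs the total cost is quadratic (or better) rather than cubic.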