“…Given k and L, we define S ∼ k-DPP(L) as a distribution over all (n choose k) index subsets S ⊆ [n] of size k, such that Pr(S) ∝ det(L_S) is proportional to the determinant of the sub-matrix L_S induced by the subset. DPPs have found numerous applications in machine learning, not only for summarization [31,22,20,7] and recommendation [18,8], but also in experimental design [14,33], stochastic optimization [38,34], Gaussian Process optimization [25], low-rank approximation [17,23,16], and more (recent surveys include [28,4,11]). Note that early work on DPPs focused on a random-size variant, which we denote S ∼ DPP(L), where the subset size is allowed to take any value between 0 and n, and the role of parameter k is replaced by the expected size E[|S|] = d_eff(L) ≝ tr(L(L + I)⁻¹).…”
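The two definitions above can be made concrete with a short sketch: a brute-force k-DPP sampler that enumerates all (n choose k) subsets and draws one with probability proportional to det(L_S), plus the effective dimension d_eff(L) = tr(L(L + I)⁻¹). The function names `sample_k_dpp` and `effective_dimension` are illustrative, not from any particular library, and enumeration is only feasible for small n.

```python
import itertools
import numpy as np

def sample_k_dpp(L, k, rng=None):
    """Exact (brute-force) k-DPP sampler: Pr(S) proportional to det(L_S).

    Enumerates all (n choose k) subsets, so this is an illustrative
    sketch for small n, not a scalable sampler.
    """
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # Unnormalized weights: principal-minor determinants det(L_S).
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    probs = weights / weights.sum()
    idx = rng.choice(len(subsets), p=probs)
    return subsets[idx]

def effective_dimension(L):
    """d_eff(L) = tr(L (L + I)^{-1}), the expected size of DPP(L)."""
    n = L.shape[0]
    return np.trace(L @ np.linalg.inv(L + np.eye(n)))
```

For a diagonal kernel L = diag(1, 2, 3), each term of d_eff is λᵢ/(λᵢ + 1), so d_eff = 1/2 + 2/3 + 3/4, matching the formula in the text.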