“…Given k and L, we define S ∼ k-DPP(L) as a distribution over all (n choose k) index subsets S ⊆ [n] of size k, such that Pr(S) ∝ det(L_S) is proportional to the determinant of the sub-matrix L_S induced by the subset. DPPs have found numerous applications in machine learning, not only for summarization [31,22,20,7] and recommendation [18,8], but also in experimental design [14,33], stochastic optimization [38,34], Gaussian Process optimization [25], low-rank approximation [17,23,16], and more (recent surveys include [28,4,11]). Note that early work on DPPs focused on a random-size variant, which we denote S ∼ DPP(L), where the subset size is allowed to take any value between 0 and n, and the role of parameter k is replaced by the expected size E[|S|] = d_eff(L) ≝ tr(L(L + I)⁻¹).…”
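The two definitions above can be made concrete with a short sketch: a brute-force k-DPP sampler that enumerates all (n choose k) subsets and draws one with probability proportional to det(L_S), plus the effective dimension d_eff(L) = tr(L(L + I)⁻¹). The function names `sample_k_dpp` and `effective_dimension` are illustrative, not from any particular library, and enumeration is only feasible for small n.

```python
import itertools
import numpy as np

def sample_k_dpp(L, k, rng=None):
    """Exact (brute-force) k-DPP sampler: Pr(S) proportional to det(L_S).

    Enumerates all (n choose k) subsets, so this is an illustrative
    sketch for small n, not a scalable sampler.
    """
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    # Unnormalized weights: principal-minor determinants det(L_S).
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    probs = weights / weights.sum()
    idx = rng.choice(len(subsets), p=probs)
    return subsets[idx]

def effective_dimension(L):
    """d_eff(L) = tr(L (L + I)^{-1}), the expected size of DPP(L)."""
    n = L.shape[0]
    return np.trace(L @ np.linalg.inv(L + np.eye(n)))
```

For a diagonal kernel L = diag(1, 2, 3), each term of d_eff is λᵢ/(λᵢ + 1), so d_eff = 1/2 + 2/3 + 3/4, matching the formula in the text.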