Konstantinos E. Nikolakakis scite author profile

We provide sharp path-dependent generalization and excess error guarantees for the fullbatch Gradient Decent (GD) algorithm for smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is a novel generalization error technique for deterministic symmetric algorithms, that implies average output stability and a bounded expected gradient of the loss at termination leads to generalization. This key result shows that small generalization error occurs at stationary points, and allows us to bypass Lipschitz assumptions on the loss prevalent in previous work.For nonconvex, convex and strongly convex losses, we show the explicit dependence of the generalization error in terms of the accumulated path-dependent optimization error, terminal optimization error, number of samples, and number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under the proper choice of a decreasing step size. Further, if the loss is nonconvex but the objective is PL, we derive vanishing bounds on the corresponding excess risk. For convex and strongly-convex smooth losses, we prove that full-batch GD generalizes even for large constant step sizes, and achieves a small excess risk while training fast. Our full-batch GD generalization error and excess risk bounds are significantly tighter than the existing bounds for (stochastic) GD, when the loss is smooth (but possibly non-Lipschitz).

show abstract

Predictive Learning on Hidden Tree-Structured Ising Models

Nikolakakis¹,

Kalogerias²,

Sarwate³

2018

Preprint

View full text Add to dashboard Cite

We provide high-probability sample complexity guarantees for exact structure recovery and accurate Predictive Learning using noise-corrupted samples from an acyclic (tree-shaped) graphical model. The hidden variables follow a tree-structured Ising model distribution, whereas the observable variables are generated by a binary symmetric channel, taking the hidden variables as its input. This model arises naturally in a variety of applications, such as in physics, biology, computer science, and finance. The noiseless structure learning problem has been studied earlier by Bresler and Karzand (2018); this paper quantifies how noise in the hidden model impacts the sample complexity of structure learning and predictive distributional inference by proving upper and lower bounds on the sample complexity. Quite remarkably, for any tree with p vertices and probability of incorrect recovery δ > 0, the order of necessary number of samples remains logarithmic as in the noiseless case, i.e., O(log(p/δ)), for both aforementioned tasks. We also present a new equivalent of Isserlis's Theorem for sign-valued tree-structured distributions, yielding a new low-complexity algorithm for higher order moment estimation.

show abstract

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

Nikolakakis

Kalogerias

Sheffet

et al. 2021

IEEE J. Sel. Areas Inf. Theory

View full text Add to dashboard Cite

Black-Box Generalization

Nikolakakis¹,

Haddadpour²,

Kalogerias³

et al. 2022

Preprint

View full text Add to dashboard Cite

We provide the first generalization error analysis for black-box learning through derivativefree optimization. Under the assumption of a Lipschitz and smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS) algorithm, that updates a d-dimensional model by replacing stochastic gradient directions with stochastic differences of K + 1 perturbed loss evaluations per dataset (example) query. For both unbounded and bounded possibly nonconvex losses, we present the first generalization bounds for the ZoSS algorithm. These bounds coincide with those for SGD, and rather surprisingly are independent of d, K and the batch size m, under appropriate choices of a slightly decreased learning rate. For bounded nonconvex losses and a batch size m = 1, we additionally show that both generalization error and learning rate are independent of d and K, and remain essentially the same as for the SGD, even for two function evaluations. Our results extensively extend and consistently recover established results for SGD in prior work, on both generalization bounds and corresponding learning rates. If additionally m = n, where n is the dataset size, we derive generalization guarantees for full-batch GD as well.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.