Alexander Rakhlin scite author profile

We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. Compared to previous work, these complexity bounds have improved dependence on the network depth, and under some additional assumptions, are fully independent of the network size (both depth and width). These results are derived using some novel techniques, which may be of independent interest.
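To make the bounded quantity concrete, here is a sketch of the simplest case: a single layer, that is, linear predictors whose weight vector has ℓ2 norm at most B. This depth-one setting is an assumption chosen for illustration. It is not the multi-layer bound from the paper, and the function name `linear_class_rademacher` is hypothetical. The sign vectors are enumerated exhaustively, so it only works for a handful of points.

```python
import itertools
import math


def linear_class_rademacher(xs, bound):
    """Exact empirical Rademacher complexity of {x -> <w, x> : ||w||_2 <= bound} on xs.

    For fixed signs s, Cauchy-Schwarz gives
        sup_w (1/n) * sum_i s_i <w, x_i> = (bound / n) * ||sum_i s_i x_i||_2,
    so the expectation over signs is an average of that norm over all 2**n sign
    vectors. Enumeration is exponential in n, so this is only for small samples.
    """
    n, dim = len(xs), len(xs[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        # Signed sum of the sample points, computed coordinate by coordinate.
        v = [sum(s * x[k] for s, x in zip(signs, xs)) for k in range(dim)]
        total += math.sqrt(sum(c * c for c in v))
    return bound * total / (n * 2 ** n)


xs = [(1.0, 2.0), (0.5, -1.0), (2.0, 0.0)]
r = linear_class_rademacher(xs, bound=1.5)
# Jensen's inequality gives the classical bound: bound * sqrt(sum ||x_i||^2) / n.
jensen = 1.5 * math.sqrt(sum(a * a + b * b for a, b in xs)) / len(xs)
print(r, jensen)
```

The bounds in the abstract concern how such a quantity behaves when layers are composed under per-layer norm constraints. The single-layer case is only the starting point.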

Automatic Instrument Segmentation in Robot-Assisted Surgery using Deep Learning

Shvets, Rakhlin, Kalinin, et al.

2018

260

197

Semantic segmentation of robotic instruments is an important problem for robot-assisted surgery. One of the main challenges is to correctly detect an instrument's position for tracking and pose estimation within the surgical scene. Accurate pixel-wise instrument segmentation is needed to address this challenge. In this paper, we describe our deep learning-based approach for robotic instrument segmentation. Our approach demonstrates an improvement over state-of-the-art results using several novel deep neural network architectures. It addresses the binary segmentation problem, where every pixel in an image from the surgery video feed is labeled as either instrument or background. In addition, we solve a multi-class segmentation problem, in which we distinguish different instruments, or different parts of an instrument, from the background. In this setting, our approach outperforms other methods for automatic instrument segmentation, thereby providing state-of-the-art results for these problems. The source code for our solution is publicly available.
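The binary task labels each pixel as instrument (1) or background (0). The abstract does not name an evaluation metric. As an assumption, the sketch below scores a predicted mask against a ground-truth mask using the Jaccard index (IoU) and the Dice coefficient, two metrics commonly used for this kind of segmentation. The helper `binary_mask_scores` and the toy masks are hypothetical.

```python
def binary_mask_scores(pred, target):
    """IoU (Jaccard) and Dice for two same-shaped binary masks (nested lists of 0/1).

    1 marks an instrument pixel and 0 marks background.
    """
    inter = union = pred_sum = target_sum = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            inter += p & t  # pixel is instrument in both masks
            union += p | t  # pixel is instrument in at least one mask
            pred_sum += p
            target_sum += t
    if union == 0:
        return 1.0, 1.0  # both masks empty: count as perfect agreement
    return inter / union, 2 * inter / (pred_sum + target_sum)


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(binary_mask_scores(pred, target))
```

For the multi-class setting, the same scores can be computed per class by treating one class as foreground and everything else as background.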

Stochastic Convex Optimization with Bandit Feedback

Agarwal, Foster, Hsu, et al.

2013

SIAM J. Optim.

157

175

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm: the sum of the function values at the algorithm's T query points minus T times the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d)√T) regret, where d is the dimension. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of its scaling with T.
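To make the regret definition concrete, here is a small sketch that separates what the learner observes (noisy function values) from what regret measures (true function values). It does not implement the paper's ellipsoid-based algorithm. The objective, the query points, the noise level and the helper names `regret` and `bandit_oracle` are hypothetical.

```python
import random


def regret(f, queries, f_star):
    """Regret of queries x_1..x_T: sum_t f(x_t) - T * f(x*)."""
    return sum(f(x) for x in queries) - len(queries) * f_star


def bandit_oracle(f, noise_std, rng):
    """Stochastic bandit feedback: a query at x reveals only f(x) plus zero-mean Gaussian noise."""
    return lambda x: f(x) + rng.gauss(0.0, noise_std)


def f(x):
    """Hypothetical convex, 1-Lipschitz objective on X = [0, 1], minimized at x* = 0.3."""
    return abs(x - 0.3)


observe = bandit_oracle(f, noise_std=0.1, rng=random.Random(0))
queries = [0.0, 0.5, 0.3, 0.3]
feedback = [observe(x) for x in queries]  # all the learner ever sees
print(feedback)
print(regret(f, queries, f_star=0.0))  # uses true values, unavailable to the learner
```

Regret grows with every query that misses the minimizer. An O(√T) guarantee therefore means the average per-query gap shrinks like 1/√T.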

Distributed Detection: Finite-Time Analysis and Impact of Network Topology

Shahrampour, Rakhlin, Jadbabaie

2016

IEEE Trans. Automat. Contr.

137

This paper addresses the problem of distributed detection in multi-agent networks. Agents receive private signals about an unknown state of the world. The underlying state is globally identifiable, yet informative signals may be dispersed throughout the network. Using an optimization-based framework, we develop an iterative local strategy for updating individual beliefs. In contrast to the existing literature, which focuses on asymptotic learning, we provide a finite-time analysis. Furthermore, we introduce a Kullback-Leibler cost to compare the efficiency of the algorithm to its centralized counterpart. Our bounds on the cost are expressed in terms of network size, spectral gap, the centrality of each agent, and the relative entropy of agents' signal structures. A key observation is that distributing more informative signals to central agents results in a faster learning rate. Moreover, by optimizing the weights, we can improve the spectral gap and thereby speed up learning. We also quantify the effect of link failures on learning speed in symmetric networks. Finally, we provide numerical simulations that verify our theoretical results.
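The abstract does not spell out the update rule. As an assumption, the sketch below uses a log-linear rule common in this literature. Each agent takes a weighted average of its neighbors' log-beliefs, using weights from a doubly stochastic matrix W, and then adds the log-likelihood of its newest private signal. This is not necessarily the paper's exact strategy. The network, the weights, the signal probabilities and the helper names are hypothetical. Only agent 0 receives informative signals, which mirrors the abstract's setting: the state is globally identifiable even though informative signals are dispersed.

```python
import math
import random


def normalize_log(logb):
    """Renormalize log-beliefs so the beliefs sum to 1 (log-sum-exp for stability)."""
    m = max(logb)
    z = m + math.log(sum(math.exp(v - m) for v in logb))
    return [v - z for v in logb]


def log_linear_step(W, log_beliefs, log_liks):
    """One round: average neighbors' log-beliefs with weights W, then add own log-likelihood."""
    n, k = len(log_beliefs), len(log_beliefs[0])
    updated = []
    for i in range(n):
        mixed = [sum(W[i][j] * log_beliefs[j][th] for j in range(n)) for th in range(k)]
        updated.append(normalize_log([mixed[th] + log_liks[i][th] for th in range(k)]))
    return updated


# Hypothetical network: 3 agents on a path, symmetric doubly stochastic weights.
W = [[2 / 3, 1 / 3, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 1 / 3, 2 / 3]]
# p_one[i][theta] = P(agent i sees signal 1 | state theta); only agent 0 is informative.
p_one = [[0.8, 0.3], [0.5, 0.5], [0.5, 0.5]]
TRUE_STATE = 0

rng = random.Random(0)
beliefs = [[math.log(0.5), math.log(0.5)] for _ in range(3)]  # uniform priors
for _ in range(300):
    log_liks = []
    for i in range(3):
        s = 1 if rng.random() < p_one[i][TRUE_STATE] else 0  # private signal
        log_liks.append([math.log(p if s else 1.0 - p) for p in p_one[i]])
    beliefs = log_linear_step(W, beliefs, log_liks)

print([round(math.exp(b[TRUE_STATE]), 4) for b in beliefs])
```

Agents 1 and 2 never see an informative signal. They still come to favor the true state because agent 0's evidence reaches them through the averaging step. How fast it spreads depends on the spectral gap of W.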

Just interpolate: Kernel “Ridgeless” regression can generalize

Liang, Rakhlin

2020

Ann. Statist.

152

133

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Raginsky, Rakhlin, Telgarsky

2017

Preprint

123

Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. As in the asymptotic setting, our analysis relates the discrete-time SGLD Markov chain to a continuous-time diffusion process. A new tool that drives the results is the use of weighted transportation cost inequalities to quantify the rate of convergence of SGLD to a stationary distribution in the Euclidean 2-Wasserstein distance.
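Below is a minimal sketch of the SGLD iteration described above. It is written in the commonly used parametrization x_{k+1} = x_k - η g_k + sqrt(2η/β) ξ_k, with step size η, inverse temperature β, unbiased gradient estimate g_k and standard Gaussian ξ_k. The scalar tilted double-well objective, its noisy gradient and the helper names are hypothetical stand-ins for a non-convex empirical risk.

```python
import math
import random


def sgld(grad_estimate, x0, step, beta, n_steps, rng):
    """Scalar SGLD: x <- x - step * g(x) + sqrt(2 * step / beta) * N(0, 1).

    grad_estimate(x, rng) must return an unbiased estimate of the objective's gradient;
    beta is the inverse temperature, so larger beta means less injected noise.
    """
    noise_scale = math.sqrt(2.0 * step / beta)
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = x - step * grad_estimate(x, rng) + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def exact_grad(x, rng):
    """Gradient of the hypothetical tilted double well F(x) = (x^2 - 1)^2 + 0.3 x.

    F has a local minimizer near x = 0.96 and its global minimizer near x = -1.04.
    """
    return 4.0 * x * (x * x - 1.0) + 0.3


def noisy_grad(x, rng):
    """Unbiased stochastic gradient: exact gradient plus zero-mean Gaussian noise."""
    return exact_grad(x, rng) + rng.gauss(0.0, 0.5)


# Nearly noiseless run (huge beta, exact gradient): behaves like gradient descent
# and settles at the local minimizer it starts next to.
gd_like = sgld(exact_grad, x0=1.0, step=0.01, beta=1e12, n_steps=2000, rng=random.Random(0))
print(gd_like[-1])
```

A call such as `sgld(noisy_grad, 1.0, 0.01, 5.0, 20000, random.Random(1))` also starts the chain next to the local minimizer. At a moderate β, the injected noise gives the chain a chance to cross the barrier into the global basin. As β grows, the update approaches plain SGD and tends to stay in the basin where it started.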

Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis

Rakhlin, Shvets, Iglovikov, et al.

2018

Preprint

107

126

Breast cancer is one of the main causes of cancer death worldwide. Early diagnosis significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to disagreement between pathologists. Computer-aided diagnosis systems have shown potential for improving diagnostic accuracy. In this work, we develop a computational approach based on deep convolutional neural networks for breast cancer histology image classification. A hematoxylin and eosin stained breast histology microscopy image dataset is provided as part of the ICIAR Grand Challenge on Breast Cancer Histology Images. Our approach utilizes several deep neural network architectures and a gradient boosted trees classifier. For -class classification task, we report . % accuracy. For -class classification task to detect carcinomas we report . % accuracy, AUC . %, and sensitivity/specificity . / . % at the high-sensitivity operating point. To our knowledge, this approach outperforms other common methods in automated histopathological image classification. The source code for our approach is publicly available at https://github.com/alexander-rakhlin/ICIAR
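For the two-class carcinoma task, the abstract reports accuracy, AUC, and sensitivity/specificity at a high-sensitivity operating point. The sketch below shows how these metrics are computed from classifier scores. The scores are toy numbers, not results from the paper, and the helper names `auc_rank` and `sens_spec` are hypothetical.

```python
def auc_rank(pos_scores, neg_scores):
    """AUC as P(random positive scores above random negative), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when a score >= threshold is called positive."""
    sensitivity = sum(p >= threshold for p in pos_scores) / len(pos_scores)
    specificity = sum(q < threshold for q in neg_scores) / len(neg_scores)
    return sensitivity, specificity


# Toy scores for "carcinoma" (positive) and "non-carcinoma" (negative) images.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(auc_rank(pos, neg))
print(sens_spec(pos, neg, threshold=0.5), sens_spec(pos, neg, threshold=0.35))
```

Lowering the threshold raises sensitivity, usually at the cost of specificity. Picking a high-sensitivity operating point means choosing such a threshold on the ROC curve.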

Sequential complexities and uniform martingale laws of large numbers

Rakhlin, Sridharan, Tewari

2014

Probab. Theory Relat. Fields

122

We establish necessary and sufficient conditions for a uniform martingale Law of Large Numbers. We extend the technique of symmetrization to the case of dependent random variables and provide "sequential" (non-i.i.d.) analogues of various classical measures of complexity, such as covering numbers and combinatorial dimensions from empirical process theory. We establish relationships between these various sequential complexity measures and show that they provide tight control on the uniform convergence rates for empirical processes with dependent data. As a direct application of our results, we provide exponential inequalities for sums of martingale differences in Banach spaces.

Keywords: empirical processes, dependent data, uniform Glivenko-Cantelli classes, Rademacher averages, sequential prediction
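The "sequential" analogues in this abstract replace a fixed sample x_1, ..., x_n with a tree, in which the t-th point may depend on the Rademacher signs ε_1, ..., ε_{t-1} drawn before it. The sketch below assumes the usual definition of the sequential Rademacher average. It evaluates the inner expectation by brute force for one fixed tree and a finite class. The supremum over trees in the full definition is not computed. The function name and the example class and tree are hypothetical.

```python
import itertools


def sequential_rademacher_on_tree(functions, tree, n):
    """E_eps max_f (1/n) * sum_t eps_t * f(x_t(eps_1..eps_{t-1})) for one fixed depth-n tree.

    tree(prefix) returns the point x_t for a sign prefix of length t - 1, so later
    points may depend on earlier signs. A tree that ignores its argument reduces this
    to the classical empirical Rademacher average on a fixed sample.
    """
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        total += max(sum(eps[t] * f(tree(eps[:t])) for t in range(n)) for f in functions) / n
    return total / 2 ** n


def adaptive_tree(prefix):
    """Hypothetical tree: the next point is the previous sign (1 at the root)."""
    return prefix[-1] if prefix else 1


functions = [lambda x: x, lambda x: -x]  # hypothetical two-element class
print(sequential_rademacher_on_tree(functions, adaptive_tree, n=3))
```

When `tree` ignores its argument, the result coincides with the ordinary empirical Rademacher average. This makes the non-i.i.d. generalization easy to see side by side with the classical one.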
