2020
DOI: 10.1109/tsp.2020.2968280
Variance-Reduced Stochastic Learning Under Random Reshuffling

Abstract: Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results assume uniform data sampling with replacement. However, it has been observed in related works that random reshuffling can deliver superior performance over uniform sampling and, yet, no formal proofs or guarantees of exact convergence exist for variance-reduced algorithms un…
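To make the distinction in the abstract concrete, the following is a minimal sketch (not from the paper) contrasting uniform sampling with replacement against random reshuffling in plain SGD; the least-squares data, quadratic loss, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares data (placeholder problem, not from the paper).
N, d = 100, 5
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def grad(w, n):
    """Gradient of the per-sample quadratic loss Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, epochs = 0.01, 50
w_uniform = np.zeros(d)
w_reshuffle = np.zeros(d)

for _ in range(epochs):
    # Uniform sampling with replacement: N independent draws per epoch.
    for n in rng.integers(0, N, size=N):
        w_uniform -= mu * grad(w_uniform, n)

    # Random reshuffling: one random permutation per epoch,
    # so every sample is visited exactly once.
    for n in rng.permutation(N):
        w_reshuffle -= mu * grad(w_reshuffle, n)
```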

Cited by 13 publications (10 citation statements)
References 20 publications
“…$\{x_{1,n}\}_{n=1}^{N_1},\ \{x_{2,n}\}_{n=1}^{N_2},\ \cdots,\ \{x_{K,n}\}_{n=1}^{N_K}$, (1) where $N = \sum_{k=1}^{K} N_k$. We consider minimizing an empirical risk function, $J(w)$, which is defined as the sample average of loss values over all observed data samples in the network: $J(w) \triangleq \frac{1}{N}\sum_{n=1}^{N} Q(w; x_n)$. Here, the notation $Q(w; x_n)$ denotes the loss value evaluated at $w$ and the $n$-th sample, $x_n$.…”
Section: A Problem Formulation
confidence: 99%
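As a quick illustration of the empirical risk defined in this excerpt, the sketch below evaluates $J(w)$ as the sample average of per-sample losses over data held by K agents; the quadratic loss and the synthetic per-agent datasets are assumptions, not part of the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)

# K = 3 agents, each holding N_k samples (sizes are illustrative).
datasets = [rng.standard_normal((N_k, 4)) for N_k in (10, 20, 15)]
targets = [rng.standard_normal(len(D)) for D in datasets]
N = sum(len(D) for D in datasets)

def Q(w, x, y):
    """Per-sample loss Q(w; x_n); a quadratic loss is assumed here."""
    return 0.5 * (x @ w - y) ** 2

def J(w):
    """Empirical risk: sample average of losses over all data in the network."""
    total = sum(Q(w, x, y) for D, t in zip(datasets, targets) for x, y in zip(D, t))
    return total / N

print(J(np.zeros(4)))
```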
“…First, we derive a fully-decentralized variance-reduced stochastic-gradient algorithm with significantly reduced memory requirements. We refer to the technique as the diffusion-AVRG method (where AVRG stands for the "amortized variance-reduced gradient" method proposed in the related work [1] for single-agent learning). Unlike DSA [32], the proposed method does not require extra memory to store gradient estimates.…”
Section: Contribution
confidence: 99%
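The memory point made here can be illustrated with an SVRG-style construction, which keeps only a snapshot iterate and one aggregate gradient instead of a per-sample gradient table. The sketch below is plain SVRG on an assumed least-squares problem, not the diffusion-AVRG or AVRG recursion of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative least-squares problem (not taken from the cited papers).
N, d = 50, 3
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

def grad(w, n):
    """Gradient of Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, epochs = 0.05, 30
w = np.zeros(d)

for _ in range(epochs):
    w_snap = w.copy()
    # Single aggregate vector: the full gradient at the snapshot.
    full_grad = sum(grad(w_snap, n) for n in range(N)) / N
    for n in rng.permutation(N):  # inner pass in reshuffled order
        # Variance-reduced direction; no per-sample gradient table is stored.
        w = w - mu * (grad(w, n) - grad(w_snap, n) + full_grad)
```

The per-sample table used by SAGA or DSA is avoided here at the cost of an extra full-gradient pass per epoch; the AVRG idea referenced in the excerpt amortizes that computation across the epoch, as its name suggests.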
“…The boldface notation for the symbols w and σ in (3) emphasizes the random nature of these variables due to the randomness in the permutation operation. While the samples over one epoch are no longer picked independently from each other, the uniformity of the permutation function implies the following useful properties [19], [22], [23]:…”
Section: Motivation
confidence: 99%
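A small empirical check of the kind of uniformity property this excerpt alludes to: under a uniformly random permutation, each position in an epoch is equally likely to hold any given sample, even though the draws within an epoch are no longer independent. The parameters below are illustrative and not taken from the cited references.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 8, 100_000

# counts[i, n] = how often sample n is drawn at position i of the epoch.
counts = np.zeros((N, N))
for _ in range(trials):
    sigma = rng.permutation(N)            # one reshuffled epoch
    counts[np.arange(N), sigma] += 1

# Each entry should be close to 1/N, i.e. P[sigma(i) = n] = 1/N for every i, n,
# even though the positions within an epoch are dependent on each other.
print(np.round(counts / trials, 3))
```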
“…There is a family of variance-reduction algorithms such as SVRG [55], SAGA [44], and AVRG [56] that can approach the exact solution of the empirical risk function with constant stepsize. In this work, we exploit the SAGA construction because the variables $\{u_{n,k}\}$ can readily be used in that implementation.…”
Section: B Variance-Reduction Algorithm
confidence: 99%
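For reference, a minimal single-agent SAGA sketch in its standard form (fresh gradient minus stored gradient plus the table average). The least-squares data, step size, and zero-initialized table are assumptions, and the citing paper's variables $\{u_{n,k}\}$ are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative least-squares problem (not taken from the cited papers).
N, d = 50, 3
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

def grad(w, n):
    """Gradient of Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, steps = 0.05, 5000
w = np.zeros(d)
table = np.zeros((N, d))          # stored gradient for every sample (SAGA memory)
table_avg = table.mean(axis=0)    # running average of the stored gradients

for _ in range(steps):
    n = rng.integers(N)           # uniform sampling shown here for simplicity
    g_new = grad(w, n)
    # SAGA direction: fresh gradient minus stored gradient plus their average.
    w = w - mu * (g_new - table[n] + table_avg)
    # Replace the stored gradient and update the average incrementally.
    table_avg += (g_new - table[n]) / N
    table[n] = g_new
```

The O(Nd) gradient table in this sketch is exactly the extra memory that the diffusion-AVRG approach quoted above is designed to avoid.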