2019
DOI: 10.48550/arxiv.1911.07971
Preprint

vqSGD: Vector Quantized Stochastic Gradient Descent

Abstract: In this work, we present a family of vector quantization schemes vqSGD (Vector-Quantized Stochastic Gradient Descent) that provide asymptotic reduction in the communication cost with convergence guarantees in distributed computation and learning settings. In particular, we consider a randomized scheme, based on the convex hull of a point set, that returns an unbiased estimator of a d-dimensional gradient vector with bounded variance. We provide multiple efficient instances of our scheme that require only O(log d) bits …
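As a rough, hedged illustration of the scheme described in the abstract, the sketch below assumes the point set is the scaled cross-polytope {+-sqrt(d)*e_i} (one possible instance, not necessarily the paper's exact construction; all function names here are invented for this sketch). The gradient direction is written as a convex combination of the 2d vertices, and a single vertex is sampled with probability equal to its convex coefficient, so the decoded point is an unbiased estimator and the message is one index out of 2d points (about log2(2d) bits) plus a scalar for the norm.

import numpy as np

def vq_encode(g, rng=np.random.default_rng()):
    """Illustrative vqSGD-style encoder (assumed cross-polytope point set):
    returns (vertex index, sign, gradient norm)."""
    d = g.shape[0]
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return 0, 1, 0.0
    v = g / norm                       # unit vector, so ||v||_1 <= sqrt(d)
    c = np.sqrt(d)                     # vertices of the point set are +-c*e_i
    signed_mass = np.abs(v) / c        # weight on the vertex matching sign(v_i)
    slack = (1.0 - signed_mass.sum()) / (2 * d)   # leftover mass, spread evenly
    probs = np.concatenate([signed_mass * (v > 0) + slack,
                            signed_mass * (v < 0) + slack])
    k = rng.choice(2 * d, p=probs)     # sample one vertex by its convex weight
    return k % d, (1 if k < d else -1), norm

def vq_decode(index, sign, norm, d):
    """Unbiased estimate of g: the sampled vertex, rescaled by ||g||."""
    est = np.zeros(d)
    est[index] = sign * np.sqrt(d) * norm
    return est

In a distributed setting, averaging the decoded estimates from many workers concentrates around the true mean gradient, which is the regime the abstract targets.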

Cited by 7 publications (15 citation statements) | References 14 publications
“…Works [33,10,24,17] consider empirical mean estimation without assuming any statistical distribution on the data, and quantize the vectors to a small number of bits. There is a significant recent interest in considering the communication-efficient (empirical) mean estimation problem in the context of distributed stochastic gradient descent, see e.g., [1,2,12,5,37,36,34,20,4,23,6,14,27]. These works can broadly be partitioned into three categories: (i) Quantization: encoding each element of the vectors to a small number of bits [18,1,12,5,37,20,23,27], (ii) Sparsification: sending only a subset of elements of the vectors [2,34,36].…”
Section: Related Work
confidence: 99%
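To make categories (i) and (ii) from the quoted passage concrete, here is a generic, hedged sketch (illustrative only, not reproducing any specific cited scheme; function names and parameters are assumptions): per-coordinate stochastic quantization onto a small grid, and top-k sparsification that transmits only the largest-magnitude coordinates.

import numpy as np

def elementwise_quantize(g, levels=4, rng=np.random.default_rng()):
    """Category (i) quantization: stochastically round each coordinate onto a
    uniform grid with `levels` points, so each entry needs only a few bits;
    stochastic rounding keeps the decoded value unbiased given the scale."""
    scale = np.max(np.abs(g))
    if scale == 0.0:
        return g.copy()
    x = (g / scale + 1.0) / 2.0 * (levels - 1)      # map to [0, levels-1]
    low = np.floor(x)
    q = low + (rng.random(g.shape) < (x - low))     # round up w.p. frac part
    return (q / (levels - 1) * 2.0 - 1.0) * scale   # decoded value

def topk_sparsify(g, k):
    """Category (ii) sparsification: keep only the k largest-magnitude
    coordinates and zero out the rest (indices and values are transmitted)."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out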
“…When designing a first-order optimization algorithm under local information constraints, one not only needs to design the optimization algorithm itself, but also the algorithm for local processing of the gradient estimates. Many such algorithms have been proposed in recent years; see, for instance, [DJW14], [ACGMMTZ16], [ASYKM18], [GKMM19], [SVK20], [GDDKS20], and the references therein for privacy constraints; [SFDLY14], [AGLTV17], [SYKM17], [KR18], [FTMARRK20], [RKFR19], [LKH20], [ADSFS19], [CKÖ20], [HHWY19], [MT20b], [MT20a], [SSR20], and the references therein for communication constraints; [Nes13,RT12] for computational constraints. However, these algorithms primarily consider nonadaptive procedures for gradient processing (with the exception of [FTMARRK20]): that is, the scheme used to process the gradients at any iteration cannot depend on the information gleaned from previous iterations.…”
Section: Introduction
confidence: 99%
“…We remark that while our quantizers are related to the ones used in prior works, our main contribution is to show that our specific design choices yield optimal precision. For instance, the quantizers in [11] express the input as a convex combination of a set of points, similar to SimQ. In fact, one of the quantizers in [11] uses a similar set of points to that of SimQ with a different scaling.…”
Section: Introduction
confidence: 99%
“…For instance, the quantizers in [11] express the input as a convex combination of a set of points, similar to SimQ. In fact, one of the quantizers in [11] uses a similar set of points to that of SimQ with a different scaling. However, the quantizers in [11] are designed with other objectives in mind, and they fall short of attaining the optimal precision guarantees of SimQ and SimQ+.…”
Section: Introduction
confidence: 99%