“…Works [33,10,24,17] consider empirical mean estimation without assuming any statistical distribution on the data, and quantize the vectors to a small number of bits. There is a significant recent interest in considering the communicationefficient (empirical) mean estimation problem in the context of distributed stochastic gradient descent, see e.g., [1,2,12,5,37,36,34,20,4,23,6,14,27]. These works can broadly be partitioned into three categories: (i) Quantization: encoding each element of the vectors to a small number of bits [18,1,12,5,37,20,23,27], (ii) Sparsification: sending only a subset of elements of the vectors [2,34,36].…”