Communication Efficient Distributed SGD with Compressed Sensing

Published in ACC, 2021

We consider large-scale distributed optimization over a set of workers connected to a central server, where the limited communication bandwidth between the server and the workers imposes a significant bottleneck on the optimization procedure. Inspired by recent advances in communication-efficient federated learning, we propose a distributed SGD-type algorithm that exploits the sparsity of the gradient, when possible, to reduce the communication burden. At the heart of the proposed algorithm is the use of compressed sensing techniques: on the worker side, each local stochastic gradient is compressed before transmission; on the server side, a sparse approximation of the global stochastic gradient is recovered from the noisy aggregate of the compressed local gradients. Theoretical analysis shows that the proposed algorithm achieves a convergence rate comparable to that of vanilla SGD, even in the presence of noise perturbations incurred by the communication channels. We also conduct numerical experiments, training residual networks on the CIFAR-10 dataset, to corroborate the effectiveness of the proposed algorithm.
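The compress-then-recover pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes a shared random Gaussian measurement matrix `A` for compression and uses iterative hard thresholding (IHT) as a stand-in sparse recovery routine; the dimensions, sparsity level `k`, step size, and noise model are all arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 1000, 200, 10  # gradient dim, compressed dim (m << d), sparsity level
# Shared random Gaussian measurement matrix (an assumption for this sketch)
A = rng.normal(size=(m, d)) / np.sqrt(m)

def compress(grad):
    """Worker side: project the d-dim local gradient down to m measurements."""
    return A @ grad

def iht_recover(y, steps=50, step_size=0.5):
    """Server side: iterative hard thresholding for a k-sparse estimate x
    of the gradient, given aggregated measurements y ≈ A x."""
    x = np.zeros(d)
    for _ in range(steps):
        x = x + step_size * (A.T @ (y - A @ x))   # gradient step on ||y - Ax||^2
        idx = np.argpartition(np.abs(x), -k)[:-k]  # all but the k largest entries
        x[idx] = 0.0                               # hard-threshold to k-sparse
    return x

# Simulate: four workers each hold a noisy copy of a k-sparse "true" gradient.
g_true = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
g_true[support] = rng.normal(size=k)

compressed = [compress(g_true + 0.01 * rng.normal(size=d)) for _ in range(4)]
y_agg = np.mean(compressed, axis=0)  # server averages the compressed gradients

g_hat = iht_recover(y_agg)           # sparse approximation of the global gradient
rel_err = np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
```

Each worker sends only `m` numbers instead of `d`, and the server performs a single recovery on the aggregate rather than decompressing each worker's message individually, which is what makes the compression compatible with server-side averaging.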