

Poster in Workshop: Workshop on Distributed and Private Machine Learning

Gradient-Masked Federated Optimization

Irene Tenison · Sreya Francis · Irina Rish


Abstract:

Federated Averaging (FedAVG) has become the most popular federated learning algorithm due to its simplicity and low communication overhead. We use simple examples to show that FedAVG has the tendency to sew together the optima across the participating clients. These sewed optima exhibit poor generalization when used on a new client with a new data distribution. Inspired by the invariance principles in Arjovsky et al. (2019) and Parascandolo et al. (2020), we focus on learning a model that is locally optimal across the different clients simultaneously. We propose an algorithm that masks gradients (AND-mask from Parascandolo et al.) across the clients and uses them to carry out server model updates. We show that this algorithm achieves similar in-distribution and out-of-distribution accuracy, and requires fewer communication rounds to converge than FedAVG, especially when the data is non-identically distributed.
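Below is a minimal sketch of an AND-mask aggregation step in the spirit of the abstract, assuming each client sends a flattened pseudo-gradient (local update) to the server. The function names, the default agreement threshold, and the plain gradient-step server update are illustrative assumptions, not the poster's exact algorithm.

```python
import numpy as np

def and_mask_aggregate(client_updates, agreement_threshold=0.8):
    """Aggregate per-client pseudo-gradients with an AND-mask.

    A parameter component is kept only if the fraction of clients agreeing
    on its sign is at least `agreement_threshold`; otherwise it is zeroed.
    `client_updates` is a list of 1-D arrays, one flattened update per client.
    """
    updates = np.stack(client_updates)       # shape: (num_clients, num_params)
    signs = np.sign(updates)
    # |mean of signs| is 1 when all clients agree on the sign, 0 when they split evenly.
    agreement = np.abs(signs.mean(axis=0))
    mask = (agreement >= agreement_threshold).astype(updates.dtype)
    return mask * updates.mean(axis=0)       # masked average update

def server_step(global_params, client_updates, lr=1.0, tau=0.8):
    """Hypothetical server update: apply the masked average to the global model."""
    return global_params + lr * and_mask_aggregate(client_updates, tau)
```

For example, with three clients whose updates for one parameter are [+0.2, +0.1, -0.3], only two of three signs agree (agreement 1/3), so that component is masked out at tau = 0.8; a component with updates [+0.2, +0.1, +0.3] would pass and contribute its average to the server step.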
