What optimization algorithms are used in federated learning?

Federated learning relies on optimization algorithms designed to handle decentralized data, device heterogeneity, and communication constraints. The most widely used algorithm is Federated Averaging (FedAvg), which trains models locally on devices (clients) and aggregates updates on a central server. FedAvg performs multiple local stochastic gradient descent (SGD) steps on each client’s data before averaging the model parameters across clients. This reduces communication overhead compared to sending raw gradients after every batch. While FedAvg is simple and effective, it assumes clients have similar computational resources and data distributions—a limitation in real-world scenarios where data is non-IID (not independently and identically distributed).
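
As a concrete illustration, here is a minimal sketch of one FedAvg round on a toy NumPy linear-regression problem. The data, model, client setup, and hyperparameters are assumptions for demonstration only, not any particular framework's API.

```python
# FedAvg sketch: clients run several local SGD epochs, then the server
# averages their weights, weighted by how much data each client holds.
import numpy as np

def local_sgd(w, X, y, lr=0.01, epochs=5, batch_size=32):
    """Run local SGD on one client's data and return the updated weights."""
    w = w.copy()
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # MSE gradient
            w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One communication round: local training on every client, then a
    weighted average of the resulting model parameters on the server."""
    local_weights, sizes = [], []
    for X, y in clients:
        local_weights.append(local_sgd(global_w, X, y))
        sizes.append(len(X))
    return np.average(local_weights, axis=0, weights=sizes)

# Toy usage: three clients holding different amounts of synthetic data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (100, 200, 50):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print(w)  # should approach [2.0, -1.0]
```

Because each client performs many local steps before communicating, only one weight vector per client crosses the network per round, which is the communication saving FedAvg is designed for.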

To address challenges like client data skew and system heterogeneity, algorithms such as SCAFFOLD and FedProx have been developed. SCAFFOLD introduces control variates (correction terms) to mitigate client drift caused by non-IID data, ensuring local updates align better with the global model. FedProx adds a proximal term to the local loss function, penalizing large deviations from the global model during local training. This stabilizes convergence when devices have varying computational capacities (e.g., some clients run fewer local epochs). For communication efficiency, methods like quantization (compressing model updates) and structured updates (enforcing sparsity) are often combined with these algorithms. For example, Google’s implementation of federated learning uses quantization to reduce the size of transmitted model weights by up to 90%.
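
To make these ideas concrete, the sketch below reuses the same toy NumPy setup, adds a FedProx-style proximal penalty to the local objective, and shows a simple uniform quantization of the update before upload. The mu coefficient, bit width, and helper names are illustrative choices, not any library's API.

```python
# FedProx-style local update: the client minimizes its own loss plus
# (mu/2) * ||w - global_w||^2, which discourages drifting far from the
# global model when data is non-IID or local work is uneven.
import numpy as np

def fedprox_local_update(global_w, X, y, mu=0.1, lr=0.01, epochs=5, batch_size=32):
    w = global_w.copy()
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # task gradient (MSE)
            grad += mu * (w - global_w)                      # proximal-term gradient
            w -= lr * grad
    return w

def quantize_update(delta, bits=8):
    """Uniformly quantize a model update to `bits` bits to shrink uploads.
    The client sends (q, scale); the server reconstructs q * scale."""
    scale = max(np.abs(delta).max(), 1e-12) / (2 ** (bits - 1) - 1)
    q = np.round(delta / scale).astype(np.int8)
    return q, scale
```

Sending an 8-bit quantized delta instead of 32-bit floats already cuts upload size by roughly 4x; coarser quantization and sparsified (structured) updates push the savings further.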

Adaptive optimization techniques, inspired by centralized deep learning, have also been adapted for federated settings. FedAdam and FedYogi apply Adam-style and Yogi-style adaptive learning rates on the server side: the aggregated client update is treated as a pseudo-gradient, and per-parameter learning rates are adjusted based on its historical statistics. These methods improve convergence in scenarios with uneven client participation or noisy updates. Additionally, differential privacy and secure aggregation are often integrated into optimization pipelines to protect user data. For instance, Apple’s federated learning framework uses secure aggregation to combine encrypted model updates from devices so that only the aggregate is ever revealed to the server. Developers typically choose algorithms based on their constraints: FedAvg for simplicity, FedProx for device heterogeneity, or adaptive methods for unstable training dynamics. Open-source libraries such as TensorFlow Federated and Flower provide implementations of these algorithms, allowing developers to experiment and adapt them to their use cases.
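
A rough sketch of the server-side mechanism behind FedAdam-style methods is shown below: the averaged client weights minus the current global weights act as a pseudo-gradient that feeds an Adam-like update. The class name, hyperparameters, and NumPy setup are illustrative assumptions, not the API of TensorFlow Federated or Flower.

```python
# Server-side adaptive aggregation in the spirit of FedAdam: the server
# keeps Adam-style moment estimates of the round-to-round model change.
import numpy as np

class FedAdamServer:
    def __init__(self, init_w, lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
        self.w = np.asarray(init_w, dtype=float)
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m = np.zeros_like(self.w)  # first-moment estimate
        self.v = np.zeros_like(self.w)  # second-moment estimate

    def apply_round(self, client_weights, client_sizes):
        """Aggregate one round of client models and take an Adam-like step."""
        avg_w = np.average(client_weights, axis=0, weights=client_sizes)
        delta = avg_w - self.w  # pseudo-gradient: the direction the clients moved
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        self.w = self.w + self.lr * self.m / (np.sqrt(self.v) + self.tau)
        return self.w
```

The per-parameter scaling by sqrt(v) + tau damps parameters whose updates fluctuate from round to round, which is why adaptive variants tend to behave better under uneven client participation or noisy updates.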
