In federated learning, communication between the server and clients is structured to enable collaborative model training without transferring raw data. The server coordinates the process by sending the global model to participating clients, which train locally on their data. Clients then return model updates (e.g., gradients or weights) to the server, which aggregates them to improve the global model. Communication typically occurs over standard protocols like HTTP/HTTPS or gRPC, with encryption ensuring privacy. For example, a server might use TLS to secure data in transit, while clients authenticate via tokens or certificates to prevent unauthorized access. This setup balances efficiency and security, as only model parameters—not sensitive data—are exchanged.
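To make the round-trip concrete, here is a minimal simulation of one communication round under stated assumptions: the function names (client_update, server_round), the toy linear model, and the unweighted averaging are all illustrative, not the API of any particular framework, and real systems add the networking, TLS, and authentication layers described above.

```python
# Minimal sketch of one federated round (hypothetical names; frameworks such as
# TensorFlow Federated or Flower wrap this loop with real networking and TLS).
import numpy as np

def client_update(global_weights, local_data, lr=0.1):
    """Simulate local training: one gradient step on a toy linear model."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)      # mean-squared-error gradient
    return global_weights - lr * grad      # updated local weights

def server_round(global_weights, client_datasets):
    """Send the global model to each client, collect updates, average them."""
    updates = [client_update(global_weights, data) for data in client_datasets]
    return np.mean(updates, axis=0)        # simple (unweighted) aggregation

# Toy run: two clients, a 3-feature linear model; raw data never leaves a client.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(5):
    weights = server_round(weights, clients)
print(weights)
```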
Clients handle most computation locally, reducing server load. When a client receives the global model, it trains using its local dataset, computes updates, and sends these back. To minimize bandwidth, techniques like gradient compression (e.g., quantizing values to lower precision) or sparsification (sending only significant updates) are often applied. For instance, a mobile device might train a keyboard prediction model, then transmit only the top 10% of gradient values. The server must also handle variability in client availability—some devices might be offline or slow. To address this, frameworks like TensorFlow Federated use asynchronous updates, allowing the server to proceed once a subset of clients responds, rather than waiting for all.
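The top-10% example above corresponds to top-k gradient sparsification. The sketch below shows one plausible way a client could select and transmit only the largest-magnitude gradient entries; the helper names and the (indices, values) wire format are assumptions for illustration, not a specific framework's protocol.

```python
import numpy as np

def sparsify_top_k(grad, fraction=0.10):
    """Keep only the largest-magnitude `fraction` of gradient entries.

    Returns (indices, values) so the client can transmit a compact update;
    the server reconstructs a sparse vector and treats missing entries as 0.
    """
    k = max(1, int(len(grad) * fraction))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of top-k magnitudes
    return idx, grad[idx]

def densify(indices, values, size):
    """Server-side reconstruction of the sparse update."""
    dense = np.zeros(size)
    dense[indices] = values
    return dense

grad = np.random.default_rng(1).normal(size=1000)
idx, vals = sparsify_top_k(grad, fraction=0.10)    # client sends ~10% of values
recovered = densify(idx, vals, grad.size)
print(f"sent {len(vals)} of {grad.size} values")
```

In practice the untransmitted residual is often accumulated locally and added to the next round's gradient, so the compression error does not bias training over time.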
The server aggregates client updates using algorithms like Federated Averaging (FedAvg), which computes a weighted average of client models, typically weighted by each client's dataset size. Secure aggregation protocols, such as those leveraging homomorphic encryption or secure multi-party computation, can further protect updates during aggregation. For example, Google’s implementation in Gboard masks individual contributions by combining encrypted updates before decryption. Frameworks like the PyTorch-based FLSim or OpenFL provide built-in tools for managing communication workflows, handling retries for failed clients, and optimizing network usage. This architecture ensures scalability, privacy, and adaptability across diverse devices and network conditions while keeping raw data decentralized.
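A short sketch of the FedAvg weighting step, assuming each client reports its model weights and dataset size; the function name and call signature are illustrative rather than any library's API.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model weights, weighted by dataset size."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)                     # (num_clients, dim)
    coeffs = np.array(client_sizes, dtype=float) / total   # n_k / n
    return coeffs @ stacked                                # sum_k (n_k / n) * w_k

# Three clients with different dataset sizes; larger clients count more.
w = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
sizes = [100, 300, 600]
print(fed_avg(w, sizes))   # weighted toward the 600-sample client -> [1.0, 0.8]
```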