adap / flower

Flower: A Friendly Federated Learning Framework

Home Page: https://flower.ai

License: Apache License 2.0

Python 69.18% Shell 1.20% Dockerfile 0.10% C++ 1.18% Swift 2.83% Makefile 0.03% Batchfile 0.04% CSS 0.31% HTML 0.36% Kotlin 0.58% Jupyter Notebook 23.79% CMake 0.09% Smarty 0.30%
flower federated-learning federated-learning-framework federated-analytics fleet-learning fleet-intelligence deep-learning machine-learning pytorch scikit-learn

flower's People

Contributors

adam-narozniak, akhilmathurs, cantuerk, charlesbvll, chongshenng, cozek, danieljanes, danielnugraha, dannymcy, dependabot[bot], edogab33, gubertoli, jafermarq, makgulati, mariaboerner1987, moep90, mohammadnaseri, nfnt, panh99, pedropgusmao, robert-steiner, sichanghe, sisco0, stevelaskaridis, tabdar-khan, tanertopal, vasundharaagarwal, vingt100, weblate, yan-gao-gy


flower's Issues

Document executing baselines

The Flower docs contain only a general overview of the available baselines. It would be great to document individual baselines more thoroughly and have detailed descriptions on how to execute/reproduce them.

Improve website menu display

Should we consider using a different theme structure for the menu in the website? Something like

Quickstart
  |-- Keras (TensorFlow)
  |-- PyTorch
Installation
API
Examples
Cloud Usage 

And maybe a different Sphinx theme? I found this: https://sphinx-themes.org/

Document available strategy implementations

Flower provides a few popular FL algorithms out-of-the-box. Those implementations along with their configuration parameters should be documented to help users understand what's already available.

Standalone examples w/ own pyproject.toml

Currently, examples are located in src/flwr_example/.... The extras required by those examples are mentioned in the general pyproject.toml. This structure makes it difficult to "just copy and paste" examples.

A better approach would be to move examples into a top-level examples directory and treat every example as a standalone project (i.e., give each example its own pyproject.toml).
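As a sketch of what a standalone example could look like, a hypothetical examples/quickstart-tensorflow/pyproject.toml might contain something like the following (the project name and all version pins are placeholders, not the actual repository files):

```toml
[tool.poetry]
name = "quickstart-tensorflow"          # hypothetical example name
version = "0.1.0"
description = "Standalone Flower example with its own dependencies"
authors = ["The Flower Authors"]

[tool.poetry.dependencies]
python = "^3.7"
flwr = "^0.1.0"        # placeholder version
tensorflow = "^2.3"    # placeholder version

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

With this, copying the example directory alone would be enough to install and run it.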

Document Flower architecture

Document the general architecture of Flower:

  • Server side: Communication, strategies
  • Client side: Communication, callback methods, protocol-level integration
  • Extended: Baselines, ops, etc.

execution of client example crash after training is terminated

Hello !

I'm trying to use the examples you provide (quickstart and tensorflow). I can train models, but the clients fail to stop without crashing.

I just run run_server.sh and run_clients.sh in separate terminals and see the clients download data and train their models. After training, the server evaluates the model and stops itself properly.
At that moment, the clients crash, raising an exception with this message:

Traceback (most recent call last):
  File "client.py", line 114, in <module>
    main()
  File "client.py", line 110, in main
    fl.client.start_keras_client(args.server_address, client)
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/app.py", line 47, in start_keras_client
    start_client(server_address, flower_client)
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/app.py", line 35, in start_client
    server_message = receive()
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/grpc_client/connection.py", line 59, in <lambda>
    receive: Callable[[], ServerMessage] = lambda: next(server_message_iterator)
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Socket closed"
        debug_error_string = "{"created":"@1601383903.613729459","description":"Error received from peer ipv6:[::]:8080","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}"

Rename `rnd` to `fl_round`

rnd might be mistaken for "random", so we should perhaps rename it to fl_round (other suggestions are welcome).

Note: this would be an incompatible change.

JavaScript/TypeScript SDK

To support more heterogeneous environments/setups, it would be great to have a JavaScript/TypeScript client SDK. The task shouldn't be too hard, as gRPC-Web has improved substantially. We are happy to support anyone who wants to tackle this issue and, ideally, provide an example with e.g. TensorFlow Lite.
A potential use case could be improving an image classification model as described in the TF Lite docs.

Release process

Define and document the release process.

Topics include:

  • Define how to manage releases (e.g., branching, tags, …)
  • Document release note handling
  • Document where (and how) build artifacts are published

Upgrade to PyTorch 1.6

Upgrade all torch dependencies to 1.6 (torchvision 0.7), which will hopefully improve PyTorch-related mypy type checks.

Move Python packages under `src/py`

Python packages are currently located under src, along with ProtoBuf definitions in src/proto.

To have a clean structure for upcoming SDKs in other languages (Java, Swift, C++, ...), the Python packages should be moved under src/py. The resulting structure would enable other languages to be placed under src in a clean way:

src/
  cc/
  proto/
  py/
  swift/
  ...

C++ Client SDK

Description

C++ is one of the most defining programming languages of our time. It is used in many critical applications and is the go-to language for performance-sensitive domains such as robotics or automotive. Federated learning can enable entirely new platforms in these domains, and we thus want to support C++ by providing a Flower C++ SDK. Flower communicates between the server and the client using gRPC. At the moment, every C++ user needs to build their own integration with the gRPC message protocol to run Flower.

Prep Work / PoC

The C++ SDK needs to serialize model parameters (and other values that get communicated between client and server) in a way that can be de-serialized by Python on the server side. ProtoBuf makes this easy for most values, but it might be helpful to build a small proof of concept for serializing/deserializing the model parameters. Flower represents model parameters as a list of byte arrays (think: the parameters of each layer in a neural network can be serialized to a single byte array). A PoC would then serialize these parameters in C++ and deserialize them in Python (and vice versa):

  1. Define a simple machine learning model using C++/libtorch
  2. Extract the model parameters from the model in C++
  3. Serialize the extracted model parameters into a byte array / a list of byte arrays
  4. Save the list of byte arrays to disk
  5. Read the list of byte arrays from Python
  6. Deserialize the model parameters from the list of byte arrays
  7. Load a PyTorch model and update it using the deserialized model parameters
  8. Implement the flow in reverse (serialize in Python, deserialize in C++)
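As a rough illustration of steps 3–6 from the Python side, the length-prefixed framing below is one language-neutral way to pack a list of byte arrays into a single blob that a C++ counterpart could mirror. This is a sketch only; Flower's actual wire format is defined by its ProtoBuf messages.

```python
import struct
from typing import List


def serialize(params: List[bytes]) -> bytes:
    """Pack a list of byte arrays into one length-prefixed blob."""
    out = struct.pack("!I", len(params))  # number of arrays, big-endian uint32
    for p in params:
        out += struct.pack("!I", len(p)) + p  # per-array length prefix + payload
    return out


def deserialize(blob: bytes) -> List[bytes]:
    """Recover the list of byte arrays from a length-prefixed blob."""
    (count,) = struct.unpack_from("!I", blob, 0)
    offset, params = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from("!I", blob, offset)
        offset += 4
        params.append(blob[offset:offset + length])
        offset += length
    return params
```

Writing `serialize(...)` output to disk in Python and reading it back with equivalent C++ code (or vice versa) would validate the round trip.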

Expected Outcome

The full SDK implementation requires the following tasks:

  • Set up C++ tooling in the Flower codebase
    • Run tests on CI
  • Set up ProtoBuf/gRPC compilation for C++
  • Define the user-facing API of the C++ SDK
    • Define abstract class / interface which Flower users can override
    • Define a function to start the client
  • Implement the API
    • Establish a connection to the server
    • Implement handling of protocol messages
  • Test the new SDK
  • Document everything
  • Build a C++ library and publish it
  • Build a code example using the C++ SDK and libtorch (PyTorch C++ API)
  • Write a blog post about the new feature

Required Skills:

  • Strong experience with C++
  • Interest in gRPC
  • Basic understanding of machine learning
  • Optional: Basic libtorch (PyTorch C++ API https://pytorch.org/cppdocs/) understanding

Move datasets code for baselines into different repository/package

Currently, the federated datasets for the baselines are generated on demand in flwr_experimental/baseline/dataset and cached afterwards. The cache is used if present.
We would like to move the datasets code into a different repository (named, e.g., federated-datasets) and host the generated datasets for everyone to load via a PyPI package which downloads and caches them. Especially for bigger datasets, each client in a federated baseline experiment would no longer have to download the whole (original) dataset, which would be a significant improvement.
It would be great if someone wants to tackle this task.

Replace individual mypy flags with `strict = True`

The mypy configuration in mypy.ini uses several flags to increase strictness.

To simplify this setup and automatically opt-in to upcoming strictness flags in future mypy versions, it would be preferable to replace those individual flags with the single strict = True setting.
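The resulting mypy.ini would then reduce to a single setting (a sketch, assuming no per-module overrides are needed):

```ini
; mypy.ini — the single flag below subsumes the individual strictness flags
[mypy]
strict = True
```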

Document Examples

The Flower docs contain only a general overview of the available examples. It would be beneficial (especially for first time users) to have a more detailed documentation for individual examples.

Publish versioned documentation

Currently, flower.dev only shows the latest documentation. Documentation of older versions of Flower should remain available (users need to be able to switch between the documentation for different versions).

Document strategy interface

Flower enables developers and researchers to implement custom federated learning algorithms using the Strategy interface. This interface should be documented with explanations on how to use it and a working example of a custom strategy implementation.

Is it possible to have Federated Learning on Cloud-Edge?

Hi everyone,

Currently, I am working on a school project about federated learning and came across your framework during exploratory analysis. My project should use federated learning in this manner: I have an aggregation server (let's say in a cloud). I want this server to provide a model to my two Raspberry Pis. These two Pis would then train the model on local data for x epochs and send the trained models/gradients back to the global server. On this server, the results would be federated-averaged and a new model would be sent to the Pis. Is such a workflow possible with your framework? If so, could you give me a hint?

Thank you,
Best regards

Python 3.9 compatibility

The Flower codebase itself is Python 3.9 ready. Some dependencies are however not yet Python 3.9 compatible, so we need to wait until those dependencies are ready.

Change strategy

Is it possible to provide some documentation/tutorials on how to change/customize the training strategy? Thanks!

Document Federated Learning 101

The Flower docs would benefit from having a general "Federated Learning 101" which presents the basic ideas, concepts, and terminologies around federated learning.

Flutter SDK

To support more heterogeneous environments/setups, it would be great to have a Flutter client SDK. We are happy to support anyone who wants to tackle this issue and, ideally, provide an example with e.g. TensorFlow Lite.
A potential use case could be improving an image classification model as described in the TF Lite docs.

Remove need for .flower_ops file

Currently, when running the baselines, we have to create a .flower_ops file to set various configs. We would like to remove the need for this config file, as most of the settings in it could be automated.

The file contains

Path configuration

[paths]
wheel_dir = ~/some/path/flower/dist/
wheel_filename = flower-0.3.0-py3-none-any.whl

Both of these could be obtained automatically by looking them up in the dist directory.
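A minimal sketch of that dist-directory lookup (a hypothetical helper, not existing Flower code) could be:

```python
from pathlib import Path
from typing import Optional


def find_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently built wheel in dist_dir, or None if absent."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None
```

With this, both wheel_dir and wheel_filename would follow from the returned path, making the [paths] section unnecessary.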

AWS configs

[aws]
image_id = ami-123456789
key_name = flower
subnet_id = subnet-123456
security_group_ids = sg-123456789
logserver_s3_bucket = my_s3_bucket_name

  • Using the default VPC, a subnet which has enough capacity to start the required number of instances could be selected automatically.
  • key_name could default to flower, OR the user could be asked whether a new key named flower should be created/downloaded/made available locally.
  • We could default to a security group named flower and, again, if it does not exist, offer to create it.
  • logserver_s3_bucket: not sure how to replace this.

[ssh]
private_key = ~/.ssh/my_private_key

Here we could default to a key named flower.

Finally, all of these are ideas/suggestions. We would be happy if someone wants to tackle this.

FL and privacy preserving techniques

Federated averaging is mentioned in the paper, but not much is said about secure aggregation. Which privacy-preserving FL techniques are implemented?

Can the framework be deployed across institutions with different infrastructure? For example, healthcare institutions where the edge devices sit inside the institution's firewall.

Environment setup script

Create a convenience script which configures a fresh Ubuntu system so that a default coding environment is created. The purpose is to ease the onboarding process for new contributors.

  • Install / configure PyEnv
  • Configure VSCode
  • Install VSCode extensions
    • gRPC/protobuf
    • Python

Alternative ideas to evaluate:

  • Visual Studio Code Remote - Containers.

Implement Secure Aggregation

Secure aggregation [Bonawitz et al., 2017] is an important element of many FL workloads. We need to add a way to use secure aggregation within different strategies, ideally in a modular way (i.e., there is one secure aggregation implementation which can easily be plugged into different strategy implementations).

Paper: https://eprint.iacr.org/2017/281.pdf
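As a toy illustration of the pairwise-masking idea from the paper (heavily simplified: no key agreement, no dropout handling, and pre-shared pairwise seeds are simply assumed), the sketch below shows how per-pair masks cancel when all masked updates are summed:

```python
import random
from typing import Dict, List

MOD = 2 ** 32  # all arithmetic is done modulo a fixed modulus


def masked_update(client_id: int, update: List[int],
                  peer_seeds: Dict[int, int]) -> List[int]:
    """Mask an integer update with one pseudorandom mask per peer.

    The client with the smaller ID adds the shared mask, the one with the
    larger ID subtracts it, so the masks cancel in the global sum.
    """
    masked = list(update)
    for peer, seed in peer_seeds.items():
        rng = random.Random(seed)  # both peers derive the same mask
        mask = [rng.randrange(MOD) for _ in update]
        sign = 1 if client_id < peer else -1
        masked = [(m + sign * x) % MOD for m, x in zip(masked, mask)]
    return masked
```

Individually, each masked vector looks random, yet the coordinate-wise sum of all masked updates (mod MOD) equals the sum of the plain updates.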

Persist Model During Training

Model persistence is currently only possible via custom strategy implementations. There should be a better way to save model checkpoints periodically, ideally in a modular way that is reusable across different strategy implementations.
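One possible shape for such a modular mechanism, sketched here as a plain Python wrapper around an arbitrary aggregation function (hypothetical; this is not Flower's actual API, where one would typically override a strategy's aggregation method instead):

```python
import pickle
from pathlib import Path
from typing import Callable, List


def with_checkpointing(aggregate: Callable[[int, list], List[float]],
                       save_dir: str,
                       every_n_rounds: int = 1):
    """Wrap an aggregation function so global weights are saved periodically."""
    Path(save_dir).mkdir(parents=True, exist_ok=True)

    def wrapped(rnd: int, results: list) -> List[float]:
        weights = aggregate(rnd, results)
        if rnd % every_n_rounds == 0:  # checkpoint only on selected rounds
            with open(Path(save_dir) / f"round-{rnd}.pkl", "wb") as f:
                pickle.dump(weights, f)
        return weights

    return wrapped
```

Because the wrapper only touches the aggregation boundary, the same mechanism would plug into any strategy implementation.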

Asynchronous FL using Flower

I am interested in doing async FL using Flower. However, no async strategy is provided by Flower.
The Flower paper indicates that to use a different strategy, we just need to implement a new Strategy. However, I think server.py is intrinsically synchronous and not suitable for asynchronous strategies. In other words, to do asynchronous training, we need to change server.py.

Consider the code block in server.py:fit (I just show relevant lines):

def fit(self, num_rounds: int) -> History:
    # ...
    for current_round in range(1, num_rounds + 1):
        # Train model and replace previous global model
        weights_prime = self.fit_round(rnd=current_round)
        if weights_prime is not None:
            self.weights = weights_prime
        # ...

def fit_round(self, rnd: int) -> Optional[Weights]:
    # ...
    results, failures = fit_clients(client_instructions)
    return self.strategy.on_aggregate_fit(rnd, results, failures)

def fit_clients(client_instructions):
    """Refine weights concurrently on all selected clients."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(fit_client, c, ins) for c, ins in client_instructions
        ]
        concurrent.futures.wait(futures)
    results: List[Tuple[ClientProxy, FitRes]] = []
    failures: List[BaseException] = []
    for future in futures:
        failure = future.exception()
        # ...
    return results, failures

fit_round doesn't update the model until fit_clients has collected all results/failures. However, in async FL, the model should be updated whenever the server receives a computation result from a client (reference: Asynchronous Federated Optimization by Xie et al.). So we need to change server.py to do async FL.

Am I missing something? Is there a way to do async FL without changing server.py, only by implementing a new Strategy? Any help will be appreciated.
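For reference, the per-client mixing rule from the cited paper can be sketched as below (alpha is the mixing hyperparameter; an async server loop would apply this as each result arrives, instead of waiting on all futures as fit_clients does):

```python
from typing import List


def async_update(global_w: List[float], client_w: List[float],
                 alpha: float = 0.5) -> List[float]:
    """Mix a single client's result into the global model on arrival.

    This is the weighted-average rule from 'Asynchronous Federated
    Optimization' (Xie et al.): w <- (1 - alpha) * w + alpha * w_client.
    """
    return [(1 - alpha) * g + alpha * c for g, c in zip(global_w, client_w)]
```

Supporting this in Flower would indeed require changing the server loop, not just the Strategy, since the update happens per result rather than per round.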

Not able to run example in flwr_example/quickstart_pytorch

I cannot run

$ ./src/py/flwr_example/quickstart_pytorch/run-server.sh
/usr/bin/python3: Error while finding module specification for 'flwr_example.quickstart_pytorch.server' (ModuleNotFoundError: No module named 'flwr_example.quickstart_pytorch')

But I can run the other example

$ ./src/py/flwr_example/pytorch/run-server.sh

Saving Global Model Parameters

Hi,

I am currently trying out the Flower framework with PyTorch.
I am very surprised how well it works.
One thing is still unclear to me: after the federated learning process is over, I would like to save the new global model parameters on the clients, after the server distributes them to all clients.
How is that possible, and where would I implement it, if not already done?

And why are min_fit_clients and min_eval_clients in fedavg.py set to 2 and not 1? Is there a special reason?

Greetings

Patrick
