adap / flower

Flower: A Friendly Federated Learning Framework

Home Page: https://flower.ai

License: Apache License 2.0

Python 69.18% Shell 1.20% Dockerfile 0.10% C++ 1.18% Swift 2.83% Makefile 0.03% Batchfile 0.04% CSS 0.31% HTML 0.36% Kotlin 0.58% Jupyter Notebook 23.79% CMake 0.09% Smarty 0.30%
flower federated-learning federated-learning-framework federated-analytics fleet-learning fleet-intelligence deep-learning machine-learning pytorch scikit-learn

flower's People

Contributors

adam-narozniak, akhilmathurs, cantuerk, charlesbvll, chongshenng, cozek, danieljanes, danielnugraha, dannymcy, dependabot[bot], edogab33, gubertoli, jafermarq, makgulati, mariaboerner1987, moep90, mohammadnaseri, nfnt, panh99, pedropgusmao, robert-steiner, sichanghe, sisco0, stevelaskaridis, tabdar-khan, tanertopal, vasundharaagarwal, vingt100, weblate, yan-gao-gy


flower's Issues

Document executing baselines

The Flower docs contain only a general overview of the available baselines. It would be great to document individual baselines more thoroughly and have detailed descriptions on how to execute/reproduce them.

Improve website menu display

Should we consider using a different theme structure for the menu in the website? Something like

Quickstart
  |-- Keras (TensorFlow)
  |-- PyTorch
Installation
API
Examples
Cloud Usage 

And maybe a different Sphinx theme? I found this: https://sphinx-themes.org/

Document available strategy implementations

Flower provides a few popular FL algorithms out-of-the-box. Those implementations along with their configuration parameters should be documented to help users understand what's already available.

Standalone examples w/ own pyproject.toml

Currently, examples are located in src/flwr_example/.... The extras required by those examples are mentioned in the general pyproject.toml. This structure makes it difficult to "just copy and paste" examples.

A better approach would be to move examples into a top-level examples directory and treat every example as a standalone project (i.e., give each example its own pyproject.toml).
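As a sketch of what a standalone example could look like, a hypothetical examples/quickstart-tensorflow/pyproject.toml might contain something like the following (the project name and all version pins are placeholders, not the actual repository files):

```toml
[tool.poetry]
name = "quickstart-tensorflow"          # hypothetical example name
version = "0.1.0"
description = "Standalone Flower example with its own dependencies"
authors = ["The Flower Authors"]

[tool.poetry.dependencies]
python = "^3.7"
flwr = "^0.1.0"        # placeholder version
tensorflow = "^2.3"    # placeholder version

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

With this, copying the example directory alone would be enough to install and run it.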

Document Flower architecture

Document the general architecture of Flower:

  • Server side: Communication, strategies
  • Client side: Communication, callback methods, protocol-level integration
  • Extended: Baselines, ops, etc.

execution of client example crash after training is terminated

Hello !

I'm trying to use the examples you provide (quickstart and tensorflow). I can train models, but the clients fail to stop without crashing.

I just run run_server.sh and run_clients.sh in separate terminals and see the clients download data and train their models. After training, the server evaluates the model and stops itself properly.
At that moment, the clients crash, raising an exception with this message:

Traceback (most recent call last):
  File "client.py", line 114, in <module>
    main()
  File "client.py", line 110, in main
    fl.client.start_keras_client(args.server_address, client)
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/app.py", line 47, in start_keras_client
    start_client(server_address, flower_client)
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/app.py", line 35, in start_client
    server_message = receive()
  File "/usr/local/lib/python3.7/dist-packages/flwr/client/grpc_client/connection.py", line 59, in <lambda>
    receive: Callable[[], ServerMessage] = lambda: next(server_message_iterator)
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Socket closed"
        debug_error_string = "{"created":"@1601383903.613729459","description":"Error received from peer ipv6:[::]:8080","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}"

Rename `rnd` to `fl_round`

rnd might be mistaken for "random", so we should perhaps rename it to fl_round (other suggestions are welcome).

Note: this would be an incompatible change.

JavaScript/TypeScript SDK

To support more heterogeneous environments/setups, it would be great to have a JavaScript/TypeScript client SDK. The task shouldn't be too hard, as gRPC-Web has improved substantially. We are happy to support anyone who wants to tackle this issue and, ideally, provide an example with e.g. TensorFlow Lite.
A potential use case could be improving an image classification model as described in the TF Lite docs.

Release process

Define and document the release process.

Topics include:

  • Define how to manage releases (e.g., branching, tags, …)
  • Document release note handling
  • Document where (and how) build artifacts are published

Upgrade to PyTorch 1.6

Upgrade all torch dependencies to 1.6 (torchvision 0.7), which will hopefully improve PyTorch-related mypy type checks.

Move Python packages under `src/py`

Python packages are currently located under src, along with ProtoBuf definitions in src/proto.

To have a clean structure for upcoming SDKs in other languages (Java, Swift, C++, ...), the Python packages should be moved under src/py. The resulting structure would enable other languages to be placed under src in a clean way:

src/
  cc/
  proto/
  py/
  swift/
  ...

C++ Client SDK

Description

C++ is one of the most defining programming languages of our time. It is used in many critical applications and is the go-to language for performance-sensitive domains such as robotics or automotive. Federated learning can enable entirely new platforms in these domains, and we thus want to support C++ by providing a Flower C++ SDK. Flower communicates between the server and the client using gRPC. At the moment, every C++ user needs to build their own integration with the gRPC message protocol to run Flower.

Prep Work / PoC

The C++ SDK needs to serialize model parameters (and other values that get communicated between client and server) in a way that can be de-serialized by Python on the server side. ProtoBuf makes this easy for most values, but it might be helpful to build a small proof of concept for serializing/deserializing the model parameters. Flower represents model parameters as a list of byte arrays (think: the parameters of each layer in a neural network can be serialized to a single byte array). A PoC would then serialize these parameters in C++ and deserialize them in Python (and vice versa):

  1. Define a simple machine learning model using C++/libtorch
  2. Extract the model parameters from the model in C++
  3. Serialize the extracted model parameters into a byte array / a list of byte arrays
  4. Save the list of byte arrays to disk
  5. Read the list of byte arrays from Python
  6. Deserialize the model parameters from the list of byte arrays
  7. Load a PyTorch model and update it using the deserialized model parameters
  8. Implement the flow in reverse (serialize in Python, deserialize in C++)
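As a rough illustration of steps 3–6 from the Python side, the length-prefixed framing below is one language-neutral way to pack a list of byte arrays into a single blob that a C++ counterpart could mirror. This is a sketch only; Flower's actual wire format is defined by its ProtoBuf messages.

```python
import struct
from typing import List


def serialize(params: List[bytes]) -> bytes:
    """Pack a list of byte arrays into one length-prefixed blob."""
    out = struct.pack("!I", len(params))  # number of arrays, big-endian uint32
    for p in params:
        out += struct.pack("!I", len(p)) + p  # per-array length prefix + payload
    return out


def deserialize(blob: bytes) -> List[bytes]:
    """Recover the list of byte arrays from a length-prefixed blob."""
    (count,) = struct.unpack_from("!I", blob, 0)
    offset, params = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from("!I", blob, offset)
        offset += 4
        params.append(blob[offset:offset + length])
        offset += length
    return params
```

Writing `serialize(...)` output to disk in Python and reading it back with equivalent C++ code (or vice versa) would validate the round trip.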

Expected Outcome

The full SDK implementation requires the following tasks:

  • Set up C++ tooling in the Flower codebase
    • Run tests on CI
  • Set up ProtoBuf/gRPC compilation for C++
  • Define the user-facing API of the C++ SDK
    • Define abstract class / interface which Flower users can override
    • Define a function to start the client
  • Implement the API
    • Establish a connection to the server
    • Implement handling of protocol messages
  • Test the new SDK
  • Document everything
  • Build a C++ library and publish it
  • Build a code example using the C++ SDK and libtorch (PyTorch C++ API)
  • Write a blog post about the new feature

Required Skills:

  • Strong experience with C++
  • Interest in gRPC
  • Basic understanding of machine learning
  • Optional: Basic libtorch (PyTorch C++ API https://pytorch.org/cppdocs/) understanding

Move datasets code for baselines into different repository/package

Currently, the federated datasets for the baselines are generated on demand in flwr_experimental/baseline/dataset and cached afterwards. The cache is used if present.
We would like to move the datasets code into a different repository (named, e.g., federated-datasets) and host the generated datasets for everyone to load via a PyPI package which downloads and caches them. Especially for bigger datasets, each client in a federated baseline experiment would no longer have to download the whole (original) dataset, which would be a significant improvement.
It would be great if someone wants to tackle this task.

Replace individual mypy flags with `strict = True`

The mypy configuration in mypy.ini uses several flags to increase strictness.

To simplify this setup and automatically opt-in to upcoming strictness flags in future mypy versions, it would be preferable to replace those individual flags with the single strict = True setting.
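The resulting mypy.ini would then reduce to a single setting (a sketch, assuming no per-module overrides are needed):

```ini
; mypy.ini — the single flag below subsumes the individual strictness flags
[mypy]
strict = True
```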

Document Examples

The Flower docs contain only a general overview of the available examples. It would be beneficial (especially for first time users) to have a more detailed documentation for individual examples.

Publish versioned documentation

Currently, flower.dev only shows the latest documentation. Documentation of older versions of Flower should remain available (users need to be able to switch between the documentation for different versions).

Document strategy interface

Flower enables developers and researchers to implement custom federated learning algorithms using the Strategy interface. This interface should be documented with explanations on how to use it and a working example of a custom strategy implementation.

Is it possible to have Federated Learning on Cloud-Edge?

Hi everyone,

Currently, I am working on a school project about federated learning and came across your framework during exploratory analysis. My project should use federated learning in this manner: I have an aggregation server (let's say in a cloud). I want this server to provide a model to my two Raspberry Pis. These two Pis would then train the model on local data for x epochs and send the trained models/gradients back to the global server. On this server, the results would be federated-averaged and a new model would be sent to the Pis. Is such a workflow possible with your framework? If so, could you give me a hint?

Thank you,
Best regards

Python 3.9 compatibility

The Flower codebase itself is Python 3.9 ready. Some dependencies are however not yet Python 3.9 compatible, so we need to wait until those dependencies are ready.

Change strategy

Is it possible to provide some documentation/tutorials on how to change/customize the training strategy? Thanks!

Document Federated Learning 101

The Flower docs would benefit from having a general "Federated Learning 101" which presents the basic ideas, concepts, and terminologies around federated learning.

Flutter SDK

To support more heterogeneous environments/setups, it would be great to have a Flutter client SDK. We are happy to support anyone who wants to tackle this issue and, ideally, provide an example with e.g. TensorFlow Lite.
A potential use case could be improving an image classification model as described in the TF Lite docs.

Remove need for .flower_ops file

Currently, when running the baselines, we have to create a .flower_ops file to set various configs. We would like to remove the need for this config file, as most of the settings in it could be automated.

The file contains

Path configuration

[paths]
wheel_dir = ~/some/path/flower/dist/
wheel_filename = flower-0.3.0-py3-none-any.whl

Both of these could be obtained automatically by looking them up in the dist directory.
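A minimal sketch of that dist-directory lookup (a hypothetical helper, not existing Flower code) could be:

```python
from pathlib import Path
from typing import Optional


def find_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently built wheel in dist_dir, or None if absent."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None
```

With this, both wheel_dir and wheel_filename would follow from the returned path, making the [paths] section unnecessary.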

AWS configs

[aws]
image_id = ami-123456789
key_name = flower
subnet_id = subnet-123456
security_group_ids = sg-123456789
logserver_s3_bucket = my_s3_bucket_name

  • Using the default VPC, a subnet which has enough capacity to start the required number of instances could be selected automatically.
  • key_name could default to flower, OR the user could be asked whether a new key named flower should be created/downloaded/made available locally.
  • We could default to a security group named flower and, again, if it does not exist, offer to create it.
  • logserver_s3_bucket: not sure how to replace this.

[ssh]
private_key = ~/.ssh/my_private_key

Here we could default to a key named flower.

Finally, all of these are ideas/suggestions. We would be happy if someone wants to tackle this.

FL and privacy preserving techniques

Federated averaging is mentioned in the paper, but not much is said about secure aggregation. Which privacy-preserving FL techniques are implemented?

Can the framework be deployed across institutions with different infrastructure? For example, healthcare institutions where the edge devices sit inside the institution's firewall.

Environment setup script

Create a convenience script which configures a fresh Ubuntu system so that a default coding environment is created. The purpose is to ease the onboarding process for new contributors.

  • Install / configure PyEnv
  • Configure VSCode
  • Install VSCode extensions
    • gRPC/protobuf
    • Python

Alternative ideas to evaluate:

  • Visual Studio Code Remote - Containers.

Implement Secure Aggregation

Secure aggregation [Bonawitz et al., 2017] is an important element of many FL workloads. We need to add a way to use secure aggregation within different strategies, ideally in a modular way (i.e., there is one secure aggregation implementation which can easily be plugged into different strategy implementations).

Paper: https://eprint.iacr.org/2017/281.pdf
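As a toy illustration of the pairwise-masking idea from the paper (heavily simplified: no key agreement, no dropout handling, and pre-shared pairwise seeds are simply assumed), the sketch below shows how per-pair masks cancel when all masked updates are summed:

```python
import random
from typing import Dict, List

MOD = 2 ** 32  # all arithmetic is done modulo a fixed modulus


def masked_update(client_id: int, update: List[int],
                  peer_seeds: Dict[int, int]) -> List[int]:
    """Mask an integer update with one pseudorandom mask per peer.

    The client with the smaller ID adds the shared mask, the one with the
    larger ID subtracts it, so the masks cancel in the global sum.
    """
    masked = list(update)
    for peer, seed in peer_seeds.items():
        rng = random.Random(seed)  # both peers derive the same mask
        mask = [rng.randrange(MOD) for _ in update]
        sign = 1 if client_id < peer else -1
        masked = [(m + sign * x) % MOD for m, x in zip(masked, mask)]
    return masked
```

Individually, each masked vector looks random, yet the coordinate-wise sum of all masked updates (mod MOD) equals the sum of the plain updates.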

Persist Model During Training

Model persistence is currently only possible via custom strategy implementations. There should be a better way to save model checkpoints periodically, ideally in a modular way that is reusable across different strategy implementations.
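One possible shape for such a modular mechanism, sketched here as a plain Python wrapper around an arbitrary aggregation function (hypothetical; this is not Flower's actual API, where one would typically override a strategy's aggregation method instead):

```python
import pickle
from pathlib import Path
from typing import Callable, List


def with_checkpointing(aggregate: Callable[[int, list], List[float]],
                       save_dir: str,
                       every_n_rounds: int = 1):
    """Wrap an aggregation function so global weights are saved periodically."""
    Path(save_dir).mkdir(parents=True, exist_ok=True)

    def wrapped(rnd: int, results: list) -> List[float]:
        weights = aggregate(rnd, results)
        if rnd % every_n_rounds == 0:  # checkpoint only on selected rounds
            with open(Path(save_dir) / f"round-{rnd}.pkl", "wb") as f:
                pickle.dump(weights, f)
        return weights

    return wrapped
```

Because the wrapper only touches the aggregation boundary, the same mechanism would plug into any strategy implementation.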

Asynchronous FL using Flower

I am interested in doing async FL using Flower. However, no async strategy is provided by Flower.
The Flower paper indicates that to use a different strategy, we just need to implement a new Strategy. However, I think server.py is intrinsically synchronous and not suitable for asynchronous strategies. In other words, to do asynchronous training, we need to change server.py.

Consider the code block in server.py:fit (I just show relevant lines):

def fit(self, num_rounds: int) -> History:
    # ...
    for current_round in range(1, num_rounds + 1):
        # Train model and replace previous global model
        weights_prime = self.fit_round(rnd=current_round)
        if weights_prime is not None:
            self.weights = weights_prime
        # ...

def fit_round(self, rnd: int) -> Optional[Weights]:
    # ...
    results, failures = fit_clients(client_instructions)
    return self.strategy.on_aggregate_fit(rnd, results, failures)

def fit_clients(client_instructions):
    """Refine weights concurrently on all selected clients."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(fit_client, c, ins) for c, ins in client_instructions
        ]
        concurrent.futures.wait(futures)
    results: List[Tuple[ClientProxy, FitRes]] = []
    failures: List[BaseException] = []
    for future in futures:
        failure = future.exception()
        # ...
    return results, failures

fit_round doesn't update the model until fit_clients has collected all results/failures. However, in async FL, the model should be updated whenever the server receives a computation result from a client (reference: Asynchronous Federated Optimization by Xie et al.). So we need to change server.py to do async FL.

Am I missing something? Is there a way to do async FL without changing server.py, only by implementing a new Strategy? Any help will be appreciated.
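For reference, the per-client mixing rule from the cited paper can be sketched as below (alpha is the mixing hyperparameter; an async server loop would apply this as each result arrives, instead of waiting on all futures as fit_clients does):

```python
from typing import List


def async_update(global_w: List[float], client_w: List[float],
                 alpha: float = 0.5) -> List[float]:
    """Mix a single client's result into the global model on arrival.

    This is the weighted-average rule from 'Asynchronous Federated
    Optimization' (Xie et al.): w <- (1 - alpha) * w + alpha * w_client.
    """
    return [(1 - alpha) * g + alpha * c for g, c in zip(global_w, client_w)]
```

Supporting this in Flower would indeed require changing the server loop, not just the Strategy, since the update happens per result rather than per round.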

Not able to run example in flwr_example/quickstart_pytorch

I cannot run

$ ./src/py/flwr_example/quickstart_pytorch/run-server.sh
/usr/bin/python3: Error while finding module specification for 'flwr_example.quickstart_pytorch.server' (ModuleNotFoundError: No module named 'flwr_example.quickstart_pytorch')

But I can run the other example

$ ./src/py/flwr_example/pytorch/run-server.sh

Saving Global Model Parameters

Hi,

I am currently trying out the Flower framework with PyTorch.
I am very surprised how well it works.
One thing is still unclear to me: after the federated learning process is over, I would like to save the new global model parameters on the clients, after the server distributes them to all clients.
How is that possible, and where would I implement it, if not already done?

And why are min_fit_clients and min_eval_clients in fedavg.py set to 2 and not 1? Is there a special reason?

Greetings

Patrick
