
FLamby

⚠️ Please check out FLamby's recent 0.1.0 release, which introduces changes to the Fed-Camelyon16 benchmark and fixes some reproducibility issues across datasets ⚠️

Overview

➡️The API doc is available here⬅️

FLamby is a benchmark for cross-silo Federated Learning with natural partitioning, currently focused on healthcare applications. It spans multiple data modalities and should allow easy interfacing with most Federated Learning frameworks (including Fed-BioMed, FedML, Substra...). It contains implementations of different standard federated learning strategies. A companion paper describing it was published at NeurIPS 2022 in the Datasets & Benchmarks track.

The FLamby package contains:

  • Data loaders that automatically handle data preprocessing and partitions of distributed datasets.
  • Evaluation functions to evaluate trained models on the different tracks as defined in the companion paper.
  • Benchmark code using the utilities below to obtain the performances of baselines using different strategies.

It does not contain datasets, which have to be downloaded separately (see the section below).

FLamby was tested on Ubuntu and macOS environments. If you face any problem installing or executing FLamby code, please help us improve it by filing a detailed issue on the FLamby GitHub page.

Dataset suite

FLamby is a dataset suite instead of a repository. We provide code to easily access existing datasets stored in other repositories. In particular, we do not distribute datasets in this repository, and we do not own copyrights on any of the datasets.

The use of any of the datasets included in FLamby requires accepting its corresponding license on the original website. We refer to each dataset's README for more details.

For any problem or question regarding license-related matters, please open a GitHub issue on this repository.

Installation

We recommend using Anaconda and pip. You can install Anaconda by downloading and executing the appropriate installer from the Anaconda website; pip often comes bundled with Python, otherwise check the corresponding instructions. We support all Python versions starting from 3.8.

You may need make to simplify the installation. The following command will install all packages used by all datasets within FLamby. If you already know you will only need a fraction of the datasets inside the suite, you can do a partial installation and update it along the way using the options described below. Create and launch the environment using:

git clone https://github.com/owkin/FLamby.git
cd FLamby
make install
conda activate flamby

To limit the number of installed packages, you can use the enable argument to specify which dataset(s) you want to build the required dependencies for, and whether you need to execute the tests (tests) or build the documentation (docs):

git clone https://github.com/owkin/FLamby.git
cd FLamby
make enable=option_name install
conda activate flamby

where option_name can be one of the following: cam16, heart, isic2019, ixi, kits19, lidc, tcga, docs, tests

If you want to use more than one option, separate them with a comma (WARNING: there should be no space after the comma), e.g.:

git clone https://github.com/owkin/FLamby.git
cd FLamby
make enable=cam16,kits19,tests install
conda activate flamby

Be careful: each command tries to create a conda environment named flamby, therefore make install will fail if executed several times because the flamby environment will already exist. Use make update as explained in the next section if you decide to use more datasets than originally intended.

Update environment

Use the following command if new dependencies have been added, and you want to update the environment for additional datasets:

make update

or you can use the enable option:

make enable=cam16 update

In case you don't have the make command (e.g. Windows users)

You can install the environment by running:

git clone https://github.com/owkin/FLamby.git
cd FLamby
conda env create -f environment.yml
conda activate flamby
pip install -e .[all_extra]

or if you wish to install the environment for only one or more datasets, tests or documentation:

git clone https://github.com/owkin/FLamby.git
cd FLamby
conda env create -f environment.yml
conda activate flamby
pip install -e .[option_name]

where option_name can be one of the following: cam16, heart, isic2019, ixi, kits19, lidc, tcga, docs, tests. If you want to use more than one option, separate them with a comma (,) with no space after it, e.g.:

pip install -e .[cam16,ixi]

Accepting data licensing

Then proceed to read and accept the different licenses and download the data for all the datasets you are interested in by following the instructions provided in each dataset folder.

Quickstart

Follow the quickstart section to learn how to get started with FLamby.
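
Below is a minimal sketch of what a first experiment can look like, assuming the Fed-Heart-Disease dataset has already been downloaded and its license accepted. The imports mirror those used elsewhere on this page; the batch size and number of epochs are arbitrary placeholders, not recommended values.

import torch
from torch.utils.data import DataLoader

from flamby.datasets.fed_heart_disease import (
    LR,
    NUM_CLIENTS,
    Baseline,
    BaselineLoss,
    FedHeartDisease,
    metric,
)
from flamby.utils import evaluate_model_on_tests

# One train/test dataloader per client (batch size chosen arbitrarily here)
train_dls = [
    DataLoader(FedHeartDisease(center=i, train=True, pooled=False), batch_size=4, shuffle=True)
    for i in range(NUM_CLIENTS)
]
test_dls = [
    DataLoader(FedHeartDisease(center=i, train=False, pooled=False), batch_size=4, shuffle=False)
    for i in range(NUM_CLIENTS)
]

# Train a purely local baseline on client 0, then evaluate it on every test center
model, loss_fn = Baseline(), BaselineLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
for _ in range(5):  # a handful of epochs, just for illustration
    for X, y in train_dls[0]:
        optimizer.zero_grad()
        loss_fn(model(X), y).backward()
        optimizer.step()
print(evaluate_model_on_tests(model, test_dls, metric))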

Reproduce benchmark and figures from the companion article

Benchmarks

The results are stored in flamby/results in corresponding subfolders results_benchmark_fed_dataset for each dataset. These results can be plotted using:

python plot_results.py

which produces the plot at the end of the main article.

In order to re-run each of the benchmarks on your machine, first download the dataset you are interested in, then run the following commands, replacing config_dataset.json with one of the listed config files (config_camelyon16.json, config_heart_disease.json, config_isic2019.json, config_ixi.json, config_kits19.json, config_lidc_idri.json, config_tcga_brca.json):

cd flamby/benchmarks
python fed_benchmark.py --seed 42 -cfp ../config_dataset.json
python fed_benchmark.py --seed 43 -cfp ../config_dataset.json
python fed_benchmark.py --seed 44 -cfp ../config_dataset.json
python fed_benchmark.py --seed 45 -cfp ../config_dataset.json
python fed_benchmark.py --seed 46 -cfp ../config_dataset.json

We have observed that results vary from machine to machine and are sensitive to GPU randomness. However, you should be able to reproduce the results up to some variance, and results on the same machine should be perfectly reproducible. Please open an issue if this is not the case. The script extract_config.py allows going from a results file to a config file. See the quickstart section to change parameters.

Containerized execution

A good step towards float-perfect reproducibility in your future benchmarks is to use Docker. We provide a base Docker image and examples covering dataset download and benchmarking. For Fed-Heart-Disease, cd to FLamby's dockers folder, replace myusername and mypassword with your git credentials (OAuth token) in the command below and run:

docker build -t flamby-heart -f Dockerfile.base --build-arg DATASET_PREFIX="heart" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
docker build -t flamby-heart-benchmark -f Dockerfile.heart .
docker run -it flamby-heart-benchmark

If you are convinced you will use many datasets with Docker, build the base image using the all_extra option for FLamby's install; you will then be able to reuse it for all datasets with a multi-stage build:

docker build -t flamby-all -f Dockerfile.base --build-arg DATASET_PREFIX="all_extra" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
# modify line 1 of Dockerfile.* to FROM flamby-all, replacing * with the name of the dataset you are interested in
# Then run the following commands, replacing * similarly
#docker build -t flamby-* -f Dockerfile.* .
#docker run -it flamby-*-benchmark

Check out Dockerfile.tcga. Similar Dockerfiles can, in theory, easily be built for the other datasets as well by replicating the instructions found in each dataset folder, following the model of Dockerfile.heart. Note that for bigger datasets execution can be prohibitively slow and docker can run out of time/memory.

Using FLamby with FL-frameworks

FLamby can easily be adapted to different frameworks as its PyTorch abstractions are quite flexible. We give an example of interfacing with Fed-BioMed here, another one with FedML here, and a last one with Substra there. All major FL frameworks should be compatible with FLamby modulo some glue code. If you have a working example of using FLamby with another FL framework, please open a PR.

Heterogeneity plots

Most plots from the article can be reproduced using the following commands after having downloaded the corresponding datasets:

cd flamby/datasets/fed_tcga_brca
python plot_kms.py
cd flamby/datasets/fed_lidc_idri
python lidc_heterogeneity_plot.py

In order to exactly reproduce the plot in the article, one first needs to deactivate color constancy normalization when preprocessing the dataset (change cc to False in resize_images.py) while following the download and preprocessing instructions here. Hence one might have to download the dataset a second time if it was already downloaded, and therefore to potentially update the dataset_location.yaml files accordingly.

cd flamby/datasets/fed_isic2019
python heterogeneity_pic.py
cd flamby/datasets/fed_ixi
python ixi_plotting.py
cd flamby/datasets/fed_kits19/dataset_creation_scripts
python kits19_heterogenity_plot.py
cd flamby/datasets/fed_heart_disease
python heterogeneity_plot.py

First, concatenate as many 224x224 image patches extracted from matter-containing regions of the Hospital 0 and Hospital 1 slides (see how image patches are collected in the tiling script) as can fit in RAM. Then compute per-color-channel histograms for both hospitals using 256 equally sized bins with the np.histogram function and density=True. Finally, save the results respectively as histogram_0.npy, histogram_1.npy and bins_0.npy:
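
A minimal sketch of the histogram computation is given below. The patch arrays are placeholders (random data); in practice they would be the real concatenated patches from each hospital.

import numpy as np

# Placeholders: replace these with the real concatenated 224x224 RGB patches
# from Hospital 0 and Hospital 1 (uint8 arrays of shape (n_patches, 224, 224, 3)).
patches_h0 = np.random.randint(0, 256, size=(10, 224, 224, 3), dtype=np.uint8)
patches_h1 = np.random.randint(0, 256, size=(10, 224, 224, 3), dtype=np.uint8)

def per_channel_histograms(patches):
    # 256 equally sized bins per color channel, normalized with density=True
    hists, bins = [], None
    for c in range(3):
        hist, bins = np.histogram(patches[..., c].ravel(), bins=256, range=(0, 255), density=True)
        hists.append(hist)
    return np.stack(hists), bins

hist_0, bins_0 = per_channel_histograms(patches_h0)
hist_1, _ = per_channel_histograms(patches_h1)
np.save("histogram_0.npy", hist_0)
np.save("histogram_1.npy", hist_1)
np.save("bins_0.npy", bins_0)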

cp -t flamby/datasets/fed_camelyon16 histogram_{0,1}.npy bins_0.npy
cd flamby/datasets/fed_camelyon16
python plot_camelyon16_histogram.py

Deploy documentations

We use Sphinx to create FLamby's documentation. In order to build the doc locally, activate the environment and then run:

cd docs
make clean
make html

This will generate html pages in the folder _build/html that can be accessed in your browser:

open _build/html/index.html

Contributing

Extending FLamby

FLamby is a living project and contributions by the FL community are welcome.

If you would like to add another cross-silo dataset with natural splits, please fork the repository and do a Pull-Request following the guidelines described below.

Similarly, you can propose pull requests introducing novel training algorithms or models.

Guidelines

After installing the package in dev mode (pip install -e .[all_extra]), you should also initialize pre-commit by running:

pre-commit install

The pre-commit tool will automatically run black and isort and check flake8 compatibility, formatting the code automatically, making the codebase more homogeneous and helping catch typos and errors.

Looking at and/or commenting on the open issues is a good way to start. Once you have found a way to contribute, the next steps are:

  • Following the installation instructions but using the -e option when pip installing
  • Installing pre-commit
  • Creating a new branch following the convention name_contributor/short_explicit_name-wpi: git checkout -b name_contributor/short_explicit_name-wpi
  • Potentially pushing the branch to origin with: git push origin name_contributor/short_explicit_name-wpi
  • Working on the branch locally by making commits frequently: git commit -m "explicit description of the commit's content"
  • Once the branch is ready, or after considering you have made significant progress, opening a Pull Request using the GitHub interface: select your branch as the source and main as the target, write a detailed description of the PR's content, potentially link related issues, and create the PR in draft mode. Rebase the branch onto main by doing git fetch origin and git rebase origin/main, solving potential conflicts, adding the resolved files with git add myfile.py, then continuing with git rebase --continue until the rebase is complete. Then push the branch to origin with git push origin --force-with-lease.
  • Waiting for reviews, then committing and pushing changes to comply with the reviewers' requests
  • Once the PR is approved, click on the arrow to the right of the merge button, select rebase, and click on it

FAQ

How can I do a clean slate?

To clean the environment, execute the following (from inside the FLamby folder, cd FLamby/):

conda deactivate
make clean

I get an error when installing FLamby

error: [Errno 2] No such file or directory: 'pip'

Try running:

conda deactivate
make clean
pip3 install --upgrade pip

and try running your make installation option again.

I am installing FLamby on a machine equipped with macOS and an Intel processor

In that case, you should use

make install-mac

instead of the standard installation. If you have already installed the flamby environment, just run

conda deactivate
make clean

before running the install-mac installation again. This is to avoid the following error, which will appear when running scripts.

error : OMP: Error #15

I or someone else already downloaded a dataset using another copy of the flamby repository, my copy of flamby cannot find it and I don't want to download it again, what can I do?

There are two options. The safest one is to cd to the flamby directory and run:

python create_dataset_config.py --dataset-name fed_camelyon16 OR fed_heart_disease OR ... --path /path/where/the/dataset/is/located

This will create the required dataset_location.yaml file in your copy of the repository allowing FLamby to find it.

One can also directly pass the data_path argument when instantiating the dataset but this is not recommended.

from flamby.datasets.fed_heart_disease import FedHeartDisease
center0 = FedHeartDisease(center=0, train=True, data_path="/path/where/the/dataset/is/located")

Collaborative work on FLamby: I am working with FLamby on a server with other users, how can we share the datasets efficiently?

The basic answer is to follow the answer just above and recreate the config file in every copy of the repository.

This could become more seamless in the future if we introduce checks for environment variables in FLamby, which would allow setting up a general server-wide config so that all users of the server have access to all needed paths. In the meantime, one can fill in/comment out the following bash script after downloading the datasets and share it with all users of the server:

python create_dataset_config.py --dataset-name fed_camelyon16 --path TOFILL
python create_dataset_config.py --dataset-name fed_heart_disease --path TOFILL
python create_dataset_config.py --dataset-name fed_lidc_idri --path TOFILL
python create_dataset_config.py --dataset-name fed_kits19 --path TOFILL
python create_dataset_config.py --dataset-name fed_isic2019 --path TOFILL
python create_dataset_config.py --dataset-name fed_ixi --path TOFILL

This allows users to set all necessary paths in their local copies.

Can I run clients in different threads with FLamby? How does it run under the hood?

FLamby is a lightweight and simple solution, designed to allow researchers to quickly use cleaned datasets with a standard API. As a consequence, the benchmark code performing the FL simulation is minimalistic. All clients run sequentially in the same python environment, without multithreading. Datasets are assigned to clients as different python objects.
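
Schematically, one round of such a sequential simulation boils down to something like the sketch below. This is a simplified illustration under stated assumptions (FedAvg-style uniform averaging, SGD local optimizer, hypothetical helper name), not FLamby's actual strategy code.

import copy

import torch

def run_sequential_round(global_model, client_loaders, loss_fn, lr, num_updates):
    """One simplified FedAvg-style round: clients train one after the other,
    then their parameters are uniformly averaged on the 'server'."""
    client_states = []
    for loader in client_loaders:  # clients run sequentially, no threads
        local_model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        batches = iter(loader)
        for _ in range(num_updates):
            try:
                X, y = next(batches)
            except StopIteration:
                batches = iter(loader)
                X, y = next(batches)
            opt.zero_grad()
            loss_fn(local_model(X), y).backward()
            opt.step()
        client_states.append(local_model.state_dict())
    # Uniform averaging of floating-point entries; other buffers come from client 0
    avg_state = {}
    for key, ref in client_states[0].items():
        if ref.dtype.is_floating_point:
            avg_state[key] = torch.stack([s[key] for s in client_states]).mean(0)
        else:
            avg_state[key] = ref
    global_model.load_state_dict(avg_state)
    return global_model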

Does FLamby support GPU acceleration?

FLamby supports GPU acceleration thanks to the underlying deep learning backend (pytorch for now).

Team

This repository was made possible thanks to numerous contributors. We list them in the order of the companion article, following the CREDIT framework: Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Régis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Telenczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux.

Acknowledgements

FLamby was made possible thanks to the support of the following institutions:

Owkin, Inria, École Polytechnique, Berkeley, USC, EPFL, UKB (institution logos)

Citing FLamby

@inproceedings{NEURIPS2022_232eee8e,
 author = {Ogier du Terrail, Jean and Ayed, Samy-Safwan and Cyffers, Edwige and Grimberg, Felix and He, Chaoyang and Loeb, Regis and Mangold, Paul and Marchand, Tanguy and Marfoq, Othmane and Mushtaq, Erum and Muzellec, Boris and Philippenko, Constantin and Silva, Santiago and Tele\'{n}czuk, Maria and Albarqouni, Shadi and Avestimehr, Salman and Bellet, Aur\'{e}lien and Dieuleveut, Aymeric and Jaggi, Martin and Karimireddy, Sai Praneeth and Lorenzi, Marco and Neglia, Giovanni and Tommasi, Marc and Andreux, Mathieu},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
 pages = {5315--5334},
 publisher = {Curran Associates, Inc.},
 title = {FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings},
 url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/232eee8ef411a0a316efa298d7be3c2b-Paper-Datasets_and_Benchmarks.pdf},
 volume = {35},
 year = {2022}
}

flamby's Issues

Synthetic data?

Synthetic datasets could extend the range of available datasets and allow the generation of controllable datasets that could be used by people in early research phases. This was done e.g. in LEAF (see the paper, appendix A), but I think their generation process could be improved, possibly using the procedure described here: https://arxiv.org/pdf/1909.06335.pdf.
Additionally, if no synthetic data is present in FLamby, the end user would need to import both FLamby and a competitor like LEAF. I think this could potentially stop some people from using FLamby.

On the other hand, FLamby aims at being a repository of real (and thus non-controllable and heterogeneous) datasets. Proposing synthetic datasets could make it harder for us to differentiate from other initiatives so we would have to be very careful with this.

I am thus a bit puzzled on this point. Do you think synthetic datasets should be included in FLamby?

should have different log path per strategy

For the moment, we save the logs for all strategies under the name fed-avg, because of the way the log writer is initialized in _Model (line 94 in strategies/utils.py). We should maybe pass the log save path during the initialization of _Model, and then have a different name for each strategy.
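
A possible fix, sketched below as a hypothetical excerpt (the real _Model has more arguments and responsibilities), would be to accept the log directory at construction time so each strategy can pass its own name:

from torch.utils.tensorboard import SummaryWriter

class _Model:
    # Hypothetical excerpt: the log directory becomes a constructor argument so
    # that each strategy can log under its own name instead of always "fed-avg".
    def __init__(self, model, log_dir="./runs/fed_avg"):
        self.model = model
        self.writer = SummaryWriter(log_dir=log_dir)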

Error when downloading LIDC-IDRI: unable to convert some DICOM files

Issue description

9 (out of 1035) DICOM folders are not converted to niftis.

What's the expected result?

All DICOMs converted to nifti files.

Steps to reproduce the issue

  1. cd to flamby/datasets/fed_lidc_idri/dataset_creation_scripts
  2. run python download_ct_scans.py -o ~/LIDC-dataset

What's the actual result?

Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.110499927630654433643791451680, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/187/161.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.526570782606728516388531252230, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/187/259.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.329334252028672866365623335798, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/189/138.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.410251741986998833890312367579, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/186/049.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.103115201714075993579787468219, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/186/007.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.245181799370098278918756923992, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/186/283.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.156990013635454707781600846659, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/185/238.xml
Indexing issue at file /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/188/212.xml, session 0
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.220041426531925632952954881401, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/188/126.xml
Issue at dicomdir /hdd3/data/FLamby/TESTS/LIDC-TEST/1.3.6.1.4.1.14519.5.2.1.6279.6001.331079701650130000691651987949, xml /hdd3/data/FLamby/TESTS/LIDC-TEST/LIDC-XML-only/tcia-lidc-xml/188/017.xml
1026/1035 DICOMs folders successfully converted.

Comments

Those errors/warnings are raised by the convert_to_niftis method, which can be found in dataset_creation_scripts/process_raw.py.

It is unclear whether this bug is caused by the code, or by possibly corrupt data. Some CT scans were excluded from the LUNA16 challenge, which tends to corroborate the latter alternative.

I will investigate this if I find some time.

LIDC-IDRI downloading stalls

download_ct_scans.py stalls before the dataset is fully downloaded, without exiting or throwing an error.

Could this be a TCIAClient issue?

Note that this is not too much of a problem as running download_ct_scans.py again resumes the download from where it stopped.

Edit: this requires manually deleting the zip files that were not fully downloaded.

Benchmarking Track's WP1: How do we measure performances ?

Proposal:

  • Test sets must be static and separated out beforehand and must include multiple centers (>1)
  • Test centers are the same as train centers (in order to move away from Out of Domain generalization Benchmarks)
  • Test sets use the competition split whenever it is applicable or easy
  • For each task, we report the value returned by its implemented Metric for each center and for the pooled test (artificially gathering all test datasets into one), in order to highlight the differences between local testing and pooled testing. Metric is a function producing a scalar, defined for each task; we also report the min, max and average across test sets (see the sketch after this list).
  • We give users easy ways to build 5 repeated 5-fold cross-validation splits for each dataset but do not enforce the use of this specific validation process
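
As a concrete illustration of the reporting described in the list above, a hypothetical helper could summarize the per-center metric values returned by the evaluation (the function name and dict format are assumptions, not existing FLamby code):

import numpy as np

# Hypothetical helper matching the reporting above: given the metric value
# obtained on each test center, also report min, max and average across tests.
def summarize_per_center_metrics(per_center_metrics: dict) -> dict:
    values = np.array(list(per_center_metrics.values()), dtype=float)
    return {"min": values.min(), "max": values.max(), "mean": values.mean()}

# e.g. summarize_per_center_metrics({"client_test_0": 0.81, "client_test_1": 0.76})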

Add a script to create a config file from scratch by providing the path towards the fully preprocessed dataset and the dataset_name

Collaborative work on a single server might require a user to access the datasets already created by another user.
The second user, knowing the path to the fully preprocessed dataset, should be able to easily create a config file in their repository so that they can access it.
This script should be easy to find and usable for any dataset:

python create_config_from_path.py -o mypath -d fed_isic2019

Do we consider epoch or stochastic gradient step as a unit for local steps?

In the current implementation, num_updates (in strategies/fed_avg.py) corresponds to the number of stochastic gradient updates at each round. This is correct as most known convergence results are provided when num_updates is constant across clients; however, I have previously seen papers, including the first FL papers, considering num_epochs as the unit for local steps. Do we want to keep the current definition of num_updates, or do we want to give the possibility to choose between the two options?
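
For reference, converting between the two conventions is straightforward; the numbers below are purely illustrative placeholders:

# Purely illustrative numbers: converting a number of local epochs into a
# number of local stochastic gradient updates for one client.
n_samples, batch_size, num_epochs = 1000, 32, 2
num_updates = num_epochs * (n_samples // batch_size)
print(num_updates)  # 62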

How do we deal with Federated Cross-validation ?

In order for users not to overfit the test set when fiddling with hyperparameters, it is important to use cross-validation (5 repeated 5-fold cross-validation for instance).
There are two easy ways to provide cross-validation capabilities to users:

  1. Define static/hardcoded cross-validation splits in text files and provide an example script to use them in conjunction with the federated datasets
  2. Provide users with a seedable utility to perform center-stratified cross-validation dynamically from the pooled data (a minimal sketch of such a utility is given after this list)
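
A minimal sketch of what option 2 could look like; the function name and signature are hypothetical:

import numpy as np

def center_stratified_kfold(indices_per_center, n_splits=5, seed=42):
    """Hypothetical sketch of option 2: a seedable K-fold split in which every
    fold contains roughly 1/n_splits of each center's samples."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for center_indices in indices_per_center:
        permuted = rng.permutation(np.asarray(center_indices))
        for i, chunk in enumerate(np.array_split(permuted, n_splits)):
            folds[i].extend(chunk.tolist())
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx

# e.g. splits = list(center_stratified_kfold([range(100), range(100, 150)]))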

Add Federated Datasets - IXI Dataset

Requirement

  • As done for the CAMELYON dataset, add a federated class that allows center indexation.

Expected behaviour

from flamby.datasets.fed_ixi import FedT1ImagesIXIDataset

center1 = FedT1ImagesIXIDataset(center=0)
# Also allow string indexation
center1 = FedT1ImagesIXIDataset(center='HH')  # Same behaviour as above

Extend to the other existing dataset classes.

Creation of `dataset_creation_scripts` - IXI Dataset

Create a script for data downloading only.

Requirement

  • A download script called download.py must be created in flamby/datasets/fed_ixi/dataset_creation_scripts
  • Script must be executable (must have execution permission)
  • Expected usage:
    • $ ./download.py --output-folder [dataset-folder] --debug
    • Where:
      • output-folder is the dataset folder
      • --debug is a flag for fast testing; for now it should raise a NotImplementedError

Precompute tiling for Camelyon16

Currently, the tiling script extracting ResNet features from tiles taken from the matter regions of all Camelyon16 slides takes way too much time to be usable in practice.
One should cache the coordinates of the extraction once in a CSV so that the extraction can be sped up greatly (x10-x50 is probably possible).
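
A rough sketch of the proposed caching, assuming each tile can be identified by its slide and top-left coordinates (the records and column names are placeholders, not the tiling script's actual variables):

import pandas as pd

# Placeholder records: (slide identifier, tile top-left x, tile top-left y, pyramid level)
tile_records = [("slide_001", 2048, 1024, 0), ("slide_001", 2272, 1024, 0)]

# Cache the coordinates once...
pd.DataFrame(tile_records, columns=["slide_id", "x", "y", "level"]).to_csv(
    "tile_coordinates.csv", index=False
)

# ...so that subsequent feature-extraction runs can skip matter detection entirely
coords = pd.read_csv("tile_coordinates.csv")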

Simplify Loss for Camelyon16

The current code is very verbose and could be simplified, as we do not use a custom loss but one of PyTorch's predefined losses.

Fix line wrapping on error messages

Black forces the developer to break lines that are too long, even in print statements. This makes the error messages span multiple lines with multiple unwanted line breaks. This is the case for the check_dataset_config function, for instance.
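
One possible workaround (a suggestion, not necessarily how this was eventually fixed) is to rely on implicit string concatenation so that black can wrap the source code without inserting line breaks into the rendered message:

# The parentheses let black wrap the source code freely, while the rendered
# message stays on a single line without spurious line breaks.
raise FileNotFoundError(
    "The dataset config file could not be found. Please run the download "
    "script for this dataset before instantiating it."
)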

Potential Bug in Fed_heart_disease

When running the following lines:

from flamby.datasets.fed_heart_disease import Baseline, FedHeartDisease, BaselineLoss, LR, NUM_CLIENTS, metric
from torch.utils.data import DataLoader as dl

training_dls = [dl(FedHeartDisease(center=i, train=True, pooled=False), shuffle=True, batch_size=32, num_workers=0) for i in range(NUM_CLIENTS)]
test_dls = [dl(FedHeartDisease(center=i, train=False, pooled=False), shuffle=False, batch_size=32, num_workers=0) for i in range(NUM_CLIENTS)]

from flamby.strategies import FedProx

import torch

m = Baseline()

from flamby.utils import evaluate_model_on_tests
print(evaluate_model_on_tests(m, test_dls, metric))

I get {'client_test_0': 0.5, 'client_test_1': nan, 'client_test_2': nan, 'client_test_3': 0.5} with 2 nans. I guess it might be because centers 1 and 2 have only one class.
@pmangold could you check whether the train/test split is properly stratified on the class per client and further investigate this issue?

Issue resizing ISIC2019

I downloaded ISIC2019 by running python3 dataset_creation_scripts/download_isic.py --output-folder DATASETS/ISIC_2019/. This seems fine, as I obtain the following final message:

Datacenters
BCN_nan              12413
HAM_vidir_molemax     3954
HAM_vidir_modern      3363
HAM_rosendahl         2259
MSK4nan                819
HAM_vienna_dias        439
Name: dataset, dtype: int64
Number of lines in Metadata 23247
Number of lines in GroundTruth 23247
Number of lines in MetadataFL 23247
Number of images 23247
Download OK

Next I run python dataset_creation_scripts/resize_images.py.

But I got the following error message caused by image n°22783:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "dataset_creation_scripts/resize_images.py", line 76, in <module>
    Parallel(n_jobs=32)(
  File "/home/constantin/anaconda3/envs/flamby/lib/python3.8/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "/home/constantin/anaconda3/envs/flamby/lib/python3.8/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/constantin/anaconda3/envs/flamby/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/constantin/anaconda3/envs/flamby/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/home/constantin/anaconda3/envs/flamby/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
OSError: image file is truncated (19 bytes not processed)

Is this error due to a download error on my side that was not raised and that I could thus have missed? Or is it due to a problem in either the resize script or the downloading script?

Potential bug in FedTcgaBrca

@regloeb when trying to run a strategy on FedTcgaBrca I run into:

File ~/Desktop/FLamby/flamby/strategies/fed_avg.py:160, in FedAvg.run(self)
    156 """This method performs self.nrounds rounds of averaging
    157 and returns the list of models.
    158 """
    159 for _ in tqdm(range(self.nrounds)):
--> 160     self.perform_round()
    161 return [m.model for m in self.models_list]

File ~/Desktop/FLamby/flamby/strategies/fed_avg.py:118, in FedAvg.perform_round(self)
    113 for _model, dataloader_with_memory, size in zip(
    114     self.models_list, self.training_dataloaders_with_memory, self.training_sizes
    115 ):
    116     # Local Optimization
    117     _local_previous_state = _model._get_current_params()
--> 118     self._local_optimization(_model, dataloader_with_memory)
    119     _local_next_state = _model._get_current_params()
    121     # Recovering updates

File ~/Desktop/FLamby/flamby/strategies/fed_prox.py:97, in FedProx._local_optimization(self, _model, dataloader_with_memory)
     86 def _local_optimization(self, _model: _Model, dataloader_with_memory):
     87     """Carry out the local optimization step.
     88
     89     Parameters
   (...)
     95         method.
     96     """
---> 97     _model._prox_local_train(dataloader_with_memory, self.num_updates, self.mu)

File ~/Desktop/FLamby/flamby/strategies/utils.py:192, in _Model._prox_local_train(self, dataloader_with_memory, num_updates, mu)
    190 # Compute prediction and loss
    191 _pred = self.model(X)
--> 192 _prox_loss = self._loss(_pred, y)
    193 # We preserve the true loss before adding the proximal term
    194 # and doing the backward step on the sum.
    195 _loss = _prox_loss.detach()

File /Users/jeanduterrail/anaconda3/envs/flamby/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Desktop/FLamby/flamby/datasets/fed_tcga_brca/loss.py:24, in BaselineLoss.forward(self, scores, truth)
     22 def forward(self, scores, truth):
     23     # The Cox loss calc expects events to be reverse sorted in time
---> 24     a = torch.stack((torch.squeeze(scores), truth[:, 0], truth[:, 1]), dim=1)
     25     a = torch.stack(sorted(a, key=lambda a: -a[2]))
     26     scores = a[:, 0]

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Can you investigate ?

Bug in Fed_tcga_brca

@regloeb, as we discussed during the FedAdam PR, the tcga_brca test sets are not stratified in terms of censorship, so there is one center with no admissible pairs. You have to redo the splits.

[Newcomers] Guide on adding a new strategy

  1. One should add (if it does not exist yet) a folder called strategies:
    FLamby/flamby/strategies
  2. Inside this folder you should have a file called strategy_name.py (e.g. fed_avg.py, fed_yogi.py, scaffold.py).
    This file should contain a class StrategyName with the following init and methods:
import copy

import torch
from tqdm import tqdm


class StrategyName:
    def __init__(
        self,
        training_dataloaders: list,
        model: torch.nn.Module,
        loss: torch.nn.modules.loss._Loss,
        learning_rate: float,
        nrounds: int,
        # + additional strategy-specific parameters
    ):
        self.training_dataloaders = training_dataloaders
        self.models_list = [copy.deepcopy(model) for _ in range(len(training_dataloaders))]
        self.loss = loss
        self.lr = learning_rate
        self.nrounds = nrounds

    def perform_round(self):
        # do stuff and update the models
        raise NotImplementedError

    def run(self):
        for _ in tqdm(range(self.nrounds)):
            self.perform_round()
        return self.models_list[0]

What is important are the signatures of the methods and the attribute names. In order to monitor the bits exchanged, the clients' outputs at each round should be clearly visible so that we can open PRs adding bits monitoring easily.
This class should be tested on one of the datasets (either ISIC or Camelyon16, as LIDC is a bit compute-intensive), by doing something like:

from flamby.datasets.fed_isic2019 import FedIsic2019, Baseline, BaselineLoss, LR, get_nb_max_rounds, BATCH_SIZE, NUM_CLIENTS, metric
from flamby.utils import evaluate_model_on_tests
from torch.utils.data import DataLoader as dl



training_dls = [dl(FedIsic2019(train=True, center=i), batch_size=BATCH_SIZE, shuffle=True, num_workers=10) for i in range(NUM_CLIENTS)]
test_dls = [dl(FedIsic2019(train=False, center=i), batch_size=BATCH_SIZE, shuffle=False, num_workers=10) for i in range(NUM_CLIENTS)]
loss = BaselineLoss()
m = Baseline()
NUM_UPDATES = 50
nrounds = get_nb_max_rounds(NUM_UPDATES)
s = StrategyName(training_dls, m, loss, LR,  nrounds)
m = s.run()
print(evaluate_model_on_tests(m, test_dls, metric))

Sphinx documentation

One should add .rst files and setup.py options to build a Sphinx documentation by parsing the docstrings of each function.
Good documentation is a must for a dataset repository.
We will see about hosting the documentation later on.

Download method in `IXIDataset`

Add a method to the IXIDataset class to download the IXI dataset.

  • Include the argument debug: bool to enable a light version of the download (hosting synthetic data, in discussion).

Benchmarking Track's WP2: How do we ensure reproducibility and robust comparisons ?

Proposal:

  • the repository comes with seeded scripts performing the benchmarks included in the article
  • each training strategy is evaluated with 5 different seeds; we report the standard deviation in the article
  • each task comes with a maximum number of communication rounds to use in the training strategy. This number is approximately determined from the number of epochs needed in the pooled setting to reach acceptable performance, translated into a number of rounds using num_updates=50 and the current BATCH_SIZE (see the numeric example after this list):
MAX_NUM_ROUNDS = TRAIN_SIZE_POOLED // BATCH_SIZE * NB_EPOCHS // num_updates
  • as each task has a different-sized model, we set a task-dependent hard limit on the number of bits communicated per client and per round, e.g. 10 times the size of the model's gradient (to allow for communicating a few control variates)
  • no notion of privacy is enforced, only good faith; we mention DP in the limitations and next avenues for our benchmark
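
As referenced in the list above, here is a purely illustrative numeric instance of the formula (all values are made up and do not correspond to any particular dataset):

# All values below are made up, purely to illustrate the formula
TRAIN_SIZE_POOLED = 32000
BATCH_SIZE = 32
NB_EPOCHS = 10
num_updates = 50
MAX_NUM_ROUNDS = TRAIN_SIZE_POOLED // BATCH_SIZE * NB_EPOCHS // num_updates
print(MAX_NUM_ROUNDS)  # 200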

Volunteer needed to assemble LIZARD (WP1 and 2)

LIZARD is a great cell-segmentation dataset with the metadata of the 5 centers accessible.
The license is Attribution-NonCommercial-ShareAlike 4.0 and the dataset is available here.
A reasonable baseline would be a U-Net; the state of the art seems to be HoverNet, which has an implementation available here.
The column FileName in info.csv in lizard_labels.zip gives the information on the center of origin.

[Newcomers] Guide on adding a new pooled benchmark (WP2)

  1. Make sure WP1 is validated by someone from Owkin
  2. Once WP1 is validated you should write the following empty files: loss.py, model.py, benchmark.py, metric.py
  3. loss.py should implement a class called BaselineLoss inheriting from torch.nn.modules.loss._Loss (naming is important)
  4. model.py should implement a class called Baseline inheriting from torch.nn.Module; Baseline should be able to forward the X batch from a dataset.
  5. metric.py should implement a function called metric to assess the performance of the algorithm
  6. Everything listed below should be importable by using from flamby.datasets.fed_my_dataset import ...
  7. benchmark.py should provide code sufficient to run training and evaluation on the test set with pooled=True, meaning the script should begin and end with:
from torch.utils.data import DataLoader

from flamby.datasets.fed_my_dataset import FedMyDataset, Baseline, BaselineLoss, metric, NUM_CLIENTS, BATCH_SIZE, LR, NUM_EPOCHS_POOLED
from flamby.utils import evaluate_model_on_tests

# the written code should do something like this
train_dataset = FedMyDataset(train=True, pooled=True)
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_dataset = FedMyDataset(train=False, pooled=True)
test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
m = Baseline()
l = BaselineLoss()
for e in range(NUM_EPOCHS_POOLED):
    for X, y in train_dataloader:
        y_pred = m(X)

        ...
evaluate_model_on_tests(m, [test_dataloader], metric, use_gpu=True)
  8. Note that BATCH_SIZE and LR should both be defined in common.py and importable with from flamby.datasets.fed_my_dataset import ...

Benchmarking script for the article

In order to obtain results for the different datasets for pooled vs local vs different FL strategies, the following (untested) script can be used to launch everything, provided we add the Optimizer class to the common.py file of all the datasets.
@regloeb and @maikia can you use this in your benchmark PR and maybe fix bugs/improve it a little?

import flamby.strategies as strats
from flamby.utils import evaluate_model_on_tests
from torch.utils.data import DataLoader as dl
# Only two lines to change to evaluate different datasets (except for LIDC where the evaluation function is custom)
# Still some datasets might require specific augmentation strategies or collate_fn functions in the data loading part
from flamby.datasets.fed_tcga_brca import FedTcgaBrca as FedDataset
from flamby.datasets.fed_tcga_brca import NUM_CLIENTS, LR, Baseline, BaselineLoss, BATCH_SIZE, Optimizer, NUM_EPOCHS_POOLED, get_nb_max_rounds, metric
import copy
import os

import pandas as pd
import torch

NAME_RESULTS_FILE = "results_benchmark.csv"

strategy_names = ["FedAvg", "FedProx", "Cyclic", "FedAdam", "FedYogi", "FedAdaGrad", "Scaffold"]

# One might need to iterate on the hyperparameters to some extents if performances are seriously degraded with default ones
# We can addd parameters or change them on the go, in the future an argparse could be used to make the process easier
strategy_specific_hp_dict = {}
strategy_specific_hp_dict["FedAvg"] = {}
strategy_specific_hp_dict["FedProx"] = {"learning_rate": LR / 10., "mu": 1e-10}
strategy_specific_hp_dict["Cyclic"] = {"learning_rate": LR / 100.}
strategy_specific_hp_dict["FedAdam"] = {"beta1": 0.9, "beta2": 0.999, "optimizer_class": torch.optim.SGD}
strategy_specific_hp_dict["FedYogi"] = {"optimizer_class": torch.optim.SGD}
strategy_specific_hp_dict["FedAdaGrad"] = {"learning_rate": LR * 10., "optimizer_class": torch.optim.SGD}
strategy_specific_hp_dict["Scaffold"] = {"server_learning_rate": 1., "update_rule": "II"}

columns_names = ["Test", "Method", "Metric"]
# We need to add strategy hyperparameters columns to the benchmark
hp_additional_args = []
for _, v in strategy_specific_hp_dict.items():
    for name, _ in v.items():
        hp_additional_args.append(name)
columns_names += hp_additional_args

# We instantiate all train and test dataloaders required including pooled ones

training_dls = [dl(FedDataset(center=i, train=True, pooled=False), batch_size=BATCH_SIZE, shuffle=True, num_workers=10) for i in range(NUM_CLIENTS)]
test_dls = [dl(FedDataset(center=i, train=False, pooled=False), batch_size=BATCH_SIZE, shuffle=False, num_workers=10) for i in range(NUM_CLIENTS)]
train_pooled = dl(FedDataset(train=True, pooled=True), batch_size=BATCH_SIZE, shuffle=True, num_workers=10)
test_pooled = dl(FedDataset(train=False, pooled=True), batch_size=BATCH_SIZE, shuffle=False, num_workers=10)

# We use the same initialization for everyone in order to be fair
torch.manual_seed(42)
global_init = Baseline()

# This base dict of columns is needed whether or not previous results exist
base_dict = {col_name: None for col_name in columns_names}

# We check if some results are already computed
if os.path.exists(NAME_RESULTS_FILE):
    df = pd.read_csv(NAME_RESULTS_FILE)
    # If we added additional hyperparameters we update the df
    for col_name in columns_names:
        if col_name not in df.columns:
            df[col_name] = None
    perf_lines_dicts = df.to_dict('records')

else:
    df = pd.DataFrame({k: [v] for k, v in base_dict.items()})
    perf_lines_dicts = []


# Single client baseline computation
# We use the same set of parameters as found in the corresponding flamby/datasets/fed_mydataset/benchmark.py

# Pooled Baseline
# Throughout the experiments we only launch training if we do not have the results already
# Note that pooled and local baselines do not use hyperparameters

index_of_interest = df.loc[df["Method"] == "Pooled Training"].index
# an experiment is finished if there are num_clients + 1 rows
if len(index_of_interest) < (NUM_CLIENTS + 1):
    # dealing with edge case that shouldn't happen
    # If some of the rows are there but not all of them we redo the experiments
    if len(index_of_interest) > 0:
        df.drop(index_of_interest, inplace=True)
        perf_lines_dicts = df.to_dict('records')

    m = copy.deepcopy(global_init)
    l = BaselineLoss()
    opt = Optimizer(m.parameters(), lr=LR)
    for e in range(NUM_EPOCHS_POOLED):
        for X, y in train_pooled:
            opt.zero_grad()
            y_pred = m(X)
            loss = l(y_pred, y)
            loss.backward()
            opt.step()

    perf_dict = evaluate_model_on_tests(m, test_dls, metric)
    pooled_perf_dict = evaluate_model_on_tests(m, [test_pooled], metric)
    for k, v in perf_dict.items():
        # Make sure there is no weird inplace stuff
        current_dict = copy.deepcopy(base_dict)
        current_dict["Test"] = k
        current_dict["Metric"] = v
        current_dict["Method"] = "Pooled Training"
        perf_lines_dicts.append(current_dict)
    current_dict = copy.deepcopy(base_dict)
    current_dict["Test"] = "Pooled Test"
    current_dict["Metric"] = pooled_perf_dict["client_test_0"]
    current_dict["Method"] = "Pooled Training"
    perf_lines_dicts.append(current_dict)
    # We update csv and save it when the results are there
    df = pd.DataFrame.from_dict(perf_lines_dicts)
    df.to_csv(NAME_RESULTS_FILE)

# Local Baselines
for i in range(NUM_CLIENTS):
    # we only launch training if it's not finished already
    index_of_interest = df.loc[df["Method"] == f"Local {i}"].index
    # an experiment is finished if there are num_clients + 1 rows
    if len(index_of_interest) < (NUM_CLIENTS + 1):
        # dealing with edge case that shouldn't happen
        # If some of the rows are there but not all of them we redo the experiments
        if len(index_of_interest) > 0:
            df.drop(index_of_interest, inplace=True)
            perf_lines_dicts = df.to_dict('records')
        m = copy.deepcopy(global_init)
        l = BaselineLoss()
        opt = Optimizer(m.parameters(), lr=LR)
        for e in range(NUM_EPOCHS_POOLED):
            for X, y in training_dls[i]:
                opt.zero_grad()
                y_pred = m(X)
                loss = l(y_pred, y)
                loss.backward()
                opt.step()
        perf_dict = evaluate_model_on_tests(m, test_dls, metric)
        pooled_perf_dict = evaluate_model_on_tests(m, [test_pooled], metric)
        for k, v in perf_dict.items():
            # Make sure there is no weird inplace stuff
            current_dict = copy.deepcopy(base_dict)
            current_dict["Test"] = k
            current_dict["Metric"] = v
            current_dict["Method"] = f"Local {i}"
            perf_lines_dicts.append(current_dict)
        current_dict = copy.deepcopy(base_dict)
        current_dict["Test"] = "Pooled Test"
        current_dict["Metric"] = pooled_perf_dict["client_test_0"]
        current_dict["Method"] = f"Local {i}"
        perf_lines_dicts.append(current_dict)
        # We update csv and save it when the results are there
        df = pd.DataFrame.from_dict(perf_lines_dicts)
        df.to_csv(NAME_RESULTS_FILE)

# Strategies
for num_updates in [50, 100, 500]:
    for sname in strategy_names:
        # Base arguments
        args = {"training_dataloaders": training_dls, "model": m, "loss": l, "optimizer_class": Optimizer, "learning_rate": LR, "num_updates": num_updates, "nrounds": get_nb_max_rounds(num_updates)}
        current_strategy_hp = strategy_specific_hp_dict[sname]
        # Overwriting arguments with strategy specific arguments
        for k, v in current_strategy_hp.items():
            args[k] = v
        # we only launch training if it's not finished already maybe FL hyperparameters need to be tuned
        hyperparameters = {}
        for k in columns_names:
            if k in args:
                hyperparameters[k] = str(args[k])
            else:
                hyperparameters[k] = None
    
        index_of_interest = df.loc[(df["Method"] == (sname + str(num_updates))) & (df[list(hyperparameters)] == pd.Series(hyperparameters)).all(axis=1)].index
        # an experiment is finished if there are num_clients + 1 rows
        if len(index_of_interest) < (NUM_CLIENTS + 1):
            # dealing with edge case that shouldn't happen
            # If some of the rows are there but not all of them we redo the experiments
            if len(index_of_interest) > 0:
                df.drop(index_of_interest, inplace=True)
                perf_lines_dicts = df.to_dict('records')
            m = copy.deepcopy(global_init)
            l = BaselineLoss()
            # Base arguments
            args = {"training_dataloaders": training_dls, "model": m, "loss": l, "optimizer_class": Optimizer, "learning_rate": LR, "num_updates": num_updates, "nrounds": get_nb_max_rounds(num_updates)}
            # Overwriting arguments with strategy specific arguments
            for k, v in current_strategy_hp.items():
                args[k] = v
            # We run the FL strategy
            s = getattr(strats, sname)(**args)
            m = s.run()[0]
            perf_dict = evaluate_model_on_tests(m, test_dls, metric)
            pooled_perf_dict = evaluate_model_on_tests(m, [test_pooled], metric)
            for k, v in perf_dict.items():
                # Make sure there is no weird inplace stuff
                current_dict = copy.deepcopy(base_dict)
                current_dict["Test"] = k
                current_dict["Metric"] = v
                current_dict["Method"] = sname + str(num_updates)
                # We add the hyperparameters used
                for k2, v2 in hyperparameters.items():
                    current_dict[k2] = v2
                perf_lines_dicts.append(current_dict)
            current_dict = copy.deepcopy(base_dict)
            current_dict["Test"] = "Pooled Test"
            current_dict["Metric"] = pooled_perf_dict["client_test_0"]
            current_dict["Method"] = sname + str(num_updates)
            # We add the hyperparameters used
            for k2, v2 in hyperparameters.items():
                current_dict[k2] = v2
            perf_lines_dicts.append(current_dict)
            # We update csv and save it when the results are there
            df = pd.DataFrame.from_dict(perf_lines_dicts)
            df.to_csv(NAME_RESULTS_FILE)

Enhance datasets description in README

I think we should add more details in the README of each dataset in order to provide a better description of them. I suggest adding in the description table the following information for each dataset:

  1. the size of the dataset (in GB),
  2. if it needs to be downloaded (TCGA-BRCA doesn't need to be downloaded),
  3. the nature of the features: e.g. tabular OR pictures, vector dimension,
  4. the nature of the labels: e.g. discrete OR continuous, the range of values, their physical meaning (for instance, for TCGA-BRCA I can't understand what the two-dimensional labels correspond to, which hinders me from correctly applying my script analyzing heterogeneity),
  5. the number of centers, the range of values, and their physical meaning (for instance, I don't think this information is clear for Camelyon16; for ISIC the physical meaning is also not clear).

Furthermore, I would also add a table in the main README comparing the different datasets. This will help users to understand the purpose of each dataset in the blink of an eye.

Overall, I think it will help users to familiarize themselves with FLamby and make each dataset easier to use.

Error when downloading LIDC-IDRI dataset in debug mode

Issue description

Error raised when downloading LIDC-IDRI using the --debug flag.

Steps to reproduce the issue

  1. python download_ct_scans.py -o ~/LIDC-dataset --debug

What's the expected result?

  • Downloaded DICOMs with no error

What's the actual result?

Converting to NiFTIs...: 100%|██████████| 10/10 [00:00<00:00, 2629.49it/s]
Traceback (most recent call last):
  File "/Users/ssilvari/PycharmProjects/FLamby/flamby/datasets/fed_lidc_idri/dataset_creation_scripts/download_ct_scans.py", line 277, in <module>
    main(args.output_folder, args.debug, args.keep_dicoms)
  File "/Users/ssilvari/PycharmProjects/FLamby/flamby/datasets/fed_lidc_idri/dataset_creation_scripts/download_ct_scans.py", line 247, in main
    LIDC_to_niftis(patientXseries, debug=debug)
  File "/Users/ssilvari/PycharmProjects/FLamby/flamby/datasets/fed_lidc_idri/dataset_creation_scripts/download_ct_scans.py", line 240, in LIDC_to_niftis
    write_value_in_config(config_file, "preprocessing_complete", True)
  File "/Users/ssilvari/PycharmProjects/FLamby/flamby/utils.py", line 174, in write_value_in_config
    raise FileNotFoundError(
FileNotFoundError: The config file doesn't exist.             Please create the config file before updating it.

Potential problem in evaluation

In the current code for Camelyon16 and Heart_Disease we use AUC-ROC as a metric to evaluate the model. We rely on sklearn.metrics.roc_auc_score, which expects probability-like prediction scores. This is an issue because our models (see for example Baseline in datasets/fed_camelyon16/model.py) return the logits, not the probabilities; this point is confirmed by the fact that, in datasets/fed_camelyon16/loss.py for example, we use BCEWithLogitsLoss. We should apply a sigmoid/softmax activation either 1) inside the model itself (e.g., datasets/fed_camelyon16/model.py), 2) after computing the outputs (i.e., line 49 in utils.py), or 3) inside the metric function (e.g., in datasets/fed_camelyon16/metric.py).

I think the third option is the most suitable, because it ensures that the functions and classes in, for example, datasets/fed_camelyon16 are self-consistent.
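
A minimal sketch of option 3, assuming a binary task with one logit per sample (this is not the repository's actual metric code):

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical sketch of option 3 for a binary task with one logit per sample:
# the metric converts logits to probabilities itself, so callers can keep
# passing raw model outputs.
def metric(y_true, logits):
    y_pred = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return roc_auc_score(np.asarray(y_true).reshape(-1), y_pred.reshape(-1))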

setup failure due to `hashlib`

Pip installation of hashlib fails. According to PyPI, hashlib is for Python 2.4 and below; Python 2.5 and above comes with hashlib included. Maybe hashlib should be removed from the install_requires list in setup.py.

Add README - IXI Dataset

Add a README.md with a description of the dataset's nature and expected class utilization for the IXI Dataset.

Fixes in environment setup that lead to reproducibility issues

Errors raised

ImportError: No module named  nibabel
ImportError: No module named dicom_numpy
FileNotFoundError dataset_..._debug.yaml

Last error is covered in Issue #20.

Issues

  • using pip install flamby does not allow dynamic changes in the package.
  • No simple way to execute commands (e.g. through make)
  • Dependency errors

Define IXI dataset classes

Create centralised PyTorch datasets/dataloaders for the IXI Dataset.

Different classes are suggested, given that this dataset is multi-centric and multi-modality (a rough sketch of the hierarchy follows the list below):

  • IXIDataset Parent class that wraps common methods that can be inherited
  • T1ImagesIXIDataset for T1 MRI
  • T2ImagesIXIDataset for T2 MRI
  • PDImagesIXIDataset for Proton Density (PD) MRI
  • MRAImagesIXIDataset for MR Angiography (MRA)
  • DTIImagesIXIDataset for Diffusion MRI
  • MultiModalIXIDataset for multimodal data (multi-view/multi-modal modeling)
  • Include some Unit testing to debug
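
As mentioned above, a rough sketch of the suggested hierarchy could look like this (constructors and data handling are placeholders, not an actual implementation):

import torch

class IXIDataset(torch.utils.data.Dataset):
    """Parent class wrapping whatever is common to all modalities (placeholder)."""
    MODALITY = None

    def __init__(self, root):
        self.root = root
        # common metadata parsing / center handling would live here

class T1ImagesIXIDataset(IXIDataset):
    MODALITY = "T1"

class T2ImagesIXIDataset(IXIDataset):
    MODALITY = "T2"

class PDImagesIXIDataset(IXIDataset):
    MODALITY = "PD"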

Compute LUNA16 score instead of DICE

The LUNA16 challenge provides a script to evaluate detection performance on LIDC, which is located here. This script would allow assessing whether the DICE we have is reasonable or too low.

Benchmarking Track's WP3: What libraries do we use for benchmarks ?

The whole dataset API aims at being FL-framework agnostic, therefore we need to show examples of how to use at least 3 different FL frameworks in the repository.
In order to guide our choice of framework, we need to assess:

  • How easy is it to use/insert in the dataset repository? (all FL framework candidates need to come packaged, be as lightweight as possible (no docker containers, for faster benchmarking) and have PyTorch support)
  • Are all the strategies we want to test already implemented in that framework and easily accessible?

Considering the small number of strategies to implement, we think it might be quicker to implement all of them in low-level PyTorch, as it would not take very long.
This could be done by adding a strategy folder to the repository:

- strategies
     fedavg.py
     fedprox.py
     scaffold.py
     cyclic_training.py
     local.py
     ensemble.py

With regards to the strategy API, it would take at a minimum a list of dataloaders, a loss, a batch size, possibly an optimizer and some logging parameters.
If anyone has experience using other FL frameworks it would be great to benchmark different FL frameworks for each task; however, a framework's results will be reported only if we have time to test it on all datasets.

Factorize GPU util

The util to either choose a GPU or purposefully disable GPU use altogether should be factorized, as it currently appears in every benchmark.
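
A minimal sketch of what such a shared utility could look like (the helper name and arguments are assumptions, not the existing code):

import torch

def get_device(use_gpu: bool = True, gpu_id: int = 0) -> torch.device:
    """Return a CUDA device when requested and available, otherwise the CPU."""
    if use_gpu and torch.cuda.is_available():
        return torch.device(f"cuda:{gpu_id}")
    return torch.device("cpu")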

Improving performance on LIDC-IDRI

The current V-Net implementation achieves a DICE score of ~30 after 100 epochs. This seems fairly low, even though it is hard to assess due to the apparent lack of comparable works. Any suggestions on how to improve this are welcome.
Here are a few that I've thought about:

  • use data augmentation on patches (e.g. random rotations and shearing)
  • segment the lung area and apply a lung mask prior to feeding inputs to the model
  • restrict the dataset to the CT scans that were used in the LUNA16 challenge

[Newcomers] Guide on adding a new dataset (WP1)

  1. All datasets used in this repository should be open source with a permissive licence and contain only natural splits with fewer than 50 centers. One can also assemble a dataset from multiple open-source sources if needed.
  2. We do not host any of the "heavy" data (larger than big text files) on this repository so users can git clone it quickly. We do not use AWS or GCP to host part of the data ourselves either
  3. The new dataset should be located inside FLamby/flamby/datasets/fed_dataset_name
  4. The first file to add/commit is a README.md (FLamby/flamby/datasets/fed_dataset_name/README.md) with proper attribution to the dataset's owners respecting the licence and a table describing the data and centers (see example in fed_camelyon16)
  5. This README should also contain instructions on how to download the dataset. The download experience should be as smooth as possible and be done only in Python/bash if possible.
  6. The user should be able to know in advance the size of the dataset and to provide a path to it on his/her machine (--output-folder) to store it. The path provided by the user should be written automatically in a config file dataset_location.yaml as is done in fed_camelyon16/dataset_creation_scripts so that after download the preprocessing and the subsequent instantiations of the dataset objects can be done without giving the path a second/third time.
  7. The instructions for download and preprocessing should not be too long, should be easy to follow, and should mostly use Python/bash scripts
  8. A dataset.py file should be written with implementations for DatasetNameRaw and FedDatasetName respecting the API described in the slides from the launch meeting (and following fed_camelyon/dataset.py)
  9. Both datasets wrappers should use the function from flamby.utils, check_dataset_from_config and have a debug argument to download/preprocess only the first files/images (see fed_camelyon16 for an example) and create a dataset_location_debug.yaml file inside ./flamby/datasets/fed_mydataset/dataset_creation_scripts.
  10. DatasetNameRaw should provide all necessary metadata in its attributes (paths to samples/labels and centers)
  11. One should write a mapping to center metadata to integers from 0 to n-1 with n the number of centers
  12. Both datasets wrappers should be importable by running:
from flamby.datasets.fed_dataset_name import DatasetNameRaw, FedDatasetName

Therefore it requires __init__.py files at the appropriate locations
13. The datasets are then instantiated by:

# for instance
d = FedDatasetName(center=0, pooled=True, train=True)
  14. The train/test split should be static, i.e. not seeded but hardcoded, and should either use the test set from the original competition when possible or, if not, use a split stratified on the centers to ensure one can compute a siloed metric on each test center.
  15. One should write a common.py file in which the needed static variables (NUM_CLIENTS and others) are defined; a rough sketch is given below.
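
As referenced in item 15, a rough sketch of such a common.py could look like the following; the constant values are placeholders and the naming follows the conventions used throughout this page:

import torch

NUM_CLIENTS = 4
BATCH_SIZE = 32
LR = 0.001
NUM_EPOCHS_POOLED = 10
Optimizer = torch.optim.Adam

# Placeholder: the total number of pooled training samples of the dataset
TRAIN_SIZE_POOLED = 1000

def get_nb_max_rounds(num_updates, batch_size=BATCH_SIZE):
    return (TRAIN_SIZE_POOLED // batch_size) * NUM_EPOCHS_POOLED // num_updates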
