
Comments (9)

philipco avatar philipco commented on May 28, 2024 1

Hello,

We agree that FedAvg, FedProx, and Scaffold form a good set of simple baselines of interest. Since our goal is not yet to achieve optimized performance on the datasets, but rather to draw high-level take-aways, these original methods seem best suited.

Regarding potential other baselines, maybe the question can be phrased in the following way:

a. What solutions have been proposed to tackle heterogeneity in FL?
b. Which ones deserve to be considered as baselines in our framework, and why?

Regarding a., we think that beyond the ones you suggested, the following methods are intended to tackle heterogeneity: FedNova, MIME, FedCD, and FedAdam/FedAdagrad/FedYogi (Adaptive Federated Optimization). This is a preliminary list that should probably be updated.

Regarding b., we see the following potential arguments:

  1. It is widely recognized as a reference for heterogeneous FL. [Could be based on # citations, or on your expertise.]
  2. It was used as a reference in a similar paper. [e.g., Federated Learning on Non-IID Data Silos: An Experimental Study (https://arxiv.org/pdf/2102.02079.pdf) uses FedNova.]
  3. It substantially differs in approach from FedProx (proximal term) or Scaffold (control variates), or exhibits a feature that somehow "describes" heterogeneity [as control variates do for Scaffold].
  4. Other ideas?
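To make the contrast in point 3 concrete, here is a minimal sketch (not FLamby code; function names, the plain-NumPy setting, and the default hyperparameters are ours) of how the two approaches modify a client's local gradient step:

```python
import numpy as np

def fedprox_local_step(w, w_global, grad, lr=0.1, mu=0.01):
    """FedProx: the proximal term mu/2 * ||w - w_global||^2 adds
    mu * (w - w_global) to the local gradient, pulling the local
    model back towards the current global model."""
    return w - lr * (grad + mu * (w - w_global))

def scaffold_local_step(w, grad, c_local, c_global, lr=0.1):
    """Scaffold: control variates c_local / c_global estimate the
    client and server update directions; their difference corrects
    the client drift caused by heterogeneous local data."""
    return w - lr * (grad - c_local + c_global)
```

FedProx changes the local *objective* (a penalty on distance to the global model), while Scaffold changes the local *gradient* (a drift-correction term), which is why the two are natural, complementary baselines.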

Cheers,
Constantin and Aymeric

from flamby.

philipco avatar philipco commented on May 28, 2024 1

To initiate the discussion, we propose below a table (which anyone can edit) of algorithms meant to tackle heterogeneity in FL.

| Method | # citations | Designed for non-i.i.d. data | Major features |
|---|---|---|---|
| [FedAvg](https://arxiv.org/pdf/1602.05629.pdf) | 4504 | NO | 1) local updates 2) weighted average |
| [MOCHA](https://proceedings.neurips.cc/paper/7029-federated-multi-task-learning) | 855 | YES | alternating optimization of model weights and task relationship matrix |
| [FedProx](https://arxiv.org/pdf/1812.06127.pdf) | 807 | YES | 1) generalization of FedAvg 2) adds a proximal term 3) restricts the local update to stay close to the initial (global) model |
| [Scaffold](https://arxiv.org/pdf/1910.06378.pdf) | 356 | YES | 1) corrects client drift 2) control variates |
| [FedAdam/FedYogi/FedAdagrad](https://arxiv.org/abs/2003.00295) | 248 | YES | federated versions of adaptive optimizers |
| [Cyclical Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/) | 178 | KIND OF | ensures that each client is sufficiently visited |
| [Clustered Federated Learning](https://ieeexplore.ieee.org/abstract/document/9174890) | 178 | YES | clusters silos after FL has converged |
| [FedNova](https://arxiv.org/pdf/2007.07481.pdf) | 140 | YES | 1) focuses on heterogeneous numbers of local updates 2) includes FedProx and FedAvg as particular cases 3) flexibility to choose any local solver |
| [Ditto](https://proceedings.mlr.press/v139/li21h.html) | 62 | YES | each node trains a local model while all nodes jointly train the global model; at each round, the distance from the current state of the evolving global model is used as a regularization term in local training |
| [MIME](https://arxiv.org/pdf/2008.03606.pdf) | 47 | YES | 1) corrects client drift 2) control variates 3) server-level optimizer state (momentum, adaptive step size) 4) for cross-device settings |
| [FedCD](https://arxiv.org/pdf/2006.09637.pdf) | 10 | YES | clones and deletes models to dynamically group devices with similar data |

FEEL FREE TO UPDATE THE ABOVE TABLE.
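To make the "weighted average" feature of FedAvg in the table concrete, here is a minimal sketch of the server-side aggregation step (a simplification of the original algorithm; the function name and plain-NumPy setting are ours, not FLamby's):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average the client models, weighting
    each client by its number of local training samples."""
    total = sum(client_sizes)
    return sum(
        (n / total) * w for w, n in zip(client_weights, client_sizes)
    )
```

In the full algorithm, each round interleaves several local SGD epochs on every client with one such weighted average on the server.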

Again, since the focus of the paper is the datasets, it may be sufficient to consider FedAvg, Scaffold, and FedProx.

Constantin and Aymeric


bellet avatar bellet commented on May 28, 2024

The provided list sounds good and natural.

In the future, one could also consider personalized FL approaches, for instance based on fine-tuning/MAML (e.g., FedAvg+ https://arxiv.org/pdf/1909.12488.pdf) or regularization towards the mean (https://arxiv.org/pdf/2010.02372.pdf), which are both closely related to simple FedAvg. There are also popular federated MTL approaches based on pairwise regularization or cluster/mixture assumptions.


Grim-bot avatar Grim-bot commented on May 28, 2024

I thought about adding Ditto https://proceedings.mlr.press/v139/li21h.html (58 citations) to philipco's table above, but their experiments are mainly cross-device (100+ devices). Nevertheless, they also report good results on the Vehicle dataset (23 devices) - is that too many to be considered cross-silo?


jeandut avatar jeandut commented on May 28, 2024

> The provided list sounds good and natural.
>
> In the future, one could also consider personalized FL approaches, for instance based on fine-tuning/MAML (e.g., FedAvg+ https://arxiv.org/pdf/1909.12488.pdf) or regularization towards the mean (https://arxiv.org/pdf/2010.02372.pdf), which are both closely related to simple FedAvg. There are also popular federated MTL approaches based on pairwise regularization or cluster/mixture assumptions.

We will keep that in mind; it shouldn't be very hard to modify the code in that direction. FLamby was designed to be extensible.

> I thought about adding Ditto https://proceedings.mlr.press/v139/li21h.html (58 citations) to philipco's table above, but their experiments are mainly cross-device (100+ devices). Nevertheless, they also report good results on the Vehicle dataset (23 devices) - is that too many to be considered cross-silo?

Personally, I put the threshold at 50, but it was never formalized. Maybe this Vehicle dataset is worth looking at? Do they use natural splits?


bellet avatar bellet commented on May 28, 2024

For information on Vehicle and some other datasets with natural splits that are more 'cross-device' than 'cross-silo' in spirit, see page 20 of http://researchers.lille.inria.fr/abellet/papers/aistats20_graph_supp.pdf
The School dataset may be of interest, but it has 140 centers (schools).


Grim-bot avatar Grim-bot commented on May 28, 2024

> Personally, I put the threshold at 50, but it was never formalized. Maybe this Vehicle dataset is worth looking at? Do they use natural splits?

  • If you put the limit at 50, then maybe we could consider Ditto as a strategy to include in the benchmark. It seems to work pretty well, and they tested it on a dataset that we can interpret as cross-silo. I added it to the table of strategies above and re-ordered that table by descending citation count.
  • After reading the description in the Vehicle dataset paper, I believe each "node" in the dataset corresponds to one physical sensor. To create this dataset, they placed a number of seismic and acoustic sensors on the ground next to a road and then drove by with various types of military vehicles. The dataset is from 2004, so it might not be the most interesting, well-suited, or up-to-date dataset we can find.
    One positive about this dataset is that it consists of time-series data from two types of sensors, which would make FLamby more diverse...
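For concreteness, Ditto's personalization step (as described in the table above) can be sketched as follows; this is our simplification, with illustrative names and hyperparameters, not code from the paper or from FLamby:

```python
import numpy as np

def ditto_personal_step(v, w_global, grad_v, lr=0.1, lam=0.1):
    """Ditto: each client keeps a personal model v trained on its own
    loss, regularized by lam/2 * ||v - w_global||^2 towards the global
    model that all clients jointly train (e.g. with FedAvg)."""
    return v - lr * (grad_v + lam * (v - w_global))
```

The update looks like FedProx's, but the key difference is that v is never averaged across clients: the global model w_global is trained separately, and v only tracks it through the regularizer.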


jeandut avatar jeandut commented on May 28, 2024

@Grim-bot the Vehicle dataset was already mentioned in the related works in the Overleaf. We should already have enough datasets, with @pmangold adding this one, @sssilvar and @AyedSamy working on IXI, and @regloeb working on TCGA-survival.


jeandut avatar jeandut commented on May 28, 2024

ProxSkip was accepted at ICML; it might be worth implementing to get some hype, but I am closing this in the meantime.

