pFedMe's Introduction

  • πŸ‘‹ Hi, I’m Dr. Canh T. Dinh (@CharlieDinh).
  • πŸ‘€ I’m interested in Federated Machine Learning, NLP, Computer Vision, and related areas.
  • 🌱 I obtained a PhD in Privacy-Preserving Machine Learning at The University of Sydney in 2023.
  • 🌱 I’m a Machine Learning Engineer at Canva.
  • πŸ’žοΈ I’m looking to collaborate on Federated Machine Learning research or any ML research.
  • πŸ“« How to reach me: [email protected].

pFedMe's People

Contributors

charliedinh, darlinghang, joshnguyen99, sshpark


pFedMe's Issues

Client's train method

Hi, I have a question about the train method in the clients. Normally, in each epoch the complete dataset is processed in batches, but I see in your code that in each epoch only a single batch is trained on. Is this intended? Best regards.
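
The contrast the question raises can be sketched as follows (a pure-Python skeleton; `step` stands in for one optimizer update and is an illustrative name, not the repository's API):

```python
def epoch_full(batches, step):
    # Conventional epoch: every batch in the dataset is used once.
    for X, y in batches:
        step(X, y)

def epoch_single_batch(batch_iter, step):
    # Pattern described in the question: one batch per "epoch".
    X, y = next(batch_iter)
    step(X, y)
```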

pFedMe Optimizer Problem

Hello,
I want to ask: why does the code differ from the algorithm?
That is, fedoptimizer.py line 64 reads:
p.data = p.data - group['lr'] * (p.grad.data + group['lamda'] * (p.data - localweight.data) + group['mu']*p.data)
while Algorithm 1 line 8 in the paper looks different.
[screenshot of Algorithm 1 omitted]

About the Hessian Approximation

Dear authors,
I have read the two implementations, pFedMe and Per-FedAvg.
One issue for me is that both implementations seem to be missing the Hessian approximation that the Per-FedAvg paper uses in the meta-update phase.
Is this critical in the Per-FedAvg and pFedMe settings?
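
To make the question concrete: for a scalar parameter, the exact MAML meta-gradient carries a Hessian factor that the first-order variant drops. A minimal sketch (names are illustrative, not from the repository):

```python
def maml_meta_grad(theta, grad_fn, hess_fn, alpha):
    # Exact MAML meta-gradient for a scalar parameter:
    #   (1 - alpha * f''(theta)) * f'(theta - alpha * f'(theta))
    # The first-order approximation keeps only the second factor,
    # i.e. it drops the (1 - alpha * f'') Hessian term.
    theta_prime = theta - alpha * grad_fn(theta)
    return (1 - alpha * hess_fn(theta)) * grad_fn(theta_prime)
```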

Is Per-FedAvg implemented properly?

In the code, when training Per-FedAvg, there are two steps, and each step samples a batch of data and performs a parameter update. But in the MAML framework, I think the first step should obtain a fast weight, and the second step should update the parameters based on the fast weight from the first step. So why do you update the parameters twice? Does the fundamental difference between Per-FedAvg and FedAvg lie in the former performing a two-step update while the latter performs a one-step update? Is this fair to FedAvg?
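
The two-step scheme the question describes can be sketched for a scalar parameter as follows (first-order MAML; `grad_fn1`/`grad_fn2` are the loss gradients on the two sampled batches and are illustrative names, not the repository's):

```python
def per_fedavg_two_steps(theta, grad_fn1, grad_fn2, alpha, beta):
    # Step 1: inner step on batch 1 yields the fast weight theta'.
    theta_fast = theta - alpha * grad_fn1(theta)
    # Step 2: meta update of the ORIGINAL theta using the gradient
    # evaluated at the fast weight (not a second plain SGD step).
    return theta - beta * grad_fn2(theta_fast)
```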

Some mistakes in generating niid mnist data

Thanks to the author for fixing some old errors in the file two months ago, but there are still some errors that need attention.

[screenshot omitted]

  1. Line 39: "l = (user * NUM_USERS + j) % 10" should be changed to "l = (user * NUM_LABELS + j) % 10". The former causes a data-allocation error: all users are assigned data with the same labels.
  2. Line 81: the code that computes "l" should match line 39.
  3. Line 86: in "if idx[l] + num_samples < len(mnist_data[l]):", the "<" should be "<=". Otherwise the last part of each label's data is never assigned to a user. (This problem appeared because the author fixed an older error on line 87, changing "mnist_data[l][idx[l]:num_samples]" to "mnist_data[l][idx[l]:idx[l]+num_samples]".)
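
The fix in item 1 can be checked with a small sketch (constants assumed to match the generator: 20 users and 2 labels per user):

```python
NUM_USERS = 20
NUM_LABELS = 2  # labels per user in the niid generator (assumed)

def labels_for_user(user, stride):
    # Labels assigned to a user when the index advances by `stride`.
    # Buggy stride NUM_USERS: every user gets the same labels (20 % 10 == 0).
    # Corrected stride NUM_LABELS: users cycle through distinct label pairs.
    return [(user * stride + j) % 10 for j in range(NUM_LABELS)]
```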

Some questions about your paper and your code

Hi, I am very interested in your work. I have a few questions.

  1. Does the model's expressive power have a great influence on pFedMe?
    You propose a personalized FL method so that clients with different data statistics can train personalized models. pFedMe sends the same parameters to each selected client at the beginning of each glob_iteration, and each client then trains its local model starting from this w. If the model has enough expressive power to fit most training data across many clients, will these clients still train genuinely personalized models? Will pFedMe still outperform FedAvg?
  2. The code
    You create the variables local_model, persionalized_model, and persionalized_model_bar for personalized FL, but it seems that persionalized_model is never used and persionalized_model_bar is just a copy of local_model. Is there anything I missed?

A question in PerAvg algorithm

Thank you for your code. I have a question about the code of the PerAvg algorithm.
When using the evaluate_one_step function (in serverperavg.py) to evaluate the performance of PerAvg, the function first executes
for c in self.users: c.train_one_step()
to train the personalized models for one step. However, train_one_step appears to use testing data to update the personalized model. Is that right?
Source code:
```
def train_one_step(self):
    self.model.train()
    # step 1
    X, y = self.get_next_test_batch()
    self.optimizer.zero_grad()
    output = self.model(X)
    loss = self.loss(output, y)
    loss.backward()
    self.optimizer.step()
    # step 2
    X, y = self.get_next_test_batch()
    self.optimizer.zero_grad()
    output = self.model(X)
    loss = self.loss(output, y)
    loss.backward()
    self.optimizer.step(beta=self.beta)
```

Looking forward to your reply! Thank you!

Something maybe wrong in data/mnist/generate_niid_20users.py

I think the code "l = (user * NUM_USERS + j) % 10" in line 39 should be modified to "l = (user * NUM_LABELS + j) % 10".
Using "user * NUM_USERS" causes all users to share data with the same labels, because user * NUM_USERS % 10 == 0 when NUM_USERS is 20.

Cifar-10 running error

When I use Cifar-10 and the NetCifar model, I get the following error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [16, 1, 2, 2], but got 3-dimensional input of size [1, 28, 28] instead.
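
The message points to two possible mismatches: the weight [16, 1, 2, 2] expects a single input channel (an MNIST-style model), and the input [1, 28, 28] is a single MNIST-shaped image without the batch dimension a Conv2d needs. A small diagnostic sketch (the helper is hypothetical, not part of the repository):

```python
def diagnose_conv2d_input(input_shape, in_channels):
    # Explain why a Conv2d forward pass would reject `input_shape`.
    if len(input_shape) != 4:
        # Conv2d wants batches (N, C, H, W); a lone image (C, H, W)
        # needs x.unsqueeze(0) or must come through a DataLoader.
        return "missing batch dimension"
    if input_shape[1] != in_channels:
        # e.g. CIFAR-10 images have 3 channels; a weight of shape
        # [out, 1, kH, kW] was built for 1-channel (MNIST) input.
        return "channel mismatch"
    return "ok"
```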

Unable to generate non-iid MNIST Data

Describe the bug
While generating the non-iid MNIST data, generate_niid_20users.py runs into an error.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'data/Mnist'
  2. Run python generate_niid_20users.py

Trace

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 10/10 [00:00<00:00, 38.74it/s]

Numb samples of each label:
 [6903, 7877, 6990, 7141, 6824, 6313, 6876, 7293, 6825, 6958]
idx 0        False
1        False
2        False
3        False
4         True
         ...  
69995    False
69996    False
69997    False
69998    False
69999    False
Name: class, Length: 70000, dtype: bool
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 20/20 [00:00<00:00, 135300.13it/s]
--------------
[0 1 2 3 4 5 6 7 8 9] [4 4 4 4 4 4 4 4 4 4]
6903
[2441, 1127, 1575, 1760]
7877
[2671, 1946, 1367, 1893]
6990
[2358, 1070, 841, 2721]
7141
[2630, 2202, 715, 1594]
6824
[1721, 1934, 1101, 2068]
6313
[2169, 1080, 1102, 1962]
6876
[2043, 1364, 1255, 2214]
7293
[2211, 2518, 598, 1966]
6825
[1506, 2480, 574, 2265]
6958
[1710, 1878, 1208, 2162]
--------------
[[2441, 1127, 1575, 1760], [2671, 1946, 1367, 1893], [2358, 1070, 841, 2721], [2630, 2202, 715, 1594], [1721, 1934, 1101, 2068], [2169, 1080, 1102, 1962], [2043, 1364, 1255, 2214], [2211, 2518, 598, 1966], [1506, 2480, 574, 2265], [1710, 1878, 1208, 2162]]
[2441, 1127, 1575, 1760]
[2671, 1946, 1367, 1893]
[2358, 1070, 841, 2721]
[2630, 2202, 715, 1594]
[1721, 1934, 1101, 2068]
[2169, 1080, 1102, 1962]
[2043, 1364, 1255, 2214]
[2211, 2518, 598, 1966]
[1506, 2480, 574, 2265]
[1710, 1878, 1208, 2162]
[2441, 1127, 1575, 1760]
[2671, 1946, 1367, 1893]
[2358, 1070, 841, 2721]
[2630, 2202, 715, 1594]
[1721, 1934, 1101, 2068]
[2169, 1080, 1102, 1962]
[2043, 1364, 1255, 2214]
[2211, 2518, 598, 1966]
[1506, 2480, 574, 2265]
[1710, 1878, 1208, 2162]
[2441, 1127, 1575, 1760]
[2671, 1946, 1367, 1893]
[2358, 1070, 841, 2721]
[2630, 2202, 715, 1594]
[1721, 1934, 1101, 2068]
[2169, 1080, 1102, 1962]
[2043, 1364, 1255, 2214]
[2211, 2518, 598, 1966]
[1506, 2480, 574, 2265]
[1710, 1878, 1208, 2162]
[2441, 1127, 1575, 1760]
[2671, 1946, 1367, 1893]
[2358, 1070, 841, 2721]
[2630, 2202, 715, 1594]
[1721, 1934, 1101, 2068]
[2169, 1080, 1102, 1962]
[2043, 1364, 1255, 2214]
[2211, 2518, 598, 1966]
[1506, 2480, 574, 2265]
[1710, 1878, 1208, 2162]
--------------
[2441, 2671, 2358, 2630, 1721, 2169, 2043, 2211, 1506, 1710, 1127, 1946, 1070, 2202, 1934, 1080, 1364, 2518, 2480, 1878, 1575, 1367, 841, 715, 1101, 1102, 1255, 598, 574, 1208, 1760, 1893, 2721, 1594, 2068, 1962, 2214, 1966, 2265, 2162]
  0%|                                                                                                                                           | 0/20 [00:00<?, ?it/s]value of L 0
value of count 0
  0%|                                                                                                                                           | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "generate_niid_20users.py", line 86, in <module>
    X[user] += mnist_data[l][idx[l]:num_samples].tolist()
  File "/Users/sharadchitlangia/miniconda3/envs/FL/lib/python3.6/site-packages/pandas/core/frame.py", line 2881, in __getitem__
    indexer = convert_to_index_sliceable(self, key)
  File "/Users/sharadchitlangia/miniconda3/envs/FL/lib/python3.6/site-packages/pandas/core/indexing.py", line 2132, in convert_to_index_sliceable
    return idx._convert_slice_indexer(key, kind="getitem")
  File "/Users/sharadchitlangia/miniconda3/envs/FL/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3159, in _convert_slice_indexer
    self._validate_indexer("slice", key.start, "getitem")
  File "/Users/sharadchitlangia/miniconda3/envs/FL/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 5000, in _validate_indexer
    self._invalid_indexer(form, key)
  File "/Users/sharadchitlangia/miniconda3/envs/FL/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3271, in _invalid_indexer
    f"cannot do {form} indexing on {type(self).__name__} with these "
TypeError: cannot do slice indexing on Int64Index with these indexers [False] of type bool_
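
From the output above, `idx` appears to be a boolean mask over all 70,000 rows, so slicing with `idx[l]` fails; the generator (in its fixed form) wants a per-label integer cursor instead. A sketch of that intended pattern (pure Python; names assumed, not the script's exact code):

```python
idx = {l: 0 for l in range(10)}  # next unread offset for each label

def take(data_for_label, l, num_samples):
    # Return the next num_samples items of label l and advance the cursor,
    # matching the fixed slice data[idx[l] : idx[l] + num_samples].
    start = idx[l]
    chunk = data_for_label[start:start + num_samples]
    idx[l] = start + num_samples
    return chunk
```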

some questions about the results

Thanks for your code! Recently I tried to run your code with the non-IID MNIST dataset, but my final results did not match those presented in your paper, and I'm wondering why: it seems that FedAvg did better than pFedAvg. I tried different models, but apparently that did not help. Maybe I need some suggestions. Sorry for bothering you!
[plots: FedAvg_pFedMe_Com_accuracy_3_18, FedAvg_pFedMe_Com_loss_3_18]

UserpFedMe class

Hi, thanks for sharing your code. I have a few questions. The first is about the model update inside the UserpFedMe class. Specifically, I don't quite understand this part:

self.update_parameters(self.local_model)

What this line does is update the personalized parameters (self.model.parameters()) to the final updated parameters of self.local_model. Can you explain this further? Maybe I have missed something, but self.model.parameters() is already updated inside the inner optimization, as done here:

p.data = p.data - group['lr'] * (p.grad.data + group['lamda'] * (p.data - localweight.data) + group['mu']*p.data)

and I don't understand why we need to set it back to the final local weights. My second question: I don't understand this line:

old_param.data = new_param.data.clone()

It makes sense to update local_param, but I don't quite get why we need to update old_param. Please correct me if I am wrong, but the only reason I can think of is that we are evaluating on the final aggregated model. My third question is here:

for user in self.users:

where we train on all users, not just the selected users. Why is that? Thanks much.
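
For context, the interaction between the personalized parameters and local_model that these questions concern can be sketched for a scalar parameter (the structure follows the pFedMe updates discussed above; the function and hyperparameter names are illustrative, not the repository's API):

```python
def pfedme_client_round(w, grad_fn, lr, lamda, eta, K):
    # Inner loop: K proximal steps give the personalized parameter theta,
    # approximately minimizing f(theta) + (lamda/2) * (theta - w)^2.
    theta = w
    for _ in range(K):
        theta = theta - lr * (grad_fn(theta) + lamda * (theta - w))
    # Local-model update: pull the local model w toward the personalized theta.
    w = w - eta * lamda * (w - theta)
    return theta, w
```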
