Mismatching evaluation code for FedKiTS19 (flamby, closed)

akash-07 commented on June 21, 2024

Mismatching evaluation code for FedKiTS19

from flamby.

Comments (4)

jeandut commented on June 21, 2024

Hello @akash-07!
For models working on data modalities that are too big to fit in RAM, we have functions that batch the inference, such as evaluate_dice_on_tests, to measure the match between prediction and ground truth at the sample level. This is also the case for Fed-LIDC. These are the functions used in the benchmark script.
I agree that it's not really clear. The metric functions also "work", but they are patch-wise.
Maybe @ErumMushtaq can provide more info?
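To illustrate why a patch-wise metric and a sample-level metric can disagree, here is a toy sketch in NumPy. It is not FLamby's code: the function names `dice`, `patchwise_dice`, and `samplewise_dice` are hypothetical, the "volume" is a tiny 1D array, and the smoothing term `eps` is an assumption made only to avoid division by zero.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice score between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def patchwise_dice(pred, target, patch_len):
    """Average of the Dice scores computed independently on each patch."""
    scores = [
        dice(pred[i:i + patch_len], target[i:i + patch_len])
        for i in range(0, len(pred), patch_len)
    ]
    return float(np.mean(scores))


def samplewise_dice(pred, target):
    """Dice score computed once over the whole sample."""
    return dice(pred, target)


# Toy sample: the foreground lies entirely in the first half.
target = np.array([1, 1, 1, 1, 0, 0, 0, 0])
# The prediction misses half the foreground and adds one false positive.
pred = np.array([1, 1, 0, 0, 0, 0, 0, 1])

patch_score = patchwise_dice(pred, target, patch_len=4)
sample_score = samplewise_dice(pred, target)

print(f"patch-wise Dice:   {patch_score:.4f}")
print(f"sample-level Dice: {sample_score:.4f}")
```

In this toy case the second patch has no foreground in the ground truth, so its single false positive drives that patch's Dice to roughly zero and pulls the patch-wise average well below the sample-level score. This is one reason numbers from a patch-wise metric may not match numbers reported at the sample level.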

jeandut commented on June 21, 2024

So, long story short, evaluate_dice_on_tests is the "true" function to use to replicate the benchmark numbers in the article. See here: https://github.com/owkin/FLamby/blob/main/flamby/benchmarks/benchmark_utils.py#:~:text=elif%20dataset_name%20%3D%3D%20%22fed_kits19,compute_ensemble_perf%20%3D%20False (lines 589 to 610), with a batch size of 2.

akash-07 commented on June 21, 2024

Thanks @jeandut, that helps!

I think most users of the repo would try to use evaluate_model_on_tests. Adding a note or some documentation on which functions to use for each dataset would be helpful.

Alternatively, fixing evaluate_model_on_tests might be the easier option.

jeandut commented on June 21, 2024

You are completely right about the lack of documentation on loss functions; I will open an issue about it.
However, the goal of FLamby is not to impose metrics or anything else on the user; it is to be a playground for FL research.
