Comments (8)
Thanks @philipco for looking into this issue. That looks like a download error. The fact that it is not handled means the logic needs to be improved to check the integrity of all images. We will look into that next week!
from flamby.
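For readers hitting the same problem, here is a minimal sketch of what such an integrity check could look like. It is not FLamby's code: the function names, the `*.jpg` pattern, and the directory argument are assumptions made for illustration. It relies only on the fact that a complete JPEG file starts with the SOI marker (FF D8) and normally ends with the EOI marker (FF D9), so it is a cheap heuristic for spotting truncated downloads rather than a full decode.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_complete_jpeg(path):
    """Heuristic check: the file starts with SOI and ends with EOI."""
    path = Path(path)
    if path.stat().st_size < 4:
        return False
    with path.open("rb") as f:
        head = f.read(2)
        f.seek(-2, 2)  # jump to 2 bytes before the end of the file
        tail = f.read(2)
    return head == JPEG_SOI and tail == JPEG_EOI


def find_suspect_images(image_dir, pattern="*.jpg"):
    """Return the sorted paths in image_dir that fail the heuristic."""
    return sorted(
        p for p in Path(image_dir).glob(pattern) if not looks_complete_jpeg(p)
    )
```

Any file reported by `find_suspect_images` is a candidate for re-download. A file that passes can still be corrupt in the middle, and some valid files carry extra bytes after EOI, so a full decode remains the stronger test.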
@philipco I cannot reproduce. Do you have enough space to hold the data?
from flamby.
I have tested commit b9f26aacab7383daff2c0a77504a3c11cdf570a0 with a fresh install.
from flamby.
Here is a dump of my environment:
absl-py==1.0.0
albumentations==1.1.0
astor==0.8.1
attrs==21.4.0
autograd==1.4
autograd-gamma==0.5.0
cachetools==5.0.0
certifi==2021.10.8
cfgv==3.3.1
charset-normalizer==2.0.12
cloudpickle==2.0.0
cycler==0.11.0
dask==2022.5.0
dicom-numpy==0.6.2
distlib==0.3.4
efficientnet-pytorch==0.7.1
filelock==3.6.0
-e git+https://github.com/owkin/FLamby.git@b9f26aacab7383daff2c0a77504a3c11cdf570a0#egg=flamby
fonttools==4.33.3
formulaic==0.3.4
fsspec==2022.3.0
future==0.18.2
google-api-core==2.7.3
google-api-python-client==2.47.0
google-auth==2.6.6
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
googleapis-common-protos==1.56.0
grpcio==1.46.0
histolab==0.5.1
httplib2==0.20.4
identify==2.5.0
idna==3.3
imageio==2.19.1
importlib-metadata==4.11.3
iniconfig==1.1.1
interface-meta==1.3.0
joblib==1.1.0
kiwisolver==1.4.2
large-image==1.14.3
large-image-source-openslide==1.14.3
lifelines==0.27.0
locket==1.0.0
Markdown==3.3.7
matplotlib==3.5.2
networkx==2.8
nibabel==3.2.2
nodeenv==1.6.0
numpy==1.22.3
oauth2client==4.1.3
oauthlib==3.2.0
opencv-python-headless==4.5.5.64
openslide-python==1.1.2
packaging==21.3
palettable==3.3.0
pandas==1.4.2
partd==1.2.0
Pillow==9.1.0
platformdirs==2.5.2
pluggy==1.0.0
pre-commit==2.19.0
protobuf==3.20.1
psutil==5.9.0
py==1.11.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pydicom==2.3.0
PyDrive==1.3.1
pyparsing==3.0.8
pytest==7.1.2
python-dateutil==2.8.2
pytz==2022.1
PyWavelets==1.3.0
PyYAML==6.0
qudida==0.0.4
requests==2.27.1
requests-oauthlib==1.3.1
rsa==4.8
scikit-image==0.19.2
scikit-learn==1.0.2
scipy==1.8.0
six==1.16.0
tensorboard==2.9.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
threadpoolctl==3.1.0
tifffile==2022.5.4
tifftools==1.3.4
toml==0.10.2
tomli==2.0.1
toolz==0.11.2
torch==1.11.0
torchvision==0.12.0
tqdm==4.64.0
typing_extensions==4.2.0
uritemplate==4.1.1
urllib3==1.26.9
virtualenv==20.14.1
Werkzeug==2.1.2
wget==3.2
wrapt==1.14.1
zipp==3.8.0
Pillow is 9.1.0. Can you check whether you can visualize/open the image in order to narrow down the issue? Maybe you can retry the download?
from flamby.
Can you inspect the image visually by running the following?
from PIL import ImageFile, Image
ImageFile.LOAD_TRUNCATED_IMAGES = True
im = Image.open(path_to_faulty_image)
from flamby.
Hello Jean,
I retried the download. As before, I got the final validation message saying that the process is complete. But this time, the command python dataset_creation_scripts/resize_images.py ran correctly. Thus, I think that during my first try there was an unraised download error.
Furthermore, I was wondering what sizes the data is supposed to have after the script. Indeed, I observe that the centers might have different sizes:
Center 0: torch.Size([9930, 3, 224, 224])
Center 1: torch.Size([3163, 3, 224, 298])
Center 2: torch.Size([2691, 3, 224, 298])
Center 3: torch.Size([1807, 3, 224, 298])
Center 4: RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 337] at entry 0 and [3, 224, 334] at entry 3
Center 5: torch.Size([351, 3, 224, 298])
In fed_isic2019/benchmark.py, I see that there is a cropping step: albumentations.RandomCrop(sz, sz) with sz = 200. I guess that it is mandatory to load all the pictures with this cropping to get a size of [3, 200, 200]?
I would mention in the README the necessity of cropping the images before loading them. May I also ask why you are doing a RandomCrop and not a CenterCrop?
Cheers.
from flamby.
The preprocessing step fixes the image width to 224, as you can see, while keeping the aspect ratio intact (no hard resizing, which would impact the shape of the naevi).
You are right, we need a better default for the transform used in the dataset, so that it crops images by default. I'll open an issue about that.
RandomCrop is the data augmentation version of CenterCrop: it introduces more variability into the training images.
from flamby.
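To make the difference concrete, here is a small pure-Python sketch of how the two crop windows can be chosen. This is not the albumentations implementation; the helper names are hypothetical and the image is a plain list of rows. It only illustrates that both strategies turn 224 x W inputs (W = 298, 334, 337 in the sizes reported above) into identical 200 x 200 outputs, and that the random variant changes the window from call to call.

```python
import random


def center_crop_box(height, width, size):
    """Return (top, left) of the centered size x size window."""
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    return (height - size) // 2, (width - size) // 2


def random_crop_box(height, width, size, rng=random):
    """Return (top, left) of a uniformly drawn size x size window."""
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    return rng.randint(0, height - size), rng.randint(0, width - size)


def crop(image, top, left, size):
    """Crop a 2D image, given as a list of rows, to size x size."""
    return [row[left:left + size] for row in image[top:top + size]]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for height, width in [(224, 298), (224, 337), (224, 334)]:
        image = [[0] * width for _ in range(height)]
        for top, left in (
            center_crop_box(height, width, 200),
            random_crop_box(height, width, 200, rng),
        ):
            patch = crop(image, top, left, 200)
            print((height, width), (top, left), (len(patch), len(patch[0])))
```

Once every sample has the same spatial size, stacking them into a batch no longer fails. The center window is deterministic, which is why it is the usual choice at evaluation time, while the random window acts as augmentation during training.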
Thanks Jean! I think it is worth adding these details to the ISIC README.
from flamby.