galavineri / isic-archive-downloader
License: Apache License 2.0
A script to download the ISIC Archive of lesion images
The following command, which should restrict the download to benign (or malignant) images, does not work:
python download_archive.py --filter benign
Getting the following error:
Traceback (most recent call last):
File "download_archive.py", line 254, in <module>
main(sys.argv[1:])
File "download_archive.py", line 248, in main
seg_skill=args.seg_skill, num_processes=args.p)
File "download_archive.py", line 36, in download_archive
descs_dir=descs_dir)
File "download_archive.py", line 138, in download_descriptions_and_filter
ImgDownloader.save_description(description, descs_dir)
AttributeError: type object 'LesionImageDownloader' has no attribute 'save_description'
The download gets stuck at 196 for me. When I set the offset to 200 and the number of images to 3000, it gets stuck at 100. I read your comment about a possible fix and tried to implement it, but it didn't work. Please fix this as soon as possible.
I am using this script for my MSc thesis (thanks!) as a subtree module:
git subtree add --prefix ISIC-Archive-Downloader git@github.com:GalAvineri/ISIC-Archive-Downloader.git master --squash;
But, of course, I don't want to version control the downloaded data. It would be wise to .gitignore the ./Data directory specifically. While you're at it, consider a good .gitignore from https://www.gitignore.io/api/vim,osx,emacs,linux,python,windows,webstorm,sublimetext,visualstudiocode
Some images have more than one segmentation available. If the first one is unavailable (marked as 'failed'), the next one could be downloaded instead, but currently it is not.
Rather than just "benign" and "malignant", I'd like to be able to filter by diagnosis.
I.e. only fetch melanoma, nevus, or seborrheic keratosis.
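A minimal sketch of what such a filter could look like, assuming each downloaded description is a dict and that the diagnosis sits under a nested key path. The path `("meta", "clinical", "diagnosis")` and the function names are assumptions for illustration, not taken from the script.

```python
# Hypothetical location of the diagnosis inside a description dict.
DIAGNOSIS_PATH = ("meta", "clinical", "diagnosis")


def get_nested(data, path, default=None):
    """Walk a chain of dict keys, returning `default` if any key is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


def filter_by_diagnosis(descriptions, wanted, path=DIAGNOSIS_PATH):
    """Keep only descriptions whose diagnosis is in `wanted` (case-insensitive)."""
    wanted = {name.lower() for name in wanted}
    return [
        desc for desc in descriptions
        if str(get_nested(desc, path, "")).lower() in wanted
    ]
```

Descriptions with no diagnosis field are dropped rather than raising, which seems the safer default for an archive with incomplete metadata.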
Premise: I am a newbie, and this is the first issue I have opened on GitHub.
I am trying to use your downloader because I need the ISIC archive to build and test GANs using the Keras and TensorFlow packages. I am using Colab to take advantage of the GPU provided by Google, and all my scripts and data are in the shared Google Drive folder Assignment_1. Here is my code:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive
!pip install -r drive/Assignment_1/requirements.txt
!python3 drive/Assignment_1/download_dataset.py 13000
Question: Where are the images stored? After I run the code I get a __pycache__ folder with some .pyc files in it, but I don't know what they are for. I tried to read through download_dataset_subset.py and download_dataset.py, but I don't understand them. When I type:
!python3 drive/Assignment_1/download_dataset.py 13000
I just can see:
Collecting the images ids
Downloading images and descriptions
43% (5635 of 13000) |########
as the first message, followed by progress updates until it reaches 100%, so I am fairly sure the download is complete.
How can I download the dataset's mask (segmentation) images?
This is what I get when I try to run the script:
File "download_archive.py", line 93
def download_descriptions(ids: list, descs_dir: str, num_processes: int) -> list:
^
SyntaxError: invalid syntax
macOS Catalina
Python 3.7.3
This error appears at runtime. Could you have a look?
Sometimes the script hangs while trying to download a specific description or image. When the user requests a specific number k of samples, it would be nice if the script skipped or retried samples that are taking too long to download.
Right now I've been waiting for a while to download 250 malignant samples because it has been stuck trying to download the 197th for a few hours.
For the record, after I gave up and killed the process, the traceback showed:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='isic-archive.com', port=443): Max retries exceeded with url: /api/v1/image/54e7ddbbbae4780ec59cde5f (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x10f3b2518>: Failed to establish a new connection: [Errno 60] Operation timed out',))
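One way to avoid an indefinite hang is to give every request a timeout and to give up on a sample after a few attempts. The sketch below only illustrates the idea and is not the script's actual code: `fetch_with_retries` and its parameters are hypothetical, and it uses the standard library's `urllib` rather than `requests` so that it stays self-contained.

```python
import time
import urllib.request


def fetch_with_retries(url, fetch=None, retries=3, timeout=30.0, backoff=1.0):
    """Return the bytes at `url`, or None if every attempt fails.

    `fetch` is an injectable callable (url, timeout) -> bytes; the default
    uses urllib with a per-request timeout so a stalled connection cannot
    block forever.
    """
    if fetch is None:
        def fetch(target, limit):
            with urllib.request.urlopen(target, timeout=limit) as response:
                return response.read()

    for attempt in range(retries):
        try:
            return fetch(url, timeout)
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            if attempt < retries - 1:
                time.sleep(backoff * 2 ** attempt)  # exponential backoff
    return None  # caller can skip this sample and move on to the next
```

A caller that wants exactly k samples could keep requesting further ids whenever this returns None, instead of waiting on the one that stalled.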
When I run "download_archive.py" I get this error:
File "download_archive.py", line 1, in <module>
from download_single_item import LesionImageDownloader as ImgDownloader, SegmentationDownloader as SegDownloader
File "download_single_item.py", line 97
img : Image.Image = Image.open(image_path)
^
SyntaxError: invalid syntax
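A SyntaxError pointing at a type annotation usually means the file is being parsed by an older interpreter: function annotations need Python 3, and variable annotations such as `img : Image.Image = ...` need Python 3.6 or later. A quick check like the following, written so that it also runs under Python 2, shows which interpreter the `python` command actually starts. The helper name `is_supported` is made up for this sketch.

```python
import sys

# Variable annotations (PEP 526) were added in Python 3.6.
MIN_VERSION = (3, 6)


def is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = "{0}.{1}".format(sys.version_info[0], sys.version_info[1])
    if is_supported():
        print("Python " + current + " can parse annotated code")
    else:
        print("Python " + current + " is too old; try running with python3")
```

If this reports an old version, invoking the script with `python3` instead of `python` is the first thing to try.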
It would be helpful if you could add a requirements.txt. :)
Thank you for creating the script.
I tried running the script with all the initial requirements installed, but I get the error below:
C:\Users\Supriya Singh\Documents\3rdSEM\ISIC-Archive-Downloader-master>python download_archive.py --filter benign
Traceback (most recent call last):
File "download_archive.py", line 1, in <module>
from download_single_item import LesionImageDownloader as ImgDownloader, SegmentationDownloader as SegDownloader
File "C:\Users\Supriya Singh\Documents\3rdSEM\ISIC-Archive-Downloader-master\download_single_item.py", line 8, in <module>
from PIL import Image
File "C:\Users\Supriya Singh\Anaconda3\lib\site-packages\PIL\Image.py", line 64, in <module>
from . import _imaging as core
ImportError: DLL load failed: The specified module could not be found.
C:\Users\Supriya Singh\Documents\3rdSEM\ISIC-Archive-Downloader-master>
Could you please help to find the cause?
First of all, thanks for sharing this great repo. It makes downloading the dataset much easier.
When I try to download the segmentation maps with the code, it always gets stuck at 51%. The number of downloaded segmentation maps is always 13779. I'm wondering whether that's because the annotations are incomplete or because I'm missing something.
Right now, if you want to download k samples of each class (malignant and benign), you have to manually download the malignant ones first:
python download_archive.py --num-images k --filter malignant
And then in a separate directory download the benigns
python download_archive.py --num-images k --filter benign
Otherwise you'd overwrite some of the images. And because some images will have the same filenames, you have to do some preprocessing to rename them all consistently before merging them together.
It would be nice if the script was able to do this in one go.
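Until the script supports this directly, the manual merge step can be sketched as follows, assuming each class was downloaded into its own directory. `merge_class_dirs` and the label-prefix naming scheme are illustrative choices, not part of the downloader.

```python
import shutil
from pathlib import Path


def merge_class_dirs(class_dirs, merged_dir):
    """Copy files from per-class directories into one directory.

    `class_dirs` maps a label (e.g. "benign") to its download directory.
    Each copied file gets the label as a filename prefix, so files that
    share a name across classes cannot overwrite each other.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = []
    for label, source in class_dirs.items():
        for path in sorted(Path(source).iterdir()):
            if path.is_file():
                target = merged / "{0}_{1}".format(label, path.name)
                shutil.copy2(path, target)
                copied.append(target.name)
    return copied
```

Prefixing with the label also keeps the class recoverable from the filename after the merge.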
Is it possible to use the code you provided to perform a selective download like only benign or only malignant images?
After downloading the image descriptions with --filter benign, the script reports that it will start downloading "None" images, even though it then downloads the correct number of images.
File "download_archive.py", line 84
def download_descriptions(ids: list, descs_dir: str, num_processes: int) -> list:
^
SyntaxError: invalid syntax
Hey, do you think it's possible for you to implement a feature where we can choose what dataset we want to download from? There are many datasets within the ISIC Archive, but I just want the HAM10000. Do you think that's possible?
Thanks for your help!
Under "Optional download abilities":
Fix the readme.md
Some images have multiple segmentation masks available.
As far as I can tell, they differ in the skill level of the annotator.
Currently the system just chooses one of the masks, without considering the skill level.
It would be preferable to have an option to choose the mask with the highest skill level available.
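A rough sketch of that selection, assuming each segmentation record is a dict with a `skill` string and a `failed` flag. Both field names and the ranking of `"novice"` below `"expert"` are assumptions made for this example.

```python
# Assumed ordering of skill levels; unknown values rank lowest.
SKILL_RANK = {"novice": 0, "expert": 1}


def pick_best_segmentation(segmentations):
    """Return the non-failed segmentation with the highest skill, or None."""
    usable = [seg for seg in segmentations if not seg.get("failed", False)]
    if not usable:
        return None
    # max() keeps the first record among equally ranked candidates.
    return max(usable, key=lambda seg: SKILL_RANK.get(seg.get("skill"), -1))
```

Because failed masks are filtered out before ranking, the same helper also falls back to the next available mask when the first one is marked as failed.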
Hello, I installed all the requirements, and when I try to start the script (python download_archive.py or python3.6 download_archive.py) I get:
File "download_archive.py", line 93
def download_descriptions(ids: list, descs_dir: str, num_processes: int) -> list:
^
SyntaxError: invalid syntax
Could you please help to find the cause?
Hello, I'm getting this error while trying to run
python download_archive.py --num_images 1000
I'm a bit new to downloading and loading datasets, so... sorry if this is too basic... :/
File "download_archive.py", line 86
def download_descriptions(ids: list, descs_dir: str, num_processes: int) -> list:
^
SyntaxError: invalid syntax
I updated pip and pip3 and already checked that Requests, Pillow and tqdm are installed.
Any idea? Or am I missing something...
Nice work tho! :) Pretty much what I was looking for!
Hi,
Is it possible (or can it be added) to download images filtered by lesion type (e.g. melanoma) just like filtering by malignancy?
Thanks
I am running the new code with Python 2.7.12 on Linux and I get the following error. In addition, no images or descriptions were downloaded:
$ python download_dataset.py
Collecting all images ids
Thread 0 started
Thread 1 started
downloading image (0/29)
downloading image (30/39)
url_image = https://isic-archive.com/api/v1/image/5436e3abbae478396759f0cf/download?contentDisposition=inline
...
downloading image (27/29)
url_image = https://isic-archive.com/api/v1/image/5436e3aebae478396759f105/download?contentDisposition=inline
downloading image (28/29)
url_image = https://isic-archive.com/api/v1/image/5436e3aebae478396759f107/download?contentDisposition=inline
Traceback (most recent call last):
File "download_dataset.py", line 95, in <module>
download_dataset()
File "download_dataset.py", line 71, in download_dataset
print('Thread {0} finished'.format(thread._Thread__kwargs['thread_id']))
AttributeError: 'Thread' object has no attribute '_Thread__kwargs'
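`_Thread__kwargs` is a private, name-mangled attribute of `threading.Thread`, so it is not guaranteed to exist on every Python version or after the thread has finished. A more portable approach is to give each thread a public `name` and report that instead. The following is only a sketch; `run_workers` is a hypothetical helper, not the script's function.

```python
import threading


def run_workers(num_threads, work):
    """Start `num_threads` threads running `work(thread_id=i)` and wait for them.

    Each thread is identified through its public `name` attribute rather
    than through private attributes of the Thread object.
    """
    threads = [
        threading.Thread(
            target=work,
            kwargs={"thread_id": i},
            name="Thread {0}".format(i),
        )
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()

    finished = []
    for thread in threads:
        thread.join()
        finished.append(thread.name)
        print("{0} finished".format(thread.name))
    return finished
```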
Let's say I have already downloaded 1000 images and now I want 3000.
Currently I have to download the first 1000 images again. Instead, there should be one more parameter in python download_dataset 3000
that specifies the offset from which the download should start.
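One possible shape for such a parameter, sketched with `argparse`. The `--offset` flag, `parse_args`, and `select_ids` are hypothetical names, and here `num_images` is read as the number of images to fetch starting at the offset, which is one of several reasonable interpretations.

```python
import argparse


def parse_args(argv):
    """Parse a positional image count and an optional --offset."""
    parser = argparse.ArgumentParser(description="Download a slice of the archive")
    parser.add_argument("num_images", type=int, help="how many images to download")
    parser.add_argument("--offset", type=int, default=0,
                        help="index of the first image to download")
    return parser.parse_args(argv)


def select_ids(all_ids, num_images, offset=0):
    """Return the `num_images` ids that start at position `offset`."""
    return all_ids[offset:offset + num_images]
```

With this reading, someone who already has the first 1000 images would pass a count of 2000 with `--offset 1000` to end up with 3000 in total.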
It would be nice if this script came bundled with a utility to filter out invalid data like https://github.com/GalAvineri/MelMedic/blob/master/data/format_data.py#L16
The other utilities are nice too but too specific to your particular project (e.g. I might prefer a different validation scheme) whereas surely everyone wants to filter out invalid data.