
Comments (19)

unixpickle commented on July 21, 2024

I used a single 1080 Ti for most of the experiments. For all the benchmarks, training takes less than a day. The exact time depends on the hyper-parameters and dataset you use.

from supervised-reptile.

unixpickle commented on July 21, 2024

Thanks for reporting this. Are some of the ImageNet images valid? If so, is there a general pattern as to which ones are empty?

I'm not sure what could cause this. Perhaps the ImageNet server is doing some kind of rate-limiting. If so, it may be possible to modify the script to detect this and print an error.

I'd expect Omniglot to be fine, since the Omniglot download process is much simpler than the one for Mini-ImageNet.
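The error-detection idea above could be sketched like this (hypothetical code, not part of the repo's fetch script; the function names are made up). The heuristic is that a throttling server typically returns a 429/503 status or an empty body, which would otherwise get saved as an empty image file:

```python
# Hypothetical sketch: after each download, check for signs of
# rate-limiting or an empty response, so the script reports an error
# instead of silently writing empty image files.

def looks_rate_limited(status_code, body):
    """Heuristic: HTTP 429/503 or an empty body suggests throttling."""
    return status_code in (429, 503) or len(body) == 0

def check_download(status_code, body, name):
    """Raise a clear error instead of saving a suspicious download."""
    if looks_rate_limited(status_code, body):
        raise RuntimeError("possible rate-limiting while fetching %s "
                           "(status %d, %d bytes)"
                           % (name, status_code, len(body)))
    return body
```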

nattari commented on July 21, 2024

While trying to download the images from the list of ImageNet URLs, I found that some of the image IDs do not exist. Is there a particular reason for that? For example, in the test data, n01930112_10035 is not in the list. I used the "List of all image URLs of Fall 2011 Release".

TIA

unixpickle commented on July 21, 2024

@nattari some of the images are not in the 2011 release, since the dataset is from the 2012 release. That's why the download script extracts files from the 2012 tar file. If there is a better API for getting 2012 images, let me know.

nattari commented on July 21, 2024

Hmm, I am using the images from the 2011 release at the moment. Since I am more interested in understanding the algorithm, I guess that would work too. If I find a better source, I will definitely share it.

Could you please tell me what GPU configuration you used for training Mini-ImageNet, and how long it takes to train?

nattari commented on July 21, 2024

I started training on some other ImageNet data. Everything works fine, but I get this warning: "Possibly corrupt EXIF file". Training gets stuck after some iterations. Do you have any clue what the problem could be? The only thing I changed is the data. Is it due to the warning?

unixpickle commented on July 21, 2024

Is it possible that some class directories are empty or don't contain enough samples? I think the training loop can hang if there aren't enough samples to create a mini-batch, since the sampler keeps looping over the data forever trying to fill a whole mini-batch.
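A minimal sketch of this failure mode, assuming a typical N-way, K-shot episode sampler (the function below is hypothetical, not the repo's actual code): if any sampled class has fewer than `num_shots` examples, an unguarded sampler retries forever, so checking up front turns the hang into a clear error.

```python
import random

def sample_episode(class_to_files, num_classes, num_shots):
    """Sample an N-way, K-shot episode: num_classes classes,
    num_shots distinct examples per class."""
    classes = random.sample(sorted(class_to_files), num_classes)
    for c in classes:
        # Guard against the infinite-retry hang: a class with too few
        # samples can never fill its slot, so fail loudly instead.
        if len(class_to_files[c]) < num_shots:
            raise ValueError("class %r has only %d samples, need %d"
                             % (c, len(class_to_files[c]), num_shots))
    return {c: random.sample(class_to_files[c], num_shots)
            for c in classes}
```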

nattari commented on July 21, 2024

I thought about that and made sure that all the class directories contain enough samples, so that doesn't seem to be the problem. What I suspect at the moment is the warning, but I'm not sure.

unixpickle commented on July 21, 2024

Huh, interesting. It would be nice to know where the program is stuck. When you kill the process, does Python print out a stack trace? If not (e.g. if the hang is inside the TF graph), maybe it will be helpful to attach a debugger to the process and look at a backtrace that way.
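One standard way to get a stack trace out of a hung Python process (a general technique, not something the repo sets up) is the stdlib `faulthandler` module, enabled early in the training script:

```python
# Enable stack-dumping before training starts. Later, from another
# terminal, `kill -USR1 <pid>` prints every Python thread's stack to
# stderr without killing the process.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1)  # dump all thread stacks on SIGUSR1
faulthandler.enable()                  # also dump on hard crashes
```

If the hang is inside native TF ops, the Python stacks will only show the blocked `session.run` call; attaching `gdb -p <pid>` and running `thread apply all bt` shows the C-level stacks instead.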

nattari commented on July 21, 2024

I believe it is stuck in the TF graph; yes, I am trying to debug it now. But here is a screenshot in case you spot something fishy:
[screenshot from 2018-06-12 15-25-01]

nattari commented on July 21, 2024

I am observing very low GPU utilization for both Omniglot and Mini-ImageNet (~2-3% or even less). This shouldn't be the case, should it?
Also, I am using a 1080 Ti, and for Mini-ImageNet it only uses ~500 MB of memory, which doesn't change even if I change the batch size. Can you provide insight into this behaviour?
(I am using Python 3.6, TensorFlow 1.8 and CUDA 9.0)

TIA.

unixpickle commented on July 21, 2024

@nattari at first, things will be slow because the training pipeline is still loading the images into memory and resizing them on the fly. After training has run for a little while, the images will all be cached in memory, and you should start to see higher GPU utilization.
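The warm-up behavior described here can be sketched as a lazy cache (hypothetical code, not the repo's actual pipeline): the first access to each image pays the load-and-resize cost, and later epochs hit the in-memory copy, which is when GPU utilization rises.

```python
class ImageCache:
    """Cache loaded images in memory after the first (slow) access."""

    def __init__(self, loader):
        self._loader = loader  # e.g. a function that reads and resizes a file
        self._cache = {}
        self.misses = 0        # count of slow-path loads, for illustration

    def get(self, path):
        if path not in self._cache:
            self.misses += 1                   # slow path: disk + resize
            self._cache[path] = self._loader(path)
        return self._cache[path]               # fast path afterwards
```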

unixpickle commented on July 21, 2024

As for memory, I'm not entirely sure. If you're referring to GPU memory, I think TensorFlow allocates blocks of memory at once, so you might not see subtle changes. If Python memory, then this is expected, since Python's memory usage will be dominated by loading and caching images.
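For reference, TF 1.x (the version mentioned in this thread) can be asked to allocate GPU memory on demand rather than in one up-front block; this is a standard config fragment, not something the repo does by default, and it explains why the number reported by nvidia-smi barely moves with batch size under the default allocator:

```python
import tensorflow as tf

# By default TF 1.x reserves a large block of GPU memory at session
# creation. allow_growth makes the allocation start small and grow on
# demand, so usage tracks the actual workload more closely.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```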

lampardwk commented on July 21, 2024

@unixpickle I had the same problem with the incomplete Mini-ImageNet data downloaded via fetch_data.sh; most of the folders contain empty images. Could you send me a complete dataset? My email address is [email protected], thanks.

Liuyubao commented on July 21, 2024

@unixpickle Sorry to bother you, but I had the same problem with the incomplete Mini-ImageNet data downloaded via fetch_data.sh; most of the folders contain empty images. Could you also send me a complete dataset? My email address is [email protected], thanks a lot for your time and patience.

eghouti commented on July 21, 2024

Hello @unixpickle,

First, I would like to thank you for your excellent work, which helps me a lot in my research. I would like to ask whether I could have the Mini-ImageNet dataset you used to run these experiments. My email address is [email protected]

Best regards,

Ghouthi

ligeng0197 commented on July 21, 2024

Hi @unixpickle .

I find that the Mini-ImageNet source URL in the fetch script is no longer valid, so I looked for another source on the ImageNet website and did find one (http://www.image-net.org/challenges/LSVRC/2012/dd31405981ef5f776aa17412e1f0c112/ILSVRC2012_img_train.tar).
However, after replacing the URL in the fetch script and downloading the images, I ran into the empty-images problem mentioned by others. I got 13 empty images across the train and val datasets and decided to replace them manually with images of the same objects. Unfortunately, after replacing them I still get stuck on (OSError: image file is truncated (26 bytes not processed)) during training. I believe it is caused by some incomplete images in the train dataset, but I am getting tired of fixing them by hand. So would you mind sharing the Mini-ImageNet dataset on Google Drive or somewhere else we can download it directly? Thanks ahead.

P.S. While replacing the empty images, I found that the Mini-ImageNet used here is a little different from the one used in pytorch-MAML (https://github.com/dragen1860/MAML-Pytorch).
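To avoid fixing files one by one, a small pre-training scan can flag empty or unreadable images up front (a hypothetical helper, not part of the repo; passing a PIL-based `verify` callback such as `lambda p: Image.open(p).verify()` would also catch truncated files):

```python
import os

def find_bad_images(root, verify=None):
    """Return paths under `root` that are zero bytes, or that fail the
    optional `verify(path)` callback (e.g. a PIL-based check)."""
    bad = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                bad.append(path)          # empty download
            elif verify is not None:
                try:
                    verify(path)          # e.g. decode check for truncation
                except Exception:
                    bad.append(path)
    return bad
```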

asd81310 commented on July 21, 2024

Did anyone get the correct Mini-ImageNet for these experiments? If you did, could you share the dataset with me? Thanks very much for your help. My email address is [email protected].

XA23i commented on July 21, 2024

You can follow the instructions at this link: https://github.com/dragen1860/MAML-Pytorch.
Then modify miniimagenet.py line 53:
names = [f for f in os.listdir(self.dir_path) if f.endswith('.jpg')]
