alibaba-miil / imagenet21k
Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses" (NeurIPS, 2021) paper
License: MIT License
I'm currently working on a fine-grained object detection task; is it convenient to transfer this model to object detection?
Thanks
After pretraining on the 21k data with single labels, what is the fine-tuning strategy on the 1k data? E.g. MobilenetV3_large_100
Hi,
Thank you for your great work! I am currently trying to run your code with a larger batch size to better parallelize it. Could you provide some hints on how to adjust the learning rate for the larger batch size? I noticed that in the DeiT paper, the authors propose to scale the learning rate proportionally with the batch size; should I also do such scaling?
Thank you for your time!
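For reference, the linear scaling rule the DeiT paper follows (originally from Goyal et al., "Accurate, Large Minibatch SGD") can be sketched as below. Whether it transfers to this repo's Adam-based setup is an assumption on the questioner's part, not something the authors confirm here:

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: grow the learning rate in proportion to batch size."""
    return base_lr * new_batch_size / base_batch_size

# e.g. the repo's default 3e-4 at batch size 1024, scaled up to batch size 4096:
scaled = scale_lr(3e-4, 1024, 4096)
```

In practice this rule is usually paired with a warmup phase, which the repo's training scripts already include.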
I use the vit_base_patch16_224_in21k model. When I train with the pretrained model, it reaches a high top-1 average, about 99%. But when I use the same (training) data to predict, the results are very bad. I don't know why; can you give me some ideas?
Hi,
great work, I am curious: could you please point out the precise definition of the mAP metric used to evaluate the experiments on the MSCOCO dataset in the paper?
Thanks!
Why do we need to load pretrained ImageNet-1K model if we are going to pretrain based on the imagenet21k dataset?
In order to verify the effectiveness of ImageNet-21k pretraining, I modified the Faster RCNN configuration file in mmdetection, only pointing the backbone ResNet50 pretraining weights to the 21k-trained model, but achieved worse results than with the torchvision pretrained weights.
@mrT23 Dear sir, when I use the pretrained ImageNet-21k model and set lr = 0.01 or 0.1 rather than the default value of 0.0003, the accuracy is very low (e.g. 0.17%), even after training for dozens of epochs.
If I use the SGD optimizer with lr = 0.01 or 0.1, the result on my own dataset is about 0.05% better than Adam with lr = 0.0003. How should I choose the best optimizer and the corresponding lr? Thank you very much!
After reading all the issues, I still don't understand how to build the tree...
I want to know how to get the multi-label information for each image. winter21_whole.tar.gz doesn't contain this information.
I'm very interested in this awesome work! Do you try your pretrained model on other tasks like COCO detection or cityscapes segmentation? The performance on the classification tasks is amazing. I wonder whether it can improve other tasks.
Could you provide the training parameters when using 8 * V100 with DDP?
When I use the following command line
python3 -u -m torch.distributed.launch --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_port 2221 train_semantic_softmax.py --data_path /data/imagenet22kp_fall/ --model_name mobilenetv3_large_100 --epochs 80 --weight_decay 1e-4 --batch_size 1024 --lr 3e-4 --num_classes 11221 --tree_path ./data/imagenet21k_miil_tree.pth --model_path=./mobilenetv3_large_100.pth
The accuracy is only 71.366%, which is far lower than 73.1% reported in the paper.
Hi, thanks very much for sharing the code! I have read the code and found that the current Adam optimizer is not the AdamW (true weight decay) you mention in Appendix B.1 of your paper. Am I getting something wrong here?
"ImageNet-21K-P processed dataset, based on ImageNet-21K winter release, is now available for easy downloading via the offical ImageNet site."
No responses or permission granted since last year. Need help
Thank you for this awesome work! I want to make sure that you only rescale the input image to 0-1 instead of normalizing them by (img-mean)/std like we usually do in ImageNet training. Is that right?
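The two preprocessing conventions being contrasted can be sketched as follows; the mean/std values are the common torchvision ImageNet statistics, used here only for illustration and not taken from this repo:

```python
# Sketch: plain 0-1 rescaling vs. the usual (img - mean) / std normalization.

def rescale(pixel: int) -> float:
    """Map a uint8 pixel value into [0, 1]."""
    return pixel / 255.0

def normalize(value: float, mean: float = 0.485, std: float = 0.229) -> float:
    """Standard per-channel normalization, as typically done for ImageNet."""
    return (value - mean) / std
```

The question is whether the repo's pipeline stops after `rescale`, skipping `normalize` entirely.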
Hi, I have seen that you updated the single-label pretraining script on in21k. This is really great work. I have a question about pretraining ViT: I see the config for tresnet_m; do you have the configs for vit-b-16, or is it actually the same?
Cheers,
ImageNet21K/src_files/semantic/metrics.py
Line 41 in 7cb9e67
I got a casting error because of these two lines of code: num_valids_total is an int, and the in-place division
rt /= n
raises:
RuntimeError: result type Float can't be cast to the desired output type Long
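A minimal reproduction and a possible fix (a sketch, assuming the accumulator is a Long tensor as the error message suggests): in-place true division of an integer tensor fails, while out-of-place division promotes to float:

```python
import torch

# Integer accumulator, standing in for num_valids_total (assumed Long dtype).
rt = torch.tensor([4, 8], dtype=torch.long)
n = 2

# rt /= n  # raises: RuntimeError: result type Float can't be cast to ... Long

# Out-of-place division returns a new float tensor instead of writing in place:
rt = rt / n
# equivalently: rt = rt.float() / n
```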
Hi!
Thanks for the great repo!
I'm using timm to load your pretrained models.
The "_in21k" variants (e.g. tresnet_m_miil_in21k) output 11221 classes in the final layer,
which seems to be the number of classes in the Fall 11 version.
Where can I find the mapping from the model output to the class names/wordnet IDs?
Hi, in train_semantic_softmax.py, line 60 calls model = to_ddp(model, args) for the second time (the same call already appears on line 54). I think this line should be removed.
Hi,
Can you please share a link to the weights file of the model trained on the Stanford Cars dataset? I am unable to get the expected results using https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/tresnet_l_v2_miil_21k.pth
Please advise.
Thanks,
In the vit paper, it says:
The classification head is implemented by a MLP with one hidden layer at pre-training
time and by a single linear layer at fine-tuning time
So if you are using timm package, they define the head like this:
https://github.com/rwightman/pytorch-image-models/blob/d3f744065088ca9b6b3a0f968c70e90ed37de75b/timm/models/vision_transformer.py#L293
Did you reach the stats in your paper using a single linear-layer head, or a head with one hidden layer?
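To make the two variants in question concrete, here is a sketch with assumed ViT-B/16 dimensions (not this repo's actual code):

```python
import torch
import torch.nn as nn

embed_dim, num_classes = 768, 1000  # assumed ViT-B/16 sizes, illustrative only

# Pre-training head as described in the ViT paper: MLP with one hidden layer.
pretrain_head = nn.Sequential(
    nn.Linear(embed_dim, embed_dim),
    nn.Tanh(),
    nn.Linear(embed_dim, num_classes),
)

# Fine-tuning head: a single linear layer, as in timm's vision_transformer.py.
finetune_head = nn.Linear(embed_dim, num_classes)

x = torch.randn(2, embed_dim)  # a dummy batch of [CLS] token embeddings
out_pre = pretrain_head(x)
out_fit = finetune_head(x)
```

Both heads map the embedding to class logits; they differ only in the extra hidden layer used at pre-training time.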
1 machine, 2 GPUs.
I can't use the command line to start.
How do I start DDP from within the code?
I can't find where the model is saved; how do I save it with DDP?
Hello, thanks for your great work.
I am confused about the different versions of the dataset. As far as I understand, we have the following versions:
- fall11_whole.tar, size 1.3T, 21841 classes
- ImageNet-21K-P based on the fall11 release, obtained via processing_script.sh, 11221 classes
- winter21_whole.tar.gz, size 1.1T, 19167 classes
- ImageNet-21K-P based on the winter21 release, possibly obtainable via processing_script.sh, 10450 classes
My question is: can I use processing_script.sh to process winter21_whole.tar.gz and get the so-called variant with 10450 classes?
As is evident, the SE module can dramatically improve the accuracy of networks such as ResNet. Could you please also provide some SE pretrained models,
like SE-ResNet101/SE-ResNet50... on this page?
https://github.com/Cadene/pretrained-models.pytorch#torchvision
Could you provide the ResNet50 model (pretrained on 21K) fine-tuned on the ILSVRC-2012 dataset?
Dear author:
Thank you for your awesome idea!
I have three questions:
1. If the label of a parent node does not appear in the original label set, how do you deal with it?
2. When I use WordNet hypernyms(), it sometimes returns multiple parents for one synset id; how do you deal with that?
3. How do you set the 0-11 levels? If you just use the number of ancestors, a node with two parents will obviously get the wrong level; besides, the labels may overlap across different levels. How do you get around this?
Looking forward to your reply
Your code is very nice. I want to ask: how do I get the labels and classes for the images/folders?
Hi @mrT23, thanks for your great work! Currently I use timm train.py to finetune the 'vit_base_patch16_224_miil_in21k' model on cifar100, however I can't get the reported result 94.2%.
Here is my running script.
python -m torch.distributed.launch --nproc_per_node=8 --master_port 6016 train.py \
/data/cifar-100-images/ \
-b=64 \
--img-size=224 \
--epochs=50 \
--color-jitter=0 \
--amp \
--lr=2e-4 \
--sched='cosine' \
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
--min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
--model=vit_base_patch16_224_miil_in21k \
--pretrained \
--num-classes=100 \
--opt=adamw --weight-decay=1e-4 \
--checkpoint-hist=1
I tried several settings:
adam, lr=2e-4, wd=1e-4 92.44%
adamw, lr=2e-4, wd=1e-4 92.90%
sgd, lr=2e-4, wd=1e-4 87.49%
adamw, lr=4e-4, wd=1e-4 92.24%
adamw, lr=2e-4, wd=1e-2 93.08%
Could you give me some suggestions?
Have the ImageNet-1K validation data and ImageNet-21K training data been de-duplicated?
The latest version timm can't find "vit_base_patch16_224_in21k_miil"
Hi, I'm impressed by your great work! I'm wondering would it be possible for you to release the model checkpoints after you trained from the pre-trained ImageNet-21k-P weights by your semantic KD on ImageNet-1k, which are the models in your Table6. I believe it would be more convincing if others could directly test your model on ImageNet-1k.
HotelID is a dataset that has similar hierarchical architecture as Imagenet21k.
https://arxiv.org/abs/2106.05746
So I want to try to use this project to train on that dataset.
The question is that in this case, each chain_id has many sub-hotels, which means the parent of the chain_id class is itself when generating the 'semantic tree'.
Do you think it can give reasonable results?
I want to speed up tresnet_m a step further. If I modify the bottleneck structure "[3,4,11,3]" to "[3,4,8,3]", the network still works. Did you compare the modified "[3,4,8,3]" or "[3,4,6,3]" to the standard ResNet50 "[3,4,6,3]" in terms of accuracy and speed? Thank you a lot.
ImageNet21K/visualize_detector.py
Line 25 in 72c822a
error:
Unknown model (vit_base_patch16_224_miil_in21k)
How can I run it with my own weights?
I tried to run visualize_detector.py, but there is an HTTP error:
urllib.request.urlretrieve(url, filename)
Exception has occurred: HTTPError
HTTP Error 403: Forbidden
How can this be solved?
Dear author :
When I test the ResNet50 model you provide on the ImageNet-21K-P val dataset, the semantic top-1 accuracy is just 69%, while you claim 75.6%. I do, however, see the improvement on downstream tasks. What could be the problem?
Hi @mrT23, I want to run inference on ImageNet-1K but don't know where to download the metadata. Could you please tell me where I can find the metadata for ImageNet-1K?
Thanks!
Hello,
Great work! Thanks for sharing. I was wondering if you can please share the training set accuracy and curves on ImageNet21K-P. In the appendix, I think you have reported the final validation set accuracy. It would be great if you can also share the final accuracy and training curves on the train set. It would help to debug any issue with other models.
Best,
Ankit
Hi, I try to transfer the ImageNet-21k model (resnet50_miil_21k) to ImageNet-1K according to the details provided in your paper and https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/Transfer_learning.md. But I only get 79.5% top-1 acc on ImageNet-1K, lower than the 82.0% in your readme. Could you give me some suggestions?
My training command is as follows:
python3 -m torch.distributed.launch --nproc_per_node=8 train.py
data/ImageNet-1k/
-b=128
--img-size=224
--epochs=100
--color-jitter=0
--amp
--sched='cosine'
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1
--min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0
--pretrained
--lr=2e-4
--model=resnet50
--opt=adam --weight-decay=1e-4
Hi,
Thank you for sharing your great work! I am currently running with the Winter 21 version of the ImageNet-21k-P dataset, but I cannot find the link to the semantic tree file winter21_imagenet21k_miil_tree.pth you mention on this page. Do you know where I can get this file?
Thank you for your time!
I can't find the semantic tree for the winter_21 version with 10450 classes anywhere.
I don't think there's any working link in this repo or on image-net.org.
Could you point me to the file?
Thank you
When I run training, a warning appears:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in
the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first
value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Since this code requires torch 1.7.1, I assume it can be affected by the warning. Should I reorder the calls, or just ignore the warning?
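The order PyTorch >= 1.1 expects can be sketched as follows (dummy parameter, loss, and schedule, purely illustrative and not this repo's training loop):

```python
import torch

# Dummy parameter, optimizer, and per-epoch LR schedule for illustration.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for epoch in range(2):
    optimizer.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    optimizer.step()    # first: apply the gradient update
    scheduler.step()    # then: advance the schedule (no warning in this order)
```

Calling scheduler.step() before optimizer.step() triggers the warning and skips the first value of the schedule.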
Hello, I see that you already have RandAugment in your data preprocessing. Why do you additionally have cutoutPIL in the transform, given that RandAugment includes cutout? Does it give better performance?