
Comments (24)

GhostWnd commented on June 20, 2024

Thank you very much.

from asl.

mrT23 commented on June 20, 2024

Three observations to start with:

For testing convergence, use a smaller resolution (224) and a larger batch size (128).

I don't see any augmentations in your training code. Use RandAugment or AutoAugment at least, plus Cutout.

Also, something is off with your scheduler:
scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr = 0.0002, total_steps = total_step, epochs = 25)
It's hardcoded to 25 epochs, yet you loop over only 5 epochs.
Add epochs as a hyperparameter to the argument list and use it everywhere instead of hard-coded numbers. Search for other hyperparameters that belong in the argument list as well.
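A sketch of what that refactoring might look like (argument names here are illustrative, not the repo's actual ones; the key point is that the scheduler and the loop share one epochs value):

```python
import argparse
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=25)
parser.add_argument('--lr', type=float, default=2e-4)
parser.add_argument('--batch-size', type=int, default=128)
args = parser.parse_args([])          # empty list here just for demonstration

model = nn.Linear(8, 2)               # stand-in for the real model
optimizer = optim.Adam(model.parameters(), lr=args.lr)
steps_per_epoch = 100                 # len(train_loader) in the real script

# the scheduler and the training loop now use the same hyperparameter
scheduler = lr_scheduler.OneCycleLR(
    optimizer, max_lr=args.lr,
    steps_per_epoch=steps_per_epoch, epochs=args.epochs)

for epoch in range(args.epochs):
    pass                              # training loop body goes here
```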

mrT23 commented on June 20, 2024

P.S. 1
Also, for testing and prototyping, use tresnet_m.

P.S. 2
You also need to implement "true weight decay" (i.e., no weight decay on biases and batch-norm parameters).

P.S. 3
I will probably notice other problems in the future, but we need to start somewhere :-)

GhostWnd commented on June 20, 2024

Thanks for your comment. I have tried to follow your instructions; the new train.py is available at https://github.com/GhostWnd/reproducingASL (the newest one is train_ver3.py). I will try to run it and report later.

I have run train_ver3.py for around 600 iterations, and the training speed is much slower than at the beginning: at first each iteration took 3 seconds, but after 600 iterations each takes 9 seconds. This puzzles me, and I doubt whether I have implemented the code correctly.

mrT23 commented on June 20, 2024

I will take a look at the code and try to run it when I have the time.

Good work so far. I think with joint forces we are on our way to finally having a modern multi-label codebase for the community to use; the vast majority of existing repos are way outdated.

Several more corrections and suggestions:

args.do_bottleneck_head = False (not True)

One more correction: you are using the 2017 split. While this is not a "mistake" (and your results will be a little higher), in articles people use the 2014 split.

What about mixed precision? With modern PyTorch it is a few lines of code ("with autocast():"...).
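The autocast pattern referred to looks roughly like this (a sketch with a stand-in model; GradScaler and autocast become no-ops on CPU-only machines):

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Linear(16, 80)                       # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()
use_amp = torch.cuda.is_available()
scaler = GradScaler(enabled=use_amp)            # no-op when disabled

images = torch.randn(4, 16)
target = torch.randint(0, 2, (4, 80)).float()

optimizer.zero_grad()
with autocast(enabled=use_amp):                 # forward pass in fp16 on GPU
    output = model(images)
    loss = criterion(output, target)
scaler.scale(loss).backward()                   # scaling avoids fp16 underflow
scaler.step(optimizer)
scaler.update()
```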

To improve speed, you don't have to update the EMA every iteration. You can update it every ~5 iterations with a slightly higher decay rate and still get similar results.
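A sketch of the idea (the ModelEma class here is illustrative, not the repo's implementation, and the decay value is an assumption tuned for less frequent updates):

```python
import copy
import torch
from torch import nn

class ModelEma:
    """Exponential moving average of model weights, kept as a separate model."""
    def __init__(self, model, decay=0.997):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema <- decay * ema + (1 - decay) * current weights
        for e, m in zip(self.ema.parameters(), model.parameters()):
            e.mul_(self.decay).add_(m, alpha=1.0 - self.decay)

model = nn.Linear(4, 2)
ema = ModelEma(model, decay=0.997)   # decay adjusted for sparser updates
for step in range(100):
    # ... forward / backward / optimizer.step() would go here ...
    if step % 5 == 0:                # update the EMA only every 5 iterations
        ema.update(model)
```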

Load a pretrained model, run only inference, and make sure you reproduce the article results (after switching to the 2014 split).

Make sure, especially in validation, that you are not building enormous vectors over the course of training that clog the RAM. Sometimes it's better to pre-allocate memory if you need to store large vectors.
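The pre-allocation suggestion, sketched with made-up sizes (a fixed tensor replaces an ever-growing Python list of per-batch outputs):

```python
import torch

num_samples, num_classes, batch_size = 10, 80, 4
model = torch.nn.Linear(16, num_classes)        # stand-in for the real model

# allocate the result buffer once instead of appending tensors to a list
preds = torch.zeros(num_samples, num_classes)

with torch.no_grad():                           # no graph kept during validation
    for start in range(0, num_samples, batch_size):
        n = min(batch_size, num_samples - start)
        images = torch.randn(n, 16)             # stand-in for a loader batch
        out = torch.sigmoid(model(images)).cpu()
        preds[start:start + n] = out            # write into the fixed buffer
```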

You have not implemented true WD correctly; this is not AdamW.
See an example of true WD in:
https://github.com/rwightman/pytorch-image-models/blob/198f6ea0f3dae13f041f3ea5880dd79089b60d61/timm/optim/optim_factory.py
(def add_weight_decay...)
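For reference, the linked helper boils down to splitting parameters into two optimizer groups (a sketch modeled on timm's add_weight_decay, not a verbatim copy):

```python
import torch
from torch import nn

def add_weight_decay(model, weight_decay=1e-4, skip_list=()):
    """Apply weight decay only to weight matrices; skip biases and 1-D
    parameters (batch-norm scales and shifts are 1-D)."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.ndim <= 1 or name.endswith('.bias') or name in skip_list:
            no_decay.append(param)
        else:
            decay.append(param)
    return [{'params': no_decay, 'weight_decay': 0.0},
            {'params': decay, 'weight_decay': weight_decay}]

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
param_groups = add_weight_decay(model, weight_decay=1e-4)
optimizer = torch.optim.Adam(param_groups, lr=2e-4)   # WD hits only the conv weight
```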

GhostWnd commented on June 20, 2024

Thank you for your comment; I will try to correct it.
If possible, could you please release the loss curve from your run of my code? Raw numbers are best.
Thank you very much.

I have tried to correct true WD; it's now train_ver4.py, available at https://github.com/GhostWnd/reproducingASL

mrT23 commented on June 20, 2024

Hi GhostWnd

I took a deeper look at the code. There are several major problems there.
Make sure you understand what the problem is in each and every one, and apply proper corrections. Don't skip a single one; most of these problems are deal-breakers.
After correcting all of them, repeat your runs, and we can compare results.
I hope I will have some results to compare by then (if I don't find more bugs).

Don't get discouraged. We are making progress, and sometimes the journey is more educational than the destination.

Problems:

  • Currently not using RandAugment (commented out in train_loader).

  • Using an uninitialized model (for training and comparison to the article, you should initialize the model from the relevant ImageNet model: https://github.com/Alibaba-MIIL/TResNet/blob/master/MODEL_ZOO.md).

  • Using the 2017 COCO split is wrong (use the 2014 COCO split instead; only the json files differ).

  • Cutout(n_holes = 1, length = 16) -> Cutout(n_holes = 1, length = args.image_size / 2)

  • Validation should be done once per epoch, no more and no less.

  • preds.append(output.cpu())
    targets.append(target.cpu())
    ->
    preds.append(output.cpu().detach())
    targets.append(target.cpu().detach())

  • mAP_score = validate_multi(val_loader, model, args, ema)
    ->
    model.eval()
    mAP_score = validate_multi(val_loader, model, args, ema)
    model.train()

  • Calculate only the mAP metric. Remove the other metrics; they are only confusing during training.
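For reference, the mAP metric itself reduces to per-class average precision (a self-contained numpy sketch; the repo's validate_multi may differ in details):

```python
import numpy as np

def average_precision(targets, scores):
    """AP for one class: mean of the precision values at each positive's rank."""
    order = np.argsort(-scores)                     # sort by descending score
    hits = targets[order]
    precisions = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precisions[hits == 1].mean()

def mean_average_precision(targets, preds):
    """mAP over classes; targets (N, C) binary, preds (N, C) scores."""
    return float(np.mean([average_precision(targets[:, c], preds[:, c])
                          for c in range(targets.shape[1])]))

targets = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]])
print(mean_average_precision(targets, scores))  # 1.0: positives ranked first
```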

mrT23 commented on June 20, 2024

Just to give you motivation: I got a good score last night when running a corrected version of the code...

GhostWnd commented on June 20, 2024

Thank you for your comment and effort. I will try to correct the code and run it.
Thank you very much.

I have tried to fix the problems you mentioned; my code is train_ver5.py, available at https://github.com/GhostWnd/reproducingASL

Other than train_ver5.py, I also edited helper_functions.py to allow me to use the 2014 json to train on the 2017 data.
Here is the change:

path = coco.loadImgs(img_id)[0]['file_name']
img = Image.open(os.path.join(self.root, path)).convert('RGB')
->
path = coco.loadImgs(img_id)[0]['file_name']
path = path.split('_')[-1]  # remove the 'MSCOCO_2014' prefix
img = Image.open(os.path.join(self.root, path)).convert('RGB')

When I try to use the 2014 json to train on the 2017 data, it seems that during validation some images in the 2014 validation set are not in the 2017 validation set. Does the difference between 2014 and 2017 affect the result much?
Thank you very much.

GhostWnd commented on June 20, 2024

Sorry to bother you again. I know that due to commercial issues you can't release your training code, but could you release the code you corrected based on my train.py?

If that is not possible, could you please release the loss record of your corrected code based on my training code, so that I can compare the results myself?

Thank you very much.

mrT23 commented on June 20, 2024

Hi GhostWnd

There were other problems in the code.
The two major ones:

  • sigmoid was applied twice (!) - once in the direct prediction, and again inside the loss.
  • EMA was not performed correctly (it's a separate model with separate validation).
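The first bug can be illustrated in a few lines (a hedged sketch with made-up numbers; the point is that losses which take raw logits apply the sigmoid internally, so the model head must not apply it too):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[3.0, -2.0]])    # raw model outputs (no sigmoid in the head)
targets = torch.tensor([[1.0, 0.0]])

# correct: the loss applies sigmoid internally, exactly once
loss_ok = F.binary_cross_entropy_with_logits(logits, targets)

# buggy: sigmoid applied in the prediction AND again inside the loss
probs = torch.sigmoid(logits)
loss_bug = F.binary_cross_entropy_with_logits(probs, targets)

# the double sigmoid squashes confident scores toward the 0.5-0.73 range,
# so the implied "probabilities" are wrong and gradients are flattened
double = torch.sigmoid(probs)
```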

Anyway, this code fully reproduces the article results (I think it even surpasses them):
train_asl_reproduce.zip

I will attach logs for the 224 and 448 trainings later.

You are welcome to test it yourself and give me feedback.

Thanks for the collaboration. Together we will release the first publicly available modern multi-label code
:-)

GhostWnd commented on June 20, 2024

Thank you so much! :-)
I will upload the training file to make it publicly available, test it myself, and give you feedback as soon as possible.

mrT23 commented on June 20, 2024

This is an example log file (note: resolution 224, mtresnet):
mtresnet_224.txt

mrT23 commented on June 20, 2024

> Thank you so much! :-)
> I will upload the train file to make it publicly available as well as test it by myself and give feedback to you as soon as possible.

Do you have any objection to me also adding the code to
https://github.com/Alibaba-MIIL/ASL ?
I think it will help it gain more traction. There are very few (zero) modern multi-label codebases like this with top results.

I will of course share credit with you. I made a lot of changes and enhancements to the code, but you provided the base implementation.

GhostWnd commented on June 20, 2024

No objection, it's my pleasure. Thank you very much.

GhostWnd commented on June 20, 2024

I wonder whether you could add the model based on tresnet_m with input size 224 to your pretrained models in https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md?

I would like to adjust some hyperparameters to test their influence, and to apply the model to other datasets as a pretrained model.
Thank you very much.

mrT23 commented on June 20, 2024

I am not sure I fully understand your question.

The models in
https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md
are standard ImageNet models for downstream tasks. These are the models you should use to initialize training on COCO.

GhostWnd commented on June 20, 2024

Well, if I'm not mistaken,
the models in ASL/blob/main/MODEL_ZOO.md are trained on MS-COCO (link: https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md),
while the models in TResNet/blob/master/MODEL_ZOO.md are standard ImageNet models (link: https://github.com/Alibaba-MIIL/TResNet/blob/master/MODEL_ZOO.md), right?

I just wonder whether you could upload the model you trained with tresnet_m at input size 224 to https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md (the ASL one).

GhostWnd commented on June 20, 2024

Or could you please share the model you trained with tresnet_m at input size 224 with me?

I would like to adjust some hyperparameters to test their influence, and to apply the model to other datasets as a pretrained model.
Thank you very much.

mrT23 commented on June 20, 2024

Just to be clear:
the tresnet_m 224 model trained on MS-COCO?

GhostWnd commented on June 20, 2024

Yes, the one that produced the log file mtresnet_224.txt.

mrT23 commented on June 20, 2024

Added to
https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md

LOOKCC commented on June 20, 2024

> this is an example log file (notice - resolution 224, mtresnet)
> mtresnet_224.txt

Can you attach logs for 448 resolution with tresnet_l using this training code? I find it hard to reproduce the 86.8 mAP result from the paper.

mrT23 commented on June 20, 2024

@LOOKCC
Run
https://github.com/Alibaba-MIIL/ASL/blob/main/train.py
