Hello! I found batch_size=32 in your paper but batch_size=16 in the GitHub code. Why do the batch sizes differ?

swathikirans commented on August 23, 2024

from gsm.

Comments (5)

swathikirans commented on August 23, 2024

Hi,
We use slow updating of parameters: the parameters are updated only after waiting for 'iter_size' iterations. We set the batch size to 16 and iter_size to 2, so the effective batch size is 16 × 2 = 32. You can set batch_size to 32 and iter_size to 1 to get the same effect.
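A minimal sketch of why batch_size=16 with iter_size=2 gives the same update as batch_size=32 with iter_size=1, assuming the loss is a mean over samples and the model has no BatchNorm. The toy linear model and the helpers `grad_mse` and `sgd_step` are hypothetical illustrations, not code from the gsm repository. In PyTorch this pattern usually corresponds to dividing the loss by iter_size and calling `optimizer.step()` only every iter_size iterations; whether main.py does exactly this is an assumption.

```python
# Hypothetical sketch of "slow updating" (gradient accumulation).
# Assumes a mean-over-samples loss; a toy linear model stands in for GSM.
import random

def grad_mse(w, b, xs, ys):
    """Gradient of mean((w*x + b - y)^2) over one batch w.r.t. (w, b)."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def sgd_step(w, b, xs, ys, lr, batch_size, iter_size):
    """One optimizer step that consumes batch_size * iter_size samples.

    Each micro-batch gradient is divided by iter_size before being
    accumulated, so with equal-size micro-batches the update matches a
    single batch of batch_size * iter_size samples.
    """
    acc_w, acc_b = 0.0, 0.0
    for i in range(iter_size):
        lo = i * batch_size
        gw, gb = grad_mse(w, b, xs[lo:lo + batch_size], ys[lo:lo + batch_size])
        acc_w += gw / iter_size
        acc_b += gb / iter_size
    # Parameters change only once, after all micro-batches are processed.
    return w - lr * acc_w, b - lr * acc_b

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(32)]
ys = [3 * x + 0.5 + random.gauss(0, 0.1) for x in xs]

# batch_size=16, iter_size=2 versus batch_size=32, iter_size=1
w1, b1 = sgd_step(0.0, 0.0, xs, ys, lr=0.01, batch_size=16, iter_size=2)
w2, b2 = sgd_step(0.0, 0.0, xs, ys, lr=0.01, batch_size=32, iter_size=1)
print(w1, b1)
print(w2, b2)
```

Both calls print the same parameters up to floating-point rounding. The equivalence relies on the micro-batches being the same size and on the model having no layers, such as BatchNorm, whose forward pass depends on the batch composition.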

FloydEdwin commented on August 23, 2024

Hello! Thanks for your reply.
I tested your something-v1_RGB_InceptionV3_avg_segment8_checkpoint.pth.tar and got the same result as in your paper (49.01% top-1). However, when I train the model myself with the following script:

#!/usr/bin/env bash
python main.py something-v1 RGB --arch InceptionV3 \
               --num_segments 8 --consensus_type avg \
               --batch-size 32 --iter_size 1 --dropout 0.5 \
               --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 \
               --gd 20 --run_iter 1 -j 16 --npb --gsm

I get about 48% accuracy. With --batch_size 16 --iter_size=2 I also get about 48%, not the 49% reported in your paper. For BNInception I get 46% (47% in the paper). Maybe I have overlooked some detail, such as BN? I am using 2 2080Ti GPUs.
1. Is there anything I should be aware of when using your GSM code?
2. How many GPUs did you use for training?
3. Is 49.01% a typical value, or the maximum over many experiments?
Thanks! Looking forward to your reply!

swathikirans commented on August 23, 2024

Due to the stochastic nature of training, there will be some minor differences in the results. All the models were trained on 2 1080Ti GPUs. We report the accuracy obtained from a single run.

FloydEdwin commented on August 23, 2024

Hello! Thanks for your help! After many experiments I have obtained the 49% result on sthv1, though I think 49% is not easy to reach. Anyway, it is good work. Thanks for sharing!
Have you ever trained GSM on sthv2? Why not include it in the paper?

swathikirans commented on August 23, 2024

Thank you for the nice words.

I recently noticed that the BN statistics are updated differently when slow updating of parameters is used (batch_size=16, iter_size=2). However, I am not sure whether this has a significant impact on the final result.
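A minimal sketch of one way the BN statistics can diverge, assuming PyTorch-style running-mean updates with the default momentum of 0.1 and a single channel. The helper `update_running_mean` is hypothetical and not taken from the gsm code. With iter_size=2 there are two forward passes per optimizer step, so the running mean is updated twice, each time from 16 samples, and each micro-batch is also normalized with its own 16-sample statistics rather than 32-sample ones.

```python
# Hypothetical sketch: BatchNorm running mean for one channel under
# batch_size=32/iter_size=1 versus batch_size=16/iter_size=2.
# Assumes PyTorch-style updates:
#   running_mean = (1 - momentum) * running_mean + momentum * batch_mean
import random

MOMENTUM = 0.1  # PyTorch's default BatchNorm momentum

def mean(v):
    return sum(v) / len(v)

def update_running_mean(running, batch, momentum=MOMENTUM):
    """Each training-mode BN forward pass updates the running mean once."""
    return (1 - momentum) * running + momentum * mean(batch)

random.seed(0)
samples = [random.gauss(1.0, 1.0) for _ in range(32)]

# batch_size=32, iter_size=1: one forward pass per optimizer step.
rm_full = update_running_mean(0.0, samples)

# batch_size=16, iter_size=2: two forward passes per optimizer step,
# so the running mean is updated twice, from 16 samples each time.
rm_accum = update_running_mean(0.0, samples[:16])
rm_accum = update_running_mean(rm_accum, samples[16:])

print("running mean, 32x1:", rm_full)
print("running mean, 16x2:", rm_accum)
# Batch means used for normalization during training also differ:
print("batch means:", mean(samples), mean(samples[:16]), mean(samples[16:]))
```

Starting from zero, the 32x1 schedule gives 0.1 times the full-batch mean, while the 16x2 schedule gives 0.09 times the first micro-batch mean plus 0.1 times the second, so the running statistics drift apart over training even though the accumulated gradients average to an effective batch of 32.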

Regarding sthv2, we did not train the model on this dataset since it is a superset of sthv1 with less label noise and more samples. We will do an evaluation on sthv2.
