
Comments (16)

shicai commented on May 13, 2024

If you want stable BN training, you'd better set the batch size to 16 or even larger. But for detection tasks, the batch size is typically set to 1 or 2 for memory reasons.
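
In Caffe, the batch size is set on the data layer of the train prototxt. A minimal sketch of the kind of setting meant here (the layer name and LMDB path are placeholders, not from the repo):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "train_lmdb"   # placeholder path
    backend: LMDB
    batch_size: 16         # 16 or larger for stable BN statistics
  }
}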


shicai commented on May 13, 2024

I suggest not training this model from scratch in Caffe, since Caffe uses grouped convolution to implement channel-wise (depthwise) convolution, which is very slow and inefficient.
If possible, use lr=1e-3 and wd=1e-4 to finetune the pretrained model for your own task.
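
As a rough illustration of those numbers in a Caffe solver (everything except base_lr and weight_decay is a placeholder, not taken from the repo):

# solver.prototxt sketch for finetuning
net: "train_val.prototxt"   # placeholder net definition
base_lr: 0.001              # lr = 1e-3, as suggested above
weight_decay: 0.0001        # wd = 1e-4
lr_policy: "step"
gamma: 0.1
stepsize: 20000
momentum: 0.9
max_iter: 60000
snapshot: 10000
snapshot_prefix: "mobilenet_finetune"
solver_mode: GPU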


shicai commented on May 13, 2024

If you use the pretrained weights for detection, I suggest fixing all the BN parameters by setting lr_mult = 0 and decay_mult = 0.


shicai commented on May 13, 2024

@ryusaeba By the way, to fix all the BN parameters you should also set use_global_stats: true in batch_norm_param, so that the BN mean/variance stay unchanged during the fine-tuning stage.
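
Putting the two suggestions together, a fully frozen BN layer would look roughly like this (the layer and blob names are only an example, not taken from the repo's prototxt):

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # freeze the stored mean, variance and moving-average factor
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: true   # keep the pretrained global statistics fixed
  }
}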


handong1587 commented on May 13, 2024

Thanks for your advice!


ryusaeba commented on May 13, 2024

Hi @shicai,

If I would like to fine-tune the pretrained model, what values would you suggest for the Convolution, BatchNorm and Scale layers? According to your suggestion above, I guess it would be
lr=1e-3 and wd=1e-4 for the Convolution layers.

For BatchNorm, it would be as shown below:
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
}

The Scale layer would be:
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}

Please help me check the param { lr_mult, decay_mult } settings. Thanks :)


ryusaeba commented on May 13, 2024

Thanks for your suggestion. I will finetune the Convolution layers only and fix all the BN parameters 👍


ryusaeba commented on May 13, 2024

Wow, that is a really helpful reminder. Many thanks :)


ryusaeba commented on May 13, 2024

@shicai
If we fix all the BN parameters (lr_mult = decay_mult = 0 and use_global_stats: true), does that mean we also shouldn't finetune the convolution layers in the base network? I ask because the mean/variance may become different once we finetune the convolutions in the base network.
Please correct me if my understanding is incorrect. Much appreciated.


shicai commented on May 13, 2024

It's OK to fine-tune the conv layers while fixing the BN parameters, since the BN mean/var statistics are not stable during the detection training stage anyway.


ryusaeba commented on May 13, 2024

@shicai
So if our target is classification, the finetune setting for the BN parameters would be like
what I posted before, but with use_global_stats: false? (#2 (comment))
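
For reference, the classification-finetune variant being asked about would look roughly like this; a sketch assembled from the thread, not taken from the repo:

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # the three BatchNorm blobs (mean, variance, moving-average factor) are
  # updated by running averages rather than by gradients, so lr_mult and
  # decay_mult stay at 0 even while the statistics are being re-estimated
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: false   # recompute batch statistics while finetuning for classification
  }
}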


shicai commented on May 13, 2024

yes.


ryusaeba commented on May 13, 2024

@shicai
Many thanks! Your experience really helps me a lot 👍 I am glad to have this discussion with you.


ryusaeba commented on May 13, 2024

@shicai
I have one more question about the mean/var parameters. Why are these parameters not stable during the detection training stage? Please share your experience with me. Many thanks :)
Originally I thought it was because detection networks use negative samples for training, but I am not sure of the real reason.


shicai commented on May 13, 2024

I think it is mainly because the batch size used for training detection models is very small.


ryusaeba commented on May 13, 2024

Could you share a rough value for the batch size? What numbers count as small or large?


