
Comments (16)

shicai commented on May 13, 2024

If you want stable BN training, you'd better set the batch size to 16 or even larger. But for detection tasks, the batch size is typically set to 1 or 2 for memory reasons.
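
In Caffe, the batch size is set on the data layer of the train prototxt. A minimal sketch of the kind of setting meant here (the layer name and LMDB path are placeholders, not from the repo):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "train_lmdb"   # placeholder path
    backend: LMDB
    batch_size: 16         # 16 or larger for stable BN statistics
  }
}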


shicai commented on May 13, 2024

I suggest not training this model from scratch in Caffe, since Caffe uses grouped convolution to implement channel-wise (depthwise) convolution, which is very slow and inefficient.
If possible, use lr=1e-3 and wd=1e-4 to finetune the pretrained model for your own task.
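
As a rough illustration of those numbers in a Caffe solver (everything except base_lr and weight_decay is a placeholder, not taken from the repo):

# solver.prototxt sketch for finetuning
net: "train_val.prototxt"   # placeholder net definition
base_lr: 0.001              # lr = 1e-3, as suggested above
weight_decay: 0.0001        # wd = 1e-4
lr_policy: "step"
gamma: 0.1
stepsize: 20000
momentum: 0.9
max_iter: 60000
snapshot: 10000
snapshot_prefix: "mobilenet_finetune"
solver_mode: GPU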


shicai commented on May 13, 2024

If you use the pretrained weights for detection, I suggest fixing all the BN parameters by setting lr_mult = 0 and decay_mult = 0.


shicai commented on May 13, 2024

@ryusaeba By the way, to fix all the BN parameters you should also set use_global_stats: true in batch_norm_param, so that the BN mean/variance stay unchanged during the fine-tuning stage.
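
Putting the two suggestions together, a fully frozen BN layer would look roughly like this (the layer and blob names are only an example, not taken from the repo's prototxt):

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # freeze the stored mean, variance and moving-average factor
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: true   # keep the pretrained global statistics fixed
  }
}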


handong1587 commented on May 13, 2024

Thanks for your advice!


ryusaeba commented on May 13, 2024

Hi @shicai,

If I would like to fine-tune the pretrained model, what values would you suggest for the Convolution, BatchNorm and Scale layers? According to your suggestion above, I guess it would be
lr=1e-3 and wd=1e-4 for the Convolution layers.

For BatchNorm, it would be as shown below:
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
}

The Scale layer would be:
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}

Please help me check the param { lr_mult, decay_mult } settings. Thanks :)


ryusaeba commented on May 13, 2024

Thanks for your suggestion. I will finetune the Convolution layers only and fix all the BN parameters 👍


ryusaeba commented on May 13, 2024

Wow, that is a really helpful reminder. Many thanks :)


ryusaeba commented on May 13, 2024

@shicai
If we fix all the BN parameters (lr_mult = decay_mult = 0 and use_global_stats: true), does that mean we also shouldn't finetune the convolution layers in the base network? I ask because the mean/variance may become different once we finetune the convolutions in the base network.
Please correct me if my understanding is incorrect. Much appreciated.


shicai commented on May 13, 2024

It's OK to fine-tune the conv layers while fixing the BN parameters, since the BN mean/var statistics are not stable during the detection training stage anyway.


ryusaeba commented on May 13, 2024

@shicai
So if our target is classification, the finetune setting for the BN parameters would be like
what I posted before, but with use_global_stats: false? (#2 (comment))
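
For reference, the classification-finetune variant being asked about would look roughly like this; a sketch assembled from the thread, not taken from the repo:

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # the three BatchNorm blobs (mean, variance, moving-average factor) are
  # updated by running averages rather than by gradients, so lr_mult and
  # decay_mult stay at 0 even while the statistics are being re-estimated
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: false   # recompute batch statistics while finetuning for classification
  }
}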


shicai commented on May 13, 2024

yes.


ryusaeba commented on May 13, 2024

@shicai
Many thanks! Your experience really helps me a lot 👍 I am glad to have this discussion with you.


ryusaeba commented on May 13, 2024

@shicai
I have one more question about the mean/var parameters. Why are these parameters not stable during the detection training stage? Please share your experience with me. Many thanks :)
Originally I thought it was because detection networks use negative samples for training, but I am not sure of the real reason.


shicai commented on May 13, 2024

I think it is mainly because the batch size used for training detection models is very small.


ryusaeba commented on May 13, 2024

Could you share a rough value for the batch size? What numbers count as small or large?


