Has anyone successfully run xnor-net? I run the code dozens of times, but it has never

XNOR net doesn't converge about xnor-net HOT 11 OPEN

allenai commented on September 28, 2024

XNOR net doesn't converge

from xnor-net.

Comments (11)

mrastegari commented on September 28, 2024 1

OK, try to fix the precision issue by adding
gradParameters:mul(1e+5)
after line 184 in train.lua.

mrastegari commented on September 28, 2024

Let's double-check a few things first:
1- Can you get the same accuracy with the pretrained models?
2- Can you train the BWN?
3- I have noticed that in some versions of cuDNN the precision of division causes convergence problems. If you are using Adam, you can multiply all the gradients by a large number to prevent the precision error that leads to NaN.
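
The gradient-scaling suggestion in point 3 can be illustrated with a minimal plain-Python sketch (not Torch code, and not taken from the repository). It assumes the textbook Adam update with common default hyperparameters; `adam_steps` and `square_in_float32` are hypothetical helper names. It shows two things: Adam's step is almost unchanged when every gradient is multiplied by the same constant, and squaring a very small gradient can underflow to zero in single precision while the rescaled gradient does not.

```python
import math
import struct


def square_in_float32(x):
    """Square x in double precision and round the result to float32."""
    return struct.unpack("f", struct.pack("f", x * x))[0]


def adam_steps(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the textbook Adam update sizes for a sequence of scalar gradients."""
    m = v = 0.0
    steps = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        steps.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return steps


SCALE = 1e5  # same factor as in gradParameters:mul(1e+5)
grads = [0.02, -0.01, 0.03]  # made-up gradients, for illustration only
plain = adam_steps(grads)
scaled = adam_steps([g * SCALE for g in grads])

# The normalized Adam step barely changes under a global rescaling.
print(all(math.isclose(a, b, rel_tol=1e-4) for a, b in zip(plain, scaled)))

# A tiny gradient squared vanishes in float32; the rescaled one survives.
tiny = 1e-23
print(square_in_float32(tiny), square_in_float32(tiny * SCALE) > 0.0)
```

With plain SGD the same multiplication would change the effective learning rate, so the trick relies on Adam's normalization by the second-moment estimate.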

zhaoweicai commented on September 28, 2024

hi @mrastegari
The accuracies I get for the two pretrained models are 56.67 and 42.37, respectively. I can train BWN, but I stopped at epoch #30, where top-1 accuracy was 25.57; that run was several weeks ago, before you fixed some bugs. For XNOR-net, however, I cannot get training to converge at all. I don't know whether others have encountered the same issue.

zhaoweicai commented on September 28, 2024

Just to make sure: I should add gradParameters:mul(1e+5) right after updateBinaryGradWeight(convNodes), right? It still doesn't work for me. Has anyone else experienced the same issue?

mrastegari commented on September 28, 2024

After how many iterations do you see the divergence? Also, try to follow the paper by replacing the updateBinaryGradWeight function with:

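function updateBinaryGradWeight(convNodes)
   -- Skip the first and last entries of convNodes.
   for i = 2, #convNodes - 1 do
      local n = convNodes[i].weight[1]:nElement()  -- elements in one filter
      local s = convNodes[i].weight:size()         -- size of the weight tensor
      -- Zero the gradient wherever the weight is <= -1 or >= 1.
      convNodes[i].gradWeight[convNodes[i].weight:le(-1)] = 0
      convNodes[i].gradWeight[convNodes[i].weight:ge(1)] = 0
      -- Add 1/n to every entry, then scale by (1 - 1/s[2]).
      convNodes[i].gradWeight:add(1 / n):mul(1 - 1 / s[2])
   end
   -- Synchronize parameters across GPUs when more than one is used.
   if opt.nGPU > 1 then
      model:syncParameters()
   end
end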
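
To make the arithmetic in the function above concrete, here is a minimal plain-Python sketch of what it does to one layer's gradient, using flat lists instead of Torch tensors. The name `update_binary_grad_weight` and the toy numbers are illustrative assumptions; `n` stands for the number of elements in one filter and `c` for `s[2]`.

```python
import math


def update_binary_grad_weight(weights, grads, n, c):
    """Mirror the posted Lua on flat lists: mask, then add 1/n, then scale."""
    scale = 1.0 - 1.0 / c
    updated = []
    for w, g in zip(weights, grads):
        # Gradient is zeroed where the weight is <= -1 or >= 1.
        if w <= -1.0 or w >= 1.0:
            g = 0.0
        # The offset and the scaling apply to every entry, masked or not.
        updated.append((g + 1.0 / n) * scale)
    return updated


# Toy values, for illustration only.
weights = [-1.5, -0.2, 0.4, 1.0]
grads = [0.3, -0.1, 0.2, 0.5]
result = update_binary_grad_weight(weights, grads, n=4, c=2)
print(result)
```

Note that the entries whose gradient was masked to zero still receive the `1/n` offset, because the posted code applies `add` and `mul` to the whole tensor after masking.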

zhaoweicai commented on September 28, 2024

Hi @mrastegari
Thanks for your help, but it still doesn't work for me. Training diverges at the very beginning with err=nan. I have started retraining BinaryNet, which seems to work very well so far. XnorNet has never worked for me.
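
One way to pin down when the divergence appears is to log the loss at each iteration and report the first value that is not finite. A minimal plain-Python sketch follows; the loss values are made up and `first_non_finite` is a hypothetical helper name, not something from the repository.

```python
import math


def first_non_finite(losses):
    """Return the 1-based iteration of the first NaN/inf loss, or None."""
    for iteration, loss in enumerate(losses, start=1):
        if not math.isfinite(loss):
            return iteration
    return None


# Hypothetical loss traces, for illustration only.
diverges_at_start = [float("nan"), float("nan")]
diverges_later = [6.9, 6.7, float("inf"), float("nan")]
healthy = [6.9, 6.5, 6.1]

print(first_non_finite(diverges_at_start))
print(first_non_finite(diverges_later))
print(first_non_finite(healthy))
```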

mrastegari commented on September 28, 2024

I just pushed a modification. Can you check it?

zhaoweicai commented on September 28, 2024

Thanks for your help. First, I changed '-cache' to './cache/'. It still doesn't work: the error becomes 'nan' right at the beginning every time, even though I have run the experiments dozens of times with different random seeds. Has anyone successfully reproduced the XNOR experiments yet? I am confused. BTW, I re-ran the Binary-Net experiment and got an accuracy of 51.65% in the end. Does the XNOR code work well for you? What do you think the problem is?

mrastegari commented on September 28, 2024

There is definitely something wrong with your setup. I asked a friend to try it on his machine, and he could reproduce the same result (~43%). Which version of Binary-Net are you using? 51.65% top-1 is too good for binary inputs and binary weights. Do you have the code for that?

zhaoweicai commented on September 28, 2024

hi @mrastegari
I found the problem: it was running on multiple GPUs. When I switched to 1 GPU, the model started to converge. For the multi-GPU runs, maybe I used different CUDA and cuDNN versions. Could you share which versions you use? Thanks!
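
For anyone who wants to repeat the single-GPU check, one option is to hide all but one device from the training process. The sketch below is plain Python and only builds the environment; it relies on the standard `CUDA_VISIBLE_DEVICES` variable, and `single_gpu_env` is a hypothetical helper, not part of the repository. The script's own GPU-count option (the code reads `opt.nGPU`) should presumably also be set to 1 so that it matches.

```python
import os


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# A toy base environment, for illustration only.
env = single_gpu_env(0, base_env={"PATH": "/usr/bin"})
print(env["CUDA_VISIBLE_DEVICES"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the training script.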

mrastegari commented on September 28, 2024

I use CUDA 7.5 and cuDNN 5. I also had this problem on some machines whose mainboards were incompatible with the GPUs.
