itayhubara / binarynet.pytorch
Binarized Neural Network (BNN) for PyTorch
Hello,
The last layer in the MNIST model is Linear, not BinarizeLinear, so its weights are not necessarily binary, correct?
Also, the parameters of the batch normalization layers are not binary, correct?
In BasicBlock, defined in resnet_binary.py, during the forward pass the residual is cloned from the input x, as line 47 shows, and the residual is added to the result of the convolutions. Why is this addition needed? BinaryNet is supposed to work in binary form, yet the residual is in floating-point representation. These seem contradictory.
See https://github.com/itayhubara/BinaryNet.pytorch/blob/master/models/binarized_modules.py
I want to know what the backward pass of this function is. Does it only use tensor.sign()? What are its backward (gradient) values?
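For reference, a minimal sketch (my own, assuming the usual straight-through estimator rather than quoting the repo verbatim) of how a sign() forward can be paired with a pass-through backward whose gradient is cancelled where |input| > 1:

import torch

class BinarizeSTE(torch.autograd.Function):
    # Hypothetical straight-through estimator for sign-based binarization.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()   # pass the gradient straight through...
        grad_input[input.abs() > 1] = 0    # ...except where |r| > 1 (the g_r = g_q * 1_{|r|<=1} rule)
        return grad_input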
In the Quantize function (binarized_modules.py, line 57), I don't quite understand why the range for tensor.clamp_() is from -128 to 128 if I want to quantize them with numBits=8. Since all the outputs from previous layers go through a Hardtanh function, should they be in the range [-1, 1] instead? Also, how are they converted to 8 bits if they are in the range [-128, 128]? e.g. if the input tensor is 127.125 and numBits=8, tensor.mul(2**(numBits-1)).round().div(2**(numBits-1)) gives me 127.1250. How is that stored in 8 bits?
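For what it's worth, a small numeric illustration (my own sketch of the rounding step, not the repo's full Quantize()): mul(2**(numBits-1)).round().div(2**(numBits-1)) snaps values onto a fixed-point grid with step 1/2**(numBits-1), so with numBits=8 the step is 1/128. Nothing in that expression packs the result into 8 bits; the tensor stays float32.

import torch

def quantize_sketch(tensor, numBits=8):
    # Sketch of the rounding in Quantize(): snap values to a grid with step 1/2**(numBits-1).
    scale = 2 ** (numBits - 1)            # 128 for numBits=8
    tensor = tensor.clamp(-128, 128)      # range used in the repo's Quantize()
    return tensor.mul(scale).round().div(scale)

x = torch.tensor([127.125, 0.3, -0.004])
print(quantize_sketch(x))                 # 127.1250, 0.296875 (=38/128), -0.0078125 (=-1/128)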
Hi,
I want to implement shifted ReLU or SELU in the resnet_binary code, but when I change the code to use SELU, or even ReLU, I get the following error. Could you please give me some hints about what else I might have to change to replace Hardtanh with SELU? Any pointers would be really appreciated.
/Users/Desktop/BNN-Imagenet/models/resnet_binary.py(59)forward()
-> residual = self.downsample(residual)
(Pdb)
In the file main_binary.py, line 254, there is an attribute 'org' on p.
What does it mean, and when is it assigned?
I cannot find any clues in the whole project.
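For context, a hedged reading of where 'org' comes from (my paraphrase of the pattern in binarized_modules.py, not verbatim repo code): the binarized layers stash a full-precision copy of each weight tensor on the parameter before overwriting .data with binary values, and that copy is what main_binary.py checks with hasattr(p, 'org'). A runnable sketch:

import torch
import torch.nn as nn

# Hedged paraphrase of the 'org' pattern, not the repo's exact code.
linear = nn.Linear(4, 2)
w = linear.weight
if not hasattr(w, 'org'):
    w.org = w.data.clone()   # full-precision shadow copy of the weights
w.data = w.org.sign()        # the forward pass then uses binarized weights
print(hasattr(w, 'org'))     # True: this is the attribute main_binary.py tests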
Is the gradient clipping g_r = g_q · 1_{|r| ≤ 1} still used in the code?
The only clipping I see is p.org.copy_(p.data.clamp_(-1,1)) in def train():
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)
optimizer.step()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))
If it is gradient clipping, shouldn't it be applied before optimizer.step()?
I also don't get the meaning of p.org.copy_(p.data.clamp_(-1,1)), since p.org is binarized later after all (the result is the same if p.data is not clamped).
Thank you
Hi, I just wanted to know the exact effect of these instructions:
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))
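My understanding (a hedged reading of main_binary.py, not an official answer) is that these loops implement the real-valued weight accumulators from the BNN paper. Placed back into the full update step, they would read roughly like this:

optimizer.zero_grad()
loss.backward()                            # gradients flow through the binarized weights (STE)
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)                # restore the full-precision weights before the update
optimizer.step()                           # the optimizer updates the real-valued weights
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))  # keep the real-valued copy in [-1, 1] (weight clipping,
                                           # not gradient clipping); the next forward re-binarizes it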
Hi, Thank you for your pytorch version of BinaryNet.
I am wondering whether there is any reduction in memory. I call the function Quantize() in binarized_modules.py so that I can compact each parameter to 8 bits. However, the CPU still allocates 32 bits to each float number, so as a result there is no memory reduction? Do you have any ideas?
Looking forward to your reply
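For what it's worth, a small sketch (my own, not part of the repo) of why calling Quantize() alone does not shrink memory: PyTorch keeps the tensors in float32, so an actual reduction needs an explicit packed representation, e.g. packing ±1 weights into a uint8 bitmap:

import numpy as np
import torch

w = torch.randn(256, 256).sign()            # binary weights, still stored as float32
bits = (w.flatten() > 0).to(torch.uint8)    # map -1 -> 0, +1 -> 1
packed = np.packbits(bits.numpy())          # 8 weights per byte
print(w.element_size() * w.nelement())      # 262144 bytes in float32
print(packed.nbytes)                        # 8192 bytes once packed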
nn.Hardtanh(inplace=True),
BinarizeConv2d(int(192*self.ratioInfl), int(384*self.ratioInfl), kernel_size=3, padding=1),
This is sample code from alexnet_binary.py. What I don't understand is: since you already binarize the input in the BinarizeConv2d function, what is the point of using the Hardtanh activation?
This code uses tensor.sign() to binarize the activations and weights.
The desired behavior is to always return -1 or 1, but sign() returns 0 for values that are 0.
Batch normalization makes 0 less probable, but it can still happen. The code should probably force every activation to be either -1 or 1.
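One possible fix (a sketch of my own, not something the repo currently does) is to replace sign() with a comparison that sends zero to +1:

import torch

def binarize_no_zero(x):
    # Every element becomes -1 or +1; zeros map to +1 instead of 0.
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

print(binarize_no_zero(torch.tensor([-0.5, 0.0, 2.0])))   # -> [-1., 1., 1.]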
Is there any implementation of shift-based batch normalization in the PyTorch version of BinaryNet?
The shift-based BN code in the other versions is hard for me to read.
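For reference, a rough, untested sketch of Algorithm 3 (shift-based batch normalization) from the BNN paper, where multiplications are replaced by the nearest power of two, AP2(x) = sign(x) * 2**round(log2|x|). This is my own reading of the algorithm, not code from this repo:

import torch

def ap2(x):
    # Approximate power of two: sign(x) * 2**round(log2|x|)
    return torch.sign(x) * torch.pow(2.0, torch.round(torch.log2(torch.abs(x) + 1e-12)))

def shift_based_batchnorm(x, gamma, beta, eps=1e-5):
    # x: (batch, features); gamma, beta: (features,)
    centered = x - x.mean(dim=0, keepdim=True)
    var = (centered * ap2(centered)).mean(dim=0, keepdim=True)   # shift-based variance estimate
    x_hat = centered * ap2(1.0 / torch.sqrt(var + eps))
    return ap2(gamma) * x_hat + beta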
Does the Binarize() function use STE?
I haven't seen the STE algorithm in this whole project.
As I found in the code, layers in the VGG and ResNet networks have an inflation factor.
Could someone please help clarify this?
Why does the network need to be inflated? Is there a reference that addresses this question?
I also checked the TensorFlow repo for the BNN network; there is no inflation factor there.
I see you use nn.Hardtanh as activation, so only weights are binarized, right?
Line 86, log softmax
Line 94, Cross Entropy Loss
In the MNIST example, you combine CrossEntropyLoss with log softmax. Why not use NLLLoss + LogSoftmax?
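A small check (my own illustration) of the relationship being asked about: nn.CrossEntropyLoss already combines LogSoftmax and NLLLoss, so it expects raw logits, while NLLLoss expects log-probabilities:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
log_probs = F.log_softmax(logits, dim=1)

print(F.nll_loss(log_probs, target))       # NLLLoss on log-softmax output
print(F.cross_entropy(logits, target))     # same value, computed from raw logits
print(F.cross_entropy(log_probs, target))  # different: the softmax is effectively applied twice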
binarized_modules.py
Hello author, there is some code in this file whose intent I don't understand:
input.size(1) != 784
in the BinarizeLinear class, and input.size(1) != 3
in the BinarizeConv2d class.
What are these checks meant to express?
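A hedged reading of those checks: 784 is a flattened 28×28 MNIST image and 3 is the number of RGB input channels, so the condition skips binarization only for the raw network input; every later layer binarizes its activations. A sketch of that pattern (my paraphrase, not the repo's exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeLinearSketch(nn.Linear):
    # Paraphrase of the pattern behind `input.size(1) != 784`.
    def forward(self, input):
        if input.size(1) != 784:             # not the raw MNIST input -> binarize activations
            input = input.sign()
        return F.linear(input, self.weight.sign(), self.bias)   # binarized weights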
First of all, please note that I'm not really good at coding, especially with Python, so I'm probably making some mistakes.
I have some issues with the Bottleneck class. Could you please check these out? According to the class initializer (in the resnet_binary.py file):
def __init__(self, inplanes, planes, stride=1, downsample=None):
    super(Bottleneck, self).__init__()
    self.conv1 = BinarizeConv2d(inplanes, planes, kernel_size=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = BinarizeConv2d(planes, planes, kernel_size=3, stride=stride,
                                padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)
    self.conv3 = BinarizeConv2d(planes, planes * 4, kernel_size=1, bias=False)
    self.bn3 = nn.BatchNorm2d(planes * 4)
    self.tanh = nn.Hardtanh(inplace=True)
    self.downsample = downsample
    self.stride = stride
I can't see where the variables self.do_bntan (line 103) and self.tanh2 (line 105) are defined. Both of these are defined in the BasicBlock class, but Bottleneck does not inherit from it, and I can't find any connection between the two classes. Hence I can't figure out how these variables are used, starting from line 103.
Thank you for your help and attention
I'm printing the weights of the network and they are not changing. That makes sense, since all the binarization happens only on the data (not in the graph), so the weights would not update.
How can this code train a network from scratch with binarization?
Hello, thank you for making the PyTorch version of Binary Networks available. It makes research much easier.
In the paper, the input features are given below.
But in this implementation the inputs are floats like [0.26962968707084656, 0.14762534201145172, -1.804444432258606, ...]; I just printed the input features.
I'd like to know your thoughts on this. Thank you.
Hi, I noticed that the activations are not binary but floats between 0 and 1, and I was wondering if there is a bug.
The floats come from the fact that, even in the binary models, the hard tanh function is used, e.g.:
self.tanh2 = nn.Hardtanh(inplace=True)
In the paper, however, it is mentioned that the activation function should behave as a sign function in the forward pass. Is this correct? Thanks.
It seems that for CIFAR-10 there is no clamp_(-1,1) for the updated weights.
The default value for epochs is 2500 (https://github.com/itayhubara/BinaryNet.pytorch/blob/master/main_binary.py#L48), and the README does not specify the number of epochs.
In the class ResNet_imagenet in the file resnet_binary.py, line 155, the bn2, bn3, tanh1, tanh2, and logsoftmax are missing.
Also, would you be willing to share the ImageNet training log for ResNet-18?
@itayhubara: I noticed that all the binarized neural network files (alexnet_binary.py, resnet_binary.py, vgg_cifar10_binary.py)
use the Hardtanh activation function, whereas their respective parent architectures in alexnet.py, resnet.py, and vgg_cifar10.py
use the ReLU activation function. Is there any specific reason for this? However, the Theano implementation of the BinaryConnect code here uses the ReLU activation when only the weights are binarized.
I wonder whether this code can be used to output only 0 or 1 for the weights? How?
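As a side note (my own illustration, not repo code): a {0, 1} encoding is just an affine remapping of the {-1, +1} values the Binarize() function produces:

import torch

w = torch.randn(5).sign()   # {-1, +1} weights (ignoring the sign(0)=0 corner case noted above)
w01 = (w + 1) / 2           # the same information encoded as {0, 1}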
Hi, after training the model I checked the weights and biases of each conv and BN layer, and they are floats. I am not sure what I am missing here, but the paper specifically talks about the weights and activations being constrained to +1/-1, which does not seem to be the case! I appreciate any help here!
@itayhubara
Hi,
I wonder what the file main_binary_hinge.py is used for?
It looks similar to main_binary.py.
How can I use it?
When I run the code, it says NameError: global name 'search_binarized_modules' is not defined.
Thank you.
In your binary_alexnet implementation you set self.ratioInfl=3
here. Is this inflation used to obtain the 41.8% top-1 accuracy on ImageNet reported in your JMLR paper?
Hi, thanks for your great work.
I want to know whether the BinOp has a noticeable effect on model size and inference speed compared to the NIN model without BinOp.
Hello,
I noticed that torch.nn.Hardtanh is used for the activation functions in BinaryNet. This is meant to make the model trainable, as introduced in the BNN paper. However, in the inference phase (the validate() function in main_binary.py), shouldn't the activation function be changed to the sign function so that the intermediate results are binary?
Thanks!