
binarynet.pytorch's People

Contributors

enderdead, itayhubara, ussamazahid96


binarynet.pytorch's Issues

Is there any reduction in memory?

Hi, thank you for your PyTorch version of BinaryNet.

I am wondering whether there is any reduction in memory. I call the Quantize() function in the file binarized_modules.py so that I can compact each parameter to 8 bits. However, the CPU still allocates 32 bits to each float, so there is no memory reduction. Do you have any ideas?

Looking forward to your reply.
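
For anyone hitting the same point, a minimal sketch of why the memory stays the same (my illustration, not code from the repo): quantizing the values stored in a float32 tensor does not change its dtype, so each element still takes 4 bytes; an actual saving requires casting to a narrower storage dtype such as torch.int8.

    import torch

    # Quantized values stored in a float32 tensor still occupy 4 bytes each.
    w = torch.randn(1000)
    w_q = w.clamp(-1, 1).mul(127).round().div(127)   # values quantized, dtype still float32
    print(w_q.element_size())                        # 4 bytes per element

    # Actual memory reduction requires a narrower storage dtype (and
    # dequantizing on the fly when the weights are used).
    w_int8 = w.clamp(-1, 1).mul(127).round().to(torch.int8)
    print(w_int8.element_size())                     # 1 byte per element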

Internal state is float between 0 and 1, not binary?

Hi, I noticed that the activations are not binary but floats between 0 and 1, and I was wondering if there is a bug.
The floats appear because the hard tanh function is used even in the binary models, e.g.:

    self.tanh2 = nn.Hardtanh(inplace=True)

In the paper, however, it is mentioned that the activation function should behave as a sign function in the forward pass - is this correct? Thanks.
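
For reference, a minimal sketch of the usual BNN convention (my illustration under the assumption that the code follows the paper, not a quote from this repo): the forward pass uses sign(), and Hardtanh's gradient serves as the straight-through estimator in the backward pass.

    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x.sign()              # binary +/-1 activations in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            x, = ctx.saved_tensors
            # Straight-through estimator: gradient of hardtanh, i.e. pass the
            # gradient only where |x| <= 1.
            return grad_output * (x.abs() <= 1).float()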

How to use main_binary_hinge.py

@itayhubara
Hi,
I wonder what the file main_binary_hinge.py is used for.
It looks similar to main_binary.py.
How can I use it?
When I run the code, it says NameError: global name 'search_binarized_modules' is not defined.
Thank you.

Change activation to SELU

Hi,

I want to implement a shifted ReLU or SELU in the resnet_binary code, but when I change the code to use SELU or even ReLU I get the following error. Could you please give me some hints about what else I might have to change to replace Hardtanh with SELU? Any pointers would be really appreciated.

    /Users/Desktop/BNN-Imagenet/models/resnet_binary.py(59)forward()
    -> residual = self.downsample(residual)
    (Pdb)
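
Not an answer from the authors, but a self-contained sketch of the kind of swap described, assuming a BasicBlock-like module (the Block class below is hypothetical): only the activation changes, so a pdb stop inside self.downsample(residual) usually points to a channel/stride mismatch in the shortcut path rather than to the activation itself.

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        """Hypothetical block mirroring the shape of resnet_binary's BasicBlock."""
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn = nn.BatchNorm2d(channels)
            self.act = nn.SELU(inplace=True)   # was nn.Hardtanh(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.conv(x)) + x)

    print(Block(64)(torch.randn(1, 64, 8, 8)).shape)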

Activation for BinaryNet

Hello,

I noticed that torch.nn.Hardtanh is used for the activation functions in BinaryNet. This is meant to make the model trainable, as introduced in the BNN paper. However, in the inference phase (the validate() function in main_binary.py), shouldn't the activation function be changed to the sign function so that the intermediate results are binary?

Thanks!
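
My hedged understanding, for what it's worth: the binarization of activations happens inside BinarizeConv2d/BinarizeLinear, which apply a sign-style binarization to their input before the convolution/matmul, so the Hardtanh output is only an intermediate representation; something along these lines:

    import torch

    def binarize(t):
        # sign-style binarization, as used on weights and layer inputs
        return t.sign()

    h = torch.nn.functional.hardtanh(torch.randn(4, 128))  # what Hardtanh leaves behind
    x_in = binarize(h)                                      # what actually enters the next binary layer
    print(x_in.unique())                                    # tensor([-1., 1.])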

ImageNet code in resnet_binary(bug report)

In the ResNet_imagenet class in resnet_binary.py (line 155), the bn2, bn3, tanh1, tanh2 and logsoftmax layers are missing.
Also, would you be willing to share the ImageNet training log for resnet18?

Questions regarding MNIST

Hello,

The last layer for MNIST is Linear, not BinarizeLinear, so its weights will not necessarily be binary, correct?

Also, the batch normalization layers' parameters are not binary, correct?
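
As a hedged illustration only (the tiny model below is hypothetical, not the repo's MNIST net): an ordinary nn.Linear keeps real-valued weights, and BatchNorm's affine parameters are real-valued as well, so one can confirm which parameters are binary simply by inspecting their distinct values.

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(2048, 10),     # ordinary Linear: weights stay full precision
        nn.BatchNorm1d(10),      # BN gamma/beta stay full precision
    )
    for name, p in model.named_parameters():
        n_distinct = p.detach().unique().numel()
        print(f"{name}: {n_distinct} distinct values (binary would be <= 2)")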

Is Gradient Clipping in the code as it is in the paper?

Is the gradient clipping $g_r = g_q 1_{|r| \le 1}$ still used in the code?
The only clipping I see is p.org.copy_(p.data.clamp_(-1,1)) in def train():

    optimizer.zero_grad()
    loss.backward()
    for p in list(model.parameters()):
        if hasattr(p,'org'):
            p.data.copy_(p.org)
    optimizer.step()
    for p in list(model.parameters()):
        if hasattr(p,'org'):
            p.org.copy_(p.data.clamp_(-1,1))
If it is gradient clipping, shouldn't it be applied before optimizer.step()?
I also don't get the meaning of p.org.copy_(p.data.clamp_(-1,1)), since p.org is binarized later after all (I get the same result if p.data is not clamped).
Thank you
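
A hedged reading (mine, not the authors'): the clamp above is weight clipping of the real-valued shadow copy p.org, as in BinaryConnect, rather than gradient clipping; p.data is restored to full precision before optimizer.step(), and the clamped result is stashed back into p.org afterwards. For contrast, actual gradient clipping would act on p.grad before the step, e.g.:

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 2)                         # stand-in for the binarized model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    loss = model(torch.randn(4, 8)).sum()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_value_(model.parameters(), 1.0)  # clip g_r to [-1, 1]
    optimizer.step()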

no backward pass?

I'm printing the weights of the network and they are not changing. It makes sense, since all the binarization happens only on the data (not in the graph, so the weights will not update).

How can this code train a network from scratch with binarization?

resnet_binary.py Bottleneck Class Issue

First of all, please note that I'm not really good at coding, especially in Python, so I'm probably making some mistakes.

I have some issues with the Bottleneck class. Could you please check these? According to the class initializer (in the resnet_binary.py file):

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = BinarizeConv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = BinarizeConv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = BinarizeConv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.tanh = nn.Hardtanh(inplace=True)
        self.downsample = downsample
        self.stride = stride

I can't see where the variables self.do_bntan (line 103) and self.tanh2 (line 105) are defined. Both are defined in the BasicBlock class, but Bottleneck is not a subclass of it, and I can't find any connection between these two classes, so I can't figure out how these variables are used starting from line 103.

Thank you for your help and attention.
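
Not a confirmed patch, but one guess at a fix, mirroring what BasicBlock does: define the missing attributes in Bottleneck.__init__ (the do_bntan argument below is an assumption on my part):

    # inside Bottleneck.__init__, alongside the existing layers:
    self.tanh1 = nn.Hardtanh(inplace=True)
    self.tanh2 = nn.Hardtanh(inplace=True)
    self.do_bntan = do_bntan   # would require a do_bntan=True parameter in __init__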

clamp problem

It seems that for cifar10 there is no clamp_(-1,1) applied to the updated weights.

The inputs are float

Hello, thank you for making a PyTorch version of Binary Networks available. It makes research much easier.

In the paper, the input features are given below.

(screenshot of the paper's input-feature description omitted)

But in this implementation, the inputs are floats like [0.26962968707084656, 0.14762534201145172, -1.804444432258606, ...]; I just printed the input features.

I'd like to hear your thoughts on this. Thank you.
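
A hedged side note (my illustration, not a statement from the authors): the printed values look like standard normalized pixels; in the paper the first layer consumes fixed-point image inputs, and the mul/round/div pattern quoted from the repo's Quantize function would map such floats onto an 8-bit grid, e.g.:

    import torch

    def quantize_fixed_point(x, num_bits=8):
        # snap to a grid of step 1 / 2**(num_bits - 1)
        scale = 2 ** (num_bits - 1)
        return x.mul(scale).round().div(scale)

    x = torch.tensor([0.26962968707084656, 0.14762534201145172, -1.804444432258606])
    print(quantize_fixed_point(x))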

Loss function in main_mnist.py

Line 86, log softmax
Line 94, Cross Entropy Loss

In the MNIST example, you combine CrossEntropyLoss with log softmax; why not use NLLLoss + log softmax?
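
For context, a small hedged illustration of how the two pairings relate: CrossEntropyLoss already applies log_softmax internally, so it is equivalent to NLLLoss on log_softmax outputs, and feeding it log-probabilities instead of raw logits applies the softmax twice.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10)
    target = torch.randint(0, 10, (4,))

    ce = F.cross_entropy(logits, target)                       # expects raw logits
    nll = F.nll_loss(F.log_softmax(logits, dim=1), target)     # explicit pairing
    print(torch.allclose(ce, nll))                             # True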

Something I don't understand about the structure of alexnet

    nn.Hardtanh(inplace=True),
    BinarizeConv2d(int(192*self.ratioInfl), int(384*self.ratioInfl), kernel_size=3, padding=1),

This is a sample from alexnet_binary.py. What I don't understand is: since the input is already binarized inside the BinarizeConv2d function, what is the point of using the Hardtanh activation?

why does alexnet_binary have Hardtanh activation when alexnet has ReLU activation?

@itayhubara: I noticed that all the binarized neural network files alexnet_binary.py, resnet_binary.py, and vgg_cifar10_binary.py use the Hardtanh activation function, whereas their respective parent architectures in alexnet.py, resnet.py, and vgg_cifar10 use ReLU. Is there any specific reason for this? However, the Theano implementation of the BinaryConnect code here uses ReLU activation when only the weights are binarized.
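
One hedged way to see why Hardtanh fits sign-based activation binarization better than ReLU (my reasoning, not the authors' statement): ReLU outputs are non-negative, so taking the sign afterwards collapses nearly everything to +1 (or 0) and destroys the +/-1 code, whereas Hardtanh keeps the pre-binarization values symmetric in [-1, 1].

    import torch
    import torch.nn.functional as F

    x = torch.randn(1000)
    print(torch.sign(F.relu(x)).unique())      # tensor([0., 1.])  - no -1 values left
    print(torch.sign(F.hardtanh(x)).unique())  # tensor([-1., 1.])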

how about the speedup and memory saving

Hi, thanks for your great work.
I want to know whether the binary operations (BinOp) have a noticeable effect on model size and inference speed compared to the NIN model without them.

Quantize function tensor.clamp_()

In the Quantize function (binarized_modules.py, line 57), I don't quite understand why the range for tensor.clamp_() is from -128 to 128 if I want to quantize with numBits=8. Since all the outputs from previous layers go through a Hardtanh function, shouldn't they be in the range [-1, 1] instead? Also, how are they converted to 8 bits if they are in the range [-128, 128]? For example, if the input tensor is 127.125 and numBits=8, tensor.mul(2**(numBits-1)).round().div(2**(numBits-1)) gives me 127.1250. How is that stored in 8 bits?
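
A hedged illustration of the arithmetic in question (my reading, not the author's intent): for inputs already in [-1, 1] after Hardtanh, mul/round/div snaps the values onto a grid of step 1/2**(numBits-1); what would fit in numBits bits is the integer product before the final division, and a value like 127.125 lies outside [-1, 1], so it would have to be clamped to the representable range first.

    import torch

    num_bits = 8
    x = torch.tensor([0.7, -0.3, 127.125]).clamp(-1, 1)
    q_int = x.mul(2 ** (num_bits - 1)).round()   # integer codes: 90., -38., 128.
    x_hat = q_int.div(2 ** (num_bits - 1))       # dequantized: 0.7031, -0.2969, 1.0
    print(q_int, x_hat)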

About the network inflation factors

As I found in the code, layers in the VGG and ResNet networks have an inflation factor.
Could someone please help clarify this?
Why does the network need to be inflated? Is there a reference that addresses this question?
I also checked the TensorFlow repo for the BNN network; there is no inflation factor there.

Shift based batch normalization

Is there any implementation of shift-based batch normalization in the PyTorch version of BinaryNet?

The shift-based BN code in the other version is hard for me to read.
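
I am not aware of one in this repo, but here is a rough, hedged sketch of the idea from the paper's shift-based batch normalization: multiplications are replaced by multiplication with the nearest power of two (AP2), which is a shift in fixed-point arithmetic. This is only an illustration, not a drop-in module.

    import torch

    def ap2(x):
        # approximate power-of-2: sign(x) * 2**round(log2(|x|))
        return torch.sign(x) * 2.0 ** torch.round(torch.log2(torch.abs(x)))

    def shift_based_batch_norm(x, gamma, beta, eps=1e-5):
        mean = x.mean(dim=0)
        centered = x - mean
        var = (centered * ap2(centered)).mean(dim=0)          # shift-based variance estimate
        x_hat = centered * ap2(1.0 / torch.sqrt(var + eps))   # shift-based normalization
        return ap2(gamma) * x_hat + beta

    x = torch.randn(32, 64)
    print(shift_based_batch_norm(x, torch.ones(64), torch.zeros(64)).shape)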

code question about details

In the file main_binary.py, line 254, there is an attribute 'org' on p.
What does it mean, and when is it assigned?
I cannot find any clues in the whole project.
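
From my reading of binarized_modules.py (hedged; check the file for the exact lines), the attribute is attached inside the forward pass of the Binarize* layers, which stash a full-precision copy of the weight on the parameter the first time they run; the pattern is roughly:

    # Paraphrased pattern from the Binarize* layers' forward() - not an exact quote:
    if not hasattr(self.weight, 'org'):
        self.weight.org = self.weight.data.clone()   # full-precision shadow copy
    self.weight.data = Binarize(self.weight.org)     # binarized weights used in the forward pass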

what is the exact meaning of iterating over the parameters?

Hi, I just wanted to know the exact effect of these instructions:

    for p in list(model.parameters()):
        if hasattr(p,'org'):
            p.data.copy_(p.org)

    for p in list(model.parameters()):
        if hasattr(p,'org'):
            p.org.copy_(p.data.clamp_(-1,1))

"input.size(1) != 3" and "if input.size(1) != 784" problem

binarized_modules.py
Hello author, I have a question about some of the code in this file.
There is input.size(1) != 784 in the BinarizeLinear class and input.size(1) != 3 in the BinarizeConv2d class.
What are these checks meant to express?
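
My hedged interpretation (not the author's answer): the checks detect the first layer of the network by its input width, 784 = 28*28 for a flattened MNIST image entering BinarizeLinear and 3 for the RGB channels entering BinarizeConv2d, and skip binarizing the raw image data there while binarizing activations everywhere else; roughly:

    import torch

    def maybe_binarize_linear_input(input):
        # Raw MNIST images (width 784) are left in full precision; hidden
        # activations are binarized to +/-1 before the matmul.
        if input.size(1) != 784:
            input = input.sign()
        return input

    print(maybe_binarize_linear_input(torch.rand(2, 784))[0, :3])    # floats, untouched
    print(maybe_binarize_linear_input(torch.randn(2, 2048))[0, :3])  # +/-1 values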

add Residual in basicblock

In the BasicBlock defined in resnet_binary.py, during the forward propagation the residual is cloned from the input X, as line 47 shows, and then added to the result of the convolutions. Why is this addition needed? BinaryNet is supposed to work in binary form, yet the residual is in floating-point representation; these seem contradictory.
