itayhubara / binarynet.pytorch
Binarized Neural Network (BNN) for PyTorch
Hello,
The last layer in the MNIST model is Linear, not BinarizeLinear, so its weights are not necessarily binary, correct?
Also, the parameters of the batch normalization layers are not binary, correct?
In BasicBlock, defined in resnet_binary.py, during the forward pass the residual is cloned from the input x, as line 47 shows, and the residual is added to the result of the convolutions. Why is this addition needed? BinaryNet is supposed to work in binary form, yet the residual is in floating-point representation. These seem contradictory.
See https://github.com/itayhubara/BinaryNet.pytorch/blob/master/models/binarized_modules.py
I want to know what the backward pass of this function is. Does it only use tensor.sign()? What are its backward (gradient) values?
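For reference, a minimal sketch (my own, assuming the usual straight-through estimator rather than quoting the repo verbatim) of how a sign() forward can be paired with a pass-through backward whose gradient is cancelled where |input| > 1:

import torch

class BinarizeSTE(torch.autograd.Function):
    # Hypothetical straight-through estimator for sign-based binarization.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()   # pass the gradient straight through...
        grad_input[input.abs() > 1] = 0    # ...except where |r| > 1 (the g_r = g_q * 1_{|r|<=1} rule)
        return grad_input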
In the Quantize function (binarized_modules.py, line 57), I don't quite understand why the range for tensor.clamp_() is from -128 to 128 if I want to quantize them with numBits=8. Since all the outputs from previous layers go through a Hardtanh function, should they be in the range [-1, 1] instead? Also, how are they converted to 8 bits if they are in the range [-128, 128]? e.g. if the input tensor is 127.125 and numBits=8, tensor.mul(2**(numBits-1)).round().div(2**(numBits-1)) gives me 127.1250. How is that stored in 8 bits?
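For what it's worth, a small numeric illustration (my own sketch of the rounding step, not the repo's full Quantize()): mul(2**(numBits-1)).round().div(2**(numBits-1)) snaps values onto a fixed-point grid with step 1/2**(numBits-1), so with numBits=8 the step is 1/128. Nothing in that expression packs the result into 8 bits; the tensor stays float32.

import torch

def quantize_sketch(tensor, numBits=8):
    # Sketch of the rounding in Quantize(): snap values to a grid with step 1/2**(numBits-1).
    scale = 2 ** (numBits - 1)            # 128 for numBits=8
    tensor = tensor.clamp(-128, 128)      # range used in the repo's Quantize()
    return tensor.mul(scale).round().div(scale)

x = torch.tensor([127.125, 0.3, -0.004])
print(quantize_sketch(x))                 # 127.1250, 0.296875 (=38/128), -0.0078125 (=-1/128)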
Hi,
I want to implement shifted ReLU or SELU in the resnet_binary code, but when I change the code to use SELU, or even ReLU, I get the following error. Could you please give me some hints about what else I might have to change to replace Hardtanh with SELU? Any pointers would be really appreciated.
/Users/Desktop/BNN-Imagenet/models/resnet_binary.py(59)forward()
-> residual = self.downsample(residual)
(Pdb)
In the file main_binary.py, line 254, there is an attribute 'org' on p.
What does it mean, and when is it assigned?
I cannot find any clues in the whole project.
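For context, a hedged reading of where 'org' comes from (my paraphrase of the pattern in binarized_modules.py, not verbatim repo code): the binarized layers stash a full-precision copy of each weight tensor on the parameter before overwriting .data with binary values, and that copy is what main_binary.py checks with hasattr(p, 'org'). A runnable sketch:

import torch
import torch.nn as nn

# Hedged paraphrase of the 'org' pattern, not the repo's exact code.
linear = nn.Linear(4, 2)
w = linear.weight
if not hasattr(w, 'org'):
    w.org = w.data.clone()   # full-precision shadow copy of the weights
w.data = w.org.sign()        # the forward pass then uses binarized weights
print(hasattr(w, 'org'))     # True: this is the attribute main_binary.py tests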
Is the gradient clipping g_r = g_q · 1_{|r| ≤ 1} still used in the code?
The only clipping I see is p.org.copy_(p.data.clamp_(-1,1)) in def train():
optimizer.zero_grad()
loss.backward()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)
optimizer.step()
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))
If it is gradient clipping, shouldn't it be applied before optimizer.step()?
I also don't get the meaning of p.org.copy_(p.data.clamp_(-1,1)), since p.org is binarized later after all (the result is the same if p.data is not clamped).
Thank you
Hi, I just wanted to know the exact effect of these instructions:
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))
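My understanding (a hedged reading of main_binary.py, not an official answer) is that these loops implement the real-valued weight accumulators from the BNN paper. Placed back into the full update step, they would read roughly like this:

optimizer.zero_grad()
loss.backward()                            # gradients flow through the binarized weights (STE)
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.data.copy_(p.org)                # restore the full-precision weights before the update
optimizer.step()                           # the optimizer updates the real-valued weights
for p in list(model.parameters()):
    if hasattr(p, 'org'):
        p.org.copy_(p.data.clamp_(-1, 1))  # keep the real-valued copy in [-1, 1] (weight clipping,
                                           # not gradient clipping); the next forward re-binarizes it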
Hi, Thank you for your pytorch version of BinaryNet.
I am wondering whether there is any reduction in memory. I call the function Quantize() in binarized_modules.py so that I can compact each parameter to 8 bits. However, the CPU still allocates 32 bits to each float number, so as a result there is no memory reduction? Do you have any ideas?
Looking forward to your reply
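For what it's worth, a small sketch (my own, not part of the repo) of why calling Quantize() alone does not shrink memory: PyTorch keeps the tensors in float32, so an actual reduction needs an explicit packed representation, e.g. packing ±1 weights into a uint8 bitmap:

import numpy as np
import torch

w = torch.randn(256, 256).sign()            # binary weights, still stored as float32
bits = (w.flatten() > 0).to(torch.uint8)    # map -1 -> 0, +1 -> 1
packed = np.packbits(bits.numpy())          # 8 weights per byte
print(w.element_size() * w.nelement())      # 262144 bytes in float32
print(packed.nbytes)                        # 8192 bytes once packed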
nn.Hardtanh(inplace=True),
BinarizeConv2d(int(192*self.ratioInfl), int(384*self.ratioInfl), kernel_size=3, padding=1),
This is sample code from alexnet_binary.py. What I don't understand is: since you already binarize the input in the BinarizeConv2d function, what is the point of using the Hardtanh activation?
This code uses tensor.sign() to binarize the activations and weights.
The desired behavior is to always return -1 or 1, but sign() returns 0 for values that are 0.
Batch normalization makes 0 less probable, but it can still happen. The code should probably force every activation to be either -1 or 1.
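One possible fix (a sketch of my own, not something the repo currently does) is to replace sign() with a comparison that sends zero to +1:

import torch

def binarize_no_zero(x):
    # Every element becomes -1 or +1; zeros map to +1 instead of 0.
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

print(binarize_no_zero(torch.tensor([-0.5, 0.0, 2.0])))   # -> [-1., 1., 1.]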
Is there any implementation of shift-based batch normalization in the PyTorch version of BinaryNet?
The shift-based BN code in the other versions is hard for me to read.
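For reference, a rough, untested sketch of Algorithm 3 (shift-based batch normalization) from the BNN paper, where multiplications are replaced by the nearest power of two, AP2(x) = sign(x) * 2**round(log2|x|). This is my own reading of the algorithm, not code from this repo:

import torch

def ap2(x):
    # Approximate power of two: sign(x) * 2**round(log2|x|)
    return torch.sign(x) * torch.pow(2.0, torch.round(torch.log2(torch.abs(x) + 1e-12)))

def shift_based_batchnorm(x, gamma, beta, eps=1e-5):
    # x: (batch, features); gamma, beta: (features,)
    centered = x - x.mean(dim=0, keepdim=True)
    var = (centered * ap2(centered)).mean(dim=0, keepdim=True)   # shift-based variance estimate
    x_hat = centered * ap2(1.0 / torch.sqrt(var + eps))
    return ap2(gamma) * x_hat + beta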
Does the Binarize() function use STE?
I haven't seen the STE algorithm in this whole project.
As I found in the code, layers in the VGG and ResNet networks have an inflation factor.
Could someone please help clarify this?
Why does the network need to be inflated? Is there a reference that addresses this question?
I also checked the TensorFlow repo for the BNN network; there is no inflation factor there.
I see you use nn.Hardtanh as activation, so only weights are binarized, right?
Line 86, log softmax
Line 94, Cross Entropy Loss
In the MNIST example, you combine CrossEntropyLoss with log softmax. Why not use NLLLoss + LogSoftmax?
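A small check (my own illustration) of the relationship being asked about: nn.CrossEntropyLoss already combines LogSoftmax and NLLLoss, so it expects raw logits, while NLLLoss expects log-probabilities:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
log_probs = F.log_softmax(logits, dim=1)

print(F.nll_loss(log_probs, target))       # NLLLoss on log-softmax output
print(F.cross_entropy(logits, target))     # same value, computed from raw logits
print(F.cross_entropy(log_probs, target))  # different: the softmax is effectively applied twice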
binarized_modules.py
Hello author, there is some code in this file whose intent I don't understand:
input.size(1) != 784
in the BinarizeLinear class, and input.size(1) != 3
in the BinarizeConv2d class.
What are these checks meant to express?
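A hedged reading of those checks: 784 is a flattened 28×28 MNIST image and 3 is the number of RGB input channels, so the condition skips binarization only for the raw network input; every later layer binarizes its activations. A sketch of that pattern (my paraphrase, not the repo's exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeLinearSketch(nn.Linear):
    # Paraphrase of the pattern behind `input.size(1) != 784`.
    def forward(self, input):
        if input.size(1) != 784:             # not the raw MNIST input -> binarize activations
            input = input.sign()
        return F.linear(input, self.weight.sign(), self.bias)   # binarized weights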
First of all, please note that I'm not really good at coding, especially with Python, so I'm probably making some mistakes.
I have some issues with the Bottleneck class. Could you please check these out? According to the class initializer (in the resnet_binary.py file):
def __init__(self, inplanes, planes, stride=1, downsample=None):
    super(Bottleneck, self).__init__()
    self.conv1 = BinarizeConv2d(inplanes, planes, kernel_size=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = BinarizeConv2d(planes, planes, kernel_size=3, stride=stride,
                                padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)
    self.conv3 = BinarizeConv2d(planes, planes * 4, kernel_size=1, bias=False)
    self.bn3 = nn.BatchNorm2d(planes * 4)
    self.tanh = nn.Hardtanh(inplace=True)
    self.downsample = downsample
    self.stride = stride
I can't see where the variables self.do_bntan (line 103) and self.tanh2 (line 105) are defined. Both of these are defined in the BasicBlock class, but Bottleneck does not inherit from it, and I can't find any connection between the two classes. Hence I can't figure out how these variables are used, starting from line 103.
Thank you for your help and attention
I'm printing the weights of the network and they are not changing. That makes sense, since all the binarization happens only on the data (not in the graph), so the weights would not update.
How can this code train a network from scratch with binarization?
Hello, thank you for making the PyTorch version of Binary Networks available. It makes research much easier.
In the paper, the input features are given below.
But in this implementation the inputs are floats like [0.26962968707084656, 0.14762534201145172, -1.804444432258606, ...]; I just printed the input features.
I'd like to know your thoughts on this. Thank you.
Hi, I noticed that the activations are not binary but floats between 0 and 1, and I was wondering if there is a bug.
The floats come from the fact that, even in the binary models, the hard tanh function is used, e.g.:
self.tanh2 = nn.Hardtanh(inplace=True)
In the paper, however, it is mentioned that the activation function should behave as a sign function in the forward pass. Is this correct? Thanks.
It seems that for CIFAR-10 there is no clamp_(-1,1) for the updated weights.
The default value for epochs is 2500 (https://github.com/itayhubara/BinaryNet.pytorch/blob/master/main_binary.py#L48), and the README does not specify the number of epochs.
In the class ResNet_imagenet in the file resnet_binary.py, line 155, the bn2, bn3, tanh1, tanh2, and logsoftmax are missing.
Also, would you be willing to share the ImageNet training log for ResNet-18?
@itayhubara: I noticed that all the binarized neural network files (alexnet_binary.py, resnet_binary.py, vgg_cifar10_binary.py)
use the Hardtanh activation function, whereas their respective parent architectures in alexnet.py, resnet.py, and vgg_cifar10.py
use the ReLU activation function. Is there any specific reason for this? However, the Theano implementation of the BinaryConnect code here uses the ReLU activation when only the weights are binarized.
I wonder whether this code can be used to output only 0 or 1 for the weights? How?
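As a side note (my own illustration, not repo code): a {0, 1} encoding is just an affine remapping of the {-1, +1} values the Binarize() function produces:

import torch

w = torch.randn(5).sign()   # {-1, +1} weights (ignoring the sign(0)=0 corner case noted above)
w01 = (w + 1) / 2           # the same information encoded as {0, 1}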
Hi, after training the model I checked the weights and biases of each conv and BN layer, and they are floats. I am not sure what I am missing here, but the paper specifically talks about the weights and activations being constrained to +1/-1, which does not seem to be the case! I appreciate any help here!
@itayhubara
Hi,
I wonder what the file main_binary_hinge.py is used for?
It looks similar to main_binary.py.
How can I use it?
When I run the code, it says NameError: global name 'search_binarized_modules' is not defined.
Thank you.
In your binary_alexnet implementation you set self.ratioInfl=3
here. Is this inflation used to obtain the 41.8% top-1 accuracy on ImageNet reported in your JMLR paper?
Hi, thanks for your great work.
I want to know whether the BinOp has a noticeable effect on model size and inference speed compared to the NIN model without BinOp.
Hello,
I noticed that torch.nn.Hardtanh is used for the activation functions in BinaryNet. This is meant to make the model trainable, as introduced in the BNN paper. However, in the inference phase (the validate() function in main_binary.py), shouldn't the activation function be changed to the sign function so that the intermediate results are binary?
Thanks!