cooooorn / pytorch-xnor-net
XNOR-Net with binary GEMM and binary conv2d kernels; supports both CPU and GPU.
License: BSD 3-Clause "New" or "Revised" License
Hi,
Thank you for providing this amazing code. When I run my code on a simple network, training crashes at binop.encode_rows(weight, bin_weight)
with this error: torch.FatalError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58
I am using the same PyTorch version as you, to stay consistent with your work. Could you please guide me on how to resolve this issue?
Thank you
When I run the CIFAR-10 example, I hit the following problem. (Python 3.6, PyTorch 0.4.0)
Namespace(arch='VGG16', batch_size=128, cuda=True, epochs=300, evaluate=False, log_interval=100, lr=0.1, lr_epochs=100, momentum=0.9, no_cuda=False, pretrained=None, seed=1, test_batch_size=100, weight_decay=1e-05)
Files already downloaded and verified
Traceback (most recent call last):
File "main.py", line 341, in <module>
model_ori = models.VGG(name)
AttributeError: module 'models' has no attribute 'VGG'
I wrote an __init__.py to fix it:
from .VGG import VGG
from .Bin_VGG import Bin_VGG_test
from .Bin_VGG import Bin_VGG_train
File "..\util\util.py", line 1, in
import binop
ModuleNotFoundError: No module named 'binop'
Thanks for the CUDA and PyTorch implementation of XNOR-Net; it really helps me. I'm now wondering whether the implementation actually speeds up training. In my MNIST experiments, Bin_LeNet trains more slowly than LeNet, which seems unreasonable. Can you explain how to accelerate the training process? Thanks a lot.
Dear @cooooorn ,
Thanks for your helpful implementation. I have the following two questions about the class BinConv2d:
This line: self.weight = nn.Parameter(torch.IntTensor(out_channels, 1 + ( in_channels * self.kernel_size[0] * self.kernel_size[1] - 1) // 32)). Why do we divide by 32 in the testing path? I notice that the number of stored weight values at test time is reduced by a factor of 32. Could you clarify that?
I want to use group convolution. How can I modify BinConv2d to support it?
Thank you very much.
Thanks,
Hai
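A likely reading of the expression quoted above, offered as an assumption rather than a statement from the author: each binary weight needs one bit, so 32 of them fit into one 32-bit integer, and 1 + (n - 1) // 32 is the ceiling of n / 32. The sketch below illustrates that arithmetic in plain Python. The helper names packed_words and pack_signs are hypothetical, and the bit order shown is an arbitrary choice that may differ from what the binop kernels use.

```python
def packed_words(n_bits, word_size=32):
    """Number of word_size-bit integers needed to hold n_bits one-bit weights."""
    # Same form as the expression in BinConv2d: ceil(n_bits / word_size).
    return 1 + (n_bits - 1) // word_size


def pack_signs(values, word_size=32):
    """Pack the signs of `values` into integers, one bit per value.

    A bit is set when the value is >= 0. Bit order (least significant
    bit first) is an assumption made for this illustration.
    """
    words = [0] * packed_words(len(values), word_size)
    for i, v in enumerate(values):
        if v >= 0:
            words[i // word_size] |= 1 << (i % word_size)
    return words


# A 5x5 kernel over 96 input channels has 96 * 5 * 5 = 2400 weights
# per output channel, which fit into 75 32-bit words.
print(packed_words(96 * 5 * 5))            # 75
print(pack_signs([0.3, -1.2, 0.7, -0.1]))  # [5]  (bits 0 and 2 set)
```

Under this reading, the second dimension of the IntTensor is the number of 32-bit words per output channel, not the number of weights.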
Thanks for your great work!
I plan to work on BNN optimization as well, for various applications (generative models, classifiers) on a powerful CPU.
I spent a few hours on preliminary work changing "micro_kernel" to use AVX-512, and it showed a 4x speedup from a simple one-loop optimization (note that -O3 does not vectorize this loop automatically). Do you plan to work on this further and boost the performance more?
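For readers unfamiliar with what such a kernel computes, here is a minimal plain-Python sketch of the XNOR-plus-popcount dot product that binary GEMM kernels are generally built on. It is not the repository's micro_kernel; the function name binary_dot and the convention that a set bit means +1 are assumptions for illustration. Wider vector instructions speed this up by processing more bits per operation.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n, packed as bits.

    Bit i of each integer encodes element i, with 1 meaning +1 and
    0 meaning -1 (an assumed convention).
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


# a = [+1, -1, +1, +1] and b = [+1, +1, -1, +1], least significant bit first.
a, b = 0b1101, 0b1011
print(binary_dot(a, b, 4))  # 0, same as 1*1 + (-1)*1 + 1*(-1) + 1*1
```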
Hi,
First, thank you very much for providing the code for XNOR-Net. Out of curiosity, I visualized the weight values of the conv2 and fc1 layers of the binary version of LeNet, but they do not have binary values. Could you kindly guide me on this?
I am visualizing it using model.conv2.weight.data
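One possible explanation, stated as an assumption and not confirmed by the author: in the XNOR-Net scheme, training keeps real-valued weights and binarizes them on the fly as W ≈ alpha * sign(W), with alpha the mean absolute value of the weights, so the stored parameter would still look real-valued. The sketch below shows that approximation in plain Python on a made-up weight list; the helper name binarize is hypothetical and is not part of this repository.

```python
def binarize(weights):
    """Return (alpha, signs) so that alpha * signs approximates weights.

    alpha is the mean absolute value and signs are in {-1, +1},
    following the XNOR-Net weight approximation.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


# Made-up real-valued weights, standing in for one filter.
alpha, signs = binarize([0.5, -0.25, 0.75, -0.5])
print(alpha)   # 0.5
print(signs)   # [1, -1, 1, -1]
```

If that assumption holds, applying the same idea to model.conv2.weight.data (for example with its sign) would give the binary view that the forward pass uses.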
I compiled everything without any issue. When I tried to run VGG_Binary, I got this error:
AttributeError: module 'binop' has no attribute 'BinarySpatialConvolution_updateOutput'
Can you help me?
No matter what I try, I can't run the training. I have tried compiling binop, and it compiles fine, but running doesn't work:
on Ubuntu 18.04 LTS: (Python 3.6, PyTorch 0.4.0, no GPU)
python3 main.py --arch Bin_LeNet
Traceback (most recent call last):
File "main.py", line 18, in <module>
import models as models
File "/home/aoreskovic/GitHub/Pytorch-XNOR-Net-master/MNIST/models/__init__.py", line 2, in <module>
from .Bin_LeNet import Bin_LeNet_test
File "/home/aoreskovic/GitHub/Pytorch-XNOR-Net-master/MNIST/models/Bin_LeNet.py", line 6, in <module>
from util import BinLinear
File "../util/__init__.py", line 1, in <module>
from .util import *
File "../util/util.py", line 1, in <module>
import binop
ModuleNotFoundError: No module named 'binop'
on Windows 10: (Python 3.6, PyTorch 0.4.0, no GPU)
python main.py --arch Bin_LeNet
Traceback (most recent call last):
File "main.py", line 18, in <module>
import models as models
File "H:\Dropbox\NeuralXNOR\Pytorch-XNOR-Net\MNIST\models\__init__.py", line 2, in <module>
from .Bin_LeNet import Bin_LeNet_test
File "H:\Dropbox\NeuralXNOR\Pytorch-XNOR-Net\MNIST\models\Bin_LeNet.py", line 6, in <module>
from util import BinLinear
File "..\util\__init__.py", line 1, in <module>
from .util import *
File "..\util\util.py", line 1, in <module>
import binop
File "..\binop\__init__.py", line 3, in <module>
from ._binop import lib as _lib, ffi as _ffi
ModuleNotFoundError: No module named 'binop._binop'
It looks like there should be some module named binop.py that acts as a wrapper for _binop, but it isn't generated?
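A quick way to narrow down errors like the two above is to ask Python which of the modules it can actually locate from the current environment. The sketch below uses only the standard library; the helper name is_importable is hypothetical, and the module names are the ones from the tracebacks. It only reports whether a module can be found, not why a build step may have skipped it.

```python
import importlib.util


def is_importable(name):
    """Return True if Python can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False


# Run this from the directory where main.py is started.
for name in ("binop", "binop._binop"):
    print(name, is_importable(name))
```

If binop is found but binop._binop is not, the Python package is on the path while the compiled extension is missing from it.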
Your code is great! However, when I use it with an AlexNet model, an error occurs when saving the binary model after one epoch. The log is here:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1523242347739/work/torch/csrc/generic/serialization.cpp line=38 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
File "main.py", line 396, in <module>
train_bin(epoch)
File "main.py", line 128, in train_bin
bin_save_state(args, model_train)
File "../util/util.py", line 36, in bin_save_state
torch.save(state, 'models/' + args.arch + '.pth')
File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 135, in save
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 117, in _with_file_like
return body(f)
File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 135, in <lambda>
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 204, in _save
serialized_storages[key]._write_file(f)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1523242347739/work/torch/csrc/generic/serialization.cpp:38
The environment is the same as yours, and the other architectures you provide run successfully.
The binary AlexNet code is here:
import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
sys.path.append("..")
from util import BinLinear
from util import BinConv2d


class Bin_AlexNet_train(nn.Module):
    def __init__(self):
        super(Bin_AlexNet_train, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, istrain=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1, istrain=True),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, istrain=True),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, istrain=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.classifier = nn.Sequential(
            BinLinear(256 * 6 * 6, 4096, istrain=True),
            BinLinear(4096, 4096, istrain=True),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 256 * 6 * 6)
        x = self.classifier(x)
        return x


class Bin_AlexNet_test(nn.Module):
    def __init__(self):
        super(Bin_AlexNet_test, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, istrain=False),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1, istrain=False),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, istrain=False),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, istrain=False),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.classifier = nn.Sequential(
            BinLinear(256 * 6 * 6, 4096, istrain=False),
            BinLinear(4096, 4096, istrain=False),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 256 * 6 * 6)
        x = self.classifier(x)
        return x
Also, the unbinarized AlexNet runs successfully.
Could you please tell me how to solve the problem? Thank you!