cooooorn / pytorch-xnor-net Goto Github PK

78.0 3.0 21.0 4.74 MB

XNOR-Net with binary GEMM and binary conv2d kernels; supports both CPU and GPU.

License: BSD 3-Clause "New" or "Revised" License

Python 47.47% Makefile 0.96% C++ 21.08% C 24.95% Shell 0.18% Cuda 5.36%

xnor-net xnor-convolutions pytorch cuda c binary-op

pytorch-xnor-net's Issues

VGG model accuracy is very low

The classification accuracy is very low, only 10.0%, when I run the command "$ python3 main.py --arch VGG16" on CIFAR-10, before making any changes to the code. What causes this, and is there a solution? My results on the MNIST dataset match the author's. Thank you.

Out of memory when saving binary weights

Hi,
Thank you for providing this amazing code. When I run my code on a simple network, training crashes at binop.encode_rows(weight, bin_weight) with this error: torch.FatalError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58
I am using the same PyTorch version as you, to be consistent with your work. Could you please guide me on how to resolve this issue?
Thank you

AttributeError: module 'models' has no attribute 'VGG'

I hit the following problem when running the CIFAR-10 example. (Python 3.6, PyTorch 0.4.0)

Namespace(arch='VGG16', batch_size=128, cuda=True, epochs=300, evaluate=False, log_interval=100, lr=0.1, lr_epochs=100, momentum=0.9, no_cuda=False, pretrained=None, seed=1, test_batch_size=100, weight_decay=1e-05)
Files already downloaded and verified
Traceback (most recent call last):
  File "main.py", line 341, in <module>
    model_ori = models.VGG(name)
AttributeError: module 'models' has no attribute 'VGG'

I wrote __init__.py to fix it.

from .VGG import VGG
from .Bin_VGG import Bin_VGG_test
from .Bin_VGG import Bin_VGG_train

No module named 'binop'

File "..\util\util.py", line 1, in <module>
import binop
ModuleNotFoundError: No module named 'binop'

XNOR acceleration

Thanks for the CUDA and PyTorch implementation of XNOR; it really helps me. I am now wondering whether the implementation can actually speed up the training process. In my MNIST experiments, Bin_LeNet seems slower than LeNet, which seems unreasonable. Can you explain how to accelerate training? Thanks a lot.
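
For readers following this question, here is a minimal pure-Python sketch of the arithmetic that XNOR acceleration relies on: for two vectors with entries in {-1, +1}, the dot product equals 2 * (number of matching positions) - n, and the matches can be counted with XNOR plus popcount on packed bits. The helper names pack_bits and xnor_dot are hypothetical and are not part of this repository's binop module. The sketch only shows the identity; it says nothing about whether the repository's kernels outperform the full-precision kernels PyTorch uses during training.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into one int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))           # ordinary dot product
packed = xnor_dot(pack_bits(a), pack_bits(b), len(a))  # bitwise version
print(reference, packed)
```

Both values agree, so one XNOR and one popcount over a machine word can stand in for up to a word's worth of multiply-accumulates.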

BinConv2D for group convolution

Dear @cooooorn ,

Thanks for your helpful implementation. I have two questions about the BinConv2d class:

1. In this line: self.weight = nn.Parameter(torch.IntTensor(out_channels, 1 + ( in_channels * self.kernel_size[0] * self.kernel_size[1] - 1) // 32)). Why do we divide by 32 in test mode? I notice that the number of weights at test time is reduced by a factor of 32. Could you clarify that?
2. I want to use group convolution. How can I modify BinConv2d to support it?

Thank you very much.

Thanks,
Hai
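
A likely reading of the quoted expression, not confirmed by the author: at test time each binary weight needs only one bit, and a torch.IntTensor element is a 32-bit integer, so 32 weights fit in one element. The expression 1 + (n - 1) // 32 is then the ceiling of n / 32, the number of 32-bit words needed to hold n bits. The sketch below checks that arithmetic; the function name packed_words is hypothetical.

```python
import math


def packed_words(n_bits, word_size=32):
    """Number of word_size-bit words needed to store n_bits one-bit weights."""
    return 1 + (n_bits - 1) // word_size


# Weights per output channel for a 5x5 convolution with 96 input channels.
n = 96 * 5 * 5
words = packed_words(n)
print(n, words)

# The expression is integer ceiling division for any positive n.
for bits in (1, 31, 32, 33, 500, n):
    assert packed_words(bits) == math.ceil(bits / 32)
```

Under this reading the weights are not discarded; the same number of binary weights is stored in roughly 1/32 of the elements.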

Further optimize gemm

Thanks for your great work!

I plan to work on BNN optimization as well, for various applications (generative models, classifiers) on a powerful CPU.
I spent a few hours on preliminary work changing the "micro_kernel" to use AVX-512, and a simple one-loop optimization showed a 4x speedup (note that -O3 does not vectorize this loop on its own). Do you plan to work on this further and boost the performance?

Query not an issue

Hi,
First, thank you very much for providing the code for XNOR-Net. Out of curiosity, I visualized the weight values of the conv2 and fc1 layers of the binary version of LeNet, but they do not have binary values. Could you kindly guide me on this?
I am visualizing it using model.conv2.weight.data
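
One possible explanation, offered as an assumption about this repository rather than a statement of how it works: in the XNOR-Net training scheme, real-valued weights are kept and updated by the optimizer, and a filter W is binarized only for the forward pass as alpha * sign(W), where alpha is the mean absolute value of W. If model.conv2.weight.data holds those real-valued weights, the values will not look binary. The sketch below shows the binarization step on a toy filter; binarize_filter is a hypothetical helper, not a function from util.py.

```python
def binarize_filter(weights):
    """Return (alpha, signs) so that alpha * signs approximates weights."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # scaling factor
    signs = [1 if w >= 0 else -1 for w in weights]       # binary part
    return alpha, signs


latent = [0.5, -0.25, 0.75, -0.5]        # real-valued weights kept during training
alpha, signs = binarize_filter(latent)
approx = [alpha * s for s in signs]      # what the forward pass would use
print(alpha, signs, approx)
```

To inspect binary values, one would look at the signs (or the packed weights saved for the test model) rather than the real-valued parameters.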

AttributeError: module 'binop' has no attribute 'BinarySpatialConvolution_updateOutput'

Everything compiled with no issues. When I tried to run VGG_Binary, I got this error:

AttributeError: module 'binop' has no attribute 'BinarySpatialConvolution_updateOutput'

Can you help me?

No module named 'binop'

No matter what I try, I cannot run the training. I have tried compiling binop, and it compiles fine, but running does not work:

on Ubuntu LTS 18.04: (Python 3.6, Pytorch 4.0, no GPU)

python3 main.py --arch Bin_LeNet
Traceback (most recent call last):
  File "main.py", line 18, in <module>
    import models as models
  File "/home/aoreskovic/GitHub/Pytorch-XNOR-Net-master/MNIST/models/__init__.py", line 2, in <module>
    from .Bin_LeNet import Bin_LeNet_test
  File "/home/aoreskovic/GitHub/Pytorch-XNOR-Net-master/MNIST/models/Bin_LeNet.py", line 6, in <module>
    from util import BinLinear
  File "../util/__init__.py", line 1, in <module>
    from .util import *
  File "../util/util.py", line 1, in <module>
    import binop
ModuleNotFoundError: No module named 'binop'

on Win10: (Python 3.6, Pytorch 4.0, no GPU)

python main.py --arch Bin_LeNet
Traceback (most recent call last):
  File "main.py", line 18, in <module>
    import models as models
  File "H:\Dropbox\NeuralXNOR\Pytorch-XNOR-Net\MNIST\models\__init__.py", line 2, in <module>
    from .Bin_LeNet import Bin_LeNet_test
  File "H:\Dropbox\NeuralXNOR\Pytorch-XNOR-Net\MNIST\models\Bin_LeNet.py", line 6, in <module>
    from util import BinLinear
  File "..\util\__init__.py", line 1, in <module>
    from .util import *
  File "..\util\util.py", line 1, in <module>
    import binop
  File "..\binop\__init__.py", line 3, in <module>
    from ._binop import lib as _lib, ffi as _ffi
ModuleNotFoundError: No module named 'binop._binop'

It looks like there should be a module named binop.py that acts as a wrapper for _binop, but it is not generated?

Are any files missing from binop?
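
Both tracebacks end in a failed import, so a quick check is to try importing binop and binop._binop from the same directory and interpreter used to run main.py. The sketch below is a generic diagnostic using only the standard library; try_import is a hypothetical helper, and the module names are taken from the tracebacks above. It reports whether each import succeeds but does not build or install anything.

```python
import importlib
import sys


def try_import(name):
    """Return (True, "") if the module imports, else (False, error message)."""
    try:
        importlib.import_module(name)
        return True, ""
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)


for name in ("binop", "binop._binop"):
    ok, reason = try_import(name)
    print(name, "OK" if ok else "FAILED: " + reason)

# The directories Python searches; the compiled package must be under one of them.
print(sys.path)
```

If binop imports but binop._binop does not, the compiled extension was probably not produced or not placed next to binop/__init__.py for this interpreter.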

Use the code with AlexNet but fail when saving binary model

Your code is great! However, when I use it with an AlexNet model, an error occurs when saving the binary model after one epoch. The log is here:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1523242347739/work/torch/csrc/generic/serialization.cpp line=38 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "main.py", line 396, in <module>
    train_bin(epoch)
  File "main.py", line 128, in train_bin
    bin_save_state(args, model_train)
  File "../util/util.py", line 36, in bin_save_state
    torch.save(state, 'models/' + args.arch + '.pth')
  File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 135, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 117, in _with_file_like
    return body(f)
  File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 135, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/fjb/miniconda3/envs/pytorch0.3/lib/python3.5/site-packages/torch/serialization.py", line 204, in _save
    serialized_storages[key]._write_file(f)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1523242347739/work/torch/csrc/generic/serialization.cpp:38

The environment is the same as yours, and the other architectures you provide run successfully.

The binary AlexNet code is here:

import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
sys.path.append("..")
from util import BinLinear
from util import BinConv2d

class Bin_AlexNet_train(nn.Module):

    def __init__(self):
        super(Bin_AlexNet_train, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, istrain=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1, istrain=True),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, istrain=True),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, istrain=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.classifier = nn.Sequential(
            BinLinear(256 * 6 * 6, 4096, istrain=True),
            BinLinear(4096, 4096, istrain=True),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 256 * 6 * 6)
        x = self.classifier(x)
        return x

class Bin_AlexNet_test(nn.Module):

    def __init__(self):
        super(Bin_AlexNet_test, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96, eps=1e-4, momentum=0.1, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(96, 256, kernel_size=5, stride=1, padding=2, istrain=False),
            nn.MaxPool2d(kernel_size=3, stride=2),
            BinConv2d(256, 384, kernel_size=3, stride=1, padding=1, istrain=False),
            BinConv2d(384, 384, kernel_size=3, stride=1, padding=1, istrain=False),
            BinConv2d(384, 256, kernel_size=3, stride=1, padding=1, istrain=False),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.classifier = nn.Sequential(
            BinLinear(256 * 6 * 6, 4096, istrain=False),
            BinLinear(4096, 4096, istrain=False),
            nn.BatchNorm1d(4096, eps=1e-3, momentum=0.1, affine=True),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 256 * 6 * 6)
        x = self.classifier(x)
        return x

Also, the unbinarized AlexNet can run successfully.

Could you please tell me how to solve the problem? Thank you!
