wenwei202 / caffe

Caffe for Sparse and Low-rank Deep Neural Networks

License: Other

CMake 2.65% Makefile 0.67% Shell 0.82% C++ 77.35% Python 11.68% MATLAB 0.85% Cuda 5.91% Dockerfile 0.07%

acceleration caffe compression deep-neural-networks low-rank-approximation sparse-convolution sparsity

caffe's Issues

nn_decomposer does not support kernel_h and kernel_w

nn_decomposer.py expects cur_layer_param.convolution_param.kernel_size._values to be set. This means the convolutional kernel needs to be square and kernel_h and kernel_w may not be different. Is there any special reason for this?
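For illustration, a script could resolve the kernel shape from either form before decomposing a layer. The sketch below uses a stand-in object rather than Caffe's real ConvolutionParameter, and the helper name `kernel_hw` is hypothetical (it is not part of nn_decomposer.py). It assumes, as in upstream caffe.proto, that `kernel_size` is a repeated field and that `kernel_h` / `kernel_w` read as 0 when unset.

```python
from types import SimpleNamespace


def kernel_hw(conv_param):
    """Return (height, width) of a 2-D convolution kernel.

    Hypothetical helper: accepts either a `kernel_size` list or the
    separate `kernel_h` / `kernel_w` fields.
    """
    kh = getattr(conv_param, "kernel_h", 0)
    kw = getattr(conv_param, "kernel_w", 0)
    if kh or kw:
        if not (kh and kw):
            raise ValueError("kernel_h and kernel_w must be set together")
        return kh, kw
    sizes = list(conv_param.kernel_size)
    if len(sizes) == 1:          # one value means a square kernel
        return sizes[0], sizes[0]
    if len(sizes) == 2:          # explicit (h, w)
        return sizes[0], sizes[1]
    raise ValueError("expected 1 or 2 kernel_size values, got %d" % len(sizes))


# Stand-ins for convolution_param messages (assumption: not real protobufs).
square = SimpleNamespace(kernel_size=[3], kernel_h=0, kernel_w=0)
rect = SimpleNamespace(kernel_size=[], kernel_h=1, kernel_w=7)
print(kernel_hw(square))
print(kernel_hw(rect))
```

With a real protobuf message, `HasField("kernel_h")` would be the more precise check than testing for a nonzero value.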

A pytorch re-implementation of Structured Sparsity Learning

Hi @wenwei202, thank you for sharing. Based on your paper, I implemented ZJCV/SSL in PyTorch, including train-prune-finetune pipelines for VGGNet and ResNet.

For VGGNet, I implemented filter/channel/filter_and_channel pruning;
For ResNet, I implemented depth pruning.

The experimental results show that the pruning works very well. Thank you again for SSL.

low rank deep neural networks

Issue summary

I am working on low-rank deep neural networks to speed up inference for better deployability. Is anyone working on similar things?

Steps to reproduce

Code is in https://github.com/wenwei202/caffe/tree/sfm.
Related publication in ICCV 2017

No Learning for NiN on cifar10

Issue summary

When I define my network following the Network in Network architecture without SSL regularization (similar to lenet_train_test.prototxt, which uses no SSL regularization), the network does not train, no matter how I adjust the learning rate and weight decay. The loss stays at a constant value throughout training: Train net output #0: loss = 2.30259 (* 1 = 2.30259 loss)
Could you please help me with that?
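A constant loss of 2.30259 is the cross-entropy a softmax classifier reports when it assigns equal probability to all 10 CIFAR-10 classes, which suggests the network output is not changing at all. A quick check of that arithmetic:

```python
import math

num_classes = 10  # CIFAR-10
# Cross-entropy of a uniform prediction: -log(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 5))  # 2.30259
```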

Steps to reproduce

I am using a Docker container; here is the Dockerfile the image was built from:
`FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
LABEL maintainer [email protected]

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        git \
        wget \
        libatlas-base-dev \
        libboost-all-dev \
        libgflags-dev \
        libgoogle-glog-dev \
        libhdf5-serial-dev \
        libleveldb-dev \
        liblmdb-dev \
        libopencv-dev \
        libprotobuf-dev \
        libsnappy-dev \
        protobuf-compiler \
        python-dev \
        python-numpy \
        python-pip \
        python-setuptools \
        python-scipy && \
    rm -rf /var/lib/apt/lists/*

ENV CAFFE_ROOT=/opt/caffe
WORKDIR $CAFFE_ROOT

ENV CLONE_TAG=1.0

RUN git clone -b scnn --depth 1 https://github.com/wenwei202/caffe.git . && \
    pip install --upgrade pip && \
    cd python && for req in $(cat requirements.txt) pydot; do pip install $req; done && cd .. && \
    git clone https://github.com/NVIDIA/nccl.git && cd nccl && make -j install && cd .. && rm -rf nccl
#mkdir build && cd build && \
#cmake .. && \

RUN cp Makefile.config.example Makefile.config && \
    echo 'INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/' >>./Makefile.config && \
    echo 'LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/' >>./Makefile.config && \
    make -j"$(nproc)" && \
    make pycaffe -j"$(nproc)"

RUN pip install lmdb

ENV PYCAFFE_ROOT $CAFFE_ROOT/python
ENV PYTHONPATH $PYCAFFE_ROOT:$PYTHONPATH
ENV PATH $CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig

WORKDIR /workspace
`

Your system configuration

Operating system: Ubuntu 16.04 LTS
Compiler:
CUDA version (if applicable): V8.0.61
CUDNN version (if applicable):
BLAS: I can not find it by grep OPENBLAS_VERSION /usr/local/include/openblas_config.h
Python or MATLAB version (for pycaffe and matcaffe respectively): python 2.7

Training with Force Regularization

@wenwei202 I am trying to train my network (Resnet like architecture) with force regularization. But I could not find any documentation on how to enable force regularization during training using my network.
From examples I can understand that "force_mult: 1" can be added to Convolution layer "param" to enable it. Is it the correct way or are there any other steps required?

Cifar10 example training error

Hi, when I run
./examples/cifar10/train_script.sh 0.1 0.0001 0.0 0.0 0.0 0 template_resnet_solver.prototxt
I encounter the following problem. What is the reason, and how can I solve it? I would appreciate any suggestions. Thanks!
./build/tools/caffe.bin train --solver=examples/cifar10//0.1_0.0001_0.0_0.0_0.0_2017年_05月_03日_星期三_18-57-55_CST/solver.prototxt
./examples/cifar10/train_script.sh: line 66: 14108 Aborted (core dumped) ./build/tools/caffe.bin train --solver=$solverfile > "${snapshot_path}/train.info" 2>&1

Getting Error While trying to convert to sparse Matrix

I get the error below when trying to convert a dense matrix to a sparse matrix.
TypeError: Cannot cast scalar from dtype('O') to dtype('float32') according to the rule 'same_kind'

...     vanilla_layer = net.params[layer][0].data
...     #print('{}. Layer: {} Type: {} Shape: {} Min weight: {} Max weight: {}'.format(idx,layer,net.layers[idx].type,vanilla_layer.shape,vanilla_layer.min(),vanilla_layer.max()))
...     if 'fc6' in layer:
...         vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC6)
...         np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
...     if 'fc7' in layer:
...         vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC7)
...         np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
...     if 'fc8' in layer:
...         vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC8)
...         np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
...
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
TypeError: Cannot cast scalar from dtype('O') to dtype('float32') according to the rule 'same_kind'

cmake errors

Currently cmake is not supported. Please copy Makefile.config.example and make all at this stage.

cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all

Trying to open this issue to merge what I did in Makefile and Makefile.config.example to cmake.

See commits by @wenwei202 on
https://github.com/wenwei202/caffe/commits/scnn/Makefile
https://github.com/wenwei202/caffe/commits/scnn/Makefile.config.example

where cifar10_full_train_test_kernel_shape.prototxt ?

sorry, but I can't find cifar10_full_train_test_kernel_shape.prototxt and ./examples/cifar10/train_script.sh in https://github.com/wenwei202/caffe/blob/scnn/examples/cifar10/readme.md

additional details on ResNet20 (low rank)

Would you please share speed up ratios and final ranks on ResNet20 (CIFAR) obtained in Coordinating Filters for Faster Deep Neural Networks paper?

Improve efficiency for group lasso

Dear @wenwei202 ,
I found that training gets much slower with group lasso than without it. I believe you must have had the same experience. I dug into your code and found that this line is responsible for the slowdown (about 30%). I replaced this dynamic device query, which proved to be time-consuming, with the macro CAFFE_CUDA_NUM_THREADS. The training speed is now comparable to training without group lasso.

Hope this is helpful. Thank you.

make runtest error

I organize the issues regarding make runtest here.
Please see some of them reported in BVLC/caffe#4328 by searching make runtest.

Code branch: https://github.com/wenwei202/caffe/tree/scnn

Training on ImageNet using ResNet-18 and not convergence.

Hi Wei, I downloaded your code and used it to train ResNet-18 from scratch on the ImageNet dataset. However, training seems hard to converge with the settings below:

net: "./models/resnet-18-lowrank/train.prototxt"
test_iter: 2000
test_interval: 5000
test_initialization: true
display: 30
base_lr: 0.005
lr_policy: "multistep"
stepvalue: 150000
stepvalue: 300000
gamma: 0.1
max_iter: 600000
momentum: 0.9
weight_decay: 0.0001
snapshot: 6000
snapshot_prefix: "./models/resnet-18-lowrank/resnet-18"
solver_mode: GPU
force_type: "Constant"
force_decay: 0.0001

As a comparison, I had earlier run the same ResNet-18 configuration without force_type and force_decay, only setting base_lr to 0.05 as your paper says in Section 5.1, and it seemed to converge quickly. Could you give me some advice?
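For reference, the learning rate implied by the solver settings above can be tabulated with a short sketch. It assumes this fork keeps upstream Caffe's "multistep" semantics, where the rate is multiplied by gamma each time a stepvalue is reached; the function name is illustrative.

```python
def multistep_lr(iteration, base_lr, gamma, stepvalues):
    """Learning rate under a "multistep" policy: base_lr * gamma**k,
    where k is the number of step values already reached."""
    steps_passed = sum(1 for s in stepvalues if iteration >= s)
    return base_lr * gamma ** steps_passed


stepvalues = [150000, 300000]  # from the solver settings in this issue
for it in (0, 149999, 150000, 300000, 599999):
    print(it, multistep_lr(it, base_lr=0.005, gamma=0.1, stepvalues=stepvalues))
```

With base_lr 0.005 the rate starts ten times lower than the 0.05 used in the comparison run, and drops by a further factor of ten at each of the two step values.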

@wenwei202 Can you give any suggestion about doing sparse computation in android platform?

Have you heard of a CNN library named ncnn? Its author speeds up CNNs on Android using assembly-level optimization. Some claim that im2col+GEMM is slower than ncnn on ARM chips, and that the im2col operation takes a lot of time on Android. I asked the authors whether anyone had tried optimizing sparse matrix computation in ncnn, and they said they had not. Someone else told me they are trying to improve computation speed further using Winograd. However, it seems that Caffe2 and TensorFlow use im2col+GEMM.

So can you give any suggestions about doing sparse computation on Android? I found that only Eigen supports sparse computation on Android. If Eigen uses im2col+GEMM, is it slower than assembly-optimized libraries such as ncnn?
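For readers unfamiliar with the im2col (img2col) + GEMM lowering discussed here, the following is a minimal pure-Python sketch for a single channel with stride 1 and no padding. It is only meant to show the idea of unfolding patches into columns and then doing one matrix product; it is not how Caffe, ncnn, or Eigen implement it.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch of a 2-D image into one column.

    Returns a (kh*kw) x (out_h*out_w) matrix as a list of rows.
    Stride 1, no padding, single channel.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kh):
                for kx in range(kw):
                    cols[ky * kw + kx][oy * out_w + ox] = image[oy + ky][ox + kx]
    return cols


def matmul(a, b):
    """Plain dense matrix product (the GEMM step)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_conv(image, kernel):
    """Reference sliding-window correlation, for comparison."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [sum(kernel[ky][kx] * image[oy + ky][ox + kx]
                for ky in range(kh) for kx in range(kw))
            for oy in range(out_h) for ox in range(out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]

# One filter flattened into a 1 x (kh*kw) weight matrix.
weights_row = [[v for row in kernel for v in row]]
lowered = matmul(weights_row, im2col(image, 2, 2))[0]
print(lowered)
print(direct_conv(image, kernel))
```

The unfolded matrix duplicates overlapping pixels, which is the memory and bandwidth cost that the complaints about im2col on mobile hardware refer to.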

cifar10 baseline performance

Hi,

I was trying to reproduce the baseline performance of ConvNet (reported in your NIPS paper). I tried the default settings (the provided cifar10_full_multistep_solver.prototxt in the directory) and only got around 79~80% accuracy. Can you give me some advice on how to gain the extra 2 percentage points on the test data?

I also tried training with SSL from scratch (without initializing from a baseline model) and still got 79~80% accuracy.

Jianbo

I have a problem about make runtest

I compiled successfully with make all -j8,
but make runtest failed.
The errors are as follows:
[ OK ] DeconvolutionLayerTest/1.TestSetup (0 ms)
[ RUN ] DeconvolutionLayerTest/1.TestGradient3D
*** Aborted at 1520849489 (unix time) try "date -d @1520849489" if you are using GNU date ***
PC: @ 0x7f491ef40ef2 (unknown)
*** SIGSEGV (@0x0) received by PID 6175 (TID 0x7f4928b09740) from PID 0; stack trace: ***
@ 0x7f491f29c390 (unknown)
@ 0x7f491ef40ef2 (unknown)
@ 0x7f491ef4553c cfree
@ 0x4789f5 caffe::Blob<>::~Blob()
@ 0x478a82 boost::detail::sp_counted_impl_p<>::dispose()
@ 0x475fba boost::detail::sp_counted_base::release()
@ 0x47a7fc std::vector<>::~vector()
@ 0x485d82 caffe::GradientChecker<>::CheckGradientSingle()
@ 0x4c9de3 caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0x551725 caffe::DeconvolutionLayerTest_TestGradient3D_Test<>::TestBody()
@ 0x91d9f3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x91700a testing::Test::Run()
@ 0x917158 testing::TestInfo::Run()
@ 0x917235 testing::TestCase::Run()
@ 0x91850f testing::internal::UnitTestImpl::RunAllTests()
@ 0x918833 testing::UnitTest::Run()
@ 0x46e1dd main
@ 0x7f491eee1830 __libc_start_main
@ 0x475c49 _start
@ 0x0 (unknown)
It appears in different places.
I am using CUDA 8.0 and cuDNN 5.1.
How should I solve it?

hi, what 'convq_layer' means in net_pruner.py and net_skipper.py?

hi,
In my opinion there should be some Python scripts that can directly remove filters whose weights are all zero (row sparsity), to accelerate GPU inference without any CPU subroutines. Are net_pruner.py and net_skipper.py used for that? Or can you give me some advice?
Also, I cannot figure out what 'convq_layer' and 'convq_param_key' mean in net_pruner.py and net_skipper.py; for example, there is obviously no 'conv1q' key in src_net.params.
Thanks a lot for your help!
`

src_net = caffe.Net(srcproto,srcmodel, caffe.TEST)
print("src net:\n blobs {}\nparams {}\n".format(src_net.blobs.keys(), src_net.params.keys()))
src_net_parser = caffeparser.CaffeProtoParser(srcproto)
net_msg = src_net_parser.readProtoNetFile()

layer_idx = 0
loop_layers = net_msg.layer[:] #adding : implicitly makes a copy to avoid being modified in the loop
convxq_positions = []
convxq_m = []
convxq_add_layers = []
position_idx = 0

total_all_zero_counter = 0

# generate and save dst prototxt

for cur_layer in loop_layers:
    if 'Convolution'==cur_layer.type and re.match("^conv[0-9]+$",cur_layer.name):
        convq_layer = net_msg.layer._values[position_idx-1]
        convq_param_key = cur_layer.name+"q"
        param_key = cur_layer.name
        convx_ptr = net_msg.layer._values.pop(position_idx)
        convx_ptr.CopyFrom(cur_layer)
        convxq_ptr = net_msg.layer._values.pop(position_idx-1)
        convxq_ptr.CopyFrom(convq_layer)

        assert len(src_net.params[convq_param_key])==1
        weights_convxq = src_net.params[convq_param_key][0].data
        weights_convx = src_net.params[param_key][0].data
        assert weights_convx.shape[3]==1 and weights_convx.shape[2]==1

        orig_grp_num = weights_convxq.shape[0]/weights_convx.shape[1]
        cur_m = convq_layer.convolution_param.group
        orig_grp_num = cur_layer.convolution_param.group
        num_per_orig_grp = (cur_m/orig_grp_num)
        cur_sxs = weights_convx.shape[1]*orig_grp_num/cur_m

using any regularization causes ever increasing loss

Issue summary

Hi @wenwei202,

I am currently trying to train a sparse network with SSL, but I have serious trouble getting the training to converge. As soon as I add any kind of regularization (L1, L2, or your SSL), the loss increases and the training diverges. This happens even if I set weight_decay to something like 0.0000001.

The following log shows the behavior when trying to train the resnet baseline example from your cifar10 readme.

./examples/cifar10/train_script.sh 0.1 0.00001 0.0 0.0 0.0 0 template_resnet_solver.prototxt

I1117 11:30:51.336390   896 solver.cpp:348] Iteration 0, Testing net (#0)
I1117 11:30:52.452332   896 solver.cpp:415]     Test net output #0: accuracy = 0.1
I1117 11:30:52.452364   896 solver.cpp:415]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1117 11:30:52.624837   896 solver.cpp:231] Iteration 0, loss = 3.50511
I1117 11:30:52.624869   896 solver.cpp:247]     Train net output #0: loss = 3.50511 (* 1 = 3.50511 loss)
I1117 11:30:52.624882   896 sgd_solver.cpp:106] Iteration 0, lr = 0.1
I1117 11:30:52.653563   896 solver.cpp:260]     Total regularization terms: 2504.25 loss+regular. : 2507.76
I1117 11:31:22.397892   896 solver.cpp:231] Iteration 200, loss = 1.52217
I1117 11:31:22.398046   896 solver.cpp:247]     Train net output #0: loss = 1.52217 (* 1 = 1.52217 loss)
I1117 11:31:22.398053   896 sgd_solver.cpp:106] Iteration 200, lr = 0.1
I1117 11:31:22.443342   896 solver.cpp:260]     Total regularization terms: 2.1337e+09 loss+regular. : 2.1337e+09
I1117 11:31:52.203909   896 solver.cpp:231] Iteration 400, loss = 1.31369
I1117 11:31:52.203939   896 solver.cpp:247]     Train net output #0: loss = 1.31369 (* 1 = 1.31369 loss)
I1117 11:31:52.203946   896 sgd_solver.cpp:106] Iteration 400, lr = 0.1
I1117 11:31:52.249099   896 solver.cpp:260]     Total regularization terms: 7.16458e+09 loss+regular. : 7.16458e+09

Do you know by any chance what could cause this behavior? Or how I could fix this?
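The test loss of 87.3365 at iteration 0 in the log above is a recognizable value: it equals -log(FLT_MIN) for single-precision floats. Upstream Caffe's SoftmaxWithLoss clamps the predicted probability at FLT_MIN before taking the logarithm, so this is the largest per-sample loss it can report; that this fork behaves the same way is an assumption. A quick check of the arithmetic:

```python
import math

FLT_MIN = 1.17549435e-38  # smallest positive normal float32
saturated_loss = -math.log(FLT_MIN)
print(round(saturated_loss, 4))  # 87.3365
```

Seeing exactly this number usually points to saturated or non-finite network outputs rather than a meaningful loss value.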

Steps to reproduce

Training any net with enabled regularization.

Your system configuration

Operating system: Ubuntu 16.04 or Arch
Compiler: gcc5.4 (Ubuntu) and gcc5.5 (Arch)
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 3.5 (Ubuntu) 3.6 (Arch)

One trivial modification, save my training

Dear @wenwei202 ,
I found it very hard to get training to converge with the group lasso term on the scnn branch. The task is face classification on my own data, and the loss is simply SoftmaxWithLoss. At the very beginning of training the loss drops smoothly. At about 2K iterations, the loss suddenly jumps to 87.33, which according to my log is caused by some NaN weights. One more thing: I train from scratch.
This seemed really strange. After digging into the code, I found that this line may be problematic. I changed it to if(res > 0), inspired by the CPU version of the same math function. After that, everything works normally.

Although I think the two conditions are almost equivalent, the results are very different. I can only explain it by numerical stability, but I don't know why or how. Could you please shed some light on this? Thank you.
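The thread does not show the original condition, so the following is only a sketch of why a guard such as `res > 0` can matter. For a group lasso term, the penalty on a weight group is its L2 norm, and the gradient with respect to each weight is that weight divided by the norm, which is undefined when the whole group is zero. The function name is hypothetical and the code is plain Python, not the fork's CUDA kernel.

```python
import math


def group_lasso_grad(group):
    """Subgradient of ||group||_2 with respect to each weight.

    Returns zeros for an all-zero group instead of dividing by zero,
    which is the role a check like `if (res > 0)` would play.
    """
    norm = math.sqrt(sum(w * w for w in group))
    if norm > 0:
        return [w / norm for w in group]
    return [0.0 for _ in group]


print(group_lasso_grad([3.0, 4.0]))
print(group_lasso_grad([0.0, 0.0]))
```

In floating point, a norm that is exactly zero (or a comparison that lets a zero or NaN through) turns the division into NaN or infinity, and a single NaN weight is enough to drive the reported loss to its saturated value.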

How to see the speed up on GPU?

Issue summary

I tried to use examples/caffenet_classifier.py but didn't see the speed up on my GPU (GTX 1080). Following are my results:
python examples/caffenet_classifier.py models/bvlc_reference_caffenet/deploy.prototxt caffenet_SSL_0.4469.caffemodel
I0412 13:00:27.347717 29124 base_conv_layer.cpp:855] conv1 group 0: 64.352 us (Dense Scheme Timing)
I0412 13:00:27.347939 29124 base_conv_layer.cpp:855] conv2 group 0: 48.128 us (Dense Scheme Timing)
I0412 13:00:27.348006 29124 base_conv_layer.cpp:855] conv2 group 1: 50.176 us (Dense Scheme Timing)
I0412 13:00:27.348263 29124 base_conv_layer.cpp:855] conv3 group 0: 91.136 us (Dense Scheme Timing)
I0412 13:00:27.348376 29124 base_conv_layer.cpp:855] conv4 group 0: 34.816 us (Dense Scheme Timing)
I0412 13:00:27.348429 29124 base_conv_layer.cpp:855] conv4 group 1: 37.152 us (Dense Scheme Timing)
I0412 13:00:27.348539 29124 base_conv_layer.cpp:855] conv5 group 0: 32.768 us (Dense Scheme Timing)
I0412 13:00:27.348592 29124 base_conv_layer.cpp:855] conv5 group 1: 36.832 us (Dense Scheme Timing)

python examples/caffenet_classifier.py models/bvlc_reference_caffenet/deploy_csrmm.prototxt caffenet_SSL_0.4469.caffemodel
I0411 20:03:28.505044 14113 base_conv_layer.cpp:813] conv1 group 0: 479.008 us (Compressed Row Storage Timing)
I0411 20:03:28.505533 14113 base_conv_layer.cpp:813] conv2 group 0: 286.528 us (Compressed Row Storage Timing)
I0411 20:03:28.505679 14113 base_conv_layer.cpp:813] conv2 group 1: 124.928 us (Compressed Row Storage Timing)
I0411 20:03:28.505998 14113 base_conv_layer.cpp:813] conv3 group 0: 141.312 us (Compressed Row Storage Timing)
I0411 20:03:28.506122 14113 base_conv_layer.cpp:813] conv4 group 0: 38.016 us (Compressed Row Storage Timing)
I0411 20:03:28.506234 14113 base_conv_layer.cpp:813] conv4 group 1: 94.144 us (Compressed Row Storage Timing)
I0411 20:03:28.506362 14113 base_conv_layer.cpp:813] conv5 group 0: 45.152 us (Compressed Row Storage Timing)
I0411 20:03:28.506521 14113 base_conv_layer.cpp:813] conv5 group 1: 140.288 us (Compressed Row Storage Timing)
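To compare the two runs layer by layer, the timing lines can be parsed and summed. This is a small sketch using only the numbers reported above; the regular expression and function name are illustrative, and the glog prefix of each line is trimmed for brevity.

```python
import re

# Message part of each timing line from the two runs above.
dense_log = """
conv1 group 0: 64.352 us (Dense Scheme Timing)
conv2 group 0: 48.128 us (Dense Scheme Timing)
conv2 group 1: 50.176 us (Dense Scheme Timing)
conv3 group 0: 91.136 us (Dense Scheme Timing)
conv4 group 0: 34.816 us (Dense Scheme Timing)
conv4 group 1: 37.152 us (Dense Scheme Timing)
conv5 group 0: 32.768 us (Dense Scheme Timing)
conv5 group 1: 36.832 us (Dense Scheme Timing)
"""
csrmm_log = """
conv1 group 0: 479.008 us (Compressed Row Storage Timing)
conv2 group 0: 286.528 us (Compressed Row Storage Timing)
conv2 group 1: 124.928 us (Compressed Row Storage Timing)
conv3 group 0: 141.312 us (Compressed Row Storage Timing)
conv4 group 0: 38.016 us (Compressed Row Storage Timing)
conv4 group 1: 94.144 us (Compressed Row Storage Timing)
conv5 group 0: 45.152 us (Compressed Row Storage Timing)
conv5 group 1: 140.288 us (Compressed Row Storage Timing)
"""

TIMING = re.compile(r"(\S+) group (\d+): ([\d.]+) us")


def layer_times(log_text):
    """Map (layer, group) -> microseconds from the timing lines."""
    return {(m.group(1), int(m.group(2))): float(m.group(3))
            for m in TIMING.finditer(log_text)}


dense = layer_times(dense_log)
csrmm = layer_times(csrmm_log)
for key in sorted(dense):
    print("%s group %d: CSR / dense = %.2f" % (key[0], key[1], csrmm[key] / dense[key]))

dense_total = sum(dense.values())
csrmm_total = sum(csrmm.values())
print("total: dense %.3f us, CSR %.3f us" % (dense_total, csrmm_total))
```

On these numbers the CSR path is slower than the dense path for every layer, and about 3.4 times slower in total (1349.376 us versus 395.36 us).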

Steps to reproduce

If you are having difficulty building Caffe or training a model, please ask the caffe-users mailing list. If you are reporting a build error that seems to be due to a bug in Caffe, please attach your build configuration (either Makefile.config or CMakeCache.txt) and the output of the make (or cmake) command.

Your system configuration

Operating system: Ubuntu
Compiler: GCC
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: cuBlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7

SCNN can't run for the errors of compute_shape

Issue summary

When I ran the scnn branch for the NIPS 2016 method, it crashed.

Your system configuration

Operating system: CentOS-7.3
Compiler: gcc-4.8.5 and g++-4.8.5
CUDA version (if applicable): CUDA-9.0
CUDNN version (if applicable): cudnn-7.3
BLAS: openblas

Detailed problem

When I simply tested the model, it crashed in the convolution layer's compute_shape function. The call stack is:

(gdb) bt
#0 0x00000000004212fc in __gnu_cxx::new_allocator::construct(int*, int const&) (this=0x1076fe90, __p=0x4, __val=@0x7ffd0f353d2c: 149)
at /usr/include/c++/4.8.2/ext/new_allocator.h:130
#1 0x000000000041f110 in __gnu_cxx::__alloc_traits<std::allocator >::construct(std::allocator&, int*, int const&) (__a=..., __p=0x4, __arg=@0x7ffd0f353d2c: 149) at /usr/include/c++/4.8.2/ext/alloc_traits.h:216
#2 0x000000000041d068 in std::vector<int, std::allocator >::push_back(int const&) (this=0x1076fe90, __x=@0x7ffd0f353d2c: 149)
at /usr/include/c++/4.8.2/bits/stl_vector.h:905
#3 0x00007fce7d23acc0 in caffe::ConvolutionLayer::compute_output_shape() (this=0x1076f9b0) at src/caffe/layers/conv_layer.cpp:20
#4 0x00007fce7d2496f4 in caffe::BaseConvolutionLayer::Reshape(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) (this=0x1076f9b0, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at src/caffe/layers/base_conv_layer.cpp:385
#5 0x00007fce7d197cd1 in caffe::Layer::SetUp(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) (this=0x1076f9b0, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:73
#6 0x00007fce7d2e0471 in caffe::Net::Init(caffe::NetParameter const&) (this=0x7ffd0f3548e0, in_param=...) at src/caffe/net.cpp:151
#7 0x00007fce7d2debb8 in caffe::Net::Net(std::string const&, caffe::Phase, int, std::vector<std::string, std::allocatorstd::string > const*, caffe::Net const*) (this=0x7ffd0f3548e0, param_file="/home/zhibin/qzhong/caffe/models/ilsvrc12_inceptionv4/inceptionv4_train_val.prototxt", phase=caffe::TEST, level=0, stages=0x7ffd0f354e40, root_net=0x0) at src/caffe/net.cpp:47
#8 0x00000000004195e3 in test() () at tools/caffe.cpp:285
#9 0x000000000041af6b in main(int, char**) (argc=2, argv=0x7ffd0f3550f8) at tools/caffe.cpp:451

Has anything changed in the compute_shape function of the convolution layer?

How to prune the zero-weights

How can I prune the zero weights in the conv layers after training the model with block_group_lasso or breadth_decay_mult?
I found that the sparsity of each conv layer clearly increased, for example 36% sparsity in conv1_1, but I don't know how to remove these zeroed weights.
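As a rough illustration of what physically removing weights involves, the sketch below drops filters that are entirely zero and the matching input channels of the following layer. It is not one of the repository's scripts; the function name is hypothetical, weights are plain nested lists, the next layer is assumed to use 1x1 kernels, and biases are ignored (a removed filter with a nonzero bias would change the output). Scattered zeros inside otherwise nonzero filters cannot be removed this way.

```python
def prune_zero_filters(w_cur, w_next):
    """Drop all-zero filters from w_cur and the matching input channels of w_next.

    w_cur:  rows are filters of the current layer (kernel flattened per row).
    w_next: rows are filters of the next layer, one column per input channel
            (1x1 kernels assumed to keep the sketch short).
    Returns the two pruned matrices and the indices of the kept filters.
    """
    keep = [i for i, row in enumerate(w_cur) if any(v != 0 for v in row)]
    pruned_cur = [w_cur[i] for i in keep]
    pruned_next = [[row[i] for i in keep] for row in w_next]
    return pruned_cur, pruned_next, keep


w1 = [[0.5, -0.2], [0.0, 0.0], [0.1, 0.3]]   # second filter is all zero
w2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # consumes the 3 outputs of w1
p1, p2, kept = prune_zero_filters(w1, w2)
print(kept)
print(p1)
print(p2)
```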

error occurs running on gpu when i transplant the code to windows

I parsed models/resnet/caffenet_train_iter_2000000.caffemodel under Caffe. I ported only the sparse matrix computation part to my Caffe (for example, the caffe_gpu_sparse_dense2csr function), but CUSPARSE_CHECK() reports an error when execution reaches the caffe_gpu_sparse_mmcsr function. Which parts do I need to port for it to run correctly? I need your help. Thank you.

File "python/resnet_generator.py", line 21, in add_conv_layer conv_layer.bottom._values.append(bottom) AttributeError: 'google.protobuf.pyext._message.RepeatedScalarConta' object has no attribute '_values'

When trying to execute
python python/resnet_generator.py --net_template examples/cifar10/resnet_template.prototxt --n 3 --force_regularization

got the error below:

Traceback (most recent call last):
File "python/resnet_generator.py", line 281, in
add_conv_layer(net_msg,name='conv1',bottom='data',num_output=16,pad=1,kernel_size=3,stride=1,connectivity_mode=connectivity_mode)
File "python/resnet_generator.py", line 21, in add_conv_layer
conv_layer.bottom._values.append(bottom)
AttributeError: 'google.protobuf.pyext._message.RepeatedScalarConta' object has no attribute '_values'

Kindly suggest how to go ahead

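After training with scnn, will the results be affected if sparse BLAS is not used when testing the model?

My goal is to use scnn to compress the model parameters, but some environments (such as Android) lack a sparse BLAS library, so only regular convolution can be used there. If I test the compressed model without sparse BLAS, will the test accuracy be affected?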
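As general background, zero weights contribute nothing to a product, so a sparse storage format and a dense one describe the same computation. The sketch below compares a CSR matrix-vector product with the dense one on a small matrix; it is a pure-Python illustration with made-up values, not the fork's sparse BLAS path, and in real deployments results can still differ by floating-point rounding.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x using only the stored nonzeros."""
    return [sum((values[k] * x[col_idx[k]]
                 for k in range(row_ptr[i], row_ptr[i + 1])), 0.0)
            for i in range(len(row_ptr) - 1)]


def dense_matvec(dense, x):
    """y = A @ x visiting every entry, zeros included."""
    return [sum((v * xj for v, xj in zip(row, x)), 0.0) for row in dense]


weights = [[0.0, 2.0, 0.0],
           [0.0, 0.0, 0.0],
           [1.5, 0.0, -1.0]]
x = [1.0, 2.0, 3.0]

y_dense = dense_matvec(weights, x)
y_sparse = csr_matvec(*to_csr(weights), x)
print(y_dense)
print(y_sparse)
```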

CSRMM Not Implemented yet

Hi @wenwei202 ,
After finishing the three steps for ResNet-20 (baseline, SSL, and fine-tuning), I am having trouble evaluating the performance of the fine-tuned models.
I changed the conv mode to LOWERED_CSRMM but got the error "Not Implemented Yet". I also tried the Python script cifar10_classifier.py, but it did not work either.
How can I evaluate the inference performance of sparsified models?

FYI, here's the error info:
CSRMM:
I0727 12:35:50.174033 1970 base_conv_layer.cpp:17] layer conv1 has sparsity of 0.0578704
I0727 12:35:50.174360 1970 base_conv_layer.cpp:29] ConvolutionParameter_ConvMode_LOWERED_CSRMM
F0727 12:35:50.174373 1970 math_functions.cpp:411] Not Implemented Yet
*** Check failure stack trace: ***
@ 0x7fb2ad0255cd google::LogMessage::Fail()
@ 0x7fb2ad027433 google::LogMessage::SendToLog()
@ 0x7fb2ad02515b google::LogMessage::Flush()
@ 0x7fb2ad027e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb2ad6ca9d0 caffe::caffe_cpu_sparse_dense2csr<>()
@ 0x7fb2ad7fe3c9 caffe::BaseConvolutionLayer<>::WeightAlign()
@ 0x7fb2ad69308b caffe::Net<>::CopyTrainedLayersFrom()
@ 0x7fb2ad69d5f5 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7fb2ad69d68e caffe::Net<>::CopyTrainedLayersFrom()
@ 0x40c222 time()
@ 0x407520 main
@ 0x7fb2abf95830 __libc_start_main
@ 0x407d49 _start
@ (nil) (unknown)

CCNMM:
I0727 11:48:19.621824 24561 base_conv_layer.cpp:17] layer res_grp1_1_conv1 has sparsity of 1
I0727 11:48:19.622687 24561 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0727 11:48:19.622701 24561 base_conv_layer.cpp:80] concatenating weight matrix
I0727 11:48:19.622706 24561 base_conv_layer.cpp:88] res_grp1_1_conv1 left_cols=0 left_rows=0
I0727 11:48:19.622711 24561 base_conv_layer.cpp:91] squeezing weight matrix
I0727 11:48:19.622715 24561 base_conv_layer.cpp:102] res_grp1_1_conv1 squeezing to 0x0
F0727 11:48:19.622720 24561 blob.cpp:131] Check failed: data_
*** Check failure stack trace: ***
@ 0x7f18331595cd google::LogMessage::Fail()
@ 0x7f183315b433 google::LogMessage::SendToLog()
@ 0x7f183315915b google::LogMessage::Flush()
@ 0x7f183315be1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f18337ea41b caffe::Blob<>::mutable_cpu_data()
@ 0x7f18339330e5 caffe::BaseConvolutionLayer<>::WeightAlign()
@ 0x7f18337c709b caffe::Net<>::CopyTrainedLayersFrom()
@ 0x7f18337d1605 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7f18337d169e caffe::Net<>::CopyTrainedLayersFrom()
@ 0x40c222 time()
@ 0x407520 main
@ 0x7f18320c9830 __libc_start_main
@ 0x407d49 _start
@ (nil) (unknown)

Best Regards,
Leo

Your system configuration

Operating system: Ubuntu 16.04
Compiler:
CUDA version (if applicable): 9.0
CUDNN version (if applicable): 5
BLAS: MKL
Python or MATLAB version (for pycaffe and matcaffe respectively):

Evaluate caffe model uploaded in caffe model zoo

Issue summary

Evaluating your uploaded models in stock Caffe (https://github.com/BVLC/caffe/tree/master/src/caffe) gives the following:

I0503 16:43:44.044796 19898 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_SSL_0.4259.caffemodel
I0503 16:43:48.726794 19898 caffe.cpp:251] Starting Optimization
I0503 16:43:48.726846 19898 solver.cpp:279] Solving AlexNet
I0503 16:43:48.726857 19898 solver.cpp:280] Learning Rate Policy: step
I0503 16:43:48.751430 19898 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:43:49.218724 19898 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:44:20.341884 19898 solver.cpp:404] Test net output #0: loss = 2.91344 (* 1 = 2.91344 loss)
I0503 16:44:20.342052 19898 solver.cpp:404] Test net output #1: top1_accuracy = 0.40986
I0503 16:44:20.342062 19898 solver.cpp:404] Test net output #2: top5_accuracy = 0.658339

I0503 16:51:35.119240 26706 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_SSL_0.4469.caffemodel
I0503 16:51:43.908475 26706 caffe.cpp:251] Starting Optimization
I0503 16:51:43.908524 26706 solver.cpp:279] Solving AlexNet
I0503 16:51:43.908535 26706 solver.cpp:280] Learning Rate Policy: step
I0503 16:51:43.916864 26706 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:51:44.117143 26706 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:52:13.693686 26706 solver.cpp:404] Test net output #0: loss = 2.72754 (* 1 = 2.72754 loss)
I0503 16:52:13.693859 26706 solver.cpp:404] Test net output #1: top1_accuracy = 0.43084
I0503 16:52:13.693868 26706 solver.cpp:404] Test net output #2: top5_accuracy = 0.67862

I0503 16:50:10.291841 25409 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_L1_0.4251.caffemodel
I0503 16:50:12.619544 25409 caffe.cpp:251] Starting Optimization
I0503 16:50:12.619585 25409 solver.cpp:279] Solving AlexNet
I0503 16:50:12.619592 25409 solver.cpp:280] Learning Rate Policy: step
I0503 16:50:12.626791 25409 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:50:12.895716 25409 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:50:50.041647 25409 solver.cpp:404] Test net output #0: loss = 3.26702 (* 1 = 3.26702 loss)
I0503 16:50:50.041822 25409 solver.cpp:404] Test net output #1: top1_accuracy = 0.36774
I0503 16:50:50.041837 25409 solver.cpp:404] Test net output #2: top5_accuracy = 0.61448

I0503 22:01:39.975592 8616 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_L1_0.4467.caffemodel
I0503 22:01:46.399006 8616 caffe.cpp:251] Starting Optimization
I0503 22:01:46.399055 8616 solver.cpp:279] Solving AlexNet
I0503 22:01:46.399062 8616 solver.cpp:280] Learning Rate Policy: step
I0503 22:01:46.405417 8616 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 22:01:47.340291 8616 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 22:02:16.862293 8616 solver.cpp:404] Test net output #0: loss = 2.76222 (* 1 = 2.76222 loss)
I0503 22:02:16.862720 8616 solver.cpp:404] Test net output #1: top1_accuracy = 0.42278
I0503 22:02:16.862746 8616 solver.cpp:404] Test net output #2: top5_accuracy = 0.673259

Steps to reproduce

Download the models uploaded to the Caffe Model Zoo (https://github.com/BVLC/caffe/wiki/Model-Zoo#learning-structured-sparsity-in-deep-neural-networks)
Configure train_val.prototxt as follows:

layer {
  name: "top5_accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "top5_accuracy"
  accuracy_param {
    top_k: 5
  }
  include {
    phase: TEST
  }
}
layer {
  name: "top1_accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "top1_accuracy"
  include {
    phase: TEST
  }
}

To make sure the ImageNet data is well prepared, I evaluated the AlexNet model (https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet) and got the following:

I0503 16:45:40.494272 21305 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:46:14.405321 21305 solver.cpp:404] Test net output #0: loss = 1.86402 (* 1 = 1.86402 loss)
I0503 16:46:14.405730 21305 solver.cpp:404] Test net output #1: top1_accuracy = 0.56822
I0503 16:46:14.405750 21305 solver.cpp:404] Test net output #2: top5_accuracy = 0.799561

Your system configuration

Operating system: ubuntu 16.04
Compiler:
CUDA version (if applicable): 8.0
CUDNN version (if applicable):
BLAS:
Python or MATLAB version (for pycaffe and matcaffe respectively):

Removing all-zero rows and columns of the feature map matrix by a GPU routine

In GPU mode with conv_mode: LOWERED_CCNMM, we first need to remove the all-zero columns and rows of the feature map matrix col_buffer_. This concatenation step temporarily uses the corresponding CPU routine.
We plan to replace it with a GPU routine. Pull requests are welcome if anyone implements this.

Code branch: https://github.com/wenwei202/caffe/tree/scnn
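The idea behind dropping all-zero rows and columns can be illustrated with a small pure-Python sketch. It assumes a weight matrix with structured zeros that multiplies a lowered feature map matrix. All-zero weight columns make the matching feature map rows unnecessary, and all-zero weight rows produce all-zero outputs. The function names are hypothetical and this is not the fork's CPU or GPU routine, only a demonstration that the compressed product equals the dense one.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def compressed_matmul(weights, col_buffer):
    """Multiply after removing all-zero weight rows/columns.

    weights:    (out_channels x K) matrix with structured zeros
    col_buffer: (K x N) lowered feature map matrix
    """
    n_out, n_cols = len(weights), len(col_buffer[0])
    # Rows of the weight matrix that contain at least one nonzero value.
    keep_rows = [i for i, row in enumerate(weights) if any(v != 0 for v in row)]
    # Columns that are nonzero in at least one kept row.
    keep_cols = [j for j in range(len(weights[0]))
                 if any(weights[i][j] != 0 for i in keep_rows)]
    small_w = [[weights[i][j] for j in keep_cols] for i in keep_rows]
    # Only the feature map rows matching kept weight columns are needed.
    small_buf = [col_buffer[j] for j in keep_cols]
    out = [[0] * n_cols for _ in range(n_out)]
    for i, row in zip(keep_rows, matmul(small_w, small_buf)):
        out[i] = row  # scatter results back; dropped rows stay zero
    return out


# Toy example: row 1 and column 1 of the weights are all zero.
W = [[1, 0, 2], [0, 0, 0], [3, 0, 4]]
buf = [[1, 2], [3, 4], [5, 6]]
dense = matmul(W, buf)
sparse = compressed_matmul(W, buf)
```

The gather step that builds the smaller feature map matrix is the part the issue describes as still running on the CPU in GPU mode.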

Error in running nn_decomposer.py

I am getting the following errors after running python python/nn_decomposer.py:

Traceback (most recent call last):
File "../python/nn_decomposer.py", line 194, in <module>
net_msg.layer._values.insert(layer_idx,low_rank_layer)
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute '_values'

Traceback (most recent call last):
File "../python/nn_decomposer.py", line 194, in <module>
net_msg.layer.insert(layer_idx,low_rank_layer)
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'insert'

I tried protobuf versions 2.3, 2.5 and 3.3.
What could be wrong on my end? Please help.
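The type name in the error (google.protobuf.pyext._message) suggests that the C++-backed protobuf implementation is in use, whose repeated message container exposes neither the private _values list nor, in these versions, an insert method. One possible workaround, sketched below under the assumption that the container supports iteration, slice deletion and extend, is to rebuild the repeated field with the new element in place. The helper name insert_message is hypothetical and this is not the author's fix.

```python
import copy


def insert_message(repeated, index, new_msg):
    """Insert new_msg at index in a repeated container lacking insert().

    Works on any container that supports iteration, `del c[:]` and
    `extend`, which is assumed here for protobuf repeated message fields.
    """
    # Copy the existing elements first, so they stay valid after clearing.
    items = [copy.deepcopy(m) for m in repeated]
    items.insert(index, new_msg)
    del repeated[:]          # clear the repeated field
    repeated.extend(items)   # re-add everything in the new order


# Stand-in demonstration with a plain list of dicts instead of LayerParameter
# messages; with pycaffe the call would look like
#   insert_message(net_msg.layer, layer_idx, low_rank_layer)
layers = [{"name": "conv1"}, {"name": "conv2"}]
insert_message(layers, 1, {"name": "conv1_lowrank"})
layer_names = [layer["name"] for layer in layers]
```

Another option sometimes used is to select the pure-Python protobuf backend by setting the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python before running the script. Whether that makes the original line work depends on the installed protobuf version.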

Where to get lenet_0.9917.caffemodel.h5 and bvlc_alexnet.caffemodel.h5?

Where can we get the files below?

lenet_0.9917.caffemodel.h5
bvlc_alexnet.caffemodel.h5

Error while converting caffemodel to HDF5

Issue summary

Hi @wenwei202 ,
I'm trying to apply low-rank approximation to an object detection model similar to SSD. While converting the model to HDF5 using caffemodel_converter.py, I get the error below:
I1114 15:37:20.404398 4379 net.cpp:916] Layer namestem1_stem1/relu_0_split
HDF5-DIAG: Error detected in HDF5 (1.8.16) thread 140026632021760:
#000: ../../../src/H5G.c line 314 in H5Gcreate2(): unable to create group
major: Symbol table
minor: Unable to initialize object
#001: ../../../src/H5Gint.c line 194 in H5G__create_named(): unable to create and link to group
major: Symbol table
minor: Unable to initialize object
#002: ../../../src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
major: Links
minor: Unable to initialize object
#003: ../../../src/H5L.c line 1882 in H5L_create_real(): can't insert link
major: Symbol table
minor: Unable to insert object
#004: ../../../src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#005: ../../../src/H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
F1114 15:37:20.404465 4379 net.cpp:919] Check failed: layer_data_hid >= 0 (-1 vs. 0)

The model I'm using has a Dense block as described in https://arxiv.org/abs/1608.06993; I'm not sure whether that could be the issue. Related issues online suggest the error might be caused by duplicate layer names, but I checked and that is not the case with my model. If you have any suggestions, please let me know.

Your system configuration

Operating system: Ubuntu 16.04.1
Compiler:
CUDA version (if applicable): Cuda-8.0
CUDNN version (if applicable): Cudnn-5.0.5
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively):pycaffe
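A possible cause, not confirmed anywhere in this thread: the failing layer name stem1_stem1/relu_0_split contains a "/", which HDF5 treats as a path separator. H5Gcreate2 does not create missing intermediate groups by default, which would be consistent with the "component not found" frames in the trace above. The sketch below builds a rename mapping that replaces "/" in layer names and refuses to continue if two names would collide. The helper name is hypothetical, and any renaming would have to be applied consistently to the prototxt and the weights.

```python
def hdf5_safe_names(layer_names, replacement="_"):
    """Map each layer name to a variant without '/' characters.

    Raises ValueError if two layers would end up with the same name,
    since duplicate layer names cause problems of their own.
    """
    mapping = {}
    used = set()
    for name in layer_names:
        safe = name.replace("/", replacement)
        if safe in used:
            raise ValueError("renaming %r collides with an existing name" % name)
        used.add(safe)
        mapping[name] = safe
    return mapping


# Example with the layer name from the log above plus a made-up neighbor.
names = ["stem1", "stem1_stem1/relu_0_split"]
rename = hdf5_safe_names(names)
```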

How to get a speedup on GPU using conv_mode: LOWERED_CCNMM?

I have an AlexNet caffemodel with zero-column and zero-row weights. Using conv_mode: LOWERED_CCNMM, I get a speedup on CPU (e.g. structured sparsity = 75%, speedup = 3.1x), but on GPU there is no speedup at all. What should I do to get a speedup on GPU? I use the build/tools/caffe time tool to measure inference time. Does anyone know something about this? Thanks a lot!

How to make?

I successfully compiled the Caffe code on my computer and ran the examples. But when compiling this modified code, I ran into a problem.
The problem is as follows:
./include/caffe/util/cudnn.hpp:113:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
How should I solve it?
Ubuntu 16, CUDA 9.0, CUDA compilation tools 9.0

Special steps required for implementation?

Do I need to alter my train_val.prototxt file to take advantage of sparsity, or will this version of caffe implement sparsity for me?

For comparison, I took a model I had previously trained using vanilla caffe and retrained it with this caffe library. Training executed and the model learned successfully, but there was no speed-up observed during inference. So, I'm wondering if I need to make any changes to my train prototxt file to take advantage of the sparsity operations, since I didn't see any special instructions in the project readme.

Thank you.

I have some questions about the code

What do LOWERED_CSRMM, LOWERED_CCNMM and DIRECT_SCONV mean in the code?
For the sparse computation, do you use existing libraries, for example MKL on CPU and CUDA on GPU?
