wenwei202 / caffe
Caffe for Sparse and Low-rank Deep Neural Networks
License: Other
nn_decomposer.py expects cur_layer_param.convolution_param.kernel_size._values to be set. This means the convolution kernel must be square: kernel_h and kernel_w cannot differ. Is there any particular reason for this?
Hi @wenwei202, thank you for sharing. Based on your paper, I implemented ZJCV/SSL in PyTorch, including train-prune-finetune
for VGGNet and ResNet.
The experimental results show that the pruning works very well. Thank you again for SSL!
I am working on low-rank deep neural networks to speed up testing for better deployability. Is anyone working on something similar?
Code is in https://github.com/wenwei202/caffe/tree/sfm.
Related publication: ICCV 2017.
When I define my network following the Network in Network architecture without SSL regularization (similar to lenet_train_test.prototxt, which uses no SSL regularization), the network does not train, even though I have tried different learning rates and weight decays. The loss stays at the same value throughout training: Train net output #0: loss = 2.30259 (* 1 = 2.30259 loss)
Could you please help me with this?
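A constant loss of 2.30259 is exactly ln(10), the cross-entropy of a uniform prediction over 10 classes. This suggests the network is producing chance-level outputs (for example, identical logits for every class) rather than learning slowly. The following minimal NumPy check assumes a 10-class task such as CIFAR-10:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, using a numerically stable log-softmax."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

num_classes = 10  # assumption: a 10-class dataset such as CIFAR-10
# Identical logits (e.g. a last layer that outputs all zeros) give a uniform softmax.
loss = softmax_cross_entropy(np.zeros(num_classes), label=3)
print(round(loss, 5))  # 2.30259, i.e. ln(10)
```

If the logged loss never moves away from this value, it is worth inspecting the last layer's outputs before tuning the solver further.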
I am using a Docker container. Here is the Dockerfile the image was built from:
`FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
LABEL maintainer [email protected]

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cmake \
        git \
        wget \
        libatlas-base-dev \
        libboost-all-dev \
        libgflags-dev \
        libgoogle-glog-dev \
        libhdf5-serial-dev \
        libleveldb-dev \
        liblmdb-dev \
        libopencv-dev \
        libprotobuf-dev \
        libsnappy-dev \
        protobuf-compiler \
        python-dev \
        python-numpy \
        python-pip \
        python-setuptools \
        python-scipy && \
    rm -rf /var/lib/apt/lists/*

ENV CAFFE_ROOT=/opt/caffe
WORKDIR $CAFFE_ROOT

ENV CLONE_TAG=1.0

RUN git clone -b scnn --depth 1 https://github.com/wenwei202/caffe.git . && \
    pip install --upgrade pip && \
    cd python && for req in $(cat requirements.txt) pydot; do pip install $req; done && cd .. && \
    git clone https://github.com/NVIDIA/nccl.git && cd nccl && make -j install && cd .. && rm -rf nccl
# mkdir build && cd build &&
# cmake .. &&

RUN cp Makefile.config.example Makefile.config && \
    echo 'INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/' >> ./Makefile.config && \
    echo 'LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/' >> ./Makefile.config && \
    make -j"$(nproc)" && \
    make pycaffe -j"$(nproc)"

RUN pip install lmdb

ENV PYCAFFE_ROOT $CAFFE_ROOT/python
ENV PYTHONPATH $PYCAFFE_ROOT:$PYTHONPATH
ENV PATH $CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig

WORKDIR /workspace
`
Operating system: Ubuntu 16.04 LTS
Compiler:
CUDA version (if applicable): V8.0.61
CUDNN version (if applicable):
BLAS: I cannot find it with grep OPENBLAS_VERSION /usr/local/include/openblas_config.h
Python or MATLAB version (for pycaffe and matcaffe respectively): python 2.7
@wenwei202 I am trying to train my network (a ResNet-like architecture) with force regularization, but I could not find any documentation on how to enable force regularization when training my own network.
From the examples, I understand that adding "force_mult: 1" to a Convolution layer's "param" enables it. Is that the correct way, or are other steps required?
Hi, when I run
./examples/cifar10/train_script.sh 0.1 0.0001 0.0 0.0 0.0 0 template_resnet_solver.prototxt
I encounter the following problem. What is the reason, and how can I solve it? Any suggestions would be appreciated. Thanks!
./build/tools/caffe.bin train --solver=examples/cifar10//0.1_0.0001_0.0_0.0_0.0_2017年_05月_03日_星期三_18-57-55_CST/solver.prototxt
./examples/cifar10/train_script.sh: line 66: 14108 Aborted (core dumped) ./build/tools/caffe.bin train --solver=$solverfile > "${snapshot_path}/train.info" 2>&1
I am getting the error below while trying to convert a dense matrix to a sparse matrix:
TypeError: Cannot cast scalar from dtype('O') to dtype('float32') according to the rule 'same_kind'
... vanilla_layer =net.params[layer][0].data
... #print('{}. Layer: {} Type: {} Shape: {} Min weight: {} Max weight: {}'.format(idx,layer,net.layers[idx].type,vanilla_layer.shape,vanilla_layer.min(),vanilla_layer.max()))
... if 'fc6' in layer:
... vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC6)
... np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
... if 'fc7' in layer:
... vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC7)
... np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
... if 'fc8' in layer:
... vanilla_layer_prune,before,after=prune_max_abs_connections_threshold(layer,vanilla_layer,prune_level_FC8)
... np.copyto(net.params[layer][0].data, sparse.csc_matrix(vanilla_layer_prune))
...
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
TypeError: Cannot cast scalar from dtype('O') to dtype('float32') according to the rule 'same_kind'
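np.copyto needs a dense source. When it receives a scipy.sparse matrix, NumPy wraps the whole matrix as a single Python object (dtype('O')), and that object cannot be cast into the float32 blob. Caffe blobs store dense arrays, so one fix is to copy the dense pruned array back into the blob and keep the CSC form only for analysis, or to call .toarray() on it first.

The minimal sketch below illustrates this. prune_by_threshold is a hypothetical stand-in for the prune_max_abs_connections_threshold helper in the question, and a random array stands in for net.params[layer][0].data:

```python
import numpy as np
from scipy import sparse

def prune_by_threshold(weights, threshold):
    """Hypothetical helper: zero out weights whose magnitude is below threshold."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0
    return pruned

# Stand-in for net.params['fc6'][0].data: Caffe blobs are dense float32 arrays.
blob_data = np.random.RandomState(0).randn(8, 16).astype(np.float32)
pruned = prune_by_threshold(blob_data, 0.5)

# np.copyto(blob_data, sparse.csc_matrix(pruned)) raises the TypeError above,
# so copy the dense pruned weights back into the blob instead.
np.copyto(blob_data, pruned)

# The CSC form is still handy for inspecting sparsity, just not for storing in the blob.
csc = sparse.csc_matrix(pruned)
density = csc.nnz / float(pruned.size)
print("non-zero fraction: %.3f" % density)
```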
Currently cmake is not supported. At this stage, please copy Makefile.config.example and run make all:
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all
I am opening this issue to track porting the changes made in Makefile and Makefile.config.example to CMake.
See the commits by @wenwei202 on:
https://github.com/wenwei202/caffe/commits/scnn/Makefile
https://github.com/wenwei202/caffe/commits/scnn/Makefile.config.example
Sorry, but I can't find cifar10_full_train_test_kernel_shape.prototxt or ./examples/cifar10/train_script.sh, which are mentioned
in https://github.com/wenwei202/caffe/blob/scnn/examples/cifar10/readme.md
Would you please share the speedup ratios and final ranks for ResNet-20 (CIFAR) obtained in the Coordinating Filters for Faster Deep Neural Networks paper?
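A back-of-the-envelope estimate can help when reading reported speedups and ranks. This sketch assumes that a k×k convolution with C input channels and N filters is factorized into a k×k convolution with r filters followed by a 1×1 convolution with N filters. That factorization is an assumption here, not something this page confirms.

Under that assumption, the multiply-accumulates per output pixel drop from C·k²·N to r·(C·k² + N). The sketch computes only this theoretical ratio; measured speedups are usually lower.

```python
def conv_macs(c_in, k, n_out):
    """Multiply-accumulates per output pixel for a dense k x k convolution."""
    return c_in * k * k * n_out

def lowrank_macs(c_in, k, n_out, rank):
    """MACs per output pixel after factorizing into (k x k, rank filters) + (1 x 1, n_out filters)."""
    return c_in * k * k * rank + rank * n_out

def theoretical_speedup(c_in, k, n_out, rank):
    return conv_macs(c_in, k, n_out) / float(lowrank_macs(c_in, k, n_out, rank))

# Example: a 3x3, 16->16 layer (the channel width of the first stage of CIFAR ResNet-20) at rank 8.
print(theoretical_speedup(16, 3, 16, 8))  # 1.8
```

A rank that is too high makes the factorized pair more expensive than the original layer. For a 16->16 3x3 layer, rank 16 already gives a ratio below 1.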
Dear @wenwei202,
I found that training is much slower with group lasso than without it, and I believe you must have had the same experience. I hacked your code and found that this line is responsible for the slowdown (about 30%): it queries the device dynamically, which turns out to be time-consuming. I replaced the query with the macro CAFFE_CUDA_NUM_THREADS, and training speed is now comparable to training without group lasso.
Hope this helps. Thank you.
I am collecting the issues related to make runtest here.
Some of them were reported in BVLC/caffe#4328; search that thread for make runtest.
Code branch: https://github.com/wenwei202/caffe/tree/scnn
Hi Wei, I downloaded your code and used it to train ResNet-18 from scratch on the ImageNet dataset, but it seems hard to converge with the settings below:
net: "./models/resnet-18-lowrank/train.prototxt"
test_iter: 2000
test_interval: 5000
test_initialization: true
display: 30
base_lr: 0.005
lr_policy: "multistep"
stepvalue: 150000
stepvalue: 300000
gamma: 0.1
max_iter: 600000
momentum: 0.9
weight_decay: 0.0001
snapshot: 6000
snapshot_prefix: "./models/resnet-18-lowrank/resnet-18"
solver_mode: GPU
force_type: "Constant"
force_decay: 0.0001
Before that, I also ran a control experiment with the same configuration for ResNet-18 training but without force_type and force_decay, only setting base_lr to 0.05 as described in Section 5.1 of your paper, and it converged quickly. Could you give me some advice?
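When comparing the two runs, it can help to spell out the learning-rate schedule. Under Caffe's "multistep" policy, the rate is multiplied by gamma each time the iteration count reaches one of the stepvalues.

The minimal sketch below applies that rule to the solver settings above. The dictionary layout is only for illustration, and the base_lr=0.05 control run differs only in its first argument:

```python
def multistep_lr(iteration, base_lr, gamma, stepvalues):
    """Learning rate under Caffe's "multistep" policy.

    The rate is multiplied by gamma once for every stepvalue that the current
    iteration has reached (iteration >= stepvalue).
    """
    steps_passed = sum(1 for s in sorted(stepvalues) if iteration >= s)
    return base_lr * gamma ** steps_passed

solver = dict(base_lr=0.005, gamma=0.1, stepvalues=[150000, 300000])
for it in (0, 149999, 150000, 300000, 599999):
    print(it, multistep_lr(it, **solver))
```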
Have you heard of a CNN library named ncnn? Its author speeds up CNNs on Android with assembly-language optimizations. Some people claim that im2col+GEMM is slower than ncnn on ARM chips, because the im2col operation takes a lot of time on Android. I asked the authors whether anyone had tried sparse matrix computation optimizations in ncnn, and they said they had not. Someone else told me they are trying to further improve computation speed with Winograd. However, Caffe2 and TensorFlow seem to use im2col+GEMM.
So can you give any suggestions about doing sparse computation on Android? I found that only Eigen supports sparse computation on Android. If Eigen uses im2col+GEMM, is it slower than assembly-optimized libraries such as ncnn?
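For readers unfamiliar with im2col+GEMM, here is a small NumPy sketch of the technique, checked against a direct sliding-window convolution. im2col copies every k×k patch into a column, so the unfolded matrix is about k² times larger than the input. That copy is the overhead the ncnn discussion refers to.

Structured sparsity acts on the GEMM itself. Removing an all-zero filter deletes a row of the weight matrix, and removing an all-zero (channel, offset) column deletes the matching row of the unfolded matrix. The function names below are illustrative, not taken from any of the libraries mentioned:

```python
import numpy as np

def im2col(x, k, stride=1, pad=0):
    """Unfold a (C, H, W) input into a (C*k*k, out_h*out_w) matrix, one column per output pixel."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    cols = np.empty((c, k, k, out_h, out_w), dtype=x.dtype)
    for i in range(k):
        for j in range(k):
            # Each (i, j) kernel offset contributes one strided view of the padded input.
            cols[:, i, j] = xp[:, i:i + stride * out_h:stride, j:j + stride * out_w:stride]
    return cols.reshape(c * k * k, out_h * out_w), out_h, out_w

def conv_im2col_gemm(x, weights, stride=1, pad=0):
    """Convolution (cross-correlation, as in Caffe) computed as a single GEMM."""
    n, c, k, _ = weights.shape
    cols, out_h, out_w = im2col(x, k, stride, pad)
    out = weights.reshape(n, c * k * k) @ cols  # (N, C*k*k) x (C*k*k, out_h*out_w)
    return out.reshape(n, out_h, out_w)

def conv_direct(x, weights, stride=1, pad=0):
    """Reference sliding-window convolution without any unfolding."""
    n, c, k, _ = weights.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    out_h = (x.shape[1] + 2 * pad - k) // stride + 1
    out_w = (x.shape[2] + 2 * pad - k) // stride + 1
    out = np.zeros((n, out_h, out_w), dtype=x.dtype)
    for f in range(n):
        for r in range(out_h):
            for s in range(out_w):
                patch = xp[:, r * stride:r * stride + k, s * stride:s * stride + k]
                out[f, r, s] = np.sum(patch * weights[f])
    return out

rng = np.random.RandomState(0)
x = rng.randn(3, 7, 7)
weights = rng.randn(4, 3, 3, 3)
print(np.allclose(conv_im2col_gemm(x, weights, stride=2, pad=1),
                  conv_direct(x, weights, stride=2, pad=1)))
```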
Hi,
I was trying to reproduce the baseline performance of ConvNet reported in your NIPS paper. With the default setting provided in the directory (cifar10_full_multistep_solver.prototxt), I only got around 79-80% accuracy. Can you give me some advice on how to gain the extra 2 percentage points on the test data?
I also tried training SSL from scratch (without initializing from a baseline model) and still got 79-80% accuracy.
Jianbo
I compiled successfully with make all -j8, but make runtest fails with the following fault:
[ OK ] DeconvolutionLayerTest/1.TestSetup (0 ms)
[ RUN ] DeconvolutionLayerTest/1.TestGradient3D
*** Aborted at 1520849489 (unix time) try "date -d @1520849489" if you are using GNU date ***
PC: @ 0x7f491ef40ef2 (unknown)
*** SIGSEGV (@0x0) received by PID 6175 (TID 0x7f4928b09740) from PID 0; stack trace: ***
@ 0x7f491f29c390 (unknown)
@ 0x7f491ef40ef2 (unknown)
@ 0x7f491ef4553c cfree
@ 0x4789f5 caffe::Blob<>::~Blob()
@ 0x478a82 boost::detail::sp_counted_impl_p<>::dispose()
@ 0x475fba boost::detail::sp_counted_base::release()
@ 0x47a7fc std::vector<>::~vector()
@ 0x485d82 caffe::GradientChecker<>::CheckGradientSingle()
@ 0x4c9de3 caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0x551725 caffe::DeconvolutionLayerTest_TestGradient3D_Test<>::TestBody()
@ 0x91d9f3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x91700a testing::Test::Run()
@ 0x917158 testing::TestInfo::Run()
@ 0x917235 testing::TestCase::Run()
@ 0x91850f testing::internal::UnitTestImpl::RunAllTests()
@ 0x918833 testing::UnitTest::Run()
@ 0x46e1dd main
@ 0x7f491eee1830 __libc_start_main
@ 0x475c49 _start
@ 0x0 (unknown)
The crash appears at different places.
CUDA 8.0 and cuDNN 5.1.
How should I solve it?
Hi,
In my opinion, there should be some Python scripts that directly remove filters whose weights are all zero (row sparsity), to accelerate GPU inference without any CPU subroutines. Are net_pruner.py and net_skipper.py meant for that? Or can you give me some advice?
I also cannot figure out what 'convq_layer' and 'convq_param_key' mean in net_pruner.py and net_skipper.py. For example, there is obviously no 'conv1q' key in src_net.params.
Thanks a lot for your help!
`
src_net = caffe.Net(srcproto,srcmodel, caffe.TEST)
print("src net:\n blobs {}\nparams {}\n".format(src_net.blobs.keys(), src_net.params.keys()))
src_net_parser = caffeparser.CaffeProtoParser(srcproto)
net_msg = src_net_parser.readProtoNetFile()
layer_idx = 0
loop_layers = net_msg.layer[:] #adding : implicitly makes a copy to avoid being modified in the loop
convxq_positions = []
convxq_m = []
convxq_add_layers = []
position_idx = 0
total_all_zero_counter = 0
# generate and save dst prototxt
for cur_layer in loop_layers:
    if 'Convolution'==cur_layer.type and re.match("^conv[0-9]+$",cur_layer.name):
        convq_layer = net_msg.layer._values[position_idx-1]
        convq_param_key = cur_layer.name+"q"
        param_key = cur_layer.name
        convx_ptr = net_msg.layer._values.pop(position_idx)
        convx_ptr.CopyFrom(cur_layer)
        convxq_ptr = net_msg.layer._values.pop(position_idx-1)
        convxq_ptr.CopyFrom(convq_layer)
        assert len(src_net.params[convq_param_key])==1
        weights_convxq = src_net.params[convq_param_key][0].data
        weights_convx = src_net.params[param_key][0].data
        assert weights_convx.shape[3]==1 and weights_convx.shape[2]==1
        orig_grp_num = weights_convxq.shape[0]/weights_convx.shape[1]
        cur_m = convq_layer.convolution_param.group
        orig_grp_num = cur_layer.convolution_param.group
        num_per_orig_grp = (cur_m/orig_grp_num)
        cur_sxs = weights_convx.shape[1]*orig_grp_num/cur_m
`
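The general idea of removing all-zero filters (row sparsity) can be sketched without these scripts. This is not how net_pruner.py works. It is a hypothetical NumPy helper that drops filters whose weights are all zero and deletes the matching input channels of the next convolution, so both layers shrink.

The sketch assumes the two conv layers are connected directly (possibly through ReLU or pooling, which map zeros to zeros). It also assumes the removed filters have zero bias. In pycaffe, the arrays would come from net.params[name][0].data and net.params[name][1].data (hypothetical usage), and the deploy prototxt would need the reduced num_output.

```python
import numpy as np

def prune_zero_filters(w_cur, b_cur, w_next, tol=0.0):
    """Remove all-zero filters of one conv layer and the matching input channels of the next.

    w_cur:  (N, C, kh, kw) weights of the layer being pruned
    b_cur:  (N,) its bias
    w_next: (M, N, kh2, kw2) weights of the following conv layer
    The result is only exact if the removed filters also have zero bias and nothing
    between the two layers (e.g. batch norm) turns a zero response into a non-zero one.
    """
    n = w_cur.shape[0]
    keep = np.abs(w_cur.reshape(n, -1)).max(axis=1) > tol
    return w_cur[keep], b_cur[keep], w_next[:, keep], keep

rng = np.random.RandomState(0)
w1 = rng.randn(4, 3, 1, 1)
w1[[1, 3]] = 0.0              # pretend group Lasso zeroed filters 1 and 3
b1 = np.zeros(4)
w2 = rng.randn(5, 4, 1, 1)
pw1, pb1, pw2, keep = prune_zero_filters(w1, b1, w2)
print(keep, pw1.shape, pw2.shape)
```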
Hi @wenwei202,
I'm currently trying to train a sparse network with SSL, but I have big problems getting the training to converge. As soon as I add any kind of regularization (L1, L2, or your SSL), the loss increases and training diverges. This happens even if I set weight_decay to something as small as 0.0000001.
The following log shows the behavior when training the ResNet baseline example from your cifar10 readme:
./examples/cifar10/train_script.sh 0.1 0.00001 0.0 0.0 0.0 0 template_resnet_solver.prototxt
I1117 11:30:51.336390 896 solver.cpp:348] Iteration 0, Testing net (#0)
I1117 11:30:52.452332 896 solver.cpp:415] Test net output #0: accuracy = 0.1
I1117 11:30:52.452364 896 solver.cpp:415] Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1117 11:30:52.624837 896 solver.cpp:231] Iteration 0, loss = 3.50511
I1117 11:30:52.624869 896 solver.cpp:247] Train net output #0: loss = 3.50511 (* 1 = 3.50511 loss)
I1117 11:30:52.624882 896 sgd_solver.cpp:106] Iteration 0, lr = 0.1
I1117 11:30:52.653563 896 solver.cpp:260] Total regularization terms: 2504.25 loss+regular. : 2507.76
I1117 11:31:22.397892 896 solver.cpp:231] Iteration 200, loss = 1.52217
I1117 11:31:22.398046 896 solver.cpp:247] Train net output #0: loss = 1.52217 (* 1 = 1.52217 loss)
I1117 11:31:22.398053 896 sgd_solver.cpp:106] Iteration 200, lr = 0.1
I1117 11:31:22.443342 896 solver.cpp:260] Total regularization terms: 2.1337e+09 loss+regular. : 2.1337e+09
I1117 11:31:52.203909 896 solver.cpp:231] Iteration 400, loss = 1.31369
I1117 11:31:52.203939 896 solver.cpp:247] Train net output #0: loss = 1.31369 (* 1 = 1.31369 loss)
I1117 11:31:52.203946 896 sgd_solver.cpp:106] Iteration 400, lr = 0.1
I1117 11:31:52.249099 896 solver.cpp:260] Total regularization terms: 7.16458e+09 loss+regular. : 7.16458e+09
Do you know by any chance what could cause this behavior? Or how I could fix this?
To reproduce: train any net with regularization enabled.
Operating system: Ubuntu 16.04 or Arch
Compiler: gcc5.4 (Ubuntu) and gcc5.5 (Arch)
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 3.5 (Ubuntu) 3.6 (Arch)
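In the log above, the data loss falls while the regularization total grows from about 2.5e3 to 7e9. Computing the penalties directly from the weights shows whether the weights themselves are blowing up. The functions below use the standard definitions: L1, L2 (written as 0.5·||W||², whose gradient is W), and the filter-wise and channel-wise group Lasso used in SSL.

Whether the solver's "Total regularization terms" line scales these by the decay multipliers is an assumption to check against the solver code. Passing net.params[name][0].data to these helpers is hypothetical usage:

```python
import numpy as np

def l1_term(w):
    return float(np.sum(np.abs(w)))

def l2_term(w):
    """0.5 * ||W||^2, the penalty whose gradient is W."""
    return 0.5 * float(np.sum(w ** 2))

def filter_group_lasso(w):
    """Sum of L2 norms of each filter, i.e. each row of the (N, C*kh*kw) weight matrix."""
    return float(np.sum(np.sqrt(np.sum(w.reshape(w.shape[0], -1) ** 2, axis=1))))

def channel_group_lasso(w):
    """Sum of L2 norms of each input channel taken across all filters and kernel positions."""
    return float(np.sum(np.sqrt(np.sum(w ** 2, axis=(0, 2, 3)))))

# Toy (N=2, C=2, 1, 1) weight blob: filter 0 = [3, 4], filter 1 = [0, 0].
w = np.zeros((2, 2, 1, 1))
w[0, 0, 0, 0] = 3.0
w[0, 1, 0, 0] = 4.0
print(l1_term(w), l2_term(w), filter_group_lasso(w), channel_group_lasso(w))
```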
Dear @wenwei202,
I found it very hard to converge with the group lasso term on the scnn branch. The task is face classification on my own data, and the loss is simply SoftmaxWithLoss. At the very beginning of training, the loss drops smoothly. At about 2K iterations, however, the loss suddenly jumps to 87.33, which according to my log is caused by some NaN weights. One more thing: I train from scratch.
I think this is really strange. After some digging into the code, I found that this line may be problematic. I changed it to if(res > 0), inspired by the CPU version of the same math function, and then everything works normally.
Although I think the two are almost equivalent, the results are very different. It can only be explained by numerical stability, but I don't know why or how. Could you please shed some light on this? Thank you.
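Two facts may help interpret this. First, 87.3365 is -ln(FLT_MIN). Caffe's SoftmaxWithLoss clamps the probability of the true label to at least FLT_MIN, so this is the value it reports when that probability underflows or is replaced by the clamp.

Second, the gradient of a group's L2 norm is w/||w||, which is 0/0 = NaN for an all-zero group. A strict "> 0" guard skips such groups. This is one plausible mechanism, not confirmed by the thread. The NumPy sketch below shows both points; the helper names are hypothetical:

```python
import numpy as np

FLT_MIN = np.finfo(np.float32).tiny  # smallest positive normal float32 (C's FLT_MIN)
print(-np.log(np.float64(FLT_MIN)))  # about 87.3365, the plateau seen in the log

def group_lasso_grad_unsafe(group):
    """d||g||_2 / dg = g / ||g||; produces NaN (0/0) when the whole group is zero."""
    return group / np.sqrt(np.sum(group ** 2))

def group_lasso_grad_safe(group):
    """Same gradient, but returns zeros for an all-zero group (a valid subgradient)."""
    norm = np.sqrt(np.sum(group ** 2))
    if norm > 0:
        return group / norm
    return np.zeros_like(group)

zero_group = np.zeros(4, dtype=np.float32)
with np.errstate(invalid="ignore"):
    print(group_lasso_grad_unsafe(zero_group))  # [nan nan nan nan]
print(group_lasso_grad_safe(zero_group))        # [0. 0. 0. 0.]
```

Once a single NaN enters the weights through such a gradient, it propagates to every later forward pass, which would be consistent with a loss that suddenly jumps and never recovers.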
I tried examples/caffenet_classifier.py but didn't see any speedup on my GPU (GTX 1080). Here are my results:
python examples/caffenet_classifier.py models/bvlc_reference_caffenet/deploy.prototxt caffenet_SSL_0.4469.caffemodel
I0412 13:00:27.347717 29124 base_conv_layer.cpp:855] conv1 group 0: 64.352 us (Dense Scheme Timing)
I0412 13:00:27.347939 29124 base_conv_layer.cpp:855] conv2 group 0: 48.128 us (Dense Scheme Timing)
I0412 13:00:27.348006 29124 base_conv_layer.cpp:855] conv2 group 1: 50.176 us (Dense Scheme Timing)
I0412 13:00:27.348263 29124 base_conv_layer.cpp:855] conv3 group 0: 91.136 us (Dense Scheme Timing)
I0412 13:00:27.348376 29124 base_conv_layer.cpp:855] conv4 group 0: 34.816 us (Dense Scheme Timing)
I0412 13:00:27.348429 29124 base_conv_layer.cpp:855] conv4 group 1: 37.152 us (Dense Scheme Timing)
I0412 13:00:27.348539 29124 base_conv_layer.cpp:855] conv5 group 0: 32.768 us (Dense Scheme Timing)
I0412 13:00:27.348592 29124 base_conv_layer.cpp:855] conv5 group 1: 36.832 us (Dense Scheme Timing)
python examples/caffenet_classifier.py models/bvlc_reference_caffenet/deploy_csrmm.prototxt caffenet_SSL_0.4469.caffemodel
I0411 20:03:28.505044 14113 base_conv_layer.cpp:813] conv1 group 0: 479.008 us (Compressed Row Storage Timing)
I0411 20:03:28.505533 14113 base_conv_layer.cpp:813] conv2 group 0: 286.528 us (Compressed Row Storage Timing)
I0411 20:03:28.505679 14113 base_conv_layer.cpp:813] conv2 group 1: 124.928 us (Compressed Row Storage Timing)
I0411 20:03:28.505998 14113 base_conv_layer.cpp:813] conv3 group 0: 141.312 us (Compressed Row Storage Timing)
I0411 20:03:28.506122 14113 base_conv_layer.cpp:813] conv4 group 0: 38.016 us (Compressed Row Storage Timing)
I0411 20:03:28.506234 14113 base_conv_layer.cpp:813] conv4 group 1: 94.144 us (Compressed Row Storage Timing)
I0411 20:03:28.506362 14113 base_conv_layer.cpp:813] conv5 group 0: 45.152 us (Compressed Row Storage Timing)
I0411 20:03:28.506521 14113 base_conv_layer.cpp:813] conv5 group 1: 140.288 us (Compressed Row Storage Timing)
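The per-layer timing lines above can be tallied with a small script to compare the dense and CSR runs in total or layer by layer. The sketch assumes the exact log format shown above. Only the first three lines of each run are reproduced in the sample; in practice the whole log would be passed in:

```python
import re
from collections import defaultdict

# Matches e.g. "base_conv_layer.cpp:855] conv1 group 0: 64.352 us (Dense Scheme Timing)"
TIMING_RE = re.compile(r"\]\s+(\S+) group (\d+): ([\d.]+) us \(([^)]+)\)")

def total_time_by_scheme(log_lines):
    """Sum per-layer, per-group convolution timings (microseconds) for each scheme."""
    totals = defaultdict(float)
    for line in log_lines:
        match = TIMING_RE.search(line)
        if match:
            _layer, _group, micros, scheme = match.groups()
            totals[scheme] += float(micros)
    return dict(totals)

sample_log = [
    "I0412 13:00:27.347717 29124 base_conv_layer.cpp:855] conv1 group 0: 64.352 us (Dense Scheme Timing)",
    "I0412 13:00:27.347939 29124 base_conv_layer.cpp:855] conv2 group 0: 48.128 us (Dense Scheme Timing)",
    "I0412 13:00:27.348006 29124 base_conv_layer.cpp:855] conv2 group 1: 50.176 us (Dense Scheme Timing)",
    "I0411 20:03:28.505044 14113 base_conv_layer.cpp:813] conv1 group 0: 479.008 us (Compressed Row Storage Timing)",
    "I0411 20:03:28.505533 14113 base_conv_layer.cpp:813] conv2 group 0: 286.528 us (Compressed Row Storage Timing)",
    "I0411 20:03:28.505679 14113 base_conv_layer.cpp:813] conv2 group 1: 124.928 us (Compressed Row Storage Timing)",
]
print(total_time_by_scheme(sample_log))
```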
Operating system: Ubuntu
Compiler: GCC
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: cuBlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7
When I ran the scnn branch for the method in the NIPS 2016 paper, it crashed.
Operating system: CentOS-7.3
Compiler: gcc-4.8.5 and g++-4.8.5
CUDA version (if applicable): CUDA-9.0
CUDNN version (if applicable): cudnn-7.3
BLAS: openblas
When I simply tested the model, it crashed in the convolution layer's compute_output_shape function. Here is the call stack dump:
(gdb) bt
#0 0x00000000004212fc in __gnu_cxx::new_allocator::construct(int*, int const&) (this=0x1076fe90, __p=0x4, __val=@0x7ffd0f353d2c: 149)
at /usr/include/c++/4.8.2/ext/new_allocator.h:130
#1 0x000000000041f110 in __gnu_cxx::__alloc_traits<std::allocator >::construct(std::allocator&, int*, int const&) (__a=..., __p=0x4, __arg=@0x7ffd0f353d2c: 149) at /usr/include/c++/4.8.2/ext/alloc_traits.h:216
#2 0x000000000041d068 in std::vector<int, std::allocator >::push_back(int const&) (this=0x1076fe90, __x=@0x7ffd0f353d2c: 149)
at /usr/include/c++/4.8.2/bits/stl_vector.h:905
#3 0x00007fce7d23acc0 in caffe::ConvolutionLayer::compute_output_shape() (this=0x1076f9b0) at src/caffe/layers/conv_layer.cpp:20
#4 0x00007fce7d2496f4 in caffe::BaseConvolutionLayer::Reshape(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) (this=0x1076f9b0, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at src/caffe/layers/base_conv_layer.cpp:385
#5 0x00007fce7d197cd1 in caffe::Layer::SetUp(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) (this=0x1076f9b0, bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:73
#6 0x00007fce7d2e0471 in caffe::Net::Init(caffe::NetParameter const&) (this=0x7ffd0f3548e0, in_param=...) at src/caffe/net.cpp:151
#7 0x00007fce7d2debb8 in caffe::Net::Net(std::string const&, caffe::Phase, int, std::vector<std::string, std::allocatorstd::string > const*, caffe::Net const*) (this=0x7ffd0f3548e0, param_file="/home/zhibin/qzhong/caffe/models/ilsvrc12_inceptionv4/inceptionv4_train_val.prototxt", phase=caffe::TEST, level=0, stages=0x7ffd0f354e40, root_net=0x0) at src/caffe/net.cpp:47
#8 0x00000000004195e3 in test() () at tools/caffe.cpp:285
#9 0x000000000041af6b in main(int, char**) (argc=2, argv=0x7ffd0f3550f8) at tools/caffe.cpp:451
Has anything changed in the compute_output_shape function of the convolution layer?
How do I prune the zero weights in the conv layers after training the model with block_group_lasso or breadth_decay_mult?
I found that the sparsity of each conv layer clearly increased, e.g., 36% sparsity in conv1_1, but I don't know how to remove these zero weights.
I parsed models/resnet/caffenet_train_iter_2000000.caffemodel with Caffe. I only ported the sparse matrix computation parts to my Caffe (for example, the caffe_gpu_sparse_dense2csr function interface), but CUSPARSE_CHECK() reports an error when running the caffe_gpu_sparse_mmcsr function interface. Which other parts do I need to port for it to run correctly? I need your help. Thank you.
When I try to execute
python python/resnet_generator.py --net_template examples/cifar10/resnet_template.prototxt --n 3 --force_regularization
I get the error below:
Traceback (most recent call last):
File "python/resnet_generator.py", line 281, in <module>
add_conv_layer(net_msg,name='conv1',bottom='data',num_output=16,pad=1,kernel_size=3,stride=1,connectivity_mode=connectivity_mode)
File "python/resnet_generator.py", line 21, in add_conv_layer
conv_layer.bottom._values.append(bottom)
AttributeError: 'google.protobuf.pyext._message.RepeatedScalarConta' object has no attribute '_values'
Kindly suggest how to proceed.
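My goal is to use scnn to compress model parameters, but some environments (such as Android) lack a sparse BLAS library, so only regular convolution can be used. If I test the compressed model without sparse BLAS, will it affect test accuracy?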
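In exact arithmetic it should not. A sparse kernel computes the same product as an ordinary dense GEMM in which the pruned weights are stored as zeros, so only speed changes. Results agree up to floating-point rounding from a different summation order.

The NumPy sketch below illustrates Compressed Row Storage (CSR), the format named in the "Compressed Row Storage Timing" logs on this page, and checks a CSR×dense product against dense multiplication. The helper names are hypothetical, not the branch's functions:

```python
import numpy as np

def dense_to_csr(mat):
    """Convert a dense 2-D array to CSR arrays: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return (np.array(values, dtype=mat.dtype),
            np.array(col_idx, dtype=np.int64),
            np.array(row_ptr, dtype=np.int64))

def csr_matmul(values, col_idx, row_ptr, dense):
    """Compute (sparse CSR matrix) @ dense, touching only the stored non-zeros."""
    n_rows = len(row_ptr) - 1
    out = np.zeros((n_rows, dense.shape[1]), dtype=np.result_type(values, dense))
    for r in range(n_rows):
        for p in range(row_ptr[r], row_ptr[r + 1]):
            out[r] += values[p] * dense[col_idx[p]]
    return out

rng = np.random.RandomState(1)
W = rng.randn(8, 27)                 # lowered weights: 8 filters, C*k*k = 27
W[rng.rand(8, 27) < 0.7] = 0.0       # roughly 70% of the weights pruned to zero
X = rng.randn(27, 16)                # im2col output for 16 output pixels
vals, cols, ptr = dense_to_csr(W)
print(np.allclose(csr_matmul(vals, cols, ptr, X), W @ X))
```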
Hi @wenwei202,
After finishing the three steps (ResNet-20 baseline, SSL, and fine-tuning), I am having trouble evaluating the performance of the fine-tuned models.
I changed the conv mode to LOWERED_CSRMM but got a "Not Implemented Yet" error. I also tried the Python script cifar10_classifier.py, but it didn't work either.
How can I evaluate the inference performance of sparsified models?
FYI, here's the error info:
CSRMM:
I0727 12:35:50.174033 1970 base_conv_layer.cpp:17] layer conv1 has sparsity of 0.0578704
I0727 12:35:50.174360 1970 base_conv_layer.cpp:29] ConvolutionParameter_ConvMode_LOWERED_CSRMM
F0727 12:35:50.174373 1970 math_functions.cpp:411] Not Implemented Yet
*** Check failure stack trace: ***
@ 0x7fb2ad0255cd google::LogMessage::Fail()
@ 0x7fb2ad027433 google::LogMessage::SendToLog()
@ 0x7fb2ad02515b google::LogMessage::Flush()
@ 0x7fb2ad027e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb2ad6ca9d0 caffe::caffe_cpu_sparse_dense2csr<>()
@ 0x7fb2ad7fe3c9 caffe::BaseConvolutionLayer<>::WeightAlign()
@ 0x7fb2ad69308b caffe::Net<>::CopyTrainedLayersFrom()
@ 0x7fb2ad69d5f5 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7fb2ad69d68e caffe::Net<>::CopyTrainedLayersFrom()
@ 0x40c222 time()
@ 0x407520 main
@ 0x7fb2abf95830 __libc_start_main
@ 0x407d49 _start
@ (nil) (unknown)
CCNMM:
I0727 11:48:19.621824 24561 base_conv_layer.cpp:17] layer res_grp1_1_conv1 has sparsity of 1
I0727 11:48:19.622687 24561 base_conv_layer.cpp:61] ConvolutionParameter_ConvMode_LOWERED_CCNMM
I0727 11:48:19.622701 24561 base_conv_layer.cpp:80] concatenating weight matrix
I0727 11:48:19.622706 24561 base_conv_layer.cpp:88] res_grp1_1_conv1 left_cols=0 left_rows=0
I0727 11:48:19.622711 24561 base_conv_layer.cpp:91] squeezing weight matrix
I0727 11:48:19.622715 24561 base_conv_layer.cpp:102] res_grp1_1_conv1 squeezing to 0x0
F0727 11:48:19.622720 24561 blob.cpp:131] Check failed: data_
*** Check failure stack trace: ***
@ 0x7f18331595cd google::LogMessage::Fail()
@ 0x7f183315b433 google::LogMessage::SendToLog()
@ 0x7f183315915b google::LogMessage::Flush()
@ 0x7f183315be1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f18337ea41b caffe::Blob<>::mutable_cpu_data()
@ 0x7f18339330e5 caffe::BaseConvolutionLayer<>::WeightAlign()
@ 0x7f18337c709b caffe::Net<>::CopyTrainedLayersFrom()
@ 0x7f18337d1605 caffe::Net<>::CopyTrainedLayersFromBinaryProto()
@ 0x7f18337d169e caffe::Net<>::CopyTrainedLayersFrom()
@ 0x40c222 time()
@ 0x407520 main
@ 0x7f18320c9830 __libc_start_main
@ 0x407d49 _start
@ (nil) (unknown)
Best Regards,
Leo
Operating system: Ubuntu 16.04
Compiler:
CUDA version (if applicable): 9.0
CUDNN version (if applicable): 5
BLAS: MKL
Python or MATLAB version (for pycaffe and matcaffe respectively):
Evaluating your uploaded models in the official Caffe (https://github.com/BVLC/caffe/tree/master/src/caffe) gives the following:
I0503 16:43:44.044796 19898 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_SSL_0.4259.caffemodel
I0503 16:43:48.726794 19898 caffe.cpp:251] Starting Optimization
I0503 16:43:48.726846 19898 solver.cpp:279] Solving AlexNet
I0503 16:43:48.726857 19898 solver.cpp:280] Learning Rate Policy: step
I0503 16:43:48.751430 19898 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:43:49.218724 19898 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:44:20.341884 19898 solver.cpp:404] Test net output #0: loss = 2.91344 (* 1 = 2.91344 loss)
I0503 16:44:20.342052 19898 solver.cpp:404] Test net output #1: top1_accuracy = 0.40986
I0503 16:44:20.342062 19898 solver.cpp:404] Test net output #2: top5_accuracy = 0.658339
I0503 16:51:35.119240 26706 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_SSL_0.4469.caffemodel
I0503 16:51:43.908475 26706 caffe.cpp:251] Starting Optimization
I0503 16:51:43.908524 26706 solver.cpp:279] Solving AlexNet
I0503 16:51:43.908535 26706 solver.cpp:280] Learning Rate Policy: step
I0503 16:51:43.916864 26706 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:51:44.117143 26706 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:52:13.693686 26706 solver.cpp:404] Test net output #0: loss = 2.72754 (* 1 = 2.72754 loss)
I0503 16:52:13.693859 26706 solver.cpp:404] Test net output #1: top1_accuracy = 0.43084
I0503 16:52:13.693868 26706 solver.cpp:404] Test net output #2: top5_accuracy = 0.67862
I0503 16:50:10.291841 25409 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_L1_0.4251.caffemodel
I0503 16:50:12.619544 25409 caffe.cpp:251] Starting Optimization
I0503 16:50:12.619585 25409 solver.cpp:279] Solving AlexNet
I0503 16:50:12.619592 25409 solver.cpp:280] Learning Rate Policy: step
I0503 16:50:12.626791 25409 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 16:50:12.895716 25409 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:50:50.041647 25409 solver.cpp:404] Test net output #0: loss = 3.26702 (* 1 = 3.26702 loss)
I0503 16:50:50.041822 25409 solver.cpp:404] Test net output #1: top1_accuracy = 0.36774
I0503 16:50:50.041837 25409 solver.cpp:404] Test net output #2: top5_accuracy = 0.61448
I0503 22:01:39.975592 8616 caffe.cpp:155] Finetuning from /home/leo/Downloads/caffenet_L1_0.4467.caffemodel
I0503 22:01:46.399006 8616 caffe.cpp:251] Starting Optimization
I0503 22:01:46.399055 8616 solver.cpp:279] Solving AlexNet
I0503 22:01:46.399062 8616 solver.cpp:280] Learning Rate Policy: step
I0503 22:01:46.405417 8616 solver.cpp:337] Iteration 0, Testing net (#0)
I0503 22:01:47.340291 8616 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 22:02:16.862293 8616 solver.cpp:404] Test net output #0: loss = 2.76222 (* 1 = 2.76222 loss)
I0503 22:02:16.862720 8616 solver.cpp:404] Test net output #1: top1_accuracy = 0.42278
I0503 22:02:16.862746 8616 solver.cpp:404] Test net output #2: top5_accuracy = 0.673259
layer
{
name: "top5_accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "top5_accuracy"
accuracy_param {
top_k: 5
}
include {
phase: TEST
}
}
layer {
name: "top1_accuracy"
type: "Accuracy"
bottom: "fc8"
bottom: "label"
top: "top1_accuracy"
include {
phase: TEST
}
}
To make sure the ImageNet data is well prepared, I evaluated the AlexNet model (https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet) and got the following:
I0503 16:45:40.494272 21305 blocking_queue.cpp:50] Data layer prefetch queue empty
I0503 16:46:14.405321 21305 solver.cpp:404] Test net output #0: loss = 1.86402 (* 1 = 1.86402 loss)
I0503 16:46:14.405730 21305 solver.cpp:404] Test net output #1: top1_accuracy = 0.56822
I0503 16:46:14.405750 21305 solver.cpp:404] Test net output #2: top5_accuracy = 0.799561
Operating system: ubuntu 16.04
Compiler:
CUDA version (if applicable): 8.0
CUDNN version (if applicable):
BLAS:
Python or MATLAB version (for pycaffe and matcaffe respectively):
In GPU mode with conv_mode: LOWERED_CCNMM, we first need to remove all-zero columns and rows in the feature map matrix col_buffer_. For now, this concatenation step temporarily uses the corresponding CPU routine.
We plan to replace it with a GPU routine. Please submit a pull request if you implement this.
Code branch: https://github.com/wenwei202/caffe/tree/scnn
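The sketch below illustrates the idea on the CPU with NumPy. It assumes that zero columns are removed from the lowered weight matrix together with the matching rows of col_buffer_, and that all-zero filters simply produce zero outputs. This reading is consistent with the "left_cols"/"left_rows" and "squeezing weight matrix" log lines earlier on this page, but the code is an illustration, not the branch's implementation or a GPU routine:

```python
import numpy as np

def ccnmm_like_gemm(weights, col_buffer):
    """Dense GEMM that skips all-zero rows and columns of the lowered weight matrix.

    weights:    (N, K) lowered weights, K = C*kh*kw
    col_buffer: (K, P) im2col output, P = out_h*out_w
    """
    keep_rows = np.any(weights != 0, axis=1)   # filters with at least one non-zero weight
    keep_cols = np.any(weights != 0, axis=0)   # weight columns that are not entirely zero
    small_w = weights[np.ix_(keep_rows, keep_cols)]  # squeezed weight matrix
    small_cols = col_buffer[keep_cols]               # only the matching rows of col_buffer_
    out = np.zeros((weights.shape[0], col_buffer.shape[1]),
                   dtype=np.result_type(weights, col_buffer))
    out[keep_rows] = small_w @ small_cols            # zero filters keep zero outputs
    return out

rng = np.random.RandomState(2)
W = rng.randn(6, 12)
W[[0, 4]] = 0.0           # two all-zero filters (rows)
W[:, [1, 5, 7]] = 0.0     # three all-zero columns
cols = rng.randn(12, 10)
print(np.allclose(ccnmm_like_gemm(W, cols), W @ cols))
```

An entirely zero layer squeezes to a 0×0 weight matrix. This sketch returns an all-zero output in that case.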
I get the following errors after running python python/nn_decomposer.py:
Traceback (most recent call last):
File "../python/nn_decomposer.py", line 194, in <module>
net_msg.layer._values.insert(layer_idx,low_rank_layer)
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute '_values'
Traceback (most recent call last):
File "../python/nn_decomposer.py", line 194, in <module>
net_msg.layer.insert(layer_idx,low_rank_layer)
AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'insert'
I tried protobuf versions 2.3, 2.5 and 3.3.
What could be wrong on my end? Please help me.
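The _values attribute is a private detail of protobuf's pure-Python implementation. It does not exist with the C++-backed implementation (the "pyext" in the error messages), and, as the second traceback shows, that backend's repeated message container may also lack insert. Setting the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python before importing protobuf selects the pure-Python backend, though code that relies on a private attribute stays fragile. For repeated scalar fields such as a layer's bottom list, conv_layer.bottom.append(bottom) replaces conv_layer.bottom._values.append(bottom).

For inserting a message, a workaround using only public protobuf calls is to rebuild the repeated field. The helper name below is hypothetical, and the demonstration uses descriptor_pb2 so it runs without Caffe:

```python
from google.protobuf import descriptor_pb2

def insert_repeated_message(container, index, new_msg):
    """Insert a copy of new_msg at position index of a repeated message field.

    Only public protobuf calls are used (iteration, CopyFrom, slice deletion, add),
    so this works with the C++ ("pyext") backend as well as the pure-Python one.
    """
    snapshot = []
    for msg in container:
        detached = type(msg)()
        detached.CopyFrom(msg)  # independent copy that survives clearing the field
        snapshot.append(detached)
    snapshot.insert(index, new_msg)
    del container[:]            # clear the repeated field in place
    for msg in snapshot:
        container.add().CopyFrom(msg)

# For Caffe (hypothetical usage): container = net_msg.layer, new_msg = a caffe_pb2.LayerParameter.
file_proto = descriptor_pb2.FileDescriptorProto()
for name in ("conv1", "conv2"):
    file_proto.message_type.add().name = name
insert_repeated_message(file_proto.message_type, 1,
                        descriptor_pb2.DescriptorProto(name="conv1_lowrank"))
print([m.name for m in file_proto.message_type])

# Repeated scalar fields support append directly, with no _values needed.
file_proto.dependency.append("data.proto")
```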
Where can we get the files below?
lenet_0.9917.caffemodel.h5
bvlc_alexnet.caffemodel.h5
Hi @wenwei202 ,
I'm trying to apply low-rank approximation to an object detection model similar to SSD, but while converting the model to HDF5 with caffemodel_converter.py I get the error below:
I1114 15:37:20.404398 4379 net.cpp:916] Layer namestem1_stem1/relu_0_split
HDF5-DIAG: Error detected in HDF5 (1.8.16) thread 140026632021760:
#000: ../../../src/H5G.c line 314 in H5Gcreate2(): unable to create group
major: Symbol table
minor: Unable to initialize object
#001: ../../../src/H5Gint.c line 194 in H5G__create_named(): unable to create and link to group
major: Symbol table
minor: Unable to initialize object
#002: ../../../src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
major: Links
minor: Unable to initialize object
#003: ../../../src/H5L.c line 1882 in H5L_create_real(): can't insert link
major: Symbol table
minor: Unable to insert object
#004: ../../../src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#005: ../../../src/H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
major: Symbol table
minor: Object not found
F1114 15:37:20.404465 4379 net.cpp:919] Check failed: layer_data_hid >= 0 (-1 vs. 0)
The model I'm using has a dense block as described in https://arxiv.org/abs/1608.06993,
and I'm not sure whether that could be the issue. I checked related issues online where people suggested the problem might be duplicate layer names, but that is not the case with my model. If you have any suggestions, please let me know.
Operating system: Ubuntu 16.04.1
Compiler:
CUDA version (if applicable): Cuda-8.0
CUDNN version (if applicable): Cudnn-5.0.5
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively):pycaffe
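One plausible cause, consistent with the "component not found" messages, is that HDF5 interprets "/" in a group name as a path separator. Creating a group for a layer named like stem1_stem1/relu_0_split then fails because the parent group does not exist. If that is the case, renaming such layers in the prototxt before conversion, and updating any references that reuse those names, may avoid the error.

The hypothetical helper below only computes a rename map and refuses renames that would create duplicate names:

```python
def hdf5_safe_layer_names(layer_names, replacement="_"):
    """Map layer names containing '/' to names HDF5 will not treat as nested paths.

    Raises ValueError if the renaming would create duplicate layer names.
    """
    mapping = {n: n.replace("/", replacement) for n in layer_names if "/" in n}
    final = [mapping.get(n, n) for n in layer_names]
    if len(set(final)) != len(final):
        raise ValueError("renaming would create duplicate layer names")
    return mapping

print(hdf5_safe_layer_names(["conv1", "stem1_stem1/relu_0_split"]))
```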
I have an AlexNet caffemodel whose weights contain all-zero columns and rows. Using conv_mode: LOWERED_CCNMM, I get a speedup on CPU (e.g., structured sparsity = 75%, speedup = 3.1x), but on GPU there is no speedup at all. What should I do to get a speedup on GPU? I use the build/tools/caffe time
tool to measure inference time. Does anyone know something about this? Thanks a lot!
I successfully compiled Caffe on my computer and ran the examples, but when compiling the modified code I encounter a problem.
The problem is as follows:
./include/caffe/util/cudnn.hpp:113:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
How should I solve it?
Ubuntu 16, CUDA 9.0, CUDA compilation tools 9.0
Do I need to alter my train_val.prototxt file to take advantage of sparsity, or will this version of Caffe exploit sparsity automatically?
For comparison, I took a model I had previously trained with vanilla Caffe and retrained it with this Caffe library. Training ran and the model learned successfully, but I observed no speedup during inference. So I'm wondering whether I need to change my train prototxt to use the sparse operations, since I didn't see any special instructions in the project readme.
Thank you.
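The logs on this page suggest that the sparse paths are selected per layer through a conv_mode setting (values such as LOWERED_CSRMM and LOWERED_CCNMM), and one run uses a separate deploy_csrmm.prototxt. A speedup would therefore depend on the deploy prototxt requesting a sparse mode, in addition to the weights actually being sparse.

The hypothetical helper below writes such a setting into every convolution_param block of a prototxt string. Placing conv_mode inside convolution_param is an assumption based on the enum name ConvolutionParameter_ConvMode in the logs, not something confirmed here:

```python
import re

def set_conv_mode(prototxt_text, mode="LOWERED_CCNMM"):
    """Put `conv_mode: <mode>` into every convolution_param block (hypothetical helper).

    Assumes conv_mode is a field of convolution_param; existing conv_mode lines are replaced.
    """
    text = re.sub(r"\n[ \t]*conv_mode:[ \t]*\w+", "", prototxt_text)
    return re.sub(r"(convolution_param[ \t]*\{)", r"\1\n    conv_mode: " + mode, text)

deploy = """layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 96
    kernel_size: 11
  }
}"""
print(set_conv_mode(deploy, "LOWERED_CSRMM"))
```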
In the code, what do LOWERED_CSRMM, LOWERED_CCNMM and DIRECT_SCONV mean?
For the sparse computation, do you use existing libraries, for example MKL on CPU and CUDA on GPU?