warpctc-caffe's Introduction

Caffe-With-Warpctc

CTC loss is used in sequence learning. This repo merges WarpCTC, which is implemented and maintained by Baidu Research, into Caffe.

There is a toy demo in examples/warpctc_captcha that trains a 2-layer LSTM model to recognize the captcha in an image. To run the demo, first generate the training and validation datasets with the Python scripts; after that it is an ordinary training procedure using Caffe.

There is also a similar repo implemented in PyTorch. See captcha-recognition for details.

This repo is a personal project.

Issue when building the project

Please comment out the following lines in Makefile.config. Otherwise training may suffer from gradient explosion, because the old CUDA runtime library doesn't support __shfl_down, which is used in Baidu's implementation. See issue 1 for details (discussion in Chinese).

-gencode arch=compute_20,code=sm_20 
-gencode arch=compute_20,code=sm_21

How to run the demo

In this demo, captcha images contain digit sequences of varying length (more specifically, 1 to 5 digits). CTC loss is well suited to this kind of variable-length sequence learning. The three images below show examples from the demo.

[Images: captcha image with 3 digits; captcha image with 4 digits; captcha image with 5 digits]
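To illustrate why CTC fits variable-length targets, here is a minimal sketch of greedy CTC decoding: the network emits one label per time step, repeated labels are collapsed, and the blank label is dropped, so the same number of time steps can yield 1 to 5 digits. The function name ctc_greedy_decode and the blank index 10 are assumptions made for this illustration; they are not taken from this repo, whose actual decoding lives in its own prediction code.

```python
from itertools import groupby

# Assumption for illustration only: digits use labels 0-9 and the CTC
# blank uses index 10. The repo may use a different blank index.
BLANK = 10


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Turn per-time-step labels into a digit sequence.

    Step 1: collapse runs of identical labels into a single label.
    Step 2: remove every blank label.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Eleven time steps decode to three digits. The blank between the two
# runs of 7 keeps them from being merged into a single 7.
frames = [10, 3, 3, 10, 10, 7, 7, 7, 10, 7, 10]
print(ctc_greedy_decode(frames))  # [3, 7, 7]
```

A sequence made only of blanks decodes to an empty list, which is how a model can express "no more digits" without a fixed output length.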

The original WarpCTC by Baidu Research supports multi-threaded processing on the CPU. However, this repo does not support it, so a GPU is necessary for the following experiment; otherwise the running time will be unbearably long. Alternatively, you can reduce the dataset size to save time.

To run the demo, first make sure you are in the $CAFFE_ROOT directory. Then run the scripts to generate the data with the Python captcha library and to build the HDF5 files for training and testing.

# generate data
python examples/warpctc_captcha/generate_captcha.py
# generate hdf5 files
python examples/warpctc_captcha/generate_dataset.py

How long this process takes depends on your hardware. Afterwards you should find the captcha images in the directory $CAFFE_ROOT/data/captcha. You can change the parameters in the two scripts above to get a larger dataset and to use more threads to accelerate the process.

Then run the bash script to train the 2-layer LSTM model with CTC loss.

./examples/warpctc_captcha/train.sh

Have a cup of coffee when training!

Demo results

I ran the demo several times and the model eventually converged. The accuracy of the model is not very high, but it is enough to show the power of a naive 2-layer LSTM network trained with CTC loss.

[Figures: training loss; test loss; test accuracy]

The model I trained can be downloaded from Google Drive.

See ./examples/warpctc_captcha/captcha_prediction.cpp for deployment. It is adapted from examples/cpp_classification/classification.cpp.

warpctc-caffe's People

Contributors

blgene, cdluminate, cypof, dgolden1, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, jyegerlehner, kloudkl, longjon, lukeyeager, mavenlin, mohomran, mtamburrano, netheril96, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, ste-m5s, tnarihi, xmfbit, yangqing, yosinski

warpctc-caffe's Issues

make all err

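I downloaded your code inside the official Caffe Docker environment.
In the CPU environment, your code compiled successfully as is.
In the GPU environment, however, make all failed with the following errors: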
CXX src/caffe/data_transformer.cpp
NVCC src/caffe/3rdparty/reduce.cu
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
src/caffe/3rdparty/reduce.cu(44): error: identifier "__shfl_down" is undefined
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::add<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::negate<float, float>, Rop=ctc_helper::add<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::negate<float, float>, Rof=ctc_helper::add<float, float>]"
(149): here

src/caffe/3rdparty/reduce.cu(44): error: identifier "__shfl_down" is undefined
detected during:
instantiation of "T CTAReduce<NT, T, Rop>::reduce(int, T, CTAReduce<NT, T, Rop>::Storage &, int, Rop) [with NT=128, T=float, Rop=ctc_helper::maximum<float, float>]"
(76): here
instantiation of "void reduce_rows<NT,Iop,Rop,T>(Iop, Rop, const T *, T *, int, int) [with NT=128, Iop=ctc_helper::identity<float, float>, Rop=ctc_helper::maximum<float, float>, T=float]"
(124): here
instantiation of "void ReduceHelper::impl(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(139): here
instantiation of "ctcStatus_t reduce(Iof, Rof, const T *, T *, int, int, __nv_bool, cudaStream_t) [with T=float, Iof=ctc_helper::identity<float, float>, Rof=ctc_helper::maximum<float, float>]"
(157): here

2 errors detected in the compilation of "/tmp/tmpxft_00000e20_00000000-14_reduce.compute_20.cpp1.ii".
Makefile:594: recipe for target '.build_release/cuda/src/caffe/3rdparty/reduce.o' failed
make: *** [.build_release/cuda/src/caffe/3rdparty/reduce.o] Error 1

If you have run into this, or know how to solve it, I would appreciate your advice.

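Inputs of CtcLoss?

Of the two inputs to CtcLoss, the output of fc1 is 80×512×11. What is the dimension of the label?
At the input I see that the label is 512×5, and I could not find any step that changes it...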
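For context on the shapes in this question: WarpCTC's C function compute_ctc_loss takes the labels as one flat array (flat_labels) plus a per-sample length array (label_lengths), so a fixed-size 512×5 label blob has to be flattened at some point. The sketch below shows one plausible way to do that. It assumes that unused label slots hold a sentinel value; the name flatten_padded_labels and the sentinel -1 are hypothetical and are not taken from this repo's CtcLoss layer, which may handle padding differently.

```python
def flatten_padded_labels(padded_batch, pad_value=-1):
    """Convert fixed-width padded labels to flat labels plus lengths.

    padded_batch: list of rows, one row per sample, each of equal width.
    pad_value: hypothetical sentinel marking unused label slots.
    Returns (flat_labels, label_lengths).
    """
    flat_labels = []
    label_lengths = []
    for row in padded_batch:
        # Keep only the real digits of this sample.
        digits = [d for d in row if d != pad_value]
        flat_labels.extend(digits)
        label_lengths.append(len(digits))
    return flat_labels, label_lengths


# Three samples with 2, 5 and 1 digits, padded to width 5.
batch = [
    [4, 2, -1, -1, -1],
    [9, 0, 1, 7, 3],
    [5, -1, -1, -1, -1],
]
flat, lengths = flatten_padded_labels(batch)
print(flat)     # [4, 2, 9, 0, 1, 7, 3, 5]
print(lengths)  # [2, 5, 1]
```

Under this reading, the 80 in the fc1 shape would be the number of time steps, 512 the batch size, and 11 the ten digits plus one blank class, while the label blob only needs one row of up to 5 digits per sample.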

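Some problems with compiling and training

Hello, when compiling I found that the function __shfl_down could not be found. After searching, I learned that this is a CUDA issue, but my environment is Ubuntu 14.04 + cuDNN v5 + CUDA 7.5, which in theory should be new enough. I later found this workaround online, https://github.com/parallel-forall/code-samples/blob/master/posts/parallel_reduction_with_shfl/fake_shfl.h, and used it to replace __shfl_down. After that the code compiled, but one question remains: I do not know whether the two functions really behave the same.

Also, when training the captcha example, I found that it never converges. Did you also find it hard to get convergence when you trained? Are there any tricks? Thanks.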

some question about lstm

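Hello, I have a question about the LSTM layer. From the prototxt, the input to the LSTM is the permuted blob, with shape [80,1,3,30]. After training, this layer's h is 400*90, c is 400, and the weight is 400*100, which is equivalent to 100*100 (because there are 4 groups of weights). So how is the [80,1,3,30] input combined with the 400*90 h, and after they are combined, how is the convolution with 100*100 carried out?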

Error compiling ctc_loss_layer.cu

The error message is as follows:
NVCC src/caffe/3rdparty/reduce.cu
src/caffe/layers/ctc_loss_layer.cu: In instantiation of ‘void caffe::CtcLossLayer::Forward_gpu(const std::vector<caffe::Blob>&, const std::vector<caffe::Blob>&) [with Dtype = float]’:
src/caffe/layers/ctc_loss_layer.cu:61:135: required from here
src/caffe/layers/ctc_loss_layer.cu:14:17: error: ‘std::initializer_list<_Tp> options’ has incomplete type
auto options = ctcOptions{};
^
make: *** [.build_release/cuda/src/caffe/layers/ctc_loss_layer.o] Error 1
make: *** Waiting for unfinished jobs....

wrong result

Thanks for your contribution. When I follow your steps to train on the captcha data, the log shows the loss decreasing and good accuracy. But when I test a single captcha image with captcha_prediction.cpp, the result is wrong, unbelievably so. Why? Thank you.

warpctc speed

Hello, have you compared the speed of warpctc with that of the original CTC?

Build problem on Mac

➜  warpctc-caffe git:(zxdev) ✗ make -j8
CXX src/caffe/layers/ctc_loss_layer.cpp
CXX src/caffe/layers/eltwise_layer.cpp
CXX src/caffe/layers/elu_layer.cpp
CXX src/caffe/layers/embed_layer.cpp
CXX src/caffe/layers/euclidean_loss_layer.cpp
CXX src/caffe/layers/exp_layer.cpp
CXX src/caffe/layers/filter_layer.cpp
CXX src/caffe/layers/flatten_layer.cpp
src/caffe/layers/ctc_loss_layer.cpp:84:23: error: no member named 'accumulate' in namespace 'std'
    Dtype loss = std::accumulate(cost, cost + mini_batch, Dtype(0));
                 ~~~~~^
1 error generated.
make: *** [.build_release/src/caffe/layers/ctc_loss_layer.o] Error 1
make: *** Waiting for unfinished jobs....

build issue

[ 1%] Built target proto
[ 1%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_ctc_loss_layer.cu.o
/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(14): error: explicit type is missing ("int" assumed)

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(14): error: type name is not allowed

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(14): error: expected a ";"

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(15): error: expression must have class type

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(16): error: expression must have class type

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(17): error: expression must have class type

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(26): error: no suitable constructor exists to convert from "int" to "ctcOptions"

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(37): error: namespace "std" has no member "accumulate"

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(41): error: expression must have class type

/root/warpctc-caffe/src/caffe/layers/ctc_loss_layer.cu(32): error: no suitable constructor exists to convert from "int" to "ctcOptions"
detected during instantiation of "void caffe::CtcLossLayer::Forward_gpu(const std::vector<caffe::Blob *, std::allocator<caffe::Blob *>> &, const std::vector<caffe::Blob *, std::allocator<caffe::Blob *>> &) [with Dtype=float]"
(61): here

10 errors detected in the compilation of "/tmp/tmpxft_000058bc_00000000-6_ctc_loss_layer.cpp1.ii".
CMake Error at cuda_compile_generated_ctc_loss_layer.cu.o.cmake:266 (message):
Error generating file
/root/warpctc-caffe/Release/src/caffe/CMakeFiles/cuda_compile.dir/layers/./cuda_compile_generated_ctc_loss_layer.cu.o

src/caffe/CMakeFiles/caffe.dir/build.make:16157: recipe for target 'src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_ctc_loss_layer.cu.o' failed
make[2]: *** [src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_ctc_loss_layer.cu.o] Error 1
CMakeFiles/Makefile2:272: recipe for target 'src/caffe/CMakeFiles/caffe.dir/all' failed
make[1]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

error: ‘accumulate’ is not a member of ‘std’

CXX src/caffe/layers/flatten_layer.cpp
CXX src/caffe/layers/batch_norm_layer.cpp
src/caffe/layers/ctc_loss_layer.cpp: In member function ‘virtual void caffe::CtcLossLayer::Forward_cpu(const std::vector<caffe::Blob>&, const std::vector<caffe::Blob>&)’:
src/caffe/layers/ctc_loss_layer.cpp:86:17: error: ‘accumulate’ is not a member of ‘std’
Dtype loss =std::accumulate(cost, cost + mini_batch, Dtype(0));
^
Makefile:580: recipe for target '.build_release/src/caffe/layers/ctc_loss_layer.o' failed
make: *** [.build_release/src/caffe/layers/ctc_loss_layer.o] Error 1
make: *** Waiting for unfinished jobs....

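Could you provide a test demo?

Training went fine and produced the corresponding model, but I am not familiar with this area. Could you provide a test network and program?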

build error

Makefile:587: recipe for target '.build_release/src/caffe/proto/caffe.pb.o' failed
make: *** [.build_release/src/caffe/proto/caffe.pb.o] Error 1
