
Comments (9)

xijunjun avatar xijunjun commented on July 28, 2024 3

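At first I made no changes at all and generated the data exactly as described in the steps, but training failed with a dimension-mismatch error. Following the answer in a related issue, I changed the 24 in the reshape layer and the 24 in time_step to 32 (24 is presumably the setting for an image width of 96, whereas the current images are 128 wide). After that, training ran without problems and reached 99% accuracy on both the training and validation sets. However, the output was completely wrong even when testing on training-set images. Two things need to be changed:
1. In the default captcha data, each character is one of the digits 0-9, which together with the blank label makes 11 classes, so set blank_label: 10, alphabet_size: 11, and num_output: 11 in the fc layer.
2. The training images are normalized to 0-1, whereas sample_resized.convertTo(sample_float, CV_32FC3) in recognition.cpp does not normalize the pixel values to 0-1. Change it to sample_resized.convertTo(sample_float, CV_32FC3, 1/255.0).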
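The numbers in this comment can be sanity-checked with a few lines of Python. This is only a sketch: the downsampling factor of 4 is inferred from the two width/time-step pairs mentioned here (96 → 24, 128 → 32), not read from the network definition, and the function names are made up for illustration.

```python
def time_steps(image_width, downsample=4):
    """Sequence length for a given input width.

    downsample=4 is an assumption inferred from the thread
    (width 96 -> 24 steps, width 128 -> 32 steps).
    """
    if image_width % downsample != 0:
        raise ValueError("width must be a multiple of the downsampling factor")
    return image_width // downsample


def ctc_sizes(characters):
    """Return (alphabet_size, blank_label) for a character set plus one blank."""
    alphabet_size = len(characters) + 1  # real characters + the blank
    blank_label = alphabet_size - 1      # blank placed last, as in blank_label: 10
    return alphabet_size, blank_label


print(time_steps(96), time_steps(128))  # 24 32
print(ctc_sizes("0123456789"))          # (11, 10)
```

The same two helpers would give different values for a different character set or image width, which is why the settings have to follow your own data.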

from crnn.caffe.

 avatar commented on July 28, 2024

@plastic0313 I have the same problem. Did you manage to solve yours?


yalecyu avatar yalecyu commented on July 28, 2024

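@plastic0313 @greatgeekgrace That is because my own number of classes is 74. You need to go by your own number of classes and change the corresponding settings, such as generate_dataset.py, num_output, and alphabet_size.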


 avatar commented on July 28, 2024

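@yalecyu OK, thank you very much! This is the current prediction for a captcha image (the image reads 0642):
74 74 0 74 6 74 4 74 2 74 74 74 74 74 74 74 74 74 74 74 - - - -
The result appears to have 24 entries (I counted the output above one by one), but num_output is set to 75 in crnn.prototxt and deploy.prototxt. Why is that? And why does the - symbol appear?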
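For what it is worth, the two numbers measure different things in a CTC model: the count of entries is the number of time steps, while num_output is the number of classes each time step chooses from. A minimal sketch of greedy CTC decoding (collapse consecutive repeats, then drop blanks) applied to the output above is shown here. It assumes the blank is class 74, the last of the 75 classes, and it treats the "-" tokens like blanks, since the thread does not say what they stand for.

```python
def ctc_greedy_decode(labels, blank):
    """Collapse consecutive repeats, then drop blanks (CTC best-path rule)."""
    decoded = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


raw = "74 74 0 74 6 74 4 74 2 74 74 74 74 74 74 74 74 74 74 74 - - - -"
BLANK = 74  # assumption: the last of the 75 classes is the blank
# Assumption: "-" is treated like a blank; its real meaning is not stated in the thread.
labels = [BLANK if token == "-" else int(token) for token in raw.split()]

print(ctc_greedy_decode(labels, BLANK))  # [0, 6, 4, 2]
```

Under these assumptions the raw output already decodes to 0642, the text in the image.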


yalecyu avatar yalecyu commented on July 28, 2024

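@xijunjun Right. The main thing to note is that I changed the prototxt configuration for another OCR project and did not use the generated dataset to verify that the dimensions still match. The other thing to watch is 0-1 versus 0-255 pixel scaling. I have not verified that myself and only left a hint in the README. I will fix these pitfalls when I have time.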


dingtao1 avatar dingtao1 commented on July 28, 2024

@xijunjun Where can I find out whether the training data is normalized? Isn't normalization exactly what convertTo(sample_float, CV_32FC3, 1/255.0) does?


xijunjun avatar xijunjun commented on July 28, 2024

@dingtao1 I looked at the dataset generation code and the parameters of the data input layer.
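To make the difference between the two calls concrete: cv::Mat::convertTo computes dst = src * alpha + beta, with alpha defaulting to 1 and beta to 0, so the call without a scale factor only changes the data type. The sketch below mimics that arithmetic in plain Python. The function name and the pixel values are made up for illustration, and OpenCV itself is not used.

```python
def convert_to_float(pixels, alpha=1.0, beta=0.0):
    """Mimic the arithmetic of cv::Mat::convertTo: dst = src * alpha + beta."""
    return [float(p) * alpha + beta for p in pixels]


row = [0, 128, 255]  # made-up 8-bit pixel values

unscaled = convert_to_float(row)            # like convertTo(dst, CV_32FC3)
scaled = convert_to_float(row, 1 / 255.0)   # like convertTo(dst, CV_32FC3, 1/255.0)

print(unscaled)  # [0.0, 128.0, 255.0]
print(scaled)    # values now lie in the range 0-1
```

If the network was trained on 0-1 inputs, feeding it the unscaled values at inference time gives it inputs up to 255 times larger than anything it saw during training.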


yjtan118 avatar yjtan118 commented on July 28, 2024

Hi, sorry for posting on an old discussion, but I need some help or hints, as I can't seem to get consistent and correct results from my own trained crnn model after following all the steps. I ported the Linux code to Windows and compiled it with the Visual Studio 2017 compiler. The code compiled successfully after I made some changes, but I assume these should not affect the results.

  1. First, I generated the dataset using generate_captcha.py, 50,000 images in total.

  2. Then I ran generate_dataset.py with:
    IMAGE_WIDTH, IMAGE_HEIGHT = 128, 32.
    Training size = 40,000 and test size = 10,000.

  3. In my crnn.prototxt, I changed the batch size to 50 to suit my GPU, which only has 2 MB memory. I changed the following as well:
    layer {
      name: "reshape"
      type: "Reshape"
      bottom: "conv6"
      top: "reshape"
      reshape_param {
        shape {
          #nc(w*h)
          dim: 50
          dim: 512
          dim: 32
        }
      }
    }
    layer {
      name: "indicator"
      type: "ContinuationIndicator"
      top: "indicator"
      continuation_indicator_param {
        time_step: 32
        batch_size: 50
      }
    }
    layer {
      name: "ctc_loss"
      type: "CtcLoss"
      bottom: "fc1"
      bottom: "label"
      top: "ctc_loss"
      loss_weight: 1.0
      ctc_loss_param {
        blank_label: 10
        alphabet_size: 11
        time_step: 32
      }
    }
    layer {
      name: "accuracy"
      type: "LabelsequenceAccuracy"
      bottom: "premuted_fc"
      bottom: "label"
      top: "accuracy"
      labelsequence_accuracy_param {
        blank_label: 10
      }
    }
    I managed to get over 0.95 accuracy for both test and train data. Loss seems to be on the low side as well (0.00x).

  4. Next, I changed deploy.prototxt:
    name: "crnn"
    layer {
      name: "data"
      type: "Input"
      top: "data"
      input_param { shape: { dim: 1 dim: 3 dim: 32 dim: 128 } }
    }
    layer {
      name: "reshape"
      type: "Reshape"
      bottom: "conv6"
      top: "reshape"
      reshape_param {
        shape {
          #nc(w*h)
          dim: 1
          dim: 512
          dim: 32
        }
      }
    }
    layer {
      name: "indicator"
      type: "ContinuationIndicator"
      top: "indicator"
      continuation_indicator_param {
        time_step: 32
        batch_size: 1
      }
    }
    layer {
      name: "fc1"
      type: "InnerProduct"
      bottom: "lstm2"
      top: "fc1"
      param {
        lr_mult: 1
        decay_mult: 1
      }
      param {
        lr_mult: 2
        decay_mult: 0
      }
      inner_product_param {
        num_output: 11
        axis: 2
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
          value: 0
        }
      }
    }

  5. I also amended recognition.cpp to include the normalization:
    if (num_channels_ == 3) sample_resized.convertTo(sample_float, CV_32FC3, 1.f/255);
    else sample_resized.convertTo(sample_float, CV_32FC1, 1.f/255);
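Since the settings in steps 3 and 4 are spread over several layers, a small cross-check can help rule out a mismatch between them. The sketch below is not part of the repository: the dictionary keys and the function name are hypothetical, and the values are simply copied by hand from the excerpts above.

```python
def check_crnn_settings(cfg):
    """Return a list of inconsistencies; an empty list means the values agree."""
    problems = []
    steps = {cfg["reshape_steps"], cfg["indicator_time_step"], cfg["ctc_time_step"]}
    if len(steps) != 1:
        problems.append("time steps differ: %s" % sorted(steps))
    if cfg["reshape_batch"] != cfg["indicator_batch_size"]:
        problems.append("batch sizes differ between reshape and indicator")
    if cfg["fc_num_output"] != cfg["alphabet_size"]:
        problems.append("fc num_output != alphabet_size")
    if not 0 <= cfg["blank_label"] < cfg["alphabet_size"]:
        problems.append("blank_label outside the alphabet")
    return problems


# Values copied from the crnn.prototxt excerpt in step 3;
# fc_num_output is taken from the deploy.prototxt excerpt in step 4.
train_cfg = {
    "reshape_batch": 50, "reshape_steps": 32,
    "indicator_time_step": 32, "indicator_batch_size": 50,
    "ctc_time_step": 32, "alphabet_size": 11, "blank_label": 10,
    "fc_num_output": 11,
}
print(check_crnn_settings(train_cfg))  # []
```

The values quoted in steps 3 and 4 pass this check, so under these assumptions the cause of the wrong output would have to lie elsewhere, for example in the image preprocessing.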

As for the output, I run the recognition executable as follows:
recognition D:\ImageProc\ImgDataset\Data\Captcha\49998-7959.png D:\Lib\caffecrnn\examples\crnn\deploy.prototxt D:\Lib\caffecrnn\examples\crnn\model\crnn_captcha_iter_3600.caffemodel
The model does not give consistent results from one run to the next, and the output is not accurate either.
This is the output I get when I run it three times:
8 9 8 8 8 8 8 8 8 8 8 8 8 8 8 4 4 4 4 4 1 1 1 - 1 1 1 0 0 - 1 2

6 6 - - 9 9 - - - - - - - - - - - - 6 6 6 6 6 6 6 6 6 6 6 6 3 3

1 1 2 1 1 8 8 8 8 1 - 1 1 1 1 1 1 1 1 7 7 7 7 7 7 7 7 7 7 7 6 1

Does anyone have any hints, or can anyone spot where I made a mistake? Has anyone managed to get accurate output from the trained model?

Please help! Thank you.


BarryKCL avatar BarryKCL commented on July 28, 2024

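(Digits + English letters) Converting the test image from BGR to RGB fixes the problem where the test accuracy during training is very high but the cpp_recognition output is wrong!!!
#~~~~~~~~~~~~~~~~~~ The reason is as follows ~~~~~~~~~~~~~~~~~~~~#
When we build the dataset: img = caffe.io.load_image(os.path.join(img_path, image))
Inside caffe.io: img = skimage.img_as_float(skimage.io.imread(filename, as_grey=not color)).astype(np.float32)
The problem: cv2 stores images in BGR order, whereas skimage stores them in RGB order. recognition.cpp reads the image with OpenCV, so convert it with cv::cvtColor(resizeimg, resizeimg, cv::COLOR_BGR2RGB);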
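The channel-order mismatch can be illustrated without OpenCV or skimage. The sketch below represents an image as nested Python lists (rows of pixels, three channels per pixel) and reverses the channel order, which is what a BGR-to-RGB conversion amounts to for a 3-channel image. The function name and the sample pixels are made up for illustration.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a rows x cols x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One row with a pure-blue pixel and a pure-red pixel, written in BGR order.
bgr = [[[255, 0, 0], [0, 0, 255]]]

print(bgr_to_rgb(bgr))  # [[[0, 0, 255], [255, 0, 0]]]
```

A model trained on RGB input sees red and blue swapped if it is fed the BGR data unchanged, which is consistent with high training accuracy alongside wrong predictions at inference time.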
