sierkinhane / crnn_chinese_characters_rec

(CRNN) Chinese Characters Recognition.

Python 100.00%

crnn_chinese_characters_rec's Introduction

Characters Recognition

A Chinese character recognition repository based on convolutional recurrent neural networks (CRNN). (Scan the QR code below to join the WeChat group.)

Performance

Recognize characters in pictures

Dev Environments

WIN 10 or Ubuntu 16.04
PyTorch 1.2.0 (may fix ctc loss) with cuda 10.0 🔥
yaml
easydict
tensorboardX

Data

Synthetic Chinese String Dataset

Download the dataset
Edit DATASET:ROOT in lib/config/360CC_config.yaml to point to your image path

    DATASET:
      ROOT: 'to/your/images/path'

Download the labels (password: eaqb)
Put char_std_5990.txt in lib/dataset/txt/
And put train.txt and test.txt in lib/dataset/txt/

e.g. test.txt

    20456343_4045240981.jpg 89 201 241 178 19 94 19 22 26 656
    20457281_3395886438.jpg 120 1061 2 376 78 249 272 272 120 1061
    ...

Or your own data

Edit DATASET:ROOT in lib/config/OWN_config.yaml to point to your image path

    DATASET:
      ROOT: 'to/your/images/path'

And put your train_own.txt and test_own.txt in lib/dataset/txt/

e.g. test_own.txt

    20456343_4045240981.jpg 你好啊！祖国！
    20457281_3395886438.jpg 晚安啊！世界！
    ...

Note: fixed-length training is supported; you can modify the dataloader to support variable-length training.
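
As a rough illustration of the test_own.txt format shown above, the sketch below splits each line into a file name and a label string, then counts label lengths so you can check whether a dataset really is fixed-length. The helper names `parse_label_line` and `label_length_histogram` are hypothetical and are not part of this repository.

```python
from collections import Counter


def parse_label_line(line):
    """Split 'filename label' at the first space; the label may contain any characters."""
    filename, _, text = line.strip().partition(" ")
    return filename, text


def label_length_histogram(lines):
    """Count how many labels have each length (blank lines are skipped)."""
    return Counter(len(parse_label_line(l)[1]) for l in lines if l.strip())


# The two sample lines from the README example.
lines = [
    "20456343_4045240981.jpg 你好啊！祖国！",
    "20457281_3395886438.jpg 晚安啊！世界！",
]
samples = [parse_label_line(l) for l in lines]
hist = label_length_histogram(lines)
print(samples[0])
print(dict(hist))
```

If the histogram has more than one key, the labels are not all the same length, and the fixed-length assumption mentioned in the note would not hold for that file.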

Train

   [run] python train.py --cfg lib/config/360CC_config.yaml
or [run] python train.py --cfg lib/config/OWN_config.yaml

Loss curve

   [run] cd output/360CC/crnn/xxxx-xx-xx-xx-xx/
   [run] tensorboard --logdir log

loss overview (first epoch)

Demo

   [run] python demo.py --image_path images/test.png --checkpoint output/checkpoints/mixed_second_finetune_acc_97P7.pth

References

crnn_chinese_characters_rec's Issues

Encoding problem

loading pretrained model from trained_models/mixed_second_finetune_acc97p7.pth
Traceback (most recent call last):
  File "test.py", line 73, in <module>
    crnn_recognition(image, model)
  File "test.py", line 56, in crnn_recognition
    print('results: {0}'.format(sim_pred))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-46: ordinal not in range(128)
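
A traceback like the one above typically appears when the output stream uses an ASCII encoding. One possible workaround, sketched here under that assumption, is to write through a text wrapper that encodes as UTF-8. In the sketch an `io.BytesIO` stands in for `sys.stdout.buffer`, and `utf8_writer` is a hypothetical helper name. Setting the environment variable `PYTHONIOENCODING=utf-8` before launching Python is another commonly used option.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # newline="\n" keeps line endings identical on every platform.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


buffer = io.BytesIO()          # stand-in for sys.stdout.buffer
out = utf8_writer(buffer)

sim_pred = "你好"              # a non-ASCII prediction, as in the issue
print('results: {0}'.format(sim_pred), file=out)
out.flush()

encoded = buffer.getvalue()
print(encoded)
```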

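Training loss stays very high and the training results are very poor

I generated nearly 500,000 images from one hundred fonts, using 450,000 as the training set and 50,000 as the test set. The training loss keeps hovering around 20-something. After training, only images from the training set are recognized; other images in the same fonts are not recognized at all. Which parameters should I adjust?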

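About recognizing images on Windows

I put the model on Windows and ran it on the training images, but nothing was recognized. Does a model trained on Linux have problems on Windows?
Original image:

Recognition result: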

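About spaces in English recognition

@haneSier @Sierkinhane How can spaces be recognized? Should spaces be added to the training set and trained as a label? Would such a space interfere with the blank used in CTC loss? If the actual prediction needs to output spaces, how should that be done?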
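
On the question above: in CTC the "blank" is a separate class index and is not the space character, so a space can be added to the alphabet as an ordinary class. The sketch below illustrates this with a hypothetical lowercase alphabet and a greedy decoder. It assumes index 0 is reserved for the blank, and it is not taken from this repository's code.

```python
BLANK = 0  # index reserved for the CTC blank; it is not a printable character

# Hypothetical alphabet: the trailing space is an ordinary class like any letter.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def encode(text):
    """Map each character (including spaces) to its class index."""
    return [CHAR_TO_INDEX[ch] for ch in text]


def ctc_greedy_decode(frame_indices):
    """Collapse repeated indices, then drop blanks (standard greedy CTC decoding)."""
    chars = []
    previous = BLANK
    for idx in frame_indices:
        if idx != BLANK and idx != previous:
            chars.append(INDEX_TO_CHAR[idx])
        previous = idx
    return "".join(chars)


# Per-frame argmax output for "hi hi": h=8, i=9, space=27, blank=0.
frames = [8, 8, 0, 9, 27, 27, 0, 8, 9, 9]
decoded = ctc_greedy_decode(frames)
print(repr(decoded))
```

Because the space has its own index, it survives decoding while the blank frames are removed.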

yue@yue-Vostro-3668:/crnn_chinese_characters_rec$ cd warp-ctc/pytorch_binding
yue@yue-Vostro-3668:/crnn_chinese_characters_rec/warp-ctc/pytorch_binding$ python setup.py install
Torch was not built with CUDA support, not building warp-ctc GPU extensions.
generating build/warpctc_pytorch/_warp_ctc/__warp_ctc.c
(already up-to-date)
not modified: 'build/warpctc_pytorch/_warp_ctc/__warp_ctc.c'
running install
Checking .pth file support in /usr/local/lib/python3.5/dist-packages/
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: '/usr/local/lib/python3.5/dist-packages/test-easy-install-8801.pth'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/usr/local/lib/python3.5/dist-packages/

Perhaps your account does not have write access to this directory? If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account. If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.

For information on other options, you may wish to consult the
documentation at:

https://pythonhosted.org/setuptools/easy_install.html

Please make the appropriate changes for your system and try again.

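Several questions about the training set

I would like to ask about your mixed_second_finetune_acc97p7.pth model file: was it trained in a single pass on the open-source dataset of 3.6 million images plus datasets of commonly used local text combinations generated by your own scripts, or was a model first trained on the open-source dataset and then fine-tuned in a second round?
Another question: I tried your model file. Accuracy is quite good for black text on a white background, but results on more complex backgrounds are mediocre and fall behind some other open-source models available online. Is that because the backgrounds in the 3.6-million-image dataset you used are not diverse enough?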

Hello, can it recognize text rotated 90 degrees counterclockwise like this? Does anything need to be changed?

Running python tolmdb.py

When running this file, I am using Python 3.6; the rest of the environment is the same as the author's.
It reports this error at runtime:
File "/home/hyc/anaconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
How can I get it to run successfully?

Problem during training

Why does the following error appear periodically after training for a while, even though it does not affect the training process?
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fa67ff97f60>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 737, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

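Questions about training

Hello, I used the same 3.6-million-image dataset and label files, with the other parameters also identical, and accuracy during training reaches 98%. However, on separately generated black-on-white data the test accuracy is only sixty-something percent, which also falls short of the accuracy of the model you provided. Did you apply any additional processing to the data? Also, the separately generated test data is very similar to the training data, yet the accuracy is far below the training accuracy. What could be the reason? Any pointers would be appreciated, thank you!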

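A few questions

I have an Intel graphics card. How can I get this running?

CTC is now fully configured, but running reports an error:
cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at torch/csrc/cuda/Module.cpp:51
I don't know whether there is a solution.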

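Multi-GPU

I am training with crnn.pytorch, and multi-GPU always causes problems. May I ask whether you trained on multiple GPUs?
Also, the code in crnn_main.py seems to have problems too: the params do not match up. Is this really the code you last ran successfully?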

A RuntimeError occurs

RuntimeError: Error(s) in loading state_dict for CRNN:
size mismatch for rnn.1.embedding.bias: copying a param of torch.Size([19997]) from checkpoint, where the shape is torch.Size([6736]) in current model.
size mismatch for rnn.1.embedding.weight: copying a param of torch.Size([19997, 512]) from checkpoint, where the shape is torch.Size([6736, 512]) in current model.
Sorry to bother you. I am new to this area and ran into this problem when running the program. Is there a solution?

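How long does it take to generate an lmdb file from 100,000 images?

Which is faster and more convenient: reading data via lmdb or reading the data directly?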

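Poor recognition of slanted or curved text

@Sierkinhane Hello, I used your trained model. The recognition rate is high for clear, horizontal text, but very poor for text that is slanted, curved, or distorted. I tried preprocessing the images to rectify them before prediction, but the results were still poor. Should I add slanting and bending operations when generating samples? I see that the rotation, noise, and font-stretching functions in your generator.py have not been filled in:

    # rotation function
    def rotate_func():
        pass

    # noise function
    def random_noise_func():
        pass

    # font-stretching function
    def stretching_func():
        pass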
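
For illustration only, here is one way the noise and stretching stubs above could be filled in. This is a dependency-free sketch that treats a grayscale image as a nested list of pixel values. The parameters `img`, `amount`, `rng`, and `factor` are assumptions (the original stubs take no arguments), and this is not the author's implementation. Rotation by a small angle needs interpolation and is better left to an image library, so it is omitted here.

```python
import random


def random_noise_func(img, amount=0.05, rng=None):
    """Salt-and-pepper noise: set roughly `amount` of the pixels to 0 or 255."""
    rng = rng or random.Random()
    out = [row[:] for row in img]          # copy so the input is not modified
    for row in out:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return out


def stretching_func(img, factor):
    """Stretch (factor > 1) or squeeze (factor < 1) horizontally, nearest-neighbour."""
    old_w = len(img[0])
    new_w = max(1, round(old_w * factor))
    return [
        [row[min(old_w - 1, int(x / factor))] for x in range(new_w)]
        for row in img
    ]


img = [[1, 2], [3, 4]]
stretched = stretching_func(img, 2)
noisy = random_noise_func(img, amount=1.0, rng=random.Random(0))
print(stretched)
```

In a real generator the same ideas would normally be applied to image arrays through an imaging library rather than to nested lists.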

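About the image scaling parameter in test.py

Hello author, I would like to ask about the value of w used when scaling the image in test.py. Why is there a 280/160? Could you explain it?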

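GPU requirements for training? Out-of-memory errors

How much GPU resource did the author use to run the 3-million-image dataset?
I have a dataset of 20,000 images, only about 800 MB in lmdb format. In CPU mode it reports a memory leak, and in GPU mode it reports a CUDA computation error. Both sometimes fail only after one or two batches, so I suspect the intermediate variables are too large. After changing the batch size from 16 to 1 the GPU runs normally, but that seems pointless...
My GPU is an Alibaba Cloud P100 with 16 GB of memory.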

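The trained model and fonts

Hello author, first of all thank you very much for providing the trained model and code. I have two questions:

Which fonts were used for training? Were they the few in the generator folder?
I would now like to add recognition of some special characters on top of your work. Roughly how much data is needed? Also, to make the change, is it enough to edit alphabets.py and then train according to the README?

I would be very grateful if you could answer when you have time!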

Hello, must the training data be 280*32 in size, or can the width and height vary? Must each image contain 10 characters? Thank you!

Hello, is your implementation lexicon-based transcription or lexicon-free transcription? Where in the code is this reflected?

finetune

When fine-tuning, was all the generated data you used 10 characters long?

Error running crnn_main.py under Python 3

When building the lmdb, line 28 of tolmdb.py raises an error:
TypeError: Won't implicitly convert Unicode to bytes; use .encode()

    25 def writeCache(env, cache):
    26     with env.begin(write=True) as txn:
    27         for k, v in cache.items():
    28             txn.put(k, v)

After changing line 28 to: txn.put(str(k).encode('utf-8'), str(v).encode('utf-8'))
execution can continue, but crnn_main.py then reports errors during training:
Corrupted image for 2488925
Corrupted image for 3213999
Corrupted image for 3214001
Corrupted image for 2488927
Corrupted image for 2488929
Corrupted image for 3214003
Corrupted image for 2488931
Corrupted image for 3214005
Corrupted image for 2488933
Corrupted image for 3214007
Corrupted image for 2488935
Corrupted image for 3214009
Corrupted image for 2488937
Corrupted image for 3214011
Corrupted image for 2488939
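
One plausible explanation for the "Corrupted image" messages above, offered as an assumption rather than a confirmed diagnosis: in Python 3, calling `str()` on a `bytes` value produces its textual representation (`"b'...'"`), so `str(v).encode('utf-8')` stores that text instead of the original image bytes. The sketch below shows the difference and a hypothetical `to_bytes` helper that encodes only text values.

```python
def to_bytes(value):
    """Return bytes unchanged; encode anything else via its string form."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


jpeg_header = b"\xff\xd8\xff\xe0"            # first bytes of a typical JPEG file

wrong = str(jpeg_header).encode("utf-8")     # the textual repr, no longer image data
right = to_bytes(jpeg_header)                # the original bytes, untouched
key = to_bytes("image-000000001")            # text keys still get encoded

print(wrong)
print(right)
```

With such a helper, line 28 could hypothetically become `txn.put(to_bytes(k), to_bytes(v))`, so that image bytes read in binary mode are stored as-is.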

About the crnn_main script

Hello, regarding line 158 of the crnn_main script:
image = torch.FloatTensor(params.batchSize, 3, params.imgH, params.imgH)
Is the image intentionally being resized to a square here?

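Is recognition quality related to font size?

I am recognizing a cropped region of an invoice and nothing is recognized at all. When I resize it slightly larger it is recognized, but when I make it larger still the recognition becomes poor again. Is this related to the font size?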

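How long does training take?

I am using the training set provided in the CSDN article.
I changed batch_size = 128, worker = 8.
My system is Ubuntu 16.04 LTS, with an 8-core i7 CPU, 16 GB RAM, and one Nvidia 1070 graphics card (8 GB memory).
After 8 hours training has still not finished; the accuracy is currently 95%.
Test loss: 0.232426, accuray: 0.959453

Are my worker or batch_size settings too large?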

How to fine-tune?

Hi, thx for ur job. Could u pls tell me how to fine-tune with ur pre-train model?

AttributeError: Can't get attribute '_rebuild_tensor_v2'

AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/data8T/fangping/anaconda2/envs/pytorch_py36/lib/python3.6/site-packages/torch/_utils.py'>
I am using PyTorch 0.3.1. It looks like the PyTorch version is wrong. Which version are you using?

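Training on inverted samples

A question just occurred to me today. Suppose that when training the CRNN I start with black-on-white samples and later train on white-on-black samples. The two kinds of samples have completely opposite styles. Would this hurt the model's accuracy?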

Tolmdb

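When generating the lmdb from images and text, should the images and text be read in binary mode? In tolmdb.py, with

    with open('.'+imagePath, 'r') as f:
        imageBin = f.read()

if 'rb' is not used, the image that is read triggers the message '%s is not a valid image' %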
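
Image files are binary data, so they need to be opened with mode 'rb'. In Python 3, text mode tries to decode the bytes, and a JPEG's leading 0xff byte is not valid UTF-8. The sketch below demonstrates both behaviours on a temporary file with a fake JPEG payload; `read_image_bytes` is a hypothetical helper name.

```python
import os
import tempfile


def read_image_bytes(path):
    """Read a file's raw bytes; 'rb' avoids any text decoding."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jpg")
    with open(path, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0 fake jpeg payload")   # starts like a JPEG

    image_bin = read_image_bytes(path)

    # Text mode attempts UTF-8 decoding and fails on the first byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

print(image_bin[:2], text_mode_failed)
```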

About nclass

A Chinese character occupies 3 bytes. Say I have 5 characters; wouldn't my nclass then be 3×5+1=16, so that there are 16 kinds of labels rather than 3+1=4?
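
As a hedged note on the question above: with CTC the number of classes is usually the number of distinct characters plus one for the blank, regardless of how many bytes each character takes when encoded. The sketch assumes the alphabet is a Python 3 `str`, where `len` counts characters; counting the bytes of a UTF-8 encoded string is where the "3 bytes per character" figure comes from. `count_classes` is a hypothetical helper.

```python
def count_classes(alphabet):
    """Distinct characters plus one extra class for the CTC blank."""
    return len(set(alphabet)) + 1


alphabet = "你好啊祖国"                       # 5 distinct characters
nclass = count_classes(alphabet)
utf8_bytes = len(alphabet.encode("utf-8"))   # byte count, not a class count

print(nclass, utf8_bytes)
```

Under these assumptions, five characters give six classes, not sixteen.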

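Image size question

Hello, and thank you for sharing. I have a question: the images in my dataset are not a fixed size and come in all kinds of dimensions. How should I use your code?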
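
One common approach for images of varying size, sketched here as an assumption and not as what this repository does, is to scale every image to a fixed height (32 pixels is the height mentioned elsewhere on this page) while keeping its aspect ratio, then pad each batch to its widest image. `target_width` and `batch_pad_width` are hypothetical helpers that only compute the sizes.

```python
def target_width(orig_w, orig_h, target_h=32):
    """Width after scaling to target_h while preserving the aspect ratio."""
    scale = target_h / orig_h
    return max(1, round(orig_w * scale))


def batch_pad_width(sizes, target_h=32):
    """Width to pad to so every image in the batch fits after scaling."""
    return max(target_width(w, h, target_h) for w, h in sizes)


sizes = [(560, 64), (100, 50), (300, 32)]    # (width, height) pairs
widths = [target_width(w, h) for w, h in sizes]
pad_to = batch_pad_width(sizes)

print(widths, pad_to)
```

The actual resizing and padding would be done with an image library inside the dataloader; only the arithmetic is shown here.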

KeyError: '\x00' occurs during training

Start val
Traceback (most recent call last):
File "crnn_main.py", line 200, in
training()
File "crnn_main.py", line 117, in training
val(crnn, test_dataset, criterion)
File "crnn_main.py", line 57, in val
t, l = converter.encode(cpu_texts)
File "/home/OCR/crnn_train/crnn_chinese_characters_rec-master/utils.py", line 101, in encode
index = self.dict[char]
KeyError: '\x00'
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f425248dc88>>
Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f47434f68d0>>
Traceback (most recent call last):
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
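The main problem is that a char cannot be found in the dict. The line of data that triggers the error is “61688125_428659907.jpg 时，塔吉尔的手不由得”, which contains no characters missing from the dictionary. What could be causing this?
The training set currently in use is Synthetic_Chinese_String_Dataset and the dictionary is char_std_5990.txt.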
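
The `'\x00'` in the KeyError is a NUL character, which is invisible when the label is printed. One possible cause, stated as a guess since the source does not confirm it, is NUL padding attached to labels somewhere in the data pipeline. The sketch below shows how such characters could be detected and stripped before encoding; both helper names are hypothetical.

```python
def clean_label(text):
    """Remove NUL padding and surrounding whitespace from a label."""
    return text.replace("\x00", "").strip()


def unknown_chars(text, alphabet):
    """List characters in text that the alphabet does not contain."""
    return sorted(set(text) - set(alphabet))


alphabet = "时，塔吉尔的手不由得"              # stand-in for the real dictionary
raw_label = "时，塔吉尔的手不由得\x00\x00"     # looks fine when printed

before = unknown_chars(raw_label, alphabet)
after = unknown_chars(clean_label(raw_label), alphabet)

print(before, after)
```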

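Error with the multi-GPU version

After changing the code to multi-GPU, training works normally but validation raises an error. The screenshot above used 4 GPUs, and the values differ by a factor of four.

This line was added to the GPU code.
Adding the commented line to the val code also raises an error.

After commenting out preds = preds.squeeze(2), the error becomes: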

An error is reported when building the lmdb dataset

Traceback (most recent call last):
File "tolmdb.py", line 90, in
createDataset(outputPath, imagePathList, labelList)
File "tolmdb.py", line 55, in createDataset
imageBin = f.read()
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Train loss < 1.0 but Test Loss > 10,acc is low

I am training on English strings of 17 characters, with 20,000 training samples and 2,000 validation samples, lr=0.00005.

Could you share the trained model?

Could you share the trained model? My own computer is not very powerful. Thank you!

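Purpose of the 5990 characters

I would like to ask: char_std_5990.txt contains 5990 characters. What is the purpose of so many characters? Are they class codes? Is recognition being treated as classification?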

val_root: validation set path

Which path is this validation set path? I have only generated the lmdb file, and I don't know what this parameter means.

preprocessing

I would like to ask what preprocessing.py is for. I ran it and nothing appeared.

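What does the last sentence in the blog about variable-length recognition mean?

(Variable-length recognition applies, at test time, the size to which training-set images were scaled when fed to the neural network; this is already annotated in test.py!)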

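The loss dropped and then suddenly went back up

I trained yesterday; the loss dropped to between 0 and 1 and accuracy was around 88%. When I got up this morning the loss had become 50-something and accuracy was 0. What happened? =.= Also, did the author type all the dataset labels by hand, one by one?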

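About compiling warp-ctc

Hello, during training I keep getting the message GPU execution requested, but not compiled with GPU support
Then I went back to the warp-ctc cmake step and found the message Building shared library with no GPU support. However, my CUDA path has already been imported and WITH_GPU is TRUE, yet the with-GPU branch is never entered.
What could be the reason? Thank you.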

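Accuracy stays at 0. Why? I have already lowered the learning rate. Must every label have the same number of characters (for example, all 10 characters), or can the label set contain labels of different lengths?

Please take a look, thank you very much.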

python test.py error

When I try to run this code, test.py reports the error below, and I am puzzled.
I added map_location='cpu'. Is that the cause?
model.load_state_dict(torch.load(crnn_model_path,map_location='cpu'))

Error message:

python test.py ****
loading pretrained model from trained_models/mixed_second_finetune_acc97p7.pth
Traceback (most recent call last):
File "test.py", line 64, in <module>
model.load_state_dict(torch.load(crnn_model_path,map_location='cpu'))
File "/home/seven/anaconda3/envs/chinese_ocr/lib/python2.7/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN:
size mismatch for rnn.1.embedding.bias: copying a param of torch.Size([19997]) from checkpoint, where the shape is torch.Size([6736]) in current model.
size mismatch for rnn.1.embedding.weight: copying a param of torch.Size([19997, 512]) from checkpoint, where the shape is torch.Size([6736, 512]) in current model.

Runtime Error cuda out of memory occurs while the gpu memory is empty

Detailed error description::

Traceback (most recent call last):
File "crnn_main.py", line 193, in
training()
File "crnn_main.py", line 110, in training
cost = trainBatch(crnn, criterion, optimizer, train_iter)
File "crnn_main.py", line 96, in trainBatch
cost = criterion(preds, text, preds_size, length) / batch_size
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 82, in forward
self.length_average, self.blank)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 32, in forward
blank)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
result = torch._C._safe_call(args, kwargs)
torch.FatalError: CUDA error: out of memory (allocate at /pytorch/aten/src/THC/THCCachingAllocator.cpp:510)
frame #0: THCudaMalloc + 0x79 (0x7f50f7b32e99 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/lib/libcaffe2_gpu.so)
frame #1: gpu_ctc + 0x134 (0x7f50f61f92a4 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/_warp_ctc/$
_warp_ctc.cpython-35m-x86_64-linux-gnu.so)
frame #2: + 0x1ad2 (0x7f50f61f8ad2 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorc$
/_warp_ctc/__warp_ctc.cpython-35m-x86_64-linux-gnu.so)

frame #5: THPModule_safeCall(_object, _object, _object) + 0x4c (0x7f511e7a67cc in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/_C.cpython-35m-x86_64-l$
nux-gnu.so)
frame #8: python() [0x5401ef]
frame #11: python() [0x4ec358]
frame #14: THPFunction_apply(_object, _object) + 0x38f (0x7f511eb9383f in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so)
frame #18: python() [0x4ec3f7]
frame #22: python() [0x4ec2e3]
frame #24: python() [0x4fbfce]
frame #26: python() [0x574db6]
frame #31: python() [0x53fc97]
frame #33: python() [0x60cb42]
frame #38: __libc_start_main + 0xf0 (0x7f513430a830 in /lib/x86_64-linux-gnu/libc.so.6)
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f50ec9151d0>>
Traceback (most recent call last):
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

I am using ::
cuda: 8.0
python: 3.5
pytorch : 0.4.1

I am getting error while using cuda. It is running fine on cpu.

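Saving the model and log

I would like to ask: are the model and log only saved to the expr folder after training finishes? What should I do if I want to visualize the loss curve?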

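Preprocessing for model training

Hello author, I looked at the model training script and it seems that no grayscale conversion is applied to the images when processing the training set. Did I miss it?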

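Accuracy

Using the 3.6-million-image dataset with the other parameters identical to yours, the final accuracy oscillates around 50 whether I train from scratch or fine-tune. What is going on?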

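Dataset character problem

Hello, I see that the 3.6-million-image dataset contains 6736 characters. My dataset contains some special traditional characters, and after merging them with the characters in alphabet.py there are 9116 characters. But with this new alphabet.py file, testing no longer succeeds and reports the following error:

loading pretrained model from trained_models/mixed_second_finetune_acc97p7.pth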
Traceback (most recent call last):
File "test.py", line 66, in
model.load_state_dict(torch.load(crnn_model_path))
File "/home/imc/XR/anaconda3/envs/crnn-chinese/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN:
size mismatch for rnn.1.embedding.weight: copying a param with shape torch.Size([6736, 512]) from checkpoint, the shape in current model is torch.Size([9116, 512]).
size mismatch for rnn.1.embedding.bias: copying a param with shape torch.Size([6736]) from checkpoint, the shape in current model is torch.Size([9116]).
(crnn-chinese) imc@imc-NO108:~/XR/models/chenxu/crnn_chinese$
Is the number of characters hard-coded in the network?
Why does it report a shape mismatch, and how should I modify it?
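
The error above says the checkpoint's output layer was built for 6736 classes while the current model has 9116, so those two tensors cannot be copied. A possible way forward, sketched here as an assumption rather than the author's procedure, is to load only the parameters whose shapes match and retrain the output layer. The helper `split_by_shape` is hypothetical and works on plain name-to-shape dictionaries; the `cnn.conv0.weight` entry and its shape are made up for illustration.

```python
def split_by_shape(checkpoint_shapes, model_shapes):
    """Separate checkpoint entries that fit the model from those that do not."""
    loadable, mismatched = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable.append(name)
        else:
            mismatched.append(name)
    return loadable, mismatched


checkpoint_shapes = {
    "cnn.conv0.weight": (64, 1, 3, 3),          # hypothetical layer, for illustration
    "rnn.1.embedding.weight": (6736, 512),      # shapes from the error message
    "rnn.1.embedding.bias": (6736,),
}
model_shapes = {
    "cnn.conv0.weight": (64, 1, 3, 3),
    "rnn.1.embedding.weight": (9116, 512),
    "rnn.1.embedding.bias": (9116,),
}

loadable, mismatched = split_by_shape(checkpoint_shapes, model_shapes)
print(loadable)
print(mismatched)
```

With PyTorch, the shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, and the loadable subset passed to `load_state_dict` with `strict=False`; the mismatched output layer then starts from its fresh initialization and needs fine-tuning on the new alphabet.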