
sightseq's Introduction

🔭sightseq

Let's go sightseeing around the deep learning world with multimodal vision and sequence (language) models.

What's New:

  • July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename this repo, I promise.
  • June 11, 2019: I rewrote the text recognition part based on fairseq. For a stable version, refer to the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including:

Additionally:

  • All features of fairseq
  • Convolutional and recurrent layers in CRNN can be enabled flexibly
  • Positional Encoding of images

General Requirements and Installation

  • PyTorch (nn.CTCLoss has a bug that is fixed in the nightly version)
  • Python version >= 3.5
  • Fairseq version >= 0.7.1
  • torchvision version >= 0.3.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.

sightseq's Issues

Getting accuracy as 0.00

I am trying to train a model, but the accuracy is always 0.00.
My data folder: data.zip
Command used: python ./main.py --dataset-root G:\11\crnn.pytorch-master\crnn.pytorch-master\data --arch densenet121 --alphabet G:\11\crnn.pytorch-master\crnn.pytorch-master\data\alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop
Training log: screenshot attached.

I need help here. Thanks in advance.

Error(s) in loading state_dict for CRNN:

I downloaded and tested the pre-trained densenet model, but loading it failed with the following error:

Error(s) in loading state_dict for CRNN:
Unexpected key(s) in state_dict: "features.1.denselayer1.norm1.num_batches_tracked", ...
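
One possible cause, stated as an assumption because the report gives no version details: BatchNorm layers gained a num_batches_tracked buffer in PyTorch 0.4.1, so a checkpoint saved by a newer PyTorch carries keys that a model built under an older version does not expect. A minimal sketch of a workaround is to drop those keys before loading. drop_num_batches_tracked is a hypothetical helper, and the plain dict below stands in for a real state_dict.

```python
def drop_num_batches_tracked(state_dict):
    """Return a copy of state_dict without BatchNorm 'num_batches_tracked' entries."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.endswith("num_batches_tracked")
    }


# Stand-in for a checkpoint; in a real checkpoint the values would be tensors.
checkpoint = {
    "features.1.denselayer1.norm1.weight": [1.0],
    "features.1.denselayer1.norm1.num_batches_tracked": 0,
}
filtered = drop_num_batches_tracked(checkpoint)
print(sorted(filtered))
```

With a real model, the filtered dict would be passed to model.load_state_dict(filtered). Passing strict=False to load_state_dict is another option, at the cost of also silencing genuinely missing keys.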

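Low recognition accuracy for Chinese

My digit recognition accuracy is quite good, so why is the Chinese recognition accuracy so low? My dictionary contains only 19 specific Chinese characters, I have adjusted the image resolution, and I generated more than 10 million training samples. Do I need to tune some model parameters, or should I try a BLSTM in the CRNN?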

Input size

Is it possible to train the network with images of different size?

dimensions in forward pass

@zhiqwang, could you please correct the dimensions I've added in the comments below, in the forward() pass of the CRNN class? I cannot figure out what happens after the permute line.

out = self.features(x) # out: (B, H, W, C)
# features -> pool -> flatten -> decoder -> softmax
out = self.avgpool(out) # out: (B, 1, W, C)
out = out.permute(3, 0, 1, 2).view(out.size(3), out.size(0), -1) # out: (C, B, 1*W)
out = self.classifier(out) # expected in: (B, W, C) != (C, B, 1*W) ?
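
As a sketch of what the permute and view do to the shape, the pure-Python trace below assumes the PyTorch convolution layout (B, C, H, W) rather than (B, H, W, C), and assumes the average pool collapses the height to 1. The concrete sizes are made up for illustration. Under those assumptions the classifier receives (W, B, C), that is, a width-major sequence.

```python
def permute_shape(shape, dims):
    """Shape after tensor.permute(*dims): axis i of the result is axis dims[i] of the input."""
    return tuple(shape[d] for d in dims)


def view_keep_two(shape):
    """Shape after tensor.view(shape[0], shape[1], -1): trailing axes are merged."""
    merged = 1
    for size in shape[2:]:
        merged *= size
    return (shape[0], shape[1], merged)


B, C, H, W = 4, 512, 1, 70                       # hypothetical sizes after avgpool (H == 1)
pooled = (B, C, H, W)                            # assumed layout: (B, C, 1, W)
permuted = permute_shape(pooled, (3, 0, 1, 2))   # (W, B, C, 1)
flat = view_keep_two(permuted)                   # (W, B, C * 1)
print(permuted, flat)
```

Because out.size(3) and out.size(0) in the original line are evaluated on the tensor before the permute, the view target is (W, B, -1), which matches the trace.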

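Question about loading the pre-trained model

I ran into this problem when loading your pre-trained model:

(screenshot)

The model I am using is densenet121, and I placed it here:
(screenshot)
Why does this happen? I have not modified the model, and I am running on a single GPU.
I then changed this part:
(screenshot)
The result is the same, and the pre-trained model still cannot be loaded. What is the cause?
I replaced the built-in CTC with warpctc in the code. Could that be the reason?
Thanks.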

No recurrent layer found in model files

I looked through the network briefly, and it seems to contain no recurrent layers such as a Bi-LSTM.
Is this repo a different implementation of CRNN? I only see several CNN backbones and fully connected layers, but no RNN layers.

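Is the low Chinese recognition accuracy caused by the receptive field?

The model I am using is mobilenetv2. In this network, repeating the blocks increases the receptive field. I calculated that the receptive field of your small-size model is 139, but the image size is 32×280. Generally speaking, a receptive field of around 64 is more suitable. Could this overly large receptive field be one reason for the low Chinese recognition accuracy? Thanks.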
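
A receptive field figure like the one above can be checked with the standard recurrence for stacked convolution or pooling layers: r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the cumulative stride. The sketch below ignores dilation, and its layer list is a hypothetical example, not the actual mobilenetv2 configuration from this repo.

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers along one axis."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump   # each layer widens the field by (k - 1) input jumps
        jump *= stride                   # stride compounds the spacing between outputs
    return rf


# Hypothetical stack: one 3x3 stride-2 conv, then two 3x3 stride-1 convs.
example_layers = [(3, 2), (3, 1), (3, 1)]
print(receptive_field(example_layers))
```

Feeding in the real (kernel, stride) pairs of a backbone, once per axis, would show how much each repeated block contributes.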

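Loss becomes NaN: I read the earlier answers but still have a question

I want to train mobilenetv2 + CTC on my own data. The images are all 32×258 and look like this:
(image)
There are 9k training images and 1k validation images. The labels look like this:
00000000.jpg 144 80 91 9 213 24 16 217 91 682 129 100 5
00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4
00000002.jpg 1688 195 91 49 678 4 24 1166 2700 58 135
The number of characters per image varies; it is around 10, for example 11 or 13.

First, I changed this in the code:
parser.add_argument('--width', type=int, default=256,)
Then I ran:
python main.py --gpu-id 0 --not-pretrained --optimizer adam
After four epochs, however, the loss became NaN.

CTC is set up as:
criterion = nn.CTCLoss(zero_infinity=True)
I set the image mean and standard deviation to those of my own images:
model_params['mean'] = (0.57680161,0.57680161,0.57680161)
model_params['std'] = (0.1311234,0.1311234,0.1311234)

(image)
What could be the cause? I set everything up as you explained earlier, but the loss still becomes NaN. I need to train the mobilenetv2 network for my research project. Thank you.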

How is the picture processed in sequence_generate?

The shape of the picture is (Batch, Channel, H, W), but sequence_generate accepts data of shape (batch, seq_len, ...).
I could not find where your code handles this. How did you deal with this problem?
Thank you

Loss becomes inf, then NaN

mtwi_2018_train/images/001807_00031.jpg

Train: [1][108/90000] Time 0.348 (0.361) Data 0.003 (0.006) Loss 30.0584 (31.5477)
mtwi_2018_train/images/007947_00013.jpg
Train: [1][109/90000] Time 0.422 (0.361) Data 0.003 (0.006) Loss inf (inf)
mtwi_2018_train/images/002394_00012.jpg
Train: [1][110/90000] Time 0.332 (0.361) Data 0.003 (0.006) Loss nan (nan)


command:
python ./main.py --dataset-root mtwi_2018_train --arch densenet121 --alphabet ./data/alphabet_decode_5990.txt --lr 1e-6 --optimizer rmsprop --gpu-id -1 --workers 1 --not-pretrained --batch-size 1 --keep-ratio --print-freq 1


Attached image: 007947_00013.jpg
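
One common cause of an infinite CTC loss, offered here as an assumption rather than a diagnosis of this sample, is a target that cannot be aligned to the input: CTC needs at least one time step per label, plus one blank between every pair of identical adjacent labels. The sketch below checks that condition. min_ctc_input_length, is_ctc_feasible, and the example values are hypothetical.

```python
def min_ctc_input_length(target):
    """Smallest number of time steps that can emit `target` under CTC."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_ctc_feasible(input_length, target):
    """True if a CTC alignment exists; otherwise the loss is infinite."""
    return input_length >= min_ctc_input_length(target)


# Hypothetical sample: 5 labels with one adjacent repeat, so 6 time steps are needed.
sample_target = [7, 7, 3, 9, 2]
print(is_ctc_feasible(6, sample_target))
print(is_ctc_feasible(5, sample_target))
```

Samples that fail this check could be filtered out or given wider inputs. nn.CTCLoss(zero_infinity=True), which appears in another report on this page, zeroes such losses and their gradients instead of propagating inf.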

TypeError: 'DigitsBatchTrain' object is not iterable

I extracted Chinese_dataset.rar into the data folder, moved all pictures to images, and renamed data_test.txt to data_dev.txt.

Running main.py then shows:

Creating directory if it does not exist:
'./checkpoint/densenet121_rmsprop_lr5.0e-05_wd5.0e-04_bsize64_imsize32'
Using model from scratch (random weights) 'densenet121'
Traceback (most recent call last):
File "/home/luban/repository/crnn.pytorch/main.py", line 352, in <module>
main()
File "/home/luban/repository/crnn.pytorch/main.py", line 197, in main
loss = train(train_loader, model, criterion, optimizer, epoch)
File "/home/luban/repository/crnn.pytorch/main.py", line 231, in train
for i, (images, targets, target_lengths) in enumerate(train_loader):
TypeError: 'DigitsBatchTrain' object is not iterable

Process finished with exit code 1

Help Needed

I am training a CRNN model in PyTorch with these settings:
max_seq_length=99
number_of_alphabets=96
batch_size=16
output=CRNN(image)
What should the expected shape of output be?
Secondly, should we apply softmax in the CRNN after the fully connected layer?
Any help would be appreciated. Thanks
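
For reference, torch.nn.CTCLoss documents its log_probs input as shape (T, N, C): T input time steps, N the batch size, and C the number of classes including the blank, with values that are log-probabilities (for example from log_softmax over the class axis). Below is a sketch of the shape arithmetic for the numbers above, assuming one extra class is reserved for the blank and that T is whatever width the backbone produces; the value 100 is made up.

```python
import math


def ctc_log_probs_shape(time_steps, batch_size, num_characters):
    """Shape (T, N, C) expected by CTC, with one extra class for the blank."""
    return (time_steps, batch_size, num_characters + 1)


def log_softmax(scores):
    """Numerically stable log-softmax over one time step's class scores."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]


# T = 100 is hypothetical; it must be at least the longest target (99 here).
shape = ctc_log_probs_shape(100, 16, 96)
step = log_softmax([2.0, 0.5, -1.0])
print(shape)
print(sum(math.exp(v) for v in step))
```

If number_of_alphabets already counts the blank, C would stay at 96 instead.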

annotation file format for English data

Could you please share an example for training on English text?
The CHARMAP used for my data includes all characters in [A-Z a-z0-9 :,>/-].

target = torch.IntTensor([get_key(char_convert,i) for i in target])
TypeError: an integer is required (got type NoneType)

What should be the format of encoded data in annotation file after conversion?
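
The NoneType error above is consistent with a lookup returning None, which suggests that some character in a label is missing from the map. Below is a minimal sketch of an encoder that fails loudly instead. It assumes index 0 is reserved for the CTC blank and that an annotation line is a file name followed by space-separated indices (the layout shown in the label samples of another report on this page). build_char_map and encode_label are hypothetical names, not functions from this repo.

```python
def build_char_map(alphabet):
    """Map each character to a positive index; 0 is left for the CTC blank (assumption)."""
    return {char: index for index, char in enumerate(alphabet, start=1)}


def encode_label(text, char_map):
    """Convert text to indices, naming any character missing from the map."""
    missing = sorted({char for char in text if char not in char_map})
    if missing:
        raise ValueError(f"characters not in alphabet: {missing!r}")
    return [char_map[char] for char in text]


alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :,>/-"
char_map = build_char_map(alphabet)
indices = encode_label("Ab 9", char_map)
# Hypothetical annotation line in the "file name, then indices" layout.
print("00000000.jpg " + " ".join(str(i) for i in indices))
```

Running every label through such a check before training would surface unmapped characters as a readable error rather than a TypeError inside the tensor constructor.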

RuntimeError: CUDA error: an illegal memory access was encountered

python ./main.py --dataset-root datasets --arch densenet121 --alphabet datasets/alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop --gpu-id 6 --not-pretrained

Traceback (most recent call last):
File "./main.py", line 387, in <module>
main()
File "./main.py", line 209, in main
_ = train(train_loader, model, criterion, optimizer, epoch)
File "./main.py", line 274, in train
loss = criterion(log_probs, targets, input_lengths, target_lengths)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1248, in forward
return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1732, in ctc_loss
return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction))
RuntimeError: CUDA error: an illegal memory access was encountered
