I have few questions. (pixel_link, 38 comments, closed)

zjulearning commented on September 6, 2024

I have a few questions.

Comments (38)

GodOfSmallThings commented on September 6, 2024

@lizzyYL I also tried batch size 8 the first time and got NaN. With batch size 24, NaN didn't occur, and performance went up to 82.4%.

BowieHsu commented on September 6, 2024

@GodOfSmallThings
1. On the VGG backbone, I trained on SynthText for about 60k (6W) iterations with a learning rate of 0.01, then fine-tuned on IC15 with a learning rate of 0.001 for about 30k (3W) iterations; the F-measure is about 77%. But when I train on IC15 alone for about 100k (10W) iterations, the F-measure is nearly 82%.
2. I tried replacing the backbone network with ResNet-50; with the same SGD policy, after 100k (10W) iterations of training, the F-measure is about 75%.
3. In my experiments, the pretrained model does not raise the final score with this network architecture.
Really interesting.
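The two-phase schedule in point 1 can be written down as a small lookup. Here "W" means 10,000, so 6W is 60k iterations and 3W is 30k. This is a minimal sketch, not pixel_link's training code. The `PHASES` table and the `phase_at` helper are hypothetical names, and the numbers are simply the ones reported above.

```python
# Hypothetical two-phase schedule using the numbers reported above:
# pretrain on SynthText, then fine-tune on ICDAR2015 at a 10x lower rate.
PHASES = [
    # (dataset name, learning rate, number of iterations)
    ("SynthText", 0.01, 60_000),
    ("ICDAR2015", 0.001, 30_000),
]


def phase_at(step, phases=PHASES):
    """Return (dataset, learning_rate) for a global step under the schedule."""
    start = 0
    for dataset, lr, iters in phases:
        if step < start + iters:
            return dataset, lr
        start += iters
    raise ValueError(f"step {step} is past the end of the schedule ({start} iterations)")


def total_iterations(phases=PHASES):
    """Total number of iterations across all phases."""
    return sum(iters for _, _, iters in phases)


print(total_iterations())  # 90000
print(phase_at(0))         # ('SynthText', 0.01)
print(phase_at(60_000))    # ('ICDAR2015', 0.001)
```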

GodOfSmallThings commented on September 6, 2024

@BowieHsu
Thank you for answering the first question!
I am also curious about how you use the ground truth, which is my second question. As I mentioned above, this model converts the 8-coordinate bounding box into an axis-aligned bounding box. I guessed it would contain a certain amount of background. Would it perform better if we used the tight 8-coordinate box rather than the 4-coordinate aligned box?
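To see how much background the aligned box adds, you can compare two areas: the tight quadrilateral (the four corner points behind an ICDAR2015-style 8-coordinate annotation) and its axis-aligned bounding box. This is a minimal sketch using the shoelace formula. The function names are hypothetical, and nothing here is taken from the pixel_link code.

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...] in order."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def aligned_box(points):
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing the quadrilateral."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def background_fraction(points):
    """Share of the aligned box's area that lies outside the tight quadrilateral."""
    xmin, ymin, xmax, ymax = aligned_box(points)
    box_area = (xmax - xmin) * (ymax - ymin)
    if box_area == 0:
        return 0.0
    return 1.0 - polygon_area(points) / box_area


# A square word box rotated by 45 degrees: its aligned box has twice its area.
quad = [(1, 0), (2, 1), (1, 2), (0, 1)]
print(background_fraction(quad))  # 0.5
```

For an axis-aligned rectangle the fraction is 0. Rotated or perspective-skewed word boxes give a positive fraction, which is the extra background the question above refers to.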

BowieHsu commented on September 6, 2024

@GodOfSmallThings It may help; you can run some experiments.

dengdan commented on September 6, 2024

Pretraining on SynthText will help. @BowieHsu: when SynthText is added for pretraining, an F-mean of 85% can be achieved. http://rrc.cvc.uab.es/?ch=4&com=evaluation&task=1&e=1&f=1&d=0&p=0&s=1

However, quite a lot more iterations are needed.

GodOfSmallThings commented on September 6, 2024

Thank you very much!

lizzyYL commented on September 6, 2024

@BowieHsu @dengdan
I trained on ICDAR15 for about 100k (10W) iterations with a learning rate of 0.001 and a batch size of 12.
But the F-measure is 78.9% (R=77.7%, P=80.2%), clearly lower than your results.
Are there any parameters that need to be adjusted?

Thank you!
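The F-measure (the "Hmean" on the ICDAR leaderboard) is the harmonic mean of precision and recall: F = 2PR / (P + R). Plugging in the precision and recall above confirms that the three reported numbers are consistent with each other. A minimal sketch:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the ICDAR 'Hmean')."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(0.802, 0.777), 3))  # 0.789
```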

BowieHsu commented on September 6, 2024

Dude, a 1e-2 learning rate will improve the F-score.

lizzyYL commented on September 6, 2024

A 1e-2 learning rate caused loss = NaN, so I changed it to 1e-3....
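When a run diverges like this, it helps to know the exact step at which the loss first became NaN. You can then, for example, resume from a checkpoint saved before that step. This is a minimal, framework-agnostic sketch over a recorded loss trace, and the trace itself is made up. In TF1 graph code, tf.train.NanTensorHook can stop training on a NaN loss in a similar spirit.

```python
import math


def first_nonfinite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss trace: training diverges at step 3.
trace = [0.9, 0.7, 0.65, float("nan"), float("nan")]
print(first_nonfinite_step(trace))  # 3
```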

BowieHsu commented on September 6, 2024

@lizzyYL Then you need to train at a 1e-3 learning rate for the first 100 iterations, and then just continue training at 1e-2.
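The suggestion above is a constant warm-up: a low rate for the first ~100 iterations, then the full 1e-2. This is a minimal sketch of that rule as a plain function of the global step, and the parameter names are hypothetical. In TF1 you would typically feed such a value into the optimizer through a schedule op such as tf.train.piecewise_constant.

```python
def learning_rate(step, warmup_steps=100, warmup_rate=1e-3, base_rate=1e-2):
    """Constant warm-up: a small rate for the first steps, then the full rate."""
    return warmup_rate if step < warmup_steps else base_rate


for step in (0, 99, 100, 30_000):
    print(step, learning_rate(step))
# 0 0.001
# 99 0.001
# 100 0.01
# 30000 0.01
```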

lizzyYL commented on September 6, 2024

@BowieHsu Yes, that's what I did the first time, running train.sh, but the loss became NaN at 30k (3W) iterations.
Is it because of the batch size? I set it to 8 the first time.

lizzyYL commented on September 6, 2024

@GodOfSmallThings Understood! Thank you for your reply~

tsing-cv commented on September 6, 2024

@lizzyYL @dengdan @BowieHsu @GodOfSmallThings @comzyh
I just want to test on my own images and on ICDAR2015 using the two pretrained models.
But something strange happened: all the predicted images only changed color, without any bboxes, and all the predicted txt files contain no coordinates.
Could you help me solve this problem?

BowieHsu commented on September 6, 2024

@tsing-cv So, have you printed the PixelLink results in test_any_image.py?

GodOfSmallThings commented on September 6, 2024

@tsing-cv
I've encountered the same problem. It seems like the network has reached a non-optimal local minimum. (It may not have diverged, because the loss doesn't go to infinity or NaN.) You can try a few checkpoints, going from the last one back to earlier ones; there may be a checkpoint that is not yet trapped at the non-optimal point. Then you can delete the checkpoints after it, modify the checkpoint (txt) file, and resume training/testing.
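In TensorFlow 1.x, the "checkpoint (txt)" file in the training directory is a small text file. Its model_checkpoint_path entry tells the saver which checkpoint is the latest, so pointing it back at an earlier checkpoint makes training or testing resume from there. The sketch below writes that file by hand; tf.train.update_checkpoint_state does the same from inside TensorFlow. The directory and checkpoint names in the usage note are hypothetical.

```python
import os


def checkpoint_state_text(ckpt_name):
    """Text-format CheckpointState that marks ckpt_name as the latest checkpoint."""
    return (
        f'model_checkpoint_path: "{ckpt_name}"\n'
        f'all_model_checkpoint_paths: "{ckpt_name}"\n'
    )


def point_checkpoint_to(train_dir, ckpt_name):
    """Overwrite <train_dir>/checkpoint so restore/resume starts from ckpt_name."""
    path = os.path.join(train_dir, "checkpoint")
    with open(path, "w") as f:
        f.write(checkpoint_state_text(ckpt_name))
    return path
```

For example, after deleting the model.ckpt-* files saved after step 30000, you would call point_checkpoint_to("train_dir", "model.ckpt-30000"), again with hypothetical paths. Relative checkpoint names are resolved against the training directory.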

GodOfSmallThings commented on September 6, 2024

@dengdan
Can you share details on how you achieved 85% hmean (learning rate, number of iterations, batch size, etc.)?
I've used the SynthText DB for the pretrained model, but it only achieved ~82.x%.
Thanks.

BowieHsu commented on September 6, 2024

@GodOfSmallThings I have the same problem. After training on SynthText for about 100k (10W) iterations with a 1e-2 learning rate and then fine-tuning on ICDAR2015, the F-score stays at 0 forever.

BowieHsu commented on September 6, 2024

@dengdan @GodOfSmallThings Another thing: I tried adding BN ops after every conv layer, which made the F-score drop by about 5-6%.

GodOfSmallThings commented on September 6, 2024

@tsing-cv
I think sharing the weights won't help you much, so I'll just describe the method.

Just train for 30,000~40,000 iterations.
Check the result and resume training.

You'll probably see that at some point training is working. Find the point where training fails by checking the result every 10,000 iterations.
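The search described above (evaluate every 10,000 iterations and find the last checkpoint before training fails) can be automated once you have a score per checkpoint. This is a minimal sketch. The evaluation results and the `min_score` threshold that defines "failed" are hypothetical.

```python
def last_good_checkpoint(scores, min_score=0.05):
    """Given {step: F-measure}, return the last step before the first collapse.

    A checkpoint counts as collapsed when its score is below min_score.
    Returns None if the very first evaluation already failed.
    """
    last_good = None
    for step in sorted(scores):
        if scores[step] < min_score:
            break
        last_good = step
    return last_good


# Hypothetical evaluation log, one entry every 10,000 iterations.
evals = {10_000: 0.61, 20_000: 0.70, 30_000: 0.74, 40_000: 0.0, 50_000: 0.0}
print(last_good_checkpoint(evals))  # 30000
```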

@BowieHsu
I think it's a problem with the optimizer. I've tried Adam; it performed badly, but it did not fail. It seems SGD has a high probability of getting trapped in a local minimum. What I did was find the point right before training fails and resume training from that checkpoint, and it worked.
As for batch norm, let's figure out why.

BowieHsu commented on September 6, 2024

@GodOfSmallThings Have you tried a staircase learning rate instead of fixing it at 1e-2? I also tried Adam, and it performed badly. I think the reason is the upscaling 1×1 convolutions: those 1×1 convolutions have too few channels, which makes the network unstable.
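A staircase schedule holds the learning rate constant, then drops it by a fixed factor every decay_steps iterations: lr = base_lr * decay_rate ^ floor(step / decay_steps). This is what TF1's tf.train.exponential_decay computes with staircase=True. The sketch below is plain Python. The decay factor and step count are hypothetical choices, not values from this thread.

```python
def staircase_lr(step, base_lr=1e-2, decay_rate=0.1, decay_steps=30_000):
    """Staircase decay: lr = base_lr * decay_rate ** floor(step / decay_steps)."""
    return base_lr * decay_rate ** (step // decay_steps)


for step in (0, 29_999, 30_000, 65_000):
    print(step, staircase_lr(step))
```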

GodOfSmallThings commented on September 6, 2024

@BowieHsu
I thought similarly. I was guessing Adam performs well when the network is deep enough. Anyway, I don't think a staircase schedule will solve the problem. I think the problem is that this model inherently produces a large loss, since the input bbox is an aligned rectangle that contains a lot of background. I think modifying the data augmentation to make the bbox fit tightly around the words will help.

tsing-cv commented on September 6, 2024

Has anybody encountered this problem?

Traceback (most recent call last):
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 383, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 303, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got 8.0 of type 'float' instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_pixel_link.py", line 294, in <module>
    tf.app.run()
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_pixel_link.py", line 287, in main
    batch_queue = create_dataset_batch_queue(dataset)
  File "train_pixel_link.py", line 153, in create_dataset_batch_queue
    capacity = 500)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 927, in batch
    name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 722, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 519, in _apply_op_helper
    repr(values), type(values).__name__))
TypeError: Expected int32 passed to parameter 'n' of op 'QueueDequeueManyV2', got 8.0 of type 'float' instead.

@GodOfSmallThings @dengdan @lizzyYL

GodOfSmallThings commented on September 6, 2024

@tsing-cv
I didn't encounter the problem above. It seems like a type mismatch problem.
However, note that you are using Python 3, while the code is based on Python 2, and the two versions can behave differently. Check your TensorFlow version, too.

tsing-cv commented on September 6, 2024

Why does it always print these messages during training? Can none of the predicted bboxes be drawn on the picture? @GodOfSmallThings, @dengdan, @BowieHsu
Bounding box (-19,538,78,592) is completely outside the image and will not be drawn.

GodOfSmallThings commented on September 6, 2024

@tsing-cv Sorry, I don't know. I haven't dug into that problem yet.

small-wong commented on September 6, 2024

@tsing-cv #6

tsing-cv commented on September 6, 2024

@small-wong Thank you!

Jyouhou commented on September 6, 2024

Hi, has anyone been able to obtain the 83.7% result?

small-wong commented on September 6, 2024

@tsing-cv I have trained the model to detect Chinese text; you can give it a try~

cjt222 commented on September 6, 2024

I am training the model to detect English text in documents. How can I make sure the model has converged? I am fine-tuning the model the author provided on my own images; what loss should it reach? It converges so slowly... @GodOfSmallThings @BowieHsu @dengdan @lizzyYL @tsing-cv

BowieHsu commented on September 6, 2024

@cjt222 You can download the ICDAR evaluation script to measure accuracy and check whether your model has converged.

tsing-cv commented on September 6, 2024

@small-wong Thank you!

cjt222 commented on September 6, 2024

How can I train the model starting from a VGG pretrained model? I downloaded the VGG model from
https://github.com/tensorflow/models/tree/master/research/slim, but it cannot be loaded: some layers cannot be found. @tsing-cv

tsing-cv commented on September 6, 2024

@cjt222 If you want to use a pretrained model, you need to write your own model-loading code.
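Layers most likely "cannot be found" because the variable names in the slim checkpoint (under a vgg_16/ scope) do not match the names in the model graph. One way to write the loading code is to build a {checkpoint name: model name} map by swapping scope prefixes, then restore only the variables that match. This is a minimal sketch. The model-side scope names below are hypothetical and need to be checked against your own graph.

```python
def build_restore_map(model_vars, ckpt_vars, model_scope, ckpt_scope):
    """Map checkpoint variable names to model variable names.

    A model variable "<model_scope>/rest" is matched to the checkpoint variable
    "<ckpt_scope>/rest" if that name exists; everything else is skipped.
    """
    available = set(ckpt_vars)
    restore_map = {}
    for name in model_vars:
        if not name.startswith(model_scope + "/"):
            continue  # e.g. prediction-head variables with no pretrained counterpart
        ckpt_name = ckpt_scope + name[len(model_scope):]
        if ckpt_name in available:
            restore_map[ckpt_name] = name
    return restore_map


# Hypothetical names: "vgg" is assumed to be the backbone scope in the model graph,
# "vgg_16" is the scope used by the slim VGG-16 checkpoint.
model_vars = ["vgg/conv1/conv1_1/weights", "vgg/conv1/conv1_1/biases", "pixel_link/cls/weights"]
ckpt_vars = ["vgg_16/conv1/conv1_1/weights", "vgg_16/conv1/conv1_1/biases", "vgg_16/fc8/weights"]
restore_map = build_restore_map(model_vars, ckpt_vars, "vgg", "vgg_16")
# restore_map holds only the two conv1_1 entries; pixel_link/cls/weights and vgg_16/fc8 are skipped.
```

In TF1, you can list the checkpoint names with tf.train.list_variables(ckpt_path) and take the model names from v.op.name for each variable. Then pass the resulting map, with names replaced by the variable objects, to tf.train.Saver(var_list=...) before calling restore. Variables with no counterpart keep their normal initialization. Shape mismatches are not checked here.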

af258963 commented on September 6, 2024

@small-wong Can you share the model again? The one for detecting Chinese text.

aravinthmuthu commented on September 6, 2024

@tsing-cv About "Bounding box (-19,538,78,592) is completely outside the image and will not be drawn.":
this is caused by data augmentation. The authors randomly crop with scales from 0.1 to 1, so naturally a few GT boxes are only partially inside the image.
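Random cropping can leave ground-truth boxes partly or entirely outside the crop. A common way to handle this is to clip each box to the image and discard the ones with nothing left. This is a minimal sketch, not the pixel_link augmentation code. It assumes the printed tuple is (x1, y1, x2, y2) in inclusive pixel coordinates, and the image sizes are made up.

```python
def clip_box(box, width, height):
    """Clip an (x1, y1, x2, y2) box to a width x height image (pixels 0..size-1).

    Returns None when no part of the box overlaps the image.
    """
    x1, y1, x2, y2 = box
    cx1, cy1 = max(x1, 0), max(y1, 0)
    cx2, cy2 = min(x2, width - 1), min(y2, height - 1)
    if cx1 > cx2 or cy1 > cy2:
        return None
    return cx1, cy1, cx2, cy2


box = (-19, 538, 78, 592)
print(clip_box(box, 512, 512))  # None: the box starts below the last row of a 512-pixel-high image
print(clip_box(box, 640, 640))  # (0, 538, 78, 592): only the left edge is trimmed
```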

jisheng047 commented on September 6, 2024

@GodOfSmallThings When you reached the 83.7% result, what was your loss? Mine is still around 0.4-0.5, and when I use the model to predict, it predicts empty boxes.

Lanme commented on September 6, 2024

Re: "TypeError: Expected int32 passed to parameter 'n' of op 'QueueDequeueManyV2', got 8.0 of type 'float' instead."

config.batch_size_per_gpu = int(config.batch_size_per_gpu)
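The cast works because under Python 3 the / operator always returns a float (8 / 1 == 8.0), whereas under Python 2, int / int was integer division. That matches the Python 2 vs Python 3 remark earlier in the thread. Presumably batch_size_per_gpu is computed by dividing the batch size by the number of GPUs. Here is a minimal sketch of doing that split with integer division instead; the function name is hypothetical.

```python
def batch_size_per_gpu(batch_size, num_gpus):
    """Split a global batch across GPUs, returning an int as the queue op expects."""
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    return batch_size // num_gpus  # floor division keeps the result an int


print(8 / 1)                     # 8.0, a float under Python 3 (what triggers the TypeError)
print(batch_size_per_gpu(8, 1))  # 8
```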
