
dhSegment's People

Contributors

alix-tz, crazycrud, e-maud, jim-salmons, seguinbe, solivr


dhSegment's Issues

demo.py doesn't find experiment

Dear all,

I am just trying out your algorithm. To speed things up with my new data, I set both n_epochs and evaluate_every_epoch to 1. The training part runs through. However, when I then run demo.py to look at the processed images, it throws the following error.

    model_dir = os.path.join(model_base_dir, max(possible_dirs))  # Take latest export

ValueError: max() arg is an empty sequence

It seems that demo.py doesn't find the trained model, although the page_model folder contains the following files:

checkpoint config.json eval events.out.tfevents.1565610667.newspaper-vm export graph.pbtxt model.ckpt-800.data-00000-of-00001 model.ckpt-800.index model.ckpt-800.meta

What should I do?
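For reference, a minimal reproduction of what demo.py does (the path matches my setup above): it picks the newest timestamped folder under export/, so an empty export folder makes max() fail on an empty sequence.

    import os

    model_base_dir = 'page_model/export'  # the folder demo.py scans
    possible_dirs = os.listdir(model_base_dir) if os.path.isdir(model_base_dir) else []
    if not possible_dirs:
        print('export/ is empty -- training apparently finished without running the export step.')
    else:
        model_dir = os.path.join(model_base_dir, max(possible_dirs))  # take latest export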

Can't run train.py

Hello,

I am trying to train my own model with dhSegment, but I can't get train.py to run.
I also tried following the demo instructions to see whether I would get the same error, and I do:

python train.py with demo/demo_config.json
Traceback (most recent call last):
  File "train.py", line 8, in <module>
    from dh_segment.io import input
  File "/home/timeus/demo_dhseg/dhSegment/dh_segment/io/__init__.py", line 140, in <module>
    from . import via
  File "/home/timeus/demo_dhseg/dhSegment/dh_segment/io/via.py", line 12, in <module>
    from skimage import transform
  File "/home/timeus/miniconda3/envs/dh_segment/lib/python3.6/site-packages/skimage/__init__.py", line 158, in <module>
    from .util.dtype import *
  File "/home/timeus/miniconda3/envs/dh_segment/lib/python3.6/site-packages/skimage/util/__init__.py", line 7, in <module>
    from .arraycrop import crop
  File "/home/timeus/miniconda3/envs/dh_segment/lib/python3.6/site-packages/skimage/util/arraycrop.py", line 8, in <module>
    from numpy.lib.arraypad import _validate_lengths
ImportError: cannot import name '_validate_lengths'

Is it a problem within the script or am I missing a requirement?
For information, I am only using a CPU at the moment.
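From what I can tell (this is an assumption on my part), the usual cause is a version mismatch rather than the script itself: numpy 1.16 removed the private helper numpy.lib.arraypad._validate_lengths, while scikit-image releases before 0.14.2 still import it, which produces exactly this ImportError. A minimal check:

    import numpy

    # scikit-image < 0.14.2 imports numpy.lib.arraypad._validate_lengths,
    # which numpy 1.16 removed. Upgrading scikit-image (or pinning
    # numpy < 1.16) should resolve the import failure.
    print('numpy:', numpy.__version__)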

Many thanks,

Model is not exported after training

Hi :)

I trained a model from scratch for ~ 10 epochs. Unfortunately, the model is not exported (the export folder is empty).

Is there any workaround to create an exported model from the checkpoints?
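One workaround I'm considering (a sketch only; model_fn stands for dhSegment's actual model function with the same parameters used in training, so that the rebuilt graph matches the checkpoint variables, and the input names are assumptions) is to rebuild the estimator over the checkpoint directory and export it with TF 1.x's export_savedmodel:

    import tensorflow as tf

    def serving_input_fn():
        # Hypothetical receiver for a batch of RGB images of any size.
        images = tf.placeholder(tf.float32, [None, None, None, 3], name='images')
        return tf.estimator.export.ServingInputReceiver({'images': images},
                                                        {'images': images})

    # model_fn must be dhSegment's own model function; it is left abstract here.
    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='my_model_dir')
    estimator.export_savedmodel('my_model_dir/export', serving_input_fn)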

Thanks in advance :)

error in predict_with_tiles

ValueError                                Traceback (most recent call last)
<ipython-input-71-67f6e250ab7e> in <module>()
----> 1 predictions = model.predict_with_tiles(img[None], linear_interpolation=True)

/DocumentSegmentation/doc_seg/loader.py in predict_with_tiles(self, image_tensor, tile_size, min_overlap, linear_interpolation)
     70                     assigned_up_to_x = 0
     71                     for x, output in zip(x_pos, y_outputs):
---> 72                         _merge_x(tmp, assigned_up_to_x, output[k], x)
     73                         assigned_up_to_x = x+tile_size
     74                     _merge_y(result[k], assigned_up_to_y, tmp, y)

/DocumentSegmentation/doc_seg/loader.py in _merge_x(full_output, assigned_up_to, new_input, begin_position)
     47             if overlap_size > 0:
     48                 weights = np.arange(0, overlap_size)/overlap_size
---> 49                 full_output[:, :, begin_position:assigned_up_to] = (1-weights)[:, None]*full_output[:, :, begin_position:assigned_up_to] + \
     50                                                                    weights[:, None]*new_input[:, :, :overlap_size]
     51 

ValueError: operands could not be broadcast together with shapes (128,1) (1,500,128) 

Size of the input image: [1, 4778, 3474, 3]. There is no error if linear_interpolation=False.
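The shapes in the error suggest the blend weights are reshaped along the wrong axis. A minimal sketch of the likely fix (an assumption, not a confirmed patch): the weights run along x, the last dimension of the slice, so they must be reshaped to broadcast over that axis.

    import numpy as np

    overlap_size = 128
    weights = np.arange(overlap_size) / overlap_size   # shape (128,)
    full_slice = np.zeros((1, 500, overlap_size))      # matches the shapes in the error
    new_slice = np.ones((1, 500, overlap_size))

    # weights[:, None] has shape (128, 1) and cannot broadcast against
    # (1, 500, 128); reshaping to (1, 1, 128) aligns it with the x axis.
    w = weights[None, None, :]
    blended = (1 - w) * full_slice + w * new_slice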

Error on training

Hello,

I'm having difficulty with training. I was able to train with the demo data, but when I submit my own data I get the following error:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/caleb/Work/.../env_dhSegment/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be broadcastable: logits_size=[451801,7] labels_size=[451248,7]
	 [[{{node loss/per_pixel_loss}} = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss/per_pixel_loss/Reshape, loss/per_pixel_loss/Reshape_1)]]
	 [[{{node Loss_1/map/while/Switch_1/_4839}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5009_Loss_1/map/while/Switch_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopLoss_1/map/while/TensorArrayReadV3_1/_4715)]]

There are 7 classes in my data, and only 2 in the demo data. Apart from that, I can see no obvious distinctions between the two data sets.
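One check I plan to run (the paths are hypothetical; adapt them to your layout) is that every label image has exactly the same pixel dimensions as its source image, since a mismatch like 451801 vs 451248 pixels suggests image/label pairs that differ by a few rows or columns:

    import glob
    import os
    from imageio import imread

    for img_path in sorted(glob.glob('train/images/*.jpg')):
        lbl_path = os.path.join('train/labels',
                                os.path.basename(img_path).replace('.jpg', '.png'))
        if imread(img_path).shape[:2] != imread(lbl_path).shape[:2]:
            print('size mismatch:', img_path)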

I am coming up to speed with dhSegment, and some of the supporting libraries, and will appreciate any and all assistance greatly.

Taking too much time in training

The model takes 2 hours per epoch with 2,300 images and a batch size of 1, but you mention that the page detection model took only 4 hours to train on 1,600 images for 30 epochs.

Can someone tell me the reason?

Original Training image with XML labels to extract data from documents

Hi,

I'm working on page layout analysis and information extraction, and I found that dhSegment might work well for this task. However, I don't know whether dhSegment can work with XML-based annotations (TextRegion, SeparatorRegion, TableRegion, ImageRegion, points defining the bounds of each region, ...) for training, besides the RGB-styled section definitions. I see on the main page of the project that there is a Layout Analysis example under the Use Cases section; that is the case that most resembles the one I want to implement. I also want to extract text from the detected regions.

How can I do that? Can I still use dhSegment, or do I have to implement my own detector?
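What I have in mind, sketched below with toy values (the actual polygon parsing from the PAGE XML is left out), is rasterizing each region's points into the RGB label masks dhSegment trains on:

    import numpy as np
    import cv2

    # Toy page size and a single TextRegion polygon; in practice, parse the
    # Coords points for each region out of the PAGE XML file.
    height, width = 1000, 800
    text_region = np.array([[50, 50], [750, 50], [750, 400], [50, 400]], np.int32)

    mask = np.zeros((height, width, 3), np.uint8)          # background class: black
    cv2.fillPoly(mask, [text_region], color=(255, 0, 0))   # one RGB color per class
    cv2.imwrite('labels/page_0001.png', mask)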

Thanks.

Regards.

Multilabel limitation should be documented

Only 7 labels are supported, and this is not documented. Since considerable effort goes into preparing training data, discovering this limitation only when running train.py is wasteful.

OOTB syntax error with demo.py

Following the installation instructions in the docs, the final step (python demo.py) results in:

File "demo.py", line 20
    def page_make_binary_mask(probs: np.ndarray, threshold: float=-1) -> np.ndarray:
                                   ^
SyntaxError: invalid syntax

Similarly, python3 demo.py gives:

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'

Getting error "tensorflow.python.framework.errors_impl.InternalError: Failed to create session"

When I run the code, I get an error at this line:

    estimator.train(input.input_fn(train_input,
                                   input_label_dir=train_labels_input,
                                   num_epochs=training_params.evaluate_every_epoch,
                                   batch_size=training_params.batch_size,
                                   data_augmentation=training_params.data_augmentation,
                                   make_patches=training_params.make_patches,
                                   image_summaries=True,
                                   params=_config,
                                   num_threads=32))

The error is:
Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/kapitsa/PycharmProjects/objectLocalization/DocumentTableSeg/IeeeTransc/dhSegment/train.py", line 115, in run
    num_threads=32))
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 366, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1119, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1135, in _train_model_default
    saving_listeners)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1333, in _train_with_estimator_spec
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 415, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 826, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 549, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1012, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1017, in _create_session
    return self._sess_creator.create_session()
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 706, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 477, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
    config=config)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 184, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1563, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/kapitsa/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 633, in __init__
    self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
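In my experience (an assumption about the cause, not a confirmed diagnosis), "Failed to create session" often means the GPU's memory is already claimed by another process. Letting TensorFlow allocate GPU memory on demand can help:

    import tensorflow as tf

    # Grow GPU memory on demand instead of grabbing it all up front, and
    # pass the resulting RunConfig when constructing the Estimator.
    session_config = tf.ConfigProto()
    session_config.gpu_options.allow_growth = True
    run_config = tf.estimator.RunConfig(session_config=session_config)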

Help Wanted - Training Datasets Questions

I am developing #MAGAZINEgts, a ground-truth storage format based on an ontological "stack" of #cidocCRM/FRBRoo/PRESSoo. This format uses a metamodel subgraph design pattern of fine-grained PRESSoo Issuing Rules to prescribe complex document structures. For example, the Advertising Model describes the size, shape, position, and other features such as the page grid of the containing page, the number of colors in the ad, and whether the ad has margin 'bleed', etc. The reference implementation of #MAGAZINEgts is being evolved on the 48-issue collection of Softalk magazine currently available at the Internet Archive.

Using our prototype metadata discovery and curation tools -- the PrintPageNumber-to-ImageID mapper and the "Ad Ferret" -- we have curated a detailed model of all 7,000+ ads in Softalk magazine. We are integrating these two tools into an expanded application called the FactMiners Toolkit. The impetus for this new version of our #MAGAZINEgts-compatible tools is our interest in incorporating the generation of #dhSegment training datasets. An interesting feature of this training dataset page and labeled-mask image generation workflow will be our ability to use the metamodel subgraph of complex document structures to generate synthetic training dataset page/mask images for under-sampled cases/labels.

While we will undoubtedly have additional questions about guidelines for generating #dhSegment training datasets, I'd like your insights on these three basic questions that I need to understand to continue development of our toolkit:

  1. Should a training dataset for a #dhSegment model to be trained to recognize magazine ads include the 'no ad' case/label? (i.e. a label image that is all background color w/ no class/label color-coded mask bounding box for a page that does not have an ad... the 'anti'-case IOW)

  2. If an ad is full-page w/ margin bleed, would its training image mask be all case/label color with no background color visible?

  3. Magazine ads vary by size and shape, constrained by page-grid columns and allowable positions. Can a training dataset with multiple classes/color assignments be composed of individual mask images where only one of the many "watched for" cases/labels appears per training image/mask instance? For example, can 'red' be the color for a 1/4-page vertical ad and 'blue' the color for 1/2-page horizontal ads, with the trained model learning to distinguish the size- and shape-based granularity of the document structure model, and not just "Yes, there is an ad of some kind on this page" (which would be the case if ads of all sizes/shapes were masked with the same color/class)?

In closing... a more generic 'help wanted' ask here would be for any pointers to papers, datasets, or other on-line resources to better understand the assessment and handling of training dataset balancing, particularly as you have faced this issue in your own #dhSegment experiments.

This current activity is the subject of my proposed #DATeCH2019 submission, "#MAGAZINEgts and #dhSegment: Using a Metamodel Subgraph to Generate Synthetic Data of Under-Sampled Complex Document Structures for Machine-Learning." However, it does not appear that the January 20th deadline for full-paper submissions will be extended. So my research is full-speed-ahead although I am increasingly aware that this paper will have to find another venue for sharing and possible publication.

Thank you for your timely reply and keep up the GREAT work. #dhSegment is not only a great contribution to the historic document text- and data-mining domain, but it will undoubtedly be a significant technology resource for the Time Machine FET Flagship project.

Happy-Healthy Vibes from Colorado USA,
-- Jim --

P.S. Here is a screenshot of the Ad Ferret in use and a pivot-table of the counts of various ad size/shape in Softalk magazine.

factminers_ad_ferret_1

magazinegts_softalk_adspec_pivotables

Multilabel training problem

Hi,
I have used this code to train a Document Layout Analysis model. I set:
prediction_type = utils.PredictionType.MULTILABEL
And my classes.txt (9 classes) file:

0 0 0 1 0 0 0 0 0 0 0 0
25 255 255 0 1 0 0 0 0 0 0 0
142 130 255 0 0 1 0 0 0 0 0 0
191 130 74 0 0 0 1 0 0 0 0 0
191 14 74 0 0 0 0 1 0 0 0 0
191 181 74 0 0 0 0 0 1 0 0 0
36 13 249 0 0 0 0 0 0 1 0 0
110 49 7 0 0 0 0 0 0 0 1 0
250 246 7 0 0 0 0 0 0 0 0 1

But I've got an error:

Caused by op 'Label2Img/GatherNd', defined at:
  File "train.py", line 47, in <module>
    @ex.automain
  File "/home/it/.local/lib/python3.6/site-packages/sacred/experiment.py", line 137, in automain
    self.run_commandline()
  File "/home/it/.local/lib/python3.6/site-packages/sacred/experiment.py", line 260, in run_commandline
    return self.run(cmd_name, config_updates, named_configs, {}, args)
  File "/home/it/.local/lib/python3.6/site-packages/sacred/experiment.py", line 209, in run
    run()
  File "/home/it/.local/lib/python3.6/site-packages/sacred/run.py", line 221, in __call__
    self.result = self.main_function(*args)
  File "/home/it/.local/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "train.py", line 111, in run
    num_threads=32))
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1208, in _train_model_default
    input_fn, model_fn_lib.ModeKeys.TRAIN))
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1049, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1136, in _call_input_fn
    return input_fn(**kwargs)
  File "/home/it/Projects/DLA/dhSegment/dh_segment/io/input.py", line 224, in fn
    label_export = utils.multiclass_to_label_image(label_export, classes_file)
  File "/home/it/Projects/DLA/dhSegment/dh_segment/utils/labels.py", line 67, in multiclass_to_label_image
    return tf.gather_nd(c, tf.cast(class_label_tensor, tf.int32))
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3140, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/it/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Only indices.shape[-1] values between 1 and 7 are currently supported. Requested rank: 9
	 [[{{node Label2Img/GatherNd}} = GatherNd[Tindices=DT_INT32, Tparams=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Label2Img/GatherNd/params, Cast_1)]]

I guess that we cannot train a multilabel classification model with more than 7 classes. Can anyone help me fix this problem?
Thanks.

Suggest to loosen the dependency on sacred

Hi, your project dhSegment (commit id: cca94e9) requires "sacred==0.7.4" in its dependencies. After analyzing the source code, we found that the following version of sacred can also be suitable, i.e., sacred 0.7.3, since all functions that you use from the package, directly (1 API: sacred.experiment.Experiment.__init__) or indirectly (propagating to 19 of sacred's internal APIs and 17 outside APIs), have not changed between these versions and thus do not affect your usage.

Therefore, we believe that it is quite safe to loosen your dependency on sacred from "sacred==0.7.4" to "sacred>=0.7.3,<=0.7.4". This will improve the applicability of dhSegment and reduce the possibility of dependency conflicts with other projects.

May I open a pull request to loosen the dependency on sacred?

By the way, could you please tell us whether such an automatic dependency analysis tool might be helpful for maintaining dependencies more easily during your development?

how to use baseline post-processing

Regarding baseline detection: I get the line mask, but I don't know how to extract the lines from it. I tried to modify demo.py. Could you show me a baseline demo? Thank you.
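A generic sketch (not dhSegment's own post-processing) of turning a line probability map into line segments: binarize the mask, then vectorize it, here with a probabilistic Hough transform:

    import numpy as np
    import cv2

    probs = np.zeros((600, 400), np.float32)        # stand-in for the network's line map
    mask = (probs > 0.5).astype(np.uint8) * 255     # binarize the line class
    lines = cv2.HoughLinesP(mask, 1, np.pi / 180, 50,
                            minLineLength=100, maxLineGap=10)
    # each entry of lines (when lines is not None) is [[x1, y1, x2, y2]]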

Article Segmentation

Hi everyone,

I am thinking about adapting dhSegment to detect articles on historical newspaper title pages. Do you think this is a sensible application of dhSegment? Is there anything I should particularly think about when annotating the title pages?

Sincerely

Julian

Speed of Inference on GeForce GTX 1080

My testing, based on a variation of demo.py for classification with 7 labels/classes, shows choppy performance on a GPU. Excluding Python post-processing and ignoring the first two inferences, I see processing durations like 0.09, 0.089, 0.56, 0.56, 0.079, 0.39, 0.09, ...; the average over 19 images is 0.19 s per image.

I'm surprised by the variance.

At 5/sec it is workable, but it could be better. Would tensorflow-serving help by getting Python out of the loop? I need to process 1M images per day.

(The GPU is a GeForce GTX 1080 using 10.8 GB of its 11 GB of RAM; only one TF session is used for multiple inferences.)

PredictionType.CLASSIFICATION and extracting rectangles

I am attempting CLASSIFICATION now, not MULTILABEL (issue #29 was helpful in mentioning that mutually exclusive areas mean classification, not multilabel; this is clear in retrospect ;^)

Now I need to extract rectangles, and I have hit a big gap in dhSegment. The demo.py code shows how to generate the rectangle corresponding to a skewed page, but there is only one class. I modified demo.py to identify rectangles for each label. When there are multiple classes, there can be spurious, overlapping rectangles.

How can I:

  1. Identify the highest confidence class instances
  2. That are not overlapping

The end result I want is one or more JPEGs associated with a particular class label, plus the coordinates within the input image.

Perhaps the labels plane in the prediction result offers some help here? demo.py does not use the labels plane.
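For the overlap problem, a greedy non-maximum suppression pass over the per-class boxes may be what I need. A minimal sketch (boxes and scores are hypothetical inputs; boxes are [x1, y1, x2, y2]):

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        # Keep the highest-scoring box, drop boxes overlapping it too much, repeat.
        order = np.argsort(scores)[::-1]
        keep = []
        while order.size:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                     (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + areas - inter)
            order = order[1:][iou < iou_thresh]
        return keep

    boxes = np.array([[0, 0, 100, 100], [10, 10, 110, 110], [200, 200, 300, 300]], float)
    scores = np.array([0.9, 0.8, 0.7])
    print(nms(boxes, scores))   # -> [0, 2]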

Which CUDA, cudnn and tensorflow versions are meant to be used together

I am doing a small project with multilabel classification, but during training I keep getting errors:

[screenshots of the error messages]
Tests were made with tensorflow-gpu 1.13.1, CUDA 10.1, and cuDNN 7.6.5.32 under Windows 10.

Is this due to a bad project configuration, an incorrect installation of CUDA and cuDNN, or a version mismatch between those three?

Model loading/training error

When executing the following command: python train.py with demo/demo_config.json
I get the error below. FYI, I've followed the installation instructions with conda.

InternalError (see above for traceback): cuDNN launch failure : input shape([1,3,1095,538]) filter shape([7,7,3,64]) [[{{node resnet_v1_50/conv1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/resnet_v1_50/conv1/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_50/conv1/weights/read)]]

No box found: is dhSegment fit for my problem?

Hello,

I would like to detect defects introduced on documents by faulty scanners, in particular vertical lines.

So I have a dataset consisting of the same document, with one random line added to each document.

As I presumed the model would work better at detecting horizontal lines, I flipped all the images in my dataset.

All my images are annotated in the same way as the demo dataset.
Here you can see two images and their labels from my dataset:

img-1721 (image and label)
img-1742 (image and label)

I quickly trained the model on 300 training samples, and the loss does seem to decrease (although I haven't fully grasped all the logs, or the way sacred works).

However, when I apply the model to the test set, I get "No box found in demo/pages/test_a1/images/img-xxx.png" for all of the images...

So I wonder if dhSegment can work for this task of finding thin lines.
I used the basic demo config and did not change much (except n_epochs, which I set to 1; but even then it seemed to go through 10 iterations of the dataset).

I am on Ubuntu 18.04 and train with a GP100.

If dhSegment isn't fit for this task, could you suggest some ways I could achieve this detection? I am a bit stuck at the moment.

Thank you very much.

ValueError: too many values to unpack (expected 2)

In the demo folder, when I run interactive_demo.ipynb and execute the following code:

    pred_page_coords = boxes_detection.find_boxes(bin_upscaled.astype(np.uint8, copy=False),
                                                  mode='min_rectangle', n_max_boxes=1)

I get the error:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 pred_page_coords = boxes_detection.find_boxes(bin_upscaled.astype(np.uint8, copy=False),
----> 2                                               mode='min_rectangle', n_max_boxes=1)

/content/dh_segment/post_processing/boxes_detection.py in find_boxes(boxes_mask, mode, min_area, p_arc_length, n_max_boxes)
     26         'Input mask must be a 2D array ! Mask is now of shape {}'.format(boxes_mask.shape)
     27
---> 28     contours, _ = cv2.findContours(boxes_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
     29     if contours is None:
     30         print('No contour found')

ValueError: too many values to unpack (expected 2)
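A likely cause (an assumption): cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x but (contours, hierarchy) in OpenCV 4.x, so unpacking two values under OpenCV 3.x raises exactly this error. A version-agnostic unpack:

    import cv2
    import numpy as np

    boxes_mask = np.zeros((100, 100), np.uint8)     # stand-in for bin_upscaled
    ret = cv2.findContours(boxes_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = ret[0] if len(ret) == 2 else ret[1]  # OpenCV 4.x vs 3.x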

Can't be installed under Windows

dhSegment is AWESOME and EXACTLY what my wife and I need for our post-cancer #PayItForward Bonus Round activity doing grassroots #CitizenScience #digitalhumanities research in support of eResearch and machine-learning in the domain of digitization of serial publications, primarily modern commercial magazines. We are working on the development of the #MAGAZINEgts ground-truth storage format providing standards-based (#cidocCRM/FRBRoo/PRESSoo) integrated complex document structure and content depiction models.

When a tweet about dhSegment surfaced through my feed, I could barely contain myself... we have detailed, multi-valued metadata -- based on a metamodel of fine-grained use of PRESSoo's Issuing Rules -- that describe the location, bounding box, size, shape, number of colors, products featured, etc. for 7,157 advertisements appearing in the 48 issues of Softalk magazine (https://archive.org/details/softalkapple). It will be trivial for me to generate the annotated label images for all these ads, since we have already programmatically extracted the ad sub-images from the full pages, once we used our "Ad Ferret" to discover and curate the specification for every ad.

Once we have a dhSegment instance trained on the Softalk ads, there are over 1.5M pages just within the "collection of collections" of computer magazines at the Internet Archive, and many millions more pages of content in magazines of all types over considerable time periods of their serial publication. The #MAGAZINEgts format, together with brilliant technical achievements like dhSegment, can open new levels of scholarship and machine access to digital collections. We believe dhSegment will be a valuable component for our research platform/framework.

With great excitement I chased down and have installed and tested the prerequisite CUDA and cuDNN frameworks/platforms under Windows. I have these features now working at the 9.1 version. (This alone was tricky, but I got it working.)

Unfortunately, the current implementation of the incredibly important dhSegment environment cannot be installed under Windows 10. After the stock Anaconda environment yml file died somewhat dramatically, I then took that file and attempted to search for and install each package individually. (NOTE: I am not a Python expert, so what I report here is subject to refinement by someone who knows better...) Here is what is NOT available under Windows:

# Python packages for dh_segment not available under Windows
- dbus=1.12.2
- fontconfig
- glib=2.53.6
- gmp=6.1.2
- graphite2=1.3.10
- gst-plugins-base
- gstreamer=1.12.4
- harfbuzz=1.7.4
- jasper=1.900.1
- libedit=3.1
- libffi=3.2.1
- libgcc-ng=7.2.0
- libgfortran-ng=7.2.0
- libopus=1.2.1
- libstdcxx-ng=7.2.0
- libvpx=1.6.1
- ncurses=6.0
- ptyprocess=0.5.2
- readline=7.0
- pip:
  - tensorflow-gpu==1.4.1 (I did find and install 1.8.0 instead)

Anything not on this list made it into my Windows-based Anaconda environment, the yml for which I have included here as a file attachment.

win10_dh_segment.yml.txt

I am so disappointed to not be able to install and use dhSegment under Windows. While a docker image would likely be possible to create, I am skeptical that it would work at the level needed for interfacing with the NVIDIA hardware and its CUDA/cuDNN frameworks, etc. Alternatively, perhaps a cloud-based dev platform would work for us (that is affordable as we are independent and unfunded #CitizenScientists). Your workaround/alternative suggestions are welcome.

At any rate, sorry for the overly long initial issue posting. But I wanted to explain my and my wife's great interest in this important technology as well as provide what I hope is useful feedback with regard to its potential use under Windows. Looking forward, I am very interested in evolving a collaborative relationship with you good folks of DHLAB.

ITMT, I am going to generate the labeled training images. :-)

Happy-Healthy Vibes,
FactMiner Jim

P.S. Here is our #DATeCH2017 poster that will further explain the focus of our research.
salmonsbabitsky_factminerssoftalk_poster

P.P.S. And here is a screenshot showing a typical metadata "spec" for an ad. The simple integer value for the AdLocation is used in concert with an embedded DSL in the fine-grained Issuing Rules of the Advertising Model. This DSL provides a resolution-independent means to describe and compute the upper-left and bounding box of an ad. For example, the four locations of a 1/4 pg sized ad on a page with a 2-column page grid are numbered 1-4, left-to-right top-to-bottom. The proportions of these page segments based on simple geometric proportional computations.
magazinegts_documentstructure_advertisements

And finally, the evolving #MAGAZINEgts for the Softalk magazine collection at the Internet Archive is available here: https://archive.org/download/softalkapple/softalkapple_publication.xml

How to use TF Serving batch prediction

I have retrained the model on my own dataset, but when I request predictions from TF Serving via the gRPC API, I am not able to pass the images in a batch: it gives a dimensions error. When I pass a single image, I get predictions fine. Can someone help me run batch prediction with this model when served?
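For reference, the direction I've been trying (a sketch; the model name and input tensor name are assumptions -- check the exported signature with saved_model_cli -- and every image in a batch must share the same height and width):

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel('localhost:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    batch = np.zeros((4, 600, 400, 3), dtype=np.float32)   # 4 same-sized images
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'dhsegment'                  # hypothetical model name
    request.inputs['images'].CopyFrom(tf.make_tensor_proto(batch))
    response = stub.Predict(request, 30.0)                 # 30-second timeout

If the exported serving signature was built for a single image, the server will reject any batch dimension other than 1 regardless of the client code.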

Does it work without GPU?

Hello,
I have a basic video card on my Win7 work machine. I installed all the needed requirements and started demo.py.
I got the error stack below and wonder: does it point to a missing GPU, or to an incorrect installation procedure?

(dh_segment) C:\p\Documents\ai\dhSegment>python demo.py
Traceback (most recent call last):
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\p\anaconda3\envs\dh_segment\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "C:\p\anaconda3\envs\dh_segment\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 8, in <module>
    import tensorflow as tf
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\p\anaconda3\envs\dh_segment\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\p\anaconda3\envs\dh_segment\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "C:\p\anaconda3\envs\dh_segment\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Could you please advise? On my laptop with a GPU I managed to start the demo.

Text-line Detection

@SeguinBe @solivr Thank you for your hard work.

Regarding training a text-line detector from scratch:

  1. Since I'm only interested in text lines, can I color the text lines as red boxes and write the classes.txt file as:
0 0 0
255 0 0
  2. What is a suitable config.json for training a text-line detector from scratch, given that demo_config.json requires a pretrained_model named resnet50, which is not for line detection?
python train.py with demo/demo_config.json
  3. @solivr, you suggested in issue 9 using the cBAD dataset for baseline detection. I have downloaded this dataset and noticed that the annotations are in PAGE XML, so how can I use this dataset, with its PAGE XML, for training?

22-compressed-86

  4. You mention on your GitHub main page that you used annotator1 from PageNet; how did you use it, given that its annotations are in x1, y1, x2, y2, x3, y3, x4, y4 format?

  5. You also mention that the training images were downsized to 1M pixels each; does downsizing the training images reduce the recognition quality?

new user looking for examples of multilabel classification

I have a body of images that I want to segment into seven classes using the utils.PredictionType.MULTILABEL modality for predicting class memberships. I believe that I can do this based on the Use Cases presented in the docs. The only example I have found of dhSegment in use, however, is the demo in the docs, which uses the simpler utils.PredictionType.CLASSIFICATION modality.

I would love to look at any and all examples of dhSegment being used in any modality, and in particular in the MULTILABEL modality.

Any and all pointers and assistance will be greatly appreciated.

From googling around, it feels like the user base is somewhat rarefied. I hope to add my own application to those testing and applying dhSegment.

Thank you!

detecting multiple instances of the same object

The docs show multiple ornaments extracted from the same page, but my model never detects more than one instance of a similar object.

I am using the same demo.py as in master branch.

Can someone help me?
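One thing worth checking (drawn from the demo code quoted elsewhere in these issues): demo.py calls find_boxes with n_max_boxes=1, which keeps only a single detection per page. Raising that limit allows multiple instances:

    pred_page_coords = boxes_detection.find_boxes(bin_upscaled.astype(np.uint8, copy=False),
                                                  mode='min_rectangle', n_max_boxes=10)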

Read Tag Values from XML file

Dear Sir,

I would like to get the coordinates of the text lines of a document from a PAGE XML file.
I'm using the function:
tl_coords = PAGE.get_unique_tags_from_xml_text_regions(xml_filename=str, tag_pattern='type:Page/TextRegion/TextLine/Coords points')

But the output is null. Could you specify how the tag pattern input should be formed?
Thank you in advance.

How to use multilabel prediction type?

When I change prediction_type from 'CLASSIFICATION' to 'MULTILABEL', I get:

result.shape[1] > 3, "The number of columns should be greater in multi-label framework"

So how do I use multi-label?
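For comparison, the multilabel issue above uses a classes.txt with three RGB columns followed by one one-hot attribution column per class; a two-class version of that layout (an assumption drawn from that example, not from the docs) would be:

    0 0 0      1 0
    255 0 0    0 1

The assertion fires because a plain 3-column (RGB-only) classes.txt has no attribution columns.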

Thanks!

demo.py: Indexes pred_page_coords even if None

In the following code (lines 83 to 102 in demo.py), the pred_page_coords variable is indexed as a list even if it was returned as None (immediately after the # Create page region and XML file comment):

            pred_page_coords = boxes_detection.find_boxes(bin_upscaled.astype(np.uint8, copy=False),
                                                          mode='min_rectangle', n_max_boxes=1)
            # Draw page box on original image and export it. Add also box coordinates to the txt file
            original_img = imread(filename, pilmode='RGB')
            if pred_page_coords is not None:
                cv2.polylines(original_img, [pred_page_coords[:, None, :]], True, (0, 0, 255), thickness=5)
                # Write corners points into a .txt file
                txt_coordinates += '{},{}\n'.format(filename, format_quad_to_string(pred_page_coords))
            else:
                print('No box found in {}'.format(filename))
            basename = os.path.basename(filename).split('.')[0]
            imsave(os.path.join(output_dir, '{}_boxes.jpg'.format(basename)), original_img)

            # Create page region and XML file
            page_border = PAGE.Border(coords=PAGE.Point.cv2_to_point_list(pred_page_coords[:, None, :]))
            page_xml = PAGE.Page(image_filename=filename, image_width=original_shape[1], image_height=original_shape[0],
                                 page_border=page_border)
            xml_filename = os.path.join(output_pagexml_dir, '{}.xml'.format(basename))
            page_xml.write_to_file(xml_filename, creator_name='PageExtractor')
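A minimal guard (one possible fix, not necessarily the project's own) would move the XML creation under the same check:

    # Only build the page border when a box was actually found.
    if pred_page_coords is not None:
        page_border = PAGE.Border(coords=PAGE.Point.cv2_to_point_list(pred_page_coords[:, None, :]))
    else:
        page_border = PAGE.Border()  # assuming Border() tolerates empty coords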

Reproducing baseline detection results

Hello,

I'm trying to reproduce the baseline detection results in your paper. What was the training/validation split used? Also, is it the case that demo/demo_cbad_config.json is the same configuration used to achieve your results? Thank you!

Need a short guide to layout detection and line detection

Hello,
I have a large collection of scans of written text in table forms, with a complex layout structure and only the vertical borders printed.
My plan is to segment the table rows cell by cell, detect the lines inside each cell, and then attempt recognition.
I worked through the dhSegment demo; it went OK, but I ran into problems with the individual operations.
Could you please provide examples of the use cases described in the overview at https://dhlab-epfl.github.io/dhSegment/?
I'm ready to label a training dataset from my collection but cannot get started. Is there any notebook or video guide?
One more question, about the READ-BAD dataset that was suggested in a couple of issue discussions: I see the article PDF on arxiv.org but didn't find a link to download the image collection. What did I miss?
