shahrukhqasim / ties-2.0

Code for: S.R. Qasim, H. Mahmood, and F. Shafait, Rethinking Table Recognition using Graph Neural Networks (2019)

License: MIT License

Python 100.00%

machine-learning document-pro table-reco computer icdar

ties-2.0's Introduction

TIES-2.0

TIES (Table Information Extraction System) was my undergraduate thesis. I reused the name and made it 2.0.

This repository contains the source code for the arXiv paper 1905.13391, which has been accepted into ICDAR 2019. To cite the paper, use:

@article{rethinkingGraphs,
  author    = {Qasim, Shah Rukh and Mahmood, Hassan and Shafait, Faisal},
  title     = {Rethinking Table Recognition using Graph Neural Networks},
  journal   = {Accepted into ICDAR 2019},
  volume    = {abs/1905.13391},
  year      = {2019},
  url       = {https://arxiv.org/abs/1905.13391},
  archivePrefix = {arXiv},
  eprint    = {1905.13391},
}

Note to the visitors

We are still working on a few technical details for your convenience and will remove this note once we are done, which we expect by June 15, 2019. We are also working on making the dataset format easier to understand.

Dataset

A partial dataset, which was used for testing, can be found here. We are uploading the rest of the dataset. The dataset is currently in tfrecords format.

In the meantime, if you want to generate the dataset, head over to the following repository:

github.com/hassan-mahmood/Structural_Analysis

Development note

The project is divided into two language parts, python and cpp, for Python and C++ respectively. The cpp folder is currently empty.

Scripts are meant to be run from the python directory; alternatively, it can be added to the $PYTHONPATH environment variable. It contains the following directories:

bin contains the scripts that are run from the terminal. Within bin, there are multiple folders, one for each class of executable program:
1. iterate: for running training or inference.
2. analyse: for analysing inference output.
3. checks: for testing various files during development. You can safely ignore it.
iterators provides functionality to iterate through the datasets during training or testing.
layers contains basic layers for graph networks.
models contains the main model and network segments. Most of the functionality is in basic_model.py; start tracing from there.
ops contains modified basic operations, including the code for the advanced graph operations.
readers contains the readers, the entities responsible for reading the data from tfrecords. The record format they expect can be changed there.
libs contains all other helper and library functions.

In this repository, "iterate" refers to training, testing, or anything else that is done iteratively, mostly on the GPU. So an iterator usually refers to an entity that handles training, testing, etc.

Preparation

Prepare the dataset. Divide it into three sections: test, train and validation. The test set is used to run the analysis after training is done, backpropagation is run on the train set, and the validation set is used to produce TensorBoard plots that monitor the performance of the network.
The dataset files must be in tfrecords format. Make a new file called train_files.txt containing the full paths of all the training tfrecords files. For example:
```
/home/shahrukhqasim/dataset/train_1.tfrecord
/home/shahrukhqasim/dataset/train_2.tfrecord
/home/shahrukhqasim/dataset/train_3.tfrecord
```
Similarly, prepare validation_files.txt and test_files.txt. The three files must not overlap.
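If your tfrecords are named by split, as in the example above, the three list files can be generated rather than written by hand. A minimal sketch, assuming every file is named `<split>_<n>.tfrecord` with split being train, validation or test (that naming is an assumption, and `build_lists`, `write_file_list` and `check_disjoint` are hypothetical helpers, not part of this repository):

```python
from pathlib import Path


def write_file_list(tfrecord_paths, out_path):
    """Write one absolute tfrecord path per line to out_path."""
    lines = [str(Path(p).resolve()) for p in tfrecord_paths]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines


def check_disjoint(**splits):
    """Raise ValueError if the same file appears in more than one split."""
    seen = {}
    for split_name, paths in splits.items():
        for p in paths:
            p = str(Path(p).resolve())
            if p in seen and seen[p] != split_name:
                raise ValueError("{} is in both {} and {}".format(p, seen[p], split_name))
            seen[p] = split_name


def build_lists(dataset_dir, out_dir="."):
    """Group *.tfrecord files by filename prefix and write <split>_files.txt."""
    splits = {"train": [], "validation": [], "test": []}
    for p in sorted(Path(dataset_dir).glob("*.tfrecord")):
        prefix = p.name.split("_", 1)[0]  # assumed naming: <split>_<n>.tfrecord
        if prefix in splits:
            splits[prefix].append(p)
    check_disjoint(**splits)
    for name, paths in splits.items():
        write_file_list(paths, Path(out_dir) / "{}_files.txt".format(name))
    return splits
```

For example, `build_lists("/home/shahrukhqasim/dataset")` would write train_files.txt, validation_files.txt and test_files.txt into the current directory.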
Make a config file according to the format given in configs/config.ini.example. This file controls all the settings, including dataset locations and output paths. The example config file documents each setting for your convenience. If a setting is unclear, email me or open an issue in this repository.
Each config file can contain multiple configurations; we recommend one per model. For instance, you can make separate configurations for DGCNN, GravNet and convolutional networks.
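Each configuration is an INI section, selected by name when you launch a script. To catch a wrong name or a missing key before a long run, you can load the section yourself first. A minimal sketch with Python's configparser; `load_configuration` and `REQUIRED_KEYS` are hypothetical, and `is_sampling_balanced` is listed only as one key the training code is known to read:

```python
import configparser

# Hypothetical list of keys to check up front. "is_sampling_balanced" is one
# key the training code reads; add the other keys your chosen model needs.
REQUIRED_KEYS = ("is_sampling_balanced",)


def load_configuration(config_path, name):
    """Return one named configuration (INI section) after basic sanity checks."""
    parser = configparser.ConfigParser()
    if not parser.read(config_path):  # read() returns the files it parsed
        raise FileNotFoundError(config_path)
    if not parser.has_section(name):
        raise KeyError("no configuration {!r}; available: {}".format(name, parser.sections()))
    section = parser[name]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("configuration {!r} lacks {}".format(name, missing))
    return section
```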

Training

To run the training, you need to issue the following command:

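$ python bin/iterate/table_adjacency_parsing.py path/to/the/config/file config

Here, config stands for the name of one configuration (a section such as basic_conv_graph) inside the config file; it is not a literal word.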

While training is running, you can monitor it with TensorBoard. The summary paths are set in the config file, as described in the previous step. Use the following command to run TensorBoard:

$ tensorboard --logdir=/media/all/shahrukhqasim/Tables/TrainOut/betaout/summary

You can monitor the performance after that in your browser. The port number will be displayed when you run the above command.

Inference

First, run inference, which generates bin files in NumPy pickle format:

$ python bin/iterate/table_adjacency_parsing.py path/to/the/config/file config --test True
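The layout of these bin files is not documented yet. Assuming each file is a plain pickle holding NumPy arrays, possibly nested in dicts or lists, a sketch like the following lists what is inside; `summarize_inference_output` is a hypothetical helper. Only unpickle files you produced yourself, since unpickling can execute arbitrary code. If the files turn out to be .npy files instead, `np.load(path, allow_pickle=True)` is the analogous call.

```python
import pickle

import numpy as np


def summarize_inference_output(path):
    """Map each NumPy array found in a pickled file to (shape, dtype).

    Assumption: the file was written with pickle and holds arrays,
    possibly nested in dicts, lists or tuples.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {}

    def walk(obj, prefix):
        if isinstance(obj, np.ndarray):
            summary[prefix or "<root>"] = (obj.shape, str(obj.dtype))
        elif isinstance(obj, dict):
            for key, value in obj.items():
                walk(value, "{}/{}".format(prefix, key) if prefix else str(key))
        elif isinstance(obj, (list, tuple)):
            for i, value in enumerate(obj):
                walk(value, "{}[{}]".format(prefix, i))

    walk(data, "")
    return summary
```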

TODO: Analysis code and further documentation are coming.

Installation

Python 3.5+ is needed. We recommend using virtualenv but anaconda should also work fine.

The required packages are listed in requirements.txt. They can be installed by:

$ pip install -r requirements.txt

You also need to download another repository:

github.com/jkiesele/caloGraphNN

Suppose you clone it into /home/shahrukhqasim/caloGraphNN. You then need to add this path to the $PYTHONPATH environment variable:

$ export PYTHONPATH=$PYTHONPATH:/home/shahrukhqasim/caloGraphNN

You should also run all commands from inside the python directory, and the python directory must also be on the $PYTHONPATH environment variable:

$ export PYTHONPATH=$PYTHONPATH:/home/shahrukhqasim/TIES-2.0/python

You can also add . to $PYTHONPATH if you will always run the commands from inside the python directory.
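Most import errors when launching the scripts come down to one of these directories missing from $PYTHONPATH. A small sketch for checking this from Python; `missing_from_pythonpath` and `can_import` are hypothetical helpers, and the directories you pass are whatever paths you cloned into:

```python
import importlib.util
import os


def missing_from_pythonpath(required_dirs, env=None):
    """Return the required directories not listed on PYTHONPATH.

    Relative entries such as "." are resolved against the current
    working directory, just as Python resolves them at startup.
    """
    env = os.environ if env is None else env
    entries = [os.path.abspath(p)
               for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    return [d for d in required_dirs if os.path.abspath(d) not in entries]


def can_import(module_name):
    """True if a top-level module or package is importable right now."""
    return importlib.util.find_spec(module_name) is not None
```

For example, `missing_from_pythonpath(["/home/shahrukhqasim/caloGraphNN", "/home/shahrukhqasim/TIES-2.0/python"])` should return an empty list once both exports above are in place, and `can_import("iterators")` should then be True.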

We advise making a shell script with these export commands and a command that activates the virtual environment. I use the following script (ties.sh):

```
source ~/Envs/h3/bin/activate
cd /Users/shahrukhqasim/Workspace/TIES-2.0/python
export PYTHONPATH=$PYTHONPATH:/Users/shahrukhqasim/Workspace/caloGraphNN:/Users/shahrukhqasim/Workspace/TIES-2.0
```

I source it every time I want to run training or inference using:

$ source ties.sh

Coming soon

Training data uploaded
Trained models


ties-2.0's Issues

DataLossError (see above for traceback): inflate() failed with error -3: incorrect header check

Caused by op 'IteratorGetNext_3', defined at:
File "bin/iterate/table_adjacency_parsing.py", line 31, in
trainer.train()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/iterators/table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/models/basic_model.py", line 73, in initialize
self.validation_feeds = self.validation_reader.get_feeds()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py", line 68, in get_feeds
vertex_features, vertex_text, image, global_features, adj_cells, adj_rows, adj_cols = iterator.get_next()
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 414, in get_next
output_shapes=self._structure._flat_shapes, name=name)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1685, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/env/table-gn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): inflate() failed with error -3: incorrect header check
[[node IteratorGetNext_3 (defined at /media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py:68) ]]
[[node IteratorGetNext_3 (defined at /media/nagarro/b83d492f-3491-4265-b0d8-f6300e4586e0/home/nagarro/work/TIES-2.0-master/python/readers/image_words_reader.py:68) ]]

What is the "config" argument in `table_adjacency_parsing.py` ?

I followed all the steps you mentioned in the README to train/test the model:

(ties) manish@gpu:~/code/TIES-2.0/python$ python bin/iterate/table_adjacency_parsing.py ../configs/config.ini config
Traceback (most recent call last):
  File "bin/iterate/table_adjacency_parsing.py", line 20, in <module>
    gconfig.init(args.input, args.config)
  File "/home/manish/code/TIES-2.0/python/libs/configuration_manager.py", line 17, in init
    config_manager_instance = ConfigurationManager(config_file_path, config_name)
  File "/home/manish/code/TIES-2.0/python/libs/configuration_manager.py", line 10, in __init__
    self.config = config_file[config_name]
  File "/usr/lib/python3.5/configparser.py", line 956, in __getitem__
    raise KeyError(key)
KeyError: 'config'

I looked at the help output:

usage: table_adjacency_parsing.py [-h] [--test TEST] [--profile PROFILE] [--visualize VISUALIZE] input config
table_adjacency_parsing.py: error: the following arguments are required: input, config

I couldn't figure out what the input and config arguments of the script are. Any help would be highly appreciated!

Useful system info:

Python version: 3.5
OS: Ubuntu 16.04

"is_sampling_balanced" is required but missing in the example config

(ties) manish@gpu:~/code/TIES-2.0/python$ python bin/iterate/table_adjacency_parsing.py ../configs/config.ini basic_conv_graph
Traceback (most recent call last):
  File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
    trainer.train()
  File "/home/manish/code/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py", line 48, in train
    model.initialize(training=True)
  File "/home/manish/code/TIES-2.0/python/models/basic_model.py", line 51, in initialize
    self.is_sampling_balanced = gconfig.get_config_param("is_sampling_balanced", "bool")
  File "/home/manish/code/TIES-2.0/python/libs/configuration_manager.py", line 34, in get_config_param
    if config_manager_instance.config[key] == "true":
  File "/usr/lib/python3.5/configparser.py", line 1230, in __getitem__
    raise KeyError(key)
KeyError: 'is_sampling_balanced'

Updates on Coming Soon

Are there any updates on the Coming soon section, i.e. the trained models and the full training set? I'm trying to recreate the results from your paper, but I'm guessing that's not possible without your full training set. I can generate a dataset with your other repo, but I'm assuming it can't be used to reproduce the results from the paper? Thanks

How can I run inference on a single image?

Hi, I can't find a way to run inference on a single image. Could you update your code?

Code Licensing

I've been reading through the paper and loved it! Just in case it was an oversight, would you kindly add a license to this repository?

Pretrained Models

Hi there,
Could the authors, or someone following up on this work, share the pretrained weights for this architecture?

visual_feedback

How should the visual_feedback images be interpreted? What do the yellow, green, red and blue colours denote?

Correct prediction for column, rows and cells coordinates

Is the repo public-ready?

Hi guys, this project looks really interesting! But there are still some TODOs in the README.md:

We'll remove this note once we are done. Expect them to be done by June 15, 2019.

Any chance that's done? I want to try this out and extract tables in the documents I have.

Anyway, thanks for open-sourcing this project. ❤️

OP_REQUIRES failed

2021-03-19 05:40:04.139370: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: global_features. Can't parse serialized Example.
Traceback (most recent call last):
File "/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "
anaconda3/envs/tr-graph/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "anaconda3/envs/tr-graph/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Key: global_features. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext_1]]
[[IteratorGetNext_1/_3]]
(1) Invalid argument: Key: global_features. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext_1]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Regarding parsing error

Thanks for making the code for the paper available. I am running the code with the example data you provided and am running into an error related to parsing.

iterators/table_adjacency_parsing_iterator.py", line 78, in train
    model.sanity_preplot(sess, summary_writer)

tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: global_features.  Can't parse serialized Example.
	 [[{{node ParseSingleExample/ParseSingleExample}} = ParseSingleExample[Tdense=[DT_INT64, DT_INT64, DT_INT64, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], dense_keys=["adjacency_matrix_cells", "adjacency_matrix_cols", "adjacency_matrix_rows", "global_features", "image", "vertex_features", "vertex_text"], dense_shapes=[[810000], [810000], [810000], [3], [1049088], [4500], [27000]], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const, ParseSingleExample/Const, ParseSingleExample/Const_3, ParseSingleExample/Const_3, ParseSingleExample/Const_3, ParseSingleExample/Const)]]
	 [[node IteratorGetNext_1 (defined at /table_detection/graphparsing/ties_v2/readers/image_words_reader.py:67)  = IteratorGetNext[output_shapes=[[?,4500], [?,27000], [?,1049088], [?,3], [?,810000], [?,810000], [?,810000]], output_types=[DT_FLOAT, DT_INT64, DT_FLOAT, DT_FLOAT, DT_INT64, DT_INT64, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_1)]]
	 [[{{node IteratorGetNext_1/_9}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11_IteratorGetNext_1", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I think the issue is related to get_feeds: there seems to be a mismatch between how the TFRecords are written and how they are parsed. I also tried generating new data using the code referenced in the README and hit the same issue.
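One way to confirm such a mismatch is to compare a single stored record against what the parser expects. The expected feature kinds and lengths below are copied from the ParseSingleExample node in the log above (Tdense and dense_shapes); `describe_example` and `mismatches` are hypothetical helpers, not repository code:

```python
# Expected (feature kind, number of values) per key, taken from the
# dense_keys / Tdense / dense_shapes of the failing ParseSingleExample op.
EXPECTED = {
    "adjacency_matrix_cells": ("int64_list", 810000),
    "adjacency_matrix_cols": ("int64_list", 810000),
    "adjacency_matrix_rows": ("int64_list", 810000),
    "global_features": ("float_list", 3),
    "image": ("float_list", 1049088),
    "vertex_features": ("float_list", 4500),
    "vertex_text": ("int64_list", 27000),
}


def describe_example(example):
    """Map feature name -> (kind, length) for a tf.train.Example proto."""
    out = {}
    for name, feature in example.features.feature.items():
        kind = feature.WhichOneof("kind")  # bytes_list, float_list or int64_list
        out[name] = (kind, len(getattr(feature, kind).value) if kind else 0)
    return out


def mismatches(actual, expected=EXPECTED):
    """List (name, expected, actual) for every key that differs or is absent."""
    return [(name, want, actual.get(name))
            for name, want in expected.items() if actual.get(name) != want]
```

To get `actual`, read one serialized record with `tf.python_io.tf_record_iterator(path)` in TF 1.x (`tf.compat.v1.io.tf_record_iterator` in later versions), parse it with `tf.train.Example.FromString`, and pass the result to `describe_example`. A non-empty result from `mismatches` would point at the writer and the reader disagreeing.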

Problems with training: Tensorflow queue_runner

Hi!

So, I tried to run the training as the README shows. After installing the dependencies (pip install -r requirements.txt), I ran this command:
python .\python\bin\iterate\table_adjacency_parsing.py .\configs\config.ini basic_conv_graph
and table_adjacency_parsing.py failed to import the iterators file table_adjacency_parsing_iterator.py. I worked around this by placing table_adjacency_parsing.py at the root of the project.

Excellent! But when I ran the command again it showed me the following warnings
WARNING:tensorflow: From table_adjacency_parsing_iterator.py:73: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the 'tf.data' module. WARNING:tensorflow:'tf.train.start_queue_runners()' was called when no queue runners were defined. You can safely remove the call to this deprecated function.

OK, I need to update the TensorFlow method, but where are the graph keys (to update the deprecated function), and why is this happening?

Specs

Windows 10
NVIDIA MX110 (video card)
I7 8th Gen
8GB Ram
Python 3.6 :: Anaconda, Inc.

Inference with single image

@shahrukhqasim hi, I have a question after reading your code. When running inference on a picture, do we need to provide the box coordinates (x, y, x2, y2) of each word in the table? I ask because I did not see anything about updating the vertex features in your code.

Invalid argument: indices[24,899,5] = [24, 899, 900] does not index into param shape [25,900,900]

Hello,

When I train the model with the data mentioned in this repo (i.e. the Google Drive link), I get the error below.

I have tried this on Windows 8.1 CPU version and Linux 64 bit CPU.
Tensorflow Version: 2.2.0 also tried on TF version: 1.15.3

I have modified the code for TF 2.x, but I couldn't find a fix for this issue.
Could you please tell me what changes are needed to overcome this?

Here is the traceback of the issue.

2020-09-01 10:26:56.662955: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[24,899,5] = [24, 899, 900] does not index into param shape [25,900,900]
2020-09-01 10:26:56.711510: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[24,899,5] = [24, 899, 900] does not index into param shape [25,900,900]
2020-09-01 10:26:56.783215: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[21,788,3] = [21, 788, 900] does not index into param shape [25,900,900]
2020-09-01 10:26:58.902602: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[12,450,3] = [12, 900] does not index into param shape [25,900,128]
2020-09-01 10:26:58.905858: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[23,394,5] = [23, 900] does not index into param shape [25,900,128]
2020-09-01 10:26:58.912144: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[24,899,5] = [24, 900] does not index into param shape [25,900,128]
Traceback (most recent call last):
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[24,899,5] = [24, 899, 900] does not index into param shape [25,900,900]
[[{{node conv_graph_dgcnn_fast_conv_1/GatherNd_10}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bin/iterate/table_adjacency_parsing.py", line 35, in
trainer.train()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../iterators/table_adjacency_parsing_iterator.py", line 84, in train
model.run_training_iteration(sess, summary_writer, iteration_number)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py", line 481, in run_training_iteration
ops_result = sess.run(ops_to_run, feed_dict = feed_dict)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 958, in run
run_metadata_ptr)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[24,899,5] = [24, 899, 900] does not index into param shape [25,900,900]
[[node conv_graph_dgcnn_fast_conv_1/GatherNd_10 (defined at /home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py:259) ]]

Errors may have originated from an input operation.
Input Source operations connected to node conv_graph_dgcnn_fast_conv_1/GatherNd_10:
conv_graph_dgcnn_fast_conv_1/Placeholder_5 (defined at /home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py:163)
conv_graph_dgcnn_fast_conv_1/concat_23 (defined at /home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py:244)

Original stack trace for 'conv_graph_dgcnn_fast_conv_1/GatherNd_10':
File "bin/iterate/table_adjacency_parsing.py", line 35, in
trainer.train()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../iterators/table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py", line 96, in initialize
self.build_computation_graphs()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py", line 445, in build_computation_graphs
self.build_classification_segments(graph_features, placeholders)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py", line 337, in build_classification_segments
sampled_indices, computation_graph, gt_matrix = self.do_monte_carlo_sampling(graph_features, gt_sampled_adj_matrix)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python_v2/bin/iterate/../../models/basic_model.py", line 259, in do_monte_carlo_sampling
return samples, x, tf.gather_nd(gt_matrix, indexing_tensor_for_adj_matrices)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 4852, in gather_nd_v2
return gather_nd(params, indices, name=name, batch_dims=batch_dims)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 4844, in gather_nd
return gen_array_ops.gather_nd(params, indices, name=name)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3666, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3327, in _create_op_internal
op_def=op_def)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1791, in init
self._traceback = tf_stack.extract_stack()
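In this error, the offending index equals the size of the dimension it indexes (900 in a dimension of size 900), which suggests an off-by-one or padding index reaching the gather rather than a TF 2.x porting problem; that reading is a guess, not a confirmed diagnosis. Note that tf.gather_nd raises this error on CPU, while on GPU the TensorFlow documentation says out-of-bound indices produce 0 instead, so the same input may go unnoticed on a GPU. A NumPy sketch for locating such rows once you have fetched the index tensor, e.g. with sess.run (`out_of_range_indices` is a hypothetical helper):

```python
import numpy as np


def out_of_range_indices(indices, params_shape):
    """Return the rows of a gather_nd-style index array that fall outside params_shape.

    indices has shape [..., k]; each length-k row indexes the first k
    dimensions of params, as in tf.gather_nd without batch_dims.
    """
    indices = np.asarray(indices)
    k = indices.shape[-1]
    flat = indices.reshape(-1, k)
    bounds = np.asarray(params_shape[:k])
    bad = np.any((flat < 0) | (flat >= bounds), axis=1)
    return flat[bad]
```

For the first message above, `out_of_range_indices([[24, 899, 900], [24, 899, 5]], (25, 900, 900))` flags only the row `[24, 899, 900]`.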

config file setup

Hi, is it possible to use both configurations? The config file contains two sections, [basic_conv_graph] and [conv_graph_dgcnn_fast_conv].

This is the command I am using for training:
python bin/iterate/table_adjacency_parsing.py ../configs/config.ini basic_conv_graph
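Since each run takes a single configuration name, one option is to run the script once per section, one after another. A sketch that builds those command lines with configparser; `commands_for_all_sections` is a hypothetical helper, and it assumes each section sets its own output paths so runs do not overwrite each other:

```python
import configparser
import sys

SCRIPT = "bin/iterate/table_adjacency_parsing.py"


def commands_for_all_sections(config_path, extra_args=()):
    """One argv list per configuration (INI section) in config_path."""
    parser = configparser.ConfigParser()
    if not parser.read(config_path):
        raise FileNotFoundError(config_path)
    return [[sys.executable, SCRIPT, config_path, section] + list(extra_args)
            for section in parser.sections()]
```

The commands can then be run sequentially from the python directory, e.g. `for cmd in commands_for_all_sections("../configs/config.ini"): subprocess.run(cmd, check=True)`.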

I have a problem while training.

Training Iteration 105:
Accuracy - cells: 0.978424609 rows: 0.895143926 cols: 0.803354
Loss - cells: 0.0991929844 rows: 0.293497771 cols: 0.486667097
Fraction - cells: 0.0215753876 rows: 0.104856044 cols: 0.19664596
Total loss: 0.293119282
Training Iteration 106:
Accuracy - cells: 0.98035419 rows: 0.896977782 cols: 0.827894092
Loss - cells: 0.0928051174 rows: 0.2944884 cols: 0.456886739
Fraction - cells: 0.0196458325 rows: 0.103022188 cols: 0.172105893
Total loss: 0.281393409
Training Iteration 107:
2019-12-02 11:23:50.896536: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[26,190] = [26, 109, 90] does not index into param shape [30,109,109,128]
Traceback (most recent call last):
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[26,190] = [26, 109, 90] does not index into param shape [30,109,109,128]
[[{{node conv_graph_dgcnn_fast_conv_1/GatherNd}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "table_adjacency_parsing.py", line 31, in
trainer.train()
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/iterators/table_adjacency_parsing_iterator.py", line 88, in train
model.run_training_iteration(sess, summary_writer, iteration_number)
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/models/basic_model.py", line 400, in run_training_iteration
ops_result = sess.run(ops_to_run, feed_dict = feed_dict)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[26,190] = [26, 109, 90] does not index into param shape [30,109,109,128]
[[node conv_graph_dgcnn_fast_conv_1/GatherNd (defined at /home/hungdv/Documents/graphNN/TIES-verions2/python/ops/ties.py:32) ]]

Errors may have originated from an input operation.
Input Source operations connected to node conv_graph_dgcnn_fast_conv_1/GatherNd:
conv_graph_dgcnn_fast_conv_1/Cast (defined at /home/hungdv/Documents/graphNN/TIES-verions2/python/ops/ties.py:31)
conv_graph_dgcnn_fast_conv_1/conv2d_14/LeakyRelu (defined at /tmp/tmp8vnsy1dh.py:74)

Original stack trace for 'conv_graph_dgcnn_fast_conv_1/GatherNd':
File "table_adjacency_parsing.py", line 31, in
trainer.train()
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/iterators/table_adjacency_parsing_iterator.py", line 53, in train
model.initialize(training=True)
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/models/basic_model.py", line 94, in initialize
self.build_computation_graphs()
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/models/basic_model.py", line 365, in build_computation_graphs
vertices_y2, vertices_x2, scale_y, scale_x)
File "/home/hungdv/Documents/graphNN/TIES-verions2/python/ops/ties.py", line 32, in gather_features_from_conv_head
return tf.gather_nd(conv_head, indexing_tensor)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 3796, in gather_nd
return gen_array_ops.gather_nd(params, indices, name=name)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3991, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/hungdv/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

How I can train this model?

Hi, author
Can you share how to train the model?
thanks,

DataLossError

Hi, shahrukhqasim. Thanks for sharing this code. When running python bin/iterate/table_adjacency_parsing.py path/to/the/config/file basic after modifying the config file, I ran into the following error:

DataLossError (see above for traceback): inflate() failed with error -3: incorrect header check
         [[node IteratorGetNext_1 (defined at /home/wangzehua/table_recognition/TIES-2.0/python/readers/image_words_reader.py:67)  = IteratorGetNext[output_shapes=[[?,4500], [?,27000], [?,1049088], [?,3], [?,810000], [?,810000], [?,810000]], output_types=[DT_FLOAT, DT_INT64, DT_FLOAT, DT_FLOAT, DT_INT64, DT_INT64, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_1)]]
         [[{{node IteratorGetNext_1/_9}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_11_IteratorGetNext_1", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I used the data provided at https://drive.google.com/drive/folders/18QyBB1pavj_xCsTyCR6XC_AA525nZaVZ
Do you know how to fix it?
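The message "inflate() failed with error -3: incorrect header check" comes from zlib: the reader is decompressing bytes that do not start with a zlib or gzip header. So one plausible cause, not verified against image_words_reader.py, is that the compression type the reader is configured with does not match how the files were actually written (or a file is corrupt). A heuristic sketch to guess a file's compression from its first two bytes; `guess_compression` is a hypothetical helper, and its return values follow the `compression_type` strings accepted by `tf.data.TFRecordDataset` ("" for none, "ZLIB", "GZIP"):

```python
def guess_compression(path):
    """Guess the TFRecord compression of a file from its first bytes.

    Heuristic only: gzip streams start with 0x1f 0x8b; zlib streams
    usually start with 0x78 and a 2-byte header divisible by 31.
    Anything else is reported as uncompressed ("").
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head == b"\x1f\x8b":
        return "GZIP"
    if len(head) == 2 and head[0] == 0x78 and (head[0] * 256 + head[1]) % 31 == 0:
        return "ZLIB"
    return ""
```

If this reports "" for the downloaded files while the reader decompresses them, making the two agree would be the first thing to try.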

I have a question about the file that solves the maximal cliques problem

"After getting adjacency matrices, complete cells, rows and columns can be reconstructed by
solving the problem of maximal cliques [33] for rows and columns and connected components for cells"
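The quoted post-processing step can be sketched in pure Python, assuming each predicted adjacency matrix is a 0/1 list of lists over the words (vertices). This is not the repository's code: it uses the standard Bron-Kerbosch algorithm with pivoting for maximal cliques and a depth-first search for connected components.

```python
def _neighbours(adj):
    """Undirected neighbour sets from a 0/1 adjacency matrix (self-loops ignored)."""
    n = len(adj)
    return [{j for j in range(n) if j != i and (adj[i][j] or adj[j][i])}
            for i in range(n)]


def connected_components(adj):
    """Partition vertices into connected components (used for cells)."""
    nbrs = _neighbours(adj)
    seen, comps = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in nbrs[u] - seen:
                seen.add(v)
                stack.append(v)
        comps.append(sorted(comp))
    return comps


def maximal_cliques(adj):
    """All maximal cliques via Bron-Kerbosch with pivoting (used for rows/columns)."""
    if not adj:
        return []
    nbrs = _neighbours(adj)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(len(adj))), set())
    return cliques
```

Unlike connected components, which partition the words, maximal cliques may overlap, so a word can belong to more than one row or column.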

Getting access denied error in model_path

2021-01-20 13:42:41.836366: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
2021-01-20 13:42:41.847091: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
2021-01-20 13:42:41.856967: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_tensor.cc:175 : Data loss: Unable to open table file D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
Traceback (most recent call last):
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
trainer.train()
File "D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\iterators\table_adjacency_parsing_iterator.py", line 71, in train
saver.restore(sess, self.model_path)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1546, in restore
{self.saver_def.filename_tensor_name: save_path})
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
[[node save/RestoreV2 (defined at D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\models\basic_model.py:362) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
trainer.train()
File "D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\iterators\table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\models\basic_model.py", line 92, in initialize
self.build_computation_graphs()
File "D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\models\basic_model.py", line 362, in build_computation_graphs
self.saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, self.get_variable_scope()))
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1102, in __init__
self.build()
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1549, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "D:\Users\ankans\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Unable to open table file D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model: Unknown: NewRandomAccessFile failed to Create/Open: D:\Users\ankans\Desktop\Table_recognition_GNN\outputs\model : Access is denied.
; Input/output error
[[node save/RestoreV2 (defined at D:\Users\ankans\Desktop\Table_recognition_GNN\TIES-2.0\python\models\basic_model.py:362) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Sysinfo:
python-->3.6.8
OS --> Windows 10 server edition

Any help would be highly appreciated.
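The trace shows saver.restore(sess, self.model_path) being handed D:\...\outputs\model, which looks like a directory. Windows reports "Access is denied" when a directory is opened as a file, so the likely cause is that model_path names a folder rather than a checkpoint prefix (TensorFlow V2 checkpoints are stored as <prefix>.index plus <prefix>.data-* files). The hypothetical helper below, a sketch rather than part of the repository, classifies a path before restoring so the caller can decide whether to restore or start from scratch.

```python
import os


def describe_model_path(model_path: str) -> str:
    """Classify a path before passing it to tf.train.Saver.restore (hypothetical helper)."""
    if os.path.isdir(model_path):
        # Restoring from a directory tends to fail; on Windows it surfaces as "Access is denied".
        return "directory"
    if os.path.isfile(model_path + ".index"):
        # A TF V2 checkpoint prefix: <prefix>.index and <prefix>.data-* shards sit next to it.
        return "checkpoint"
    return "missing"
```

Only a "checkpoint" result is safe to restore from; for the other two cases the path needs to point at a file prefix inside the output folder, and nothing can be restored until a checkpoint has actually been saved there.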

ValueError: The passed save_path is not a valid checkpoint

Hi,
As described in the config file, we created the output and model directory structure, but we are getting the error listed below. Please check and suggest the necessary fix.

Traceback (most recent call last):
File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
trainer.train()
File "/home/vision/shafique/citi/TIES-2.0-master/python/iterators/table_adjacency_parsing_iterator.py", line 71, in train
saver.restore(sess, self.model_path)
File "/home/vision/anaconda3/envs/TIES_table/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
+ compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /home/vision/shafique/citi/TIES-2.0-master/output/Tables/TrainOut/betaout/model
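In TensorFlow 1.x, Saver.restore raises this ValueError when it finds no checkpoint at the given prefix, which is what happens on a first run before anything has been saved under .../betaout/model. When checkpoints have been written to a directory, tf.train.latest_checkpoint(directory) returns the newest prefix by reading the directory's "checkpoint" state file. The pure-Python sketch below imitates that lookup so it can be checked without TensorFlow; the helper name is hypothetical and it only handles the common model_checkpoint_path entry.

```python
import os
import re
from typing import Optional


def latest_checkpoint_prefix(checkpoint_dir: str) -> Optional[str]:
    """Return the prefix named in <dir>/checkpoint if its .index file exists, else None.

    Rough stand-in for tf.train.latest_checkpoint, for illustration only.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, encoding="utf-8") as f:
        # The state file is a text proto with lines like: model_checkpoint_path: "model-1000"
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', f.read(), re.MULTILINE)
    if match is None:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        # Relative entries are resolved against the checkpoint directory.
        prefix = os.path.join(checkpoint_dir, prefix)
    return prefix if os.path.isfile(prefix + ".index") else None
```

A None result means there is nothing to restore yet, so training has to start from freshly initialized variables.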

How can I perform inference on a single image? Can anybody help with the issues?

I've noticed that many questions about various issues in the code are still open and unanswered. I also couldn't find any clue about running inference on a single image. Could you please provide some help with it?

ModuleNotFoundError: No module named 'iterators'

When I try to run the program on Windows, I get the error:
python: can't open file 'python/bin/iterate/table_adjacency_parsing': [Errno 2] No such file or directory

When I change the command to python bin/iterate/table_adjacency_parsing.py, I get this traceback:

Traceback (most recent call last):
  File "bin/iterate/table_adjacency_parsing.py", line 2, in <module>
    from iterators.table_adjacency_parsing_iterator import TableAdjacencyParsingIterator
ModuleNotFoundError: No module named 'iterators'
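The first error comes from the missing .py extension. The second happens because, when Python runs a script, sys.path[0] is the script's own directory (python/bin/iterate), not the current working directory, so the iterators package under python/ is not importable. The README's intended fix is to run from python/ with that directory on $PYTHONPATH (for example `PYTHONPATH=. python bin/iterate/table_adjacency_parsing.py ...`, or `set PYTHONPATH=%CD%` first in a Windows cmd prompt). Alternatively, the script can add the directory itself; the helper below is a hypothetical sketch assuming the script sits two levels below python/.

```python
import os
import sys


def add_python_root_to_path(script_file: str) -> str:
    """Prepend the repository's python/ directory to sys.path and return it.

    Assumes the layout python/bin/iterate/<script>.py, i.e. the root is two levels up.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    root = os.path.abspath(os.path.join(script_dir, "..", ".."))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Intended use, at the top of bin/iterate/table_adjacency_parsing.py before the
# "from iterators..." import:
#     add_python_root_to_path(__file__)
```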

Evaluation code and results generation

@shahrukhqasim Hi, thanks for your excellent work. I am working on table recognition and have learned a lot from your paper and code. It's really kind of you to share your code and give a detailed description.
Comparing the paper with the code, I found that the evaluation part is missing from the project, and I am struggling to convert the three adjacency matrices (cell, row, column) into the final results. Could you please share the code for these two parts?
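Until the evaluation code is published, one simple way to score a predicted adjacency matrix against ground truth is precision, recall and F1 over unordered vertex pairs. This is a hypothetical metric for sanity checks, not the evaluation protocol of the paper. The sketch assumes both matrices are symmetric 0/1 lists of lists of the same size with padding already removed, and it scores only the upper triangle (i < j).

```python
from typing import List, Tuple


def pairwise_prf(pred: List[List[int]], gt: List[List[int]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over unordered vertex pairs i < j."""
    n = len(gt)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            p, g = bool(pred[i][j]), bool(gt[i][j])
            tp += int(p and g)       # predicted link that exists
            fp += int(p and not g)   # predicted link that does not exist
            fn += int(g and not p)   # missed link
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The same function can be applied separately to the cell, row and column matrices.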

How can I run inference on a single image?

Probabilities for numbers, words and other counts

Here you extract probabilities for table creation. Are these probabilities computed over the whole document or just the table regions?

Bad Visual feedback results for Columns and Cells

Hello @shahrukhqasim and Team,

I'm getting very poor results for column and cell prediction, but row prediction is good.
Could someone look into this and help me?

I changed is_sampling_balanced from 1 to 0 ("is_sampling_balanced = 0") to work around the "indices does not index into param shape" issue.
In the PDFs:
Blue rectangle indicates - Ground_Truth = 0 and Predicted = 0
Pink rectangle indicates - Ground_Truth = 1 and Predicted = 0
Orange rectangle indicates - the test cell used for predicting cells/columns/rows

02916_cells.pdf
02916_cols.pdf

Thanks in advance.

Final dataset required

Hey guys,

I am looking for the final tfrecords for both training and validation. Is it possible to share them? Thanks in advance.
Great work!

Inference with own image

How do I convert my own images into a tfrecord, either for training or for inference (OCR, then establishing vertices, etc.)?
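The logs in other issues on this page give some hints about the record layout. The printed vertex_text rows start with values such as 82 69 76 (ASCII for "REL") followed by zeros, with "Max vertices: 900" and "Max word length: 30", i.e. 900 x 30 = 27000 character codes per example, and the adjacency matrices appear flattened to 900 x 900 = 810000 values. The pure-Python sketch below builds those two flat arrays from OCR words and linked word pairs. It rests on assumptions: the row-major layout, truncation of long words, the use of Unicode code points (the logs only show ASCII) and the function names are hypothetical, and it stops short of writing a tfrecord, since the feature keys other than adjacency_matrix_cols are not documented here.

```python
from typing import Iterable, List, Sequence, Tuple


def encode_vertex_text(words: Sequence[str], max_vertices: int = 900,
                       max_word_length: int = 30) -> List[int]:
    """Flatten words into max_vertices * max_word_length character codes, zero-padded."""
    if len(words) > max_vertices:
        raise ValueError(f"{len(words)} words exceed max_vertices={max_vertices}")
    flat = [0] * (max_vertices * max_word_length)
    for i, word in enumerate(words):
        # Long words are truncated; unused slots stay 0, like the padding in the logs.
        for j, ch in enumerate(word[:max_word_length]):
            flat[i * max_word_length + j] = ord(ch)
    return flat


def flatten_adjacency(pairs: Iterable[Tuple[int, int]], max_vertices: int = 900) -> List[int]:
    """Build a symmetric 0/1 matrix from linked word pairs and flatten it row-major.

    Whether the original data also sets the diagonal is not known; it is left at 0 here.
    """
    flat = [0] * (max_vertices * max_vertices)
    for i, j in pairs:
        flat[i * max_vertices + j] = 1
        flat[j * max_vertices + i] = 1
    return flat
```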
Thank you, this is a nice project.

How can I run inference on a single image?

DataLossError: corrupted record at 0 while reading tfrecord files

    import tensorflow as tf

    for example in tf.python_io.tf_record_iterator("ZI70YDAKXIT453SKIGZ8.tfrecord"):
        result = tf.train.Example.FromString(example)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/tf_record.py in tf_record_iterator(path, options)
179 while True:
180 try:
--> 181 reader.GetNext()
182 except errors.OutOfRangeError:
183 break

/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py in GetNext(self)
795
796 def GetNext(self):
--> 797 return _pywrap_tensorflow_internal.PyRecordReader_GetNext(self)
798
799 def record(self):

DataLossError: corrupted record at 0
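Both this DataLossError and the "inflate() failed with error -3: incorrect header check" error at the top of this page are typical of a mismatch between how a TFRecord file was compressed and how it is being read, or of a file that is not a TFRecord at all (for example an HTML page saved from a Google Drive download link). TensorFlow 1.x reads compressed files with `tf.python_io.tf_record_iterator(path, options=tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP))` or `tf.data.TFRecordDataset(path, compression_type="GZIP")`. The hypothetical helper below only peeks at the first bytes to guess which case applies. It is a heuristic: an uncompressed TFRecord has no magic number, so it is reported as "unknown".

```python
def sniff_record_file(path: str) -> str:
    """Guess the container of a .tfrecord file from its first bytes (heuristic)."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head[:2] == b"\x1f\x8b":
        return "GZIP"  # gzip magic number
    if len(head) >= 2 and (head[0] & 0x0F) == 8 and (head[0] * 256 + head[1]) % 31 == 0:
        return "ZLIB"  # zlib header: deflate method and a valid header checksum
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "HTML"  # e.g. a download page saved instead of the actual file
    return "unknown"  # possibly an uncompressed TFRecord, which has no magic number
```

"GZIP" or "ZLIB" suggests passing the matching compression type to the reader; "HTML" means the file has to be downloaded again.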

Fetch argument None has invalid type <class 'NoneType'>

Traceback (most recent call last):
  File "bin\iterate\table_adjacency_parsing.py", line 36, in <module>
    trainer.train()
  File "PycharmProjects\TIES-2.0\python\iterators\table_adjacency_parsing_iterator.py", line 85, in train
    model.run_training_iteration(sess, summary_writer, iteration_number)
  File "PycharmProjects\TIES-2.0\python\models\basic_model.py", line 388, in run_training_iteration
    ops_result = sess.run(ops_to_run, feed_dict = feed_dict)
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
    self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 471, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 261, in for_fetch
    return _ListFetchMapper(fetch)
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 370, in __init__
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 370, in <listcomp>
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
  File "AppData\Local\Continuum\anaconda3\envs\Yvision\lib\site-packages\tensorflow\python\client\session.py", line 258, in for_fetch
    type(fetch)))
TypeError: Fetch argument None has invalid type <class 'NoneType'>

Can someone advise what could be wrong here?
The train/test/val tfrecord paths in the config are correct.
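This TypeError means that at least one entry of the fetch list passed to sess.run (ops_to_run in run_training_iteration, per the traceback) is None, for example an op or summary that was never built for the current configuration. The traceback does not say which entry it is, so a hypothetical debugging helper like the one below can be called on ops_to_run just before sess.run to print where the None sits. It only walks plain Python lists, tuples and dicts and does not depend on TensorFlow.

```python
from typing import Any, List


def find_none_fetches(fetches: Any, path: str = "fetches") -> List[str]:
    """Return the access paths of every None inside nested lists, tuples and dicts."""
    if fetches is None:
        return [path]
    found: List[str] = []
    if isinstance(fetches, dict):
        for key, value in fetches.items():
            found += find_none_fetches(value, f"{path}[{key!r}]")
    elif isinstance(fetches, (list, tuple)):
        for i, value in enumerate(fetches):
            found += find_none_fetches(value, f"{path}[{i}]")
    return found


# Hypothetical placement inside run_training_iteration:
#     missing = find_none_fetches(ops_to_run, "ops_to_run")
#     if missing:
#         raise ValueError(f"None fetches at {missing}")
```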

Loss close to 0, acc nearly 1, but poor visual_feedbacks

Hi Shahrukhqasim,

Thanks for sharing your project. I tried to train the model with the tfrecord downloaded from the Google Drive folder you provided, using the same tfrecord for training, validation and testing.
The loss converged quickly during training; after 2000 iterations I got the following result:
Training Iteration 1999:
Accuracy - cells: 0.999771655 rows: 1 cols: 0.999677658
Loss - cells: 0.00083513 rows: 0.00029534733 cols: 0.00229700166
Fraction - cells: 0.503221631 rows: 0.493117601 cols: 0.492601216
Total loss: 0.00114249298

But when I checked the visual feedback, the visualized results were poor:

Does anybody have some advice, or has anyone had similar problems? Many thanks.

Indices do not index into param shape

I'm trying to run training with the data you provided, but I get indexing errors a few seconds into iteration 0:

python bin/iterate/table_adjacency_parsing.py /home/johannes/devel/projects/tr/configs/gravnet_fast_conv_partial.ini gravnet_fast_conv
(25, 900, 64)
(25, 900, 64)
(25, 900, 64)
(25, 900, 64)
The model has 972848 parameters.
Cleaned summary directory
Cleaned visual feedback output directory
2019-08-01 12:21:21.721217: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py:67: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:`tf.train.start_queue_runners()` was called when no queue runners were defined. You can safely remove the call to this deprecated function.
Starting iterations
Training Iteration 0:
2019-08-01 12:21:41.314411: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:41.317000: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:41.320124: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[9,899,5] = [9, 899, 900] does not index into param shape [25,900,900]
2019-08-01 12:21:43.658247: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[19,899,5] = [19, 900] does not index into param shape [25,900,128]
2019-08-01 12:21:43.668727: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[24,899,5] = [24, 900] does not index into param shape [25,900,128]
2019-08-01 12:21:43.673979: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[4,899,5] = [4, 900] does not index into param shape [25,900,128]
Traceback (most recent call last):
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[{{node conv_grav_net_fast_conv/GatherNd_6}} = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
    trainer.train()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py", line 82, in train
    model.run_training_iteration(sess, summary_writer, iteration_number)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 384, in run_training_iteration
    ops_result = sess.run(ops_to_run, feed_dict = feed_dict)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[node conv_grav_net_fast_conv/GatherNd_6 (defined at /home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py:201)  = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

Caused by op 'conv_grav_net_fast_conv/GatherNd_6', defined at:
  File "bin/iterate/table_adjacency_parsing.py", line 31, in <module>
    trainer.train()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/iterators/table_adjacency_parsing_iterator.py", line 48, in train
    model.initialize(training=True)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 92, in initialize
    self.build_computation_graphs()
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 360, in build_computation_graphs
    self.build_classification_segments(graph_features, placeholders)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 270, in build_classification_segments
    sampled_indices, computation_graph, gt_matrix = self.do_monte_carlo_sampling(graph_features, gt_sampled_adj_matrix)
  File "/home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py", line 201, in do_monte_carlo_sampling
    return samples, x, tf.gather_nd(gt_matrix, indexing_tensor_for_adj_matrices)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/johannes/devel/env/ties/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[14,899,5] = [14, 899, 900] does not index into param shape [25,900,900]
	 [[node conv_grav_net_fast_conv/GatherNd_6 (defined at /home/johannes/devel/src/git/python/TIES-2.0/python/models/basic_model.py:201)  = GatherNd[Tindices=DT_INT32, Tparams=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_conv_grav_net_fast_conv/Placeholder_3_0_3, conv_grav_net_fast_conv/concat_11)]]

Some indices seem to be out of bounds. Do you have an idea what could be causing this?
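Every failing index points at position 900 on an axis of size 900 (valid positions are 0 to 899), and the row being sampled is 899, the last slot, which is presumably padding for most documents. So the sampler appears to produce an index one past the end, plausibly for vertices with no positive (or no negative) neighbours to draw from. That is a guess, since do_monte_carlo_sampling is not reproduced here; another issue on this page reports avoiding the error with is_sampling_balanced = 0. For illustration only, a balanced sampler that cannot go out of range might look like the pure-Python sketch below. The function name, the fallback to the other class, and gt_row being a 0/1 list are hypothetical.

```python
import random
from typing import List, Sequence


def sample_balanced(gt_row: Sequence[int], num_samples: int, num_vertices: int,
                    rng: random.Random) -> List[int]:
    """Draw column indices for one vertex, alternating positive and negative links.

    Indices are always taken from range(num_vertices), so none can equal num_vertices.
    """
    if num_vertices <= 0:
        raise ValueError("num_vertices must be positive")
    positives = [j for j in range(num_vertices) if gt_row[j]]
    negatives = [j for j in range(num_vertices) if not gt_row[j]]
    samples: List[int] = []
    for k in range(num_samples):
        pool = positives if k % 2 == 0 else negatives
        if not pool:
            # One class is empty (e.g. a padded vertex with no links): use the other class.
            pool = negatives if k % 2 == 0 else positives
        samples.append(rng.choice(pool))
    return samples
```

Here num_vertices would be the number of real, unpadded vertices of the document rather than the padded maximum.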

InvalidArgumentError: Input to reshape is a tensor with 810000 values, but the requested shape has 27000 [Op:Reshape]

Hi Team,

When I try to run the training part (using the partial dataset you provided in the README), I get the error below.

Max vertices: 900
Max word length: 30

vertex_text: tf.Tensor(
[[ 82 69 76 ... 0 0 0]
[ 76 79 78 ... 0 0 0]
[109 103 97 ... 0 0 0]
...
[ 68 105 115 ... 0 0 0]
[ 79 70 0 ... 0 0 0]
[ 74 97 110 ... 0 0 0]], shape=(30, 27000), dtype=int64)
shape of vertex_text: (30, 27000)
Traceback (most recent call last):
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8075, in reshape
tld.op_callbacks, tensor, shape)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "bin/iterate/table_adjacency_parsing.py", line 35, in <module>
trainer.train()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../iterators/table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../models/basic_model.py", line 72, in initialize
self.training_feeds = self.training_reader.get_feeds()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../readers/image_words_reader.py", line 76, in get_feeds
vertex_text = tf.reshape(vertex_text, shape=(self.num_max_vertices, self.max_word_length))
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 193, in reshape
result = gen_array_ops.reshape(tensor, shape, name)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8080, in reshape
tensor, shape, name=name, ctx=_ctx)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8107, in reshape_eager_fallback
ctx=ctx, name=name)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 810000 values, but the requested shape has 27000 [Op:Reshape]

Here the input has 810000 values because the shape of vertex_text is (30, 27000).

To work around this, I changed the parameter max_vertices from 900 to 27000.

After this change, I get a different error:

File "bin/iterate/table_adjacency_parsing.py", line 35, in <module>
trainer.train()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../iterators/table_adjacency_parsing_iterator.py", line 48, in train
model.initialize(training=True)
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../models/basic_model.py", line 72, in initialize
self.training_feeds = self.training_reader.get_feeds()
File "/home/sdpuser/Desktop/Table_Detection/TIES/TIES-2.0/python/bin/iterate/../../readers/image_words_reader.py", line 71, in get_feeds
vertex_features, vertex_text, image, global_features, adj_cells, adj_rows, adj_cols = iterator.get_next()
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 741, in get_next
return self._next_internal()
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 661, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 1989, in execution_mode
executor_new.wait()
File "/home/sdpuser/.conda/envs/table_detection/lib/python3.6/site-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: adjacency_matrix_cols. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]

Could you please look into this and tell me what can be done?

Thank you in advance.
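The numbers in this report fit a leading batch dimension better than a wrong max_vertices: 810000 = 30 x 27000 = 30 x (900 x 30), so vertex_text of shape (30, 27000) holds 30 complete examples of 900 x 30 character codes, and the reshape to (num_max_vertices, max_word_length) = (900, 30) receives the whole batch. Raising max_vertices to 27000 then changes the expected length of the adjacency features as well (27000 x 27000 instead of the 900 x 900 = 810000 shown in the first issue on this page), which would explain why parsing adjacency_matrix_cols fails next. If that reading is right, keeping max_vertices at 900 and reshaping with the batch axis, e.g. tf.reshape(vertex_text, (-1, self.num_max_vertices, self.max_word_length)), or reshaping before batching, is the place to look; this is a hedged suggestion, not a tested patch. The small hypothetical helpers below spell out the arithmetic; the (10, 27000) vertex_text in a later issue fits the same pattern with a batch of 10.

```python
def implied_batch_size(total_values: int, max_vertices: int = 900,
                       max_word_length: int = 30) -> int:
    """Number of whole examples of max_vertices x max_word_length codes in total_values."""
    per_example = max_vertices * max_word_length
    batch, remainder = divmod(total_values, per_example)
    if remainder:
        raise ValueError(f"{total_values} is not a multiple of {per_example}")
    return batch


def adjacency_length(max_vertices: int) -> int:
    """Flattened length of one max_vertices x max_vertices adjacency matrix."""
    return max_vertices * max_vertices
```

With the values from this report, implied_batch_size(810000) gives 30, and adjacency_length(900) gives 810000.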

The evaluation code and the evaluation result

@shahrukhqasim Hi, I am working on table recognition, and I find the results in your paper very good. But when I tried to reproduce them with your code, I found there is no evaluation code in TIES-2.0. Could you give me more details about the evaluation? I would appreciate it if you made that code public.

Cols loss remains same over the iterations

The code is working great for rows and cells. However, the cols are not getting optimized. I tried to focus exclusively on cols with the following config:

; Weight of losses, for cells, rows and cols respectively
; No need to sum it to one, will be handled internally (could be floating points)
loss_alpha=0
loss_beta=0
loss_gamma=4

However, I see little change in accuracy or loss for cols. Am I missing something?

Training Iteration 2041:
Accuracy - cells: 0.41997537 rows: 0.488680273 cols: 0.508866429
 Loss     - cells: 0.740955174 rows: 0.861711681 cols: 0.533341587
 Fraction - cells: 0.500545442 rows: 0.492519379 cols: 0.49120307
 Total loss: 0.533341587
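The config comment says the weights need not sum to one. The logged numbers are consistent with the total being a weighted average normalised by the sum of the weights: here, with loss_alpha=0, loss_beta=0 and loss_gamma=4, the total 0.533341587 equals the cols loss, and in the "Loss close to 0" report above the total 0.00114249298 is the plain mean of the three losses. This suggests the weights are taking effect and the cols problem lies elsewhere. A pure-Python sketch of that formula, inferred from the logs rather than copied from basic_model.py:

```python
def combined_loss(loss_cells: float, loss_rows: float, loss_cols: float,
                  alpha: float, beta: float, gamma: float) -> float:
    """Weighted mean of the three losses, normalised by the sum of the weights."""
    total_weight = alpha + beta + gamma
    if total_weight <= 0:
        raise ValueError("at least one loss weight must be positive")
    return (alpha * loss_cells + beta * loss_rows + gamma * loss_cols) / total_weight
```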

Pretrained model

Can you provide a pretrained model? Thanks very much.

vertex_text dimension error

After I changed the parameters, the code got past the point where it was failing before (ref. issue #7), but there seems to be another issue: vertex_text has shape (10, 27000), whereas the shape passed to the reshape is shape=(self.num_max_vertices, self.max_word_length), i.e. (900, 30).
Link to the config file, splits and tfrecords: https://drive.google.com/drive/folders/1OWiAM_2ZGYr-ywoXLWJmuBjdfLXBh_XX?usp=sharing
