the-black-knight-01 / tabulo

Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0 and Sonnet)

Home Page: https://interviewbubble.com

License: BSD 3-Clause "New" or "Revised" License

Python 97.88% JavaScript 1.04% CSS 0.72% HTML 0.37%
table-detection-using-deep-learning deep-learning table-detection tensorflow luminoth python detection sonnet tabulo faster-r-cnn

Tabulo


Tabulo is an open source toolkit for computer vision. Currently, we support table detection, but we are aiming for much more. It is built in Python, using Luminoth, TensorFlow and Sonnet.

Table of Contents

  1. Installation Instructions
  2. Available APIs
  3. Working with pretrained Models
  4. Running Tabulo
  5. Running Tabulo As Service
  6. Supported models
  7. Usage
  8. Working with datasets
  9. Training
  10. LICENSE

1. Installation Instructions

Tabulo currently supports Python 2.7 and 3.4–3.6.

1.1 Pre-requisites

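To use Tabulo, TensorFlow must be installed beforehand. If you want GPU support, install the GPU version of TensorFlow with pip install tensorflow-gpu; otherwise, install the CPU version with pip install tensorflow. Tabulo is built against TensorFlow versions below 2.0, so pin the version accordingly (for example, pip install "tensorflow<2.0").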
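As a quick sanity check before installing Tabulo, the version constraint can be tested in Python. This is a minimal sketch: the helper names `parse_version` and `is_supported_tensorflow` are hypothetical and not part of Tabulo, and the only rule encoded is the "TensorFlow<2.0" requirement stated in the project description.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported_tensorflow(version):
    """Return True when the version is below 2.0 (hypothetical helper)."""
    parsed = parse_version(version)
    # An unparseable version string is treated as unsupported.
    return bool(parsed) and parsed < (2, 0)
```

With TensorFlow installed, `is_supported_tensorflow(tf.__version__)` (after `import tensorflow as tf`) reports whether the installed release falls in the supported range.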

We use Tesseract to extract data from tables, so you also need to install Tesseract. Follow this link to install Tesseract.

1.2 Installing Tabulo

First, clone the repo on your machine and then install with pip:

git clone https://github.com/interviewBubble/Tabulo.git
cd Tabulo
pip install -e .

1.3 Check that the installation worked

Simply run tabulo --help.
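If the command is not found, the usual cause is that the install step ran in a different Python environment from the one on your PATH. The sketch below checks for the executable before running it; the function names are hypothetical, and only the standard library (`shutil.which`, `subprocess.run`, Python 3) is used.

```python
import shutil
import subprocess


def cli_available(name="tabulo"):
    """Return True when an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def help_argv(subcommand=None):
    """Build the argument list for `tabulo --help` or `tabulo <subcommand> --help`."""
    argv = ["tabulo"]
    if subcommand:
        argv.append(subcommand)
    argv.append("--help")
    return argv


def show_help(subcommand=None):
    """Run the help command, or explain why it cannot be run."""
    if not cli_available():
        print("tabulo is not on PATH; check that pip install ran in this environment.")
        return None
    # The help text is printed by the tabulo process itself.
    return subprocess.run(help_argv(subcommand)).returncode
```

Calling `show_help()` mirrors `tabulo --help`, and `show_help("checkpoint")` mirrors `tabulo checkpoint --help`.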

2. Available APIs

  • localhost:5000/api/fasterrcnn/predict/ - Detect tables in an image
  • localhost:5000/api/fasterrcnn/extract/ - Extract the content of detected tables

3. Working with pretrained Models:

  • Download the pretrained model from Google Drive
  • Unzip it and copy the downloaded luminoth folder into the luminoth/utils/pretrained_models folder
  • Run this command to list all checkpoints: tabulo checkpoint list
  • You will get output like this: [Screenshot: Checkpoints]
  • Now run the server with this command: tabulo server web --checkpoint 6aac7a1e8a8e
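The unzip-and-copy step above can be scripted. This is a sketch under stated assumptions: the archive name is whatever you downloaded, the archive is assumed to contain a top-level luminoth folder as the steps describe, and `install_pretrained` is a hypothetical helper rather than part of Tabulo.

```python
import os
import zipfile

# Target folder named in the steps above, relative to the repository root.
PRETRAINED_DIR = os.path.join("luminoth", "utils", "pretrained_models")


def install_pretrained(zip_path, repo_root):
    """Extract the downloaded archive into <repo_root>/luminoth/utils/pretrained_models."""
    target = os.path.join(repo_root, PRETRAINED_DIR)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Return the folder contents so the result can be inspected.
    return sorted(os.listdir(target))
```

After it runs, `tabulo checkpoint list` is the command to confirm whether the checkpoint is picked up.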

4. Running Tabulo

4.1 Running Tabulo as Web Server:

[Screenshot: Running Tabulo]

4.2 Example of Table Detection with Faster R-CNN By Tabulo:

[Screenshot: Example of Table Detection with Faster R-CNN By Tabulo]

4.3 Example of Table Data Extraction with tesseract By Tabulo:

[Screenshot: Example of Table Data Extraction with tesseract By Tabulo]

5. Running Tabulo As Service:

5.1 Using Curl command

# -F sends the file as multipart/form-data; curl sets the Content-Type header and boundary itself
curl -X POST \
  http://localhost:5000/api/fasterrcnn/predict/ \
  -F image=@/path/to/image/page_8-min.jpg
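The same request can be sent from Python using only the standard library. This is a minimal sketch, not the project's client: `encode_multipart` and `post_image` are hypothetical helpers, the form field name `image` is taken from the curl command, and the response is assumed to be JSON. Using it against the extract endpoint would be a further assumption, since the source does not document that endpoint's request format.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

PREDICT_URL = "http://localhost:5000/api/fasterrcnn/predict/"


def encode_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: {m}\r\n\r\n"
    ).format(b=boundary, f=field, n=filename, m=mime).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")
    # The boundary in the header must match the one used in the body.
    return head + payload + tail, "multipart/form-data; boundary=" + boundary


def post_image(path, url=PREDICT_URL):
    """POST the image at `path` in the form field `image` and decode the JSON reply."""
    with open(path, "rb") as handle:
        body, content_type = encode_multipart(
            "image", os.path.basename(path), handle.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for error responses such as 400, so a caller may want to catch that exception and read its body.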

5.2 With Postman

Header Section:

[Screenshot: Table Detection using Postman, header section]

Data Section:

[Screenshot: Table Detection using Postman, data section]

6. Supported models

Currently, we support the following models:

We also provide pre-trained checkpoints for the above models trained on popular datasets such as COCO and Pascal.

7. Usage

There is one main command-line interface, which you use through the tabulo command. Whenever you are unsure how to do something, just run:

tabulo --help or tabulo <subcommand> --help

and a list of available options with descriptions will show up.

8. Working with datasets

Dataset for training your custom model.

9. Training

See Training your own model to learn how to train locally or in Google Cloud.

10. LICENSE

Released under the BSD 3-Clause License.


tabulo's Issues

Text extraction technique

Hi, great work! I am mainly interested in text extraction. Could you tell me which module is used for text extraction in the pipeline?

Luminoth archived

Hi guys, Luminoth was archived last month. Do you have plans to migrate to another toolkit? The project is awesome.

Version conflict while installing click for Tabulo

While running tabulo --help, I encounter the following error.
There seems to be a conflict between two libraries, each of which wants a specific version of click. I have downloaded both versions of click, i.e. 6.7 and 7.1.2, but I still encounter the error.

Traceback (most recent call last):
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (click 6.7 (/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages), Requirement.parse('click>=7.1.2'), {'Flask'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gurjot/Tabulo/tabenv/bin/tabulo", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3252, in <module>
    def _initialize_master_working_set():
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (click 6.7 (/home/gurjot/Tabulo/tabenv/lib/python3.8/site-packages), Requirement.parse('click>=7.1.2'), {'Flask'})
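Reading the last line of the traceback: the environment has click 6.7 installed, while Flask declares the requirement click>=7.1.2. Only one version of a distribution can be installed in a given site-packages directory, so downloading both does not satisfy both requirements. The comparison can be reproduced with a small sketch; `version_tuple` and `satisfies_minimum` are hypothetical helpers that handle only plain dotted numeric versions.

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '7.1.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """Return True when `installed` meets a `>= minimum` requirement."""
    return version_tuple(installed) >= version_tuple(minimum)


# Values taken from the traceback above.
print(satisfies_minimum("6.7", "7.1.2"))
```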

slow process

Is it normal that the command tabulo predict takes so long to run (about 30 seconds per image)?

Clarifications

Hi,

I have a few questions about Tabulo:

  1. I am unable to download the checkpoint that was trained on tables. I can only see Faster R-CNN and SSD checkpoints trained on the COCO dataset.

  2. How does Tabulo work? I don't see any installation of Tesseract, so how is it able to display the characters from a table image?

Regards
Sekar

What training data used

Great work. May I ask what training data you used for the pre-trained table detection model? Thanks!

Unable to predict table

I tried tabulo predict on one of the PDF page images from your samples, but it didn't detect anything. Below is the full output.

Found 1 files to predict.
Neither checkpoint not config specified, assuming accurate.
Checkpoint not present locally. Want to download it? [y/N]: y
Downloading checkpoint...
Importing checkpoint... done.
Checkpoint imported successfully.
2020-02-19 17:16:24.245652: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: AVX AVX2
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-02-19 17:16:24.248831: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Predicting page_8-min.jpg...OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 13264 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 12156 thread 1 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 16980 thread 2 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 17288 thread 3 bound to OS proc set 6
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 22584 thread 4 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 18916 thread 5 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 22892 thread 6 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 896 thread 8 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 10468 thread 7 bound to OS proc set 7
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 11156 thread 9 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 20656 thread 10 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 11852 tid 14932 thread 11 bound to OS proc set 6
done.
{"file": "page_8-min.jpg", "objects": []}
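The final line of the output is JSON, so an empty result can be detected programmatically. A minimal sketch, assuming the output keeps the `file` and `objects` keys shown above; `detected_objects` is a hypothetical helper.

```python
import json


def detected_objects(output_line):
    """Parse one JSON line of predict output and return its list of detections."""
    result = json.loads(output_line)
    return result.get("objects", [])


line = '{"file": "page_8-min.jpg", "objects": []}'
if not detected_objects(line):
    print("No tables detected.")
```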

Tensorflow 2.0 Support

I followed the installation process until the following command :

tabulo checkpoint list

I get the following error after executing the command:

Error:
AttributeError: module 'tensorflow' has no attribute 'contrib'

I am guessing it's because I am using TensorFlow 2.0 and the models don't support it. Is there a quick workaround for this?

resnet_v1_101 checkpoint does not load

I placed the 6aac7a1e8a8e checkpoint dir under Tabulo/luminoth/utils/pretrained_models

and ran from Tabulo:
tabulo server web --checkpoint 6aac7a1e8a8e

and it returned:
Checkpoint not found. Check remote repository? [y/N]: y

Retrieving remote index... done.
No changes in remote index.
Checkpoint isn't available in remote repository either.
Traceback (most recent call last):
  File "C:\Users\jay\Anaconda2\envs\py36\Scripts\tabulo-script.py", line 11, in <module>
    load_entry_point('tabulo', 'console_scripts', 'tabulo')()
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\jay\anaconda2\envs\py36\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\jay\documents\ml_projects\luminoth\luminoth\tools\server\web.py", line 83, in web
    config = get_checkpoint_config(checkpoint)
  File "c:\users\jay\documents\ml_projects\luminoth\luminoth\tools\checkpoint\__init__.py", line 193, in get_checkpoint_config
    raise ValueError('Checkpoint not found.')
ValueError: Checkpoint not found.

Additionally, functionality to list existing checkpoints does not work.

I'm running on Windows 10, just cloned the repo this AM.

Is there any news on support of newer versions of Tensorflow?

I'm a complete newbie when it comes to data science, and this project would greatly help my own. Sadly, I cannot run it because of the TF versions, and as far as I'm aware Apple doesn't support TF1.5 (I'm using an M1 Mac). Any help is greatly appreciated.

tabulo: command not found

When installing Tabulo, I get below:

Installing collected packages: tabulo
  Attempting uninstall: tabulo
    Found existing installation: tabulo 0.2.4.dev0
    Uninstalling tabulo-0.2.4.dev0:
      Successfully uninstalled tabulo-0.2.4.dev0
  Running setup.py develop for tabulo
Successfully installed tabulo

After this, when running below from my CLI:

tabulo --help

I get below error:

-bash: tabulo: command not found

I am running Python on Mac OS X. What am I doing wrong?

Failed to create process

I installed tabulo, and when I tried to run tabulo --help, I got an error saying Failed to create process. Can someone help me solve it?

Error: No such command "sever".

Error: No such command "sever". when I try to run the Tabulo service using tabulo sever web --checkpoint 6aac7a1e8a8e after downloading and unzipping the pretrained models into the luminoth/utils/pretrained_models directory.

Official Docker Image

Thanks for your project.

An official docker image would be great to get started quickly.

bug in predict.py script

The predict function doesn't work from the command line, but it works well in the web app interface.

You should replace:

# Open and read the image to predict.
with tf.gfile.Open(path, 'rb') as f:
    try:
        image = Image.open(f).convert('RGB')
    except (tf.errors.OutOfRangeError, OSError) as e:
        click.echo()
        click.echo('Error while processing {}: {}'.format(path, e))
        return

with:

# Open and read the image to predict.
with tf.gfile.Open(path, 'rb') as f:
    try:
        image = Image.open(f).convert('RGB')
        img = np.asarray(image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        b = cv2.distanceTransform(img, distanceType=cv2.DIST_L2, maskSize=5)
        g = cv2.distanceTransform(img, distanceType=cv2.DIST_L1, maskSize=5)
        r = cv2.distanceTransform(img, distanceType=cv2.DIST_C, maskSize=5)

        # merge the transformed channels back to an image
        transformed_image = cv2.merge((b, g, r))
    except (tf.errors.OutOfRangeError, OSError) as e:
        click.echo()
        click.echo('Error while processing {}: {}'.format(path, e))
        return

so that tabulo predict can be used from the command line.

How to call extract api

Hi, I have installed the project and the predict API is working, but there is no documentation on how to call the extract API. Can you help me with this?

Thanks in advance,

Not able to predict default page_8-min.jpg using Postman

I've downloaded a JPEG into the Tabulo directory (that I've cloned) and tried to run the request in Postman by importing the curl command into Postman
Key as 'image'
Value as '2_resized.jpg'

The error is
{
"error": "Missing image"
}

400 BAD REQUEST Error

Checkpoints not downloading

After running the command "tabulo checkpoint list" I got the response below

image

The checkpoint '6aac7a1e8a8e' does not appear in the list, although it is present in the luminoth/checkpoints folder.
I tried running the command with another checkpoint that does appear in the list (server web --checkpoint aad6912e94d9),
but it says the checkpoint is not present locally and starts downloading, and the download never completes.

image

Please help me with this.

#luminoth #checkpoints #tabulo

Thanks.

Pretrained models not showing up checkpoint list

Hi, I'm trying to use the pre-trained model. I've downloaded the google drive folder and placed it in pretrained_models. When I run tabulo checkpoint list I do not see the local checkpoint.

Any idea what is going on?

Error: Missing image

Hi,
I have installed all the packages properly and I have been trying the following curl command from my MacBook terminal:

curl -X POST http://localhost:5000/api/fasterrcnn/predict/ -H 'Content-Type: application/x-www-form-urlencoded' -H 'Postman-Token: 70478bd2-e1e8-442f-b0bf-ea5ecf7bf4d8' -H 'cache-control: no-cache' -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' -F "image=@/Users/rudra/Desktop/page_12.jpg"
But it keeps outputting an error saying: {"error":"Missing image"}

Can anyone please help me solve this? I also tried -F image=@/Users/rudra/Desktop/page_12.jpg but no success. I even tried converting the curl request into a python post type request, but the same error.

Thanks in advance.
