Giter Club home page Giter Club logo

publaynet-models's Introduction

Here I present Detectron2 object detection models trained on PubLayNet dataset, ranging from 81.139 to 86.690 in validation AP scores (possibly even better results can be achieved with longer training times).

Preview

Dataset overview

PubLayNet is a very large (over 300k images & over 90 GB in weight) dataset for document layout analysis. It contains images of research papers and articles and annotations for various elements in a page such as “text”, “list”, “figure” etc in these research paper images. The dataset was obtained by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. Originally provided by IBM here.

Data Description Zipped File Name Purpose
Train 0 Dataset, 13 GB train-0.tar.gz Training
Train 1 Dataset, 13 GB train-1.tar.gz Training
Train 2 Dataset, 13 GB train-2.tar.gz Training
Train 3 Dataset, 13 GB train-3.tar.gz Training
Train 4 Dataset, 13 GB train-4.tar.gz Training
Train 5 Dataset, 13 GB train-5.tar.gz Training
Train 6 Dataset, 13 GB train-6.tar.gz Training
Evaluation Dataset, 3 GB val.tar.gz Evaluation
Labels Dataset, 314 MB labels.tar.gz

Models overview

Models were trained on train part of the dataset, consisting of 335 703 images, and evaluated on val part of the dataset with 11 245 images. All the AP scores were obtained on the val dataset. Inference times were taken from official Detectron model zoo descriptions.

To provide you with some options, and to experiment with different models I have trained 4 versions for this dataset. 2 of them are from basic object detection, and 2 contain segmentation masks. You can choose which better suits your task by comparing accuracy scores and inference times of these models.

Name Inference time (s/im) Box AP Folder name Model zoo config Trained model
R50-FPN 0.038 81.139 faster_rcnn_R_50_FPN_3x Click me! Click me!
R101-FPN 0.051 84.295 faster_rcnn_R_101_FPN_3x Click me! Click me!
Name Inference time (s/im) Box AP Mask AP Folder name Model zoo config Trained model
R50-FPN 0.043 83.666 82.268 mask_rcnn_R_50_FPN_3x Click me! Click me!
R101-FPN 0.056 86.690 82.105 mask_rcnn_R_101_FPN_3x Click me! Click me!

Model folder

Each model's directory in git will contain these files:

File name Description
download.txt A text file that contains a link from where you can download the trained model
train.py Training script, that trains the model found in training_output sub-folder for a given number of epochs
test.py Testing script, that runs the model found in training_output in inference mode for 6 randomly preselected images (3 from training, 3 from evaluation datasets) and displays all predictions on each image in a pop-up window
eval.py Testing script, that runs the model found in training_output in inference mode for all images found in dataset's val folder and val.json. Evaluation is performed through Detectron's COCOEvaluator
evaluation.txt As eval.py takes some time to execute (from 10 to 20 minutes) I've recorded last output of evaluation in this separate text file

Generally all trained models .pth files should be available through here or here, but if not - refer to individual download.txt links

Using the models

In usage_example module I've provided a sample script example.py that builds Detectron objects (config and predictor) for my trained model and runs inference on it, with a sample interpretation of Detectron's outputs. Please note that test.py and eval.py in model folders use common functions shared between models in this project. While example.py provides a completely clean setup, assuming you only want to download the models and use them for inference directly. It requires only Pillow, OpenCV, numpy and Detectron2 to run.

Hardware used

Believe it or not but training of these models was performed on a regular consumer-grade personal gaming PC with one NVIDIA 2070 SUPER (8GB) GPU, Intel Core i5-10600K CPU and 32 GB RAM.

Links

In case you’d like to check my other work or contact me:

publaynet-models's People

Contributors

jpleorx avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

sgundu0506

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.