Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver

This repository is a C++ implementation of the TMM 2019 paper:

Liang, Yongqing, and Xin Li. "Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver." IEEE Transactions on Multimedia (2019).

If you use this code in your research, please cite the paper.

Paper correction: Equation (5) should be

1. Environment

We built and evaluated the code on Ubuntu 18.04 and Mac OS X 10.14.5. This repository uses the following packages:

OpenCV: 3.2.0
OpenMP: 2.1.1
CMake: 3.5
g++: 7.4.0
Python: 3.6.8

2. DocDataset description

Click here to download the DocDataset. Unzip the package and copy the gt and stripes folders into the data/ folder of the repository.

DocDataset contains:

60 striped document puzzles at four complexity levels (20, 30, 40, and 60 stripes), named doc*_*.
3 physically shredded document puzzles, named real*_*.
1 randomly oriented puzzle, named doc3_36.

Comparative performance results are reported in Tables I, II, and III of the paper.

3. Usage

After downloading this repository, you can compile the source code into debug or release executables.

3.1 Compile

To generate the executable file in the debug mode:

./autogen debug

To generate the executable file in the release mode:

./autogen release

3.2 Reassemble a stripe puzzle

A quick example that reassembles a synthesized stripe puzzle:

./bin/release/solver --text doc0 --num 40 --comp 2 --metric 2 --samples 300

Another example that reassembles a real-world stripe puzzle:

./bin/release/solver -t real1 -n 27 -c 2 -m 2 -s 10000 -r --word_conf_thres 70 --lambda0 0.5 --lambda1 0.7 --u_a 1 --filter_rate 0.2 --candidate_factor 5

Detailed documentation of all options is available via:

./bin/release/solver --help

3.3 Run benchmark

We also provide a script that runs the whole dataset instead of each test case individually:

./benchmark.sh doc [gen]

When you run the benchmark, or pass the --benchmark option to ./bin/release/solver, the results are saved in data/scores.

3.3.1 Generate stripe puzzles

In most cases, we recommend using the provided dataset for a fair comparison.

The optional gen argument provides an alternative when running the benchmark: it runs ./bin/release/generator to generate the stripe puzzles from the ground truth.

Details about how stripes are randomly generated from the ground-truth documents can be found in src/generator/generate_puzzle.cpp.
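To give a rough idea of what generating a stripe puzzle from a ground-truth page involves, here is a minimal sketch. It is a hypothetical illustration, not the code in src/generator/generate_puzzle.cpp: the names Stripe and make_shuffled_stripes are invented here, and the sketch only computes column ranges of equal-width vertical cuts and shuffles their order with a seeded std::mt19937, without touching any image data.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical illustration only: this is NOT the logic of
// src/generator/generate_puzzle.cpp. It cuts a document of `width` pixel
// columns into `n` vertical stripes and shuffles them with a fixed seed.
struct Stripe {
    int gt_index;  // position of the stripe in the ground-truth document
    int x0;        // first pixel column of the stripe (inclusive)
    int x1;        // end pixel column of the stripe (exclusive)
};

std::vector<Stripe> make_shuffled_stripes(int width, int n, unsigned seed) {
    assert(n > 0 && width >= n);  // every stripe needs at least one column
    std::vector<Stripe> stripes;
    stripes.reserve(n);
    for (int i = 0; i < n; ++i) {
        // Boundaries at floor(width * i / n), so widths differ by at most one.
        int x0 = static_cast<int>(static_cast<long long>(width) * i / n);
        int x1 = static_cast<int>(static_cast<long long>(width) * (i + 1) / n);
        stripes.push_back({i, x0, x1});
    }
    std::mt19937 rng(seed);  // fixed seed -> reproducible puzzle
    std::shuffle(stripes.begin(), stripes.end(), rng);
    return stripes;
}
```

For instance, make_shuffled_stripes(1000, 20, 42) yields 20 stripes of 50 columns each in a seed-dependent order. The solver's task is to recover the original gt_index order from the stripe contents alone.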

3.3.2 Recommended parameters

For synthetic data, the default parameters are good enough:

const double word_conf_thres = 70;
const double lambda0 = 0.3;
const double lambda1 = 0.5;
const double U_a = 2;
const double filter_rate = 0.7;
const int candidate_factor = 4;

We recommend setting --samples to at least 150, 300, 1000, and 8000 for 20-, 30-, 40-, and 60-stripe puzzles, respectively.
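The recommendation above can be expressed as a small lookup. This is a hypothetical helper, not part of the repository. The README lists values only for 20, 30, 40, and 60 stripes, so the choice to round other stripe counts (for example, 36) up to the next listed complexity is an assumption of this sketch, and it gives no recommendation beyond 60 stripes.

```cpp
#include <optional>

// Hypothetical helper: minimum --samples recommended for synthetic puzzles.
// Listed values: 20 -> 150, 30 -> 300, 40 -> 1000, 60 -> 8000 stripes.
// ASSUMPTION: other counts round up to the next listed complexity;
// counts above 60 (or non-positive) get no recommendation.
std::optional<int> recommended_min_samples(int num_stripes) {
    if (num_stripes <= 0) return std::nullopt;
    if (num_stripes <= 20) return 150;
    if (num_stripes <= 30) return 300;
    if (num_stripes <= 40) return 1000;
    if (num_stripes <= 60) return 8000;
    return std::nullopt;
}
```

Returning std::optional keeps "no listed recommendation" distinct from a real sample count, so a caller can fall back to its own choice instead of silently using a guessed value.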

For the real-world data (real1, real2, and real3), we report our results with the following parameters:

const double word_conf_thres {70}; // or 60
const double lambda0 = 0.5;
const double lambda1 = 0.7;
const double U_a = 1;

// For Real Case 1
const double filter_rate = 0.2;
const int candidate_factor {5};

// For Real Case 2
const double filter_rate = 0.5;
const int candidate_factor {3};

// For Real Case 3
const double filter_rate = 0.6;
const int candidate_factor {5};

We recommend setting --samples to more than 8000.
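The per-case settings above can be grouped so that each real-world case maps to one set of command-line options. The struct RealParams and the functions real_case_params and to_solver_flags below are hypothetical helpers, not part of the repository. The option names (--word_conf_thres, --lambda0, --lambda1, --u_a, --filter_rate, --candidate_factor) are the ones used in the real-world example command in Section 3.2. The sketch only formats the values; the other solver arguments (-t, -n, -s, and so on) still have to be supplied separately.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: the real-world parameters listed above, per case.
struct RealParams {
    double word_conf_thres = 70;  // the list above also allows 60
    double lambda0 = 0.5;
    double lambda1 = 0.7;
    double U_a = 1;               // passed to the solver as --u_a
    double filter_rate = 0.0;     // case-specific, set below
    int candidate_factor = 0;     // case-specific, set below
};

// Returns the preset for real1, real2, or real3; std::nullopt otherwise.
std::optional<RealParams> real_case_params(int case_id) {
    RealParams p;
    switch (case_id) {
        case 1: p.filter_rate = 0.2; p.candidate_factor = 5; break;
        case 2: p.filter_rate = 0.5; p.candidate_factor = 3; break;
        case 3: p.filter_rate = 0.6; p.candidate_factor = 5; break;
        default: return std::nullopt;
    }
    return p;
}

// Formats the preset with the option names from the example command.
std::string to_solver_flags(const RealParams& p) {
    std::ostringstream os;
    os << "--word_conf_thres " << p.word_conf_thres
       << " --lambda0 " << p.lambda0
       << " --lambda1 " << p.lambda1
       << " --u_a " << p.U_a
       << " --filter_rate " << p.filter_rate
       << " --candidate_factor " << p.candidate_factor;
    return os.str();
}
```

For real1, to_solver_flags(*real_case_params(1)) produces the same option string that appears in the real-world example command.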

3.4 Clean

./autoclean debug

./autoclean release

4. Reference

@article{liang2019reassembling,
  title={Reassembling Shredded Document Stripes Using Word-path Metric and Greedy Composition Optimal Matching Solver},
  author={Liang, Yongqing and Li, Xin},
  journal={IEEE Transactions on Multimedia},
  year={2019},
  publisher={IEEE}
}
