Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning
Authors: Wenhao Yu (ND), Wei Peng (ZJU), Yu Shu (SCU), Qingkai Zeng (ND), Meng Jiang (ND)
This paper proposes a novel system that extracts experimental evidence from data science literature in PDF format and builds the first experimental database for related research.
The left figure shows the workflow of the Tablepedia system: (1) PDF collection, (2) table extraction, (3) experimental evidence database construction, and (4) database operations and visualization.
The right figure shows an example database constructed by Tablepedia from data science paper PDFs. For a given dataset and evaluation metric, one can query the database to find the state-of-the-art result (highlighted in yellow) and to check whether the numbers reported across existing papers are consistent (green box) or conflicting (red box).
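To make the state-of-the-art lookup and the consistency check concrete, here is a minimal sketch in plain Python. It is not the Tablepedia implementation. The record layout, the `summarize` function, the placeholder names, and the numeric values are all hypothetical and chosen only for illustration. The sketch also assumes that a higher metric value is better.

```python
from collections import defaultdict

# Hypothetical evidence records: (paper, method, dataset, metric, value).
# Names and values are illustrative placeholders, not results from any paper.
RECORDS = [
    ("paper_A", "method_X", "dataset_1", "F1", 0.81),
    ("paper_B", "method_X", "dataset_1", "F1", 0.81),
    ("paper_B", "method_Y", "dataset_1", "F1", 0.85),
    ("paper_C", "method_Y", "dataset_1", "F1", 0.79),
]


def summarize(records, dataset, metric, tol=1e-9):
    """Return the best reported result and a per-method consistency status.

    Assumes a higher value is better for the chosen metric.
    """
    # Group the reports for this (dataset, metric) pair by method.
    by_method = defaultdict(list)
    for paper, method, ds, m, value in records:
        if ds == dataset and m == metric:
            by_method[method].append((paper, value))
    if not by_method:
        return None

    # State of the art: the single highest reported value across all methods.
    best = max(
        ((method, paper, value)
         for method, reports in by_method.items()
         for paper, value in reports),
        key=lambda item: item[2],
    )

    # A method is "consistent" if every paper reports the same number for it,
    # and "conflicting" if the reported numbers differ by more than tol.
    status = {}
    for method, reports in by_method.items():
        values = [value for _, value in reports]
        if len(values) < 2:
            status[method] = "single report"
        elif max(values) - min(values) <= tol:
            status[method] = "consistent"
        else:
            status[method] = "conflicting"
    return {"best": best, "status": status}


result = summarize(RECORDS, "dataset_1", "F1")
print(result["best"])
print(result["status"])
```

In this toy data, `method_X` is reported with the same value by two papers, while `method_Y` is reported with two different values. Those cases correspond to the green and red boxes described above.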
python data/load_data.py
python anno/load_anno.py
python tabula/tabula-java.py
@inproceedings{yu2020experimental,
title={Experimental Evidence Extraction System in Data Science with Hybrid Table Features and Ensemble Learning},
author={Yu, Wenhao and Peng, Wei and Shu, Yu and Zeng, Qingkai and Jiang, Meng},
booktitle={Proceedings of The Web Conference 2020},
pages={951--961},
year={2020}
}
Please contact Wenhao Yu ([email protected]) if you have any questions.