DivEA

This repo contains the code for reproducing our paper "High-quality Task Division for Large-scale Entity Alignment", which has been accepted at CIKM 2022 (arXiv).

Download the code and data, then organize the folders as follows:

divea/
|- datasets/    # datasets are put under this folder
   |- dbp15k/
   |- dwy100k/
   |- 2m/   # dataset fb_dbp of size 2M
|- divea/   # code of our method
|- RREA/    # RREA model
|- GCN-Align/     # GCN-Align model
|- scripts/    # scripts files for running our method with RREA
|- scripts2/    # scripts files for running our method with GCN-Align
|- environment.yml   # conda environment file
|- README.md
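
Before running anything, it can help to confirm that the layout matches the tree above. The following is a minimal sketch and not part of the released code: the helper name `missing_paths` is hypothetical, and the list of expected entries is simply copied from the tree.

```python
from pathlib import Path

# Entries copied from the folder tree above; paths are relative to the project root.
EXPECTED = [
    "datasets/dbp15k",
    "datasets/dwy100k",
    "datasets/2m",
    "divea",
    "RREA",
    "GCN-Align",
    "scripts",
    "scripts2",
    "environment.yml",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Folder layout matches the expected structure.")
```

Run it from the project directory; an empty result means every folder and file named in the tree is present.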

Python Environment

First, cd to the project directory.

Create the environment named divea and install most packages by running the command:

conda env create -f environment.yml

Then, activate the environment:

conda activate divea

Finally, install the networkx-metis package as shown below. Other installation instructions for networkx-metis can be found here.

git clone https://github.com/networkx/networkx-metis.git
cd networkx-metis/
python setup.py build
python setup.py install
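
To confirm that the installation succeeded, a quick import check can be run inside the activated divea environment. This is only a sketch: the helper `is_installed` is hypothetical, and it assumes that networkx-metis is imported under the module name `nxmetis`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the top-level `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: networkx-metis is imported as `nxmetis`.
    for name in ("networkx", "nxmetis"):
        status = "found" if is_installed(name) else "NOT found"
        print(f"{name}: {status}")
```

If `nxmetis` is reported as not found, re-check that the build and install commands were run with the divea environment active.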

Run scripts

The scripts for running our method with RREA are located under scripts/.

  • bash run_over_perf_vs_cps.sh. Overall performance. Table 1.
  • bash run_over_perf_vs_sbp.sh. Overall performance. Table 2.
  • bash run_over_perf_vs_cps_2m.sh. Overall performance. Table 1.
  • bash run_over_perf_vs_sbp_2m.sh. Overall performance. Table 2.

The scripts for running our method with GCN-Align are located under scripts2/. Their file names and functions correspond to those of the scripts under scripts/.
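
To run several experiments in sequence, the scripts listed above could be driven from a small wrapper. This is a hedged sketch rather than part of the repository: the names `SCRIPTS`, `build_command`, and `run_all` are hypothetical, and it assumes each script is meant to be launched with `bash` from inside its own directory (`scripts/` for RREA, `scripts2/` for GCN-Align).

```python
import subprocess

# Script names as listed above, with the paper table each one is said to reproduce.
SCRIPTS = {
    "run_over_perf_vs_cps.sh": "Table 1",
    "run_over_perf_vs_sbp.sh": "Table 2",
    "run_over_perf_vs_cps_2m.sh": "Table 1",
    "run_over_perf_vs_sbp_2m.sh": "Table 2",
}


def build_command(script_name):
    """Return the argument list for launching one script with bash."""
    return ["bash", script_name]


def run_all(script_dir="scripts", dry_run=True):
    """Build the commands for every script; execute them only if dry_run is False.

    Assumption: scripts are run with `script_dir` as the working directory.
    """
    commands = [build_command(name) for name in SCRIPTS]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, cwd=script_dir, check=True)
    return commands


if __name__ == "__main__":
    for cmd, table in zip(run_all(dry_run=True), SCRIPTS.values()):
        print(" ".join(cmd), "->", table)
```

With the default `dry_run=True` the wrapper only prints the commands; pass `script_dir="scripts2"` and `dry_run=False` to actually launch the GCN-Align variants.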

Citation

Please cite this paper if you use the released code in your work.

@inproceedings{DBLP:conf/cikm/LiuHZZZ22,
  author    = {Bing Liu and
               Wen Hua and
               Guido Zuccon and
               Genghong Zhao and
               Xia Zhang},
  editor    = {Mohammad Al Hasan and
               Li Xiong},
  title     = {High-quality Task Division for Large-scale Entity Alignment},
  booktitle = {Proceedings of the 31st {ACM} International Conference on Information
               {\&} Knowledge Management, Atlanta, GA, USA, October 17-21, 2022},
  pages     = {1258--1268},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3511808.3557352},
  doi       = {10.1145/3511808.3557352},
  timestamp = {Wed, 04 Jan 2023 07:33:22 +0100},
  biburl    = {https://dblp.org/rec/conf/cikm/LiuHZZZ22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgement

We used the source code of RREA and GCN-Align.

divea's Issues

Question about the running time

Thank you again for your work!
I have another question about the running time reported in your paper.

I noticed that in Table 5 (Time cost of the EA division), DivEA achieves a very short running time, taking only 398 seconds on the 100K datasets. Does the time here represent the total training time? If not, could you explain what it includes?

Thank you again!

About the parameter "subtask_size"

Thanks for the open-source code and the nicely written paper :)!

The setting of "subtask_size" seems different. For example, in "run_over_perf_vs_sbp.sh", it is set in ctx_size_dict=( ['fr']=9594 ['ja']=10323 ['zh']=9421 ['wd']=28693 ['yg']=29521 ). What is the actual meaning of "subtask_size", and how can we obtain it?
