STEA

This repo contains the source code of the paper "Dependency-aware Self-training for Entity Alignment", which has been accepted at WSDM 2023.

Download the data used in the experiments from this Dropbox directory. Decompress it and place it under STEA_code/, as shown in the folder structure below.

📌 The code has been tested. Feel free to create issues if you cannot run it successfully. Thanks!

Structure of Folders

STEA_code/
  |- datasets/
  |- OpenEA/
  |- scripts/
  |- stea/
    |- Dual_AMN/
    |- GCN-Align/
    |- RREA/
  |- environment.yml
  |- README.md

After you run a script, the program automatically creates an output/ folder that stores the evaluation results.
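Before running anything, it can help to confirm that the data and code were unpacked into the layout above. The following is a minimal sketch, not part of the released code: the helper name `missing_entries` is hypothetical, and the list of entries is copied from the folder structure in this README.

```python
from pathlib import Path

# Entries copied from the folder structure listed in this README.
EXPECTED_ENTRIES = [
    "datasets",
    "OpenEA",
    "scripts",
    "stea",
    "stea/Dual_AMN",
    "stea/GCN-Align",
    "stea/RREA",
    "environment.yml",
]


def missing_entries(project_root):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # "STEA_code" is the root folder name used in this README; adjust as needed.
    missing = missing_entries("STEA_code")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Folder layout looks complete.")
```

If `datasets` is reported as missing, the downloaded data has most likely not been decompressed into the project root yet.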

Device

The experiments were run on the following hardware:

  • The experiments on 15K datasets were run on one GPU server with an Intel(R) Xeon(R) Gold 6128 3.40GHz CPU, 128GB memory, 3 NVIDIA GeForce RTX 2080 Ti GPUs and Ubuntu 20.04.
  • The experiments on 100K datasets were run on one computing cluster running CentOS 7.8.2003, which allocated us 200GB memory and 2 NVIDIA Volta V100 SXM2 GPUs.

A reasonable basic configuration is probably a 12GB GPU for the 15K datasets and a 32GB GPU for the 100K datasets.
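To compare a machine against these figures, one option is to read the GPU memory from `nvidia-smi`. The sketch below is an illustration only: the helper names and the rounding to whole gigabytes are assumptions, and the 12GB/32GB figures should be treated as a rough guide rather than a hard limit.

```python
import subprocess

# Rough guide from this README: 12GB for 15K datasets, 32GB for 100K datasets.
RECOMMENDED_GB = {"15K": 12, "100K": 32}


def parse_memory_mib(nvidia_smi_output):
    """Parse per-GPU total memory in MiB from CSV output, one value per line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpus_meeting_recommendation(memory_mib, dataset_scale):
    """Return indices of GPUs whose memory, rounded to whole GB, meets the guide."""
    required_gb = RECOMMENDED_GB[dataset_scale]
    return [i for i, mib in enumerate(memory_mib) if round(mib / 1024) >= required_gb]


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of each visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_mib(result.stdout)


if __name__ == "__main__":
    try:
        memory = query_gpu_memory_mib()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available on this machine.")
    else:
        for scale in RECOMMENDED_GB:
            print(scale, "-> suitable GPU indices:", gpus_meeting_recommendation(memory, scale))
```

Rounding to whole gigabytes is used because cards usually report slightly less than their nominal size in MiB.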

Install Conda Environment

First, cd to the project directory. Then run the following command to install the main environment packages.

conda env create -f environment.yml

Activate the environment with conda activate stea, and then install the graph-tool package:

conda install -c conda-forge graph-tool==2.29

(Installing this package can be slow, so be patient.)

With the installed environment above, you can run STEA for Dual-AMN, RREA and GCN-Align.

If you also want to run STEA for AliNet, please also install the following packages with pip:

pip install igraph
pip install python-Levenshtein
pip install dataclasses
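Once the installation finishes, a quick import check can confirm that the packages are visible inside the activated environment. This is a minimal sketch rather than part of the released code: the helper name `missing_modules` is hypothetical, and the import names (`graph_tool`, `igraph`, `Levenshtein`) are the modules those packages normally provide.

```python
import importlib.util

# Needed for Dual-AMN, RREA and GCN-Align (installed via conda above).
CORE_MODULES = ["graph_tool"]
# Additionally needed for AliNet (installed via pip above).
ALINET_MODULES = ["igraph", "Levenshtein", "dataclasses"]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, modules in [("core", CORE_MODULES), ("AliNet", ALINET_MODULES)]:
        missing = missing_modules(modules)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Run it with the stea environment activated; otherwise it checks whichever Python interpreter is currently on the path.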

Run Scripts

Some shell scripts with parameter settings are provided under the scripts/ folder. Brief descriptions:

  • run_{Self-training_method}_w_{EA_Model}.sh. Runs a given self-training method with a given EA model. You can set the dataset name, the annotation amount, and other settings as needed.
  • run_analyze_paramK.sh. Analyzes the sensitivity to the hyperparameter K.
  • run_analyze_norm_minmax.sh. Replaces the softmax-based normalisation module with a MinMax scaler to analyze the necessity of our normalisation module.
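To make the comparison in the last script concrete, the sketch below contrasts a generic softmax normalisation with a generic min-max scaling of a score vector. It is a textbook illustration, not the normalisation module from the paper or this repo; the function names and the example scores are made up.

```python
import math


def softmax_normalise(scores):
    """Map scores to positive weights that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def minmax_normalise(scores):
    """Rescale scores linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no range to rescale over.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    # Hypothetical similarity scores of one source entity to three candidates.
    scores = [0.9, 0.5, 0.1]
    print("softmax:", softmax_normalise(scores))
    print("min-max:", minmax_normalise(scores))
```

The softmax output forms a distribution over the candidates, while min-max scaling only stretches the scores to the range [0, 1] and does not sum to 1.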

For each task, the evaluation results and other outputs are written to a folder under the output/ directory.
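The script naming pattern and the output/ directory can be handled with two small helpers. This is a sketch under assumptions: the helper names are hypothetical, the method and model names passed in are placeholders (check scripts/ for the real file names), and nothing is assumed about what the result folders contain.

```python
from pathlib import Path


def script_name(method, ea_model):
    """Build a script file name following the run_{method}_w_{model}.sh pattern."""
    return f"run_{method}_w_{ea_model}.sh"


def list_output_folders(output_dir="output"):
    """Return the names of the result folders under output_dir, sorted by name."""
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing has been run yet, so output/ has not been created.
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # "STEA" and "RREA" are placeholder values for illustration only.
    print("Script to look for:", Path("scripts") / script_name("STEA", "RREA"))
    print("Existing result folders:", list_output_folders())
```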

Note: AliNet runs much slower than the other EA models, so you may want to explore the self-training methods with the other EA models first.

You Want to Report Issues?

We are happy to hear from you if you have any problem running our code, or find an inconsistency between your results and what is reported in the paper.

Citation

Please cite this paper if you use the released code in your work.

@inproceedings{DBLP:conf/wsdm/0025LHZ23,
  author    = {Bing Liu and
               Tiancheng Lan and
               Wen Hua and
               Guido Zuccon},
  editor    = {Tat{-}Seng Chua and
               Hady W. Lauw and
               Luo Si and
               Evimaria Terzi and
               Panayiotis Tsaparas},
  title     = {Dependency-aware Self-training for Entity Alignment},
  booktitle = {Proceedings of the Sixteenth {ACM} International Conference on Web
               Search and Data Mining, {WSDM} 2023, Singapore, 27 February 2023 -
               3 March 2023},
  pages     = {796--804},
  publisher = {{ACM}},
  year      = {2023},
  url       = {https://doi.org/10.1145/3539597.3570370},
  doi       = {10.1145/3539597.3570370},
  timestamp = {Fri, 24 Feb 2023 13:56:00 +0100},
  biburl    = {https://dblp.org/rec/conf/wsdm/0025LHZ23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgement

We used the source code of RREA, Dual-AMN, OpenEA, and GCN-Align.

stea's Issues

conda install -c conda-forge graph-tool==2.29

When installing graph-tool with conda:
conda install -c conda-forge graph-tool==2.29

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment:
It has been stuck at this step for almost a whole afternoon.
