ProbExpan

PyTorch implementation of the SIGIR 2022 full paper "Contrastive Learning with Hard Negative Entities for Entity Set Expansion".

Prerequisites

python >= 3.7

pytorch >= 1.6.0
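
The two version requirements above can be checked before running anything else. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only facts taken from this README are the minimum versions (Python 3.7, PyTorch 1.6.0).

```python
import re
import sys
from importlib import import_module


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    core = text.split("+")[0]  # drop local build suffixes like '+cu117'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '1.6' to (1, 6, 0)
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the minimum (a 3-tuple)."""
    return parse_version(version_text) >= tuple(minimum)


def check_environment():
    """Report whether the interpreter and PyTorch satisfy the README minimums."""
    python_ok = sys.version_info >= (3, 7)
    print("python >= 3.7:", python_ok)
    try:
        torch = import_module("torch")
    except ImportError:
        print("pytorch is not installed")
        return False
    torch_ok = meets_minimum(torch.__version__, (1, 6, 0))
    print("pytorch >= 1.6.0:", torch_ok)
    return python_ok and torch_ok


if __name__ == "__main__":
    check_environment()
```

Comparing parsed tuples rather than raw strings avoids the pitfall where "1.10.0" sorts before "1.6.0" lexicographically.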

Data

The download links for the datasets used in our experiments are all publicly available in the original papers mentioned in Appendix B. After downloading the datasets, put them under the folder "./data/"

The following instructions use Wiki as the default dataset.

Data Preprocessing

Run

python make_entity2sents.py

to get "./data/wiki/entity2sents.pkl"
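
To confirm that preprocessing produced a usable file, the pickle can be loaded and summarized. This is a minimal sketch under assumptions: the README does not document the internal layout of entity2sents.pkl, so the helper `describe_pickle` (a hypothetical name) only reports the top-level type and size, and shows a few keys if the object happens to be a dict. Only unpickle files you generated yourself, since unpickling untrusted data can execute arbitrary code.

```python
import os
import pickle


def describe_pickle(path, preview=3):
    """Load a pickle file and return a small summary of its top-level object."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    info = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        info["size"] = len(obj)
    if isinstance(obj, dict):
        # Assumption: a mapping is plausible given the file name, but not confirmed.
        info["sample_keys"] = list(obj)[:preview]
    return info


if __name__ == "__main__":
    path = "./data/wiki/entity2sents.pkl"
    if os.path.exists(path):
        print(describe_pickle(path))
    else:
        print(path + " not found; run make_entity2sents.py first")
```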

Learning Phase 1 and 2

To train multiple models with the Masked Entity Prediction task and ensemble the top models, run

python main.py -num_model 5 -num_top_model 2

After pretraining, run

python main.py -num_model 5 -num_top_model 2 -pretrained_model epoch_5.pkl

The expansion results will be saved under "./data/wiki/ensemble+winodw+rank"

Learning Phase 3 and 4

To train multiple models with the contrastive loss and ensemble the top models, first run

python make_cls2eids-wiki.py

then run

python main.py -CL -num_model 5 -num_top_model 2 -output cl+ensemble+winodw+rank

After pretraining, run

python main.py -CL -num_model 5 -num_top_model 2 -output cl+ensemble+winodw+rank -pretrained_model epoch_5.pkl

The expansion results will be saved under "./data/wiki/cl+ensemble+winodw+rank"
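
The commands for the four learning phases differ only in a few flags, so they can be generated from one helper. The sketch below is not part of the repository: `build_command`, `pipeline_commands`, and `render` are hypothetical names, and the script only prints the commands in the order this README gives them rather than running them. The flags, script names, and the output folder name are copied from the commands above.

```python
import shlex


def build_command(contrastive=False, num_model=5, num_top_model=2,
                  output=None, pretrained_model=None):
    """Assemble the argument list for one main.py invocation."""
    cmd = ["python", "main.py"]
    if contrastive:
        cmd.append("-CL")  # enables the contrastive-loss phases
    cmd += ["-num_model", str(num_model), "-num_top_model", str(num_top_model)]
    if output is not None:
        cmd += ["-output", output]
    if pretrained_model is not None:
        cmd += ["-pretrained_model", pretrained_model]
    return cmd


def pipeline_commands():
    """Return every command from this README in execution order."""
    cl_output = "cl+ensemble+winodw+rank"  # folder name as spelled in the README
    return [
        ["python", "make_entity2sents.py"],
        build_command(),
        build_command(pretrained_model="epoch_5.pkl"),
        ["python", "make_cls2eids-wiki.py"],
        build_command(contrastive=True, output=cl_output),
        build_command(contrastive=True, output=cl_output,
                      pretrained_model="epoch_5.pkl"),
    ]


def render(cmd):
    """Format an argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    for command in pipeline_commands():
        print(render(command))
```

Keeping the commands as argument lists means they could be passed directly to `subprocess.run(cmd, check=True)` if you prefer to execute the steps from Python.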

Multi GPU

To train a single model with the Masked Entity Prediction task on multiple GPUs, run

python -m torch.distributed.launch --nproc_per_node=[NUM_GPU] mlm-pretrain-multiGPU.py
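
The [NUM_GPU] placeholder can be filled in from the number of CUDA devices PyTorch can see. This is a minimal sketch, assuming you want one process per visible GPU; `detect_gpu_count` and `build_launch_command` are hypothetical helper names, while `torch.cuda.device_count()` is the standard PyTorch call.

```python
def detect_gpu_count():
    """Return the number of visible CUDA devices, or 0 if PyTorch is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def build_launch_command(num_gpu, script="mlm-pretrain-multiGPU.py"):
    """Build the distributed launch command with one process per GPU."""
    if num_gpu < 1:
        raise ValueError("at least one GPU is required")
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(num_gpu),
        script,
    ]


if __name__ == "__main__":
    count = detect_gpu_count()
    if count:
        print(" ".join(build_launch_command(count)))
    else:
        print("No CUDA device detected")
```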

Citation

If you find our paper or code useful, please cite our paper:

@inproceedings{10.1145/3477495.3531954,
author = {Li, Yinghui and Li, Yangning and He, Yuxin and Yu, Tianyu and Shen, Ying and Zheng, Hai-Tao},
title = {Contrastive Learning with Hard Negative Entities for Entity Set Expansion},
year = {2022},
isbn = {9781450387323},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477495.3531954},
doi = {10.1145/3477495.3531954},
abstract = {Entity Set Expansion (ESE) is a promising task which aims to expand entities of the target semantic class described by a small seed entity set. Various NLP and IR applications will benefit from ESE due to its ability to discover knowledge. Although previous ESE methods have achieved great progress, most of them still lack the ability to handle hard negative entities (i.e., entities that are difficult to distinguish from the target entities), since two entities may or may not belong to the same semantic class based on different granularity levels we analyze on. To address this challenge, we devise an entity-level masked language model with contrastive learning to refine the representation of entities. In addition, we propose the ProbExpan, a novel probabilistic ESE framework utilizing the entity representation obtained by the aforementioned language model to expand entities. Extensive experiments and detailed analyses on three datasets show that our method outperforms previous state-of-the-art methods.},
booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {1077–1086},
numpages = {10},
keywords = {knowledge discovery, entity set expansion, contrastive learning},
location = {Madrid, Spain},
series = {SIGIR '22}
}

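Issues

Question about processing the SE2 dataset

Hello, the vocabulary.txt of the SE2 dataset I downloaded contains 1.5 million entities. Did you apply any preliminary filtering when processing this dataset, or did you use all 1.5 million entities directly as the candidate entity set?

ECOPO open-source plan

Hello,

Our lab recently discussed ECOPO, your ACL 2022 work on Chinese error correction, and our advisor would like us to reproduce it.

Do you plan to open-source the code? Apologies for asking for help here.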
