Giter Club home page Giter Club logo

transc's Introduction

Differentiating Concepts and Instances for Knowledge Graph Embedding

Source code and datasets of EMNLP2018 paper: "Differentiating Concepts and Instances for Knowledge Graph Embedding".

Dataset

We provide YAGO39K and M-YAGO39K dataset we used in this work in data/YAGO39K directory. The meanings of these files are explained as follows:

data/YAGO39K/Train

  • instance2id.txt, concept2id.txt, relation2id.txt: the id of every instance, concept and relation in this dataset.
  • triple2id.txt: normal triples represented as ids with format [head_instance_id tail_instance_id relation_id].
  • instanceOf2id.txt: instanceOf triples represented as ids with format [instance_id concept_id].
  • subClassOf2id.txt: subClassOf triples represented as ids with format [sub_concept_id concept_id].

data/YAGO39K/Valid

  • triple2id_positive.txt, instanceOf2id_positive.txt, subClassOf2id_positive.txt: positive triples from YAGO dataset. Their formats are the same as those in data/YAGO39K/Train.
  • triple2id_negative.txt, instanceOf2id_negative.txt, subClassOf2id_negative.txt: negative triples generated by us, which will be used in Triple Classification experiments.

data/YAGO39K/Test

  • the test dataset, which is similar to data/YAGO39K/Valid.

data/YAGO39K/M-Valid

  • the valid dataset for M-YAGO39K. The formats of all files are the same as those in data/YAGO39K/Valid.

data/YAGO39K/M-Test

  • the test dataset for M-YAGO39K. The formats of all files are the same as those in data/YAGO39K/Test.

Codes

The source codes of our work are put in the folder src/.

Compile

cd src
make

Train

cd src
./transC

You can add the following optional parameters when running transC:

  • -dim: the vector size.
  • -margin: the margin for normal triples.
  • -margin_ins: the margin for instanceOf triples.
  • -margin_sub: the margin for subClassOf triples.
  • -data: the dataset that you want to use. You can only choose YAGO39K in current vision.
  • -l1flag: it can be 1 or 0, 1 means using L1 loss and 0 means L2 loss.
  • -bern: it can be 1 or 0, 1 means using "bern" strategy and 0 means using "unif" strategy.

The Vectors of entities, concepts and relations will be stored in the folder vector/ after training.

Experiment

Link Precidtion

For Link Prediction experiment, you need to

cd src
./test_link_prediction

You can add the following optional parameters when running test_link_prediction:

  • -dim: the vector size.
  • -data: the dataset that you want to use. You can only choose YAGO39K in current vision.
  • -threads: how many threads you want to use.

Normal Triple Classification

For normal Triple Classification experiment, you need to

cd src
./test_triple_classification_normal

You can add the following optional parameters when running test_triple_classification_normal:

  • -dim: the vector size.
  • -data: the dataset that you want to use. You can only choose YAGO39K in current vision.

InstanceOf and SubClassOf Triple Classification

For instanceOf and subClassOf Triple Classification experiments, you need to

cd src
./test_triple_classification_isA

You can add the following optional parameters when running test_triple_classification_isA:

  • -dim: the vector size.
  • -data: the dataset that you want to use. You can only choose YAGO39K in current vision.
  • -mix: it can be 1 or 0, 1 means doing this experiment in M-YAGO39K and 0 means doing this experiment in YAGO39K.

Cite

If you use the code, please cite this paper:

Xin Lv, Lei Hou, Juanzi Li, Zhiyuan Liu. Differentiating Concepts and Instances for Knowledge Graph Embedding. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).

transc's People

Watchers

Steven Du avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.