OpenBG500

Information

OpenBG500 is an open Chinese e-commerce and business knowledge graph dataset containing 500 relations. It is refined from OpenBG, a million-scale multi-modal dataset covering products and consumption demands in a unified schema. AliOpenKG500 is developed for evaluating knowledge graph embedding methods.

The data is split into three parts: a training set, a validation set, and a test set. Basic statistics are shown in the table below.

#Relation   #Entity    #Train (opened)   #Valid (opened)   #Test
500         249,743    1,242,550         5,000             5,000

Data

OpenBG500 is available on Google Drive and Baidu Netdisk (password: 78fw). The main directory of the dataset is structured as follows.

OpenBG500
├── OpenBG500_train.tsv          # Training set
├── OpenBG500_dev.tsv            # Validation set
├── OpenBG500_test.tsv           # Test set
├── OpenBG500_entity2text.tsv    # Descriptions of entities in Chinese
├── OpenBG500_relation2text.tsv  # Descriptions of relations in Chinese
└── OpenBG500_example_pred.tsv   # Example submission

Usage

Format

  • Triples
# OpenBG500_train.tsv/OpenBG500_dev.tsv
Head<\t>Relation<\t>Tail<\n>
  • Descriptions of entities/relations in Chinese
# OpenBG500_entity2text.tsv/OpenBG500_relation2text.tsv
Entity(Relation)<\t>Description of entity(relation)<\n>
  • Test and submission
# OpenBG500_test.tsv: participants must predict 10 tails for each instance.
Head<\t>Relation<\n>

# OpenBG500_example_pred.tsv: an example submission.
Head<\t>Relation<\t>Tail 1<\t>Tail 2<\t>...<\t>Tail 10<\n>
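
The submission layout above can be sketched in Python as follows. This is a minimal illustration, not the official submission tooling: the helper names `format_prediction` and `write_predictions` are hypothetical, and the tail IDs in the example are placeholders rather than real predictions.

```python
def format_prediction(head, relation, tails, k=10):
    """Build one submission line: Head<\t>Relation<\t>Tail 1 ... Tail k<\n>."""
    if len(tails) != k:
        raise ValueError(f"expected {k} tails, got {len(tails)}")
    return "\t".join([head, relation, *tails]) + "\n"


def write_predictions(predictions, path, k=10):
    """Write an iterable of (head, relation, tails) tuples to a TSV file."""
    with open(path, "w", encoding="utf-8") as fp:
        for head, relation, tails in predictions:
            fp.write(format_prediction(head, relation, tails, k))


# Placeholder tail IDs, used only to show the shape of one line.
example_tails = [f"ent_{i:06d}" for i in range(1, 11)]
line = format_prediction("ent_135492", "rel_0352", example_tails)
print(line.count("\t"))  # 11 tab separators: head, relation and 10 tails
```

Raising an error when a row does not contain exactly 10 tails catches malformed rows before upload; whether the evaluation server itself rejects such rows is not stated here.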

Check the data

$ head -n 3 OpenBG500_train.tsv
ent_135492      rel_0352        ent_015651
ent_020765      rel_0448        ent_214183
ent_106905      rel_0418        ent_121073
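
For a slightly deeper check than `head`, the sketch below counts triples, distinct entities, and distinct relations, and fails on any line that does not have exactly three tab-separated fields. The function name `summarize_triples` is hypothetical; the sample lines are the three shown above.

```python
def summarize_triples(lines):
    """Count triples, distinct entities and distinct relations in TSV lines."""
    entities, relations = set(), set()
    n_triples = 0
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        head, relation, tail = fields
        entities.update((head, tail))
        relations.add(relation)
        n_triples += 1
    return {"triples": n_triples, "entities": len(entities), "relations": len(relations)}


sample = [
    "ent_135492\trel_0352\tent_015651\n",
    "ent_020765\trel_0448\tent_214183\n",
    "ent_106905\trel_0418\tent_121073\n",
]
print(summarize_triples(sample))
# {'triples': 3, 'entities': 6, 'relations': 3}
```

The same function accepts an open file, for example `summarize_triples(open('OpenBG500_train.tsv', encoding='utf-8'))`. Counts taken from the training file alone may differ from the statistics table if some entities appear only in the validation or test sets.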

Read the datasets

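  1. Read the original data:
# UTF-8 is specified explicitly so that all files are read the same way on every platform.
with open('OpenBG500_train.tsv', 'r', encoding='utf-8') as fp:
    # Each line holds one triple: Head<\t>Relation<\t>Tail
    train = [line.rstrip('\n').split('\t') for line in fp]

for triple in train[:2]:
    print(triple)
# ['ent_135492', 'rel_0352', 'ent_015651']
# ['ent_020765', 'rel_0448', 'ent_214183']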
  2. Build the Entity(Relation)-to-Description maps, ent2text and rel2text:
# The descriptions are Chinese text, so the files are read as UTF-8.
with open('OpenBG500_entity2text.tsv', 'r', encoding='utf-8') as fp:
    lines = [line.rstrip('\n').split('\t') for line in fp]

for line in lines[:2]:
    print(line)
# ['ent_101705', '短袖T恤']
# ['ent_116070', '套装']

# Map each entity ID to its description.
ent2text = {line[0]: line[1] for line in lines}

with open('OpenBG500_relation2text.tsv', 'r', encoding='utf-8') as fp:
    lines = [line.rstrip('\n').split('\t') for line in fp]

for line in lines[:2]:
    print(line)
# ['rel_0418', '细分市场']
# ['rel_0290', '关联场景']

# Map each relation ID to its description.
rel2text = {line[0]: line[1] for line in lines}
  3. Convert the triples from IDs to descriptions:
# Replace each ID with its Chinese description.
train = [[ent2text[h], rel2text[r], ent2text[t]] for h, r, t in train]

for triple in train[:2]:
    print(triple)
# ['苦荞茶', '外部材质', '苦荞麦']
# ['精品三姐妹硬糕', '口味', '原味硬糕850克【10包40块糕】']
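
The direct dictionary lookups in step 3 raise a `KeyError` if an ID has no description. A tolerant variant might look like the sketch below, which falls back to the raw ID. The helper name `verbalize` is hypothetical, the two small dictionaries reuse entries printed in step 2, and the example triple is made up for illustration.

```python
def verbalize(triple, ent2text, rel2text):
    """Replace IDs with descriptions, keeping the raw ID when none is known."""
    head, relation, tail = triple
    return [
        ent2text.get(head, head),
        rel2text.get(relation, relation),
        ent2text.get(tail, tail),
    ]


# Tiny stand-ins for the full maps built in step 2.
ent2text_demo = {"ent_101705": "短袖T恤", "ent_116070": "套装"}
rel2text_demo = {"rel_0418": "细分市场"}

# A made-up triple whose tail has no description in the demo map.
print(verbalize(["ent_101705", "rel_0418", "ent_999999"], ent2text_demo, rel2text_demo))
# ['短袖T恤', '细分市场', 'ent_999999']
```

Whether the released description files cover every ID is not stated here, so the fallback is only a precaution.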

Submit on Alibaba TIANCHI

OpenBG Benchmark: Large Scale Open Business Knowledge Graph Benchmark is a long-running open benchmark. You are welcome to submit your OpenBG500 results.

Baseline results

We ran several baseline methods on this dataset. The TransE, DistMult, and ComplEx results are based on the OpenKE toolkit; the KG-BERT and GenKGC results are based on our own code.

Method      Hits@1   Hits@3   Hits@10
TransE      0.207    0.340    0.531
DistMult    0.049    0.088    0.216
ComplEx     0.053    0.120    0.266
KG-BERT     0.023    0.049    0.241
GenKGC      0.203    0.280    0.351
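
For readers unfamiliar with the metric, Hits@k is commonly defined as the fraction of test instances whose correct tail appears among the top k ranked predictions. The sketch below follows that common definition. The exact evaluation protocol used for the numbers above (for example, whether other known tails are filtered out) is not stated here, and the function name and toy data are hypothetical.

```python
def hits_at_k(ranked_tails, gold_tails, k):
    """Fraction of instances whose gold tail is within the top-k predictions."""
    if len(ranked_tails) != len(gold_tails):
        raise ValueError("ranked_tails and gold_tails must have the same length")
    if not gold_tails:
        return 0.0
    hits = sum(1 for preds, gold in zip(ranked_tails, gold_tails) if gold in preds[:k])
    return hits / len(gold_tails)


# Toy data: four instances, each with three ranked predictions.
predictions = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"], ["m", "n", "o"]]
gold = ["a", "y", "r", "w"]

print(hits_at_k(predictions, gold, 1))  # 0.25
print(hits_at_k(predictions, gold, 3))  # 0.75
```

With 10 predicted tails per test instance, as in the submission format, the same function covers Hits@1, Hits@3, and Hits@10.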

Contributors

cheasim, timelordri, zxlzr
