KGist: Knowledge Graph Summarization for Anomaly Detection & Completion

Caleb Belth, Xinyi Zheng, Jilles Vreeken, and Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. ACM The Web Conference (WWW), April 2020. [Link to the paper]

If used, please cite:

@inproceedings{belth2020normal,
  title={What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization},
  author={Belth, Caleb and Zheng, Xinyi and Vreeken, Jilles and Koutra, Danai},
  booktitle={Proceedings of The Web Conference 2020},
  pages={1115--1126},
  year={2020}
}

Presentation: https://youtu.be/Ql7VEfliPXo

Setup

  1. git clone git@github.com:GemsLab/KGist.git
  2. cd data/
  3. unzip nell.zip
  4. unzip dbpedia.zip
  5. cd ../src/
  6. cd test/
  7. python tester.py

Requirements

  • Python 3
  • numpy
  • scipy
  • networkx

Data

NELL and DBpedia are zipped in the data/ directory. YAGO is too large to distribute via GitHub.

{KG_name}.txt format: space separated, one triple per line.

s1 p1 o1
s2 p2 o2
...

{KG_name}_labels.txt format: space separated, one entity per line followed by a variable number of labels, also space separated.

e1 l1 l2 ...
e2 l1 l2 l3 ...
...
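
The two formats above can be read with a few lines of Python. This is a minimal sketch for inspecting or validating your own files, not the loader KGist uses internally; the function names parse_triples and parse_labels are hypothetical.

```python
from collections import defaultdict


def parse_triples(lines):
    """Parse space-separated 's p o' lines into (s, p, o) tuples."""
    triples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        triples.append(tuple(parts))
    return triples


def parse_labels(lines):
    """Parse 'entity l1 l2 ...' lines into {entity: set of labels}."""
    labels = defaultdict(set)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        entity, *entity_labels = parts
        labels[entity].update(entity_labels)
    return dict(labels)


# Toy input mirroring the format examples above.
triples = parse_triples(["s1 p1 o1", "s2 p2 o2", ""])
labels = parse_labels(["e1 l1 l2", "e2 l1 l2 l3"])
```

To read real files, pass an open file handle instead of a list, e.g. parse_triples(open('../data/nell.txt')) from the src/ directory. Because fields are split on whitespace, this sketch assumes entity, predicate, and label names contain no spaces.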

Example usage (from src/ dir)

Command Line

python main.py --graph nell

Interface

from graph import Graph
from searcher import Searcher
from model import Model

# load graph
graph = Graph('nell', idify=True)
# create a Searcher object to search for a model (set of rules)
searcher = Searcher(graph)
# build initial model
model = searcher.build_model()
model.print_stats()
# perform rule merging refinement
model = model.merge_rules()
model.print_stats()
# perform rule nesting refinement
model = model.nest_rules()
model.print_stats()

To compute anomaly scores for triples as in Section 4.3:

from anomaly_detector import AnomalyDetector

# construct an anomaly detector with the KGist model
anomaly_detector = AnomalyDetector(model)
# an edge/triple to score
edge = ('concept:company:limited_brands', 'concept:companyceo', 'concept:ceo:leslie_wexner')
anomaly_detector.score_edge(edge)  # returns 26.5164

Larger scores mean more anomalous. Note that the experiments in Section 5.2 used KGist+m, i.e., the model with rule merging but without running model.nest_rules().
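
Since larger scores mean more anomalous, a common next step is to rank many triples by score. The sketch below shows one way to do that; rank_by_anomaly is a hypothetical helper, and the toy scorer stands in for anomaly_detector.score_edge so the example runs without a trained model.

```python
def rank_by_anomaly(edges, score_fn, top_k=None):
    """Return (score, edge) pairs sorted from most to least anomalous."""
    scored = [(score_fn(edge), edge) for edge in edges]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Stand-in scores (made up for illustration, not KGist output).
toy_scores = {
    ("a", "p", "b"): 3.0,
    ("c", "p", "d"): 26.5,
    ("e", "q", "f"): 11.2,
}
ranked = rank_by_anomaly(list(toy_scores), toy_scores.get, top_k=2)
```

With a real model, pass anomaly_detector.score_edge as score_fn and the triples of interest as edges.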

Arguments

--graph {KG_name} Expects {KG_name}.txt and {KG_name}_labels.txt to be in the data/ directory, in the format described above for NELL and DBpedia.

--rule_merging / -Rm True/False (Optional; Default = False) Use rule merging refinement (Section 4.2.2)

--rule_nesting / -Rn True/False (Optional; Default = False) Use rule nesting refinement (Section 4.2.2)

--idify / -i True/False (Optional; Default = True) Convert entities and predicates to integer ids internally for faster processing

--verbosity / -v [0, infinity) (Optional; Default = 1,000,000) How frequently to log progress (use integers)

--output_path / -o (Optional; Default = 'output/') What directory to write the output to (log will still be printed to stdout)

Output

  • output/{KG_name}_model.pickle saves a Model object.
  • output/{KG_name}_model.rules saves the rules, which are recursively defined, in parenthetical form.
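
A saved model can be read back with Python's standard pickle module. The sketch below is an assumption about typical usage rather than a documented KGist workflow: load_pickled_model is a hypothetical helper, and the round trip uses a stand-in dict so it runs anywhere.

```python
import os
import pickle
import tempfile


def load_pickled_model(path):
    """Load whatever object was pickled at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo with a stand-in object instead of a real Model.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_model.pickle")
    with open(demo_path, "wb") as f:
        pickle.dump({"rules": ["r1", "r2"]}, f)
    restored = load_pickled_model(demo_path)
```

To load a real output/{KG_name}_model.pickle, the Model class must be importable, so run this from the src/ directory. Only unpickle files you trust, since pickle can execute arbitrary code on load.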

Frequently Asked Questions (FAQ)

I want to run KGist on my own dataset. How did you construct the labels file?

We constructed the labels file by moving the rdf:type triples to the labels file. Thus, if, for example, there are triples (LaRose, rdf:type, book) and (LaRose, rdf:type, novel) in the KG, then LaRose book novel would be a row in the labels file.
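
The procedure described above can be sketched as follows. This is an illustrative reconstruction, not the script we used: split_type_triples is a hypothetical helper, and the non-type triple in the example is made up.

```python
from collections import defaultdict


def split_type_triples(triples, type_predicate="rdf:type"):
    """Move type triples out of the KG and into label rows.

    Returns (edges, label_rows): edges are the remaining triples, and
    each label row is 'entity l1 l2 ...' as in {KG_name}_labels.txt.
    """
    edges = []
    labels = defaultdict(list)
    for s, p, o in triples:
        if p == type_predicate:
            if o not in labels[s]:  # avoid duplicate labels
                labels[s].append(o)
        else:
            edges.append((s, p, o))
    label_rows = [
        " ".join([entity, *entity_labels])
        for entity, entity_labels in labels.items()
    ]
    return edges, label_rows


kg = [
    ("LaRose", "rdf:type", "book"),
    ("LaRose", "rdf:type", "novel"),
    ("LaRose", "author", "some_author"),  # hypothetical non-type triple
]
edges, label_rows = split_type_triples(kg)
```

Writing " ".join(edge) for each remaining edge to {KG_name}.txt and each label row to {KG_name}_labels.txt yields the two input files in the format described under Data.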

Comments or Questions

Contact Caleb Belth with comments or questions: [email protected]
