KGTK: Knowledge Graph Toolkit

The Knowledge Graph Toolkit (KGTK) is a comprehensive framework for the creation and exploitation of large hyper-relational knowledge graphs (KGs), designed for ease of use, scalability, and speed. KGTK represents KGs in tab-separated (TSV) files with four columns: edge-identifier, head, edge-label, and tail. All KGTK commands consume and produce KGs represented in this simple format, so they can be composed into pipelines to perform complex transformations on KGs. KGTK provides:

  • a suite of import commands that convert Wikidata, RDF, and popular graph representations into KGTK format;
  • a rich collection of transformation commands that make it easy to clean, union, filter, and sort KGs;
  • graph combination commands that support efficient intersection, subtraction, and joining of large KGs;
  • a query language, a variant of Cypher optimized for KGs stored on disk, that supports efficient ad hoc queries;
  • graph analytics commands that support scalable computation of centrality metrics such as PageRank and degree, as well as connected components and shortest paths;
  • advanced commands that support lexicalization of graph nodes and computation of multiple variants of text and graph embeddings over the whole graph;
  • a suite of export commands that transform KGTK KGs into commonly used formats, including the Wikidata JSON format, RDF triples, JSON documents for Elasticsearch indexing, and graph-tool;
  • a development environment based on Jupyter notebooks that provides seamless integration with Pandas.
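
To make the four-column edge format described above concrete, here is a minimal Python sketch that parses a small tab-separated edge file and filters it by edge label, using only the standard library. The column names `id`, `node1`, `label`, and `node2` and the sample rows are assumptions made for illustration; the helper functions `read_edges` and `filter_by_label` are hypothetical and are not part of KGTK.

```python
import csv
import io

# Hypothetical sample data. The column names (id, node1, label, node2) are an
# assumption standing in for edge-identifier, head, edge-label, and tail.
SAMPLE = (
    "id\tnode1\tlabel\tnode2\n"
    "e1\tQ42\tP31\tQ5\n"
    "e2\tQ42\tP106\tQ36180\n"
    "e3\tQ5\tP279\tQ215627\n"
)


def read_edges(text):
    """Parse tab-separated edge rows into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def filter_by_label(edges, label):
    """Keep only the edges whose label column equals `label`."""
    return [edge for edge in edges if edge["label"] == label]


edges = read_edges(SAMPLE)
instance_of = filter_by_label(edges, "P31")
for edge in instance_of:
    print(edge["id"], edge["node1"], edge["label"], edge["node2"])
```

Because the input and the output share the same row shape, a step like `filter_by_label` can feed another step directly, which is the property that lets commands over this format be chained into pipelines.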

KGTK can process Wikidata-sized KGs with billions of edges on a laptop. We have used KGTK in multiple use cases, focusing primarily on construction of subgraphs of Wikidata, analysis of over 300 Wikidata dumps since the inception of the Wikidata project, linking tables to Wikidata, construction of a commonsense KG combining multiple existing sources, and creation of Wikidata extensions for food security and the pharmaceutical industry.

KGTK is open source software, well documented, actively used and developed, and released under the MIT license. We invite the community to try KGTK. It is easy to get started with our tutorial notebooks, which are available and executable online.

Installation

The following instructions install KGTK and the KGTK Jupyter Notebooks on Linux and macOS systems.

If you want to install KGTK on a Microsoft Windows system, please
contact the KGTK team.

Our KGTK installations use a Conda virtual environment. If you don't have the Conda tools installed, follow this guide to install them. We recommend installing Miniconda rather than the full Anaconda distribution.

Next, execute the following steps to install the latest stable release of KGTK:

conda create -n kgtk-env python=3.9
conda activate kgtk-env
conda install -c conda-forge graph-tool
conda install -c conda-forge jupyterlab
pip --no-cache install -U kgtk

Please see our installation document for more details. If you encounter problems with your installation, or are interested in a detailed explanation of these commands, read more about the installation procedure here.
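
After the steps above, one quick way to confirm that the package landed in the active environment is to ask Python for the installed distribution's version. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the check only reports whether a distribution named `kgtk` is visible to the current interpreter, not whether every optional dependency works.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this inside the activated kgtk-env environment.
version = installed_version("kgtk")
if version is None:
    print("kgtk is not installed in this environment")
else:
    print(f"kgtk {version} is installed")
```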

Installation issues on MacBooks with the M1 chip

Running `pip install -e .` (development mode) throws errors for three libraries:

  1. thinc
  2. blis
  3. tokenizers

We fixed the thinc issue by:

a. commenting out [this line in requirements.txt](https://github.com/usc-isi-i2/kgtk/blob/dev/requirements.txt#L11)

b. running `pip install thinc-apple-ops`

We fixed the tokenizers issue by running the following commands in the conda environment:

# Download and install Rust, following the on-screen instructions

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python/
pip install setuptools_rust
python setup.py install

Then continue installing KGTK by running `pip install -e .` again.
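
Before applying the workarounds above, it can help to confirm that the Python interpreter in the conda environment is actually running natively on Apple Silicon, since an x86_64 interpreter running under Rosetta reports a different architecture. The sketch below uses the standard library's `platform` module; the helper name `is_apple_silicon` is hypothetical, and treating `Darwin` plus `arm64` as the M1 case is an assumption about how the interpreter reports itself.

```python
import platform


def is_apple_silicon(system, machine):
    """Return True when the reported OS is macOS (Darwin) and the CPU is arm64."""
    return system == "Darwin" and machine == "arm64"


# Report on the interpreter that is currently running.
if is_apple_silicon(platform.system(), platform.machine()):
    print("Native Apple Silicon Python: the M1 notes above may apply")
else:
    print("Not a native Apple Silicon Python")
```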

Installing KGTK with Docker

Please refer to this document for instructions on installing KGTK with Docker.

Getting started

Online Documentation

You can read our latest documentation online at:

https://kgtk.readthedocs.io/en/latest/

KGTK Notebooks

For examples of using KGTK, please see our Tutorial Notebooks.

Releases

KGTK Text Search API

The documentation for the KGTK Text Search API is here

KGTK Semantic Similarity API

The documentation for the KGTK Semantic Similarity API is here

How to cite

@inproceedings{ilievski2020kgtk,
  title={{KGTK}: A Toolkit for Large Knowledge Graph Manipulation and Analysis},
  author={Ilievski, Filip and Garijo, Daniel and Chalupsky, Hans and Divvala, Naren Teja and Yao, Yixiang and Rogers, Craig and Li, Rongpeng and Liu, Jun and Singh, Amandeep and Schwabe, Daniel and Szekely, Pedro},
  booktitle={International Semantic Web Conference},
  pages={278--293},
  year={2020},
  organization={Springer},
  url={https://arxiv.org/pdf/2006.00088.pdf}
}
