KGTOSA: Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling.

Accepted at ICDE 2024

Hussein Abdallah, Waleed Afandi, Panos Kalnis, and Essam Mansour
Contact: Hussein Abdallah ([email protected])

Latest version of the paper.

Abstract: A Knowledge Graph (KG) is a heterogeneous graph encompassing a diverse range of node and edge types. Heterogeneous Graph Neural Networks (HGNNs) are popular for training machine learning tasks like node classification and link prediction on KGs. However, HGNN methods exhibit excessive complexity influenced by the KG’s size, density, and the number of node and edge types. AI practitioners handcraft a subgraph of a KG G relevant to a specific task. We refer to this subgraph as a task-oriented subgraph (TOSG), which contains a subset of task-related node and edge types in G. Training the task using TOSG instead of G alleviates the excessive computation required for a large KG. Crafting the TOSG demands a deep understanding of the KG’s structure and the task’s objectives. Hence, it is challenging and time-consuming. This paper proposes KG-TOSA, an approach to automate the TOSG extraction for task-oriented HGNN training on a large KG. In KG-TOSA, we define a generic graph pattern that captures the KG’s local and global structure relevant to a specific task. We explore different techniques to extract subgraphs matching our graph pattern, namely (i) two techniques sampling around targeted nodes using biased random walk or influence scores, and (ii) a SPARQL-based extraction method leveraging RDF engines’ built-in indices. The SPARQL-based method therefore achieves negligible preprocessing overhead compared to the sampling techniques. We develop a benchmark of real KGs of large sizes and various tasks for node classification and link prediction. Our experiments show that KG-TOSA helps state-of-the-art HGNN methods reduce training time and memory usage by up to 70% while improving the model performance, e.g., accuracy and inference time.

Fig.1: The TOSG’s generic graph pattern is based on two parameters: (i) the direction of the predicates (outgoing and incoming), and (ii) the number of hops.
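
To make the two parameters concrete, here is a minimal sketch of a direction/hop-bounded extraction over an in-memory list of triples. It is not the KGTOSA implementation. The function name `extract_tosg`, the toy triples, and the reading of `d` (1 = outgoing predicates only, 2 = outgoing and incoming) are assumptions made for illustration.

```python
from collections import defaultdict


def extract_tosg(triples, targets, d=1, h=1):
    """Return the set of triples within `h` hops of the target nodes.

    Assumed reading of the parameters (not confirmed by this README):
    d=1 follows outgoing predicates only, d=2 follows outgoing and incoming.
    """
    out_edges = defaultdict(list)  # subject -> triples leaving it
    in_edges = defaultdict(list)   # object  -> triples entering it
    for s, p, o in triples:
        out_edges[s].append((s, p, o))
        in_edges[o].append((s, p, o))

    selected = set()
    frontier = set(targets)
    visited = set(targets)
    for _ in range(h):
        next_frontier = set()
        for node in frontier:
            for s, p, o in out_edges[node]:
                selected.add((s, p, o))
                next_frontier.add(o)
            if d == 2:
                for s, p, o in in_edges[node]:
                    selected.add((s, p, o))
                    next_frontier.add(s)
        # Only expand nodes that have not been expanded in an earlier hop.
        frontier = next_frontier - visited
        visited |= frontier
    return selected


# Hypothetical toy KG, used only to exercise the sketch.
toy_kg = [
    ("paper1", "publishedIn", "venueA"),
    ("paper1", "hasAuthor", "author1"),
    ("author1", "affiliatedWith", "org1"),
    ("paper2", "cites", "paper1"),
    ("org2", "locatedIn", "city1"),
]

d1h1 = extract_tosg(toy_kg, ["paper1"], d=1, h=1)
d2h1 = extract_tosg(toy_kg, ["paper1"], d=2, h=1)
d1h2 = extract_tosg(toy_kg, ["paper1"], d=1, h=2)
print(len(d1h1), len(d2h1), len(d1h2))
```

On this toy graph, `d=1, h=1` keeps only the two triples leaving `paper1`, `d=2, h=1` also keeps the incoming `cites` triple, and `d=1, h=2` reaches `org1` through `author1`. The unrelated `org2` triple is never selected.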

KGTOSA is the HGNN sampler used by the KGNet system (published at ICDE 2023).

Installation

  • Clone the KGTOSA repo.
  • Create the KGTOSA Conda environment (Python 3.8) and install the pip requirements.
  • Activate the KGTOSA environment:
conda activate KGTOSA

KGTOSA and Full-graph Datasets

These datasets are extracted from the knowledge graphs using SPARQL queries and transformed into the PyG dataloader format. The d1h1 datasets are extracted using KGTOSA Algorithm 3.

Download the ready datasets below

Download KGTOSA NC datasets

  • MAG_42M_PV_FG
  • MAG_42M_PV_d1h1
  • DBLP-15M_PV_FG
  • DBLP-15M_PV_d1h1
  • YAGO4-30M_PC_FG
  • YAGO4-30M_PC_d1h1

Download KGTOSA LP datasets

  • YAGO3-10_FG_d2h1
  • WikiKG2_FG_d2h1
  • DBLP2023-010305_FG_d2h1

OR

Extract and transform the dataset triples:

1. Node Classification
python -u TOSG_Extraction/TOSG_Extraction_NC.py --sparql_endpoint http://206.12.98.118:8890/sparql --graph_uri http://dblp.org --target_rel_uri https://dblp.org/rdf/schema#publishedIn --TOSG d1h1 --batch_size 1000000 --out_file DBLP-15M_PV --threads_count 32
2. Link Prediction
python -u TOSG_Extraction/TOSG_Extraction_LP.py --target_rel_uri=isConnectedTo --data_path=<path> --dataset=YAGO3-10 --TOSG=d1h1 --file_sep=tab
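
As a rough illustration of how a SPARQL-based extraction can be split into batches, the sketch below builds paged queries from the same kind of inputs the node classification command takes (graph URI, target relation URI, batch size). The query shape, the helper names `build_d1h1_query` and `batch_queries`, and the use of `LIMIT`/`OFFSET` paging are assumptions. The queries actually issued by `TOSG_Extraction_NC.py` may differ.

```python
def build_d1h1_query(graph_uri, target_rel_uri, batch_size, offset):
    """Build one page of a query returning the outgoing triples of target nodes.

    Assumption: a target node is any subject of the target relation, and
    "d1h1" means its outgoing triples at one hop.
    """
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  ?s <{target_rel_uri}> ?label .\n"
        "  ?s ?p ?o .\n"
        "}\n"
        f"LIMIT {batch_size} OFFSET {offset}"
    )


def batch_queries(graph_uri, target_rel_uri, batch_size, total_rows):
    """Return one query per batch, covering `total_rows` result rows."""
    return [
        build_d1h1_query(graph_uri, target_rel_uri, batch_size, offset)
        for offset in range(0, total_rows, batch_size)
    ]


# Values taken from the command above; total_rows is a made-up figure.
queries = batch_queries(
    "http://dblp.org",
    "https://dblp.org/rdf/schema#publishedIn",
    batch_size=1_000_000,
    total_rows=2_500_000,
)
print(len(queries))
print(queries[-1])
```

Each query could then be sent to the endpoint by a separate worker thread. Note that without an `ORDER BY` clause, SPARQL does not guarantee a stable row order across pages, so a real implementation would need to account for that.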

Transform the NC TOSG dataset into a PyG dataset:

python -u DatasetTransformer/TSV_TO_PYG_dataset.py --traget_node_type=Paper --target_rel=publishedIn --csv_path=<path> --dataset_name=DBLP-15M_PV_d1h1 --file_sep=tab --split_rel=publish_year
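
The core of such a transformation is mapping string triples to integer node IDs and per-relation edge lists. The sketch below shows that step with the standard library only. It is not the logic of `TSV_TO_PYG_dataset.py`: the function name `triples_to_edge_index`, the single global ID space, and the sample triples are assumptions. The output mirrors the `[2, num_edges]` source/target layout that PyG uses for `edge_index`.

```python
import csv
import io
from collections import defaultdict


def triples_to_edge_index(tsv_text):
    """Map tab-separated (subject, predicate, object) rows to integer edges.

    Returns (node_ids, edge_index) where edge_index[relation] is a pair of
    parallel lists [sources, targets].
    """
    node_ids = {}
    edge_index = defaultdict(lambda: [[], []])
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        s, p, o = row
        src = node_ids.setdefault(s, len(node_ids))
        dst = node_ids.setdefault(o, len(node_ids))
        edge_index[p][0].append(src)
        edge_index[p][1].append(dst)
    return node_ids, dict(edge_index)


# Hypothetical sample rows in subject<TAB>predicate<TAB>object form.
sample = (
    "paper1\tpublishedIn\tvenueA\n"
    "paper1\thasAuthor\tauthor1\n"
    "paper2\tpublishedIn\tvenueA\n"
)
node_ids, edge_index = triples_to_edge_index(sample)
print(node_ids)
print(edge_index)
```

With PyTorch available, each pair of lists could be wrapped as `torch.tensor(pair, dtype=torch.long)` to obtain a tensor of shape `[2, num_edges]`. A heterogeneous dataset would additionally need a node type per node, which this sketch does not infer.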

Train your model:

1. Node Classification
# run RGCN
python rgcn-KGTOSA.py --Dataset <DatasetPath>
# run GraphSaint
python graph_saint_KGTOSA.py --Dataset <DatasetPath>
# run ShaDowSaint
python graph_saint_Shadow_KGTOSA.py --Dataset <DatasetPath>
# run SeHGNN
python SeHGNN/ogbn/main.py --Dataset <DatasetPath>
# run IBS
python IBS/run_ogbn_ppr.py --with config/<Config_path>
2. Link Prediction
Extract the dataset folder into the data folder under each method's path.
# run RGCN
python RGCN/main.py --Dataset <DatasetName> --TargetRel <target_rel>
# run MorsE
python Morse/main.py --dataset <DatasetName> --TargetRel <target_rel>
# run LHGNN
python LHGNN/main.py --dataset <DatasetName> --TargetRel <target_rel>
Citing Our Work

If you find our work useful, please cite it in your research:

@inproceedings{KGTOSA,
  title={Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling},
  author={Abdallah, Hussein and Afandi, Waleed and Kalnis, Panos and Mansour, Essam},
  booktitle={2024 IEEE 40th International Conference on Data Engineering (ICDE)},
  year={2024}
}
