Diffbot Graph Learning

Use company and people entities extracted from Diffbot to build a heterogeneous subgraph of Diffbot's Knowledge Graph.

Diffbot Entity Extraction

This repo includes two options for querying the Diffbot API and downloading the entity responses:

  1. Breadth-first search (BFS) starting from a set of initial Diffbot entity URIs. This uses Diffbot's Knowledge Graph API.
  2. Diffbot's Enhance API, which matches an entity in Diffbot's Knowledge Graph given a name and/or URL.

The implementation relies on Python's asyncio for quickly sending requests and saving responses.
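The BFS option combined with asyncio can be pictured roughly as follows. This is a minimal sketch, not the repo's implementation: `FAKE_GRAPH`, `fetch_entity`, `bfs_extract`, and the `max_entities` budget are hypothetical names, and the network request is replaced by an in-memory lookup so the example runs without an API key.

```python
import asyncio

# Hypothetical in-memory stand-in for the knowledge graph: URI -> neighbor URIs.
FAKE_GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["A"],
}


async def fetch_entity(uri):
    """Placeholder for an API request; returns a fake entity record."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"uri": uri, "neighbors": FAKE_GRAPH.get(uri, [])}


async def bfs_extract(seed_uris, max_entities):
    """Level-by-level BFS; every URI in a frontier is fetched concurrently."""
    seen = set(seed_uris)
    frontier = list(seed_uris)
    entities = {}
    while frontier and len(entities) < max_entities:
        # Trim the frontier so the entity budget is never exceeded.
        frontier = frontier[: max_entities - len(entities)]
        responses = await asyncio.gather(*(fetch_entity(u) for u in frontier))
        next_frontier = []
        for entity in responses:
            entities[entity["uri"]] = entity
            for neighbor in entity["neighbors"]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return entities


result = asyncio.run(bfs_extract(["A"], max_entities=4))
print(sorted(result))  # ['A', 'B', 'C', 'D']
```

Fetching a whole frontier with `asyncio.gather` is what makes the crawl fast: requests within one BFS level do not depend on each other, so they can be in flight at the same time.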

To see how to use the BFS extraction and Enhance API scripts, use the -h option:

$ python main_extract_entities_bfs.py -h
$ python main_extract_entities_enhance_api.py -h

A demo for each of these extraction methods can be run as follows. An API key must be specified in each of the yaml files before running.

$ python main_extract_entities_bfs.py --config_file ./demo/extract_entities_bfs.yaml
$ python main_extract_entities_enhance_api.py --config_file ./demo/extract_entities_enhance.yaml 

The keys and values in the yaml config files can also be passed directly as command line arguments.
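One common way to support both a config file and command line arguments is to parse the arguments and layer them over the file values. The sketch below assumes that command line values take precedence, which the scripts may or may not do. The option names `--api_key` and `--max_entities` are hypothetical, and the dictionary `file_config` stands in for the parsed yaml file.

```python
import argparse


def parse_args(argv, file_config):
    """Merge command line values over values loaded from a config file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", default=None)
    # Hypothetical option names, used only for illustration.
    parser.add_argument("--api_key", default=None)
    parser.add_argument("--max_entities", type=int, default=None)
    cli = vars(parser.parse_args(argv))
    merged = dict(file_config)
    # Assumption: a value given on the command line overrides the file value.
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged


# Stand-in for the dictionary obtained by parsing a yaml config file.
file_config = {"api_key": "YOUR_KEY", "max_entities": 100}
config = parse_args(["--max_entities", "500"], file_config)
print(config)  # {'api_key': 'YOUR_KEY', 'max_entities': 500}
```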

Building Graph from Diffbot Downloaded Entities

Once the entities are downloaded from Diffbot, we can build a graph from the saved JSON files and save it as a gexf file. Building on the BFS entity extraction demo, run:

$ python main_build_gexf_graph.py --config_file ./demo/build_gexf_graph.yaml

We can change the node_filter method to build different graphs.
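To illustrate the idea of a node filter, the sketch below keeps only the entities accepted by a predicate and then keeps the edges whose endpoints both survive. The entity schema (`type`, `neighbors`), `build_graph`, and `only_organizations` are hypothetical, not the repo's actual data format or function names.

```python
def build_graph(entities, node_filter):
    """Return (nodes, edges), keeping only entities accepted by node_filter."""
    nodes = {uri: e for uri, e in entities.items() if node_filter(e)}
    edges = set()
    for uri, entity in nodes.items():
        for neighbor in entity.get("neighbors", []):
            if neighbor in nodes and neighbor != uri:
                # Store each undirected edge once, in sorted order.
                edges.add(tuple(sorted((uri, neighbor))))
    return nodes, sorted(edges)


# Hypothetical entity records, keyed by URI.
entities = {
    "org1": {"type": "Organization", "neighbors": ["p1", "org2"]},
    "org2": {"type": "Organization", "neighbors": ["org1"]},
    "p1": {"type": "Person", "neighbors": ["org1"]},
}


def only_organizations(entity):
    return entity["type"] == "Organization"


nodes, edges = build_graph(entities, only_organizations)
print(sorted(nodes), edges)  # ['org1', 'org2'] [('org1', 'org2')]
```

Swapping in a different predicate (for example, one that accepts every entity) yields a different graph from the same downloaded responses.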

Running Heterogeneous Graph Representation Models

This repo includes Deep Graph Library implementations of the following two models:

  1. Heterogeneous Relational Graph Convolutional Network (HRGCN). This model extends RGCN from Schlichtkrull et al.: Modeling Relational Data with Graph Convolutional Networks (ESWC 2018) to handle heterogeneous node types.
  2. Heterogeneous Graph Attention Network (HAN) from Wang et al.: Heterogeneous Graph Attention Network (WWW 2019).

There is also an MLP module that uses node features only, which serves as a sanity-check baseline.
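For reference, the layer update of the original RGCN, as given in the Schlichtkrull et al. paper, is shown below. How HRGCN changes this rule to handle heterogeneous node types is not described here, so this should be read as background on the base model rather than as the repo's exact formulation.

```latex
h_i^{(l+1)} = \sigma\left(
    \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r}
    \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)}
    + W_0^{(l)} h_i^{(l)}
\right)
```

Here $h_i^{(l)}$ is the hidden state of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under relation $r$, $c_{i,r}$ is a normalization constant such as $|\mathcal{N}_i^r|$, $W_r^{(l)}$ is a relation-specific weight matrix, and $W_0^{(l)}$ is the self-connection weight.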

Custom Diffbot Dataset

To use your own dataset created with the BFS extraction and graph-building scripts, make a folder with the name of the dataset in data/raw. Inside that folder, place the gexf file built by main_build_gexf_graph.py and name it graph.gexf.
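The expected layout can be sketched as a small path helper. The dataset name `my_dataset` and the function `dataset_graph_path` are hypothetical, and a temporary directory stands in for the repo root so the example has no side effects.

```python
import tempfile
from pathlib import Path


def dataset_graph_path(root, dataset_name):
    """Location where a custom dataset's graph file is expected."""
    return Path(root) / "data" / "raw" / dataset_name / "graph.gexf"


with tempfile.TemporaryDirectory() as root:
    target = dataset_graph_path(root, "my_dataset")
    target.parent.mkdir(parents=True)
    target.write_text("<gexf/>")  # stand-in for the real gexf file
    relative = target.relative_to(root).as_posix()
    exists = target.is_file()

print(relative, exists)  # data/raw/my_dataset/graph.gexf True
```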

Included Example

The repo includes a demo graph dataset in data/raw/top_100_VCs_BFS_20000_LCC. This graph was built using main_extract_entities_bfs.py and main_build_gexf_graph.py, with the top 100 venture capital investors and firms in 2019 as the seed nodes for breadth-first search, keeping only the largest connected component.
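Taking the largest connected component can be sketched in plain Python as below. This is an illustration under stated assumptions, not the repo's code: the graph is undirected, `adjacency` is a hypothetical dictionary that lists every node as a key, and each edge appears in both directions.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the node set of the largest connected component."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Explore the component containing `start` with a BFS.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Two components: {a, b, c} and {x, y}.
adjacency = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "x": {"y"}, "y": {"x"},
}
lcc = largest_connected_component(adjacency)
print(sorted(lcc))  # ['a', 'b', 'c']
```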

From this graph we can, for example, do node classification on the Diffbot categories of each organization.

Some examples of categories are Software Companies, Financial Services Companies, and Software As A Service Companies.

Running Experiment

Run python main.py -h to see the hyperparameter and experiment configurations. To quickly run an example:

$ python main.py ./demo/example_train_config_hrgcn.json

There are also train config examples for HAN and MLP.

Results on the demo graph with 5-fold cross-validation are as follows.

Diffbot Category                   Num Positive   Num Negative
Software Companies                 5140           6124
Financial Services Companies       2014           9250
Software As A Service Companies    685            10579

Model   Diffbot Category                   F1      ROC AUC   PR AUC
HAN     Software Companies                 0.683   0.752     0.710
HRGCN   Software Companies                 0.695   0.762     0.714
MLP     Software Companies                 0.644   0.623     0.546

HAN     Financial Services Companies       0.471   0.741     0.461
HRGCN   Financial Services Companies       0.487   0.745     0.462
MLP     Financial Services Companies       0.332   0.591     0.227

HAN     Software As A Service Companies    0.181   0.668     0.098
HRGCN   Software As A Service Companies    0.234   0.696     0.148
MLP     Software As A Service Companies    0.163   0.628     0.099
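When reading the PR AUC column, it can help to compare each score against the positive-class prevalence, since a classifier that ranks nodes at random is expected to reach a PR AUC roughly equal to that prevalence. The sketch below computes the prevalence from the positive and negative counts reported above. The comparison is an interpretive aid, not part of the original experiments.

```python
# (num positive, num negative) per category, copied from the counts above.
counts = {
    "Software Companies": (5140, 6124),
    "Financial Services Companies": (2014, 9250),
    "Software As A Service Companies": (685, 10579),
}

# Fraction of positive nodes: the approximate chance-level PR AUC.
prevalence = {name: pos / (pos + neg) for name, (pos, neg) in counts.items()}

for name, p in prevalence.items():
    print(f"{name}: {p:.3f}")
# Software Companies: 0.456
# Financial Services Companies: 0.179
# Software As A Service Companies: 0.061
```

The Software As A Service category has the lowest prevalence, about 0.061, so its PR AUC values (0.098 to 0.148) sit on a much lower chance-level baseline than those of the other two categories.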
