Giter Club home page Giter Club logo

benedekrozemberczki / feather Goto Github PK

View Code? Open in Web Editor NEW
43.0 3.0 12.0 14.6 MB

The reference implementation of FEATHER from the CIKM '20 paper "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models".

License: MIT License

Python 100.00%
node-embedding graph-embedding network-embedding deep-learning node-classification graph-classification machine-learning data-mining networkx graph-convolution

feather's Introduction

FEATHER

Arxiv codebeat badge repo size benedekrozemberczki

The Python reference implementation of FEATHER and FEATHER-G from the CIKM '20 paper Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models.


Abstract

In this paper, we propose a flexible notion of characteristic functions defined on graph vertices to describe the distribution of vertex features at multiple scales. We introduce FEATHER, a computationally efficient algorithm to calculate a specific variant of these characteristic functions where the probability weights of the characteristic function are defined as the transition probabilities of random walks. We argue that features extracted by this procedure are useful for node level machine learning tasks. We discuss the pooling of these node representations, resulting in compact descriptors of graphs that can serve as features for graph classification algorithms. We analytically prove that FEATHER describes isomorphic graphs with the same representation and exhibits robustness to data corruption. Using the node feature characteristic functions we define parametric models where evaluation points of the functions are learned parameters of supervised classifiers. Experiments on real world large datasets show that our proposed algorithm creates high quality representations, performs transfer learning efficiently, exhibits robustness to hyperparameter changes, and scales linearly with the input size.

This repository provides the reference implementation for FEATHER as described in the paper:

Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. Benedek Rozemberczki and Rik Sarkar. CIKM, 2020.

The datasets are also available on SNAP.

The model is now also available in the package Karate Club.

Table of Contents

  1. Citing
  2. Requirements
  3. Options
  4. Examples

Citing

If you find FEATHER useful in your research, please consider citing the following paper:

@inproceedings{feather,
               title={{Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models}},
               author={Benedek Rozemberczki and Rik Sarkar},
               year={2020},
	       pages = {1325–1334},
	       booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)},
	       organization={ACM},
}

Requirements

The codebase is implemented in Python 3.5.2. package versions used for development are just below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0

Input

Node level

The code takes an input graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0.

The feature matrix is dense it is assumed that it is stored as csv with comma separators. It has a header, and rows are separated by node identifiers (increasing). It should look like this:

Feature 1 Feature 2 Feature 3 Feature 4
3 0 1.37 1
1 1 2.54 -11
2 0 1.08 -12
1 1 1.22 -4
... ... ... ...
5 0 2.47 21

Graph level

The graphs are stored in a JSON file where keys are graph identifiers and values are edge lists. Graph identifiers are consecutive and start with 0. Each individual graph has nodes which are indexed starting with 0. We assume that graphs are connected.

{ 0: [[0, 1], [1, 2], [2, 3]],
  1: [[0, 1], [1, 2], [2, 0]],
  ...
  n: [[0, 1], [1, 2]]}

Options

Learning the embedding is handled by the src/main.py script which provides the following command line arguments.

Input and output options

  --graph-input      STR   Input edge list csv.      Default is `input/edges/ER_edges.csv`.
  --feature-input    STR   Input features csv.       Default is `input/features/ER_features.csv`.
  --graphs           STR   Input graphs json.        Default is `input/graphs/ER_graphs.json`.
  --output           STR   Embedding output path.    Default is `output/ER_node_embedding.csv`.

Model options

  --model-type    STR      FEATHER or FEATHER-G model.          Default is  `FEATHER`.
  --eval-points   INT      Number of evaluation points.         Default is  25.
  --order         INT      Matrix powers approximated.          Default is  5.
  --theta-max     FLOAT    Length of random walk per source.    Default is  2.5.

Examples

Training a FEATHER model.

$ python src/main.py

Changing the scale parameter to increase adjacency matrix powers.

$ python src/main.py --order 3

Decreasing the number of evaluation points.

$ python src/main.py --eval-points 25

Training a graph level FEATHER model with the default dataset.

$ python src/main.py --model-type FEATHER-G --output output/ER_graph_embedding.csv

License


feather's People

Contributors

benedekrozemberczki avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

feather's Issues

Question about the LastfmAsia and DeezerEurope datasets

I have a question about the LastfmAsia and DeezerEurope datasets of your CIKM 2020 paper, which I found on SNAP. These datasets are provided with node features which are “extracted based on the artists liked by the users”. Does this mean that each number in the vector associated to each node corresponds to the id of an artist that the user node liked? Or is the vector a more abstract embedding? I am referring to the file lastfm_asia_features.json and deezer_europe_features.json.

Thanks in advance!

FEATHER model

Hello, Is the FEATHER model suitable for directed graph embedding or other methods ?

Thanks in advance.

Heterogeneous graphs with multiple node types

Hello, I am currently looking for a whole graph embedding method and discovered FEATHER. The implementation of FEATHER in the karateclub package does technically work on my graphs, but I am not sure it is correctly applied.

My graphs have multiple node types, and each node type has a different set of features. In your paper, you only explain the procedure for one type of node with one set of features. Does FEATHER adapt to the different node types? Do the characteristic functions only take nodes of the same type into account?

A quick answer would be immensely helpful for me, thank you in advance.

Deezer

How to predict the gender of users?

Number of embedding size

Hi there, thank you for this great work!

I am wondering if there is a way to control the number of embedding size. From the sample output data I can see that the size of embedding is 250. I am wondering if I can change it to 16, for example.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.