GraRep

A SciPy implementation of GraRep: Learning Graph Representations with Global Structural Information (WWW 2015).

Abstract

In this paper, we present GraRep, a novel model for learning vertex representations of weighted graphs. This model learns low dimensional vectors to represent vertices appearing in a graph and, unlike existing work, integrates global structural information of the graph into the learning process. We also formally analyze the connections between our work and several previous research efforts, including the DeepWalk model of Perozzi et al. as well as the skip-gram model with negative sampling of Mikolov et al. We conduct experiments on a language network, a social network as well as a citation network and show that our learned global representations can be effectively used as features in tasks such as clustering, classification and visualization. Empirical results demonstrate that our representation significantly outperforms other state-of-the-art methods in such tasks.

The model is now also available in the package Karate Club.

This repository provides a SciPy implementation of GraRep as described in the paper:

GraRep: Learning Graph Representations with Global Structural Information. ShaoSheng Cao, Wei Lu, and Qiongkai Xu. WWW, 2015. [Paper]

MATLAB and Julia implementations are available [here] and [here], respectively.

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
scikit-learn      0.20.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/edges/` directory.
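The input format described above can be illustrated with a minimal reader. This is a sketch using only the standard library, not the repository's own loader; the header names `id_1,id_2` and the function name `read_edge_list` are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample file content; the header names are assumed.
SAMPLE = """id_1,id_2
0,1
0,2
1,2
2,3
"""

def read_edge_list(handle):
    """Return (edges, node_count) from a comma-separated edge list with one header row."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    edges = [(int(row[0]), int(row[1])) for row in reader if row]
    nodes = {node for edge in edges for node in edge}
    # Node identifiers are expected to start at 0.
    if nodes and min(nodes) != 0:
        raise ValueError("node ids should be indexed starting with 0")
    # With 0-based indexing, the largest id plus one gives the node count.
    node_count = max(nodes) + 1 if nodes else 0
    return edges, node_count

edges, node_count = read_edge_list(io.StringIO(SAMPLE))
print(edges, node_count)
```

For a real file, the same function could be called with `open("input/edges/cora.csv", newline="")` in place of the in-memory sample.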

Outputs

The embedding is saved in the `output/` directory. Each embedding has a header and a column with the node identifiers. Finally, the embedding is sorted by the identifier column.
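The output layout described above (a header, an identifier column, rows sorted by identifier) might look like the following sketch. The column names `id` and `x_0`, `x_1`, ... and the helper `write_embedding` are placeholders chosen for illustration; the repository may name them differently.

```python
import csv
import io

def write_embedding(handle, embedding):
    """Write {node_id: vector} as CSV with a header and rows sorted by node id."""
    width = len(next(iter(embedding.values())))
    writer = csv.writer(handle, lineterminator="\n")
    # Placeholder column names: one id column followed by one column per dimension.
    writer.writerow(["id"] + ["x_{}".format(i) for i in range(width)])
    for node_id in sorted(embedding):
        writer.writerow([node_id] + list(embedding[node_id]))

buffer = io.StringIO()
write_embedding(buffer, {2: [0.5, 0.1], 0: [0.0, 1.0], 1: [0.25, 0.75]})
csv_text = buffer.getvalue()
print(csv_text)
```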

Options

Training a model is handled by the `src/main.py` script, which provides the following command line arguments.

  --edge-path       STR     Edge list csv.                         Default is `input/edges/cora.csv`.
  --output-path     STR     Output embedding csv.                  Default is `output/cora_grarep.csv`.
  --dimensions      INT     Number of dimensions per embedding.    Default is 16.
  --order           INT     Number of adjacency matrix powers.     Default is 5.  
  --iterations      INT     SVD iterations.                        Default is 20.
  --seed            INT     Random seed.                           Default is 42.
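As a rough illustration, the table above could be declared with `argparse` along the following lines. The flag names, types, and defaults are taken from the table; the function name `build_parser` and the description string are assumptions, and the actual parser in the repository may be organized differently.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented options and their defaults."""
    parser = argparse.ArgumentParser(description="Run GraRep.")
    parser.add_argument("--edge-path", type=str, default="input/edges/cora.csv",
                        help="Edge list csv.")
    parser.add_argument("--output-path", type=str, default="output/cora_grarep.csv",
                        help="Output embedding csv.")
    parser.add_argument("--dimensions", type=int, default=16,
                        help="Number of dimensions per embedding.")
    parser.add_argument("--order", type=int, default=5,
                        help="Number of adjacency matrix powers.")
    parser.add_argument("--iterations", type=int, default=20,
                        help="SVD iterations.")
    parser.add_argument("--seed", type=int, default=42,
                        help="Random seed.")
    return parser

# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--dimensions", "32", "--order", "3"])
print(args.dimensions, args.order, args.edge_path)
```

Note that `argparse` exposes `--edge-path` as `args.edge_path`, replacing the hyphen with an underscore.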

Examples

The following commands learn a model and save the embedding. Training a model on the default dataset:

$ python src/main.py

Training a GraRep model with a larger number of dimensions per embedding:

$ python src/main.py --dimensions 32

Changing the number of adjacency matrix powers (the order):

$ python src/main.py --order 3
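To make the roles of `--order` and `--dimensions` more concrete, here is a compact NumPy sketch of the procedure described in the GraRep paper: one factorization per power of the transition matrix, with the per-order embeddings concatenated. It is not the repository's code. It uses a dense exact SVD on a toy graph, whereas an option such as `--iterations` suggests an iterative truncated SVD; the function name `grarep_sketch` and the parameter `lam` are assumptions for illustration.

```python
import numpy as np

def grarep_sketch(adjacency, order=5, dimensions=16, lam=1.0):
    """Return an (n, order * dimensions) embedding from a dense adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    degrees = adjacency.sum(axis=1, keepdims=True)
    degrees[degrees == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = adjacency / degrees  # row-normalized adjacency
    power = np.eye(n)
    blocks = []
    for _ in range(order):
        power = power @ transition  # next power of the transition matrix
        column_sums = power.sum(axis=0, keepdims=True)
        column_sums[column_sums == 0] = 1.0
        with np.errstate(divide="ignore"):
            # Log ratio shifted by log(lam / n); zero entries become -inf here.
            target = np.log(power / column_sums) - np.log(lam / n)
        target = np.maximum(target, 0.0)  # clip negatives (and -inf) to zero
        u, s, _ = np.linalg.svd(target, full_matrices=False)
        blocks.append(u[:, :dimensions] * np.sqrt(s[:dimensions]))
    return np.concatenate(blocks, axis=1)

# Toy example: an undirected ring of six nodes.
ring = np.zeros((6, 6))
for i in range(6):
    ring[i, (i + 1) % 6] = ring[(i + 1) % 6, i] = 1.0

embedding = grarep_sketch(ring, order=3, dimensions=2)
print(embedding.shape)
```

Under this reading, the width of the final embedding grows with both options: each of the `order` powers contributes `dimensions` columns.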

License
