
ptgnn's Introduction

ptgnn: A PyTorch GNN Library

This library contains PyTorch code for creating graph neural network (GNN) models, along with some sample implementations.

If you are interested in using this library, please read about its architecture and how to define GNN models or follow this tutorial.

Note that ptgnn takes care of defining the whole pipeline, including data wrangling tasks such as data loading and tensorization. It also defines PyTorch nn.Modules for the neural network operations. These are independent of the AbstractNeuralModel classes and can be used like any other PyTorch nn.Module, if one wishes to do so.
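For illustration, a sparse message-passing step in this spirit can be written with plain PyTorch and torch-scatter and used like any other module. The sketch below is illustrative only; SparseMessagePassing is a made-up class, not part of ptgnn's API:

    import torch
    from torch import nn
    from torch_scatter import scatter_add

    class SparseMessagePassing(nn.Module):
        """Illustrative sparse GNN step; not a class from ptgnn."""

        def __init__(self, hidden_dim: int):
            super().__init__()
            self.message_fn = nn.Linear(hidden_dim, hidden_dim)

        def forward(self, node_states, edge_sources, edge_targets):
            # Compute one message per edge from the sending node's state.
            messages = self.message_fn(node_states[edge_sources])
            # Aggregate all messages arriving at each target node.
            aggregated = scatter_add(
                messages, edge_targets, dim=0, dim_size=node_states.shape[0]
            )
            return torch.relu(node_states + aggregated)

    # Usable like any other PyTorch nn.Module:
    layer = SparseMessagePassing(64)
    out = layer(
        torch.randn(5, 64),          # node states
        torch.tensor([0, 1, 2, 3]),  # edge source indices
        torch.tensor([1, 2, 3, 0]),  # edge target indices
    )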

The library is mainly engineered to be fast for sparse graphs. For example, for the Graph2Class task (discussed below) on a V100 with the default hyperparameters and architecture, ptgnn can process about 82 graphs/sec (209k nodes/sec and 1,129k edges/sec) during training and about 200 graphs/sec (470k nodes/sec and 2,527k edges/sec) during testing.

Implemented Tasks

All task implementations can be found in the ptgnn.implementations package. Detailed instructions on the data and the training steps can be found here. We welcome external contributions. The implemented GNN-based tasks include Graph2Class (predicting the types of variables in Python code) and VarMisuse (detecting misused variables in C# code).

The tutorial gives a step-by-step example for coding the Graph2Class model.

Installation

This code was tested with PyTorch 1.7 and depends on pytorch-scatter. Please install the appropriate versions of these libraries based on your CUDA setup, following their instructions. (Note that the pytorch-scatter binaries built for CUDA 10.1 also work for CUDA 10.2.)

  1. To install PyTorch 1.7.0 or higher, use the up-to-date command from PyTorch Get Started, selecting the appropriate options, e.g. for Linux, pip, and CUDA 10.1 it's currently:

    pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
  2. To install pytorch-scatter, follow the instructions from the GitHub repo, choosing the appropriate CUDA option, e.g., for PyTorch 1.7.0 and CUDA 10.1:

    pip install torch-scatter==2.0.5+cu101 -f https://pytorch-geometric.com/whl/torch-1.7.0.html
  3. To install ptgnn from pypi, including all other dependencies:

    pip install ptgnn

    If you want to use the ptgnn samples with Azure ML (e.g., the --aml flag in the implementation CLIs), install with

     pip install ptgnn[aml]

    or, to install directly from the sources, cd into the root directory of the project and run

    pip install -e .

    To check that the installation was successful, run the unit tests:

    python setup.py test
  4. To install ptgnn from conda, including all other dependencies:

    conda install ptgnn --channel conda-forge
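Whichever installation route you use, a quick import check (a minimal sanity test, not an official verification step) confirms that the core dependencies resolve:

    python -c "import torch, torch_scatter, ptgnn; print(torch.__version__)"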

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Developing ptgnn


To contribute to this library, first follow these steps to set up your development environment:

  • Install the library requirements.
  • Install the pre-commit hooks:
    • Run pip3 install pre-commit
    • Install the hooks: pre-commit install
Using Conda

If you are using conda, first download the torch-scatter wheel matching your PyTorch, Python, and CUDA versions. For torch==1.7.0 and Python 3.7, you can use the environment.yml included in the repo, with the following steps:

$ conda env create -f environment.yml
$ conda activate ptgnn-env
$ pip install torch_scatter-2.0.5+cu102-cp37-cp37m-linux_x86_64.whl
$ pip install -e .
$ python setup.py test
$ pip install pre-commit
$ pre-commit install
Releasing to PyPI

To create a PyPI release, push a tag of the form v1.3.4 to this repository (make sure that you follow semantic versioning). The Publish on PyPI GitHub action will then automatically upload a new release.
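For example, assuming the next release is version 1.3.4 (the version number here is illustrative):

    git tag v1.3.4
    git push origin v1.3.4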


ptgnn's Issues

Cannot get high accuracy when using create_ggnn_mp_layers

Hello,

Thanks for this great repo, which I have been using to play with VarMisuse; I have some questions about it. The implementation provides two different layer creators, create_mlp_mp_layers (defined as def create_mlp_mp_layers(num_edges: int)) and create_ggnn_mp_layers (defined as def create_ggnn_mp_layers(num_edges: int)), and the GGNN model is built with message_passing_layer_creator=create_mlp_mp_layers. According to my understanding, MLP layers are just fully-connected layers without message passing for graph learning, so I replaced the creator with create_ggnn_mp_layers. But the results are not promising: only 72.50 test accuracy on the same split as in #1, whereas the MLP layers give a much higher 81.13 test accuracy. It seems there is something wrong, but I cannot figure it out.
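For intuition, the two kinds of layers can be sketched in plain PyTorch as below; both creators appear to build message-passing layers (as their names suggest), differing in the message and update functions. This snippet is an illustrative simplification, not ptgnn's actual implementation:

    import torch
    from torch import nn

    hidden_dim = 64
    sender_states = torch.randn(10, hidden_dim)  # per edge: the sending node's state
    target_states = torch.randn(10, hidden_dim)  # per edge: the receiving node's state

    # "MLP" message passing: each message is an MLP of the sender's state and is
    # added to the target state (aggregation over multiple incoming edges is
    # omitted here for brevity).
    message_mlp = nn.Sequential(
        nn.Linear(hidden_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim),
    )
    mlp_updated = target_states + message_mlp(sender_states)

    # GGNN-style message passing: each message is a linear map of the sender's
    # state, and the target state is updated with a GRU cell, as in gated GNNs.
    message_linear = nn.Linear(hidden_dim, hidden_dim)
    gru_update = nn.GRUCell(hidden_dim, hidden_dim)
    ggnn_updated = gru_update(message_linear(sender_states), target_states)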

Best wishes.

Add conda recipe

Describe the new feature:

It would be great to have a conda recipe so that it can be included with projects that have more complicated build processes (for example, using libraries that need C/C++ compilers).

What is the expected outcome?

Have a recipe for ptgnn on the conda-forge channel.

Is it backward-compatible?

Yes, and it would be forward-compatible as well, because the conda bots can automatically fetch new sdists pushed to pypi.

How to obtain the raw function

Hi Miltos,

Thanks for this great project. While playing with the VarMisuse task on the data released at https://www.microsoft.com/en-us/download/details.aspx?id=56844, I ran into a problem.
I want to recover the original function from each graph, but it seems that the raw functions themselves are not released. So I tried to reconstruct each function by following the NextToken edges of its graph, but that failed too: the entry node of the graph (the node with index 1) is not the beginning of the function, and chaining next tokens does not string together a complete function, as shown below.
[screenshot: reconstructed token sequence that does not form a complete function]
The filename of this sample is 'test\Nancy.Tests\Unit\Bootstrapper\Base\BootstrapperBaseFixtureBase.cs'.
So may I ask for some advice on how to get the original functions?
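For reference, the reconstruction attempt looks roughly like the sketch below. It assumes, as an unverified guess at the released JSON schema, that each graph carries an Edges["NextToken"] list of [source, target] pairs and a NodeLabels map from node id to token text:

    def reconstruct_tokens(graph: dict) -> str:
        # Assumed schema: graph["Edges"]["NextToken"] holds [source, target]
        # pairs and graph["NodeLabels"] maps node ids (as strings) to tokens.
        next_token = {src: dst for src, dst in graph["Edges"]["NextToken"]}
        labels = graph["NodeLabels"]
        # Start from a token node that no NextToken edge points to.
        starts = set(next_token) - set(next_token.values())
        node = min(starts) if starts else None
        tokens = []
        while node is not None:
            tokens.append(labels[str(node)])
            node = next_token.get(node)
        return " ".join(tokens)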

Thanks

Training the model for varmisuse task

Hey! I tried to train the VarMisuse model in order to explore how it works on data from unseen projects. I have a few questions about it:

  1. It seems the dataset format has changed compared to the published version of the data. I found the following issue in another repository, but unfortunately only after I had already reorganized the data myself: I converted the json files into jsonlines and changed the structure from project/{train|test|valid}/files to {train|test|valid}/files. It would be nice to either duplicate the reorganizing script in this repo or add a link to the issue in the README.
  2. After reorganizing the data, I tried to run training with the default settings (minibatch size = 300) on an instance with 94 GB RAM and 48 CPUs. The instance has no GPU because I wanted to measure memory usage first, so that I could allocate a proper GPU instance afterward. Unfortunately, training fails with an OOM error: it quickly uses all 94 GB and asks for more. Creating a smaller version of the dataset by picking only one project for each of train/validation/test didn't really help either: with a minibatch size of 100 and a single project in the train part I still got OOM. Is this expected behavior?
  3. Which instance do you recommend for training the model? In particular, how much RAM do I need, and how long does training take on, say, a V100?
  4. Do you have a pre-trained model that you can share? Perhaps I could skip training altogether and just run the already-trained model on different data.

Thanks a lot in advance, and thanks for the great projects and papers!

Cannot run on the varmisuse task

Hi,

Thanks for this wonderful work; it is really helpful. But when I test the VarMisuse task, it does not run correctly, failing even at the first step, as shown in the screenshot below. May I ask for help?
[screenshot: error output from the first step]

Thanks
