
par-neurips21's Introduction

This is the PyTorch implementation of "Property-Aware Relation Networks for Few-Shot Molecular Property Prediction" (PAR), published at NeurIPS 2021 as a spotlight paper. The PaddlePaddle implementation is part of PaddleHelix.


Please cite our paper if you find it helpful. Thanks.

@InProceedings{wang2021property,
  title={Property-Aware Relation Networks for Few-Shot Molecular Property Prediction},
  author={Wang, Yaqing and Abuduweili, Abulikemu and Yao, Quanming and Dou, Dejing},
  booktitle = {Advances in Neural Information Processing Systems},
  year={2021},
}

Environment

We used the following Python packages for core development. We tested on Python 3.7.

- pytorch 1.7.0
- torch-geometric 1.7.0
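
To compare an existing environment against the versions listed above, a small check along the following lines may help. This is only a sketch: the distribution names `torch` and `torch-geometric` are assumptions about how the packages are installed, and `importlib.metadata` requires Python 3.8 or later (on Python 3.7 the `importlib_metadata` backport would be needed).

```python
from importlib import metadata

# Versions listed in this README. The distribution names are assumptions
# about how the packages were installed (e.g. via pip).
EXPECTED = {"torch": "1.7.0", "torch-geometric": "1.7.0"}


def check_versions(expected=EXPECTED):
    """Return {name: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        # Builds such as "1.7.0+cu110" still count as a match.
        ok = installed is not None and installed.startswith(wanted)
        print(f"{name}: expected {wanted}, found {installed} [{'ok' if ok else 'MISMATCH'}]")
```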

Datasets

Tox21, SIDER, MUV and ToxCast were previously downloaded from SNAP. You can download the data here, unzip the file, and put the resulting "muv", "sider", "tox21" and "toxcast" folders in the data folder.
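
Before training, it can be useful to confirm that the four folders ended up in the right place. The sketch below assumes the data folder is named `data` and sits in the current working directory; `missing_datasets` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Folder names taken from the instructions above.
DATASETS = ("muv", "sider", "tox21", "toxcast")


def missing_datasets(data_root="data", names=DATASETS):
    """Return the dataset folder names that are not present under data_root."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print(f"missing dataset folders: {missing}")
    else:
        print("all dataset folders found")
```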

Experiments

To run the experiments, use the following command (please check and tune the hyper-parameters in parser.py):

python main.py

If you want to quickly run the PAR method on the Tox21 dataset, please use the command:

bash script_train.sh

par-neurips21's People

Contributors

quanmingyao, tata1661, zgs0314


par-neurips21's Issues

CUDA out of memory

Hi, thank you for uploading your code, it looks very interesting!

I am unfortunately facing a memory issue with CUDA when training a model on a GPU.

By tracking the allocated memory with torch.cuda.memory_allocated(), I observed that the test_step method of the Trainer is responsible for the memory leak. This is confirmed by the fact that the issue disappears when I comment out the line best_avg_auc = trainer.test_step() in main.py to disable the testing phase.

I am attaching the logs of a run of the original script_train.sh on the Tox21 dataset (nohup_tox21tox21-par_s10q16.txt). I only added a print statement to track the allocated CUDA memory across the training epochs.
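
A minimal sketch of this kind of tracking is given below; it is not the exact statement used in the attached run. The helper names `track_allocated` and `grows_every_step` are hypothetical, and the memory reader is passed in as a callable so the sketch runs without a GPU.

```python
from typing import Callable, List


def track_allocated(step: Callable[[], object],
                    read_bytes: Callable[[], int],
                    n_steps: int) -> List[int]:
    """Call `step` n_steps times and record read_bytes() after each call."""
    history = []
    for i in range(n_steps):
        step()
        used = read_bytes()
        history.append(used)
        print(f"step {i}: {used / 1024 ** 2:.1f} MiB allocated")
    return history


def grows_every_step(history: List[int]) -> bool:
    """True when every reading is strictly larger than the previous one."""
    return len(history) > 1 and all(b > a for a, b in zip(history, history[1:]))
```

With PyTorch, `read_bytes` would be `torch.cuda.memory_allocated` and `step` would be the call under suspicion, here `trainer.test_step`. Readings that grow at every step point to tensors being kept alive between calls.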

Thank you in advance for your help!

Questions about the update_s_q parameter

Hi, thanks for releasing this elegant code. I don't fully understand the update_s_q parameter. From reading the code, I find that if it is True, the model is updated on the support set and query set in the inner loop, and on another support set and query set in the outer loop. Why not update the model on the support set in the inner loop and on the query set in the outer loop? I may have misunderstood this point.

Tasks in Toxcast

Hi, I noticed that there is a variable toxcast_drop_tasks in the code, which causes some training and testing tasks to be dropped from this dataset. This does not match the description in the paper (number of training and testing tasks).
May I ask which settings were used to obtain the results given in the paper?

Reproducibility

Hi, we cannot reproduce the results reported in your paper using your code. There is a large gap between the output of this code and the results in the paper. Could you provide the settings that reproduce the results? Thank you!

Adaptation for Single-Task Prediction on Molecular Data

Hello PAR-NeurIPS21 Team,

I'm currently exploring the capabilities of your framework for a project involving molecular data sourced from PubChem. Specifically, my dataset includes 26,564 SMILES strings, each associated with IC50 values for approximately 10 different kinases. The IC50 values have been standardized within a range of [0-1]. Notably, the majority of the molecules in the dataset typically have IC50 values for only one kinase.

Given the structure of this dataset, I am interested in adapting your multi-task learning framework for a single-task prediction setup. Here are a few questions and considerations I have:

  1. Data Compatibility: Can the current MoleculeDataset class be directly used for datasets structured as described above, or would it require modifications to handle the specifics of IC50 data and missing values efficiently?
  2. Model Adjustment: What changes would be recommended to adjust the model from handling multiple tasks to focusing solely on predicting the IC50 value for a specific kinase?
  3. Training and Evaluation: Could you provide guidance on simplifying the training loops and evaluation metrics to suit a single-task learning scenario? Are there specific parts of the Meta_Trainer class that should be modified or removed?
  4. Loss Function and Metrics: What loss function and performance metrics would you recommend for a regression task focused on IC50 value prediction?

I appreciate any insights or suggestions you could provide to help adapt this framework to my needs. Thank you for your support and for the development of this interesting project.

Best regards,

Tox21 dataset

Hi everyone, has anyone reached the accuracy reported in the paper on the Tox21 dataset? How were the parameters set?
