tata1661 / par-neurips21

Code for "Property-Aware Relation Networks for Few-shot Molecular Property Prediction (NeurIPS 2021)".

few-shot-learning molecular-property meta-learning one-shot-learning molecular-representation-learning graph-structure-learning drug-discovery

par-neurips21's Introduction

This is the PyTorch implementation of "Property-Aware Relation Networks (PAR) for Few-Shot Molecular Property Prediction", published at NeurIPS 2021 as a spotlight paper. The PaddlePaddle implementation is part of PaddleHelix, which can be reached here.

Please cite our paper if you find it helpful. Thanks.

@InProceedings{wang2021property,
  title={Property-Aware Relation Networks for Few-Shot Molecular Property Prediction},
  author={Wang, Yaqing and Abuduweili, Abulikemu and Yao, Quanming and Dou, Dejing},
  booktitle = {Advances in Neural Information Processing Systems},
  year={2021},
}

Environment

We used the following Python packages for core development. We tested on Python 3.7.

- pytorch 1.7.0
- torch-geometric 1.7.0

Datasets

Tox21, SIDER, MUV and ToxCast were originally downloaded from SNAP. You can download the data here, unzip the file, and put the resulting "muv", "sider", "tox21" and "toxcast" folders in the data folder.

Experiments

To run the experiments, use the following command (please check and tune the hyper-parameters in parser.py):

python main.py

If you want to quickly run the PAR method on the Tox21 dataset, use the command:

bash script_train.sh

par-neurips21's Issues

CUDA out of memory

Hi, thank you for uploading your code, it looks very interesting!

Unfortunately, I am facing a CUDA memory issue when training a model on a GPU.

By tracking allocated memory with torch.cuda.memory_allocated(), I observed that the test_step method of the Trainer is responsible for the memory leak. This is supported by the fact that the issue disappears when I comment out the line best_avg_auc = trainer.test_step() in main.py to disable the testing phase.

I am attaching the logs from a run of the original script_train.sh on the Tox21 dataset (nohup_tox21tox21-par_s10q16.txt). I only added a print statement to track the allocated CUDA memory across training epochs.

Thank you in advance for your help!
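
The kind of tracking described in this issue can be sketched as follows. This is only an illustration, not code from the repository: the helper names cuda_mem_mib and track are hypothetical, and step_fn stands in for whatever is being measured (for example a call to trainer.test_step()). The torch import is guarded so that the snippet also runs on a machine without PyTorch or without a GPU, where it simply reports 0.0.

```python
try:
    import torch  # optional: the helpers fall back to 0.0 without PyTorch
except ImportError:
    torch = None


def cuda_mem_mib():
    """Return the CUDA memory currently allocated by tensors, in MiB."""
    if torch is None or not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / (1024 ** 2)


def track(step_fn, n_epochs):
    """Call step_fn(epoch) n_epochs times and record memory growth per call."""
    deltas = []
    for epoch in range(n_epochs):
        before = cuda_mem_mib()
        step_fn(epoch)  # hypothetical stand-in for the phase under suspicion
        after = cuda_mem_mib()
        deltas.append(after - before)
        print(f"epoch {epoch}: {after:.1f} MiB allocated (delta {after - before:+.1f})")
    return deltas
```

A delta that stays positive epoch after epoch for one phase but not for the others points at that phase as the place where memory is being retained.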

PaddlePaddle implementation

When will the PaddlePaddle implementation be available? The code here is a bit difficult to read.

Questions about the update_s_q parameter

Hi, thanks for releasing this elegant code. I don't fully understand the update_s_q parameter. From reading the code, I find that if it is True, the model is updated on both the support set and the query set in the inner loop, and on another support set and query set in the outer loop. Why not update the model on the support set in the inner loop and on the query set in the outer loop? I don't understand this point, and I am not sure whether I have read the code correctly.
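
To make the two schemes in this question concrete, here is a toy sketch on a 1-D model y = w * x with a first-order meta-update. It is not the repository's code: grad_mse, meta_step, the data and the learning rates are all made up for illustration, and "scheme B" only follows the issue author's reading of update_s_q, which has not been checked against the implementation.

```python
def grad_mse(w, data):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def meta_step(w, inner_data, outer_data, inner_lr=0.1, outer_lr=0.1):
    """One first-order meta-update: adapt on inner_data, update w on outer_data."""
    w_adapted = w - inner_lr * grad_mse(w, inner_data)     # inner loop
    return w - outer_lr * grad_mse(w_adapted, outer_data)  # outer loop


# Toy data for one task (all points lie on y = 2x).
support = [(1.0, 2.0), (2.0, 4.0)]
query = [(3.0, 6.0)]
other_support = [(1.5, 3.0)]
other_query = [(2.5, 5.0)]

# Scheme A: adapt on the support set, meta-update on the query set.
w_a = meta_step(0.0, support, query)

# Scheme B (the issue's reading of update_s_q=True): adapt on support + query,
# meta-update on a second support + query batch.
w_b = meta_step(0.0, support + query, other_support + other_query)

print(w_a, w_b)
```

The only difference between the two calls is which samples feed the inner and outer gradients, which is exactly the choice the question is about.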

Tasks in Toxcast

Hi, I noticed that there is a variable toxcast_drop_tasks in the code, which will cause some training tasks and testing tasks to be eliminated in this dataset, but this does not match the description in the paper (number of training and testing tasks).
May I ask what kind of settings are used to get the results given in the paper?

When will the code be uploaded?

Reproducibility

Hi, we cannot reproduce the results reported in your paper using your code. There is a large gap between the output of this code and the results in the paper. Could you provide the settings that reproduce the results? Thank you!

Adaptation for Single-Task Prediction on Molecular Data

Hello PAR-NeurIPS21 Team,

I'm currently exploring the capabilities of your framework for a project involving molecular data sourced from PubChem. Specifically, my dataset includes 26,564 SMILES strings, each associated with IC50 values for approximately 10 different kinases. The IC50 values have been standardized within a range of [0-1]. Notably, the majority of the molecules in the dataset typically have IC50 values for only one kinase.

Given the structure of this dataset, I am interested in adapting your multi-task learning framework for a single-task prediction setup. Here are a few questions and considerations I have:

- Data Compatibility: Can the current MoleculeDataset class be directly used for datasets structured as described above, or would it require modifications to handle the specifics of IC50 data and missing values efficiently?
- Model Adjustment: What changes would be recommended to adjust the model from handling multiple tasks to focusing solely on predicting the IC50 value for a specific kinase?
- Training and Evaluation: Could you provide guidance on simplifying the training loops and evaluation metrics to suit a single-task learning scenario? Are there specific parts of the Meta_Trainer class that should be modified or removed?
- Loss Function and Metrics: What loss function and performance metrics would you recommend for a regression task focused on IC50 value prediction?

I appreciate any insights or suggestions you could provide to help adapt this framework to my needs. Thank you for your support and for the development of this interesting project.

Best regards,
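
For the loss and missing-value questions above, one generic pattern is to compute regression metrics only over molecules that actually have a label for the kinase of interest. The sketch below is a plain-Python illustration of that idea, not an answer from the maintainers and not part of the repository; the function name masked_regression_metrics and the use of None to mark a missing IC50 value are assumptions.

```python
import math


def masked_regression_metrics(preds, targets):
    """Compute MSE, RMSE and MAE over entries whose target is not None."""
    pairs = [(p, t) for p, t in zip(preds, targets) if t is not None]
    if not pairs:
        raise ValueError("no labelled entries to evaluate")
    mse = sum((p - t) ** 2 for p, t in pairs) / len(pairs)
    mae = sum(abs(p - t) for p, t in pairs) / len(pairs)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "n": len(pairs)}


# Three molecules, the second has no measured value for this kinase.
metrics = masked_regression_metrics([0.5, 0.2, 0.9], [0.7, None, 0.9])
print(metrics)
```

MSE is the usual choice as a training loss for this kind of target, while RMSE and MAE are easier to read as evaluation metrics because they are in the same units as the normalized IC50 values.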

Hello, can you upload the code please, thank you very much.

Tox21 dataset

Hi everyone, has anyone reached the accuracy reported in the paper on the Tox21 dataset? How were the parameters set?