
Denoise Pretraining for ML Potentials

Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials
Journal of Chemical Theory and Computation [Paper] [arXiv] [PDF]
Yuyang Wang, Changwen Xu, Zijie Li, Amir Barati Farimani
Carnegie Mellon University

This is the official implementation of "Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials". In this work, we propose denoise pretraining on non-equilibrium molecular conformations to achieve more accurate and transferable potential predictions with invariant and equivariant graph neural networks (GNNs). If you find our work useful in your research, please cite:

@article{wang2023denoise,
  title={Denoise Pre-training on Non-equilibrium Molecules for Accurate and Transferable Neural Potentials},
  author={Wang, Yuyang and Xu, Changwen and Li, Zijie and Barati Farimani, Amir},
  journal={Journal of Chemical Theory and Computation},
  doi={10.1021/acs.jctc.3c00289},
  year={2023}
}
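
Concretely, the pretraining task perturbs the atomic coordinates of each non-equilibrium conformation with Gaussian noise and trains the GNN to predict the added per-atom noise. Below is a minimal PyTorch sketch of this objective; the function name, argument names, and noise scale are illustrative, not the repository's exact API.

import torch
import torch.nn.functional as F

def denoise_loss(model, species, coords, sigma=0.05):
    # Corrupt the conformation with per-atom Gaussian noise...
    noise = sigma * torch.randn_like(coords)
    # ...and train the GNN to predict that noise from the noisy geometry
    # (the model is assumed to return one vector per atom).
    pred = model(species, coords + noise)
    return F.mse_loss(pred, noise)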

Getting Started

  1. Installation
  2. Dataset
  3. Pre-training
  4. Fine-tuning
  5. Pre-trained models

Installation

Set up a conda environment and clone the GitHub repository:

# create a new environment
$ conda create --name ml_potential python=3.8
$ conda activate ml_potential

# install requirements
$ conda install pytorch==1.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
$ conda install pyg -c pyg
$ conda install -c dglteam/label/cu116 dgl
$ conda install -c conda-forge tensorboard openmm
$ pip install PyYAML rdkit ase
$ pip install git+https://github.com/AMLab-Amsterdam/lie_learn

# clone the source code
$ git clone https://github.com/yuyangw/Denoise-Pretrain-ML-Potential.git
$ cd Denoise-Pretrain-ML-Potential
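
As an optional sanity check, the following snippet verifies that the core dependencies import correctly and that CUDA is visible before running any of the scripts:

import torch, torch_geometric, dgl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torch_geometric:", torch_geometric.__version__)
print("dgl:", dgl.__version__)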

Dataset

The datasets used in this work are summarized in the table below, including the download link, the number of molecules, conformations, and elements, the number of atoms per molecule, the molecule types, and whether each dataset is used for pre-training (PT) and/or fine-tuning (FT). GNNs are pre-trained on the combination of ANI-1 and ANI-1x, and fine-tuned on each dataset separately.

Dataset | Link   | # Mol. | # Conf.    | # Ele. | # Atoms | Molecule types                                                | Usage
ANI-1   | [link] | 57,462 | 24,687,809 | 4      | 2~26    | Small molecules                                               | PT & FT
ANI-1x  | [link] | 63,865 | 5,496,771  | 4      | 2~63    | Small molecules                                               | PT & FT
ISO17   | [link] | 129    | 645,000    | 3      | 19      | Isomers of C7O2H10                                            | FT
MD22    | [link] | 7      | 223,422    | 4      | 42~370  | Proteins, lipids, carbohydrates, nucleic acids, supramolecules | FT
SPICE   | [link] | 19,238 | 1,132,808  | 15     | 3~50    | Small molecules, dimers, dipeptides, solvated amino acids     | FT
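
ANI-1, ANI-1x, and SPICE are distributed as HDF5 archives (ISO17 and MD22 use their own formats), so a quick way to inspect a downloaded file before pre-processing is to walk its hierarchy with h5py (installable via pip; the file name below is a placeholder):

import h5py

def show(name, obj):
    # Print each dataset's path, shape, and dtype.
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File("ani1x-release.h5", "r") as f:   # placeholder file name
    f.visititems(show)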

Pre-training

To pre-train the invariant or equivariant GNNs, run the command below. The configurations and a detailed explanation of each variable can be found in config_pretrain.yaml.

$ python pretrain.py

To monitor training via TensorBoard, run tensorboard --logdir {PATH} and open http://127.0.0.1:6006/ in a browser.
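
Both the pre-training and fine-tuning scripts read their settings from a YAML file; a quick way to inspect the pretraining configuration programmatically (assuming a standard nested mapping) is:

import yaml

with open("config_pretrain.yaml") as f:
    config = yaml.safe_load(f)
# Show every variable and its current value.
print(yaml.dump(config, default_flow_style=False))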

Fine-tuning

To fine-tune the pre-trained GNN models on molecular potential prediction, run the command below. The configurations and a detailed explanation of each variable can be found in config.yaml.

$ python train.py

Pre-trained models

We also provide the pre-trained checkpoint model.pth and its configuration file config_pretrain.yaml for each model; both can be found in the ckpt folder. Pre-trained models include:

  • Pre-trained SchNet in ckpt/schnet folder
  • Pre-trained SE(3)-Transformer in ckpt/se3transformer folder
  • Pre-trained EGNN in ckpt/egnn folder
  • Pre-trained TorchMD-Net in ckpt/torchmdnet folder
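
To warm-start fine-tuning from one of these checkpoints, the pre-trained weights can be loaded into a freshly built model before training. The sketch below is a hedged example: the internal layout of model.pth and the wrapper key name are assumptions, and loading with strict=False simply skips the pretraining head or any renamed parameters.

import torch

# Load the provided checkpoint; map to CPU so no GPU is required for inspection.
ckpt = torch.load("ckpt/schnet/model.pth", map_location="cpu")
# Checkpoints are sometimes wrapped in a dict; the key name here is an assumption.
state_dict = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors, e.g.:", list(state_dict)[:5])

# Warm-start your fine-tuning model (built from config.yaml) with strict=False so
# that any non-matching parameters are skipped:
#   model.load_state_dict(state_dict, strict=False)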

Acknowledgement

The implementation of GNNs in this work is based on:
