
graphretro's Introduction

Learning Graph Models for Retrosynthesis Prediction

(Under Construction and Subject to Change)

This is the official PyTorch implementation of GraphRetro (Somnath et al. 2021), a graph-based model for one-step retrosynthesis prediction. Our model carries out the transformation from products to reactants using a two-stage decomposition:

[Figure: GraphRetro overview]

a) Edit Prediction: Identifies edits given a product molecule which, when applied, yield intermediate molecules called synthons.
b) Synthon Completion: Completes synthons into reactants by attaching subgraphs called leaving groups drawn from a precomputed vocabulary (illustrated in the sketch below).
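
To make the two stages concrete, here is a minimal RDKit sketch (an illustration only, not the repository's API; the molecule, the choice of bond, and the chloride leaving group are made up for this example). It deletes the amide bond of acetanilide to obtain two synthons, then completes the acyl synthon with a chloride leaving group:

from rdkit import Chem
from rdkit.Chem import AllChem

# --- Stage 1: Edit Prediction (illustration) --------------------------------
# Product: acetanilide. Suppose the predicted edit deletes the amide C-N bond.
product = Chem.MolFromSmiles("CC(=O)Nc1ccccc1")
edit_bond = product.GetBondBetweenAtoms(1, 3)          # carbonyl C -- amide N

# Applying the edit splits the product into synthons: fragments whose open
# valences are marked with dummy atoms ("*").
fragmented = Chem.FragmentOnBonds(product, [edit_bond.GetIdx()], addDummies=True)
print(Chem.MolToSmiles(fragmented))   # acyl synthon and aniline synthon, "." separated

# --- Stage 2: Synthon Completion (illustration) ------------------------------
# Complete the acyl synthon by attaching a leaving group (here chlorine) at the
# dummy attachment point; the aniline synthon is already a valid reactant.
acyl_synthon = Chem.MolFromSmiles("C(C)(=O)[*]")
leaving_group = Chem.MolFromSmiles("Cl")
completed = AllChem.ReplaceSubstructs(
    acyl_synthon, Chem.MolFromSmarts("[#0]"), leaving_group
)[0]
Chem.SanitizeMol(completed)
print(Chem.MolToSmiles(completed))    # acetyl chloride (SMILES ordering may vary)

In the actual model, the edit and the leaving group are predicted by the two modules rather than chosen by hand.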

Setup

This setup assumes conda is installed on your system.
If conda is not installed, download the Miniconda installer first. Once conda is available, run the following commands:

echo 'export SEQ_GRAPH_RETRO=/path/to/dir/' >> ~/.bashrc
source ~/.bashrc

conda env create -f environment.yml
source activate seq_gr
python setup.py develop  # or: python setup.py install
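
Optionally, a quick sanity check that the environment variable is visible to Python (a minimal sketch; it does not verify the installation itself):

import os

# SEQ_GRAPH_RETRO should point at the directory the preprocessing and training
# scripts read data from and write outputs to (see Datasets below).
root = os.environ.get("SEQ_GRAPH_RETRO")
assert root is not None and os.path.isdir(root), \
    "Set SEQ_GRAPH_RETRO to an existing directory before running the scripts."
print("SEQ_GRAPH_RETRO =", root)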

Datasets

The original and canonicalized files are provided under datasets/uspto-50k/. Please make sure to move them to $SEQ_GRAPH_RETRO/ before use.

Input Preparation

Before preparing inputs, we canonicalize the products. This can be done by running,

python data_process/canonicalize_prod.py --filename train.csv
python data_process/canonicalize_prod.py --filename eval.csv
python data_process/canonicalize_prod.py --filename test.csv
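
Canonicalization means writing every product SMILES in a single canonical form, so that the same molecule always maps to the same string. Below is a minimal RDKit sketch of that core operation (it is not the repository's canonicalize_prod.py script, which also works over the CSV files):

from rdkit import Chem

def canonicalize(smiles: str) -> str:
    # Round-trip through RDKit to obtain the canonical SMILES string.
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else smiles

# Two different ways of writing benzoic acid collapse to the same string.
print(canonicalize("c1ccccc1C(=O)O"))
print(canonicalize("OC(=O)c1ccccc1"))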

This step can also be skipped if the canonicalized files are already present. The preprocessing steps now directly work with the canonicalized files.

1. Reaction Info preparation

python data_process/parse_info.py --mode train
python data_process/parse_info.py --mode eval
python data_process/parse_info.py --mode test

2. Prepare batches for Edit Prediction

python data_process/core_edits/bond_edits.py

3. Prepare batches for Synthon Completion

python data_process/lg_edits/lg_classifier.py
python data_process/lg_edits/lg_tensors.py

Run a Model

Trained models are stored in experiments/. You can override this by adjusting --exp_dir before training. Model configurations are stored in config/MODEL_NAME where MODEL_NAME is one of {single_edit, lg_ind}.

To run a model,

python scripts/benchmarks/run_model.py --config_file configs/MODEL_NAME/defaults.yaml

NOTE: We recently updated the code to use wandb for experiment tracking. You will need to set up wandb before you can train a model.
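
A one-time login with the standard wandb client is usually all that is needed (this uses the stock wandb API, not a repo-specific helper):

import wandb

# Authenticate once with the API key from https://wandb.ai/authorize;
# after this, training runs can be created and logged.
wandb.login()

Running wandb login from the command line does the same thing.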

Evaluate using a Trained Model

To evaluate the trained model, run

python scripts/eval/single_edit_lg.py --edits_exp EDITS_EXP --edits_step EDITS_STEP \
                                      --lg_exp LG_EXP --lg_step LG_STEP

This will set up a model with the edit prediction module loaded from experiment EDITS_EXP at checkpoint EDITS_STEP,
and the synthon completion module loaded from experiment LG_EXP at checkpoint LG_STEP.

Reproducing our results

To reproduce our results, please run the command,

./eval.sh

This will display the results for both the reaction class unknown and reaction class known settings.

License

This project is licensed under the MIT License. Please see LICENSE.md for more details.

Reference

If you find our code useful for your work, please cite our paper:

@inproceedings{somnath2021learning,
  title={Learning Graph Models for Retrosynthesis Prediction},
  author={Vignesh Ram Somnath and Charlotte Bunne and Connor W. Coley and Andreas Krause and Regina Barzilay},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=SnONpXZ_uQ_}
}

Contact

If you have any questions about the code, or want to report a bug, please raise a GitHub issue.


graphretro's Issues

Code enhancement

Hi,
In the "Limitations" section of the paper, you write: "We leave it to future work to extend the model to realize a single reactant from multiple synthons". Is there any publication or code for this extension that we could refer to? Thanks a lot.

Some Questions about Graph Edit prediction

According to the paper, graph edits can be classified into two types:

  • the number of hydrogens on an atom changes
  • the bond type changes

In the paper, edit prediction is solved by a classifier that outputs whether a bond changes its type or whether an atom changes its number of hydrogens. But how can we know exactly which kind of change is made (for example, when a bond type changes, how do we know whether the bond is disconnected or changed from a triple bond into a single bond)?
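
One common way to make this unambiguous is to predict, for each bond, a target bond order, with a special value meaning deletion, so that "delete" and "triple to single" are different labels. The sketch below illustrates that labeling (a general illustration, not necessarily this repository's exact encoding):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BondEdit:
    # An edit on a single bond: which bond, and what its new order should be.
    atom_u: int
    atom_v: int
    new_order: Optional[float]        # None means the bond is deleted

# "Disconnect the bond between atoms 1 and 3."
delete_edit = BondEdit(atom_u=1, atom_v=3, new_order=None)
# "Change the bond between atoms 4 and 5 from triple to single."
rewrite_edit = BondEdit(atom_u=4, atom_v=5, new_order=1.0)
print(delete_edit, rewrite_edit)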

Reactant generation from product

Hi! Is there a script to generate reactants from products? E.g. if I have a list of products, how can I generate the reactants from them? Thanks!

Some Questions about amap

Could you tell me how you determine the atom-map number of each atom (i.e., which one should be first and which one should be last)? The provided data seems to have a different ordering of atom-map numbers compared with other baselines trained on USPTO-50K, such as https://github.com/uta-smile/RetroXpert. I want to know whether there is any information leak from the atom-map order if I use the provided ordering to train a model like a Transformer, which is not permutation-invariant with respect to the given atoms.

missing functions in atom attention layer

Good afternoon, I would like to try using the attention layer, but I noticed that the implementations of several functions in the AtomAttention module are missing, such as create_scope_tensor, flat_to_batch, and get_pair. Could you please add them, or say a little more about what they should do?

multi edit product problem

Hi,
Consider this multi-edit product and its corresponding reactants:
C1=CC=C(N([H])C2=CC=CC=C2)C=C1.C1=CC=C(N([H])C2=CC=CC=C2)C=C1.C(Br)1=C([H])C([H])=C(C2=C([H])C([H])=C(Br)C([H])=C2[H])C([H])=C1[H]>>C1=CC=C(N(C2=CC=C(C3=CC=C(N(C4C=CC=CC=4)C4=CC=CC=C4)C=C3)C=C2)C2C=CC=CC=2)C=C1
The code and models provided in this repo cannot generate the reactants above. What can I do to generate these reactants? Thanks very much for your advice.

Some Questions about training

There are two different configs in the folders; one is for Edit Prediction, but what is the other one for? Also, the paper describes two different settings, reaction class known and reaction class unknown. How can I train the model under the different settings?

Duplicated atom mapping after "canonicalize_prod.py"

I followed the README and set up the environment successfully, and I wanted to apply the code to my own dataset. However, some errors occurred during preprocessing.

One of my reactions with the atom mapping is
"[F:1][c:2]1c:3[cH:4]c:5c:6[cH:7]1.[NH2:13][c:14]1[s:15][cH:16][cH:17][c:18]1[C:19]#[N:20]>>[c:2]1([NH:13][c:14]2[s:15][cH:16][cH:17][c:18]2[C:19]#[N:20])c:3[cH:4]c:5c:6[cH:7]1".
After running "canonicalize_prod.py", the remapped reaction is "[F:1][c:9]1[cH:10]c:11c:13[cH:15][c:16]1N+:17[O-:19].[N:1]#[C:2][c:3]1[cH:4][cH:5][s:6][c:7]1[NH2:8]>>[N:1]#[C:2][c:3]1[cH:4][cH:5][s:6][c:7]1[NH:8][c:9]1[cH:10]c:11c:13[cH:15][c:16]1N+:17[O-:19]", where there are two atoms with the same atom mapping in the reactants.
I then passed it to parse_info.py, which failed to extract the reaction info. I guess this was caused by the duplicated atom mapping.

There are a lot of similar errors on my own dataset, and if I skip "canonicalize_prod.py", most of the reactions are processed normally by "parse_info.py".

Could you fix this error? Or could I skip "canonicalize_prod.py" without affecting the performance?

Any help would be greatly appreciated.
