
dragonfly_gen's Introduction

Prospective de novo drug design with deep interactome learning


This repository contains a reference implementation to preprocess the data, as well as to train and apply the de novo design models introduced in Kenneth Atz, Leandro Cotos, Clemens Isert, Maria Håkansson, Dorota Focht, Mattis Hilleke, David F. Nippa, Michael Iff, Jann Ledergerber, Carl C. G. Schiebroek, Valentina Romeo, Jan A. Hiss, Daniel Merk, Petra Schneider, Bernd Kuhn, Uwe Grether, & Gisbert Schneider, Nat. Commun., 15, 3408 (2024).

1. Environment

Create and activate the dragonfly environment.

cd envs/
conda env create -f environment.yml
conda activate dragonfly_gen
poetry install --no-root

Add the path of your dragonfly_gen checkout to PYTHONPATH in your ~/.bashrc file.

export PYTHONPATH="${PYTHONPATH}:<YOUR_PATH>/dragonfly_gen/"

Source your ~/.bashrc.

source ~/.bashrc
conda activate dragonfly_gen
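
If a script later fails with ModuleNotFoundError: No module named 'dragonfly_gen', the PYTHONPATH entry is the first thing to check. The following is a minimal sketch using only the standard library; the helper name is_importable is illustrative and not part of the repository.

```python
import importlib.util
import sys


def is_importable(package: str) -> bool:
    """Return True if `package` can be located on the current sys.path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    name = "dragonfly_gen"
    if is_importable(name):
        print(f"{name} found")
    else:
        # Listing sys.path shows whether the PYTHONPATH entry was picked up.
        print(f"{name} not found; sys.path is:")
        for entry in sys.path:
            print("  ", entry)
```

Run it from the same shell in which the dragonfly_gen environment is active, since PYTHONPATH is read when the interpreter starts.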

Test your installation by running test_pyg.py.

python test_pyg.py 
>>> torch_geometric.__version__: 2.3.0
>>> torch.__version__: 1.13.1
>>> rdkit.__version__: 2022.09.5
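
test_pyg.py itself is not reproduced here. As a rough equivalent that does not stop at the first missing package, the sketch below reads installed versions from package metadata. The distribution names passed in are assumptions (PyTorch Geometric, for example, is published on PyPI as torch-geometric), and conda-installed packages may expose their metadata under a different name.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    wanted = ["torch", "torch-geometric", "rdkit"]
    for dist, version in installed_versions(wanted).items():
        print(f"{dist}: {version or 'not installed'}")
```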

2. Sampling from a binding site

To sample from a binding site, place the protein as a PDB file and its ligand as an SDF file in the input/ directory:

cd genfromstructure/
ls input/
>>> 3g8i_ligand.sdf 3g8i_protein.pdb

Next, preprocess the files using preprocesspdb.py, passing the file names without their extensions:

python preprocesspdb.py -pdb_file 3g8i_protein -mol_file 3g8i_ligand -pdb_key 3g8i
>>> Number of embedded atoms: 158 / 158
>>> Writing input/3g8i.h5

After preprocessing, apply Dragonfly using sampling.py.

-config 701 will sample molecules biased by the properties of the ligand in the SDF. Properties include molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, polar surface area, and lipophilicity expressed as MolLogP.

-config 991 will sample molecules without biasing them by these properties.

python sampling.py -config 701 -epoch 151 -T 0.5 -pdb 3g8i -num_mols 100
Sampling 100 molecules:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|  100/100 [00:06<00:00, 14.31it/s]
Number of valid, unique and novel molecules: 88
python sampling.py -config 991 -epoch 163 -T 0.5 -pdb 3g8i -num_mols 100
Sampling 100 molecules:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|  100/100 [00:06<00:00, 13.54it/s]
Number of valid, unique and novel molecules: 99
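
The -T argument is not documented above. Assuming it is a softmax sampling temperature (sampling.py contains a function named temperature_sampling, as the traceback in the issues further down shows), lower values concentrate probability on the most likely next token. The sketch below illustrates that general technique with made-up scores; it is not the repository's implementation.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng=random):
    """Draw one index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
    for t in (1.0, 0.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

With the same scores, the probability of the top-scoring token is higher at T = 0.5 than at T = 1.0, which is the sense in which a lower temperature makes sampling more conservative.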

The generated molecules are saved in the output/ directory:

ls output/
3g8i.csv
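
The column layout of the output CSV is not described here, so the following sketch makes no assumption about column names. It only assumes that the first line is a header, and reads the file with the standard csv module. The function name load_generated is illustrative.

```python
import csv
import os


def load_generated(path):
    """Return (column_names, rows) from a CSV file whose first line is a header."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames or [], rows


if __name__ == "__main__":
    path = "output/3g8i.csv"
    if os.path.exists(path):
        columns, rows = load_generated(path)
        print("columns:", columns)
        print("molecules:", len(rows))
```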

To generate SELFIES, run the following command.

-config 901 will sample molecules biased by the properties of the ligand in the SDF. Properties include molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, polar surface area, and lipophilicity expressed as MolLogP.

python sampling.py -config 901 -epoch 194 -T 0.5 -pdb 3g8i -num_mols 100
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|  100/100 [00:08<00:00, 12.13it/s]
Number of valid, unique and novel molecules: 100

The generated molecules are saved in the output/ directory:

ls output/
3g8i.csv

3. Sampling from a template ligand

In the genfromligand/ directory, Dragonfly can be applied to generate molecules based on a template SMILES string, using the following command.

-config 603 will sample molecules biased by the properties of the ligand given as the SMILES string. Properties include molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, polar surface area, and lipophilicity expressed as MolLogP.

-config 680 will sample molecules without biasing them by these properties.

cd genfromligand/
python sampling.py -config 603 -epoch 305 -T 0.5 -smi_id rosiglitazone -smi "CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3" -num_mols 100
Here is your template SMILES: CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3
Sampling 100 molecules:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|  100/100 [00:05<00:00, 19.28it/s]
Number of valid, unique and novel molecules: 80
python sampling.py -config 680 -epoch 314 -T 0.5 -smi_id rosiglitazone -smi "CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3" -num_mols 100
Here is your template SMILES: CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3
Sampling 100 molecules:
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:05<00:00, 18.81it/s]
Number of valid, unique and novel molecules: 95

The generated molecules are saved in the output/ directory:

ls output/
rosiglitazone.csv
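
To inspect the six property values of a template, RDKit descriptors can be evaluated directly. This is a sketch under the assumption that the standard rdkit.Chem.Descriptors functions named below correspond to the properties listed above; the repository may compute them differently. RDKit is imported inside the function so that the mapping can be inspected without it.

```python
# Assumed mapping from the properties named above to RDKit descriptor functions.
PROPERTY_DESCRIPTORS = {
    "molecular weight": "MolWt",
    "rotatable bonds": "NumRotatableBonds",
    "hydrogen bond acceptors": "NumHAcceptors",
    "hydrogen bond donors": "NumHDonors",
    "polar surface area": "TPSA",
    "lipophilicity": "MolLogP",
}


def template_properties(smiles):
    """Compute the six properties for a SMILES string using RDKit."""
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    # Look up each descriptor function by name and apply it to the molecule.
    return {
        label: getattr(Descriptors, name)(mol)
        for label, name in PROPERTY_DESCRIPTORS.items()
    }
```

Calling template_properties with the rosiglitazone SMILES used above returns a dictionary with one value per property.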

To generate SELFIES, run the following command.

-config 803 will sample molecules biased by the properties of the ligand given as the SMILES string. Properties include molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, polar surface area, and lipophilicity expressed as MolLogP.

python sampling.py -config 803 -epoch 341 -T 0.5 -smi_id rosiglitazone -smi "CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3" -num_mols 100
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|  100/100 [00:08<00:00, 12.13it/s]
Number of valid, unique and novel molecules: 100

The generated molecules are saved in the output/ directory:

ls output/
rosiglitazone.csv

4. Rank generated molecules based on pharmacophore similarity to the template

To rank the generated molecules based on pharmacophore similarity to the template, navigate to ranking/qsar/ and use the following command:

cd ranking/qsar/
python cats_similarity_ranking.py -smi_file ../../genfromligand/output/rosiglitazone.csv -query "CN(CCOC1=CC=C(C=C1)CC2C(=O)NC(=O)S2)C3=CC=CC=N3"
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 631.86it/s]
The ranked molecules are stored here: ../../genfromligand/output/rosiglitazone_cats.csv
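
The internals of cats_similarity_ranking.py are not shown here. Conceptually, descriptor-based ranking encodes each molecule as a numeric vector and sorts the candidates by their distance to the query vector. The sketch below illustrates that idea with made-up three-dimensional vectors and Euclidean distance; both the vectors and the choice of metric are assumptions, and real pharmacophore descriptors are much longer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, candidates):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, euclidean(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    query = [1.0, 0.0, 2.0]  # hypothetical descriptor of the template
    candidates = {  # hypothetical descriptors of generated molecules
        "mol_a": [1.0, 0.0, 2.0],
        "mol_b": [0.0, 1.0, 2.0],
        "mol_c": [3.0, 3.0, 0.0],
    }
    for name, dist in rank_by_distance(query, candidates):
        print(name, round(dist, 3))
```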

5. License

The software was developed at ETH Zurich and is licensed under the AGPL-3.0 license, as described in LICENSE.

6. Citation

@article{atz2023deep,
  title={Prospective de novo drug design with deep interactome learning},
  author={Atz, Kenneth and Mu{\~n}oz, Leandro Cotos and Isert, Clemens and H{\aa}kansson, Maria and Focht, Dorota and Hilleke, Mattis and Nippa, David F and Iff, Michael and Ledergerber, Jann and Schiebroek, Carl CG and Grether, Uwe and Schneider, Gisbert and others},
  year={2024},
  journal={Nat. Commun.},
  publisher={Nature Publishing Group},
  volume={15},
  number={3408}
}


dragonfly_gen's Issues

What does "GraphTranslation" do in the net.py?

In "genfromstructure/net.py", line 552 reads "from net_utils import GraphTranslation". I couldn't find a file named "net_utils" or any known Python library by that name. Could you explain what it does to the PDB data?

No module named 'dragonfly_gen'

I am writing to address an issue encountered during the execution of the sampling script following the successful installation of dragonfly_gen, as per the provided instructions.

Upon attempting to run the script using the following command:

python sampling.py -config 701 -epoch 151 -T 0.5 -pdb 3g8i -num_mols 100
An error is being thrown. This impediment is hindering further progress and necessitates investigation to rectify the situation.

Could you kindly provide guidance or assistance in resolving this error? Any insights or recommendations would be greatly appreciated.

Traceback (most recent call last):
  File "sampling.py", line 25, in <module>
    from dragonfly_gen.drugtargetgraph.utils import (
ModuleNotFoundError: No module named 'dragonfly_gen'

undefined symbol: iJIT_NotifyEvent when running test

Hi,

I followed the installation instructions, and then get the following error when running the test script:

python test_pyg.py
Traceback (most recent call last):
  File "test_pyg.py", line 5, in <module>
    import torch
  File "/home/evehom/miniconda3/envs/dragonfly_gen/lib/python3.8/site-packages/torch/__init__.py", line 218, in <module>
    from torch._C import * # noqa: F403
ImportError: /home/evehom/miniconda3/envs/dragonfly_gen/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

It seems that Dragonfly does not need GPU support, is this correct?

Please advise on how to proceed.

Thanks/Evert

pdb file format

It seems that the dictionary of atoms and atom IDs is hardcoded and does not cover all atom types. Would you please share the list of supported atom names, so that I can convert my PDB files to the correct atom types?

Sizes of tensors must match except in dimension 1

Hello Dragonfly Gen Team,

Thank you for creating such an impressive tool.

I successfully executed the preprocess and sampling code using the example file (3g8i). However, when I attempted to use the same code with my input file, I encountered an error. Could you please help me identify the issue? I've attached my input file for your reference.

Thank you for your assistance.

1ZUA.zip

(dragonfly_gen) hp@hp:~/Downloads/dragonfly_gen/genfromstructure$ python preprocesspdb.py -pdb_file 1ZUA_protein -mol_file 1ZUA_ligand -pdb_key 1ZUA
Number of embedded atoms: 257 / 156
Writing input/1ZUA.h5

(dragonfly_gen) hp@hp:~/Downloads/dragonfly_gen/genfromstructure$ python sampling.py -config 901 -epoch 194 -T 0.5 -pdb 1ZUA -num_mols 100
Sampling 100 molecules:
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "sampling.py", line 233, in <module>
    novels, probs_abs = temperature_sampling(
  File "sampling.py", line 100, in temperature_sampling
    hiddens = egnn(g)
  File "/home/hp/anaconda3/envs/dragonfly_gen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hp/Downloads/dragonfly_gen/genfromstructure/net.py", line 178, in forward
    torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 257 but got size 156 for tensor number 1 in the list.

Is the generative model biased towards sulfur atoms?

Hello Dragonfly Team,

When sampling ligands using the example file (3g8i), I've noticed that many of the generated ligands contain a high number of sulfur (S) atoms. Does this indicate a bias towards sulfur atoms in the generative model? If so, could you explain why this occurs and suggest how it might be avoided? Additionally, is there a way to set filter conditions during the sampling process to limit the number of sulfur atoms in the generated ligands?

I am attaching the file of molecules generated for the example.

3g8i.csv

Thank you for addressing my concerns.
