EDM-Dock

Code for our paper Deep Learning Model for Flexible and Efficient Protein-Ligand Docking

Installation

git clone https://github.com/MatthewMasters/EDM-Dock
cd EDM-Dock
conda env create -f environment.yaml -n edm-dock
conda activate edm-dock
python setup.py install

Usage

Dock your own molecules using the pre-trained model

Step 1: Prepare a dataset using the following format

dataset_path/
    sys1/
        protein.pdb
        ligand.sdf
        box.csv
    sys2/
        protein.pdb
        ligand.sdf
        box.csv
    ...
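Before running the preparation script, it can help to confirm that every system folder contains the three files listed above. The following is a minimal sketch, not part of EDM-Dock: the function name `find_incomplete_systems` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the dataset layout described above.
REQUIRED_FILES = ("protein.pdb", "ligand.sdf", "box.csv")


def find_incomplete_systems(dataset_path):
    """Return {system_name: [missing file names]} for incomplete system folders."""
    problems = {}
    for system_dir in sorted(Path(dataset_path).iterdir()):
        if not system_dir.is_dir():
            continue  # ignore stray files at the top level
        missing = [name for name in REQUIRED_FILES
                   if not (system_dir / name).is_file()]
        if missing:
            problems[system_dir.name] = missing
    return problems
```

An empty dictionary means every system folder has all three files.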

The box.csv file defines the binding site box and should contain six comma-separated values:

center_x, center_y, center_z, width_x, width_y, width_z
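One way to obtain these six values is to enclose a set of reference coordinates (for example, the atoms of a known ligand) in an axis-aligned box with some padding. The sketch below is an illustration under assumptions: the helper names and the padding of 5.0 are hypothetical choices, the values are written as a single row without a header, and the README does not state the expected units or whether a header row is allowed.

```python
import csv


def box_from_coords(coords, padding=5.0):
    """Return [center_x, center_y, center_z, width_x, width_y, width_z].

    coords is an iterable of (x, y, z) tuples; padding is added on every
    side of the bounding box (an arbitrary default, not an EDM-Dock value).
    """
    axes = list(zip(*coords))  # three tuples: all x, all y, all z
    center = [(max(a) + min(a)) / 2 for a in axes]
    widths = [(max(a) - min(a)) + 2 * padding for a in axes]
    return center + widths


def write_box_csv(path, box):
    """Write the six box values as one comma-separated row (no header assumed)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerow(f"{value:.3f}" for value in box)
```

For example, `write_box_csv("dataset_path/sys1/box.csv", box_from_coords(ligand_coords))` would create the file for one system, where `ligand_coords` is a list of coordinates you supply.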

Step 2: Prepare the features using the following command

python scripts/prepare.py --dataset_path [dataset_path]

Step 3: Download DGSOL

Because DGSOL is not distributed under an MIT license, its code is kept in a separate repository (https://github.com/MatthewMasters/DGSOL.git). After downloading DGSOL, update the path at the top of edmdock/utils/dock.py to point to its location on your system, then rebuild the package by running python setup.py install.

Step 4: Run Docking

By default, docking includes the minimization step. You can turn off minimization by editing the last line in runs/paper_baseline/config.yml; this makes docking much faster but may generate unrealistic molecular structures.

python scripts/dock.py --run_path runs/paper_baseline --dataset_path [dataset_path]

The final docked poses are saved in the folder runs/paper_baseline/results as [ID]_docked.pdb.
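To collect the output files programmatically, the naming pattern above can be matched with a glob. This is a small sketch that assumes only the results folder and the [ID]_docked.pdb pattern stated in this README; the function name `docked_poses` is hypothetical.

```python
from pathlib import Path


def docked_poses(run_path):
    """Map each system ID to its docked pose file under <run_path>/results."""
    results_dir = Path(run_path) / "results"
    suffix = "_docked.pdb"
    return {
        pose.name[: -len(suffix)]: pose  # strip the suffix to recover the ID
        for pose in sorted(results_dir.glob(f"*{suffix}"))
    }
```

Calling `docked_poses("runs/paper_baseline")` returns a dictionary keyed by system ID, which is empty if docking has not produced any poses yet.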

Train model using your own dataset

Step 1: Prepare a dataset using the format described above

Step 2: Prepare the features using the following command

python scripts/prepare.py --dataset_path [dataset_path]

Step 3: Write a configuration file

An example can be found at runs/paper_baseline/config.yml

Step 4: Begin training with the following command

python scripts/train.py --config_path [config_path]

Reference

Under Review

Contributors

matthewmasters

edm-dock's Issues

ModuleNotFoundError: No module named 'openforcefield'

Input:
python3 scripts/dock.py --run_path runs/paper_baseline --dataset_path ./examples

Output:
Traceback (most recent call last):
File "/Users/chiragpatel/Desktop/EDM-Dock/scripts/dock.py", line 12, in <module>
from edmdock.utils.dock import Minimizer
File "/Users/chiragpatel/anaconda3/envs/edm-dock/lib/python3.10/site-packages/EDM_Dock-1.0-py3.10.egg/edmdock/utils/dock.py", line 8, in <module>
from openforcefield.topology import Molecule
ModuleNotFoundError: No module named 'openforcefield'
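A possible cause, offered as an assumption rather than a confirmed diagnosis, is that the installed Open Force Field Toolkit is a newer release: the package is now imported as openff.toolkit, and the older openforcefield namespace is no longer provided. The sketch below shows a generic import fallback; the helper name `import_first_available` is hypothetical, and applying it would mean editing the import in edmdock/utils/dock.py and rebuilding the package with python setup.py install.

```python
import importlib

# Newer namespace first, then the legacy one used in the traceback above.
CANDIDATE_MODULES = ("openff.toolkit.topology", "openforcefield.topology")


def import_first_available(module_names):
    """Import and return the first module in module_names that is installed."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    raise ImportError(
        "None of these modules could be imported: " + ", ".join(module_names)
    )

# Intended use (requires one of the toolkits to be installed):
#   topology = import_first_available(CANDIDATE_MODULES)
#   Molecule = topology.Molecule
```

If neither namespace can be imported, the toolkit itself is missing from the environment and needs to be installed first.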

Error: TypeError: 'method' object is not iterable

Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.10/site-packages/edmdock/utils/data.py", line 224, in __call__
def __call__(self, batch):
File "/usr/local/lib/python3.10/site-packages/edmdock/utils/data.py", line 95, in from_data_list
TypeError: 'method' object is not iterable

Hello Matthew, Thank you so much for this amazing package and implementation!

I've been trying to create a Colab notebook to make it easier to run your code, but I keep running into this problem in data.py. Could you help clarify what might be occurring?

Thank you very much in advance!

Data processing and dataset

Hi, are there any data processing and data splitting scripts available? I am confused about how the complex structures from PDBBindv2020 and BioLip were filtered: why did you remove ligands that bind to multiple chains, even though, according to the numbers reported in your SI, they make up a relatively small percentage? I have downloaded the released dataset (without data splitting) and found that thousands of mol2 files are invalid when loaded with rdkit (version 2022.09.5). Even after using schrodinger-2021 to convert the mol2 files to sdf files, 1738 sdf files still fail. How did you solve this? Or can you update the released dataset?
Thanks,
David

RuntimeError: Attempting to deserialize object on CUDA device 6 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

input:
python scripts/dock.py --run_path runs/paper_baseline --dataset_path examples

output:
Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing
Using weights from... runs/paper_baseline/weights.ckpt
set seed for random, numpy and torch
Traceback (most recent call last):
File "/home/jacek/EDMDockTutorial/EDM-Dock/scripts/dock.py", line 119, in <module>
model.load_state_dict(torch.load(weight_path)['state_dict'])
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 712, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 1049, in _load
result = unpickler.load()
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/pickle.py", line 1212, in load
dispatch[key[0]](self)
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/pickle.py", line 1253, in load_binpersid
self.append(self.persistent_load(pid))
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 1019, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 1001, in load_tensor
wrap_storage=restore_location(storage, location),
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/jacek/Install/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/serialization.py", line 143, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 6 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
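The error message itself points to the remedy: pass map_location to torch.load so that tensors saved on a device that does not exist locally are remapped. The following sketch illustrates that idea; the two helper names are hypothetical, and applying it would mean changing the torch.load call shown at line 119 of scripts/dock.py in the traceback above.

```python
def choose_map_location(cuda_available):
    """Pick a device string that exists on the current machine."""
    return "cuda:0" if cuda_available else "cpu"


def load_state_dict_portably(weight_path):
    """Load a checkpoint saved on any device and return its 'state_dict' entry."""
    import torch  # imported lazily so the helper above works without PyTorch

    location = choose_map_location(torch.cuda.is_available())
    # map_location remaps every stored tensor to the chosen device,
    # regardless of the device index recorded in the checkpoint.
    checkpoint = torch.load(weight_path, map_location=location)
    return checkpoint["state_dict"]
```

With this helper, the loading step would read `model.load_state_dict(load_state_dict_portably(weight_path))`.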

CUDAAccelerator

input:
python scripts/dock.py --run_path runs/paper_baseline --dataset_path examples

output:
Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing
Using weights from... runs/paper_baseline/weights.ckpt
set seed for random, numpy and torch
Loading test set...
100%|██████████| 3/3 [00:00<00:00, 113.23it/s]
/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:441: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.
rank_zero_deprecation(
Traceback (most recent call last):
File "/gpfs/projects/parisahlab/yanapatj/EDM-Dock/scripts/dock.py", line 130, in <module>
trainer = Trainer(gpus=config['cuda'])
File "/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/pytorch_lightning/utilities/argparse.py", line 340, in insert_env_defaults
return fn(self, **kwargs)
File "/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 414, in __init__
self._accelerator_connector = AcceleratorConnector(
File "/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 208, in __init__
self._set_parallel_devices_and_init_accelerator()
File "/home/yanapatj/miniconda3/envs/edm-dock/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 528, in _set_parallel_devices_and_init_accelerator
raise MisconfigurationException(
lightning_lite.utilities.exceptions.MisconfigurationException: CUDAAccelerator can not run on your system since the accelerator is not available. The following accelerator(s) is available and can be passed into accelerator argument of Trainer: ['cpu'].
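The log above shows two things: the gpus argument is deprecated in favor of accelerator and devices, and the machine only offers the 'cpu' accelerator. A possible workaround is to choose the Trainer arguments based on what is available. This is a sketch under assumptions: the helper name `trainer_kwargs` is hypothetical, and config['cuda'] is assumed to hold an integer GPU count, which this README does not confirm.

```python
def trainer_kwargs(requested_gpus, cuda_available):
    """Build accelerator/devices arguments for pytorch_lightning.Trainer.

    requested_gpus: number of GPUs asked for in the config (assumed integer).
    cuda_available: result of torch.cuda.is_available() on this machine.
    """
    if requested_gpus and cuda_available:
        return {"accelerator": "gpu", "devices": requested_gpus}
    # Fall back to CPU when no GPU was requested or none is available.
    return {"accelerator": "cpu", "devices": 1}

# Intended use in place of Trainer(gpus=config['cuda']):
#   trainer = Trainer(**trainer_kwargs(config['cuda'], torch.cuda.is_available()))
```

Alternatively, setting the cuda entry in runs/paper_baseline/config.yml to a value that disables GPU use may be enough, if the script supports it.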

Weird docked pose for generated molecule

Hi, many thanks for sharing this great work!
I was trying to predict the binding pose for a molecule generated by a DL method, like this:

[image]

whose SMILES is CN1C=NC2=C(C1)O[C@H](C1=C[C@@H](O)C=C[C@H]1O)[C@H]2CC[C@@H](O)C(=O)O.

However, after EDM prediction and DGSOL alignment, the conformation of this molecule changed from the one on the left to the one on the right, which is probably incorrect, since the atoms of neither the five- nor the six-membered ring are coplanar.
[image] [image]

I was wondering what might be causing this problem, and how may I fix it? Thanks in advance!

no module named openforcefield

When I run the command
python scripts/dock.py
I get an error: the module 'openforcefield' cannot be found. How can I solve this problem?

Traceback (most recent call last):
File "/home/zlf/Downloads/EDM-Dock/scripts/dock.py", line 12, in <module>
from edmdock.utils.dock import Minimizer
File "/home/zlf/anaconda3/envs/edmdock/lib/python3.9/site-packages/EDM_Dock-1.0-py3.9.egg/edmdock/utils/dock.py", line 8, in <module>
from openforcefield.topology import Molecule
ModuleNotFoundError: No module named 'openforcefield'
