gnn_dove's Introduction

GNN_DOVE

GNN-DOVE is a computational tool that uses a graph neural network to evaluate the quality of protein docking models.

Copyright (C) 2020 Xiao Wang, Sean T Flannery, Daisuke Kihara, and Purdue University.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara ([email protected])

Citation:

Protein Docking Model Evaluation by Graph Neural Networks

@ARTICLE{10.3389/fmolb.2021.647915,
AUTHOR={Wang, Xiao and Flannery, Sean T. and Kihara, Daisuke},   	 
TITLE={Protein Docking Model Evaluation by Graph Neural Networks},      	
JOURNAL={Frontiers in Molecular Biosciences},      	
VOLUME={8},      
PAGES={402},     	
YEAR={2021},      	  
URL={https://www.frontiersin.org/article/10.3389/fmolb.2021.647915},       
DOI={10.3389/fmolb.2021.647915},      
ISSN={2296-889X},   
}

Introduction

Physical interactions of proteins play key roles in many important cellular processes. Therefore, it is crucial to determine the structure of protein complexes to understand the molecular mechanisms of interactions. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed to predict the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning-based approach named Graph Neural Network-based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph. GNN-DOVE was trained and validated on docking models in the Dockground database. GNN-DOVE performed better than existing methods, including DOVE, our previous method that applies a convolutional neural network to voxelized structure models.

Overall Protocol

(1) Extract the interface region of the protein complex;
(2) Construct two graphs, with and without intermolecular interactions, based on the interface region;
(3) Apply a GNN with an attention mechanism to process the two input graphs;
(4) Output the evaluation score for the input protein complex.
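
Step (1) can be pictured with a small distance-based filter. The following is only a minimal sketch, not the repository's implementation: the function name `extract_interface`, the atom-level (rather than residue-level) selection, and the 10 Å cutoff are assumptions made for illustration.

```python
import math

def extract_interface(receptor, ligand, cutoff=10.0):
    """Return indices of receptor and ligand atoms that lie within
    `cutoff` angstroms of at least one atom of the other chain.

    receptor, ligand: lists of (x, y, z) tuples. The cutoff value is an
    assumption for this sketch, not a documented GNN-DOVE parameter.
    """
    rec_idx, lig_idx = set(), set()
    for i, r in enumerate(receptor):
        for j, l in enumerate(ligand):
            # Euclidean distance between one receptor and one ligand atom.
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(r, l)))
            if dist <= cutoff:
                rec_idx.add(i)
                lig_idx.add(j)
    return sorted(rec_idx), sorted(lig_idx)

# Toy coordinates: only the first receptor atom and the first ligand atom
# are closer than 10 angstroms to each other.
receptor = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand = [(5.0, 0.0, 0.0), (60.0, 0.0, 0.0)]
interface = extract_interface(receptor, ligand)
print(interface)
```

If both returned lists are empty, the two chains are not in contact, so there is no interface from which to build the graphs.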

[Figure: overall protocol]

Network Architecture

[Figure: network architecture]

Illustration of the graph neural network (GNN) with attention and gate-augmented mechanism (GAT).

Pre-required software

Python 3 : https://www.python.org/downloads/
rdkit: https://www.rdkit.org/docs/Install.html
chimera (optional): https://www.cgl.ucsf.edu/chimera/download.html

Installation

2. Clone the repository in your computer

git clone git@github.com:kiharalab/GNN_DOVE.git && cd GNN_DOVE

3. Build dependencies.

You have two options for installing the dependencies on your computer:

3.1 Install with pip and Python (version 3.6.9).

3.1.2 Install the dependencies from the command line.
pip install -r requirements.txt --user

If you encounter any errors, you can install each library one by one:

pip install torch==1.7.0
pip install numpy==1.18.1
pip install scipy==1.4.1

3.2 Install with anaconda

3.2.2 Install the dependencies from the command line
conda create -n GNN_DOVE python=3.6.10
conda activate GNN_DOVE
pip install -r requirements.txt 

Each time you want to run the code, activate the environment with

conda activate GNN_DOVE
conda deactivate  # if you want to exit

Usage

python3 main.py
  -h, --help            show this help message and exit
  -F F                  decoy example path
  --mode MODE           0: evaluate for single docking model 
                        1: evaluate for multi docking models
                        2: visualize attention for w/w.o intermolecular graphs from interface region
  --gpu GPU             Choose gpu id, example: '1,2'(specify use gpu 1 and 2)
  --batch_size          batch_size
  --num_workers         number of workers
  --n_graph_layer       number of GNN layer
  --d_graph_layer       dimension of GNN layer
  --n_FC_layer          number of FC layer
  --d_FC_layer          dimension of FC layer
  --initial_mu          initial value of mu
  --initial_dev         initial value of dev
  --dropout_rate        dropout_rate
  --seed SEED           random seed for shuffling
  --fold FOLD           specify fold model for prediction

1 Evaluate single protein-complex

python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F should specify a PDB file with receptor chain ID 'A' and ligand chain ID 'B'; --gpu specifies the GPU ID; --fold specifies the fold model to use: -1 averages the predictions of the 4 fold models, and 1, 2, 3, or 4 selects a single fold model. (Recommended) You can specify --fold=5 to use the model pretrained on a much larger benchmark (Dockground+Zdock). The output is written to [Predict_Result/Single_Target], and the prediction result is saved in Predict.txt.

Example Command (Fold 1 Model):
python main.py --mode=0 -F=example/input/correct.pdb --gpu=0 --fold=1
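
Because the input must contain receptor chain 'A' and ligand chain 'B' in the same file, it can help to check the chain IDs before running the tool. The sketch below relies only on the fixed-width PDB format, where the chain ID is in column 22 of each ATOM record. The function names and the `make_atom_line` demo helper are hypothetical and are not part of GNN_DOVE.

```python
def chain_ids(pdb_lines):
    """Collect chain IDs, in order of first appearance, from ATOM records."""
    chains = []
    for line in pdb_lines:
        # Column 22 (index 21) of an ATOM record holds the chain ID.
        if line.startswith("ATOM") and len(line) > 21:
            cid = line[21]
            if cid not in chains:
                chains.append(cid)
    return chains

def has_receptor_and_ligand(pdb_lines):
    """True if both chain 'A' (receptor) and chain 'B' (ligand) are present."""
    chains = chain_ids(pdb_lines)
    return "A" in chains and "B" in chains

def make_atom_line(serial, chain):
    # Hypothetical helper: formats one fixed-width ATOM record for the demo.
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, "CA", "ALA", chain, serial, 0.0, 0.0, 0.0, 1.0, 0.0)

complex_lines = [make_atom_line(1, "A"), make_atom_line(2, "B")]
receptor_only = [make_atom_line(1, "A")]
print(chain_ids(complex_lines))
print(has_receptor_and_ligand(receptor_only))
```

For a real file, the open file handle can be passed directly, for example `has_receptor_and_ligand(open("example/input/correct.pdb"))`.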

2 Evaluate many protein-complexes

python main.py --mode=1 -F [pdb_dir] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F should specify the directory that includes PDB files with receptor chain ID 'A' and ligand chain ID 'B'; --gpu specifies the GPU ID; --fold specifies the fold model to use: -1 averages the predictions of the 4 fold models, and 1, 2, 3, or 4 selects a single fold model. (Recommended) You can specify --fold=5 to use the model pretrained on a much larger benchmark (Dockground+Zdock). The output is written to [Predict_Result/Multi_Target], and the prediction results are saved in Predict.txt.

Example Command (All Fold Models):
python main.py --mode=1 -F=example/input --gpu=0 --fold=-1

3 Evaluate with the model pretrained on the Dockground+Zdock benchmark (Recommended)

3.1 Evaluate single protein-complex

python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=5

Here -F should specify a PDB file with receptor chain ID 'A' and ligand chain ID 'B'; --gpu specifies the GPU ID. The output is written to [Predict_Result/Single_Target], and the prediction result is saved in Predict.txt.

Example Command:
python main.py --mode=0 -F=example/input/correct.pdb --gpu=0 --fold=5

3.2 Evaluate many protein-complexes

python main.py --mode=1 -F [pdb_dir] --gpu=[gpu_id] --fold=5

Here -F should specify the directory that includes PDB files with receptor chain ID 'A' and ligand chain ID 'B'; --gpu specifies the GPU ID. The output is written to [Predict_Result/Multi_Target], and the prediction results are saved in Predict.txt.

Example Command:
python main.py --mode=1 -F=example/input --gpu=0 --fold=5

4 Visualize attention for interface region

python main.py --mode=2 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F should specify a PDB file with receptor chain ID 'A' and ligand chain ID 'B'; --gpu specifies the GPU ID; --fold specifies the fold model to use, where 1, 2, 3, or 4 selects one of the fold models.
The output is written to [Predict_Result/Visulize_Target]. The attention for the graph with intermolecular interactions is saved in attention2_receptor.pdb and attention2_ligand.pdb, and the attention for the graph without them is saved in attention1_receptor.pdb and attention1_ligand.pdb. The weight of each atom is stored in the B-factor column, so the files can be visualized in Chimera (https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/bfactor.html) or in PyMOL.

Example Command (Fold 1 Model):
python main.py --mode=2 -F=example/input/correct.pdb --gpu=0 --fold=1

Here is a visualization example:

[Figure: attention visualization example]

The left panel represents the graph with intermolecular interaction (attention2) and the right panel shows the graph only with covalent bonds (attention1).
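
Since the attention weights are stored in the B-factor column, they can also be inspected without a molecular viewer. The following is a minimal sketch that reads columns 61-66 of each ATOM record (the B-factor field of the PDB format) and lists the atoms with the largest weights. The function name `top_attention_atoms` and the `make_atom_line` demo helper are hypothetical and are not part of GNN_DOVE.

```python
def top_attention_atoms(pdb_lines, n=5):
    """Return the n atoms with the largest B-factor values as
    (weight, chain, residue name, residue number, atom name) tuples."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            weight = float(line[60:66])       # B-factor column (61-66)
            atom_name = line[12:16].strip()   # atom name (13-16)
            res_name = line[17:20].strip()    # residue name (18-20)
            chain = line[21]                  # chain ID (22)
            res_seq = line[22:26].strip()     # residue number (23-26)
            atoms.append((weight, chain, res_name, res_seq, atom_name))
    atoms.sort(key=lambda atom: atom[0], reverse=True)
    return atoms[:n]

def make_atom_line(serial, chain, weight):
    # Hypothetical helper: formats one fixed-width ATOM record for the demo,
    # with `weight` written into the B-factor column.
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, "CA", "ALA", chain, serial, 0.0, 0.0, 0.0, 1.0, weight)

demo_lines = [make_atom_line(1, "A", 0.10), make_atom_line(2, "A", 0.90)]
top = top_attention_atoms(demo_lines, n=1)
print(top)
```

For a real run, the lines of attention2_receptor.pdb or attention2_ligand.pdb from the output directory can be passed in place of `demo_lines`.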

Example

Input

1 Correct protein-complex example: https://github.com/kiharalab/GNN_DOVE/blob/main/example/input/correct.pdb
2 Incorrect protein-complex example: https://github.com/kiharalab/GNN_DOVE/blob/main/example/input/incorrect.pdb

Output

1 Single protein-complex output (mode=0): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/single
2 Multi protein-complexes output (mode=1): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/multi
3 Visualize graph attention (mode=2): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/visualize

gnn_dove's Issues

How to solve the problem that the dataset was highly imbalanced?

Thanks for your reply. I would like to know how to handle an imbalanced training dataset. I read your article carefully but do not know how to implement this part: 'Since the dataset was highly imbalanced with more incorrect decoys than acceptable ones, we balanced the training data by sampling the same number of acceptable and incorrect decoys in each batch. We sampled the same number of correct and incorrect decoys. To achieve this, a positive (i.e., correct) decoy may be sampled multiple times in one epoch of training.' Can you give me some advice? Thank you very much!

Which GPU should I use?

Hi author,
All of my package versions match the README, but I am using an RTX 4090 GPU.
After running this command: python main.py --mode=0 -F=../1ypo.pdb --gpu=0 --fold=1
I got this:

(gnn_dove) [email protected]:/GNN_DOVE$ python main.py --mode=0 -F=../1ypo.pdb --gpu=0 --fold=1
/GNN_DOVE/Predict_Result created                                                                                                                                                      
/GNN_DOVE/Predict_Result/Single_Target created                                                                                                                                        
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result created                                                                                                                          
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo created                                                                                                                     
Extracting 1044/0 atoms for receptor, 7314/8358 atoms for ligand                                                                                                                      
start alarm signal.                                                                                                                                                                   
After filtering the interface region, 718/1044 residue in receptor, 873/7314 residue in ligand                                                                                        
After filtering the interface region, 718 receptor, 1264 ligand                                                                                                                       
['ATOM      1  N   ALA A 142      29.218  41.101  17.063  1.00 42.98           N  \n', 'ATOM      2  CA  ALA A 142      27.727  41.114  17.117  1.00 43.02           C  \n', 'ATOM      3  C   ALA A 142      27.217  40.248  18.265  1.00 42.75 ... ]
close alarm signal.
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo/Input.rinterface /GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo/Input.linterface
    Total params: 0.2175270000M
/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/cuda/__init__.py:104: UserWarning: 
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 46, in <module>
    predict_single_input(input_path,params)
  File "/GNN_DOVE/predict/predict_single_input.py", line 112, in predict_single_input
    Final_Pred=Get_Predictions(dataloader, device, model)
  File "/GNN_DOVE/predict/predict_single_input.py", line 62, in Get_Predictions
    pred= model.test_model((H, A1, A2, V, Atom_count), device)
  File "/GNN_DOVE/model/GNN_Model.py", line 144, in test_model
    c_hs = self.embede(c_hs)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Which GPU and CUDA version did you use?
Thank you

Suggested solution for AttributeError: 'NoneType' object has no attribute 'GetNumAtoms' #1

Hi.
I think your project is great. Thank you for sharing this project.

Unfortunately, when a protein-protein complex does not have chain B, the determination of receptor and ligand fails with this error:
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms' #1
This happens because llist ends up empty, so RDKit returns None.
Problem 1
Therefore, I changed some lines for use in my research. I also compared the results with those of the original code for a couple of complexes, and they are the same.
def Extract_Interface(pdb_path):
b=0
if (dat_in[0] == 'TER'):
b=b+1
Just before **if (dat_in[0] == 'ATOM'): **
.....
if b == 0:
rlist.append(tmp_list)
tmp_list = []
tmp_list.append([x, y, z, atom_type, count_l])
count_l += 1
receptor_list.append(line)
else:
llist.append(tmp_list)
tmp_list = []
tmp_list.append([x, y, z, atom_type, count_l])
count_l += 1
ligand_list.append(line)
Just Before print("After filtering the interface region, %d receptor, %d ligand"%(len(final_receptor),len(final_ligand)))

I will arrange my structures with a TER record between the receptor and the ligand. ZDOCK already provides this format.

Problem 2
Sometimes the atom numbers in a protein complex are not in order; the numbering can start again from 1. So I changed these lines.

def Form_interface(rlist,llist,receptor_list,ligand_list,cut_off=10):
    ....................
    control=0
    try:
        for residue in newllist:
            for tmp_atom in residue:
                our_index = tmp_atom[4]
                """print("ouurrr_index")
                print(our_index)"""
                final_ligand.append(ligand_list[our_index])
                control=control+1
    except:
        if control > 0:
            pass
        else:
            b = 0
            for residue in newllist:
                for tmp_atom in residue:
                    our_index = b
                    """print("ouurrr_index")
                    print(our_index)"""
                    final_ligand.append(ligand_list[our_index])
                    b = b + 1
Just before print("After filtering the interface region, %d receptor, %d ligand"%(len(final_receptor),len(final_ligand)))

I plan to stay as close to the original code as I can. I hope these changes help someone.

AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Dear developers, GNN_DOVE is a pretty cool tool for evaluating docked protein complexes; traditional docking methods (AutoDock Vina, ZDOCK, etc.) are too slow. I have successfully downloaded and deployed your software on CentOS 7 (GPU: RTX 3090) and tested it with the official "correct.pdb". The software runs quickly and successfully, as shown below:

>python main.py --mode=0 -F ./test2/correct.pdb --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result/correct created
Extracting 1900/1900 atoms for receptor, 983/983 atoms for ligand
start alarm signal.
After filtering the interface region, 48/234 residue in receptor, 60/109 residue in ligand
After filtering the interface region, 387 receptor, 522 ligand
close alarm signal.
    Total params: 0.2175270000M

Then I tried my own PDB files, generated with MODELLER 10.1. I used pdb-tools (pdb_rplchain) to change the chain IDs of the two raw PDB files (chain A for alpha.pdb and chain B for beta.pdb) and ran GNN_DOVE in multi-complex mode (with only alpha.pdb and beta.pdb in ./test/), but an error occurred, as shown below:

>python main.py --mode=1 -F ./test/ --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target created
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result created
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result/test created
    Total params: 0.2175270000M
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result/test/alpha created
Extracting 1628/1628 atoms for receptor, 0/0 atoms for ligand
start alarm signal.
After filtering the interface region, 0/200 residue in receptor, 0/0 residue in ligand
After filtering the interface region, 0 receptor, 0 ligand
close alarm signal.
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    predict_multi_input(input_path, params)
  File "/Tools/AnalysisTools/GNN_DOVE-main/predict/predict_multi_input.py", line 89, in predict_multi_input
    input_file = Prepare_Input(structure_path)
  File "/Tools/AnalysisTools/GNN_DOVE-main/data_processing/Prepare_Input.py", line 50, in Prepare_Input
    receptor_count = receptor_mol.GetNumAtoms()
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Here are my two PDB files (with a .txt suffix only so that they can be uploaded to GitHub):
alpha.pdb.txt
beta.pdb.txt
In addition, I used the pdb_merge tool to merge the two PDB files into one and tested GNN_DOVE again; an error occurred, as shown below:

>python main.py --mode=0 -F ./test/ab.pdb --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result/ab created
Extracting 1628/1628 atoms for receptor, 1751/1751 atoms for ligand
start alarm signal.
After filtering the interface region, 0/201 residue in receptor, 0/208 residue in ligand
After filtering the interface region, 0 receptor, 0 ligand
close alarm signal.
Traceback (most recent call last):
  File "main.py", line 46, in <module>
    predict_single_input(input_path,params)
  File "/Tools/AnalysisTools/GNN_DOVE-main/predict/predict_single_input.py", line 85, in predict_single_input
    input_file=Prepare_Input(structure_path)
  File "/Tools/AnalysisTools/GNN_DOVE-main/data_processing/Prepare_Input.py", line 50, in Prepare_Input
    receptor_count = receptor_mol.GetNumAtoms()
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Finally, I checked the difference between the official "correct.pdb" and my own alpha.pdb, beta.pdb, and ab.pdb, and found that the official "correct.pdb" has an extra column of information, as shown below:
[Figure: comparison of the PDB files]

So my question is how to correctly prepare the input file for GNN_DOVE, or how to convert my PDB files to the format of the official "correct.pdb". However, it may simply be that no residues remain in my case after filtering. Looking forward to your reply, thanks!

Is the code going to run on pretrained model directly?

Hi author,
Great work!
In the README, I didn't see any code for loading the pretrained model. If I run the command python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id], is the code going to run on the pretrained model automatically?

consult

Excuse me, I would like to ask whether DOVE supports training a model on one's own data?

Answer to questions about dataset configuration for training

Dataset weights setting:

        labels=np.array(labels)
        num_train_correct=len(np.argwhere(labels==1))
        num_train_incorrect =len(np.argwhere(labels==0))
        print("In this dataset, we have %d examples, with %d/%d"%(len(labels),num_train_correct,num_train_incorrect))
        train_weights = [1 / num_train_correct if labels[k]==1 else 1 / num_train_incorrect for k in range(len(labels))]

Data Sampler code:

from torch.utils.data.sampler import Sampler
import numpy as np

class Data_Sampler(Sampler):

    def __init__(self, weights, num_samples, replacement=True):
        weights = np.array(weights) / np.sum(weights)
        self.weights = weights
        self.num_samples = num_samples
        self.replacement = replacement

    def __iter__(self):
        # return iter(torch.multinomial(self.weights, self.num_samples, self.replacement).tolist())
        retval = np.random.choice(len(self.weights), self.num_samples, replace=self.replacement, p=self.weights)
        return iter(retval.tolist())

    def __len__(self):
        return self.num_samples

Data Loader configuration code:

train_sampler = Data_Sampler(train_dataset.weights, len(train_dataset.weights), replacement=True)
train_dataloader = DataLoader(train_dataset, batch_size, shuffle=False,
                                  num_workers=params['num_workers'], sampler=train_sampler, collate_fn=collate_fn)
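
To see why these weights balance the classes, the sampling step can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not the training code: it swaps `np.random.choice` for `random.choices`, uses made-up labels (10 correct and 990 incorrect decoys), and assumes that both classes are present. The function names are hypothetical.

```python
import random
from collections import Counter

def class_balanced_weights(labels):
    """Weight each example by the inverse size of its class, as in the
    dataset weights snippet above (assumes both classes are present)."""
    n_correct = sum(1 for y in labels if y == 1)
    n_incorrect = sum(1 for y in labels if y == 0)
    return [1.0 / n_correct if y == 1 else 1.0 / n_incorrect for y in labels]

def sample_epoch(weights, num_samples, seed=0):
    """Draw example indices with replacement, proportionally to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)

# Made-up, highly imbalanced labels: 1 = correct decoy, 0 = incorrect decoy.
labels = [1] * 10 + [0] * 990
weights = class_balanced_weights(labels)
drawn = sample_epoch(weights, num_samples=10000)
counts = Counter(labels[i] for i in drawn)
print(counts)
```

Each class carries the same total weight, so about half of the drawn indices point to correct decoys even though they make up 1% of the data. This is why a correct decoy may be sampled several times in one epoch.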
