kiharalab / gnn_dove


Code for "Protein Docking Model Evaluation by Graph Neural Networks"

License: GNU General Public License v3.0

Python 100.00%

gnn_dove's Introduction

GNN_DOVE

GNN-Dove is a computational tool that uses a graph neural network to evaluate the quality of docked protein-complex models.

License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)

Contact: Daisuke Kihara ([email protected])

Citation:

Protein Docking Model Evaluation by Graph Neural Networks

@ARTICLE{10.3389/fmolb.2021.647915,
  AUTHOR={Wang, Xiao and Flannery, Sean T. and Kihara, Daisuke},
  TITLE={Protein Docking Model Evaluation by Graph Neural Networks},
  JOURNAL={Frontiers in Molecular Biosciences},
  VOLUME={8},
  PAGES={402},
  YEAR={2021},
  URL={https://www.frontiersin.org/article/10.3389/fmolb.2021.647915},
  DOI={10.3389/fmolb.2021.647915},
  ISSN={2296-889X},
}

Introduction

Physical interactions of proteins play key roles in many important cellular processes. Therefore, it is crucial to determine the structures of protein complexes to understand the molecular mechanisms of these interactions. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed to predict the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning-based approach named Graph Neural Network-based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of the nodes and edges of the graph. GNN-DOVE was trained and validated on docking models in the Dockground database. GNN-DOVE performed better than existing methods, including DOVE, our previous method, which uses a convolutional neural network on voxelized structure models.

Overall Protocol

(1) Extract the interface region of the protein complex;
(2) Construct two graphs from the interface region, one with and one without intermolecular interactions;
(3) Apply a GNN with an attention mechanism to process the two input graphs;
(4) Output the evaluation score for the input protein complex.
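Step (1) can be illustrated with a small distance-based filter. This is a sketch, not the repository's implementation: it assumes a residue belongs to the interface when any of its atoms lies within a cutoff of any atom of the partner chain, and it assumes a 10 Å cutoff because the `Form_interface` function quoted in the issues below has `cut_off=10` as its default. The helper name `interface_residues` is hypothetical.

```python
import numpy as np


def interface_residues(rec_coords, rec_res, lig_coords, lig_res, cutoff=10.0):
    """Return sorted residue ids of each chain that have an atom within `cutoff` Å of the other chain.

    rec_coords: (N, 3) receptor atom coordinates, rec_res: N residue ids (one per atom)
    lig_coords: (M, 3) ligand atom coordinates,   lig_res: M residue ids (one per atom)
    """
    rec = np.asarray(rec_coords, dtype=float).reshape(-1, 3)
    lig = np.asarray(lig_coords, dtype=float).reshape(-1, 3)
    # (N, M) matrix of receptor-ligand atom distances
    dist = np.linalg.norm(rec[:, None, :] - lig[None, :, :], axis=-1)
    close = dist <= cutoff
    rec_mask = close.any(axis=1)  # receptor atoms near at least one ligand atom
    lig_mask = close.any(axis=0)  # ligand atoms near at least one receptor atom
    rec_ids = sorted({r for r, keep in zip(rec_res, rec_mask) if keep})
    lig_ids = sorted({r for r, keep in zip(lig_res, lig_mask) if keep})
    return rec_ids, lig_ids
```

If the two chains never come within the cutoff, both lists are empty, which is the "0 receptor, 0 ligand" situation seen in some of the issue logs. The dense N×M distance matrix is fine for small complexes; for large ones a k-d tree (for example `scipy.spatial.cKDTree`) would use far less memory.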

Network Architecture

Figure: illustration of the graph neural network (GNN) with attention and gate-augmented mechanism (GAT).
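To make the figure more concrete, here is a minimal NumPy sketch of one gate-augmented graph attention update. It is an illustration under stated assumptions, not the repository's layer: the symmetric bilinear attention score, the ReLU on the aggregated features and the sigmoid gate that mixes the old and new node features are assumed design choices, and all weight names (`W`, `E`, `U`, `b`) are hypothetical. In the two-graph setup of the protocol, such a layer would be applied once with a covalent-only adjacency matrix and once with an adjacency matrix that also contains intermolecular edges.

```python
import numpy as np


def gated_attention_layer(x, adj, W, E, U, b):
    """One gate-augmented graph-attention update (illustrative sketch).

    x:    (N, F) node (atom) features
    adj:  (N, N) 0/1 adjacency matrix; include self-loops so each atom attends to itself
    W, E: (F, F) weights for the feature transform and the bilinear attention score
    U:    (2F, 1) gate weights, b: scalar gate bias
    """
    h = x @ W                                         # transform node features
    logits = h @ E @ h.T
    logits = logits + logits.T                        # symmetric score: e_ij == e_ji
    mask = adj > 0
    logits = np.where(mask, logits, -1e9)             # suppress non-neighbours
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    att = np.exp(logits) * mask                       # exact zeros outside the graph
    att = att / np.maximum(att.sum(axis=1, keepdims=True), 1e-12)  # row-wise softmax
    h_agg = np.maximum(att @ h, 0.0)                  # attention-weighted neighbour sum, ReLU
    gate_in = np.concatenate([x, h_agg], axis=1) @ U + b
    gate = 1.0 / (1.0 + np.exp(-gate_in))             # (N, 1), values in (0, 1)
    return gate * x + (1.0 - gate) * h_agg            # gated mix of old and new features
```

The gate lets each atom decide how much of its previous representation to keep, which is the "gate-augmented" part; with a gate close to 1 the layer simply passes `x` through.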

Required software

Python 3: https://www.python.org/downloads/
RDKit: https://www.rdkit.org/docs/Install.html
UCSF Chimera (optional): https://www.cgl.ucsf.edu/chimera/download.html

Installation

1. Install git.

2. Clone the repository to your computer:

git clone git@github.com:kiharalab/GNN_DOVE.git && cd GNN_DOVE

3. Build dependencies.

You have two options to install the dependencies on your computer:

3.1 Install with pip and Python (version 3.6.9).

3.1.1 Install pip.

3.1.2 Install the dependencies from the command line:

pip install -r requirements.txt --user

If you encounter any errors, you can install each library one by one:

pip install torch==1.7.0
pip install numpy==1.18.1
pip install scipy==1.4.1

3.2 Install with Anaconda

3.2.1 Install conda.

3.2.2 Install the dependencies from the command line:

conda create -n GNN_DOVE python=3.6.10
conda activate GNN_DOVE
pip install -r requirements.txt

Each time you want to run the code, activate the environment with:

conda activate GNN_DOVE
conda deactivate  # to leave the environment when you are done
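The pip instructions pin torch 1.7.0, numpy 1.18.1 and scipy 1.4.1, and RDKit is installed separately. A quick way to confirm that the active environment matches is a check like the one below. It is a sketch that only reads each module's `__version__` attribute; the `check_environment` helper and the `EXPECTED_VERSIONS` table are hypothetical, not part of the repository.

```python
import importlib

# Versions pinned in the pip instructions; None means "any version" (RDKit is installed separately).
EXPECTED_VERSIONS = {"torch": "1.7.0", "numpy": "1.18.1", "scipy": "1.4.1", "rdkit": None}


def check_environment(expected=EXPECTED_VERSIONS):
    """Return {module_name: status string} for each expected module."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = str(getattr(module, "__version__", "unknown"))
        # Drop local build tags such as "+cu101" before comparing.
        if wanted is None or found.split("+")[0] == wanted:
            report[name] = "ok (%s)" % found
        else:
            report[name] = "version %s, expected %s" % (found, wanted)
    return report
```

Run it inside the activated environment and print the returned dictionary; any entry that is not "ok" points to the library to reinstall.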

Usage

python3 main.py
  -h, --help            show this help message and exit
  -F F                  decoy example path
  --mode MODE           0: evaluate for single docking model 
                        1: evaluate for multi docking models
                        2: visualize attention for w/w.o intermolecular graphs from interface region
  --gpu GPU             Choose gpu id, example: '1,2'(specify use gpu 1 and 2)
  --batch_size          batch_size
  --num_workers         number of workers
  --n_graph_layer       number of GNN layer
  --d_graph_layer       dimension of GNN layer
  --n_FC_layer          number of FC layer
  --d_FC_layer          dimension of FC layer
  --initial_mu          initial value of mu
  --initial_dev         initial value of dev
  --dropout_rate        dropout_rate
  --seed SEED           random seed for shuffling
  --fold FOLD           specify fold model for prediction

1 Evaluate a single protein complex

python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F specifies a PDB file in which the receptor has chain ID 'A' and the ligand has chain ID 'B'; --gpu specifies the GPU id; --fold selects the fold model: -1 averages the predictions of the 4 fold models, and 1, 2, 3 or 4 selects a single fold model. (Recommended) --fold=5 uses the model pretrained on a much larger benchmark (Dockground+Zdock). The output is written to [Predict_Result/Single_Target], and the prediction result is saved in Predict.txt.

Example Command (Fold 1 Model):

python main.py --mode=0 -F=example/input/correct.pdb --gpu=0 --fold=1
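For scripted runs, the documented flags can be assembled programmatically. The sketch below only builds the argument list and the expected result directory; `build_command` and `expected_output_dir` are hypothetical helpers, and the directory layout `Predict_Result/Single_Target/Fold_<fold>_Result/<name>` is inferred from the console logs quoted in the issues (observed for --fold=1 and --fold=5), not from documentation.

```python
import os


def build_command(pdb_path, mode=0, gpu="0", fold=5):
    """Assemble a main.py invocation from the flags documented in the usage text."""
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    if fold not in (-1, 1, 2, 3, 4, 5):
        raise ValueError("fold must be -1, 1, 2, 3, 4 or 5")
    if mode == 2 and fold not in (1, 2, 3, 4):
        raise ValueError("attention visualization (mode 2) uses fold models 1-4")
    return ["python", "main.py", "--mode=%d" % mode, "-F=%s" % pdb_path,
            "--gpu=%s" % gpu, "--fold=%d" % fold]


def expected_output_dir(pdb_path, fold, root="Predict_Result"):
    """Result directory for mode 0, following the layout seen in the console logs (an assumption)."""
    name = os.path.splitext(os.path.basename(pdb_path))[0]
    return os.path.join(root, "Single_Target", "Fold_%d_Result" % fold, name)
```

From the repository root, the list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting problems with unusual file names.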

2 Evaluate many protein complexes

python main.py --mode=1 -F [pdb_dir] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F specifies a directory containing PDB files in which the receptor has chain ID 'A' and the ligand has chain ID 'B'; --gpu specifies the GPU id; --fold selects the fold model: -1 averages the predictions of the 4 fold models, and 1, 2, 3 or 4 selects a single fold model. (Recommended) --fold=5 uses the model pretrained on a much larger benchmark (Dockground+Zdock). The output is written to [Predict_Result/Multi_Target], and the prediction results are saved in Predict.txt.

Example Command (All Models):

python main.py --mode=1 -F=example/input --gpu=0 --fold=-1
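With --fold=-1 the tool averages the predictions of the four fold models itself. If you instead run the folds separately and want to combine the scores, the averaging and ranking can be sketched as below. This assumes the scores have already been read into dictionaries (the layout of Predict.txt is not described here) and that a higher score means a more likely acceptable model; both are assumptions, and the helper names are hypothetical.

```python
from statistics import mean


def average_folds(fold_scores):
    """fold_scores: {fold_id: {decoy_name: score}} -> {decoy_name: mean score across folds}.

    Decoys missing from any fold are dropped, so every average uses the same folds.
    """
    if not fold_scores:
        return {}
    common = set.intersection(*(set(scores) for scores in fold_scores.values()))
    return {d: mean(scores[d] for scores in fold_scores.values()) for d in common}


def rank_decoys(scores):
    """Decoy names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)
```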

3 Evaluate with the model pretrained on the Dockground+Zdock benchmark (recommended)

3.1 Evaluate a single protein complex

python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=5

Here -F specifies a PDB file in which the receptor has chain ID 'A' and the ligand has chain ID 'B'; --gpu specifies the GPU id. The output is written to [Predict_Result/Single_Target], and the prediction result is saved in Predict.txt.

Example Command:

python main.py --mode=0 -F=example/input/correct.pdb --gpu=0 --fold=5

3.2 Evaluate many protein complexes

python main.py --mode=1 -F [pdb_dir] --gpu=[gpu_id] --fold=5

Here -F specifies a directory containing PDB files in which the receptor has chain ID 'A' and the ligand has chain ID 'B'; --gpu specifies the GPU id. The output is written to [Predict_Result/Multi_Target], and the prediction results are saved in Predict.txt.

Example Command:

python main.py --mode=1 -F=example/input --gpu=0 --fold=5
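Because every mode expects the receptor in chain 'A' and the ligand in chain 'B', it can save time to check inputs before running. This sketch reads the chain identifier from column 22 of each ATOM record (the standard fixed-column PDB format); only ATOM records are counted, which is an assumption, and `chain_atom_counts` and `check_receptor_ligand` are hypothetical helpers.

```python
from collections import Counter


def chain_atom_counts(pdb_lines):
    """Count ATOM records per chain identifier (column 22, index 21)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) > 21:
            counts[line[21]] += 1
    return counts


def check_receptor_ligand(pdb_lines, receptor="A", ligand="B"):
    """Return a list of problems; an empty list means both chains are present."""
    counts = chain_atom_counts(pdb_lines)
    problems = []
    for role, chain in (("receptor", receptor), ("ligand", ligand)):
        if counts.get(chain, 0) == 0:
            problems.append("no ATOM records for %s chain %r" % (role, chain))
    return problems
```

For a directory, call `check_receptor_ligand(open(path))` for each .pdb file and skip or fix any file that reports a problem.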

4 Visualize attention for the interface region

python main.py --mode=2 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id]

Here -F specifies a PDB file in which the receptor has chain ID 'A' and the ligand has chain ID 'B'; --gpu specifies the GPU id; --fold selects the fold model (1, 2, 3 or 4).
The output is written to [Predict_Result/Visulize_Target]. The attention weights of the graph with intermolecular interactions are saved in attention2_receptor.pdb and attention2_ligand.pdb, and those of the graph without intermolecular interactions in attention1_receptor.pdb and attention1_ligand.pdb. The weight of each atom is stored in the B-factor column, so the files can be viewed in UCSF Chimera (see https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/bfactor.html) or in PyMOL.

Example Command (Fold 1 Model):

python main.py --mode=2 -F=example/input/correct.pdb --gpu=0 --fold=1

Here is a visualization example:

The left panel represents the graph with intermolecular interaction (attention2) and the right panel shows the graph only with covalent bonds (attention1).
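Since the per-atom attention is stored in the B-factor column, the output PDB files can also be summarized without a viewer. The sketch below parses the standard fixed PDB columns (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, B-factor 61-66) and sums the atom weights per residue. Summing rather than averaging is an arbitrary choice here, and `read_attention` and `top_residues` are hypothetical helpers.

```python
from collections import defaultdict


def read_attention(pdb_lines):
    """Return [(chain, residue_number, residue_name, atom_name, weight), ...] from ATOM records."""
    records = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        records.append((line[21], int(line[22:26]), line[17:20].strip(),
                        line[12:16].strip(), float(line[60:66])))
    return records


def top_residues(records, k=5):
    """Sum per-atom weights by residue and return the k residues with the largest totals."""
    totals = defaultdict(float)
    for chain, resnum, resname, _atom, weight in records:
        totals[(chain, resnum, resname)] += weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, `top_residues(read_attention(open("attention2_receptor.pdb")), k=10)` lists the ten receptor residues with the highest total attention in the graph with intermolecular interactions.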

Example

Input

1 Correct protein-complex example: https://github.com/kiharalab/GNN_DOVE/blob/main/example/input/correct.pdb
2 Incorrect protein-complex example: https://github.com/kiharalab/GNN_DOVE/blob/main/example/input/incorrect.pdb

Output

1 Single protein-complex output (mode=0): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/single
2 Multi protein-complexes output (mode=1): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/multi
3 Visualize graph attention (mode=2): https://github.com/kiharalab/GNN_DOVE/tree/main/example/output/visualize


gnn_dove's Issues

After getting the rinterface and linterface files, how do I input them into the model?

Hello, sorry to bother you. I have finished the data_processing step with my own decoys and obtained the rinterface and linterface files. How do I divide them into a training set and a test set and load them into the model for training?
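One way to split your own processed decoys is by target complex, so that decoys of the same complex never appear in both the training and the test set. The sketch below is a hypothetical helper, not the authors' procedure; it assumes you already know which target each decoy belongs to (for example from its file name), and the default of four folds simply mirrors the four fold models mentioned in the README.

```python
from collections import defaultdict


def group_kfold(decoy_targets, n_folds=4):
    """decoy_targets: {decoy_id: target_id}. Returns [(train_ids, test_ids), ...], one pair per fold.

    All decoys of a target land in the same test fold, so no target is shared by train and test.
    """
    by_target = defaultdict(list)
    for decoy, target in decoy_targets.items():
        by_target[target].append(decoy)
    targets = sorted(by_target)
    folds = []
    for k in range(n_folds):
        test_targets = set(targets[k::n_folds])  # round-robin assignment of targets to folds
        test = sorted(d for t in test_targets for d in by_target[t])
        train = sorted(d for t in targets if t not in test_targets for d in by_target[t])
        folds.append((train, test))
    return folds
```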

How to handle a highly imbalanced dataset?

Thanks for your reply. I would like to know how to handle an imbalanced training dataset. I read your article carefully but don't know how to implement this ("Since the dataset was highly imbalanced with more incorrect decoys than acceptable ones, we balanced the training data by sampling the same number of acceptable and incorrect decoys in each batch. We sampled the same number of correct and incorrect decoys. To achieve this, a positive (i.e., correct) decoy may be sampled multiple times in one epoch of training."). Can you give me some advice? Thank you very much!

Which GPU should I use?

Hi author,
All of my package versions are the same as in the README, but I am using an RTX 4090 GPU.
When I run this command: python main.py --mode=0 -F=../1ypo.pdb --gpu=0 --fold=1
I got this:

(gnn_dove) [email protected]:/GNN_DOVE$ python main.py --mode=0 -F=../1ypo.pdb --gpu=0 --fold=1
/GNN_DOVE/Predict_Result created                                                                                                                                                      
/GNN_DOVE/Predict_Result/Single_Target created                                                                                                                                        
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result created                                                                                                                          
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo created                                                                                                                     
Extracting 1044/0 atoms for receptor, 7314/8358 atoms for ligand                                                                                                                      
start alarm signal.                                                                                                                                                                   
After filtering the interface region, 718/1044 residue in receptor, 873/7314 residue in ligand                                                                                        
After filtering the interface region, 718 receptor, 1264 ligand                                                                                                                       
['ATOM      1  N   ALA A 142      29.218  41.101  17.063  1.00 42.98           N  \n', 'ATOM      2  CA  ALA A 142      27.727  41.114  17.117  1.00 43.02           C  \n', 'ATOM      3  C   ALA A 142      27.217  40.248  18.265  1.00 42.75 ... ]
close alarm signal.
/GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo/Input.rinterface /GNN_DOVE/Predict_Result/Single_Target/Fold_1_Result/1ypo/Input.linterface
    Total params: 0.2175270000M
/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/cuda/__init__.py:104: UserWarning: 
NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 46, in <module>
    predict_single_input(input_path,params)
  File "/GNN_DOVE/predict/predict_single_input.py", line 112, in predict_single_input
    Final_Pred=Get_Predictions(dataloader, device, model)
  File "/GNN_DOVE/predict/predict_single_input.py", line 62, in Get_Predictions
    pred= model.test_model((H, A1, A2, V, Atom_count), device)
  File "/GNN_DOVE/model/GNN_Model.py", line 144, in test_model
    c_hs = self.embede(c_hs)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/gnn_dove/lib/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Which GPU and CUDA version did you use?
Thank you
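The warning in this log says the installed PyTorch build supports only sm_37 to sm_75, while the RTX 4090 is sm_89. The sketch below encodes the general CUDA compatibility rules (a cubin built for compute capability X.y runs on devices X.z with z >= y; PTX for compute_XY can be JIT-compiled for newer devices) to predict this before running. It is a hypothetical helper and does not replicate PyTorch's own check.

```python
import re

_ARCH = re.compile(r"(sm|compute)_(\d+)(\d)")


def parse_arch(arch):
    """'sm_75' -> ('sm', 7, 5); 'compute_70' -> ('compute', 7, 0); unrecognized strings -> None."""
    m = _ARCH.fullmatch(arch)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def gpu_supported(capability, arch_list):
    """capability: (major, minor) of the GPU; arch_list: targets the PyTorch build was compiled for."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue  # skip entries this sketch does not understand
        kind, a_major, a_minor = parsed
        # Binary (SASS) code runs on the same major version with an equal or newer minor version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer compute capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False
```

On a machine with PyTorch installed, the inputs can come from `torch.cuda.get_device_capability(0)` and, in recent PyTorch releases, `torch.cuda.get_arch_list()`. For the build in this log the answer is False, which suggests installing a PyTorch build compiled with support for newer GPUs.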

Suggested solution for AttributeError: 'NoneType' object has no attribute 'GetNumAtoms' #1

Hi.
I think your project is great. Thank you for sharing this project.

Unfortunately, when a protein-protein complex does not have a chain B, the receptor/ligand assignment fails with this error:
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms' #1
because llist ends up empty and RDKit then returns None.
Problem 1
Therefore, I changed some lines for use in my research. I also compared the results with those of the original code for a couple of complexes, and they are the same.
def Extract_Interface(pdb_path):
    b = 0  # number of TER records seen so far
    ...
        # inserted just before: if (dat_in[0] == 'ATOM'):
        if dat_in[0] == 'TER':
            b = b + 1
        .....
            if b == 0:  # before the first TER: receptor
                rlist.append(tmp_list)
                tmp_list = []
                tmp_list.append([x, y, z, atom_type, count_l])
                count_l += 1
                receptor_list.append(line)
            else:  # after the first TER: ligand
                llist.append(tmp_list)
                tmp_list = []
                tmp_list.append([x, y, z, atom_type, count_l])
                count_l += 1
                ligand_list.append(line)
    # inserted just before: print("After filtering the interface region, %d receptor, %d ligand"%(len(final_receptor),len(final_ligand)))

I arrange my structures with a TER record between the receptor and the ligand; ZDOCK already provides this format.

Problem 2
Sometimes the atom numbers in a protein complex are not in order; the numbering can restart from 1. So I changed these lines:

def Form_interface(rlist,llist,receptor_list,ligand_list,cut_off=10):
    ....................
    control = 0  # number of ligand atoms copied successfully
    try:
        # Original behaviour: use the stored atom serial as the index into ligand_list.
        for residue in newllist:
            for tmp_atom in residue:
                our_index = tmp_atom[4]
                final_ligand.append(ligand_list[our_index])
                control = control + 1
    except IndexError:  # serials restarted, so they no longer match list positions
        if control > 0:
            pass
        else:
            # Fall back to the running position in the ligand atom list.
            b = 0
            for residue in newllist:
                for tmp_atom in residue:
                    our_index = b
                    final_ligand.append(ligand_list[our_index])
                    b = b + 1
    # inserted just before: print("After filtering the interface region, %d receptor, %d ligand"%(len(final_receptor),len(final_ligand)))

I plan to keep the original code as much as I can. I hope these changes help others.
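Both problems can also be avoided by normalizing the input file instead of patching the parser. The sketch below is a hypothetical preprocessing step, not part of GNN_DOVE. It assumes the receptor precedes the ligand and that the first TER record separates them (the TER-separated layout described above), keeps only ATOM records, relabels them as chains 'A' and 'B', and renumbers atom serials from 1. Dropping HETATM and other records may or may not matter for your structures.

```python
def normalize_complex(pdb_lines, receptor_chain="A", ligand_chain="B"):
    """Relabel chains around the first TER record and renumber atom serials from 1.

    ATOM records before the first TER become `receptor_chain`; all later ATOM records
    become `ligand_chain`. One TER is kept between the two blocks and END is appended.
    """
    out = []
    seen_ter = False
    serial = 0
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith("TER"):
            if not seen_ter:
                out.append("TER")
            seen_ter = True
        elif line.startswith("ATOM") and len(line) >= 22:
            serial += 1  # serial numbers are columns 7-11 (indices 6:11)
            chain = ligand_chain if seen_ter else receptor_chain
            out.append(line[:6] + "%5d" % serial + line[11:21] + chain + line[22:])
    out.append("END")
    return out
```

The serial field is five characters wide, so this sketch only suits complexes with fewer than 100,000 atoms.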

AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Dear developers, GNN_DOVE is a very useful tool for evaluating docked protein complexes, since traditional docking methods (AutoDock Vina, ZDOCK, etc.) are too slow. I have successfully downloaded and deployed your software on CentOS 7 (GPU: RTX 3090) and tested it with the official "correct.pdb". The software ran quickly and successfully, as shown below:

>python main.py --mode=0 -F ./test2/correct.pdb --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result/correct created
Extracting 1900/1900 atoms for receptor, 983/983 atoms for ligand
start alarm signal.
After filtering the interface region, 48/234 residue in receptor, 60/109 residue in ligand
After filtering the interface region, 387 receptor, 522 ligand
close alarm signal.
    Total params: 0.2175270000M

Then I tried my own PDB files generated with MODELLER 10.1. I used pdb_rplchain from pdb-tools to change the chain IDs of the two raw PDB files (alpha.pdb set to chain A and beta.pdb set to chain B), and ran GNN_DOVE in multi-complex mode (with only alpha.pdb and beta.pdb in ./test/), but an error occurred:

>python main.py --mode=1 -F ./test/ --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target created
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result created
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result/test created
    Total params: 0.2175270000M
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Multi_Target/Fold_5_Result/test/alpha created
Extracting 1628/1628 atoms for receptor, 0/0 atoms for ligand
start alarm signal.
After filtering the interface region, 0/200 residue in receptor, 0/0 residue in ligand
After filtering the interface region, 0 receptor, 0 ligand
close alarm signal.
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    predict_multi_input(input_path, params)
  File "/Tools/AnalysisTools/GNN_DOVE-main/predict/predict_multi_input.py", line 89, in predict_multi_input
    input_file = Prepare_Input(structure_path)
  File "/Tools/AnalysisTools/GNN_DOVE-main/data_processing/Prepare_Input.py", line 50, in Prepare_Input
    receptor_count = receptor_mol.GetNumAtoms()
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Here are my two PDB files (with a .txt suffix only so they could be uploaded to GitHub):
alpha.pdb.txt
beta.pdb.txt
In addition, I used pdb_merge to merge the two PDB files into one and tested GNN_DOVE again, and an error occurred:

>python main.py --mode=0 -F ./test/ab.pdb --gpu=0 --fold=5
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result existed
/Tools/AnalysisTools/GNN_DOVE-main/Predict_Result/Single_Target/Fold_5_Result/ab created
Extracting 1628/1628 atoms for receptor, 1751/1751 atoms for ligand
start alarm signal.
After filtering the interface region, 0/201 residue in receptor, 0/208 residue in ligand
After filtering the interface region, 0 receptor, 0 ligand
close alarm signal.
Traceback (most recent call last):
  File "main.py", line 46, in <module>
    predict_single_input(input_path,params)
  File "/Tools/AnalysisTools/GNN_DOVE-main/predict/predict_single_input.py", line 85, in predict_single_input
    input_file=Prepare_Input(structure_path)
  File "/Tools/AnalysisTools/GNN_DOVE-main/data_processing/Prepare_Input.py", line 50, in Prepare_Input
    receptor_count = receptor_mol.GetNumAtoms()
AttributeError: 'NoneType' object has no attribute 'GetNumAtoms'

Finally, I compared the official "correct.pdb" with my alpha.pdb, beta.pdb and ab.pdb, and found that the official "correct.pdb" has an extra column of information.

So my question is: how should I prepare input files for GNN_DOVE, or how can I convert my PDB files to the format of the official "correct.pdb"? Or perhaps no residues remain in my case after filtering. Looking forward to your reply, thanks!
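One way to check whether a merged file can have an interface at all is to measure the closest distance between the two chains. If it exceeds the interface cutoff (assumed here to be 10 Å, the `cut_off=10` default of the `Form_interface` code shown in an earlier issue), the filtering step has nothing to keep, consistent with the "0 receptor, 0 ligand" messages in these logs. The helpers below are hypothetical and read coordinates from the standard PDB columns 31-54.

```python
import numpy as np


def chain_coords(pdb_lines, chain):
    """(K, 3) array of ATOM coordinates for one chain identifier."""
    xyz = [(float(l[30:38]), float(l[38:46]), float(l[46:54]))
           for l in pdb_lines
           if l.startswith("ATOM") and len(l) >= 54 and l[21] == chain]
    return np.array(xyz, dtype=float).reshape(-1, 3)


def min_interchain_distance(pdb_lines, receptor="A", ligand="B"):
    """Smallest receptor-ligand atom distance in Å, or None if either chain is missing."""
    rec = chain_coords(pdb_lines, receptor)
    lig = chain_coords(pdb_lines, ligand)
    if len(rec) == 0 or len(lig) == 0:
        return None
    diff = rec[:, None, :] - lig[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())
```

Two chains modeled independently (for example, each built separately with MODELLER) are not docked to each other, so their closest approach can easily be far beyond the cutoff.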

Does the code run the pretrained model directly?

Hi author,
Great work!
I didn't see code for loading the pretrained model in the README. If I run the command python main.py --mode=0 -F [pdb_file] --gpu=[gpu_id] --fold=[fold_model_id], will the code use the pretrained model automatically?

Many protein PDB files cannot be processed

Hi author,
I can run your code successfully, but it struggles with many protein PDB files, running for hours without producing results. I am wondering why.
Here are some example PDB entries:

2YUW
3PDZ
3MTK
2RRE
4BCZ
Could you please take a look? Thank you.

consult

Excuse me, does DOVE support training a model on my own data?

Answer to questions about dataset configuration for training

Dataset weights setting:

        # labels: 1 for a correct (acceptable) decoy, 0 for an incorrect one
        labels = np.array(labels)
        num_train_correct = len(np.argwhere(labels == 1))
        num_train_incorrect = len(np.argwhere(labels == 0))
        print("In this dataset, we have %d examples, with %d/%d" % (len(labels), num_train_correct, num_train_incorrect))
        # Inverse class frequency: each class receives the same total sampling weight.
        train_weights = [1 / num_train_correct if labels[k] == 1 else 1 / num_train_incorrect for k in range(len(labels))]

Data Sampler code:

from torch.utils.data.sampler import Sampler
import numpy as np

class Data_Sampler(Sampler):
    """Draws dataset indices with probability proportional to `weights` (with replacement by default)."""

    def __init__(self, weights, num_samples, replacement=True):
        weights = np.array(weights) / np.sum(weights)  # normalize to a probability distribution
        self.weights = weights
        self.num_samples = num_samples
        self.replacement = replacement

    def __iter__(self):
        # return iter(torch.multinomial(self.weights, self.num_samples, self.replacement).tolist())
        retval = np.random.choice(len(self.weights), self.num_samples, replace=self.replacement, p=self.weights)
        return iter(retval.tolist())

    def __len__(self):
        return self.num_samples

Data Loader configuration code:

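from torch.utils.data import DataLoader

# train_dataset, batch_size, params and collate_fn are defined in the training script.
# shuffle must stay False: DataLoader does not accept a custom sampler together with shuffle=True.
train_sampler = Data_Sampler(train_dataset.weights, len(train_dataset.weights), replacement=True)
train_dataloader = DataLoader(train_dataset, batch_size, shuffle=False,
                              num_workers=params['num_workers'], sampler=train_sampler, collate_fn=collate_fn)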
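The effect of these inverse-frequency weights can be checked without PyTorch. This NumPy sketch mirrors the weighting and with-replacement sampling above on a made-up label set (10 correct and 990 incorrect decoys). Roughly half of the sampled indices are correct decoys, so each correct decoy is drawn many times per epoch, which is the behaviour described in the quoted paper passage. The helper names are hypothetical.

```python
import numpy as np


def balanced_weights(labels):
    """Per-example weights 1/n_class, so both classes carry the same total weight."""
    labels = np.asarray(labels)
    n_pos = int(np.sum(labels == 1))
    n_neg = int(np.sum(labels == 0))
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect decoy")
    return np.where(labels == 1, 1.0 / n_pos, 1.0 / n_neg)


def sample_epoch(weights, num_samples, rng):
    """Draw one epoch of indices with replacement, proportional to the weights."""
    p = weights / weights.sum()
    return rng.choice(len(p), size=num_samples, replace=True, p=p)


labels = [1] * 10 + [0] * 990
idx = sample_epoch(balanced_weights(labels), len(labels), np.random.default_rng(0))
print("fraction of correct decoys sampled:", np.mean(np.asarray(labels)[idx] == 1))
```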
