
deepdta's Introduction

About DeepDTA: deep drug-target binding affinity prediction

This work models protein sequences and compound 1D representations (SMILES) with convolutional neural networks (CNNs) to predict the binding affinity of drug-target pairs.

Figure

Installation

Data

Please see the README for a detailed explanation.

Requirements

You'll need to install the following in order to run the code. Refer to deepdta.yml for a conda environment tested on Linux.

You have to place the "data" folder under the "source" directory.

Usage

python run_experiments.py --num_windows 32 \
                          --seq_window_lengths 8 12 \
                          --smi_window_lengths 4 8 \
                          --batch_size 256 \
                          --num_epoch 100 \
                          --max_seq_len 1000 \
                          --max_smi_len 100 \
                          --dataset_path 'data/kiba/' \
                          --problem_type 1 \
                          --log_dir 'logs/'
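The --max_seq_len and --max_smi_len flags cap the input lengths. A minimal sketch of the kind of fixed-length label encoding DeepDTA applies to SMILES and protein strings (the character set and helper name here are illustrative, not the repository's exact encoding tables):

```python
def label_encode(sequence, charset, max_len):
    """Map each character to an integer index and pad/truncate to max_len.
    Index 0 is reserved for padding; unknown characters map to 0."""
    char_to_int = {c: i + 1 for i, c in enumerate(charset)}
    encoded = [0] * max_len
    for i, ch in enumerate(sequence[:max_len]):
        encoded[i] = char_to_int.get(ch, 0)
    return encoded

# Example: encode a short SMILES string into a length-10 vector.
smiles_charset = "#()+-=CNOSclnos123456789"
vec = label_encode("CC(=O)O", smiles_charset, max_len=10)
```

Strings longer than max_len are truncated, shorter ones are zero-padded, so every drug and protein enters the CNN with a fixed shape.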


For citation:

@article{ozturk2018deepdta,
  title={DeepDTA: deep drug--target binding affinity prediction},
  author={{\"O}zt{\"u}rk, Hakime and {\"O}zg{\"u}r, Arzucan and Ozkirimli, Elif},
  journal={Bioinformatics},
  volume={34},
  number={17},
  pages={i821--i829},
  year={2018},
  publisher={Oxford University Press}
}

deepdta's People

Contributors

hkmztrk


deepdta's Issues

arguments.py err

In arguments.py,
the type of "--seq_window_lengths", "--smi_window_lengths", and "--num_windows" is "int".
(Also, I don't know what values to assign to these three.)

But at "general_nfold_cv" in "run_experiments.py",
the code looks like this:
len(paramset1)*len(paramset2)*len(paramset3)

But an int has no len()!
What should I do?

Also, another problem:
in general_nfold_cv,

this may be an error:
param1value = paramset1[param1ind]
param2value = paramset2[param2ind]
param3value = paramset3[param3ind]
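For reference, the window-length flags are meant to hold lists of candidate hyperparameters, which is why general_nfold_cv calls len() on them. Declaring the flags with argparse's nargs='+' makes each one parse into a list of ints, so len() and indexing both work (a sketch of that pattern, not the repository's exact arguments.py):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' collects one or more values into a list, so len() works on it
parser.add_argument('--seq_window_lengths', type=int, nargs='+', default=[8, 12])
parser.add_argument('--smi_window_lengths', type=int, nargs='+', default=[4, 8])
parser.add_argument('--num_windows', type=int, nargs='+', default=[32])

args = parser.parse_args(['--seq_window_lengths', '8', '12'])
# Size of the hyperparameter grid searched by the nested CV loop.
grid_size = len(args.seq_window_lengths) * len(args.smi_window_lengths) * len(args.num_windows)
```

With nargs='+', `--seq_window_lengths 8 12` yields the list [8, 12] rather than a single int.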

Complete Code

Hi, is this the complete code for the paper "DeepDTA", or just some helper functions?
Thanks

What is the use of two similarity files in the data?

Hello, I read your dataset description and ran your code. First, I would like to confirm: as I understand it, your method takes the SMILES string and protein sequence as input, so it does not use drug structure information or protein structure information, right?

In addition, I did not see kiba_drug_sim.txt and kiba_target_sim.txt used anywhere in the code. Are these two files redundant?

Maybe I don't fully understand your method; I would appreciate any guidance, thank you!

Could the pre-trained model be made available?

Hi @hkmztrk,

Congratulations on DeepDTA! I really enjoyed going through the work. I was wondering whether you would be willing to share the weights of the models you trained in the process. I am building some DTI models myself and would like to benchmark against your results.

error message when I ran /DeepDTA/deepdta-toy/run_experiments.py

I got the following error message:

File "run_experiments.py", line 548, in
os.makedirs(FLAGS.log_dir)
File "/home/pharma1/venv_silico/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/tmp1588488527.3336222/'

Please help me. Many thanks
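For what it's worth, this error means the process cannot create the log directory at that path (note that '/tmp1588488527.3336222/' is a top-level directory, not something under /tmp). Pointing --log_dir at a location the current user can write to, or guarding the makedirs call, avoids it; a sketch under those assumptions (the helper name is illustrative):

```python
import os
import tempfile

def ensure_log_dir(log_dir):
    """Create the log directory if it is missing; exist_ok=True makes
    the call a no-op when the directory already exists."""
    os.makedirs(log_dir, exist_ok=True)
    return log_dir

# Point logs at a directory the current user can write to.
log_dir = ensure_log_dir(os.path.join(tempfile.gettempdir(), 'deepdta_logs'))
```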

Help with retraining the combined model with a different dataset

Hi there,

I am interested in retraining your model with a different dataset, but I am having trouble downloading the right version of TensorFlow. I was able to create a conda environment with Python 2.7, but I couldn't install any version of TensorFlow via either pip or conda that is compatible with Python 2.7 (they all require at least Python 3.5). Do you happen to have any suggestions to bypass this issue? (For your context, I am working on a Windows machine.)

Best,
Oliver

Some questions about metrics

Hello, I ran the code inside "source" on Windows and it works well. Now I want to compute more metrics for this model. I have two questions and hope to get some help from you.

  1. In "run_experiments.py" there is a function named "cindex_score" that calculates the Concordance Index,
    and "emetrics.py" also has a function "get_cindex" for CI. What is the difference between the two functions?

  2. I noticed two more metric functions in "emetrics.py", get_aupr and get_rm2, but they don't seem to be used in the code.
    I tried to use them by changing the compile call in "build_combined_categorical" from
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score])
    to
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score, get_aupr])
    but the result is
    [screenshot]
    and if I add rm2 as a metric, i.e.
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score, get_rm2])
    the result is
    [screenshot]
    I wonder how to use these two metric functions correctly.

Sorry for disturbing you, and looking forward to your reply!
Best wishes!
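For what it's worth, cindex_score in run_experiments.py is written with tensor ops so Keras can evaluate it inside compile(), while get_aupr and get_rm2 in emetrics.py expect plain NumPy arrays; the usual pattern is to call the array-based metrics after training, on the output of model.predict(), rather than passing them to compile(). A sketch of that post-training pattern, with a simplified pure-Python concordance index standing in for the repository's implementation:

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the
    true order; ties in the prediction count as half. Simplified stand-in
    for the repository's get_cindex."""
    pairs = 0
    concordant = 0.0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # equal true values are not a comparable pair
            pairs += 1
            if (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) > 0:
                concordant += 1.0
            elif y_pred[i] == y_pred[j]:
                concordant += 0.5
    return concordant / pairs

# After training you would do: predictions = model.predict([drugs, targets])
# and then score the flat arrays, e.g.:
y_true = [5.0, 6.2, 7.1, 8.4]
y_pred = [5.1, 6.0, 7.3, 8.9]
ci = concordance_index(y_true, y_pred)
```

The same post-hoc call pattern applies to get_aupr (which additionally needs labels binarized at a threshold) and get_rm2.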

Regarding data leakage caused by amino acid sequences in the data set

In the data/davis/Proteins.txt file, different mutants of the EGFR gene (such as G719C, G719S, and L747E749del) appear to have the same amino acid sequence. This raises a concern for me: why are these mutations not reflected in the amino acid sequence? Furthermore, these samples all have the same affinity in the dataset. If 20% of the data is randomly held out as a test set, about 25% of the test data has already appeared in the training data. I am worried that models based on these sequences carry a potential risk of data leakage.
[screenshot]

MSE for the Davis dataset is much higher than reported in the paper

The DeepDTA paper reports an MSE of 0.261 for the Davis dataset, but I did not get a value near that when I ran the code.

I cloned the DeepDTA code from GitHub, created the environment described in the README.md file, and then ran the code. The loss for the Davis dataset is very large, though for KIBA the loss is as expected. Did I miss something while setting up the code? I have attached a screenshot of the training in this comment.
[screenshot: davis_issue]
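One common cause of a large Davis loss is skipping the log-space transform: the paper trains on pKd values, converting the raw Kd (in nM) as pKd = -log10(Kd / 1e9). If raw Kd values are fed in directly, the MSE will be orders of magnitude larger than 0.261. A sketch of the transform:

```python
import math

def kd_to_pkd(kd_nm):
    """Convert a Kd value in nanomolar to pKd = -log10(Kd in molar)."""
    return -math.log10(kd_nm / 1e9)

# The Davis placeholder affinity of 10,000 nM (no observed interaction)
# maps to pKd = 5.0, the floor value seen throughout the transformed data.
pkd = kd_to_pkd(10000)
```

Whether this is the cause in your run depends on which dataset files you are loading; the transformed labels sit roughly in the 5-11 range, so loss values far outside that scale suggest untransformed inputs.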

The meaning of name for ligands

Dear hkmztrk,

I am new to this area. Could you please tell me what "can" and "iso" mean for the ligands?

Best regards,
pykao

What does the data mean?

Hello, I have been following your work and am trying to use my own data, but I don't quite understand the files under your data directory. Could you provide a data description? For example, what does each txt file under data/davis mean? Could you give me some guidance? Thank you.

Adding prediction code?

Hello, are you considering adding a prediction script? For example, after I have trained the model, I would like to use it to predict the affinity of molecules and proteins: I have SMILES files for the molecules and sequence files for the proteins, and I would like to run a prediction script directly, get the predictions, and save them to a file.

version of python

Hi,

This is kind of confusing: do you need a Python version less than 3.4?

Best,
Po-Yu Kao

Help with installation

Hi,
I am new to python and would like to install and use DeepDTA, but the README file does not give any details about installation. Please could you let me know how best to install and start running DeepDTA? Is there a way I can install it with pip? Do I have to clone the repo? If so, how do I start using the package from there?
Any help much appreciated.

Question about implementation

Hi, I have been trying to study the code; thanks for your effort.
This may be a silly question, but
in the file "run_experiments.py" I found that on lines 535-536 no arguments are passed to get_cindex and build_combined_categorical, yet in the file "emetrics.py" the function get_cindex(Y, P) requires Y and P, and build_combined_categorical requires 4 arguments.
Do I have to fill them in myself? If so, can you explain Y and P for me?

some questions

Hi! I have a few questions:

  1. After installing the required environment (with python=2.7),
    I moved the data folder into the source folder and ran the code with the "bash go.sh" command. Because go.sh uses the KIBA dataset, I tried to reproduce that dataset first (I don't know if I understand it correctly).
    I ran it on a server for 3 days and it only completed 54 epochs. The results are as follows:
    [screenshots]
    Is this running speed normal?
    Are the obtained parameters close to your results?

  2. If I want to reproduce the Davis dataset, I need to change default=0 to default=1 in the arguments.py file. Is that right?

I have just entered this field and am very interested in your code. I hope to reproduce it and look forward to your reply. Thank you.

Running "S-W and Pubchem-Sim" case from DeepDTA

First of all, thanks for your interesting work and framework. I ran the CNN-CNN case and it was very instructive. Now I want to run the S-W and Pubchem-Sim case instead of the CNN-CNN case. For this purpose I substituted the build_baseline function into run_regression and ran the code, but I get errors I cannot trace. Maybe the required data is not in your default database, or there are other settings (such as the TensorFlow version) that I must observe in my run? I ran the code in Google Colab. Could I have your advice on this?

Question about KIBA score

Hello, I am very interested in your work.

In the KIBA dataset, you use the KIBA score as suggested in
'Making Sense of Large-Scale Kinase Inhibitor Bioactivity Data Sets: A Comparative and Integrative Analysis'. (Tang et al, 2014)

It seems the KIBA score suggested in Tang's research differs from the one used in DeepDTA.

For example, the KIBA score between 'CHEMBL98350' and 'P48736' is reported as 3.21982 in Tang et al., but the DeepDTA score is not.

So I would like to know the details of how you computed the KIBA score, such as the parameters Li and Ld used in the calculation.

[equation screenshots]

Thank you

loss value

Hi, I am interested in this work, and I tested the model on the Davis dataset. During training, the loss value is very large. Have you encountered this before? What could be the reason?
[screenshot]

New data

Hello, I have successfully run 'python run_experiments.py' on the server. How can I use your model to predict on my own data? For example, I have some compound SMILES, protein sequences, and their IC50 values. Looking forward to your answer, thank you.

I’ve tested on my mac

Hi, I've tried to run your code on my Mac; after I enter the command, the terminal just looks like this.
Would you take a look and see if this is normal?

YJMacAir:source yuanjun$ sudo python3 run_experiments.py --num_windows 32 --seq_window_lengths 8 12 --smi_window_lengths 4 8 --batch_size 256 --num_epoch 100 --max_seq_len 1000 --max_smi_len 100 --dataset_path 'data/kiba/' --problem_type 1 --log_dir 'logs'

How to open Y correctly?

Hi, I am using your code to learn how to predict interactions between drugs and proteins. I have a problem opening the file Y: it appears as garbled bytes when I open it. I have tried many methods, but none of them work. How do I open Y correctly? Should I decode it, or open it on Linux?
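For context, the Y file in the DeepDTA data folders is a Python pickle of the affinity matrix, which is why it looks like random bytes in a text editor. It should be loaded with pickle; the encoding='latin1' argument handles pickles written under Python 2 when reading them under Python 3. A sketch (the demo path and stand-in matrix are illustrative):

```python
import os
import pickle
import tempfile

def load_affinity_matrix(path):
    """Load a pickled affinity matrix; latin1 handles Python 2 pickles."""
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')

# Round-trip demonstration with a small stand-in matrix.
path = os.path.join(tempfile.gettempdir(), 'Y_demo.pkl')
matrix = [[11.1, 10000.0], [45.0, 78.2]]
with open(path, 'wb') as f:
    pickle.dump(matrix, f)
loaded = load_affinity_matrix(path)
```

Opening Y in a text editor will never display it meaningfully; it has to go through pickle on any platform, Linux or otherwise.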

A question about Davis dataset

Hello!
In your article 'DeepDTA: Deep Drug-Target Binding Affinity Prediction' you describe the two datasets, Davis and KIBA, and it's a good summary.
But I have a question: for the Davis dataset you say there are 68 drugs, while the original dataset has 72. I am wondering why; could you tell me?

About the confidence interval and p-value

Could you please let us know how you calculated the confidence interval and p-value? I did not see the code for this calculation. I would appreciate it if you could upload it.
Thank you very much.

what does Y represent in your deepDTA toy data?

Hi, what does Y represent in your deepdta-toy data? I need to replicate the data for my own purposes, so I need to know how many columns and rows it has. Can you give a brief description of the dataset you used for the toy data?

model save and load

Thank you very much for your contribution!!

Could you add a model save and load function?

Prediction of binding affinity using DeepDTA interpretation

First of all, thank you for developing this framework. I tested deepdta-toy and I wasn't quite sure how to find the affinity values of interest. I used the exact same files as the ones in the deepdta-toy folder and added Y to the mytest folder. I got 4 txt files (predicted labels 0-3) with 4 numbers in each file. Are these files describing the affinity? If so, which file should I use to look up the predicted affinity between each compound and the proteins?

Thanks for your help in advance!

During implementation on my own dataset, I got weird results

Hi! Thank you, as always, for your nice work in drug-target prediction.

I ran run_experiments.py in the deepdta-toy folder to train on my own dataset,
but I could only get this result:

[screenshot]

Could I get any advice about this result?

Since my GPU is an RTX 3090, I installed a conda environment with tensorflow 2.4.1 and keras 2.4.3 (for more detail, I attached a txt file). There may be some problem during backpropagation or elsewhere in your code, which is based on tf 1.x.
env.txt

Or maybe I misunderstood the dataset format for training on my own data.

  1. I made My_train and My_test data folders for saving my own dataset.
  2. Both have the same format: ligands.tab/proteins.fasta/Y.tab. (At first, I made the My_train dataset following the DTC folder you provided as a training dataset in the README. But with that format, run_experiments.py required a 'proteins.fasta' file in the training folder, which is not included in your DTC folder. So I changed the My_train folder to follow the 'mytest' folder format.)
  [screenshots]

I'm pretty much a newbie in the computer science field.
The code runs, but the result is weird, so I don't know how to debug it.
If you point out anything suspicious, I will inspect it.

Best regards,

Best regards,

TypeError: read_sets() takes 2 positional arguments but 3 were given

I'm trying to run your code, but I encounter the following error:

Traceback (most recent call last):
  File "run_experiments.py", line 547, in <module>
    run_regression( FLAGS )
  File "run_experiments.py", line 534, in run_regression
    experiment(FLAGS, perfmeasure, deepmethod)
  File "run_experiments.py", line 494, in experiment
    need_shuffle = False )
  File "/Users/lorenzo/Downloads/DeepDTA/source/datahelper.py", line 122, in __init__
    self._raw = self.read_sets( fpath, setting_no )
TypeError: read_sets() takes 2 positional arguments but 3 were given

I saw you updated the repository a few days ago. Did you try running it with these changes? Thank you.

I have some questions, can you help me!!!

When I run deepdta-toy/run_experiments.py, I always get this error:

Traceback (most recent call last):
  File "run_experiments.py", line 551, in <module>
    prepare_new_data(FLAGS.test_path, test=True)
  File "C:\Users\75172\Desktop\School of Life Sciences\code\DeepDTA-master\deepdta-toy\testdatahelper.py", line 14, in prepare_new_data
    prots = read_proteins(fpath)
  File "C:\Users\75172\Desktop\School of Life Sciences\code\DeepDTA-master\deepdta-toy\testdatahelper.py", line 60, in read_proteins
    with open(filename) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/mytest/proteins.fasta'
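For what it's worth, the path in the error, '/data/mytest/proteins.fasta', starts with a slash, so Python treats it as an absolute path from the filesystem root rather than relative to the repository checkout; dropping the leading slash (or building the path from relative pieces) points it at the local data folder. A sketch of the distinction:

```python
import posixpath

# A leading separator makes a path absolute: it resolves from the
# filesystem root, not from the current working directory.
assert posixpath.isabs('/data/mytest/proteins.fasta')
assert not posixpath.isabs('data/mytest/proteins.fasta')

# Building the path from relative pieces keeps it under the checkout.
rel_path = posixpath.join('data', 'mytest', 'proteins.fasta')
```

posixpath is used here so the behavior is the same regardless of the host OS; whether the leading slash comes from a flag value or a hard-coded string in the toy scripts would need checking against your copy of the code.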

Filter length

encode_protein = Conv1D(filters=NUM_FILTERS*2, kernel_size=FILTER_LENGTH1, activation='relu', padding='valid', strides=1)(encode_protein)

encode_protein = Conv1D(filters=NUM_FILTERS*3, kernel_size=FILTER_LENGTH1, activation='relu', padding='valid', strides=1)(encode_protein)

In lines 75 and 76 it says FILTER_LENGTH1. Should it say FILTER_LENGTH2?

Awesome work and repository BTW
