
deepdta's Introduction

About DeepDTA: deep drug-target binding affinity prediction

This work models protein sequences and compound 1D representations (SMILES) with convolutional neural networks (CNNs) to predict the binding affinity of drug-target pairs.

Figure

Installation

Data

Please see the README for a detailed explanation.

Requirements

You'll need to install the following in order to run the code. Refer to deepdta.yml for a conda environment tested on Linux.

You have to place the "data" folder under the "source" directory.

Usage

python run_experiments.py --num_windows 32 \
                          --seq_window_lengths 8 12 \
                          --smi_window_lengths 4 8 \
                          --batch_size 256 \
                          --num_epoch 100 \
                          --max_seq_len 1000 \
                          --max_smi_len 100 \
                          --dataset_path 'data/kiba/' \
                          --problem_type 1 \
                          --log_dir 'logs/'
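The --max_seq_len and --max_smi_len flags cap the input lengths. A minimal sketch of the kind of fixed-length label encoding DeepDTA applies to SMILES and protein strings (the character set and helper name here are illustrative, not the repository's exact encoding tables):

```python
def label_encode(sequence, charset, max_len):
    """Map each character to an integer index and pad/truncate to max_len.
    Index 0 is reserved for padding; unknown characters map to 0."""
    char_to_int = {c: i + 1 for i, c in enumerate(charset)}
    encoded = [0] * max_len
    for i, ch in enumerate(sequence[:max_len]):
        encoded[i] = char_to_int.get(ch, 0)
    return encoded

# Example: encode a short SMILES string into a length-10 vector.
smiles_charset = "#()+-=CNOSclnos123456789"
vec = label_encode("CC(=O)O", smiles_charset, max_len=10)
```

Strings longer than max_len are truncated, shorter ones are zero-padded, so every drug and protein enters the CNN with a fixed shape.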


For citation:

@article{ozturk2018deepdta,
  title={DeepDTA: deep drug--target binding affinity prediction},
  author={{\"O}zt{\"u}rk, Hakime and {\"O}zg{\"u}r, Arzucan and Ozkirimli, Elif},
  journal={Bioinformatics},
  volume={34},
  number={17},
  pages={i821--i829},
  year={2018},
  publisher={Oxford University Press}
}

deepdta's People

Contributors

hkmztrk


deepdta's Issues

arguments.py err

In arguments.py,
the type of "--seq_window_lengths", "--smi_window_lengths", and "--num_windows" is "int".
(Also, I don't know what values to assign to these three.)

But at "general_nfold_cv" in "run_experiments.py",
the code looks like this:
len(paramset1)*len(paramset2)*len(paramset3)

But an int has no len()!
What should I do?

Also, another problem:
in general_nfold_cv,

this may be an error:
param1value = paramset1[param1ind]
param2value = paramset2[param2ind]
param3value = paramset3[param3ind]
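For reference, the window-length flags are meant to hold lists of candidate hyperparameters, which is why general_nfold_cv calls len() on them. Declaring the flags with argparse's nargs='+' makes each one parse into a list of ints, so len() and indexing both work (a sketch of that pattern, not the repository's exact arguments.py):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' collects one or more values into a list, so len() works on it
parser.add_argument('--seq_window_lengths', type=int, nargs='+', default=[8, 12])
parser.add_argument('--smi_window_lengths', type=int, nargs='+', default=[4, 8])
parser.add_argument('--num_windows', type=int, nargs='+', default=[32])

args = parser.parse_args(['--seq_window_lengths', '8', '12'])
# Size of the hyperparameter grid searched by the nested CV loop.
grid_size = len(args.seq_window_lengths) * len(args.smi_window_lengths) * len(args.num_windows)
```

With nargs='+', `--seq_window_lengths 8 12` yields the list [8, 12] rather than a single int.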

Complete Code

Hi, is this the complete code for the paper "DeepDTA", or just some helper functions?
Thanks

What is the use of two similarity files in the data?

Hello, I read your dataset description and ran your code. First, I would like to confirm: as I understand it, your method takes the SMILES string and protein sequence as input, so it does not use drug structure information or protein structure information, right?

In addition, I did not see kiba_drug_sim.txt and kiba_target_sim.txt used anywhere in the code. Are these two files redundant?

Maybe I don't fully understand your method; I would appreciate any guidance, thank you!

Could the pre-trained model be made available?

Hi @hkmztrk,

Congratulations on DeepDTA! I really enjoyed going through the work. I was wondering whether you would be willing to share the weights of the models you trained in the process. I am building some DTI models myself and would like to benchmark against your results.

error message when I ran /DeepDTA/deepdta-toy/run_experiments.py

I got the following error message:

File "run_experiments.py", line 548, in
os.makedirs(FLAGS.log_dir)
File "/home/pharma1/venv_silico/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/tmp1588488527.3336222/'

Please help me. Many thanks
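For what it's worth, this error means the process cannot create the log directory at that path (note that '/tmp1588488527.3336222/' is a top-level directory, not something under /tmp). Pointing --log_dir at a location the current user can write to, or guarding the makedirs call, avoids it; a sketch under those assumptions (the helper name is illustrative):

```python
import os
import tempfile

def ensure_log_dir(log_dir):
    """Create the log directory if it is missing; exist_ok=True makes
    the call a no-op when the directory already exists."""
    os.makedirs(log_dir, exist_ok=True)
    return log_dir

# Point logs at a directory the current user can write to.
log_dir = ensure_log_dir(os.path.join(tempfile.gettempdir(), 'deepdta_logs'))
```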

Help with retraining the combined model with a different dataset

Hi there,

I am interested in retraining your model with a different dataset, but I am having trouble downloading the right version of TensorFlow. I was able to create a conda environment with Python 2.7, but I couldn't install any version of TensorFlow via either pip or conda that is compatible with Python 2.7 (they all require at least Python 3.5). Do you happen to have any suggestions to bypass this issue? (For your context, I am working on a Windows machine.)

Best,
Oliver

Some questions about metrics

Hello, I ran the code inside "source" on Windows and it works well. Now I want to compute more metrics for this model. I have two questions and hope to get some help from you.

  1. In "run_experiments.py" there is a function named "cindex_score" that calculates the Concordance Index,
    and "emetrics.py" also has a function "get_cindex" for CI. What is the difference between the two functions?

  2. I noticed two more metric functions in "emetrics.py", get_aupr and get_rm2, but they don't seem to be used in the code.
    I tried to use them by changing the compile call in "build_combined_categorical" from
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score])
    to
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score, get_aupr])
    but the result is
    [screenshot]
    and if I add rm2 as a metric, i.e.
    interactionModel.compile(optimizer='adam', loss='mean_squared_error', metrics=[cindex_score, get_rm2])
    the result is
    [screenshot]
    I wonder how to use these two metric functions correctly.

Sorry for disturbing you, and looking forward to your reply!
Best wishes!
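For what it's worth, cindex_score in run_experiments.py is written with tensor ops so Keras can evaluate it inside compile(), while get_aupr and get_rm2 in emetrics.py expect plain NumPy arrays; the usual pattern is to call the array-based metrics after training, on the output of model.predict(), rather than passing them to compile(). A sketch of that post-training pattern, with a simplified pure-Python concordance index standing in for the repository's implementation:

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the
    true order; ties in the prediction count as half. Simplified stand-in
    for the repository's get_cindex."""
    pairs = 0
    concordant = 0.0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # equal true values are not a comparable pair
            pairs += 1
            if (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) > 0:
                concordant += 1.0
            elif y_pred[i] == y_pred[j]:
                concordant += 0.5
    return concordant / pairs

# After training you would do: predictions = model.predict([drugs, targets])
# and then score the flat arrays, e.g.:
y_true = [5.0, 6.2, 7.1, 8.4]
y_pred = [5.1, 6.0, 7.3, 8.9]
ci = concordance_index(y_true, y_pred)
```

The same post-hoc call pattern applies to get_aupr (which additionally needs labels binarized at a threshold) and get_rm2.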

Regarding data leakage caused by amino acid sequences in the data set

In the data/davis/Proteins.txt file, different mutants of the EGFR gene (such as G719C, G719S, and L747E749del) appear to have the same amino acid sequence. This raises a concern for me: why are these mutations not reflected in the amino acid sequence? Furthermore, these samples all have the same affinity in the dataset. If 20% of the data is randomly held out as a test set, about 25% of the test data has already appeared in the training data. I am worried that models based on these sequences carry a potential risk of data leakage.
[screenshot]

MSE for the Davis dataset is much higher than reported in the paper

The DeepDTA paper reports an MSE of 0.261 for the Davis dataset, but I did not get a value near that when I ran the code.

I cloned the DeepDTA code from GitHub, created the environment described in the README.md file, and then ran the code. The loss for the Davis dataset is very large, though for KIBA the loss is as expected. Did I miss something while setting up the code? I have attached a screenshot of the training in this comment.
[screenshot: davis_issue]
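One common cause of a large Davis loss is skipping the log-space transform: the paper trains on pKd values, converting the raw Kd (in nM) as pKd = -log10(Kd / 1e9). If raw Kd values are fed in directly, the MSE will be orders of magnitude larger than 0.261. A sketch of the transform:

```python
import math

def kd_to_pkd(kd_nm):
    """Convert a Kd value in nanomolar to pKd = -log10(Kd in molar)."""
    return -math.log10(kd_nm / 1e9)

# The Davis placeholder affinity of 10,000 nM (no observed interaction)
# maps to pKd = 5.0, the floor value seen throughout the transformed data.
pkd = kd_to_pkd(10000)
```

Whether this is the cause in your run depends on which dataset files you are loading; the transformed labels sit roughly in the 5-11 range, so loss values far outside that scale suggest untransformed inputs.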

The meaning of name for ligands

Dear hkmztrk,

I am new to this area. Could you please tell me what "can" and "iso" mean for the ligands?

Best regards,
pykao

What does the data mean?

Hello, I have been following your work and am trying to use my own data, but I don't quite understand the files under your data directory. Could you provide a data description? For example, what does each txt file under data/davis mean? Could you give me some guidance? Thank you.

Adding prediction code?

Hello, are you considering adding a prediction script? For example, after I have trained the model, I would like to use it to predict the affinity of molecules and proteins: I have SMILES files for the molecules and sequence files for the proteins, and I would like to run a prediction script directly, get the predictions, and save them to a file.

version of python

Hi,

This is kind of confusing: do you need a Python version less than 3.4?

Best,
Po-Yu Kao

Help with installation

Hi,
I am new to python and would like to install and use DeepDTA, but the README file does not give any details about installation. Please could you let me know how best to install and start running DeepDTA? Is there a way I can install it with pip? Do I have to clone the repo? If so, how do I start using the package from there?
Any help much appreciated.

Question about implementation

Hi, I have been trying to study the code; thanks for your effort.
This may be a silly question, but
in the file "run_experiments.py" I found that on lines 535-536 no arguments are passed to get_cindex and build_combined_categorical, yet in the file "emetrics.py" the function get_cindex(Y, P) requires Y and P, and build_combined_categorical requires 4 arguments.
Do I have to fill them in myself? If so, can you explain Y and P for me?

some questions

Hi! I have a few questions:

  1. After installing the required environment (with python=2.7),
    I moved the data folder into the source folder and ran the code with the "bash go.sh" command. Because go.sh uses the KIBA dataset, I tried to reproduce that dataset first (I don't know if I understand it correctly).
    I ran it on a server for 3 days and it only completed 54 epochs. The results are as follows:
    [screenshots]
    Is this running speed normal?
    Are the obtained parameters close to your results?

  2. If I want to reproduce the Davis dataset, I need to change default=0 to default=1 in the arguments.py file. Is that right?

I have just entered this field and am very interested in your code. I hope to reproduce it and look forward to your reply. Thank you.

Running "S-W and Pubchem-Sim" case from DeepDTA

First of all, thanks for your interesting work and framework. I ran the CNN-CNN case and it was very instructive. Now I want to run the S-W and Pubchem-Sim case instead of the CNN-CNN case. For this purpose I substituted the build_baseline function into run_regression and ran the code, but I get errors I cannot trace. Maybe the required data is not in your default database, or there are other settings (such as the TensorFlow version) that I must observe in my run? I ran the code in Google Colab. Could I have your advice on this?

Question about KIBA score

Hello, I am very interested in your work.

In the KIBA dataset, you use the KIBA score as suggested in
'Making Sense of Large-Scale Kinase Inhibitor Bioactivity Data Sets: A Comparative and Integrative Analysis'. (Tang et al, 2014)

It seems the KIBA score suggested in Tang's research differs from the one used in DeepDTA.

For example, the KIBA score between 'CHEMBL98350' and 'P48736' is reported as 3.21982 in Tang et al., but the DeepDTA score is not.

So I would like to know the details of how you computed the KIBA score, such as the parameters Li and Ld used in the calculation.

[equation screenshots]

Thank you

loss value

Hi, I am interested in this work, and I tested the model on the Davis dataset. During training, the loss value is very large. Have you encountered this before? What could be the reason?
[screenshot]

New data

Hello, I have successfully run 'python run_experiments.py' on the server. How can I use your model to predict on my own data? For example, I have some compound SMILES, protein sequences, and their IC50 values. Looking forward to your answer, thank you.

I’ve tested on my mac

Hi, I've tried to run your code on my Mac; after I enter the command, the terminal just looks like this.
Would you take a look and see if this is normal?

YJMacAir:source yuanjun$ sudo python3 run_experiments.py --num_windows 32 --seq_window_lengths 8 12 --smi_window_lengths 4 8 --batch_size 256 --num_epoch 100 --max_seq_len 1000 --max_smi_len 100 --dataset_path 'data/kiba/' --problem_type 1 --log_dir 'logs'

How to open Y correctly?

Hi, I am using your code to learn how to predict interactions between drugs and proteins. I have a problem opening the file Y: it appears as garbled bytes when I open it. I have tried many methods, but none of them work. How do I open Y correctly? Should I decode it, or open it on Linux?
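For context, the Y file in the DeepDTA data folders is a Python pickle of the affinity matrix, which is why it looks like random bytes in a text editor. It should be loaded with pickle; the encoding='latin1' argument handles pickles written under Python 2 when reading them under Python 3. A sketch (the demo path and stand-in matrix are illustrative):

```python
import os
import pickle
import tempfile

def load_affinity_matrix(path):
    """Load a pickled affinity matrix; latin1 handles Python 2 pickles."""
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')

# Round-trip demonstration with a small stand-in matrix.
path = os.path.join(tempfile.gettempdir(), 'Y_demo.pkl')
matrix = [[11.1, 10000.0], [45.0, 78.2]]
with open(path, 'wb') as f:
    pickle.dump(matrix, f)
loaded = load_affinity_matrix(path)
```

Opening Y in a text editor will never display it meaningfully; it has to go through pickle on any platform, Linux or otherwise.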

A question about Davis dataset

Hello!
In your article 'DeepDTA: Deep Drug-Target Binding Affinity Prediction' you describe the two datasets, Davis and KIBA, and it's a good summary.
But I have a question: for the Davis dataset you say there are 68 drugs, while the original dataset has 72. I am wondering why; could you tell me?

About the confidence interval and p-value

Could you please let us know how you calculated the confidence interval and p-value? I did not see the code for this calculation. I would appreciate it if you could upload it.
Thank you very much.

what does Y represent in your deepDTA toy data?

Hi, what does Y represent in your deepdta-toy data? I need to replicate the data for my own purposes, so I need to know how many columns and rows it has. Can you give a brief description of the dataset you used for the toy data?

model save and load

Thank you very much for your contribution!!

Could you add a model save and load function?

Prediction of binding affinity using DeepDTA interpretation

First of all, thank you for developing this framework. I tested deepdta-toy and I wasn't quite sure how to find the affinity values of interest. I used the exact same files as the ones in the deepdta-toy folder and added Y to the mytest folder. I got 4 txt files (predicted labels 0-3) with 4 numbers in each file. Are these files describing the affinity? If so, which file should I use to look up the predicted affinity between each compound and the proteins?

Thanks for your help in advance!

During implementation on my own dataset, I got weird results

Hi! Thank you, as always, for your nice work in drug-target prediction.

I ran run_experiments.py in the deepdta-toy folder to train on my own dataset,
but I could only get this result:

[screenshot]

Could I get any advice about this result?

Since my GPU is an RTX 3090, I installed a conda environment with tensorflow 2.4.1 and keras 2.4.3 (for more detail, I attached a txt file). There may be some problem during backpropagation or elsewhere in your code, which is based on tf 1.x.
env.txt

Or maybe I misunderstood the dataset format for training on my own data.

  1. I made My_train and My_test data folders for saving my own dataset.
  2. Both have the same format: ligands.tab/proteins.fasta/Y.tab. (At first, I made the My_train dataset following the DTC folder you provided as a training dataset in the README. But with that format, run_experiments.py required a 'proteins.fasta' file in the training folder, which is not included in your DTC folder. So I changed the My_train folder to follow the 'mytest' folder format.)
  [screenshots]

I'm pretty much a newbie in the computer science field.
The code runs, but the result is weird, so I don't know how to debug it.
If you point out anything suspicious, I will inspect it.

Best regards,

Best regards,

TypeError: read_sets() takes 2 positional arguments but 3 were given

I'm trying to run your code, but I encounter the following error:

Traceback (most recent call last):
  File "run_experiments.py", line 547, in <module>
    run_regression( FLAGS )
  File "run_experiments.py", line 534, in run_regression
    experiment(FLAGS, perfmeasure, deepmethod)
  File "run_experiments.py", line 494, in experiment
    need_shuffle = False )
  File "/Users/lorenzo/Downloads/DeepDTA/source/datahelper.py", line 122, in __init__
    self._raw = self.read_sets( fpath, setting_no )
TypeError: read_sets() takes 2 positional arguments but 3 were given

I saw you updated the repository a few days ago. Did you try running it with these changes? Thank you.

I have some questions, can you help me!!!

When I run deepdta-toy/run_experiments.py, I always get this error:

Traceback (most recent call last):
  File "run_experiments.py", line 551, in <module>
    prepare_new_data(FLAGS.test_path, test=True)
  File "C:\Users\75172\Desktop\School of Life Sciences\code\DeepDTA-master\deepdta-toy\testdatahelper.py", line 14, in prepare_new_data
    prots = read_proteins(fpath)
  File "C:\Users\75172\Desktop\School of Life Sciences\code\DeepDTA-master\deepdta-toy\testdatahelper.py", line 60, in read_proteins
    with open(filename) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/mytest/proteins.fasta'
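For what it's worth, the path in the error, '/data/mytest/proteins.fasta', starts with a slash, so Python treats it as an absolute path from the filesystem root rather than relative to the repository checkout; dropping the leading slash (or building the path from relative pieces) points it at the local data folder. A sketch of the distinction:

```python
import posixpath

# A leading separator makes a path absolute: it resolves from the
# filesystem root, not from the current working directory.
assert posixpath.isabs('/data/mytest/proteins.fasta')
assert not posixpath.isabs('data/mytest/proteins.fasta')

# Building the path from relative pieces keeps it under the checkout.
rel_path = posixpath.join('data', 'mytest', 'proteins.fasta')
```

posixpath is used here so the behavior is the same regardless of the host OS; whether the leading slash comes from a flag value or a hard-coded string in the toy scripts would need checking against your copy of the code.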

Filter length

encode_protein = Conv1D(filters=NUM_FILTERS*2, kernel_size=FILTER_LENGTH1, activation='relu', padding='valid', strides=1)(encode_protein)

encode_protein = Conv1D(filters=NUM_FILTERS*3, kernel_size=FILTER_LENGTH1, activation='relu', padding='valid', strides=1)(encode_protein)

In lines 75 and 76 it says FILTER_LENGTH1. Should it say FILTER_LENGTH2?

Awesome work and repository BTW
