deargen / mt-dti



An official Molecule Transformer Drug Target Interaction (MT-DTI) model

License: MIT License




MT-DTI


Author: Bonggun Shin
Paper: Shin, B., Park, S., Kang, K. & Ho, J.C. (2019). Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction. Proceedings of the 4th Machine Learning for Healthcare Conference, in PMLR 106:230-248

Required Files

Download data.tar.gz

```
cd mt-dti
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=16dTynXCKPPdvQq4BiXBdQwNuxilJbozR' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=16dTynXCKPPdvQq4BiXBdQwNuxilJbozR" -O data.tar.gz && rm -rf /tmp/cookies.txt
tar -zxvf data.tar.gz
```

This archive includes:
- Original KIBA dataset from DeepDTA
- tfrecord for KIBA dataset
- Pretrained weights of the molecule transformer
- Finetuned weights of the MT-DTI model for KIBA fold0

If you downloaded the archive manually, extract it (the resulting folder is named "data") and place it under the project root:

```
cd mt-dti
# place the downloaded file (data.tar.gz) in "mt-dti"
tar xzfv data.tar.gz
```

After extraction, these files should be in place:

```
mt-dti/data/chembl_to_cids.txt
mt-dti/data/CID_CHEMBL.tsv
mt-dti/data/kiba/*
mt-dti/data/kiba/folds/*
mt-dti/data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/*
mt-dti/data/kiba/tfrecord/*.tfrecord
mt-dti/data/pretrain/*
mt-dti/data/pretrain/mbert_6500k/*
```
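To confirm that the archive landed in the right place before running anything, a short standard-library check can glob for each entry in the list above. This is a minimal sketch, not part of the repository: the `EXPECTED` list and the `missing_paths` helper are hypothetical names, and it assumes you run it from the `mt-dti` project root.

```python
from pathlib import Path
from typing import List

# Hypothetical checklist: the glob patterns listed in the README, relative to mt-dti/.
EXPECTED = [
    "data/chembl_to_cids.txt",
    "data/CID_CHEMBL.tsv",
    "data/kiba/*",
    "data/kiba/folds/*",
    "data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/*",
    "data/kiba/tfrecord/*.tfrecord",
    "data/pretrain/*",
    "data/pretrain/mbert_6500k/*",
]


def missing_paths(root: str = ".") -> List[str]:
    """Return every pattern in EXPECTED that matches nothing under root."""
    base = Path(root)
    return [pattern for pattern in EXPECTED if not any(base.glob(pattern))]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing (check where data.tar.gz was extracted):")
        for pattern in missing:
            print("  " + pattern)
    else:
        print("All expected data files found.")
```

Globbing rather than checking exact file names keeps the check valid even if the exact checkpoint or TFRecord file names inside each folder differ.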

VirtualEnv

Install virtualenvwrapper (which provides mkvirtualenv), then create a dti environment with the following commands:

```
mkvirtualenv --python=`which python3` dti
pip install tensorflow-gpu==1.12.0
```

Preprocessing

If you downloaded data.tar.gz, you can skip these preprocessing steps.
Transform the KIBA dataset into a single pickle file:

```
python kiba_to_pkl.py

# Resulting file
mt-dti/data/kiba/kiba_b.cpkl
```
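The README does not describe what kiba_b.cpkl contains. For orientation only, here is a sketch of flattening DeepDTA-style KIBA files into (SMILES, protein sequence, affinity) triples and pickling them. It assumes the DeepDTA layout: `ligands_can.txt` and `proteins.txt` are JSON objects whose key order matches the rows and columns of `Y`, a pickled drug-by-protein matrix with NaN for unmeasured pairs. `load_json_ordered`, `flatten_affinities`, and `save_pickle` are hypothetical helpers, not functions from kiba_to_pkl.py.

```python
import json
import math
import pickle
from collections import OrderedDict
from typing import List, Sequence, Tuple


def load_json_ordered(path: str) -> "OrderedDict[str, str]":
    """Load a JSON object, keeping key order (row/column i of Y is the i-th key)."""
    with open(path) as f:
        return json.load(f, object_pairs_hook=OrderedDict)


def flatten_affinities(smiles: Sequence[str], sequences: Sequence[str],
                       y: Sequence[Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Turn a drug x protein affinity matrix into triples, skipping NaN (unmeasured) cells."""
    triples = []
    for i, drug in enumerate(smiles):
        for j, protein in enumerate(sequences):
            value = float(y[i][j])
            if not math.isnan(value):
                triples.append((drug, protein, value))
    return triples


def save_pickle(obj, path: str) -> None:
    """Write any picklable object to a single file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# With the real files (assumed DeepDTA layout) you would do roughly:
#   ligands = load_json_ordered("data/kiba/ligands_can.txt")
#   proteins = load_json_ordered("data/kiba/proteins.txt")
#   with open("data/kiba/Y", "rb") as f:
#       y = pickle.load(f, encoding="latin1")  # a NumPy array, so numpy must be installed
#   save_pickle(flatten_affinities(list(ligands.values()), list(proteins.values()), y), "pairs.pkl")
toy = flatten_affinities(["CCO", "CCN"], ["MKV", "MTT"],
                         [[11.1, float("nan")], [float("nan"), 12.3]])
print(toy)  # [('CCO', 'MKV', 11.1), ('CCN', 'MTT', 12.3)]
```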

Prepare TensorFlow Record files:

```
cd src/preprocess
export PYTHONPATH='../../'
python tfrecord_writer.py

# Resulting files
mt-dti/data/kiba/tfrecord/*.tfrecord
```
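The README does not spell out how tfrecord_writer.py encodes each drug and protein. As a rough mental model only: model inputs stored in TFRecords are usually fixed-length integer ID sequences. The sketch below is a hypothetical character-level encoder with truncation and padding; the vocabulary, the reserved IDs (`PAD_ID`, `UNK_ID`), and `max_len` are assumptions, and MT-DTI's actual tokenization and special tokens may differ.

```python
from typing import Dict, Iterable, List

PAD_ID = 0  # hypothetical: reserved for padding
UNK_ID = 1  # hypothetical: characters never seen when building the vocabulary


def build_char_vocab(strings: Iterable[str]) -> Dict[str, int]:
    """Map each distinct character to an ID, starting after the reserved IDs."""
    chars = sorted({c for s in strings for c in s})
    return {c: i + 2 for i, c in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate to max_len, map characters to IDs, then right-pad with PAD_ID."""
    ids = [vocab.get(c, UNK_ID) for c in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


vocab = build_char_vocab(["CCO", "C1=CC"])
print(encode("CCO", vocab, 6))  # [4, 4, 5, 0, 0, 0]
```

A purely character-level split breaks two-letter atoms such as Cl and Br into two tokens, which is one reason a real SMILES tokenizer may handle them differently.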

Pretraining

Download the PubChem SMILES file (CID-SMILES). Its first lines look like this:

```
$ head CID-SMILES
1	CC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
2	CC(=O)OC(CC(=O)O)C[N+](C)(C)C
3	C1=CC(C(C(=C1)C(=O)O)O)O
4	CC(CN)O
5	C(C(=O)COP(=O)(O)O)N
6	C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl
7	CCN1C=NC2=C(N=CN=C21)N
8	CCC(C)(C(C(=O)O)O)O
9	C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O
```

Split it into several files:
- CID-SMILES -> smiles00.txt, smiles01.txt, ...
- Place these files in
```
 mt-dti/data/pretrain/molecule/smiles*
```
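One way to do the split step is a short streaming script, so the full PubChem file never has to fit in memory. This is a sketch under assumptions: `split_cid_smiles` is a hypothetical helper, the default chunk size is arbitrary, and it assumes each smilesNN.txt file should hold only the SMILES column (one molecule per line). Check what tfrecord_smiles.py actually expects before relying on it.

```python
from pathlib import Path
from typing import List


def split_cid_smiles(src: str, out_dir: str, lines_per_file: int = 1000000) -> List[Path]:
    """Split a tab-separated CID-SMILES file into smiles00.txt, smiles01.txt, ...

    Assumption: only the SMILES column is kept, one molecule per line.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    handle = None
    count = 0
    try:
        with open(src) as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2 or not fields[1]:
                    continue  # skip blank or malformed lines
                if handle is None or count == lines_per_file:
                    if handle is not None:
                        handle.close()
                    path = out / "smiles{:02d}.txt".format(len(written))
                    handle = open(path, "w")
                    written.append(path)
                    count = 0
                handle.write(fields[1] + "\n")
                count += 1
    finally:
        if handle is not None:
            handle.close()
    return written


# Example, run from the project root:
#   split_cid_smiles("CID-SMILES", "data/pretrain/molecule")
```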
Make TFRecords for pretraining:

```
cd src/pretrain
export PYTHONPATH='../../'
python tfrecord_smiles.py
```

This creates TFRecord files in the output folder of your Google Cloud Storage bucket.

```
# for example
gs://your_gs/mbert/tfr/smiles.001
gs://your_gs/mbert/tfr/smiles.002
...
```
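The README does not describe how tfrecord_smiles.py builds each pretraining example. Because the molecule transformer is BERT-style and the evaluation log in this README reports masked_lm_accuracy, here is a sketch of the standard BERT masked-LM recipe: pick about 15% of the positions, replace 80% of those with [MASK], replace 10% with a random token, and leave 10% unchanged. This follows the original BERT procedure and is an assumption about, not a copy of, what tfrecord_smiles.py does; `mask_tokens`, the [CLS] token, and the character-level tokens are illustrative.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"
SPECIAL = ("[CLS]", "[SEP]")  # BERT-style special tokens, never masked


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
                mask_prob: float = 0.15, max_predictions: int = 20
                ) -> Tuple[List[str], List[int], List[str]]:
    """Return (masked tokens, masked positions, original tokens at those positions)."""
    output = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    rng.shuffle(candidates)
    num_to_predict = min(max_predictions, max(1, int(round(len(tokens) * mask_prob))))
    positions = sorted(candidates[:num_to_predict])
    for pos in positions:
        r = rng.random()
        if r < 0.8:
            output[pos] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            pass                             # 10%: keep the original token
        else:
            output[pos] = rng.choice(vocab)  # 10%: replace with a random token
    labels = [tokens[p] for p in positions]
    return output, positions, labels


tokens = ["[CLS]"] + list("CC(=O)OC(CC(=O)O)C[N+](C)(C)C")
vocab = sorted(set(tokens) - {"[CLS]"})
masked, positions, labels = mask_tokens(tokens, vocab, random.Random(0))
print(positions, labels)
```

The model is trained to predict `labels` at `positions`; keeping some selected tokens unchanged or randomized is what stops it from learning that only [MASK] positions matter.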

Now pretrain (this requires a TPU on Google Cloud):

```
cd src/pretrain
export PYTHONPATH='../../'
python pretrain_smiles_tpu.py
```

The resulting pretrained model is stored in the checkpoint folder of your Google Cloud Storage bucket.

```
# for example
gs://your_gs/mbert/pretrain-mini/model.ckpt-6500000.*
```

Result

```
INFO:tensorflow:Saving checkpoints for 6500000 into gs://bdti/mbert/pretrain/model.ckpt.
INFO:tensorflow:loss = 0.098096184, step = 6500000 (48.736 sec)
INFO:tensorflow:global_step/sec: 20.5185
INFO:tensorflow:examples/sec: 10505.5
INFO:tensorflow:Stop infeed thread controller
INFO:tensorflow:Shutting down InfeedController thread.
INFO:tensorflow:InfeedController received shutdown signal, stopping.
INFO:tensorflow:Infeed thread finished, shutting down.
INFO:tensorflow:infeed marked as finished
INFO:tensorflow:Stop output thread controller
INFO:tensorflow:Shutting down OutfeedController thread.
INFO:tensorflow:OutfeedController received shutdown signal, stopping.
INFO:tensorflow:Outfeed thread finished, shutting down.
INFO:tensorflow:outfeed marked as finished
INFO:tensorflow:Shutdown TPU system.
INFO:tensorflow:Loss for final step: 0.098096184.
INFO:tensorflow:training_loop marked as finished
```

Evaluation results (mini model):
```
INFO:tensorflow:***** Eval results *****
INFO:tensorflow:  global_step = 6500000
INFO:tensorflow:  loss = 0.15356757
INFO:tensorflow:  masked_lm_accuracy = 0.94406235
INFO:tensorflow:  masked_lm_loss = 0.1413514
```
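To track these numbers across runs (for example, when comparing pretraining checkpoints), a small parser can pull the indented `name = value` metric lines out of a saved log. `parse_eval_results` is a hypothetical helper, not part of the repository, and it assumes the log format shown above.

```python
import re
from typing import Dict, Iterable

# Matches indented metric lines such as "INFO:tensorflow:  masked_lm_accuracy = 0.94406235".
_METRIC_LINE = re.compile(r"^INFO:tensorflow:\s+(\w+) = ([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)\s*$")


def parse_eval_results(lines: Iterable[str]) -> Dict[str, float]:
    """Collect "name = value" metrics from TensorFlow eval log lines."""
    metrics: Dict[str, float] = {}
    for line in lines:
        match = _METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


log = """INFO:tensorflow:***** Eval results *****
INFO:tensorflow:  global_step = 6500000
INFO:tensorflow:  loss = 0.15356757
INFO:tensorflow:  masked_lm_accuracy = 0.94406235
INFO:tensorflow:  masked_lm_loss = 0.1413514"""
print(parse_eval_results(log.splitlines()))
```

Training-progress lines such as `INFO:tensorflow:loss = 0.098096184, step = 6500000 (48.736 sec)` are deliberately not matched, because they lack the indentation and carry trailing text.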

Fine-tuning

If you downloaded data.tar.gz, you can skip fine-tuning; the archive already includes fine-tuned MT-DTI weights for KIBA fold0.

```
cd src/finetune
export PYTHONPATH='../../'
python finetune_demo.py
```
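finetune_demo.py reports mse and ci (concordance index) on the dev and test splits, as the logs in the issue report below show. The concordance index is the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counted as half. The sketch below is a plain O(n²) implementation for checking numbers by hand; it is not the repository's code, and returning 0.0 when there are no comparable pairs is an assumption of this sketch.

```python
from typing import Sequence


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of pairs with y_true[i] > y_true[j] where y_pred[i] > y_pred[j]; ties count 0.5."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.0


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(concordance_index([11.0, 12.5, 13.1], [11.2, 12.0, 12.9]))  # 1.0
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

A CI of 0.5 corresponds to random ordering, so it is worth reading alongside MSE: a model can have a reasonable MSE while ranking drug-target pairs poorly.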

Prediction

```
cd src/predict
export PYTHONPATH='../../'
python predict_demo.py
```

mt-dti's Issues

errors when I ran finetune_demo.py

Thank you for your upload.
I fine-tuned v11 and v14 and it went well. However, when I did the same for v1, v2, v3, and v4, I got different error messages, as follows:

$ python finetune_demo.py --fold 0 --model_version 1
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v1 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fc00f61c400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc00f61c470>, '_num_ps_replicas': 0, '_task_id': 0, '_cluster': None, '_num_worker_replicas': 1, '_save_checkpoints_steps': 150, '_global_id_in_cluster': 0, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_model_dir': '../../data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/', '_tf_random_seed': None, '_keep_checkpoint_max': 5, '_evaluation_master': '', '_master': '', '_device_fn': None, '_train_distribute': None, '_task_type': 'worker', '_is_chief': True, '_save_summary_steps': 100, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_log_step_count_steps': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 153974.
INFO:tensorflow:Finished training up to step 153974. Elapsed seconds 0.
INFO:tensorflow:************************** [kiba-V1-lr(0.0001)-f(12,12,12)step(153974/153974)] ***************************
INFO:tensorflow:************************** Final (sel_mse) Best @ [0] ***************************
INFO:tensorflow:********** [dev] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********** [tst] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********************************************************************
INFO:tensorflow:************************** [kiba-V1-lr(0.0001)-f(12,12,12)step(153974/153974)] ***************************
INFO:tensorflow:************************** Final(sel_ci) Best @ [0] ***************************
INFO:tensorflow:********** [dev] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********** [tst] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********************************************************************

$ python finetune_demo.py --fold 0 --model_version 2
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v2 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fd67286a400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd67286a470>, '_train_distribute': None, '_task_type': 'worker', '_cluster': None, '_keep_checkpoint_every_n_hours': 10000, '_tf_random_seed': None, '_evaluation_master': '', '_service': None, '_global_id_in_cluster': 0, '_model_dir': '../../data/kiba/mbert_cnn_v2_lr0.0001_k12_k12_k12_fold0/', '_master': '', '_save_summary_steps': 100, '_device_fn': None, '_save_checkpoints_steps': 150, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_save_checkpoints_secs': None, '_num_worker_replicas': 1, '_log_step_count_steps': None, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_task_id': 0, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V2 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 826, in model_fn_v2
cnn_molecule = DeepConvolutionModelWithoutEmbedding(config_molecule, training, molecule_tokens)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 207, in __init__
kernel_size=config.kernel_size,
AttributeError: 'DeepConvolutionModelConfig' object has no attribute 'kernel_size'

$ python finetune_demo.py --fold 0 --model_version 3
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v3 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fe76421b400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_evaluation_master': '', '_cluster': None, '_num_worker_replicas': 1, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe76421b470>, '_task_type': 'worker', '_task_id': 0, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_service': None, '_save_checkpoints_steps': 150, '_keep_checkpoint_every_n_hours': 10000, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_checkpoints_secs': None, '_device_fn': None, '_train_distribute': None, '_num_ps_replicas': 0, '_save_summary_steps': 100, '_model_dir': '../../data/kiba/mbert_cnn_v3_lr0.0001_k12_k12_k12_fold0/', '_is_chief': True, '_tf_random_seed': None, '_log_step_count_steps': None, '_master': ''}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V3 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 1031, in model_fn_v3
scaffold_fn=scaffold_fn)
TypeError: __new__() got an unexpected keyword argument 'training_hooks'

$ python finetune_demo.py --fold 0 --model_version 4
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v4 of <src.finetune.dti_model.MbertPcnnModel object at 0x7f4934566400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_evaluation_master': '', '_master': '', '_is_chief': True, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '../../data/kiba/mbert_cnn_v4_lr0.0001_k12_k12_k12_fold0/', '_save_checkpoints_secs': None, '_save_checkpoints_steps': 150, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_log_step_count_steps': None, '_keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4934566470>, '_cluster': None, '_tf_random_seed': None, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_device_fn': None, '_train_distribute': None, '_task_type': 'worker', '_num_ps_replicas': 0}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V4 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 1144, in model_fn_v4
loss, self.learning_rate, self.num_train_steps, self.num_warmup_steps, self.use_tpu)
File "/mnt/HD/mt-dti/src/bert/optimization.py", line 66, in create_optimizer
grads = tf.gradients(loss, tvars)
NameError: name 'tvars' is not defined
