mt-dti's Introduction

MT-DTI

An official Molecule Transformer Drug Target Interaction (MT-DTI) model

Required Files

  • Download data.tar.gz

     cd mt-dti
     wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=16dTynXCKPPdvQq4BiXBdQwNuxilJbozR' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=16dTynXCKPPdvQq4BiXBdQwNuxilJbozR" -O data.tar.gz && rm -rf /tmp/cookies.txt
     tar -zxvf data.tar.gz
    
    • This includes:
      • Original KIBA dataset from DeepDTA
      • tfrecord for KIBA dataset
      • Pretrained weights of the molecule transformer
      • Finetuned weights of the MT-DTI model for KIBA fold0
  • Unzip it (the extracted folder is named data) and place it under the project root

cd mt-dti
# place the downloaded file (data.tar.gz) at "mt-dti"
tar xzfv data.tar.gz
  • These files should end up in the following locations
mt-dti/data/chembl_to_cids.txt
mt-dti/data/CID_CHEMBL.tsv
mt-dti/data/kiba/*
mt-dti/data/kiba/folds/*
mt-dti/data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/*
mt-dti/data/kiba/tfrecord/*.tfrecord
mt-dti/data/pretrain/*
mt-dti/data/pretrain/mbert_6500k/*
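
The layout above can be checked with a short script before running anything else. This is a minimal sketch, not part of the repository: the function name find_missing is hypothetical, and it assumes the script is run from the directory that contains mt-dti/.

```python
import glob
import os

# Patterns copied from the list above, relative to the directory containing mt-dti/.
EXPECTED_PATTERNS = [
    "mt-dti/data/chembl_to_cids.txt",
    "mt-dti/data/CID_CHEMBL.tsv",
    "mt-dti/data/kiba/*",
    "mt-dti/data/kiba/folds/*",
    "mt-dti/data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/*",
    "mt-dti/data/kiba/tfrecord/*.tfrecord",
    "mt-dti/data/pretrain/*",
    "mt-dti/data/pretrain/mbert_6500k/*",
]


def find_missing(base_dir=".", patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file or folder under base_dir."""
    return [p for p in patterns if not glob.glob(os.path.join(base_dir, p))]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing:")
        for pattern in missing:
            print("  " + pattern)
    else:
        print("All expected files are in place.")
```

An empty result only means that each pattern matched at least one path; it does not verify file contents.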

VirtualEnv

  • Install virtualenvwrapper, which provides the mkvirtualenv command
  • Create a dti environment with the following commands
mkvirtualenv --python=`which python3` dti
pip install tensorflow-gpu==1.12.0

Preprocessing

  • If you downloaded data.tar.gz, you can skip these preprocessing steps

  • Transform the KIBA dataset into a single pickle file

python kiba_to_pkl.py 

# Resulting files
mt-dti/data/kiba/kiba_b.cpkl
  • Prepare TensorFlow Record (tfrecord) files
cd src/preprocess
export PYTHONPATH='../../'
python tfrecord_writer.py 

# Resulting files
mt-dti/data/kiba/tfrecord/*.tfrecord

PreTraining

  • Download the PubChem SMILES file (CID-SMILES)
$ head CID-SMILES
1	CC(=O)OC(CC(=O)[O-])C[N+](C)(C)C
2	CC(=O)OC(CC(=O)O)C[N+](C)(C)C
3	C1=CC(C(C(=C1)C(=O)O)O)O
4	CC(CN)O
5	C(C(=O)COP(=O)(O)O)N
6	C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl
7	CCN1C=NC2=C(N=CN=C21)N
8	CCC(C)(C(C(=O)O)O)O
9	C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O
  • Split into several files

    • CID-SMILES -> smiles00.txt, smiles01.txt, ...
    • Place these files in
     mt-dti/data/pretrain/molecule/smiles*
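
The split step above can be sketched as follows. This is an illustration under assumptions, not the author's script: the function name split_smiles and the lines_per_file value are hypothetical, and each line is copied verbatim (CID, tab, SMILES) because the README does not say whether tfrecord_smiles.py expects the CID column to be kept.

```python
import os


def split_smiles(src_path, out_dir, lines_per_file=1000000):
    """Split src_path into out_dir/smiles00.txt, smiles01.txt, ... and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(src_path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Start a new output file every lines_per_file lines.
            if i % lines_per_file == 0:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "smiles%02d.txt" % len(paths))
                paths.append(path)
                out = open(path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return paths
```

For example, split_smiles("CID-SMILES", "mt-dti/data/pretrain/molecule") writes the chunks to the folder named above. Note that the two-digit suffix only stays at two digits for up to 100 chunks.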
    
  • Make tfrecords for pretraining

cd src/pretrain
export PYTHONPATH='../../'
python tfrecord_smiles.py 
  • This creates tfrecord files in the output folder of your Google Cloud Storage bucket
# for example
gs://your_gs/mbert/tfr/smiles.001
gs://your_gs/mbert/tfr/smiles.002
...
  • Now pretrain (requires a TPU on Google Cloud)
cd src/pretrain
export PYTHONPATH='../../'
python pretrain_smiles_tpu.py
  • The resulting pretrained model is stored in the checkpoint folder of your Google Cloud Storage bucket
# for example
gs://your_gs/mbert/pretrain-mini/model.ckpt-6500000.*

Result

INFO:tensorflow:Saving checkpoints for 6500000 into gs://bdti/mbert/pretrain/model.ckpt.
INFO:tensorflow:loss = 0.098096184, step = 6500000 (48.736 sec)
INFO:tensorflow:global_step/sec: 20.5185
INFO:tensorflow:examples/sec: 10505.5
INFO:tensorflow:Stop infeed thread controller
INFO:tensorflow:Shutting down InfeedController thread.
INFO:tensorflow:InfeedController received shutdown signal, stopping.
INFO:tensorflow:Infeed thread finished, shutting down.
INFO:tensorflow:infeed marked as finished
INFO:tensorflow:Stop output thread controller
INFO:tensorflow:Shutting down OutfeedController thread.
INFO:tensorflow:OutfeedController received shutdown signal, stopping.
INFO:tensorflow:Outfeed thread finished, shutting down.
INFO:tensorflow:outfeed marked as finished
INFO:tensorflow:Shutdown TPU system.
INFO:tensorflow:Loss for final step: 0.098096184.
INFO:tensorflow:training_loop marked as finished
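
The two throughput lines in the log above can be combined to estimate how many examples were processed per training step. The sketch below assumes both rates were measured over the same interval; the README does not state the batch size, so the value it prints is an inference from the log rather than a documented setting. The helper name read_metric is hypothetical.

```python
import re

# The two throughput lines copied from the log above.
LOG = """\
INFO:tensorflow:global_step/sec: 20.5185
INFO:tensorflow:examples/sec: 10505.5
"""


def read_metric(log_text, name):
    """Return the float that follows 'name:' in the log, or raise ValueError if absent."""
    match = re.search(re.escape(name) + r":\s*([0-9.]+)", log_text)
    if match is None:
        raise ValueError("metric not found: " + name)
    return float(match.group(1))


steps_per_sec = read_metric(LOG, "global_step/sec")
examples_per_sec = read_metric(LOG, "examples/sec")

# Examples per second divided by steps per second gives examples per step.
examples_per_step = examples_per_sec / steps_per_sec
print(round(examples_per_step))  # prints 512
```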

mini model
INFO:tensorflow:***** Eval results *****
INFO:tensorflow:  global_step = 6500000
INFO:tensorflow:  loss = 0.15356757
INFO:tensorflow:  masked_lm_accuracy = 0.94406235
INFO:tensorflow:  masked_lm_loss = 0.1413514

FineTuning

  • If you downloaded data.tar.gz, you can skip this finetuning step
cd src/finetune
export PYTHONPATH='../../'
python finetune_demo.py 

Prediction

cd src/predict
export PYTHONPATH='../../'
python predict_demo.py 

mt-dti's People

Contributors

bgshin

mt-dti's Issues

Errors when running finetune_demo.py

Thank you for your upload.
I finetuned v11 and v14, and it went great. However, when I did the same thing for v1, v2, v3 and v4, I got different error messages, as follows:

$ python finetune_demo.py --fold 0 --model_version 1
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v1 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fc00f61c400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc00f61c470>, '_num_ps_replicas': 0, '_task_id': 0, '_cluster': None, '_num_worker_replicas': 1, '_save_checkpoints_steps': 150, '_global_id_in_cluster': 0, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_model_dir': '../../data/kiba/mbert_cnn_v1_lr0.0001_k12_k12_k12_fold0/', '_tf_random_seed': None, '_keep_checkpoint_max': 5, '_evaluation_master': '', '_master': '', '_device_fn': None, '_train_distribute': None, '_task_type': 'worker', '_is_chief': True, '_save_summary_steps': 100, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_log_step_count_steps': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 153974.
INFO:tensorflow:Finished training up to step 153974. Elapsed seconds 0.
INFO:tensorflow:************************** [kiba-V1-lr(0.0001)-f(12,12,12)step(153974/153974)] ***************************
INFO:tensorflow:************************** Final (sel_mse) Best @ [0] ***************************
INFO:tensorflow:********** [dev] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********** [tst] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********************************************************************
INFO:tensorflow:************************** [kiba-V1-lr(0.0001)-f(12,12,12)step(153974/153974)] ***************************
INFO:tensorflow:************************** Final(sel_ci) Best @ [0] ***************************
INFO:tensorflow:********** [dev] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********** [tst] mse: 10000.000000 ci 0.000000 **********
INFO:tensorflow:********************************************************************

$ python finetune_demo.py --fold 0 --model_version 2
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v2 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fd67286a400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd67286a470>, '_train_distribute': None, '_task_type': 'worker', '_cluster': None, '_keep_checkpoint_every_n_hours': 10000, '_tf_random_seed': None, '_evaluation_master': '', '_service': None, '_global_id_in_cluster': 0, '_model_dir': '../../data/kiba/mbert_cnn_v2_lr0.0001_k12_k12_k12_fold0/', '_master': '', '_save_summary_steps': 100, '_device_fn': None, '_save_checkpoints_steps': 150, '_keep_checkpoint_max': 5, '_num_ps_replicas': 0, '_save_checkpoints_secs': None, '_num_worker_replicas': 1, '_log_step_count_steps': None, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_task_id': 0, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V2 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 826, in model_fn_v2
cnn_molecule = DeepConvolutionModelWithoutEmbedding(config_molecule, training, molecule_tokens)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 207, in __init__
kernel_size=config.kernel_size,
AttributeError: 'DeepConvolutionModelConfig' object has no attribute 'kernel_size'

$ python finetune_demo.py --fold 0 --model_version 3
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v3 of <src.finetune.dti_model.MbertPcnnModel object at 0x7fe76421b400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_evaluation_master': '', '_cluster': None, '_num_worker_replicas': 1, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe76421b470>, '_task_type': 'worker', '_task_id': 0, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_service': None, '_save_checkpoints_steps': 150, '_keep_checkpoint_every_n_hours': 10000, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_checkpoints_secs': None, '_device_fn': None, '_train_distribute': None, '_num_ps_replicas': 0, '_save_summary_steps': 100, '_model_dir': '../../data/kiba/mbert_cnn_v3_lr0.0001_k12_k12_k12_fold0/', '_is_chief': True, '_tf_random_seed': None, '_log_step_count_steps': None, '_master': ''}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V3 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 1031, in model_fn_v3
scaffold_fn=scaffold_fn)
TypeError: __new__() got an unexpected keyword argument 'training_hooks'

$ python finetune_demo.py --fold 0 --model_version 4
WARNING:tensorflow:Estimator's model_fn (<bound method MbertPcnnModel.model_fn_v4 of <src.finetune.dti_model.MbertPcnnModel object at 0x7f4934566400>>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_evaluation_master': '', '_master': '', '_is_chief': True, '_tpu_config': TPUConfig(iterations_per_loop=150, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None), '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '../../data/kiba/mbert_cnn_v4_lr0.0001_k12_k12_k12_fold0/', '_save_checkpoints_secs': None, '_save_checkpoints_steps': 150, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 0.9
allow_growth: true
}
, '_log_step_count_steps': None, '_keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4934566470>, '_cluster': None, '_tf_random_seed': None, '_num_worker_replicas': 1, '_global_id_in_cluster': 0, '_device_fn': None, '_train_distribute': None, '_task_type': 'worker', '_num_ps_replicas': 0}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Training for 153974 steps (1000.00 epochs in total). Current step 0.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*********************************** MbertPcnnModel V4 ***********************************
Traceback (most recent call last):
File "finetune_demo.py", line 374, in <module>
tf.app.run(main)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "finetune_demo.py", line 337, in main
estimator.train(input_fn=input_fn_trn, max_steps=next_checkpoint)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/home/pharma1/venv_silico/lib/python3.5/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/mnt/HD/mt-dti/src/finetune/dti_model.py", line 1144, in model_fn_v4
loss, self.learning_rate, self.num_train_steps, self.num_warmup_steps, self.use_tpu)
File "/mnt/HD/mt-dti/src/bert/optimization.py", line 66, in create_optimizer
grads = tf.gradients(loss, tvars)
NameError: name 'tvars' is not defined
