kexinhuang12345 / caster

CASTER: Predicting Drug Interactions with Chemical Substructure Representation (AAAI 2020)

Home Page: https://arxiv.org/abs/1911.06446

Python 5.65% Jupyter Notebook 94.35%

caster's Issues

How to generate unsup_datatest.csv

Hi, I think this project is great. I'm very interested in it and would like to ask a few questions.
(1) How is the unsup_dataset.csv file obtained? run_dde.py reads it directly, but I don't know how it is generated, especially the index inside it. If I want to pretrain the model, are there any requirements on the amount of unlabeled data?
(2) If I want to get p(x) as shown in Fig. 4 of the paper, what code should I add to output it? Is it computed in run_dde.py?
I look forward to your reply.

About 'training server setup' problem

Hi Kexin,

When I ran your code, I ran into some problems. The screenshot of the error is as follows:

My server setup is an Intel(R) Xeon(R) E5-2630 v4 2.20GHz CPU, 110 GB RAM and 2 TITAN Xp 12G GPUs. As you said in your paper, training used a server with 2 Intel Xeon E5-2670v2 2.5GHz CPUs, 128 GB RAM and 3 NVIDIA Tesla K80 GPUs. Is that a hard requirement for running the experiments successfully? Can you give me some suggestions?

Thanks and regards.

Yujie

Label_Multi in dataset

Dear Kexin,
The DrugBank dataset you used has a column named Label_Multi. Could you please tell me the meaning and source of this column?

'unsup_dataset.csv' dataset problem

Hi Kexin,

'CASTER' is wonderful work. I am wondering if you could kindly provide the 'unsup_dataset.csv' dataset with your code. It would help me re-implement your algorithm.

Thanks and regards.

Yujie

Will you upload the training and test dataset?

Hi Kexin,
It is a great project and I am trying to use your code. Will you upload the training and test datasets as well? They are not included in the data directory. Thanks!
Jack

Some problems about the SMILES

Great work! We are also doing research on drug interactions.
First, in ESPF, chembl_seq.txt is not provided. I think it is a text file with all the SMILES in your dataset, right?
Also, in the CASTER paper, the SMILES of the drug Melatonin is CC(=O)NCCC1=CNc2c1cc(OC)cc2, while in DrugBank it is COC1=CC=C2NC=C(CCNC(C)=O)C2=C1 (https://www.drugbank.ca/drugs/DB01065). What is the difference between these two SMILES? In fact, the SMILES in DrugBank contain no lower-case atoms, so is it still useful to run the chemical sequential pattern mining algorithm on them?
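
For context on the question above: in SMILES, lower-case atom symbols mark aromatic atoms, while the Kekulé form writes the same ring with upper-case atoms and explicit alternating double bonds. The sketch below only illustrates that notational difference by counting aromatic and non-aromatic atom tokens in the two Melatonin strings. It uses the standard library only, the helper name `count_atoms` is hypothetical and not part of CASTER or ESPF, and the tokenizer is a simplification that covers the common organic-subset atoms and bracket atoms.

```python
import re

# Simplified SMILES atom tokenizer: bracket atoms, two-letter halogens,
# upper-case (non-aromatic) and lower-case (aromatic) organic-subset atoms.
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles):
    """Return (aromatic, non_aromatic) atom-token counts for a SMILES string.

    Hypothetical helper for illustration only; explicit [H] atoms,
    if present, are counted as non-aromatic atoms too.
    """
    aromatic = 0
    non_aromatic = 0
    for token in _ATOM.findall(smiles):
        # Drop the opening bracket and any isotope digits of a bracket atom.
        symbol = token.lstrip("[0123456789")
        if symbol[0].islower():
            aromatic += 1
        else:
            non_aromatic += 1
    return aromatic, non_aromatic


paper_smiles = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"
drugbank_smiles = "COC1=CC=C2NC=C(CCNC(C)=O)C2=C1"

print(count_atoms(paper_smiles))
print(count_atoms(drugbank_smiles))
```

Both strings contain the same number of atoms; only the way the ring atoms are written differs. A cheminformatics toolkit can convert between the two forms, which is not shown here.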

How to get the scores as Fig.4 in the paper did?

Hi, Kexin.
I'm using CASTER for some interpretability work, and I would like to know how you obtained the scores in Fig. 4.
My understanding is that the scores are code * 100, where code is the second output of model_nn:
recon, code, score, Z_f, z_D = model_nn(v_D)
To examine the scores for the interaction between Isosorbide Mononitrate and Sildenafil, I wrote the following code:

 
   import numpy as np
   import torch

   # smiles2vector, model_nn and device come from the CASTER code.
   a = 'O=[N+]([O-])O[C@@H]1CO[C@@H]2[C@@H](O)CO[C@H]12'  # Isosorbide Mononitrate
   b = 'CCCc1nn(C)c2c(=O)[nH]c(-c3cc(S(=O)(=O)N4CCN(C)CC4)ccc3OCC)nc12'  # Sildenafil
   # Repeat the pair 32 times because the batch size is 32.
   test = np.array([smiles2vector(a, b)] * 32)
   test = torch.from_numpy(test).to(device).float()
   test_recon, test_code, test_score, test_Z_f, test_z_D = model_nn(test)
   scores = test_code.detach().cpu().numpy() * 100

When I checked the scores, I found that the score for O=N+ is -2.7843778. (I loaded "model_train_checkpoint_SNAP_EarlyStopping_SemiSup_Full_Run1.pt".) It isn't the highest score among all the substructures, and it changes greatly as the model is trained (within a few iterations).
So how can I reproduce the result shown in Fig. 4 of the paper?

ask for help

Hello, I am very interested in this project and would like to ask a few questions. Should I use the preprocessed data from the ESPF project as input to CASTER? Is model_pretrain_checkpoint_1.pt generated in the script run_dde.py, or is it generated in another script and only loaded there? If it is generated in another script, is that original script available? There is an unsup_datatest.csv file in the data folder; is this an unlabeled dataset? If you see this question, please reply. Thank you very much!

About "'DataParallel' object has no attribute 'src_device_obj'" Error

Can you tell me how to solve this problem? Please help.

Lack of Dataset

Dear Kexin,

The file "Run_Explainability_Models" refers to a dataset that is missing: "/data/deepDDI_small/fold2/df_ddi_train_val.csv".

Could you upload the directory "/scratch/kh2383/DFI_data/data/deepDDI_small/"?

Thanks a lot.
Jam

Dataset request

Hi, I've read the CASTER paper and am interested in doing research in this field. I would like to ask whether the DrugBank dataset is available. Can I get your dataset? I look forward to your reply.

How to get the roc-auc on BIOSNAP?

Hi, Kexin.
I'm using CASTER for some research, and I would like to know how you obtained the ROC-AUC on BIOSNAP. When I run the code, I get 0.964 rather than 0.910.
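
When comparing a reproduced ROC-AUC against a reported one, it can help to recompute the metric independently from the test labels and predicted scores, to rule out a difference in how the metric itself is calculated. The sketch below is a plain-Python, rank-based ROC-AUC (the Mann-Whitney U formulation with average ranks for ties). It is not the repository's evaluation code, and the function name `roc_auc` and the toy inputs are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC for binary labels (1 = positive, 0 = negative).

    Hypothetical helper: equals the probability that a random positive
    is scored above a random negative, counting ties as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy example with made-up labels and scores.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

If the independently computed value matches what the code reports, the gap more likely comes from the data split, checkpoint, or training setup than from the metric.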