kexinhuang12345 / caster
CASTER: Predicting Drug Interactions with Chemical Substructure Representation (AAAI 2020)
Home Page: https://arxiv.org/abs/1911.06446
Hi, I think this project is great. I'm very interested in it and would like to ask a few questions.
(1) How is the unsup_dataset.csv file obtained? It is read directly in run_dde.py, but I don't know how it is generated, especially the index inside it. If I want to pretrain the model, are there any requirements on the amount of unlabeled data?
(2) If I want to get p(x) as in Fig. 4 of the paper, what code should I add to output it? Is it computed in run_dde.py?
I am looking forward to your reply.
Hi Kexin,
When I ran your code, I ran into a problem. A screenshot of the error follows:
My server has an Intel(R) Xeon(R) E5-2630 v4 2.20GHz CPU, 110 GB RAM and 2 TITAN Xp 12G GPUs. Your paper says training used a server with 2 Intel Xeon E5-2670v2 2.5GHz CPUs, 128 GB RAM and 3 NVIDIA Tesla K80 GPUs. Is that configuration a hard requirement for running the experiments successfully? Could you give me some suggestions?
Thanks and regards.
Yujie
Dear Kexin,
The DrugBank dataset you used has a column named Label_Multi. Could you please tell me the meaning and source of this column?
Hi Kexin,
'CASTER' is wonderful work. Could you kindly provide the 'unsup_dataset.csv' dataset used in your code? It would help me re-implement your algorithm.
Thanks and regards.
Yujie
Hi Kexin,
It is a great project and I am trying to use your code. Will you upload the training and test datasets as well? They are not included in the data directory. Thanks!
Jack
Great work! We are also doing research on drug interactions.
First, chembl_seq.txt is not provided in ESPF. I assume it is a text file containing all the SMILES in your dataset, right?
Also, in the CASTER paper, the SMILES for the drug Melatonin is CC(=O)NCCC1=CNc2c1cc(OC)cc2, whereas DrugBank lists COC1=CC=C2NC=C(CCNC(C)=O)C2=C1 (https://www.drugbank.ca/drugs/DB01065). What is the difference between these two SMILES? The SMILES in DrugBank contain no lowercase letters, so is the chemical sequential pattern mining algorithm still useful on them?
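As a quick sanity check on the two Melatonin strings above: in SMILES, lowercase letters mark aromatic atoms, while uppercase letters with explicit = bonds give the Kekulé form, so both notations can describe the same molecule. The sketch below only compares heavy-atom counts. The helper heavy_atom_counts is hypothetical (it is not part of CASTER or ESPF) and assumes the strings contain only C, N and O atoms, as these two do. Matching counts are necessary but not sufficient for identity; a cheminformatics toolkit would be needed for a real canonical comparison.

```python
import re
from collections import Counter


def heavy_atom_counts(smiles: str) -> Counter:
    """Count C, N and O atoms in a SMILES string, ignoring aromatic case.

    Hypothetical helper for illustration only. It assumes the string
    contains no two-letter elements (e.g. Cl) and no other atom types.
    """
    return Counter(sym.upper() for sym in re.findall(r"[CNOcno]", smiles))


paper_smiles = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"        # as written in the paper
drugbank_smiles = "COC1=CC=C2NC=C(CCNC(C)=O)C2=C1"  # as listed by DrugBank

paper_counts = heavy_atom_counts(paper_smiles)
drugbank_counts = heavy_atom_counts(drugbank_smiles)

print(dict(paper_counts))
print(dict(drugbank_counts))
print("same heavy-atom composition:", paper_counts == drugbank_counts)
```

Both strings give 13 carbons, 2 nitrogens and 2 oxygens, which is consistent with the two SMILES being different spellings of one structure.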
Hi, Kexin.
I'm using CASTER for some interpretability work. How did you obtain the scores in Fig. 4?
My understanding is that the scores are code * 100, where code is the second output of model_nn:
recon, code, score, Z_f, z_D = model_nn(v_D)
To examine the scores for the interaction between Isosorbide Mononitrate and Sildenafil, I wrote this code:
a = 'O=[N+]([O-])O[C@@H]1CO[C@@H]2[C@@H](O)CO[C@H]12'
b = 'CCCc1nn(C)c2c(=O)[nH]c(-c3cc(S(=O)(=O)N4CCN(C)CC4)ccc3OCC)nc12'
test = np.array([smiles2vector(a, b)] * 32)  # It is 32 because I use 32 as the batch size.
test = torch.from_numpy(test).to(device).float()
test_recon, test_code, test_score, test_Z_f, test_z_D = model_nn(test)
scores = test_code.detach().cpu().numpy()*100
I checked the scores and found that the score for O=N+ is -2.7843778. (I loaded "model_train_checkpoint_SNAP_EarlyStopping_SemiSup_Full_Run1.pt".) It isn't the highest score among all the substructures, and it changes greatly as the model is trained (within a few iterations).
So how can I reproduce the result shown in Fig. 4 of the paper?
Hello, I am very interested in this project and would like to ask some questions. Should the data be preprocessed with the ESPF project before being used for CASTER? Is model_pretrain_checkpoint_1.pt generated by run_dde.py itself, or is it generated by another script and only loaded here? If it is generated by another script, is that original script available? There is also an unsup_datatest.csv file in the data folder; is this an unlabeled dataset? If you see this question, please reply. Thank you very much!
Dear Kexin,
The file "Run_Explainability_Models" refers to a dataset that is missing: "/data/deepDDI_small/fold2/df_ddi_train_val.csv".
Could you upload the directory of "/scratch/kh2383/DFI_data/data/deepDDI_small/"?
Thanks a lot.
Jam
Hi, I've read the CASTER paper and am interested in doing research in this field. Is the DrugBank dataset available? Could I get your dataset? I look forward to your reply.
Hi, Kexin.
I'm using CASTER for some research. How did you get the ROC-AUC on BIOSNAP? When I run the code, I get 0.964 rather than 0.910.
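When two ROC-AUC numbers disagree like this, it can help to recompute the metric independently of the training script on the saved labels and predictions. Below is a minimal sketch using the pairwise (Mann-Whitney) formulation: ROC-AUC is the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half. The function roc_auc and the toy labels and scores are hypothetical and are not taken from the CASTER code or the BIOSNAP data.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: P(score of a positive > score of a negative).

    Hypothetical helper for illustration. It is O(n_pos * n_neg), which is
    fine for a spot check but slow on a full test set.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, made up for this example.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```

Running the same check on each test split separately would show whether the gap comes from the data split or from the model checkpoint.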