isayev / release
Deep Reinforcement Learning for de-novo Drug Design
License: MIT License
Hi,
I'm re-running the LogP example using the current version of PyTorch, and execution stops in the reinforcement loop with a TypeError, shown below. Are you aware of any changes in PyTorch that could be responsible for this? Is there a fix?
Thanks!
for i in range(n_iterations):
    for j in trange(n_policy, desc='Policy gradient...'):
        cur_reward, cur_loss = RL_logp.policy_gradient(gen_data)
        rewards.append(simple_moving_average(rewards, cur_reward))
        rl_losses.append(simple_moving_average(rl_losses, cur_loss))
    plt.plot(rewards)
    plt.xlabel('Training iteration')
    plt.ylabel('Average reward')
    plt.show()
    plt.plot(rl_losses)
    plt.xlabel('Training iteration')
    plt.ylabel('Loss')
    plt.show()
    smiles_cur, prediction_cur = estimate_and_update(RL_logp.generator,
                                                     my_predictor,
                                                     n_to_generate)
    print('Sample trajectories:')
    for sm in smiles_cur[:5]:
        print(sm)
with the error below:
Policy gradient...: 0%| | 0/15 [00:00<?, ?it/s]./release/data.py:98: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return torch.tensor(tensor).cuda()
Policy gradient...: 0%| | 0/15 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-37-7a3a9698cf0c> in <module>
1 for i in range(n_iterations):
2 for j in trange(n_policy, desc='Policy gradient...'):
----> 3 cur_reward, cur_loss = RL_logp.policy_gradient(gen_data)
4 rewards.append(simple_moving_average(rewards, cur_reward))
5 rl_losses.append(simple_moving_average(rl_losses, cur_loss))
~/work/li/leadopt/generator/ReLeaSE/release/reinforcement.py in policy_gradient(self, data, n_batch, gamma, std_smiles, grad_clipping, **kwargs)
117 reward = self.get_reward(trajectory[1:-1],
118 self.predictor,
--> 119 **kwargs)
120
121 # Converting string of characters into tensor
<ipython-input-33-a8c049e9e937> in get_reward_logp(smiles, predictor, invalid_reward)
1 def get_reward_logp(smiles, predictor, invalid_reward=0.0):
----> 2 mol, prop, nan_smiles = predictor.predict([smiles])
3 if len(nan_smiles) == 1:
4 return invalid_reward
5 if (prop[0] >= 1.0) and (prop[0] <= 4.0):
~/work/li/leadopt/generator/ReLeaSE/release/rnn_predictor.py in predict(self, smiles, use_tqdm)
62 self.model[i]([torch.LongTensor(smiles_tensor).cuda(),
63 torch.LongTensor(length).cuda()],
---> 64 eval=True).detach().cpu().numpy())
65 prediction = np.array(prediction).reshape(len(self.model), -1)
66 prediction = np.min(prediction, axis=0)
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
~/work/source/repos/OpenChem/openchem/models/Smiles2Label.py in forward(self, inp, eval)
41 else:
42 self.train()
---> 43 embedded = self.Embedding(inp)
44 output, _ = self.Encoder(embedded)
45 output = self.MLP(output)
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
~/work/source/repos/OpenChem/openchem/modules/embeddings/basic_embedding.py in forward(self, inp)
7
8 def forward(self, inp):
----> 9 embedded = self.embedding(inp)
10 return embedded
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:
/opt/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not list
Hi, dear authors,
It seems the predictor is just a traditional model, not a neural network. Why does your paper (Deep reinforcement learning for de novo drug design) say that the predictor uses an LSTM?
I think only the generator is based on a neural network, right? I got confused when comparing your code with your paper, so please give your readers some help. If you have updated the code, please let me know.
Hi! My weights don't seem to be changing.
I printed the grad of each layer and it comes back as None; the grad of the hidden layer is None as well.
The only thing I changed in the reinforcement file was rl_loss -= (log_probs[0, top_i].cpu().detach().numpy()*discounted_reward), and I then tried rl_loss -= (log_probs[0, top_i].item()*discounted_reward), because of an issue with cuda device = 0.
Do you know of any reason why grad would come back as None for all of the layers? I believe this is why optimizer.step() has no effect and the weights are not updating.
Hi there,
When running RecurringQSAR-example.ipynb, there is an import statement:
import data_preprocessing as dp
that fails because this module is not available.
In fact, this is not such a big deal, because the module is apparently not used anywhere else in the notebook, so I could just comment the import out. However, the notebook then fails when looking for the jak2_data.csv file.
I understand this issue has been raised before and that you used proprietary data. However, would it be possible to upload some sample data, so that we can run tests and know the expected format?
Thanks!
Hi, I'm trying to work through the example notebooks included in the package. Working on the LogP notebook, I noticed the following.
File paths point to directories that are not part of the repository:
/data/masha/generative_model/chembl_22_clean_1576904_sorted_std_final.smi
/home/mariewelt/Notebooks/gan_oracle/oracle_data/logP_labels.csv
These I could easily fix by changing the prefix to ./data/
One file is still missing: /home/mariewelt/Notebooks/PyTorch/Model_checkpoints/generator/checkpoint_lstm
I looked in the ReLeaSE/checkpoints/generator/ folder, but this file is not there. Is it possible to generate this file before running the notebook? How? Or is it just missing from the tree?
Thanks!
I can't find jak2_data.csv in the data folder.
Hi all,
I'm a little confused about the endless while loop. Do you want to skip invalid SMILES?
If so, adding "break" right after "reward = 0" works. However, I think you want to penalize invalid SMILES, so should the reward simply be set to some negative value instead?
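A minimal sketch of the two options discussed above, assuming a bounded number of sampling attempts and a fixed penalty for invalid SMILES. The names `reward_or_penalty`, `sample_fn`, `is_valid`, and `score_fn` are hypothetical stand-ins, not functions from the ReLeaSE code, and the parenthesis check is only a toy validity test:

```python
def reward_or_penalty(sample_fn, is_valid, score_fn,
                      invalid_reward=-1.0, max_attempts=10):
    """Sample until a valid string appears; penalize if none does.

    All callables are hypothetical and supplied by the caller.
    The loop is bounded (max_attempts >= 1 is assumed), so it cannot
    spin forever on invalid samples.
    """
    for _ in range(max_attempts):
        smiles = sample_fn()
        if is_valid(smiles):
            return smiles, score_fn(smiles)
    # Every attempt was invalid: return the last sample with a penalty.
    return smiles, invalid_reward


def balanced_parentheses(smiles):
    """Toy validity check (assumption): parentheses must balance."""
    depth = 0
    for ch in smiles:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


valid = reward_or_penalty(lambda: 'CC(C)O', balanced_parentheses, len)
invalid = reward_or_penalty(lambda: 'CC(C', balanced_parentheses, len)
print(valid)    # ('CC(C)O', 6)
print(invalid)  # ('CC(C', -1.0)
```

In a real setup the validity check would more likely come from a cheminformatics toolkit, and the size of the penalty is a tuning choice.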
How much RAM do I need to run logP prediction?
I cannot create the generator data with
gen_data = GeneratorData(training_data_path=gen_data_path, delimiter='\t',
                         cols_to_read=[0], keep_header=True, tokens=tokens)
because it uses up all available memory.
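One way to check whether the file size itself is the problem is to load only a slice of the data first. A minimal sketch, assuming a tab-delimited file with SMILES in the first column as in the call above; `iter_first_column` is a hypothetical helper, not part of the ReLeaSE code:

```python
import io
from itertools import islice


def iter_first_column(lines, delimiter='\t', limit=None):
    """Return an iterator over the first field of each non-empty line.

    `lines` can be an open file handle, so the file is streamed
    rather than read into memory at once. At most `limit` items
    are produced (all of them when `limit` is None).
    """
    fields = (line.rstrip('\n').split(delimiter)[0]
              for line in lines if line.strip())
    return islice(fields, limit)


# Stand-in for an opened data file; the contents are made up.
handle = io.StringIO('CCO\t0.1\nc1ccccc1\t2.0\n\nCC(=O)O\t-0.2\n')
subset = list(iter_first_column(handle, limit=2))
print(subset)  # ['CCO', 'c1ccccc1']
```

The subset could then be written to a smaller file and passed as training_data_path to see how memory use scales with the number of molecules.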
Shall I put together a pull request with the MIT license? ;-)
I tried to run your JAK2 demo with my own JAK2 data, but the "train_model" step fails with an unexpected error.
Running the code in the following section triggers it:
rewards = []
n_to_generate = 1000
n_policy_replay = 10
n_policy = 5
n_transfer = 500
n_iterations = 5
prediction_log = []
for _ in range(n_iterations):
    ## Transfer learning
    RL.transfer_learning(transfer_data, n_epochs=n_transfer)
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
    ### Policy gradient with experience replay
    for _ in range(n_policy_replay):
        rewards.append(RL.policy_gradient_replay(gen_data, replay, threshold=threshold, n_batch=10))
        print(rewards[-1])
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
    ### Policy gradient without experience replay
    for _ in range(n_policy):
        rewards.append(RL.policy_gradient(gen_data, threshold=threshold, n_batch=10))
        print(rewards[-1])
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
I get the following error:
RuntimeError Traceback (most recent call last)
<ipython-input-51-d78b3f91f2cc> in <module>()
11
12 ## Transfer learning
---> 13 RL.transfer_learning(transfer_data, n_epochs=n_transfer)
14 # _, prediction = estimate_and_update(n_to_generate)
15 # prediction_log.append(prediction)
/project/ReLeaSE/reinforcement.py in transfer_learning(self, data, n_epochs, augment)
131
132 def transfer_learning(self, data, n_epochs, augment=False):
--> 133 _ = self.generator.fit(data, n_epochs, augment=augment)
/project/ReLeaSE/stackRNN.py in fit(self, data, n_epochs, all_losses, print_every, plot_every, augment)
332 for epoch in range(1, n_epochs + 1):
333 inp, target = data.random_training_set(smiles_augmentation)
--> 334 loss = self.train_step(inp, target)
335 loss_avg += loss
336
/project/ReLeaSE/stackRNN.py in train_step(self, inp, target)
287 for c in range(len(inp)):
288 output, hidden, stack = self(inp[c], hidden, stack)
--> 289 loss += self.criterion(output, target[c])
290
291 loss.backward()
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
860 def forward(self, input, target):
861 return F.cross_entropy(input, target, weight=self.weight,
--> 862 ignore_index=self.ignore_index, reduction=self.reduction)
863
864
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
1548 if size_average is not None or reduce is not None:
1549 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1550 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
1551
1552
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1401 raise ValueError('Expected 2 or more dimensions (got {})'.format(dim))
1402
-> 1403 if input.size(0) != target.size(0):
1404 raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
1405 .format(input.size(0), target.size(0)))
RuntimeError: dimension specified as 0 but tensor has no dimensions
I didn't hit any errors before this step, but this step fails unexpectedly. Can you give me some guidance?
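As a side note on the posted loop, the threshold update that appears three times in it can be factored into one helper. This is a sketch in plain Python; `update_threshold` is a hypothetical name, and the defaults simply mirror the constants in the posted code (0.15, 0.05, 0.8). It does not address the RuntimeError itself:

```python
def update_threshold(predictions, threshold,
                     min_fraction=0.15, step=0.05, cap=0.8):
    """Raise the threshold when enough predictions already reach it."""
    if len(predictions) == 0:
        return threshold
    hits = sum(1 for p in predictions if p >= threshold)
    if hits / len(predictions) > min_fraction:
        # Same rule as in the posted loop: step up, but never past the cap.
        return min(threshold + step, cap)
    return threshold


raised = update_threshold([0.9, 0.9, 0.1, 0.1], 0.5)
unchanged = update_threshold([0.1] * 10, 0.5)
capped = update_threshold([1.0] * 4, 0.78)
print(raised, unchanged, capped)
```

Each of the three blocks in the loop would then reduce to a single call of the form threshold = update_threshold(prediction, threshold).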
conda create -n release python=3.6
(release) [jclin@longleaf-login4 ReLeaSE]$ pip install tensorflow==1.11.0rc1
ERROR: Could not find a version that satisfies the requirement tensorflow==1.11.0rc1 (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.12.2, 1.12.3, 1.13.1, 1.13.2, 1.14.0, 1.15.0, 1.15.2, 1.15.3, 1.15.4, 1.15.5, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0rc0, 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0rc0, 2.4.0rc1, 2.4.0rc2, 2.4.0rc3, 2.4.0rc4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0rc0, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.5.0, 2.5.1, 2.5.2, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2)
ERROR: No matching distribution found for tensorflow==1.11.0rc1
In my installation, I used pip install tensorflow==1.11.0 instead for now.
Hi,
I tried to run the following example (https://github.com/isayev/ReLeaSE/blob/master/RecurrentQSAR-example-logp.ipynb), and it ends with the following error.
Can you please let us know whether RecurrentQSAR-example-logp.ipynb is working, or whether it has any bugs?
Thanks,
I can't find jak2_data.csv in the clone. How can I get jak2_data.csv?
Sorry for the inconvenience!
my_generator.load_model('checkpoints/generator/checkpoint_biggest') leads to the following error:
RuntimeError: Error(s) in loading state_dict for StackAugmentedRNN:
Missing key(s) in state_dict: "rnn.weight_ih_l0", "rnn.weight_hh_l0", "rnn.bias_ih_l0", "rnn.bias_hh_l0", "rnn.weight_ih_l0_reverse", "rnn.weight_hh_l0_reverse", "rnn.bias_ih_l0_reverse", "rnn.bias_hh_l0_reverse".
Unexpected key(s) in state_dict: "gru.weight_ih_l0", "gru.weight_hh_l0", "gru.bias_ih_l0", "gru.bias_hh_l0".
size mismatch for stack_controls_layer.weight: copying a param of torch.Size([3, 1000]) from checkpoint, where the shape is torch.Size([3, 1500]) in current model.
size mismatch for stack_input_layer.weight: copying a param of torch.Size([100, 1000]) from checkpoint, where the shape is torch.Size([1500, 1500]) in current model.
size mismatch for stack_input_layer.bias: copying a param of torch.Size([100]) from checkpoint, where the shape is torch.Size([1500]) in current model.
size mismatch for encoder.weight: copying a param of torch.Size([45, 500]) from checkpoint, where the shape is torch.Size([45, 1500]) in current model.
size mismatch for decoder.weight: copying a param of torch.Size([45, 1000]) from checkpoint, where the shape is torch.Size([45, 1500]) in current model.
Do you know why this might be the case? Thank you!
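Errors like this usually mean the model was constructed with different hyperparameters than the checkpoint was saved with. A small diagnostic sketch, assuming the two sides are available as plain name-to-shape mappings (with PyTorch these could be built from the entries of a `state_dict()`); `compare_shapes` is a hypothetical helper and the shapes below are illustrative only:

```python
def compare_shapes(checkpoint, model):
    """Compare two {parameter name: shape} mappings.

    Returns (missing, unexpected, mismatched), mirroring the three
    kinds of complaint in the error message.
    """
    missing = sorted(k for k in model if k not in checkpoint)
    unexpected = sorted(k for k in checkpoint if k not in model)
    mismatched = {k: (checkpoint[k], model[k])
                  for k in checkpoint
                  if k in model and checkpoint[k] != model[k]}
    return missing, unexpected, mismatched


# Illustrative shapes (assumption), using parameter names from the error.
checkpoint = {'gru.weight_ih_l0': (4500, 1500), 'encoder.weight': (45, 1500)}
model = {'rnn.weight_ih_l0': (1500, 500), 'encoder.weight': (45, 500)}

missing, unexpected, mismatched = compare_shapes(checkpoint, model)
print(missing)      # ['rnn.weight_ih_l0']
print(unexpected)   # ['gru.weight_ih_l0']
print(mismatched)   # {'encoder.weight': ((45, 1500), (45, 500))}
```

In the error above, gru.* versus rnn.* keys point to a different layer_type, the *_reverse keys point to a bidirectional layer on one side only, and the width differences point to a different hidden_size or stack_width. Matching these constructor arguments to the ones used when the checkpoint was saved is the likely fix.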
Hello,
I was wondering whether you have the Python code showing how you created checkpoint_biggest from the ChEMBL file. The abstract of the paper also mentions loading multiple parameters; do you have any implementation with more than one parameter?
Thank you,
Kevin
I read that the JAK2 data itself is proprietary, but would it be possible to upload the trained model without the data? As the JAK2 notebook says, 2000 samples is not much to train a NN predictor, so it would be great if the model itself could be uploaded.
When I run the model training:
for _ in range(n_iterations):
    ### Transfer learning
    RL.transfer_learning(transfer_data, n_epochs=n_transfer)
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
    ### Policy gradient with experience replay
    for _ in range(n_policy_replay):
        rewards.append(RL.policy_gradient_replay(gen_data, replay, threshold=threshold, n_batch=10))
        print(rewards[-1])
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
    ### Policy gradient without experience replay
    for _ in range(n_policy):
        rewards.append(RL.policy_gradient(gen_data, threshold=threshold, n_batch=10))
        print(rewards[-1])
    _, prediction = estimate_and_update(n_to_generate)
    prediction_log.append(prediction)
    if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
        threshold = min(threshold + 0.05, 0.8)
I get this error:
RuntimeError Traceback (most recent call last)
in ()
10
11 ### Transfer learning
---> 12 RL.transfer_learning(transfer_data, n_epochs=n_transfer)
13 _, prediction = estimate_and_update(n_to_generate)
14 prediction_log.append(prediction)
~/ReLeaSE/reinforcement.py in transfer_learning(self, data, n_epochs, augment)
131
132 def transfer_learning(self, data, n_epochs, augment=False):
--> 133 _ = self.generator.fit(data, n_epochs, augment=augment)
~/ReLeaSE/stackRNN.py in fit(self, data, n_epochs, all_losses, print_every, plot_every, augment)
330 for epoch in range(1, n_epochs + 1):
331 inp, target = data.random_training_set(smiles_augmentation)
--> 332 loss = self.train_step(inp, target)
333 loss_avg += loss
334
~/ReLeaSE/stackRNN.py in train_step(self, inp, target)
285 for c in range(len(inp)):
286 output, hidden, stack = self(inp[c], hidden, stack)
--> 287 loss += self.criterion(output, target[c])
288
289 loss.backward()
~/anaconda3/envs/ReLeaSE/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
489 result = self._slow_forward(*input, **kwargs)
490 else:
--> 491 result = self.forward(*input, **kwargs)
492 for hook in self._forward_hooks.values():
493 hook_result = hook(self, input, result)
~/anaconda3/envs/ReLeaSE/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
757 _assert_no_grad(target)
758 return F.cross_entropy(input, target, self.weight, self.size_average,
--> 759 self.ignore_index, self.reduce)
760
761
~/anaconda3/envs/ReLeaSE/lib/python3.6/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce)
1440 >>> loss.backward()
1441 """
-> 1442 return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)
1443
1444
~/anaconda3/envs/ReLeaSE/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
1326 raise ValueError('Expected 2 or more dimensions (got {})'.format(dim))
1327
-> 1328 if input.size(0) != target.size(0):
1329 raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
1330 .format(input.size(0), target.size(0)))
RuntimeError: dimension specified as 0 but tensor has no dimensions
Just run the example JAK2-demo.ipynb with the recent modifications and you'll see what I mean.
Good afternoon,
I've been using the code from the develop branch with PyTorch 0.4. I run into the memory issue below when executing this piece of code from the example notebook:
### Transfer learning
RL.transfer_learning(transfer_data, n_epochs=n_transfer)
_, prediction = estimate_and_update(n_to_generate)
prediction_log.append(prediction)
if len(np.where(prediction >= threshold)[0])/len(prediction) > 0.15:
    threshold = min(threshold + 0.05, 0.8)
RuntimeError:
cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCTensorMath.cu:35
Any idea of what might be causing this problem?
Hello, I want to use your method to design molecules with my own dataset. Do the two models, for pIC50 and for the melting temperature of metals, differ only in their reward functions? Can you explain the rationale behind the reward function for the melting temperature? Why does it take this form? Alternatively, could you provide the demo code for this experiment? I look forward to your reply!
Hello!
I am a student who is studying your code. I have some questions.
At the end of your JAK2-demo.ipynb notebook there is this code:
while len(jak2_compounds) < 1000:
    sm = my_generator.evaluate(gen_data, temperature=0.5)[1:-1]
    clean_sm, pred, nan_sm = jak2_predictor.predict([sm])
    if len(clean_sm) > 0 and pred[0] >= 0.8:  # probably greater than 0.8
        jak2_compounds += clean_sm
save_smi_to_file('/generated_compounds/test_ask1/' + str(i+1) + '.txt', jak2_compounds)
The gen_data passed to my_generator.evaluate comes from the ChEMBL 22 database file. However, the list saved at the end is called jak2_compounds.
If the last part creates new JAK2 compounds through fine-tuning, I would expect the data used for gen_data to contain JAK2 data rather than ChEMBL data.
May I ask whether I'm right?
I would also like to know which part of the code does the fine-tuning. I'm learning a lot thanks to you, and I would appreciate your reply.
Hi!
I'm unable to find the transfer learning data, transfer_data.smi. Do you know what the cause might be?
Thanks!
Hello, I am very happy to see your research. I want to use the trained generative model to generate molecules randomly, without setting any conditions, but I don't know how. Could you give me some guidance? Is it possible to generate molecules randomly rather than conditionally?
Best!
I collected a large SMILES dataset and wanted to train the generator from scratch. I counted the unique characters across all SMILES, as follows:
#%()*+-./0123456789:=@ABCDEFGHIKLMNOPRSTUVWXYZ[\\]abcdefghiklmnoprstuy
But in your JAK2_min_max_demo.ipynb I see:
tokens = ['<', '>', '#', '%', ')', '(', '+', '-', '/', '.', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8', '=', 'A', '@', 'C', 'B', 'F', 'I', 'H', 'O', 'N', 'P', 'S', '[', ']','\\', 'c', 'e', 'i', 'l', 'o', 'n', 'p', 's', 'r', '\n']
I then read the SMILES data file you provided, chembl_22_clean_1576904_sorted_std_final.smi, and extracted its unique characters, but found that they do not match the tokens in JAK2_min_max_demo.ipynb:
chem_smiles = read_smi_file("ReLeaSE/data/chembl_22_clean_1576904_sorted_std_final.smi")
ch_smiles = [i.split("\t")[0] for i in chem_smiles[0]]
tokens2 = list(set(''.join(ch_smiles)))
tokens2 = list(np.sort(tokens2))
tokens2 = ''.join(tokens2)
The tokens2 result is: #%()+-./0123456789=BCFHINOPS[\\]clnoprs
Even setting aside < and >, which denote the beginning and end of a sequence, tokens and tokens2 are not equal. Why is that? What did you do with chembl_22_clean_1576904_sorted_std_final.smi?
Can you give me more guidance? Thank you very much.
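The comparison can be made explicit with set differences. A short sketch using the two character lists quoted above; treating `<`, `>`, and the newline as special markers rather than SMILES characters is an assumption based on the post:

```python
notebook_tokens = ['<', '>', '#', '%', ')', '(', '+', '-', '/', '.', '1', '0',
                   '3', '2', '5', '4', '7', '6', '9', '8', '=', 'A', '@', 'C',
                   'B', 'F', 'I', 'H', 'O', 'N', 'P', 'S', '[', ']', '\\',
                   'c', 'e', 'i', 'l', 'o', 'n', 'p', 's', 'r', '\n']
# Characters reported for the data file in the post.
data_chars = set('#%()+-./0123456789=BCFHINOPS[\\]clnoprs')

# Assumed special markers: start, end, and line terminator.
special = {'<', '>', '\n'}
vocab = set(notebook_tokens) - special

only_in_notebook = sorted(vocab - data_chars)
only_in_data = sorted(data_chars - vocab)
print(only_in_notebook)  # ['@', 'A', 'e', 'i']
print(only_in_data)      # []
```

On these two lists, the notebook vocabulary is a superset of the reported file characters: every character from the file is covered, and the notebook additionally allows '@', 'A', 'e', and 'i'.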
I can't find jak2_data.csv in the clone. Should one download it from ChEMBL?
Hello, I use the SMILES data I collected and your transfer learning model to train a generation model on my data. I want the model to generate new molecules similar to my training set, but unfortunately, it seems that the model has not learned any characteristics of the training data. Here are some SMILES generated by my model:
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCOCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CC=CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=CCC=CSC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC(C)C ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC(CC)CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=C(C)C ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC(C)C ',
'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC ',
The generated SMILES are transformed into molecular structure diagrams. Some examples are as follows:
These do not look like real compounds at all. What might be the problem?
Hello, I am training a generative model on my own SMILES data using LogP_optimization_demo.ipynb, which defines:
tokens = ['<', '>', '#', '%', ')', '(', '+', '-', '/', '.', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8', '=', 'A', '@', 'C', 'B', 'F', 'I', 'H', 'O', 'N', 'P', 'S', '[', ']', '\\', 'c', 'e', 'i', 'l', 'o', 'n', 'p', 's', 'r', '\n']
My data contains characters outside this tokens list, which prevents me from continuing with transfer learning, so I changed the code as follows for training:
gen_data_path = "data/nueji_data2.csv"
gen_data = GeneratorData(training_data_path=gen_data_path, delimiter='\t',
                         cols_to_read=[0], keep_header=True, tokens=None)
hidden_size = 1500
stack_width = 1500
stack_depth = 200
layer_type = 'GRU'
lr = 0.001
optimizer_instance = torch.optim.Adadelta
my_generator = StackAugmentedRNN(input_size=gen_data.n_characters, hidden_size=hidden_size,
                                 output_size=gen_data.n_characters, layer_type=layer_type,
                                 n_layers=1, is_bidirectional=False, has_stack=True,
                                 stack_width=stack_width, stack_depth=stack_depth,
                                 use_cuda=use_cuda,
                                 optimizer_instance=optimizer_instance, lr=lr)
model_path = './checkpoints/generator/checkpoint_biggest_rnn'
my_generator.load_model(model_path)
But I get the following error:
RuntimeError Traceback (most recent call last)
<ipython-input-11-3c9498b26c8c> in <module>()
----> 1 my_generator.load_model(model_path)
/scratch2/hzhou/Drug/generate_smiles/ReLeaSE/release/stackRNN.py in load_model(self, path)
140 """
141 weights = torch.load(path)
--> 142 self.load_state_dict(weights)
143
144 def save_model(self, path):
~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
717 if len(error_msgs) > 0:
718 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 719 self.__class__.__name__, "\n\t".join(error_msgs)))
720
721 def parameters(self):
RuntimeError: Error(s) in loading state_dict for StackAugmentedRNN:
size mismatch for encoder.weight: copying a param of torch.Size([40, 1500]) from checkpoint, where the shape is torch.Size([45, 1500]) in current model.
size mismatch for decoder.weight: copying a param of torch.Size([40, 1500]) from checkpoint, where the shape is torch.Size([45, 1500]) in current model.
size mismatch for decoder.bias: copying a param of torch.Size([40]) from checkpoint, where the shape is torch.Size([45]) in current model.
But my dataset is very small. Without transfer learning, my generative model may not be able to learn the chemical rules of SMILES. So my idea is this: retrain a model on the data/chembl_22_clean_1576904_sorted_std_final.smi dataset with a custom tokens list that also covers the characters in my dataset, and then fine-tune that pretrained model on my own data. Is this the right approach? I'm not sure.
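Before retraining from scratch, it may be worth checking how many molecules actually contain characters outside the pretrained vocabulary. A sketch under that assumption; `split_by_vocabulary` is a hypothetical helper and the SMILES strings are made-up examples:

```python
def split_by_vocabulary(smiles_list, tokens):
    """Separate SMILES fully covered by `tokens` from the rest.

    Returns (covered, uncovered, unknown_chars). Handles
    single-character tokens only, like the token list quoted above.
    """
    vocab = set(tokens)
    covered, uncovered, unknown = [], [], set()
    for smiles in smiles_list:
        extra = set(smiles) - vocab
        if extra:
            uncovered.append(smiles)
            unknown |= extra
        else:
            covered.append(smiles)
    return covered, uncovered, sorted(unknown)


# The 45 tokens from the notebook, written as one string.
tokens = list('<>#%)(+-/.1032547698=A@CBFIHONPS[]\\ceilonpsr\n')

covered, uncovered, unknown = split_by_vocabulary(
    ['CCO', '[Na+].[Cl-]', 'C*C'], tokens)
print(covered)    # ['CCO']
print(uncovered)  # ['[Na+].[Cl-]', 'C*C']
print(unknown)    # ['*', 'a']
```

If only a small share of the dataset is uncovered, dropping those molecules and keeping the original 45-token list would let the provided checkpoint load unchanged. Otherwise, retraining on the ChEMBL file with an extended token list, as proposed above, seems the more consistent route.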
Hi,
I ran your LogP module in two tests. In the first test, using my own retrained generator and predictor for reinforcement learning, I get only 0.2413 for valid SMILES. Before reinforcement learning, my retrained generator gets 0.8876497315159025 for the drug-like region and 0.7263 for valid SMILES. I don't know why the valid SMILES score of my generator drops after reinforcement learning.
In the second test, using your generator checkpoint, I get 0.7698 for valid SMILES and 0.9664848012470771 for the drug-like region. Do you have any tricks for training the generator?
The logp notebook throws an error trying to import what I think is scikit-learn
I loaded the pretrained model you provided and trained it further on my own SMILES data, so that it can generate new molecules structurally similar to my training data. How can I use the trained generative model to generate a specified number of new molecules?
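A minimal sketch of collecting a fixed number of unique samples. `collect_samples` and `sample_fn` are hypothetical; with ReLeaSE, `sample_fn` could wrap a call such as `my_generator.evaluate(gen_data)[1:-1]`, which is an assumption based on the notebook snippets quoted elsewhere on this page:

```python
import random


def collect_samples(sample_fn, n, is_valid=lambda s: True,
                    max_attempts=10000):
    """Draw from `sample_fn` until `n` unique valid strings are collected.

    Stops after `max_attempts` draws, so the result may hold fewer
    than `n` items if the sampler keeps repeating itself.
    """
    samples = []
    seen = set()
    for _ in range(max_attempts):
        if len(samples) >= n:
            break
        s = sample_fn()
        if s not in seen and is_valid(s):
            seen.add(s)
            samples.append(s)
    return samples


# Stand-in sampler (assumption): picks from a small fixed pool.
rng = random.Random(0)
pool = ['CCO', 'CCN', 'c1ccccc1', 'CC(=O)O', 'CCC']
samples = collect_samples(lambda: rng.choice(pool), n=3)
print(len(samples))  # 3
```

The `is_valid` hook is where a chemistry-aware validity check would go, and the attempt cap keeps the loop from running forever when the generator produces few distinct molecules.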
Hello, I want to use the data you provided to reproduce the results reported in the paper, such as the following:
In particular, I don't understand what is meant by the following in your paper:
Can you provide a complete example? Thank you.
Best wishes!