ardigen / mat
The official implementation of the Molecule Attention Transformer.
License: MIT License
Just as the title reads. I am trying out the Jupyter example with my own database, and it's extremely slow most of the time except once when it randomly worked fast. I know this might be a stupid question and I would appreciate any help.
Hi,
Thanks for the excellent and easily understandable code.
I can use your code without problems, but I'm having trouble understanding the use of "Adapter" and which library it comes from; I couldn't find it in any library.
self.adapter = Adapter(size, 8) if use_adapter else None
Thanks in advance.
When I run the cell model.cuda(), I get the following error.
I am running it on Google Colab with GPU support (Tesla K80).
RuntimeError                              Traceback (most recent call last)
in <module>()
----> 1 model.cuda()

8 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in <lambda>(t)
    456             Module: self
    457         """
--> 458         return self._apply(lambda t: t.cuda(device))
    459
    460     def cpu(self: T) -> T:

RuntimeError: CUDA error: an illegal memory access was encountered
From the pre-print, page 4, in the evaluation section: what does this sentence mean? What standardization did the results go through?
Hello! I'm trying to do a test run of the pre-trained MAT model. I've run SMILES through the DataLoader, which returns the following nine variables:
adj
afm
bft
orderAtt
aromAtt
conjAtt
ringAtt
distances
label
The model itself, as instantiated in load_weights.ipynb, is class GraphTransformer, which needs these arguments:
src
src_mask
adj_matrix
distances_matrix
edges_att
Is there example code that shows how to go from the output of the DataLoader into the transformer?
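For reference, here is a rough sketch of how the two interfaces might line up; the variable names come from the lists above, and the stacking of the four bond-attribute tensors into edges_att is purely an assumption, not something confirmed by the repository:

import torch

for batch in data_loader:
    adj, afm, bft, orderAtt, aromAtt, conjAtt, ringAtt, distances, label = batch
    # atom feature matrix -> src; mask out padded (all-zero) atom rows
    src = afm
    src_mask = torch.sum(torch.abs(afm), dim=-1) != 0
    # assumption: combine the per-bond attribute tensors into one edge-feature tensor
    edges_att = torch.stack([orderAtt, aromAtt, conjAtt, ringAtt], dim=1)
    output = model(src, src_mask, adj, distances, edges_att)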
Thanks for putting out this implementation! It's a really nice complement to your recent paper :)
Would you folks be willing to add a license for the code? I'd love to re-use some of the models and tools here, but can only do so with a license file.
If you're not sure what license would be good, I'd recommend MIT; it's what we use for DeepChem.
Perhaps this is because I come from a Tensorflow/Keras background and I'm not familiar with PyTorch, but the main entry point for the code, for inference and for training, is not clear to me. I've looked at EXAMPLE.ipynb, and the 9th cell has the code:
for batch in data_loader:
    adjacency_matrix, node_features, distance_matrix, y = batch
    batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0
    output = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
    ...
What's supposed to go in the dots? For training: I wasn't able to find any obvious (to me) optimizer in the other scripts in the repo. For inference: should I pass the output to the to_predict method of the GraphTransformer class? Does to_predict return an Nx1 array of point estimates of (for example) logS solubility, where N is the number of molecules in the batch?
Is there a lot of missing code in the ... part, or is it trivial enough (fewer than 10 or so lines) that you could paste it here? It doesn't have to be anything fancy; maybe just retrain a model on one of your .csv datasets and perform inference on the same dataset.
Other repos I've come across make the main entry point really clear. Here is an example from another repo:
main.Main(data=sol_data, # provided data (SMILES, property)
data_name=data_name, # dataset's name
data_units='', # property's SI units
bayopt_bounds=bounds, # bounds constraining the Bayesian search of neural architectures
k_fold_number = 10, # number of k-folds used for cross-validation
augmentation = True, # SMILES augmentation
outdir = "./data/", # directory for outputs (plots + .txt files)
bayopt_n_epochs = 10, # number of epochs for training during Bayesian search
bayopt_n_rounds = 25, # number of architectures to sample during Bayesian search
bayopt_on = True, # use Bayesian search
n_gpus = 1, # number of GPUs to be used
patience = 25, # number of epochs with no improvement after which training will be stopped
n_epochs = 100) # maximum of epochs for training
Maybe that's a bit too formal, but when I pasted it in a separate notebook and ran it on an AWS GPU it ran without issues. Is there a complete example like that for this repo?
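A minimal sketch of what could go in the "..." and of how inference might look, assuming a single regression target (e.g. logS) and the batch unpacking shown in EXAMPLE.ipynb; the optimizer, learning rate, loss and epoch count are illustrative assumptions, not the authors' training setup:

import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(100):
    for batch in data_loader:
        adjacency_matrix, node_features, distance_matrix, y = batch
        batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0
        output = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
        loss = F.mse_loss(output.view(-1), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Inference: reuse the same forward call under model.eval() and torch.no_grad().
# If the model was built with a single output, output should then be an N x 1
# tensor of point estimates for the N molecules in the batch.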
Afaict, Adapter referenced here is not defined anywhere. It's not a problem, as control flow with default params doesn't hit it, but I'm curious what it is. Just a reference to some other unreleased dev code?
The load_weights.ipynb file, which is referenced in the README, has been removed. I think it has been superseded by the new EXAMPLE.ipynb. If so, the README should be updated.
Thanks!
Are there any experiments that benchmark MAT against chemprop? Have the folks over at Ardigen already tried this comparison?
I would like to confirm whether the pretrained weights available in the README are just from "masked input node" prediction, and not of the final trained MAT. I assume this is the case because it skips loading any generator weights (which would differ for each task).
When you transfer learn onto a specific task, do you do any freezing and gradual unfreezing of the encoder weights, or do you just train everything right away?
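For context, a generic PyTorch pattern for freezing and then gradually unfreezing weights looks like the sketch below; this is not taken from the MAT fine-tuning procedure, and the encoder/generator attribute names are assumptions about the model object:

# freeze everything, then re-enable gradients only for the task-specific head
for p in model.parameters():
    p.requires_grad = False
for p in model.generator.parameters():    # assumed attribute name for the output head
    p.requires_grad = True

# ... after a few epochs, unfreeze the rest and continue with a lower learning rate ...
for p in model.parameters():
    p.requires_grad = True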
Hi!
I am quite fascinated by your approach. I was wondering whether the present architecture can be tweaked, or used as is, to produce a d-dimensional feature vector for any SMILES molecule given to it, like a molecular fingerprint that could be used for other downstream tasks, e.g. computing the similarity between two SMILES sequences.
Thanks and Regards
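One way to get a fixed-size descriptor out of the model without modifying its code is to register a forward hook on the encoder module and mean-pool its output over atoms; the sketch below assumes the batch variables from EXAMPLE.ipynb and that the encoder is reachable as model.encoder:

import torch

captured = {}

def save_encoder_output(module, inputs, output):
    captured["h"] = output                 # expected shape: [batch, n_atoms, d_model]

hook = model.encoder.register_forward_hook(save_encoder_output)
with torch.no_grad():
    model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
hook.remove()

# mask-aware mean pooling over atoms -> one d_model-dimensional vector per molecule
mask = batch_mask.unsqueeze(-1).float()
fingerprint = (captured["h"] * mask).sum(dim=1) / mask.sum(dim=1)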
Hi there! In the MAT paper, there were visualisations of the self-attention weights that were produced for one molecule. I am a bit unsure of where these self-attention weights can be located in the transformers.py file. Do you have any advice? Thanks!
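If transformers.py follows the Annotated Transformer pattern (an assumption), each MultiHeadedAttention module caches its most recent attention map in an attn attribute after a forward pass, so the per-head weights could be collected like this:

import torch

attention_maps = []
with torch.no_grad():
    model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
for name, module in model.named_modules():
    if hasattr(module, "attn") and module.attn is not None:
        attention_maps.append((name, module.attn))   # [batch, heads, n_atoms, n_atoms]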
I've modified the data_loader code and am running into a crash in featurization.data_utils.mol_collate_func here:
for molecule in batch:
    if type(molecule.y[0]) == np.ndarray:
        labels.append(molecule.y[0])
    else:
        labels.append(molecule.y)
I didn't debug the original code to see why it doesn't crash here, but with my code molecule.y is a scalar, so molecule.y[0] doesn't work. It seems like it should be:
for molecule in batch:
    if type(molecule.y) == np.ndarray:
        labels.append(molecule.y[0])
    else:
        labels.append(molecule.y)
Thanks a lot for the code! And your idea is really impressive.
After the coordinates are embedded in the mol, the pairwise_distances function is used to calculate the distance matrix.
But why not use the RDKit function AllChem.Get3DDistanceMatrix for the distance matrix calculation? I tried both methods, and the results seem to be the same.
Is there any concern with using pairwise_distances instead of AllChem.Get3DDistanceMatrix?
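A small check of the two routes, assuming pairwise_distances refers to sklearn.metrics.pairwise_distances applied to the embedded conformer coordinates:

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.metrics import pairwise_distances

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)

pos = mol.GetConformer().GetPositions()     # (n_atoms, 3) coordinates
d_sklearn = pairwise_distances(pos)         # Euclidean by default
d_rdkit = AllChem.Get3DDistanceMatrix(mol)

print(np.allclose(d_sklearn, d_rdkit))      # both routes should give the same matrix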
Great Work!
I'm trying to repeat the results in your paper, but I'm having trouble using random search to obtain the best results on each dataset.
As you mentioned in the paper, "we extensively tune their hyperparameters using random search" and "We run two sets of experiments with budget of 150 and 500 evaluations".
May I ask how you tune these hyperparameters using random search, and how you control the budget of 150 and 500 evaluations? I have tried to use the skorch package to solve this problem, but it failed.
Thanks a lot!
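For what it's worth, a plain random-search loop already gives full control over the evaluation budget without skorch; the hyperparameter names and ranges below are illustrative assumptions, not the grid used in the paper, and train_and_evaluate is a hypothetical helper returning a validation score:

import random

def sample_hyperparameters():
    return {
        "lr": 10 ** random.uniform(-5, -3),
        "d_model": random.choice([64, 128, 256, 512]),
        "n_layers": random.choice([2, 4, 6, 8]),
        "dropout": random.uniform(0.0, 0.3),
    }

budget = 150                                  # or 500 for the larger experiment
best_score, best_params = float("inf"), None
for _ in range(budget):
    params = sample_hyperparameters()
    score = train_and_evaluate(params)        # hypothetical: trains MAT, returns validation loss
    if score < best_score:
        best_score, best_params = score, params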
Hello!
I am trying to reproduce the results on downstream tasks (e.g. BBBP, estrogen-alpha, estrogen-beta, and so on).
I used the pre-trained model released in this repository and fine-tuned it on each downstream task for 100 epochs, as described in the paper, but I could not reproduce the results reported in the paper. I suppose my fine-tuning process differs from yours.
Can I ask for the fine-tuning code, or more details on the fine-tuning process?
Thank you and nice work!
Hi,
Thanks for the really nice and well explained paper.
I had a question regarding how the prediction output is invariant to the order of the atoms in the molecule. One can randomly permute the order of atoms in the adjacency matrix, the distance matrix, and the atom feature matrix.
Will MAT give the same property prediction for the different permutations?
My understanding is that the learned attention is between positions, so it is not obviously permutation invariant. In the NLP uses of the Transformer a positional encoding term is added, which helps with learning distant context, but unlike in language tasks, the order of the atoms in a molecule can be specified quite arbitrarily.
Thanks.
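A quick empirical check of this, assuming the batch variables from EXAMPLE.ipynb: permute the atom order consistently in every input tensor and compare the predictions.

import torch

perm = torch.randperm(node_features.shape[1])

nf_p = node_features[:, perm, :]
adj_p = adjacency_matrix[:, perm][:, :, perm]
dist_p = distance_matrix[:, perm][:, :, perm]
mask_p = batch_mask[:, perm]

with torch.no_grad():
    out_ref = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)
    out_perm = model(nf_p, mask_p, adj_p, dist_p, None)

print(torch.allclose(out_ref, out_perm, atol=1e-5))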
Hi, I want to add edge features for training, but I don't understand how the edge features work.
Great Work!
Can you release the splits used for the 6 folds for each dataset?
In Figure 2 you mention that some were random and some were scaffold splits, but which was which is not discussed in either dataset section. The splits would be especially helpful for the Estrogen datasets, since grabbing data hitting a protein from ChEMBL is a tricky process and hard to do exactly the same way twice.