yuyaoyang2333 / syntalinker
Automatic Fragment Linking with Deep Conditional Transformer Neural Networks
License: MIT License
I couldn't find instructions on how to generate linkers for my own fragments.
I have trained a model by following the steps up to average_models.sh, using the provided ChEMBL dataset.
But I am not sure what to do next.
Thanks
I ran preprocess.sh and got the following output:
[2020-10-09 10:52:15,369 INFO] Extracting features...
[2020-10-09 10:52:15,371 INFO] * number of source features: 0.
[2020-10-09 10:52:15,371 INFO] * number of target features: 0.
[2020-10-09 10:52:15,371 INFO] Building `Fields` object...
[2020-10-09 10:52:15,371 INFO] Building & saving training data...
[2020-10-09 10:52:15,372 INFO] Reading source and target files: data/ChEMBL/src-train data/ChEMBL/tgt-train.
[2020-10-09 10:52:15,810 INFO] Splitting shard 0.
[2020-10-09 10:52:16,380 INFO] Building shard 0.
[2020-10-09 10:53:27,915 INFO] * saving 0th train data shard to data/ChEMBL/.train.0.pt.
[2020-10-09 10:53:59,229 INFO] Building & saving validation data...
[2020-10-09 10:53:59,231 INFO] Reading source and target files: data/ChEMBL/src-val data/ChEMBL/tgt-val.
[2020-10-09 10:53:59,267 INFO] Splitting shard 0.
[2020-10-09 10:53:59,331 INFO] Building shard 0.
[2020-10-09 10:54:08,047 INFO] * saving 0th valid data shard to data/ChEMBL/.valid.0.pt.
[2020-10-09 10:54:11,926 INFO] Building & saving vocabulary...
[2020-10-09 10:54:15,444 INFO] * reloading data/ChEMBL/.train.0.pt.
[2020-10-09 10:54:20,820 INFO] * tgt vocab size: 34.
[2020-10-09 10:54:20,820 INFO] * src vocab size: 50.
[2020-10-09 10:54:20,820 INFO] * merging src and tgt vocab...
But the subsequent training.sh then failed to run and gave me this traceback:
Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main(opt)
  File "train.py", line 51, in main
    single_main(opt, 0)
  File "/home/UK/ama/Development/SyntaLinker/onmt/train_single.py", line 100, in main
    first_dataset = next(lazily_load_dataset("train", opt))
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 551, in lazily_load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 538, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "/home/UK/ama/.conda/envs/SyntaLinker/lib/python3.6/site-packages/torch/serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/ChEMBL/ChEMBL.train.pt'
These are the only files in the data/ChEMBL directory:
total 304132
drwxr-xr-x 2 ama domain users 4096 Oct 9 10:54 .
drwxr-xr-x 3 ama domain users 4096 Oct 8 11:01 ..
-rw-r--r-- 1 ama domain users 6511683 Oct 8 11:01 src-test.txt
-rw-r--r-- 1 ama domain users 52028301 Oct 8 11:01 src-train
-rw-r--r-- 1 ama domain users 6502030 Oct 8 11:01 src-val
-rw-r--r-- 1 ama domain users 8071432 Oct 8 11:01 tgt-test.txt
-rw-r--r-- 1 ama domain users 64500626 Oct 8 11:01 tgt-train
-rw-r--r-- 1 ama domain users 8060092 Oct 8 11:01 tgt-val
-rw-r--r-- 1 ama domain users 146295349 Oct 9 10:54 .train.0.pt
-rw-r--r-- 1 ama domain users 18167444 Oct 9 10:54 .valid.0.pt
-rw-r--r-- 1 ama domain users 1355 Oct 9 10:54 .vocab.pt
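The log and the traceback disagree on the file name prefix: preprocessing wrote data/ChEMBL/.train.0.pt, while training looked for data/ChEMBL/ChEMBL.train.pt. This suggests, although nothing above confirms it, that the dataset name was empty when preprocess.sh built its save path. The following is a minimal diagnostic sketch for checking which prefix actually has preprocessed files on disk. `find_shards` is a hypothetical helper, not part of the repository, and the `<prefix>.<type>.<N>.pt` naming pattern is an assumption inferred from the listing above.

```python
import glob
import os


def find_shards(prefix, corpus_type):
    """Return files named <prefix>.<corpus_type>.pt or <prefix>.<corpus_type>.<N>.pt.

    The naming pattern is an assumption inferred from the directory listing
    and the FileNotFoundError above, not taken from the repository's code.
    """
    single = f"{prefix}.{corpus_type}.pt"
    # glob.escape protects any wildcard characters that happen to be in the prefix.
    sharded = sorted(glob.glob(f"{glob.escape(prefix)}.{corpus_type}.[0-9]*.pt"))
    found = [single] if os.path.exists(single) else []
    return found + sharded


if __name__ == "__main__":
    # Prefix that training asked for, taken from the FileNotFoundError.
    expected_prefix = "data/ChEMBL/ChEMBL"
    # Prefix that the preprocessing log wrote to (empty name after the slash).
    written_prefix = "data/ChEMBL/"
    for prefix in (expected_prefix, written_prefix):
        print(repr(prefix), "->", find_shards(prefix, "train"))
```

If only the second prefix returns files, the two scripts are not using the same prefix. Under that assumption, rerunning preprocessing with the dataset name set, or pointing training at the prefix that was actually written, would be the things to try.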
Hi,
when I run the recovery.sh script, it raises an error because there is no score_predictions.py file. Could you please add this file to the repository?
I tried to generate a linker with multiple constraints. For example, this is my fragment pair:
[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C
One of the generated structures is
CC(C)(C)CNC(=O)C1CCCN1C(=O)CCCn1cncn1
which is missing one of the original fragments.
I noticed that the provided training data does not contain any pharmacophoric constraints. Do I need to generate a new set of training data and retrain the model?
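To make the input above easier to inspect, here is a small sketch that splits a tokenized source line into its constraint token, its numeric flags, and the two fragment SMILES. The line layout (a bracketed `L_n` token followed by integers, then space-separated SMILES tokens joined by `.`) is assumed from the single example in this post, and `parse_source_line` is a hypothetical helper, not a function from the repository. The meaning of the individual flags is not interpreted here.

```python
import re


def parse_source_line(line):
    """Split '[L_n f1 f2 ...] <tokenized SMILES>' into (token, flags, fragments).

    The layout is an assumption based on the one example in this thread.
    """
    match = re.match(r"^\[(L_\d+)((?:\s+\d+)*)\]\s*(.+)$", line.strip())
    if match is None:
        raise ValueError(f"unrecognized source line: {line!r}")
    length_token = match.group(1)
    flags = [int(value) for value in match.group(2).split()]
    # Remove the token-separating spaces, then split the two fragments on '.'.
    fragments = match.group(3).replace(" ", "").split(".")
    return length_token, flags, fragments


if __name__ == "__main__":
    example = "[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C"
    print(parse_source_line(example))
```

With the fragments back in ordinary SMILES form, each one can be compared against a generated structure to see which fragment was dropped.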