yuyaoyang2333 / syntalinker

Automatic Fragment Linking with Deep Conditional Transformer Neural Networks

License: MIT License

Python 95.63% Shell 4.37%

syntalinker's Issues

How to input customized two fragments with pharmacophore constraints for linker generation?

I couldn't find instructions on how to generate linkers for my own fragments.

I have trained a model on the provided ChEMBL dataset, following the steps up to average_models.sh,
but I am not sure what to do next.

Thanks

Failed to train

I ran preprocess.sh and got the following output:

[2020-10-09 10:52:15,369 INFO] Extracting features...
[2020-10-09 10:52:15,371 INFO]  * number of source features: 0.
[2020-10-09 10:52:15,371 INFO]  * number of target features: 0.
[2020-10-09 10:52:15,371 INFO] Building `Fields` object...
[2020-10-09 10:52:15,371 INFO] Building & saving training data...
[2020-10-09 10:52:15,372 INFO] Reading source and target files: data/ChEMBL/src-train data/ChEMBL/tgt-train.
[2020-10-09 10:52:15,810 INFO] Splitting shard 0.
[2020-10-09 10:52:16,380 INFO] Building shard 0.
[2020-10-09 10:53:27,915 INFO]  * saving 0th train data shard to data/ChEMBL/.train.0.pt.
[2020-10-09 10:53:59,229 INFO] Building & saving validation data...
[2020-10-09 10:53:59,231 INFO] Reading source and target files: data/ChEMBL/src-val data/ChEMBL/tgt-val.
[2020-10-09 10:53:59,267 INFO] Splitting shard 0.
[2020-10-09 10:53:59,331 INFO] Building shard 0.
[2020-10-09 10:54:08,047 INFO]  * saving 0th valid data shard to data/ChEMBL/.valid.0.pt.
[2020-10-09 10:54:11,926 INFO] Building & saving vocabulary...
[2020-10-09 10:54:15,444 INFO]  * reloading data/ChEMBL/.train.0.pt.
[2020-10-09 10:54:20,820 INFO]  * tgt vocab size: 34.
[2020-10-09 10:54:20,820 INFO]  * src vocab size: 50.
[2020-10-09 10:54:20,820 INFO]  * merging src and tgt vocab...

The subsequent training.sh then failed to run and gave me this:

Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main(opt)
  File "train.py", line 51, in main
    single_main(opt, 0)
  File "/home/UK/ama/Development/SyntaLinker/onmt/train_single.py", line 100, in main
    first_dataset = next(lazily_load_dataset("train", opt))
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 551, in lazily_load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 538, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "/home/UK/ama/.conda/envs/SyntaLinker/lib/python3.6/site-packages/torch/serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/ChEMBL/ChEMBL.train.pt'

These are the only files in the data/ChEMBL directory:

total 304132
drwxr-xr-x 2 ama domain users      4096 Oct  9 10:54 .
drwxr-xr-x 3 ama domain users      4096 Oct  8 11:01 ..
-rw-r--r-- 1 ama domain users   6511683 Oct  8 11:01 src-test.txt
-rw-r--r-- 1 ama domain users  52028301 Oct  8 11:01 src-train
-rw-r--r-- 1 ama domain users   6502030 Oct  8 11:01 src-val
-rw-r--r-- 1 ama domain users   8071432 Oct  8 11:01 tgt-test.txt
-rw-r--r-- 1 ama domain users  64500626 Oct  8 11:01 tgt-train
-rw-r--r-- 1 ama domain users   8060092 Oct  8 11:01 tgt-val
-rw-r--r-- 1 ama domain users 146295349 Oct  9 10:54 .train.0.pt
-rw-r--r-- 1 ama domain users  18167444 Oct  9 10:54 .valid.0.pt
-rw-r--r-- 1 ama domain users      1355 Oct  9 10:54 .vocab.pt
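
The log and listing above show the shards saved as `.train.0.pt` and `.valid.0.pt`, with nothing in front of `.train`, while the traceback shows train.py looking for `ChEMBL.train.pt`. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the output prefix given to preprocessing ended up empty. The sketch below only makes that mismatch visible; `shard_prefixes` and `expected_prefix` are hypothetical names, not part of SyntaLinker or OpenNMT.

```python
def shard_prefixes(filenames, corpus_type="train"):
    """Map each saved .pt shard to the text that precedes '.<corpus_type>.' in its name."""
    marker = f".{corpus_type}."
    prefixes = {}
    for name in filenames:
        if name.endswith(".pt") and marker in name:
            prefixes[name] = name.split(marker, 1)[0]
    return prefixes


# File names copied from the directory listing in this issue.
listing = [
    "src-test.txt", "src-train", "src-val",
    "tgt-test.txt", "tgt-train", "tgt-val",
    ".train.0.pt", ".valid.0.pt", ".vocab.pt",
]

# Assumption: the prefix the training step expects, taken from the path in the traceback.
expected_prefix = "ChEMBL"

found = shard_prefixes(listing, "train")
mismatched = [name for name, prefix in found.items() if prefix != expected_prefix]
print("train shards found:", found)
print("shards whose prefix differs from the expected one:", mismatched)
```

If the prefix does turn out to be empty, the place to look would be wherever preprocess.sh builds the output path for the saved data, since the training step builds its path from the same name.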

Segmentation Error

Have you come across this segmentation error when you ran the testing_beam_search?
Is there any way to fix this?

Thanks in advance

Missing python module for recovery.sh script

Hi,
when running the recovery.sh script, an error is raised because there is no score_predictions.py file. Could you please add this module to the repository?

multiple constraints

I am trying to generate linkers with multiple constraints. For example, this is my fragment pair:
[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C

One of the generated structures is:
CC(C)(C)CNC(=O)C1CCCN1C(=O)CCCn1cncn1

This structure is missing one of the original fragments.

I noticed that the provided training data does not contain any pharmacophoric constraints. Do I need to generate a new set of training data and retrain the model?
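
For anyone inspecting inputs like the fragment pair above, here is a minimal sketch that splits such a line into its bracketed constraint block and the two fragment SMILES. It assumes only what the example shows: a leading `[...]` block, space-separated tokens, and a `.` between fragments. `split_source_line` is a hypothetical helper, not a function from the repository, and it does not interpret what the constraint values mean.

```python
def split_source_line(line):
    """Split a tokenized source line into (constraint tokens, fragment SMILES list)."""
    line = line.strip()
    if not line.startswith("["):
        raise ValueError("expected the line to start with a [constraint] block")
    # The first ']' closes the leading constraint block.
    prefix, _, rest = line[1:].partition("]")
    constraint = prefix.split()
    # Tokens are space-separated; joining them gives ordinary SMILES strings.
    smiles = "".join(rest.split())
    fragments = smiles.split(".")
    return constraint, fragments


constraint, fragments = split_source_line(
    "[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C"
)
print(constraint)
print(fragments)
```

Checking whether a generated structure really contains both fragments needs a proper substructure match from a cheminformatics toolkit; plain string comparison on SMILES is not reliable for that.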
