syntalinker's People

Contributors

yuyaoyang2333

syntalinker's Issues

Failed to train

I ran preprocess.sh and got the following output:

[2020-10-09 10:52:15,369 INFO] Extracting features...
[2020-10-09 10:52:15,371 INFO]  * number of source features: 0.
[2020-10-09 10:52:15,371 INFO]  * number of target features: 0.
[2020-10-09 10:52:15,371 INFO] Building `Fields` object...
[2020-10-09 10:52:15,371 INFO] Building & saving training data...
[2020-10-09 10:52:15,372 INFO] Reading source and target files: data/ChEMBL/src-train data/ChEMBL/tgt-train.
[2020-10-09 10:52:15,810 INFO] Splitting shard 0.
[2020-10-09 10:52:16,380 INFO] Building shard 0.
[2020-10-09 10:53:27,915 INFO]  * saving 0th train data shard to data/ChEMBL/.train.0.pt.
[2020-10-09 10:53:59,229 INFO] Building & saving validation data...
[2020-10-09 10:53:59,231 INFO] Reading source and target files: data/ChEMBL/src-val data/ChEMBL/tgt-val.
[2020-10-09 10:53:59,267 INFO] Splitting shard 0.
[2020-10-09 10:53:59,331 INFO] Building shard 0.
[2020-10-09 10:54:08,047 INFO]  * saving 0th valid data shard to data/ChEMBL/.valid.0.pt.
[2020-10-09 10:54:11,926 INFO] Building & saving vocabulary...
[2020-10-09 10:54:15,444 INFO]  * reloading data/ChEMBL/.train.0.pt.
[2020-10-09 10:54:20,820 INFO]  * tgt vocab size: 34.
[2020-10-09 10:54:20,820 INFO]  * src vocab size: 50.
[2020-10-09 10:54:20,820 INFO]  * merging src and tgt vocab...

However, the subsequent training.sh failed with the following traceback:

Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main(opt)
  File "train.py", line 51, in main
    single_main(opt, 0)
  File "/home/UK/ama/Development/SyntaLinker/onmt/train_single.py", line 100, in main
    first_dataset = next(lazily_load_dataset("train", opt))
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 551, in lazily_load_dataset
    yield _lazy_dataset_loader(pt, corpus_type)
  File "/home/UK/ama/Development/SyntaLinker/onmt/inputters/inputter.py", line 538, in _lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "/home/UK/ama/.conda/envs/SyntaLinker/lib/python3.6/site-packages/torch/serialization.py", line 419, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/ChEMBL/ChEMBL.train.pt'

These are the only files in the data/ChEMBL directory:

total 304132
drwxr-xr-x 2 ama domain users      4096 Oct  9 10:54 .
drwxr-xr-x 3 ama domain users      4096 Oct  8 11:01 ..
-rw-r--r-- 1 ama domain users   6511683 Oct  8 11:01 src-test.txt
-rw-r--r-- 1 ama domain users  52028301 Oct  8 11:01 src-train
-rw-r--r-- 1 ama domain users   6502030 Oct  8 11:01 src-val
-rw-r--r-- 1 ama domain users   8071432 Oct  8 11:01 tgt-test.txt
-rw-r--r-- 1 ama domain users  64500626 Oct  8 11:01 tgt-train
-rw-r--r-- 1 ama domain users   8060092 Oct  8 11:01 tgt-val
-rw-r--r-- 1 ama domain users 146295349 Oct  9 10:54 .train.0.pt
-rw-r--r-- 1 ama domain users  18167444 Oct  9 10:54 .valid.0.pt
-rw-r--r-- 1 ama domain users      1355 Oct  9 10:54 .vocab.pt
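The traceback appears to point to a file-name mismatch rather than a failure inside training itself. The preprocessing log shows the shards being saved with the prefix `data/ChEMBL/` (hence `.train.0.pt`, `.valid.0.pt` and `.vocab.pt`, with nothing before the first dot), while training looks for files under the prefix `data/ChEMBL/ChEMBL`. The sketch below mirrors the lookup seen in the traceback, assuming SyntaLinker's `lazily_load_dataset` follows upstream OpenNMT-py 0.x: it first globs for numbered shards `<prefix>.train.[0-9]*.pt` and falls back to `<prefix>.train.pt` only when none match, which would explain why the error names `ChEMBL.train.pt` rather than a shard. `find_shards` is a hypothetical helper that works on a list of file names instead of the real file system.

```python
import fnmatch


def find_shards(data_prefix, corpus_type, existing_files):
    """Return the dataset file(s) the loader would try to open.

    Hypothetical helper mirroring the assumed OpenNMT-py 0.x lookup:
    numbered shards '<prefix>.<corpus>.[0-9]*.pt' first, otherwise a
    single '<prefix>.<corpus>.pt' file (which may not exist).
    fnmatch stands in for glob.glob so no real files are needed.
    """
    pattern = data_prefix + "." + corpus_type + ".[0-9]*.pt"
    shards = sorted(f for f in existing_files if fnmatch.fnmatchcase(f, pattern))
    if shards:
        return shards
    # No numbered shard matched: fall back to the unsharded file name.
    return [data_prefix + "." + corpus_type + ".pt"]


# File names taken from the directory listing in the issue.
files = [
    "data/ChEMBL/src-test.txt", "data/ChEMBL/tgt-test.txt",
    "data/ChEMBL/src-train", "data/ChEMBL/tgt-train",
    "data/ChEMBL/src-val", "data/ChEMBL/tgt-val",
    "data/ChEMBL/.train.0.pt", "data/ChEMBL/.valid.0.pt",
    "data/ChEMBL/.vocab.pt",
]

print(find_shards("data/ChEMBL/ChEMBL", "train", files))
# ['data/ChEMBL/ChEMBL.train.pt']  (absent, hence the FileNotFoundError)
print(find_shards("data/ChEMBL/", "train", files))
# ['data/ChEMBL/.train.0.pt']
```

If this reading is right, making the preprocessing output prefix and the training data prefix agree should let training find the shards. In upstream OpenNMT-py these are `preprocess.py -save_data` and `train.py -data`; how preprocess.sh and training.sh set them is not shown in this issue.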

Segmentation Error

Have you encountered this segmentation fault when running testing_beam_search?
Is there any way to fix this?

Thanks in advance

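A segmentation fault kills the Python process without a Python traceback, so the error alone does not show which call crashed. One generic way to narrow it down (not a SyntaLinker-specific fix) is the standard-library `faulthandler` module, enabled with the `-X faulthandler` interpreter option, which writes the Python stack of each thread to stderr when the process receives SIGSEGV. In the sketch below, `run_with_faulthandler` is a hypothetical helper, and a deliberate null-pointer read through `ctypes` (the example used in the `faulthandler` documentation) stands in for the beam-search run.

```python
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` and capture its output.

    With faulthandler enabled, a fatal signal such as SIGSEGV makes the
    interpreter print 'Fatal Python error: ...' plus the Python stack
    to stderr before the process dies.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    return subprocess.run(cmd, capture_output=True, text=True)


# Stand-in for the crashing run: reading address 0 via ctypes segfaults.
result = run_with_faulthandler(["-c", "import ctypes; ctypes.string_at(0)"])
print("exit code:", result.returncode)  # negative on POSIX when killed by a signal
print(result.stderr)
```

For the real case, the arguments would be the test script and its options (the script name is not given in this issue). If testing_beam_search is a shell script that launches Python, setting the environment variable `PYTHONFAULTHANDLER=1` before running it has the same effect; the last Python frame printed before the crash usually points at the call into native code where the fault occurred.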

multiple constraints

I tried to generate linkers with multiple constraints. For example, this is my fragment pair:
[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C

One of the generated structures is:
CC(C)(C)CNC(=O)C1CCCN1C(=O)CCCn1cncn1

This structure is missing one of the two original fragments.

I noticed that the provided training data does not contain any pharmacophoric constraints. Do I need to generate a new training set and retrain the model?
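The input line in this issue combines a bracketed constraint prefix with tokenized fragment SMILES, and separating the two is a first step toward checking whether outputs keep both fragments. The parser below is a minimal sketch that assumes only the format visible in this example: an optional `[...]` prefix whose contents contain spaces (the meaning of the fields in `L_6 0 0 1 1` is not explained in this thread), followed by space-separated SMILES tokens with fragments separated by `.`. `parse_fragment_line` is a hypothetical helper, not part of SyntaLinker.

```python
import re

# Assumed prefix format: '[' ... ']' with at least one space inside.
# Tokenized SMILES bracket atoms such as '[nH]' contain no spaces,
# so they are not mistaken for a constraint prefix.
_PREFIX = re.compile(r"\[([^\]]*\s[^\]]*)\]\s*(.*)")


def parse_fragment_line(line):
    """Split a tokenized input line into constraint fields and fragment SMILES."""
    line = line.strip()
    match = _PREFIX.match(line)
    if match:
        constraints = match.group(1).split()
        body = match.group(2)
    else:
        constraints = []
        body = line
    smiles = "".join(body.split())  # drop the tokenizer's spaces
    fragments = [frag for frag in smiles.split(".") if frag]
    return constraints, fragments


constraints, fragments = parse_fragment_line(
    "[L_6 0 0 1 1] * C ( = O ) N C 1 C C 1 . * C ( = O ) N C C ( C ) ( C ) C"
)
print(constraints)  # ['L_6', '0', '0', '1', '1']
print(fragments)    # ['*C(=O)NC1CC1', '*C(=O)NCC(C)(C)C']
```

Each fragment can then be compared against every generated molecule with a substructure search, for example RDKit's `Mol.HasSubstructMatch` with the fragment built by `Chem.MolFromSmarts` so that `*` acts as a wildcard atom. The structure above would match `*C(=O)NCC(C)(C)C` but not `*C(=O)NC1CC1`, since it contains no three-membered ring.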
