danielsoler93 / drug_learning

Open package with DL and ML tools for drug discovery. It is organized into three branches: 2D-QSAR, 3D-QSAR, and simulation+AI.

Home Page: https://danielsoler93.github.io/drug_learning/

License: MIT License

Python 100.00%

drug_learning's Issues

Docs '' in fingerprint conversion

When running python drug_learning/drug_learning/two_dimensions/main_fingerprints.py test_zinc/*.sdf -mo -pq split -ch 1000 -nsp 15, it fails with the error:

main_fingerprints.py: error: argument split: invalid choice: 'test_zinc/core_1.sdf' (choose from 'split')

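If you quote the pattern, as in python drug_learning/drug_learning/two_dimensions/main_fingerprints.py 'test_zinc/*.sdf' -mo -pq split -ch 1000 -nsp 15, it works. The quotes stop the shell from expanding the pattern into several file names before the script receives it.

Please add this to the docs.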
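
The failure can be reproduced without the real script. The sketch below is a simplified, hypothetical stand-in for the parser, not the code in main_fingerprints.py: the argument names infile and command and the file name core_0.sdf are assumptions. It accepts one input pattern followed by the split command and expands the pattern itself with glob.

```python
import argparse
import glob


def build_parser():
    # Hypothetical, simplified stand-in for the real parser:
    # one input pattern followed by the "split" command.
    parser = argparse.ArgumentParser(prog="main_fingerprints.py")
    parser.add_argument("infile", help="quoted glob pattern such as 'test_zinc/*.sdf'")
    parser.add_argument("command", choices=["split"])
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints its error to stderr and exits; catch that for the demo.
        return None


# Quoted pattern: the shell passes it through as a single token.
quoted = try_parse(["test_zinc/*.sdf", "split"])

# Unquoted pattern: the shell has already expanded it into several file names
# (core_0.sdf is an assumed name), so the second file lands in the slot
# reserved for "split" and argparse reports an invalid choice.
unquoted = try_parse(["test_zinc/core_0.sdf", "test_zinc/core_1.sdf", "split"])

# With the quoted form, the script expands the pattern itself.
input_files = sorted(glob.glob(quoted.infile))
```

Here quoted parses cleanly while unquoted is rejected, which matches the behaviour reported above.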

Parsers different Files

Since the architecture allows split and fingerprints to run separately, and we plan to add many more features in the future, we could already move each argparse definition into its own file and join them all in the main file.

Tasks:

generate split_argparse.py
fingerprint_argparse.py
main.py
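
One possible shape for this split, shown in a single listing for brevity: each function would live in the file named in its comment, and main.py would only join them. The sub-command name fingerprints, the long option names, and the defaults are assumptions made for illustration; only the short flags -ch and -mo come from the command line quoted in the first issue.

```python
import argparse


# Would live in split_argparse.py (hypothetical layout).
def add_split_parser(subparsers):
    parser = subparsers.add_parser("split", help="split the input into chunks")
    # The long name and the default value are assumptions.
    parser.add_argument("-ch", "--chunk_size", type=int, default=1000)
    return parser


# Would live in fingerprint_argparse.py (hypothetical layout).
def add_fingerprint_parser(subparsers):
    parser = subparsers.add_parser("fingerprints", help="compute fingerprints")
    # The long name is an assumption.
    parser.add_argument("-mo", "--mordred", action="store_true")
    return parser


# Would live in main.py: it only joins the per-command parsers.
def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    add_split_parser(subparsers)
    add_fingerprint_parser(subparsers)
    return parser


args = build_parser().parse_args(["split", "-ch", "500"])
```

Adding a new command then means adding one file with one function and one extra call in build_parser.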

Parallelize with ray

Multiprocessing does not share memory between nodes. @cescgina proposed Ray as a high-level library to implement MPI-like processes that use several nodes at full capacity. The first benchmark was 10M in 14h (Mordred descriptors).

Implement Ray for mordred
Replace multiprocessing with Ray

Full documentation of the prediction branch

Parallelize with ray tutorial
Command line full docs
API full docs

Writing mordred to parquet file crashes

I don't currently have a traceback, but trying to write mordred descriptors to a parquet file raises an error.

Cannot parallelize mordred descriptors + remove progress bar

The calculation of mordred descriptors is internally parallelized via a multiprocessing pool, which crashes if called from a script that already uses another pool. To avoid this issue, you can set nproc=1 in the call to calc.pandas in fingerprints.py. However, I don't know whether it is faster to use an external pool or to process the file serially and use mordred's internal pool.

Also, you can use quiet=True in the calc.pandas function to suppress the progress bar in the output, which looks bad when the output is redirected to a file.

Check whether it is faster to use the mordred pool or an external one
Fix parallelization of mordred
Use quiet=True to remove progress bar

Training ensemble model

Read Joan's code and understand what it does
Implement new 2D module for training

Unable to write only csv in package to calculate fingerprints

The default value for the --parquet option is True, but I found no way to turn it off because the argparse action is store_true. I think the default should be False. Then, if the user has not specified any output format, either raise an error or set parquet as the output format.
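
A minimal sketch of the proposed behaviour, assuming a second flag named --csv for the CSV output (that name is a guess, not taken from the package) and taking the fallback option rather than the error:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_true with its implicit default of False: omitting the flag turns it off.
    parser.add_argument("--parquet", action="store_true")
    # "--csv" is an assumed flag name for the CSV output.
    parser.add_argument("--csv", action="store_true")
    return parser


def output_formats(args):
    """Return the requested formats, falling back to parquet when none is given."""
    requested = [name for name in ("parquet", "csv") if getattr(args, name)]
    return requested or ["parquet"]


parser = build_parser()
csv_only = output_formats(parser.parse_args(["--csv"]))
fallback = output_formats(parser.parse_args([]))
both = output_formats(parser.parse_args(["--parquet", "--csv"]))
```

If the default should stay True instead, argparse.BooleanOptionalAction (Python 3.9 and later) would be another option, since it also generates a --no-parquet flag.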
