pedrojuanbj / mltsa

Machine Learning Transition State Analysis (MLTSA) suite with analytical models to generate data on demand and test the approach on different types of data and ML models.

Home Page: https://mltsa.readthedocs.io/en/latest/

License: MIT License

Python 2.63% Jupyter Notebook 97.37%

tensorflow machine-learning deep-learning molecular-dynamics-analysis molecular-dynamics sklearn-compatible tensorflow-compatible sklearn time-series time-series-analysis time-series-classification enhanced-sampling

mltsa's Introduction

MLTSA: Machine Learning Transition State Analysis repository

Introduction

This is a Python package for applying the MLTSA approach to identify relevant collective variables (CVs) in Molecular Dynamics data, using both Scikit-Learn and TensorFlow. It also includes a 1D analytical-potential module that generates features for lightweight testing, and a suite of 2D potential shapes (Spiral, Z-shaped) whose data can be turned into features through 1D projections. In this package you will find:

Data Generation Module (MLTSA_datasets): contains easy-to-call 1D/2D/MD examples for generating data or experimenting with it as tests for the approach.
Scikit-Learn-based ML models and Feature Reduction module (MLTSA_sklearn): contains the Scikit-Learn-integrated functions for applying MLTSA to data.
TensorFlow-based ML models and Feature Reduction module (MLTSA_tensorflow): contains a set of functions and several models built on TensorFlow for applying MLTSA to data.
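The kind of data the 1D analytical models produce can be sketched as follows. This is a minimal, hypothetical illustration, not the MLTSA_datasets API: it assumes a quartic double-well potential V(x) = x^4 - x^2, overdamped Langevin dynamics integrated with the Euler-Maruyama scheme, and trajectories launched from the barrier top (x = 0) and labelled by the well they end in. The function names and parameter values are illustrative choices.

```python
import numpy as np


def double_well_force(x, a=1.0, b=1.0):
    """Force -dV/dx for the illustrative double well V(x) = a*x**4 - b*x**2."""
    return -(4.0 * a * x**3 - 2.0 * b * x)


def simulate_from_ts(n_traj=100, n_steps=500, dt=1e-3, kT=0.5, gamma=1.0, seed=0):
    """Overdamped Langevin (Euler-Maruyama) trajectories started at the barrier top x = 0.

    Returns positions with shape (n_traj, n_steps) and one label per trajectory:
    1 if it ends in the right-hand well (x > 0), otherwise 0.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_traj)                      # every trajectory starts at the transition state
    traj = np.empty((n_traj, n_steps))
    noise_scale = np.sqrt(2.0 * kT * dt / gamma)
    for t in range(n_steps):
        # Deterministic drift from the potential plus thermal noise
        x = x + double_well_force(x) * dt / gamma + noise_scale * rng.standard_normal(n_traj)
        traj[:, t] = x
    labels = (traj[:, -1] > 0).astype(int)    # outcome: which basin the trajectory committed to
    return traj, labels
```

For example, `traj, labels = simulate_from_ts(n_traj=200)` returns 200 trajectories and their outcomes; in MLTSA, outcome labels like these are the classification targets the ML models learn to predict.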

Usage

Example OneD
Example TwoD
Example Train
Example MLTSA

Installation

To use MLTSA, first install it using pip:

(.venv) $ pip install MLTSA

mltsa's People

Contributors

Stargazers

Watchers

Forkers

zwei21 handsomeshao11 captainfjc tianyun7 cynthia-0807

mltsa's Issues

Review and finish up the 1D analytical model module documentation

Fix documentation and Read the Docs compilation errors

The build currently fails on Read the Docs, and the documentation is also missing some modules/functions.

Test all Notebooks for bugs

If anyone would be so kind, please run all available notebooks, find any bugs, and list them on the project's bug page so that we are aware of them and someone else can fix them.
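One way to carry out this check is sketched below. It assumes Jupyter and nbconvert are installed and relies on the standard `jupyter nbconvert --to notebook --execute` command; the helper names find_notebooks, run_notebook and run_all, and the executed_notebooks output folder, are hypothetical.

```python
import subprocess
from pathlib import Path


def find_notebooks(root):
    """Return every .ipynb file under root, skipping Jupyter checkpoint copies."""
    return sorted(
        p for p in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def run_notebook(path, out_dir, timeout=600):
    """Execute one notebook with nbconvert; return (path, returncode, stderr)."""
    cmd = [
        "jupyter", "nbconvert", "--to", "notebook", "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",
        "--output-dir", str(out_dir),
        str(path),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return path, result.returncode, result.stderr


def run_all(root=".", out_dir="executed_notebooks"):
    """Run every notebook under root; return [(path, stderr_tail)] for the failing ones."""
    out = Path(out_dir).resolve()
    failures = []
    for nb in find_notebooks(root):
        if out in nb.resolve().parents:
            continue  # skip copies written by a previous run
        # Note: notebooks sharing a file name will overwrite each other in out_dir.
        path, code, err = run_notebook(nb, out_dir)
        print(("OK     " if code == 0 else "FAILED ") + str(path))
        if code != 0:
            failures.append((path, err[-2000:]))
    return failures
```

Calling `run_all(".")` from the repository root executes every notebook and returns the failing ones with the tail of their error output, which could then be copied into a bug report.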

2D models (Z-shaped and Spiral): implement data generation in the datagen.py module

We need to implement the 2D data generation classes and the 1D projection suite in a single module inside MLTSA_datasets, under the 2D folder. The module name can be anything related to its purpose; datagen.py or datagen_2D.py would be good.

Before this implementation can ship, the latest data generation code for the models needs to be added. Optional plots of the potentials' free energy surfaces would be useful. It should also be callable as a module, like the 1D one, to generate data on demand, since the code is now fast and optimized. Additionally, the 1D projection code should be included so that training-ready data is also available on demand.
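The 1D projection step could look roughly like the sketch below. This is one plausible approach, assuming each feature is a random unit-norm linear combination of the 2D coordinates plus Gaussian noise; project_to_1d_features and its parameters are hypothetical and may not match the datagen.py code this issue asks for.

```python
import numpy as np


def project_to_1d_features(traj_2d, n_features=10, noise=0.05, seed=0):
    """Project 2D trajectories onto n_features noisy 1D linear combinations.

    traj_2d: array of shape (n_traj, n_steps, 2) holding (x, y) positions.
    Returns (features, weights): features has shape (n_traj, n_steps, n_features),
    weights has shape (2, n_features), one column of mixing coefficients per feature.
    """
    rng = np.random.default_rng(seed)
    traj_2d = np.asarray(traj_2d, dtype=float)
    if traj_2d.ndim != 3 or traj_2d.shape[-1] != 2:
        raise ValueError("traj_2d must have shape (n_traj, n_steps, 2)")
    # Random unit-norm mixing vectors: feature_k = w_xk * x + w_yk * y
    weights = rng.normal(size=(2, n_features))
    weights /= np.linalg.norm(weights, axis=0, keepdims=True)
    # (n_traj, n_steps, 2) @ (2, n_features) -> (n_traj, n_steps, n_features)
    features = traj_2d @ weights
    # Additive Gaussian noise so no feature is an exact copy of a coordinate
    features += noise * rng.standard_normal(features.shape)
    return features, weights
```

Keeping the returned weights makes it possible to check afterwards which features mix in the coordinate that actually drives the transition.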

What: Improve the management of the MLTSA project, giving a better layout to users and to readers of the MLTSA paper who have followed the link from the paper to this repo.

For package users: the import structure should stay shallow, since a deep structure is bad API design. Example: from MLTSA import dataset, then use dataset.functions; avoid deep imports such as from MLTSA.dataset.twoDdata.generator import ...
For package developers: the changelog and README should be updated regularly so that users and other developers are clearly aware of the project's status.
For paper reference: a notebook folder would be the ideal container for example code, providing supporting examples for readers who arrive at this repo from the published paper. If possible, the example code used in the paper should be kept unchanged, either with a GitHub archive (which yields a new repo) or in a new branch.
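The shallow-import idea can be illustrated by re-exporting names in __init__.py files, so the implementation can live deep in the tree while users only write `from package import dataset`. The sketch below builds a throwaway package on disk to show this; the package name mltsa_demo and its layout are hypothetical stand-ins, not the current MLTSA structure.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_demo_package(root):
    """Write a hypothetical package mltsa_demo whose deep module is re-exported shallowly."""
    pkg = Path(root) / "mltsa_demo"
    deep = pkg / "dataset" / "twoDdata"
    deep.mkdir(parents=True)
    # Deep implementation module: mltsa_demo/dataset/twoDdata/generator.py
    (deep / "generator.py").write_text(textwrap.dedent("""\
        def generate(n):
            return list(range(n))
    """))
    (deep / "__init__.py").write_text("")
    # Subpackage __init__ re-exports the function, hiding the deep path from users.
    (pkg / "dataset" / "__init__.py").write_text(
        "from .twoDdata.generator import generate\n__all__ = ['generate']\n"
    )
    # Top-level __init__ exposes the subpackage itself.
    (pkg / "__init__.py").write_text("from . import dataset\n")
    return pkg


def demo():
    """Import in the shallow style and call the re-exported function."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(tmp)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the freshly written files are found
        try:
            from mltsa_demo import dataset  # shallow import, no deep path needed
            return dataset.generate(3)
        finally:
            sys.path.remove(tmp)
```

The deep folders can still exist for organization; only the __init__.py files decide what the public, shallow API looks like.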

Who: Pedro and zwei21 will discuss and work on this issue together.
Where: The entire project repo file structure should be considered for restructuring.
How:

Pedro suggested a Python package called "cookiecutter", which automatically generates a directory structure from predefined templates, and recommended a few templates suitable for deep learning projects. However, this approach would generate a brand-new directory, meaning all of the original MLTSA files would have to be moved into it; it is unclear whether this would take more time or effort.
Zwei21 suggested amending the current directory structure according to the given repo templates. This would be less difficult, since several existing folders could be reused in the new structure. However, it requires reconsidering the current MLTSA repo structure and rearranging code that is already built, and problems such as broken imports in existing files would arise when this plan is merged. The plan therefore needs a good understanding of the whole MLTSA structure; for this reason, zwei21 suggested commenting and documenting all current code and files before the restructuring, like taking an inventory before reorganizing a warehouse.

When: This should be done no later than the end of August 2022, when zwei21 leaves UCL, after which any incomplete work would be difficult to finish.