delemottelab / demystifying

19.0 19.0 7.0 124 KB

License: MIT License

Python 79.44% Jupyter Notebook 17.76% Tcl 2.80%

biomolecular-simulation deep-taylor-decomposition machine-learning molecular-dynamics

demystifying's People

chemlove jhmlam mrauha edwardmendez95 akshay-sridhar teletchealab pyphystuff

demystifying's Issues

update demo with biological system

Broken input data link

The link to the input data specified in the README appears to be broken and leads to a 404 error. It is also broken in the published document.

Support regression for supervised learning

Replace mdtraj dependency in postprocessing by only using biopandas

Replace t-SNE with UMAP as "drop-in replacement"

McInnes et al. suggest that UMAP can speed up some calculations and is more interpretable than t-SNE. See https://pypi.org/project/umap-learn/

Add support for LDA

See https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html

We can probably use the coef_ attribute directly as the importance per feature.
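A minimal sketch of turning `coef_` into a per-feature importance, assuming we take the absolute coefficient (the sign only encodes which class a feature pushes towards), average over classes in the multi-class case, and scale to [0, 1]. The helper name `coef_to_importance` is hypothetical, and coefficients are only comparable across features if the inputs are on a similar scale (e.g. standardized).

```python
import numpy as np


def coef_to_importance(coef):
    """Convert a linear model's coef_ array into per-feature importances in [0, 1].

    Hypothetical helper: absolute coefficients are averaged over classes and
    divided by the largest value.
    """
    # LDA's coef_ has shape (n_features,) or (n_classes, n_features);
    # promote to 2D so both cases are handled the same way.
    coef = np.atleast_2d(np.asarray(coef, dtype=float))
    # The sign only says which class a feature favours, so use magnitudes.
    importance = np.abs(coef).mean(axis=0)
    max_value = importance.max()
    if max_value > 0:
        importance = importance / max_value
    return importance
```

With scikit-learn installed, this would be called as `coef_to_importance(LinearDiscriminantAnalysis().fit(X, y).coef_)`.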

switch to pip/pipenv for dependency management #11

It works better in dev/prod environments IMO, and it's easier to package.

Normalizing relevance per frame with negative values for MLP

In mlp_feature_extractor:

    def _normalize_relevance_per_frame(self, relevance_per_frame):
        for i in range(relevance_per_frame.shape[0]):
            ind_negative = np.where(relevance_per_frame[i, :] < 0)[0]
            relevance_per_frame[i, ind_negative] = 0
            relevance_per_frame[i, :] = (relevance_per_frame[i, :] - np.min(relevance_per_frame[i, :])) / \
                                        (np.max(relevance_per_frame[i, :]) - np.min(relevance_per_frame[i, :]) + 1e-9)
        return relevance_per_frame

I don't think we handle negative relevance values correctly, if there are any. Shouldn't we keep negative relevance here rather than setting it to 0? I think this can lead to misclassifications having zero importance in the output and "blinking" frames in videos.
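To compare options, here is a vectorized sketch of the per-frame normalization with a hypothetical `mode` switch: `"clip"` reproduces the current loop, `"minmax"` keeps negative values but maps each frame's minimum to 0, and `"absmax"` preserves signs by dividing by the largest absolute value. Which of these suits the MLP relevance is the open question in this issue; the sketch does not settle it.

```python
import numpy as np


def normalize_relevance_per_frame(relevance, mode="clip", eps=1e-9):
    """Normalize each frame (row) of a 2D relevance array.

    mode="clip":   set negative values to 0, then min-max scale to [0, 1]
                   (same result as the current loop in mlp_feature_extractor).
    mode="minmax": min-max scale to [0, 1] without clipping negatives.
    mode="absmax": divide by the largest |value|, keeping signs, range [-1, 1].
    """
    rel = np.array(relevance, dtype=float)  # work on a copy, not in place
    if mode == "clip":
        rel[rel < 0] = 0.0
    if mode in ("clip", "minmax"):
        row_min = rel.min(axis=1, keepdims=True)
        row_max = rel.max(axis=1, keepdims=True)
        return (rel - row_min) / (row_max - row_min + eps)
    if mode == "absmax":
        row_absmax = np.abs(rel).max(axis=1, keepdims=True)
        return rel / (row_absmax + eps)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with `"clip"` a frame whose relevance is entirely negative becomes all zeros, which is one way zero-importance frames could appear.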

running the benchmark script

Hi,

I followed the setup suggested in the README so that demystifying can be imported, then moved to the bpj_paper_input folder and ran the command:

    python run_benchmarks.py --extractor_type PCA

This gives the error:

    2022-03-20 10:37:43 Extracting features-DEBUG: Done with feature extraction for auto-cutoff
    2022-03-20 10:37:43 benchmarking-ERROR: 'numpy.ndarray' object is not callable
    Traceback (most recent call last):
      File "/demystifying/bpj_paper_input/run_benchmarks.py", line 76, in do_run
        postprocessors = computing.compute(extractor_type=et,
      File "/demystifying/benchmarking/computing.py", line 108, in compute
        feature_to_resids=dg.feature_to_resids())
    TypeError: 'numpy.ndarray' object is not callable
    /demystifying/bpj_paper_input/run_benchmarks.py:97: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
      logger.warn("Failed for extractor %s ", et)

Some information about my setup: I am using pip, Python 3.9.6, and the following dependency versions:

    pip show numpy
    Name: numpy
    Version: 1.22.3
    Summary: NumPy is the fundamental package for array computing with Python.
    Home-page: https://www.numpy.org
    Author: Travis E. Oliphant et al.
    Author-email: 
    License: BSD
    Location: /env-demys/lib/python3.9/site-packages
    Requires: 
    Required-by: scipy, scikit-learn, pandas, mdtraj, matplotlib, biopandas

    pip show mdtraj
    Name: mdtraj
    Version: 1.9.7
    Summary: MDTraj: A modern, open library for the analysis of molecular dynamics trajectories
    Home-page: http://mdtraj.org
    Author: Robert McGibbon
    License: LGPLv2.1+
    Location: /env-demys/lib/python3.9/site-packages
    Requires: scipy, numpy, pyparsing, astunparse
    Required-by:

    pip show scikit-learn
    Name: scikit-learn
    Version: 1.0.2
    Summary: A set of python modules for machine learning and data mining
    Home-page: http://scikit-learn.org
    Author: 
    Author-email: 
    License: new BSD
    Location: /env-demys/lib/python3.9/site-packages
    Requires: threadpoolctl, numpy, joblib, scipy
    Required-by:

    pip show biopandas
    Name: biopandas
    Version: 0.2.9
    Summary: Machine Learning Library Extensions
    Home-page: https://github.com/rasbt/biopandas
    Author: Sebastian Raschka
    License: BSD 3-Clause
    Location: /env-demys/lib/python3.9/site-packages
    Requires: setuptools, pandas, numpy
    Required-by:
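The traceback suggests that, in this version, `dg.feature_to_resids` is a NumPy array attribute rather than a method, so calling it with `()` fails. The minimal sketch below reproduces the error with a hypothetical `DataGenerator` stand-in (not the repository's class) and shows the likely fix: passing `feature_to_resids=dg.feature_to_resids` without parentheses at `benchmarking/computing.py` line 108. This has not been checked against the repository. The `DeprecationWarning` is unrelated and goes away by replacing `logger.warn` with `logger.warning`.

```python
import logging

import numpy as np


class DataGenerator:
    """Hypothetical stand-in for the `dg` object in benchmarking/computing.py."""

    def __init__(self):
        # Stored as a plain array attribute, not as a method.
        self.feature_to_resids = np.array([[1, 2], [1, 3], [2, 3]])


dg = DataGenerator()

# Calling the attribute reproduces the error from the traceback.
try:
    dg.feature_to_resids()
    message = None
except TypeError as err:
    message = str(err)  # "'numpy.ndarray' object is not callable"

# Likely fix: pass the array itself, without parentheses.
feature_to_resids = dg.feature_to_resids

# Use logger.warning instead of the deprecated logger.warn.
logger = logging.getLogger("benchmarking")
logger.warning("Failed for extractor %s ", "PCA")
```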

Automatically generate pull code properties from demystifying output

Given some feature type supported by GROMACS (dihedrals, distances, etc.) and a topology of the system, modify the index file and generate the pull code MDP input for enhanced sampling simulations.
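For the simplest feature type, a distance between two atoms, a sketch could look like the following. It assumes the relevant features have already been mapped to 1-based atom numbers, uses hypothetical group names `pull_g1`, `pull_g2`, ... to be appended to an existing index file, and emits GROMACS pull-code MDP options for an umbrella potential. The force constant and rate are placeholder values, and dihedral features (which need the `dihedral` geometry with six groups) are not covered.

```python
def make_distance_pull_code(atom_pairs, k=1000.0, rate=0.0):
    """Build index groups and MDP pull-code lines for distance coordinates.

    Hypothetical helper. atom_pairs holds 1-based atom numbers (as in GROMACS
    .ndx files); each pair becomes one umbrella coordinate between two
    single-atom groups. k is in kJ mol^-1 nm^-2, rate in nm/ps.
    """
    ndx_lines = []
    mdp_lines = [
        "pull = yes",
        f"pull-ngroups = {2 * len(atom_pairs)}",
        f"pull-ncoords = {len(atom_pairs)}",
    ]
    for coord, (atom_a, atom_b) in enumerate(atom_pairs, start=1):
        group_a, group_b = 2 * coord - 1, 2 * coord
        for group, atom in ((group_a, atom_a), (group_b, atom_b)):
            # One index group per pulled atom, to append to an existing .ndx file.
            ndx_lines += [f"[ pull_g{group} ]", str(atom)]
            mdp_lines.append(f"pull-group{group}-name = pull_g{group}")
        mdp_lines += [
            f"pull-coord{coord}-type = umbrella",
            f"pull-coord{coord}-geometry = distance",
            f"pull-coord{coord}-groups = {group_a} {group_b}",
            f"pull-coord{coord}-dim = Y Y Y",
            f"pull-coord{coord}-start = yes",  # add the initial distance to init
            f"pull-coord{coord}-rate = {rate}",
            f"pull-coord{coord}-k = {k}",
        ]
    return "\n".join(ndx_lines) + "\n", "\n".join(mdp_lines) + "\n"


ndx_text, mdp_text = make_distance_pull_code([(10, 250), (42, 300)])
print(ndx_text)
print(mdp_text)
```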

optional third column for chains in feature_to_resids

For proteins with many chains, several chains may have the same residue ID. We should support an optional third column in feature_to_resids to specify the chain index. This should also be taken into account when we set the beta field on PDBs in postprocessing.

For now, the workaround is to change the residue IDs.
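A minimal sketch of a chain-aware lookup for the beta field, assuming per-residue importances keyed by `(chain_id, resid)` (which the proposed third column would make possible) and plain PDB text in fixed-column format (chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66). The function name is hypothetical, and it edits raw lines rather than going through mdtraj or biopandas.

```python
def set_beta_by_chain_and_resid(pdb_lines, importance, default=0.0):
    """Return PDB lines with the B-factor set from importance[(chain, resid)].

    Hypothetical helper: residues missing from `importance` get `default`.
    """
    out = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            out.append(line)  # leave headers, TER, END etc. untouched
            continue
        newline = "\n" if line.endswith("\n") else ""
        body = line.rstrip("\n").ljust(66)  # pad short lines up to the B-factor field
        chain = body[21]                    # column 22
        resid = int(body[22:26])            # columns 23-26
        value = importance.get((chain, resid), default)
        # Columns 61-66 hold the B-factor as %6.2f.
        out.append(body[:60] + f"{value:6.2f}" + body[66:] + newline)
    return out


# Same residue number 10 on chains A and B gets different values.
pdb = [
    "ATOM      1  CA  ALA A  10      11.104   6.134  -6.504  1.00  0.00           C\n",
    "ATOM      2  CA  ALA B  10      12.560   6.305  -6.120  1.00  0.00           C\n",
    "END\n",
]
beta = {("A", 10): 0.75, ("B", 10): 0.25}
for line in set_beta_by_chain_and_resid(pdb, beta):
    print(line, end="")
```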