prolint / prolint2

Prolint2 is an optimized tool for analyzing and visualizing lipid-protein interactions from molecular dynamics trajectories.

Home Page: https://prolint2.readthedocs.io/

License: MIT License

Shell 0.30% Python 83.38% HTML 1.61% JavaScript 14.10% CSS 0.60%

lipid-protein-interactions membrane-proteins molecular-dynamics python-library

prolint2's Introduction

ProLint v2: an optimized tool for the analysis of lipid-protein interactions.

Overview

ProLint2 calculates distance-based lipid-protein interactions from molecular dynamics trajectories of membrane protein systems.

Installation
Basic examples
How to contribute?
License
Copyright
Acknowledgements

Installation

To install prolint2 we recommend creating a new conda environment as follows:

   conda create -n prolint2 python=3.8
   conda activate prolint2

Then you can install prolint2 via pip:

   pip install prolint2

Basic examples

Using the ProLint2 API:

   from prolint2 import Universe
   from prolint2.sampledata import GIRKDataSample
   GIRK = GIRKDataSample()

   u = Universe(GIRK.coordinates, GIRK.trajectory)

   contacts = u.compute_contacts(cutoff=7) # cutoff in Angstroms
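
To make the idea of a distance cutoff concrete, the following is a minimal, standard-library sketch of a distance-based contact search. It is not ProLint2's implementation: the function name `contacts_within_cutoff` and the toy coordinates are assumptions made for illustration, the search is brute force, and periodic boundary conditions are ignored.

```python
import math


def contacts_within_cutoff(protein_coords, lipid_coords, cutoff=7.0):
    """Return (protein_index, lipid_index) pairs no farther apart than `cutoff` (Angstroms)."""
    pairs = []
    for i, p in enumerate(protein_coords):
        for j, q in enumerate(lipid_coords):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(p, q) <= cutoff:
                pairs.append((i, j))
    return pairs


# Toy coordinates in Angstroms (hypothetical, not taken from the GIRK sample data).
protein = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
lipids = [(3.0, 4.0, 0.0), (20.0, 0.0, 10.0)]

pairs = contacts_within_cutoff(protein, lipids, cutoff=7.0)
print(pairs)
```

Only the first protein point and the first lipid point (5 Å apart) fall within the 7 Å cutoff here. A production tool would typically replace the double loop with a neighbor search to scale to full trajectories.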

Using the ProLint2 command-line interface:

   prolint2 coordinates.gro trajectory.xtc -c 7

You can find more details on how to use prolint2 in the documentation.

How to contribute?

If you find a bug in the source code, you can help us by submitting an issue here. Even better, you can submit a Pull Request with a fix.

We really appreciate your feedback!

License

Source code included in this project is available under the MIT License.

Copyright

Acknowledgements

The repository structure of ProLint2 is based on the Computational Molecular Science Python Cookiecutter version 1.6.

prolint2's Issues

Adding the contacts class and basic tests.

Add trajectory information

We should make some basic system information very easily accessible. All of this data is already available via different MDAnalysis functions and methods, but it would be nice to have it readily accessible.

Fix issue with the interactions in the Dashboard.

The residue IDs in the dashboard are not correct. The issue should be similar to the one previously fixed in the exporting functions.

Create Getting Started notebook.

Include Getting Started notebook.
Modify the overview in the documentation, including a benchmark figure and at least one snapshot of the new dashboard.

Plotting error

Hello,
This is a repeat of bug #85, but no one has been assigned and there has been no movement on it. The module (prolint2.plotting) will not load at all, even when copying the notebooks that are part of this GitHub page.

ModuleNotFoundError: No module named 'prolint2.plotting'

Deal with MDAnalysis warnings when working with Martini model

Default warnings coming out of MDAnalysis are not very useful when working with the Martini model.
We can try to wrap them and better handle them.
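
One way to wrap such warnings, sketched here with the standard `warnings` module only. The helper name `quiet_warnings`, the stand-in function `noisy_loader`, and the warning text are assumptions for illustration; the real messages to filter would need to be checked against what MDAnalysis actually emits for Martini systems.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_warnings(pattern):
    """Temporarily ignore UserWarnings whose message starts with `pattern` (a regex)."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=pattern, category=UserWarning)
        yield


def noisy_loader():
    # Stand-in for a library call that emits a warning (hypothetical message).
    warnings.warn("Failed to guess the mass for some atom types", UserWarning)
    return "loaded"


# Record whatever still gets through, to show that the wrapped call stays silent.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with quiet_warnings(r"Failed to guess the mass"):
        result = noisy_loader()

print(result, len(caught))
```

Because the filter is installed inside `catch_warnings`, it is removed again when the block exits, so unrelated warnings elsewhere in a session are unaffected.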

Add a cli version of the code

To facilitate the usage of the code, we should implement a cli version of the code. Initially, we only need a bare bones version with support for only the essential components.

HPC support with the Dask scheduler

Optimize the runner class to allow for the setup of remote machines to run the calculation of the contacts.

Include parallelization with Dask.

Recreate the parallel routine with Dask once the data structure for the contact results has been defined.

Loading examples and example files

We should have a dedicated way of loading and working with example files. For example:

from UFCC.examples import GIRK
# we can then load the data using the GIRK.<attribute> notation:
print(GIRK.trajectory)
# $INSTALLATION_DIR/data/GIRK/trajectory.xtc
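
A minimal sketch of how such an example loader could be structured, using only the standard library. The class name `ExampleDataset`, the `DATA_DIR` placeholder, and the `coordinates.gro` file name are assumptions; only the `data/GIRK/trajectory.xtc` layout comes from the proposal above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExampleDataset:
    """Hypothetical container exposing the files of one example system."""

    directory: Path

    @property
    def coordinates(self) -> Path:
        # File name assumed for illustration.
        return self.directory / "coordinates.gro"

    @property
    def trajectory(self) -> Path:
        return self.directory / "trajectory.xtc"


# Placeholder for $INSTALLATION_DIR/data; a real package would resolve this at import time.
DATA_DIR = Path("data")
GIRK = ExampleDataset(DATA_DIR / "GIRK")

print(GIRK.trajectory)
```

Exposing paths as properties keeps the attribute-style access from the proposal while leaving the files themselves untouched until they are needed.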

Fix the command line usage.

Command-line access to prolint2 is provided by using `typer` and the config_files. This is not working since the latest release.

Include parser

Argument parser to use the library directly from the command line.
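
A bare-bones parser along these lines could look as follows. This sketch uses the standard `argparse` module rather than whatever the project ultimately adopted; the positional arguments and the `-c` flag mirror the command-line example in the README, while the long option `--cutoff`, the default value, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a minimal parser: two positional input files and a distance cutoff."""
    parser = argparse.ArgumentParser(
        prog="prolint2",
        description="Compute distance-based lipid-protein contacts.",
    )
    parser.add_argument("coordinates", help="structure file, e.g. coordinates.gro")
    parser.add_argument("trajectory", help="trajectory file, e.g. trajectory.xtc")
    # The default of 7.0 is an assumption chosen to match the README example.
    parser.add_argument("-c", "--cutoff", type=float, default=7.0,
                        help="contact cutoff in Angstroms")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["coordinates.gro", "trajectory.xtc", "-c", "7"])
print(args.coordinates, args.trajectory, args.cutoff)
```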

Increase the coverage of the tests.

Right now the coverage of the unit tests is only 45%, which is quite low. To demonstrate that the code is reliable for external users, we need to increase this as much as possible.

Implement the analysis of membrane curvature.

Use https://github.com/MDAnalysis/membrane-curvature.

Add unittests.

Fix command-line version.

The command-line version stopped working due to some issues with the dependencies that need to be solved.

Show compute statistics

When calculating contacts, add useful output regarding performance e.g. time it took, resources used, etc.

This output should be enabled by default, and we should provide rough estimates using current dependencies (i.e. no need to add a new dependency).

At this point, we also don't need to worry about the formatting of the output or any other similar details. A simple example:

Calculation Report
------------------
Resource used: CPU
Time per search: 0.01 seconds
Iterations: 1000
Total time: 10 seconds
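
A report of this shape can be produced with the standard library alone, which fits the no-new-dependency constraint. The sketch below is illustrative: the function name `run_with_report` and the dummy workload are assumptions, and the resource line is hard-coded to CPU rather than detected.

```python
import time


def run_with_report(search, iterations):
    """Call `search` repeatedly and return a plain-text timing report."""
    start = time.perf_counter()
    for _ in range(iterations):
        search()
    total = time.perf_counter() - start
    lines = [
        "Calculation Report",
        "------------------",
        "Resource used: CPU",  # hard-coded in this sketch, not detected
        f"Time per search: {total / iterations:.6f} seconds",
        f"Iterations: {iterations}",
        f"Total time: {total:.4f} seconds",
    ]
    return "\n".join(lines)


# Dummy workload standing in for one contact search.
report = run_with_report(lambda: sum(range(1000)), iterations=1000)
print(report)
```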

Publish v1.0.0

Once #78 has been merged into the main branch, we can upgrade the package version to 1.0, which will be a significant achievement. However, there is still a long way to go. We require extensive (1) testing and (2) the addition of tutorial/example notebooks.

Support for logical operations between groups in the interactive selection.

Add support for + and -, which can be mapped into set operations (e.g. + can be union, | can be intersection, - can be A - (A∩B), and others) during the interactive selection of the Database and Query groups for the calculation of the contacts.

MDAnalysis already implements many of these. I checked the code and it should be possible to add atomgroups directly (and do other operations as well): link.

Have a look at this: https://www.codingem.com/python-__add__-method/.
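
As a rough illustration of mapping operators onto set operations, here is a standard-library sketch. The `Group` class is hypothetical and stores plain integer indices instead of MDAnalysis atoms. It follows Python's built-in set convention, where `&` is intersection, so the operator choice differs from the mapping suggested above and would need to be agreed on.

```python
class Group:
    """Hypothetical selection group holding a set of atom indices."""

    def __init__(self, indices):
        self.indices = frozenset(indices)

    def __add__(self, other):
        # "+" mapped to union.
        return Group(self.indices | other.indices)

    def __and__(self, other):
        # "&" mapped to intersection.
        return Group(self.indices & other.indices)

    def __sub__(self, other):
        # "-" mapped to difference: members of self that are not in other.
        return Group(self.indices - other.indices)

    def __repr__(self):
        return f"Group({sorted(self.indices)})"


a = Group([1, 2, 3])
b = Group([3, 4])
print(a + b, a & b, a - b)
```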

Remove demo branch?

I don't think it is needed now?

Include a config file for default parameters.

Add contact metrics to the contact calculation routine

We should go ahead and implement some basic contact metrics. This is essential to move forward with issue #12.
We can maybe start by adding the sum of all contacts, and the average of contacts.
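
The two starting metrics can be sketched as below. The input format (one contact count per frame for a single residue) and the function name `contact_metrics` are assumptions for illustration, not the data structure used by the library.

```python
from statistics import mean


def contact_metrics(contacts_per_frame):
    """Return the sum and the average of per-frame contact counts."""
    return {
        "sum": sum(contacts_per_frame),
        "mean": mean(contacts_per_frame),
    }


# Hypothetical contact counts for one residue over five frames.
metrics = contact_metrics([0, 2, 1, 3, 4])
print(metrics)
```

Note that `statistics.mean` raises `StatisticsError` on an empty sequence, so a real routine would need to decide how to report residues with no frames.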

Heatmap/contact projection error

Hello,
I was trying to visualise the contact projection (heatmap) and got the error below. Could you help me, please?
pl.show_contact_projection(T, bf=js[metrics[0]], cmap="Spectral_r")

TypeError Traceback (most recent call last)
/tmp/ipykernel_3340230/3395143762.py in
1 # visualize the first metric
----> 2 pl.show_contact_projection(T, bf=js[metrics[0]], cmap="Spectral_r")

/softwares/Anaconda3/2023.07/envs/prolint/lib/python3.7/site-packages/prolintpy/vis/show_contact_projection.py in show_contact_projection(t, bf, protein, residue_list, ngl_repr, cmap)
69 else:
70 if len(df.resSeq.unique()) != len(bf):
---> 71 raise TypeError ('When projecting only a subset of residues provide a list of tuples: [(residue_id, value), ...]')
72 for atom in resseq:
73 atomic_bfactors.append(bf[atom-1])

TypeError: When projecting only a subset of residues provide a list of tuples: [(residue_id, value), ...]

I used the data.json file and P454_BB.pdb (the file containing the 2 peptides) from the output.

Adding config file to future main.

Add a config file for default parameters of the calculations in the future_main branch.

export_to_prolintpy method

This method needs to be optimized or completely replaced. A first step would be to change line 21 of the w2plp.py file.

Improve API usage.

Improve the way users can call the software as an API from a Python script.

Improvements to the calculated and displayed metrics

We need to define the metrics we will calculate and use. Currently, we output a metric variable that is set to a float value. We need to update that to an iterable. I'm not so sure about the actual metrics. We can discuss the specifics during our regular meetings.
For now, we can simply use the average contact and the maximum contact.

From a broader view, we may want to group contact types into categories: default contacts, occupancies, and residence types.

Implement a visualization frontend (dashboard)

Not clear yet how we should go around implementing this. The actual features to implement are even less clear.
We can start with a backbone setup and continue from there.

prolint2.plotting error

Hello,

Working with prolint2, I was not able to import prolint2.plotting and got this error message:
"ModuleNotFoundError: No module named 'prolint2.plotting'"

Any idea about this error, please?

Load server data from local file

We need to create an API that enables the storage of server data and subsequent loading of the server from the local data file.
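
A minimal sketch of such a save/load round trip, assuming the server data can be represented as JSON-serializable dictionaries and lists. The function names, the file name `server_data.json`, and the payload layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_server_data(data, path):
    """Write the server payload to a local JSON file."""
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_server_data(path):
    """Read a payload previously written by save_server_data."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical payload: a cutoff plus per-lipid (residue_id, value) pairs.
payload = {"cutoff": 7, "contacts": {"POPC": [[12, 0.8], [45, 0.3]]}}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "server_data.json"
    save_server_data(payload, target)
    restored = load_server_data(target)

print(restored == payload)
```

JSON keeps the file human-readable, but it turns tuples into lists and only allows string keys, so a richer results object would need an explicit conversion step.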

Self-interactions.

Work on the analyses of self-interactions (e.g. lipid-lipid and protein-protein interactions).

Polish docs.

Complete command-line section.
Complete the server section.
Use a homogeneous format for the docs in the API integration.
Polish current tutorials and add more.
Add Plotters to the tutorials.

Adding exporting options.

Add exporting options so that users can use the contacts data for specific analyses.

Redefine data structure for the results including different metrics.

Define valuable metrics and the best way to define them in the code.

Adding new lipid types with -al option in the command line.

The list type for the add_lipids variable in the command-line parser doesn't seem to be working as expected.

Define visualization functions and custom metrics.

Define the same visualization functions that were available in the previous version of prolint and define a function for custom metrics.

Update setup.py installation

The ufcc installation using setup.py does not work, at least not on macOS.

Can you have a look at that and confirm that it is working on Linux?

Fix order of the groups during the interactive selection.

Make sure that the order of the selection groups during the interactive selection mode is fixed and consistent between different runs of the code so that it can be scripted.

Implement support for Index Library

Rather than making an artificial distinction between what is and what isn't protein, we should add support for a much more extensive and user-friendly set of groups. We can start by taking the GROMACS make_ndx command as motivation.

Upon loading user data, we retrieve all atom labels and group everything at the residue level. By default we can define the following labels:

1. System     Size
2. Protein    Size
3. Lipids     Size
4. Water      Size
5. Ions       Size
6. Ligands    Size
7. POPC       Size 
8. POPE       Size
...

Next we take all non-protein residues and list them all.

Now we also need a way to work with this Index Library. One suggestion could be to define a make_library() or make_index() function which takes two arguments: selection and action. Selection is a wrapper around the MDAnalysis select_atoms function that adds support for the default groups we define above. E.g. UFCC.make_index('select 1 and not 2', action='a')

Update the documentation.

Include usage from the command-line and the API.

notebook access

Hi Daniel,

Can we get access back to the notebooks for running ProLint2?
Thank you very much!

Regards,
Arvin

Missing dependencies and modules after prolint2 installation via pip

Hello,

After installing prolint2 on Ubuntu 22.04 with:

conda create -n prolint2 python=3.8
conda activate prolint2
pip install prolint2

I find the following missing dependencies: seaborn, logomaker (which can be installed afterwards).

The following module is also not getting installed: prolint2.plotting
Error:
from prolint2.plotting.utils import *
ModuleNotFoundError: No module named 'prolint2.plotting'

Would it be possible to fix this?
Thank you.