trajpy's Introduction

TrajPy

Trajectory analysis is a challenging task and fundamental for understanding the movement of living organisms at various scales.

We propose TrajPy as an easy, Pythonic solution for studies that demand trajectory analysis. With a friendly graphical user interface (GUI), it requires little knowledge of computing or physics and can be used by nonspecialists.

TrajPy is composed of three main units of code:

  • Basic usage:
    • The GUI: this is where you interact with trajpy, and it is the only thing you need to know to start using it
  • Advanced usage:
    • trajpy.py: the heart of trajpy, which computes the features for characterizing the trajectories
    • traj_generator.py: a trajectory generator that can be used to build a dataset for trajectory classification

Our dataset and Machine Learning (ML) model are available for use, as well as the generator for building your own database.

Installation

The package is hosted on PyPI. To install it, run the following from the command line:

pip3 install trajpy

If you want to test the development version, clone the repository into a local directory from your terminal:

git clone https://github.com/ocbe-uio/trajpy

Then run setup.py to install the package:

python setup.py install

Basic Usage Example

Using the Graphic User Interface (GUI)

Open a terminal and execute the line below:

python3 -m trajpy.gui

1 - You can open one file at a time by clicking Open file..., or process several files in the same directory with Open directory...

2 - Select the features to be computed by ticking the boxes

3 - Click on Compute

4 - Select the directory and file name where the results will be stored

Processing is complete when the following message appears in the text box located at the bottom of the GUI:

Results saved to /path/to/results/output.csv

File formats

Comma separated values (CSV)

Currently trajpy supports CSV files organized in 4 columns: time t and 3 spatial coordinates x, y, z:

t x y z
1.00 10.00  50.00 50.00
2.00 11.00 50.00 50.00
3.00 11.00 50.00 50.00
4.00 12.00 50.00 50.00
5.00 12.00 50.00 50.00
6.00 13.00 50.00 50.00

See the sample file provided in this repository as an example.
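
As a minimal sketch of how this layout maps onto arrays (independent of trajpy itself, and assuming a comma-delimited file with one header line), NumPy can split the columns into time and positions. The in-memory text below is a hypothetical stand-in for a real file:

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file on disk: header line plus t, x, y, z columns.
csv_text = """t,x,y,z
1.00,10.00,50.00,50.00
2.00,11.00,50.00,50.00
3.00,11.00,50.00,50.00
"""

# skip_header=1 drops the column names; delimiter="," sets the separator.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

t, r = data[:, 0], data[:, 1:]  # time column and (x, y, z) columns
print(t.shape, r.shape)
```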

LAMMPS YAML dump format

LAMMPS YAML files are defined with the following structure:

    ---
    time: 0.0
    natoms: 100
    keywords: [id, type, x, y, z, vx, vy, vz, fx, fy, fz]
    data:
    - [1, 1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -nan, -nan, -nan]
    - [2, 1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -nan, -nan, -nan]
    - [3, 1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -nan, -nan, -nan]
    ...

We provide support for parsing this type of data file with the function parse_lammps_dump_yaml().

Scripting

First we import the package

import trajpy.trajpy as tj

Then we load the data sample provided in this repository. We pass the arguments skip_header=1 to skip the first line of the file and delimiter=',' to specify the column delimiter:

filename = 'data/samples/sample.csv'
r = tj.Trajectory(filename,
                  skip_header=1,
                  delimiter=',')

Finally, to compute a set of features for trajectory analysis, we can simply call r.compute_features():

    r.compute_features()

The features will be stored in the object r, for instance:

  >>> r.asymmetry
  0.5782095322093505
  >>> r.fractal_dimension
  1.04
  >>> r.efficiency
  0.29363293632936327
  >>> r.gyration_radius
  array([[30.40512689,  5.82735002,  0.96782673],
         [ 5.82735002,  2.18625318,  0.27296851],
         [ 0.96782673,  0.27296851,  2.41663589]])
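
The 3x3 symmetric array reported for r.gyration_radius has the form of a gyration tensor. The sketch below computes the textbook gyration tensor with NumPy for a hypothetical set of positions. It is an illustration of the quantity only; the normalisation and conventions that trajpy uses internally are not shown here and may differ.

```python
import numpy as np

def gyration_tensor(r):
    """Textbook gyration tensor of an (N, d) array of positions."""
    centered = r - r.mean(axis=0)          # positions relative to the centroid
    return centered.T @ centered / len(r)  # (d, d) symmetric matrix

# Hypothetical trajectory: three points along the x axis.
r = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0]])
S = gyration_tensor(r)
print(S)
```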

For more examples please consult the extended documentation: https://trajpy.readthedocs.io/

Requirements

  • numpy >= 1.14.3
  • scipy >= 1.7.1
  • ttkthemes >= 2.4.0
  • Pillow >= 8.1.0
  • PyYAML >= 5.3.1

How to cite?

If using TrajPy for academic work, please cite our methodological paper and Software DOI:

@article{10.1093/bioadv/vbae026,
    author = {Moreira-Soares, Maurício and Mossmann, Eduardo and Travasso, Rui D M and Bordin, José Rafael},
    title = "{TrajPy: empowering feature engineering for trajectory analysis across domains}",
    journal = {Bioinformatics Advances},
    volume = {4},
    number = {1},
    pages = {vbae026},
    year = {2024},
    month = {02},
    issn = {2635-0041},
    doi = {10.1093/bioadv/vbae026},
    url = {https://doi.org/10.1093/bioadv/vbae026},
    eprint = {https://academic.oup.com/bioinformaticsadvances/article-pdf/4/1/vbae026/56926570/vbae026.pdf},
}

@software{mauricio_moreira_2020_3978699,
  author       = {Mauricio Moreira and Eduardo Mossmann},
  title        = {phydev/trajpy: TrajPy 1.3.1},
  month        = aug,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {1.3.1},
  doi          = {10.5281/zenodo.3978699},
  url          = {https://doi.org/10.5281/zenodo.3978699}
}

Contribution

This is an open source project, and all contributions are welcome. Feel free to open an Issue, a Pull Request, or to e-mail us.

Publications using trajpy

Moreira-Soares M., Mossmann E., Travasso R. D. M, Bordin J. R., TrajPy: empowering feature engineering for trajectory analysis across domains, Bioinformatics Advances, Volume 4, Issue 1, 2024, vbae026, doi:10.1093/bioadv/vbae026

Eduardo Henrique Mossmann. A physics based feature engineering framework for trajectory analysis. MSc dissertation. Federal University of Pelotas 2022, Brazil.

Simões, RF, Pino, R, Moreira-Soares, M, et al. Quantitative Analysis of Neuronal Mitochondrial Movement Reveals Patterns Resulting from Neurotoxicity of Rotenone and 6-Hydroxydopamine. FASEB J. 2021; 35:e22024. doi:10.1096/fj.202100899R

Moreira-Soares, M., Pinto-Cunha, S., Bordin, J. R., Travasso, R. D. M. Adhesion modulates cell morphology and migration within dense fibrous networks. https://doi.org/10.1088/1361-648X/ab7c17

References

Arkin, H. and Janke, W. 2013. Gyration tensor based analysis of the shapes of polymer chains in an attractive spherical cage. J Chem Phys 138, 054904.

Wagner, T., Kroll, A., Haramagatti, C.R., Lipinski, H.G. and Wiemann, M. 2017. Classification and Segmentation of Nanoparticle Diffusion Trajectories in Cellular Micro Environments. PLoS One 12, e0170165.

trajpy's Issues

Create a new function for diffusivity

The calculation of the diffusion coefficient should be moved to a new @staticmethod (see below).
https://github.com/phydev/trajpy/blob/562415b0f35976c726df7fc8daa773dacca7fb2c/trajpy/trajpy.py#L74

@staticmethod
def diffusivity_(msd, timelag, ndim):
    """
    :param msd: ensemble averaged mean squared displacement
    :param timelag: time-lag
    :param ndim: number of dimensions
    :return diffusivity: short-time diffusion coefficient D
    """
    # Only the first 10 points are used (short-time regime).
    diffusivity = Trajectory.anomalous_exponent_(msd[:10], timelag[:10]) / (2 * ndim)
    return diffusivity

Deprecation warning raised by scipy on traj_generator.py

./home/runner/work/trajpy/trajpy/trajpy/traj_generator.py:127:
DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future.
Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
y[i_step, i_sample] = sub_y[-1]

Affected code:

y[i_step, i_sample] = sub_y[-1]
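
A minimal sketch of one way to silence this warning, assuming that sub_y[-1] is a one-element array (for example because sub_y has shape (n, 1)). The array shapes and values below are hypothetical; the actual shapes in traj_generator.py may differ.

```python
import numpy as np

y = np.zeros((3, 2))
sub_y = np.array([[0.1], [0.2], [0.3]])  # assumed shape (n, 1); the real shape may differ

i_step, i_sample = 0, 1
# sub_y[-1] is a 1-element array; .item() extracts the scalar explicitly,
# which avoids relying on the deprecated implicit array-to-scalar conversion.
y[i_step, i_sample] = sub_y[-1].item()
```

If sub_y is two-dimensional, indexing the element directly (sub_y[-1, 0]) would be an equivalent alternative.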

Add dependencies needed for the GUI in requirements.txt

Previously, I left the dependencies required only by the GUI out of the requirements list, because trajpy was mainly intended to be used via scripting. However, now that the GUI has matured and is minimally functional, I think it makes sense to offer it as the main way of using the package.

The packages ttkthemes and Pillow are required to run the GUI and are not listed in requirements.txt.

ttkthemes>=2.4.0
Pillow>=8.1.0

Remove plot feature from the GUI to reduce dependencies

Currently we have a plotting feature in the GUI. This feature is unnecessary and increases the number of dependencies.

Code that should be removed:

  • in trajpy/gui.py

    trajpy/trajpy/gui.py

    Lines 6 to 10 in 2b74234

    from matplotlib.backends.backend_tkagg import (
        FigureCanvasTkAgg, NavigationToolbar2Tk)
    # Implement the default Matplotlib key bindings.
    from matplotlib.backend_bases import key_press_handler
    from matplotlib.figure import Figure

    self.plot_bt = tk.Button(self.app, text="Plot", command=self.show_plot)

    self.plot_bt.place(x=440, y=130)

    trajpy/trajpy/gui.py

    Lines 266 to 271 in 2b74234

    def show_plot(self) -> None:
        self._fig = Figure(figsize=(3, 3), dpi=100)
        self._canvas = FigureCanvasTkAgg(self._fig, master=self.app)
        self._fig.add_subplot(111).plot(self.r._t, self.r._r, ls='-.')
        self._canvas.draw()
        self._canvas.get_tk_widget().place(x=200, y=200)

  • in requirements.txt

    matplotlib >= 3.3.2

After removal we should organise the GUI buttons in a better way. We can reduce the window size and increase the text size for improved accessibility.

Migrate from travis-ci to github actions

travis-ci is no longer working because of a change in its policy on free plans. Therefore we should implement a github actions workflow to test, build and deploy.

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Remove sklearn dependency by replacing the linear regression function

Scikit-learn is only used for fitting the mean squared displacements (MSD) in order to obtain the anomalous exponent. This is easily replaced by scipy.stats.linregress (OLS) or by a custom function. Having fewer external dependencies is important for keeping the project maintainable.
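
As a rough sketch of the "custom function" option, the following fits a straight line by ordinary least squares using only NumPy. It assumes that the anomalous exponent is obtained as the slope of log(MSD) against log(time lag), which is the usual power-law fit but is not confirmed here as the exact procedure in trajpy. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_c = x - x.mean()
    slope = np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c)
    return slope, y.mean() - slope * x.mean()

# Synthetic MSD following a power law, MSD = 4 * t**0.7 (illustrative values).
timelag = np.arange(1, 11, dtype=float)
msd = 4.0 * timelag**0.7

# Assumed approach: the exponent is the slope of log(MSD) against log(time lag).
alpha, log_prefactor = ols_slope_intercept(np.log(timelag), np.log(msd))
print(alpha)
```

With SciPy available, scipy.stats.linregress(x, y).slope should give the same slope for the same inputs.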

Implement progress status in the GUI

When processing several trajectories, show the progress status.

Simple: write "n/len(trajectories)" in the text box.
Advanced: implement a progress bar.
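
The simple variant could look like the sketch below. The helper name and the file names are hypothetical, and the GUI side (writing each message into the text box) is left out.

```python
def progress_message(n_done, n_total):
    """Build the 'n/len(trajectories)' text for the status box."""
    return f"{n_done}/{n_total}"

trajectories = ["a.csv", "b.csv", "c.csv"]  # hypothetical file names

# One message per processed trajectory, counted from 1.
messages = [progress_message(i, len(trajectories))
            for i, _ in enumerate(trajectories, start=1)]
print(messages[-1])
```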

Add to the GUI the option to process several files at once

  • Rename the "Open" button to "Open file" and create "Open directory" button
  • Add an argument to action "get_file" that distinguishes if we are opening file or directory
  • Add partial(self.get_file, file_or_dir) to find_bt and to the new button find_dir_bt
  • Write a function to process the csv files in the directory path.
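
The steps above can be sketched without any GUI code. In this illustration get_file is a hypothetical stand-in for the real handler: it receives the path as an argument instead of opening a file dialog, and the "file"/"dir" flag values are assumptions.

```python
from functools import partial
from pathlib import Path

def list_csv_files(directory):
    """Return the CSV files found directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.csv"))

def get_file(file_or_dir, path):
    """Hypothetical handler: return the list of files selected by the user."""
    if file_or_dir == "dir":
        return list_csv_files(path)
    return [Path(path)]

# One callback per button, differing only in the bound first argument.
open_file = partial(get_file, "file")
open_dir = partial(get_file, "dir")
```

Binding the flag with partial lets both buttons share one handler, so the file and directory cases stay in a single place.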

TravisCI pointing to old repo

Just a friendly post-meeting reminder to address this. 😃

  • Change repo URL at TravisCI from phydev/trajpy to ocbe-uio/trajpy
  • Adjust README.md badge accordingly

Implement new parser with `singledispatch` and move type handling from `trajpy.__init__`

Currently the class trajpy accepts either a csv file or a numpy array for initialising the object. However, we can improve this by implementing a parser with functools.singledispatch.

Since trajpy aims to be a general framework for trajectory analysis, it is critical to put more work into the parser so that it supports a broad range of file formats. singledispatch offers an elegant way to implement this.

trajpy/trajpy/trajpy.py

Lines 27 to 35 in 8381bed

if type(trajectory) == str:
    trajectory = np.genfromtxt(trajectory, **params)
if type(trajectory) == np.ndarray:
    self._t, self._r = trajectory[:, 0], trajectory[:, 1:]
elif type(trajectory) == tuple:
    self._t, self._r = np.asarray(trajectory[0]), np.asarray(trajectory[1:])
else:
    raise TypeError('trajectory receives an array or a filename as input.')
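
A minimal sketch of how the same branches could be expressed with functools.singledispatch. The function name parse and its return value (a pair of time and position arrays) are assumptions made for illustration, not the project's actual design.

```python
from functools import singledispatch

import numpy as np

@singledispatch
def parse(trajectory, **params):
    """Fallback for unsupported input types."""
    raise TypeError('trajectory receives an array or a filename as input.')

@parse.register
def _(trajectory: str, **params):
    # A string is treated as a file name and loaded first.
    return parse(np.genfromtxt(trajectory, **params))

@parse.register
def _(trajectory: np.ndarray, **params):
    # First column is time, remaining columns are the spatial components.
    return trajectory[:, 0], trajectory[:, 1:]

@parse.register
def _(trajectory: tuple, **params):
    # First element is time, remaining elements are the spatial components.
    return np.asarray(trajectory[0]), np.asarray(trajectory[1:])
```

Supporting a new input type then only requires registering one more function, without touching the existing branches.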

Deprecate variable `Trajectory()._t` and store the time with the spatial components

When we started coding TrajPy we only thought about processing spatial trajectories like $\vec{r}(t)$ with $\vec{r} = (x,y,z)$. However, we are also interested in providing a tool that can be applied to abstract trajectory spaces, such as blood pressure (BP) over time, where the temporal component should be processed as part of the trajectory.

Therefore, the early implementation for initialising Trajectory(), where time is stored in the variable Trajectory()._t, is not a good design for user experience. We need to deprecate this variable and store all components of the trajectory in Trajectory()._r.

Considerations for the back-end:

  1. If only 2 dimensions are provided, the code will assume that time must be processed together with the abstract dimension $x$
  2. If 3 or more dimensions are provided as input, then the time component should not be used for computing the trajectory features
  3. This is likely to break previous tests and examples, so careful implementation is required.
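
One possible reading of the first two considerations, as a sketch: it assumes the input is an (N, d) array with time in column 0, that "2 dimensions" means the columns (t, x), and that "3 or more" means (t, x, y, ...). The function name is hypothetical.

```python
import numpy as np

def select_components(data):
    """Return the columns to be used for feature computation.

    Assumes `data` has shape (N, d) with time in column 0.
    """
    data = np.asarray(data, dtype=float)
    if data.shape[1] == 2:
        return data        # (t, x): time is processed as part of the trajectory
    return data[:, 1:]     # (t, x, y, ...): time is left out of the features

bp_over_time = select_components([[0, 120], [1, 125]])    # keeps both columns
spatial = select_components([[0, 1, 2], [1, 2, 3]])       # drops the time column
```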

Expand supported file formats

A parser for different file formats is needed, especially for processing csv files that contain several trajectories.

file format   description                              priority
.xyz          file with several atoms and time points  1
.csv          csv file with several trajectories       2
LAMMPS        Molecular dynamics                       2
.pdb          protein data bank                        3

.xyz file

Nparticles [integer]
comment [character]
X Y Z [repeat Nparticles]
[repeat Nframes] 
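
A minimal parser sketch for the layout above, assuming every frame starts with the particle count followed by exactly one comment line. Only the last three fields of each particle line are read, so lines that carry a leading element symbol (as in common .xyz files) are also accepted. The function name and sample text are hypothetical.

```python
import numpy as np

def parse_xyz(text):
    """Parse XYZ-like text into an array of shape (n_frames, n_particles, 3)."""
    lines = text.strip().splitlines()
    frames, i = [], 0
    while i < len(lines):
        n_particles = int(lines[i])                 # particle count for this frame
        block = lines[i + 2 : i + 2 + n_particles]  # skip the comment line
        frames.append([[float(v) for v in row.split()[-3:]] for row in block])
        i += 2 + n_particles
    return np.array(frames)

sample = """2
frame 0
0.0 0.0 0.0
1.0 0.0 0.0
2
frame 1
0.5 0.0 0.0
1.5 0.0 0.0
"""
frames = parse_xyz(sample)
print(frames.shape)
```

The trajectory of particle k is then frames[:, k, :].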

CSV with several trajectories - format definition

The csv should contain 5 columns: time t, 3 spatial (x, y, z) components and the trajectory identifier id.
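
As an illustration of this layout, the sketch below splits such a file into one array per trajectory using NumPy. The column order (t, x, y, z, id), the header line, and the sample values are assumptions based on the description above.

```python
import io
import numpy as np

# Hypothetical file content: two trajectories (id 1 and 2) interleaved in time.
csv_text = """t,x,y,z,id
0,0.0,0.0,0.0,1
0,5.0,5.0,5.0,2
1,1.0,0.0,0.0,1
1,5.0,6.0,5.0,2
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Split the rows into one (t, x, y, z) array per trajectory identifier.
trajectories = {
    int(traj_id): data[data[:, 4] == traj_id, :4]
    for traj_id in np.unique(data[:, 4])
}
print(sorted(trajectories))
```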

LAMMPS data file format

Large-scale Atomic/Molecular Massively Parallel Simulator is a molecular dynamics program from Sandia National Laboratories.

More details about the file format: https://docs.lammps.org/read_data.html

The LAMMPS data dump file format is written in yaml with the following structure:

---
creator: LAMMPS
timestep: 0
units: lj
time: 0
natoms: 3
boundary: [ p, p, p, p, p, p, ]
thermo:
  - keywords: [ Step, Temp, E_pair, E_mol, TotEng, Press, ]
  - data: [ 0, 0, -27093.472213010766, 0, 0, 0, ]
box:
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 0, 0 ]
keywords: [ id, type, x, y, z, vx, vy, vz, ix, iy, iz,  ]
data:
  - [     1 , 1 ,  0.000000e+00 ,  0.000000e+00 ,  0.000000e+00 ,  -1.841579e-01 , -9.710036e-01 , -2.934617e+00 , 0 , 0 , 0, ]
  - [     2 , 1 ,  8.397981e-01 ,  8.397981e-01 ,  0.000000e+00 ,  -1.799591e+00 ,  2.127197e+00 ,  2.298572e+00 , 0 , 0 , 0, ]
  - [     3 , 1 ,  8.397981e-01 ,  0.000000e+00 ,  8.397981e-01 ,  -1.807682e+00 , -9.585130e-01 ,  1.605884e+00 , 0 , 0 , 0, ]
---
timestep: 100
...
---

A parser for this file format is straightforward to write with the yaml.load_all() function.
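
A sketch of such a parser, using yaml.safe_load_all() (the safe-loader counterpart of yaml.load_all() in PyYAML) on a shortened, hypothetical dump. The function name and the returned structure are assumptions, and this is not the implementation of parse_lammps_dump_yaml().

```python
import yaml  # PyYAML

# Shortened, hypothetical dump with two frames and two atoms.
dump_text = """---
time: 0
natoms: 2
keywords: [id, type, x, y, z]
data:
  - [1, 1, 0.0, 0.0, 0.0]
  - [2, 1, 0.5, 0.0, 0.0]
---
time: 100
natoms: 2
keywords: [id, type, x, y, z]
data:
  - [1, 1, 0.1, 0.0, 0.0]
  - [2, 1, 0.6, 0.0, 0.0]
...
"""

def positions_by_atom(text):
    """Collect {atom id: [(time, x, y, z), ...]} from a multi-document YAML dump."""
    tracks = {}
    for frame in yaml.safe_load_all(text):
        if not frame:  # skip empty documents
            continue
        cols = frame["keywords"]
        i_id, i_x, i_y, i_z = (cols.index(k) for k in ("id", "x", "y", "z"))
        for row in frame["data"]:
            tracks.setdefault(row[i_id], []).append(
                (frame["time"], row[i_x], row[i_y], row[i_z]))
    return tracks

tracks = positions_by_atom(dump_text)
```

Looking up the column positions from the keywords entry keeps the parser independent of which per-atom quantities were dumped.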

Protein Data Bank (PDB) format

Standard file format for protein structures. Each file contains several atoms, with different files corresponding to different time steps. A pdb file can contain a snapshot of the system or several trajectories, so we need to process several pdb files at once to extract trajectories.

A possible workflow would be:

  1. Read each pdb file and extract the trajectories per atom
  2. Write a CSV file using the format (t, x, y, z, id), where id is the atom identifier.
  3. Use the CSV file to compute the features using trajpy
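
The first two steps of this workflow could be sketched as follows, assuming one snapshot per pdb file, fixed-column ATOM/HETATM records, and the atom serial number as identifier. The function names are hypothetical, and the time value of each snapshot has to be supplied by the caller.

```python
import csv
import io

def atom_rows(pdb_text, time):
    """Yield (t, x, y, z, id) for every ATOM/HETATM record in one PDB snapshot."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])  # columns 7-11: atom serial number
            # columns 31-38, 39-46 and 47-54 hold the x, y and z coordinates
            x, y, z = (float(line[a:b]) for a, b in ((30, 38), (38, 46), (46, 54)))
            yield time, x, y, z, serial

def snapshots_to_csv(snapshots):
    """Return CSV text with (t, x, y, z, id) rows for (time, pdb_text) pairs."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["t", "x", "y", "z", "id"])
    for time, text in snapshots:
        writer.writerows(atom_rows(text, time))
    return out.getvalue()
```

The resulting text can be written to disk and then processed like any other CSV file that holds several trajectories.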

More information about pdb file format: https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)
