ocbe-uio / trajpy


Trajpy - empowering feature engineering for trajectory analysis across domains.

Home Page: https://ocbe-uio.github.io/trajpy/

License: GNU General Public License v3.0

Python 100.00%

python-package diffusion trajectory-analysis cell-migration repeated-measurements time-series

trajpy's Issues

Deprecation warning raised by scipy on traj_generator.py

./home/runner/work/trajpy/trajpy/trajpy/traj_generator.py:127:
DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future.
Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
y[i_step, i_sample] = sub_y[-1]

Affected code:

trajpy/trajpy/traj_generator.py

Line 127 in d86094c

y[i_step, i_sample] = sub_y[-1]
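
One possible fix, assuming `sub_y[-1]` is a one-element array (the shape of `sub_y` is not shown in this issue, so that is an assumption), is to extract the element explicitly before the assignment. A minimal sketch with stand-in arrays:

```python
import numpy as np

# Stand-ins for the arrays in traj_generator.py; the real shapes are assumed.
y = np.zeros((3, 2))
sub_y = np.array([[0.1], [0.2], [0.3]])  # each row is a one-element array

i_step, i_sample = 0, 1
# .item() turns a one-element array into a Python scalar explicitly, avoiding
# the implicit array-to-scalar conversion deprecated in NumPy 1.25.
y[i_step, i_sample] = sub_y[-1].item()
```

If `sub_y[-1]` can hold more than one element, `.item()` raises a `ValueError`, which makes the shape mismatch visible instead of hiding it.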

Default **params for loading .csv trajectories

Implement default parameters for opening csv files, so that users do not have to pass them on every call.

params = { 'skip_header':1, 'delimiter':',' }
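
A minimal sketch of how such defaults could be merged with user-supplied values. The names `DEFAULT_CSV_PARAMS` and `load_csv` are hypothetical and are not part of trajpy:

```python
import io
import numpy as np

# Defaults proposed in this issue; the constant name is hypothetical.
DEFAULT_CSV_PARAMS = {"skip_header": 1, "delimiter": ","}

def load_csv(source, **params):
    """Read a trajectory with np.genfromtxt, using the defaults unless overridden."""
    merged = {**DEFAULT_CSV_PARAMS, **params}  # user-supplied values win
    return np.genfromtxt(source, **merged)

# Usage with an in-memory file: the header line is skipped by default.
data = load_csv(io.StringIO("t,x\n0,1.0\n1,2.5\n"))
```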

Remove sklearn dependency by replacing the linear regression function

Scikit-learn is only used to fit the mean squared displacement (MSD) and obtain the anomalous exponent. It can easily be replaced by scipy.stats.linregress (OLS) or by a custom function. Fewer external dependencies keep the project easier to maintain.
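
A sketch of the custom-function option, assuming the anomalous exponent is the slope of log(MSD) against log(time lag), i.e. MSD proportional to timelag**alpha. The function name is illustrative and only NumPy is used:

```python
import numpy as np

def anomalous_exponent(msd, timelag):
    """Slope of log(MSD) versus log(timelag) from an ordinary least-squares fit."""
    log_t = np.log(np.asarray(timelag, dtype=float))
    log_msd = np.log(np.asarray(msd, dtype=float))
    slope, _intercept = np.polyfit(log_t, log_msd, deg=1)  # degree-1 OLS fit
    return slope

# Synthetic check: normal diffusion has MSD proportional to the time lag.
timelag = np.arange(1, 11)
alpha = anomalous_exponent(4.0 * timelag, timelag)
```

The same slope could be obtained with `scipy.stats.linregress(log_t, log_msd).slope` if SciPy is kept as a dependency.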

Function diffusivity_ is deprecated in favor of Green-Kubo

This function should be removed:

trajpy/trajpy/trajpy.py

Line 446 in 0de9be8

def diffusivity_(msd_ta, timelag, ndim):

Remember to also remove every reference to the function from the tests and from compute_features().

Implement progress status in the GUI

When processing several trajectories, show the progress status.

Simple: write "n/len(trajectories)" in the text box.
Advanced: implement a progress bar.

Add to the GUI the option to process several files at once

Rename the "Open" button to "Open file" and create "Open directory" button
Add an argument to action "get_file" that distinguishes if we are opening file or directory
Add partial(self.get_file, file_or_dir) to find_bt and to the new button find_dir_bt
Write a function to process the csv files in the directory path.
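
The last step could be sketched as below. The names `list_csv_files` and `process_directory` are hypothetical, and `process_file` stands in for whatever the GUI does with a single file:

```python
from pathlib import Path

def list_csv_files(directory):
    """Return the .csv files directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == ".csv")

def process_directory(directory, process_file):
    """Apply `process_file` to every csv file and collect the results by filename."""
    return {path.name: process_file(path) for path in list_csv_files(directory)}
```

Passing the per-file function as an argument keeps the directory walk independent of the GUI, so it can be tested without opening a window.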

Add dependencies needed for the GUI in requirements.txt

Previously I left the GUI-only dependencies out of the requirements list because trajpy was mainly intended to be used via scripting. However, now that the GUI has matured and is minimally functional, I think it makes sense to deliver the GUI as the main method for using the package.

The packages ttkthemes and Pillow are required to run the GUI and are not listed in requirements.txt.

ttkthemes>=2.4.0
Pillow>=8.1.0

Complete the docs with the synthetic data generator

We should write in the documentation how to generate the synthetic data and refer to the dataset available at https://zenodo.cern.ch/record/3627650 .

Create a new function for diffusivity

The calculation of the diffusion coefficient should be moved to a new @staticmethod (see below).
https://github.com/phydev/trajpy/blob/562415b0f35976c726df7fc8daa773dacca7fb2c/trajpy/trajpy.py#L74

@staticmethod
def diffusivity_(msd, timelag, ndim):
    """
        :param msd: ensemble averaged mean squared displacement
        :param timelag: time-lag
        :param ndim: number of dimensions
        :return diffusivity: short-time diffusion coefficient D
    """
    diffusivity = Trajectory.anomalous_exponent_(msd[:10], timelag[:10]) / (2*ndim)
    return diffusivity

Deprecate variable `Trajectory()._t` and store the time with the spatial components

When we started coding TrajPy we only thought about processing spatial trajectories like $\vec{r}(t)$ with $\vec{r} = (x,y,z)$. However, we are also interested in providing a tool that can be applied to abstract trajectory spaces, such as blood pressure (BP) over time, where the temporal component should be processed as part of the trajectory.

Therefore, the early implementation for initialising Trajectory(), where time is stored in the variable Trajectory()._t, is not a good design for the user experience. We need to deprecate this variable and store all components of the trajectory in Trajectory()._r.

Considerations for the back-end:

If only 2 dimensions are provided, the code will assume that time must be processed together with the abstract dimension $x$
If 3 or more dimensions are provided as input, then the time component should not be used for computing the trajectory features
This is likely to break previous tests and examples, careful implementation is required.

Variables in Portuguese should be translated

Variables in Portuguese in the file trajpy.py should be translated to English: poder, limite, etc...

Documentation required for animals.py

There is not a single comment in this file. Please add at least one small description for each function and comments about the input and output.

trajpy/trajpy/animals.py

Line 5 in 141b617

def displacement(file):

Implement new parser with `singledispatch` and move type handling from `trajpy.init`

Currently the class trajpy accepts either a csv file or a numpy array for initialising the object. However, we can improve this by implementing a parser with functools.singledispatch.

Since trajpy aims to be a general framework for trajectory analysis, it is critical to put more work into the parser so that it supports a broad range of file formats. singledispatch offers an elegant way to implement this.

trajpy/trajpy/trajpy.py

Lines 27 to 35 in 8381bed

if type(trajectory) == str:
    trajectory = np.genfromtxt(trajectory, **params)

if type(trajectory) == np.ndarray:
    self._t, self._r = trajectory[:, 0], trajectory[:, 1:]
elif type(trajectory) == tuple:
    self._t, self._r = np.asarray(trajectory[0]), np.asarray(trajectory[1:])
else:
    raise TypeError('trajectory receives an array or a filename as input.')
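
A minimal sketch of how the same type handling could look with functools.singledispatch. The function name `parse_trajectory` is hypothetical. Each branch of the if/elif chain becomes a registered handler, so support for a new input type can be added without touching the existing ones:

```python
from functools import singledispatch
import numpy as np

@singledispatch
def parse_trajectory(trajectory, **params):
    """Fallback for unsupported input types."""
    raise TypeError('trajectory receives an array or a filename as input.')

@parse_trajectory.register
def _(trajectory: str, **params):
    # A string is treated as a path to a delimited text file.
    return parse_trajectory(np.genfromtxt(trajectory, **params))

@parse_trajectory.register
def _(trajectory: np.ndarray, **params):
    # First column is time, the remaining columns are the trajectory components.
    return trajectory[:, 0], trajectory[:, 1:]

@parse_trajectory.register
def _(trajectory: tuple, **params):
    # First element is time, the remaining elements are the components.
    return np.asarray(trajectory[0]), np.asarray(trajectory[1:])
```

In this sketch the handlers return the pair `(t, r)`, and the constructor would assign the result to `self._t, self._r`.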

Expand supported file formats

A parser for different file formats is needed, especially for processing csv files that contain several trajectories.

file format	description	priority
.xyz	file with several atoms and time points	1
.csv	csv file with several trajectories	2
LAMMPS	Molecular dynamics	2
.pdb	protein data bank	3

.xyz file

Nparticles [integer]
comment [character]
X Y Z [repeat Nparticles]
[repeat Nframes]
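
A sketch of a reader for the layout above, assuming every frame has the same number of particles. Many .xyz files put an element symbol before the coordinates, so the sketch keeps only the last three columns of each row. The function name is hypothetical:

```python
import numpy as np

def read_xyz(text):
    """Parse .xyz text into an array of shape (n_frames, n_particles, 3)."""
    lines = text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():          # tolerate blank lines between frames
            i += 1
            continue
        n = int(lines[i])                 # Nparticles
        rows = lines[i + 2 : i + 2 + n]   # skip the comment line
        # Keep the last three columns so an optional leading symbol is ignored.
        frames.append([[float(v) for v in row.split()[-3:]] for row in rows])
        i += 2 + n
    return np.array(frames)
```

The trajectory of particle `k` is then `read_xyz(text)[:, k, :]`.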

CSV with several trajectories - format definition

The csv should contain 5 columns: time t, 3 spatial (x, y, z) components and the trajectory identifier id.
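
Assuming the columns come in the order (t, x, y, z, id) and the identifiers are integers, the rows could be split into one array per trajectory as in this sketch (the function name is hypothetical):

```python
import numpy as np

def split_by_id(table):
    """Split rows of (t, x, y, z, id) into one (t, x, y, z) array per id."""
    table = np.asarray(table, dtype=float)
    ids = table[:, 4]
    # A boolean mask selects the rows of each trajectory, preserving row order.
    return {int(i): table[ids == i, :4] for i in np.unique(ids)}
```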

LAMMPS data file format

Large-scale Atomic/Molecular Massively Parallel Simulator is a molecular dynamics program from Sandia National Laboratories.

More details about the file format: https://docs.lammps.org/read_data.html

The LAMMPS data dump file format is written in yaml with the following structure:

---
creator: LAMMPS
timestep: 0
units: lj
time: 0
natoms: 3
boundary: [ p, p, p, p, p, p, ]
thermo:
  - keywords: [ Step, Temp, E_pair, E_mol, TotEng, Press, ]
  - data: [ 0, 0, -27093.472213010766, 0, 0, 0, ]
box:
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 0, 0 ]
keywords: [ id, type, x, y, z, vx, vy, vz, ix, iy, iz,  ]
data:
  - [     1 , 1 ,  0.000000e+00 ,  0.000000e+00 ,  0.000000e+00 ,  -1.841579e-01 , -9.710036e-01 , -2.934617e+00 , 0 , 0 , 0, ]
  - [     2 , 1 ,  8.397981e-01 ,  8.397981e-01 ,  0.000000e+00 ,  -1.799591e+00 ,  2.127197e+00 ,  2.298572e+00 , 0 , 0 , 0, ]
  - [     3 , 1 ,  8.397981e-01 ,  0.000000e+00 ,  8.397981e-01 ,  -1.807682e+00 , -9.585130e-01 ,  1.605884e+00 , 0 , 0 , 0, ]
---
timestep: 100
...
---

Writing a parser for this file format is straightforward with the yaml.load_all() function.
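
A minimal sketch of such a parser, assuming PyYAML is installed and every frame lists `id`, `x`, `y` and `z` among its keywords. It uses `yaml.safe_load_all()`, the variant of `yaml.load_all()` that builds only plain Python objects. The function name is hypothetical:

```python
import numpy as np
import yaml  # PyYAML

def read_lammps_yaml(text):
    """Return {atom id: array of (timestep, x, y, z) rows} from a YAML dump."""
    tracks = {}
    for frame in yaml.safe_load_all(text):
        if not frame:                      # skip empty documents
            continue
        # Map each column name to its position in the data rows.
        cols = {name: k for k, name in enumerate(frame["keywords"])}
        for row in frame["data"]:
            atom_id = row[cols["id"]]
            point = [frame["timestep"]] + [row[cols[c]] for c in ("x", "y", "z")]
            tracks.setdefault(atom_id, []).append(point)
    return {k: np.array(v, dtype=float) for k, v in tracks.items()}
```

Each value is a per-atom trajectory with the timestep in the first column.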

Protein Data Bank (PDB) format

PDB is the standard file format for protein structures. Each file contains the coordinates of several atoms, and different files can correspond to different time steps. A pdb file can hold a single snapshot of the system or several trajectories, so we need to process several pdb files at once to extract trajectories.

A possible workflow would be:

Read each pdb file and extract the trajectories per atom
Write a CSV file using the format (t, x, y, z, id), where id is the atom identifier.
Use the CSV file to compute the features using trajpy
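
The first two steps could be sketched as follows, assuming one snapshot per pdb file and a time value supplied by the caller (for instance the file index). Coordinates are read from the fixed-width columns of ATOM/HETATM records, and the atom serial number is used as the identifier. Both function names are hypothetical:

```python
def pdb_atoms(text):
    """Yield (serial, x, y, z) for each ATOM/HETATM record."""
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width fields: serial in columns 7-11, x/y/z in columns 31-54.
            yield (int(line[6:11]), float(line[30:38]),
                   float(line[38:46]), float(line[46:54]))

def frames_to_rows(frames):
    """Turn [(t, pdb_text), ...] into rows (t, x, y, z, id)."""
    return [(t, x, y, z, serial)
            for t, text in frames
            for serial, x, y, z in pdb_atoms(text)]
```

The resulting rows can be written with `csv.writer(...).writerows(rows)` from the standard library.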

More information about pdb file format: https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)

Remove plot feature from the GUI to reduce dependencies

Currently we have a plotting feature in the GUI. This feature is unnecessary and increases the number of dependencies.

Code that should be removed:

in trajpy/gui.py

trajpy/trajpy/gui.py

Lines 6 to 10 in 2b74234

from matplotlib.backends.backend_tkagg import (
    FigureCanvasTkAgg, NavigationToolbar2Tk)
# Implement the default Matplotlib key bindings.
from matplotlib.backend_bases import key_press_handler
from matplotlib.figure import Figure

trajpy/trajpy/gui.py

Line 34 in 2b74234

self.plot_bt = tk.Button(self.app, text="Plot", command=self.show_plot)

trajpy/trajpy/gui.py

Line 110 in 2b74234

self.plot_bt.place(x=440, y=130)

trajpy/trajpy/gui.py

Lines 266 to 271 in 2b74234

def show_plot(self) -> None:
    self._fig = Figure(figsize=(3, 3), dpi=100)
    self._canvas = FigureCanvasTkAgg(self._fig, master=self.app)
    self._fig.add_subplot(111).plot(self.r._t, self.r._r, ls='-.')
    self._canvas.draw()
    self._canvas.get_tk_widget().place(x=200, y=200)

in requirements.txt

trajpy/requirements.txt

Line 3 in 2b74234

matplotlib >= 3.3.2

After the removal we should organise the GUI buttons in a better way. We can reduce the window size and increase the text size for improved accessibility.

Documentation needed for supported file formats

Write documentation about the supported file formats.

Write documentation for using the GUI

Add how to open the GUI:

python3 -m trajpy.gui

Make a screen recording tutorial.

anisotropy and asymmetry are not being computed correctly

The eigenvalues must be sorted in descending order (eigen[0] > eigen[1] > eigen[2]), otherwise the values for anisotropy and asymmetry are not computed correctly.
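
A sketch of one way to enforce the ordering, assuming the eigenvalues come from a symmetric matrix such as a gyration tensor (the issue does not show the code that computes them). The function name is hypothetical:

```python
import numpy as np

def sorted_eigenvalues(matrix):
    """Eigenvalues of a symmetric matrix in descending order."""
    # np.linalg.eigvalsh returns the eigenvalues in ascending order,
    # so reversing the result gives eigen[0] >= eigen[1] >= eigen[2].
    return np.linalg.eigvalsh(matrix)[::-1]
```

For eigenvalues computed some other way, `np.sort(eigen)[::-1]` gives the same ordering.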

TravisCI pointing to old repo

Just a friendly post-meeting reminder to address this. 😃

Change repo URL at TravisCI from phydev/trajpy to ocbe-uio/trajpy
Adjust README.md badge accordingly

compute_features() with selected features

Rewrite the function compute_features() with the possibility to pass a list of features to be computed.

def compute_features(self, features_list):
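
A possible shape for this, sketched as a free function with a hypothetical registry that maps feature names to functions. The two features shown are placeholders and not trajpy's actual feature set:

```python
import numpy as np

# Hypothetical registry: feature name -> function of the positions array.
FEATURES = {
    "mean_position": lambda r: r.mean(axis=0),
    "path_length": lambda r: np.linalg.norm(np.diff(r, axis=0), axis=1).sum(),
}

def compute_features(r, features_list=None):
    """Compute only the requested features; all of them when the list is None."""
    names = list(FEATURES) if features_list is None else features_list
    unknown = [n for n in names if n not in FEATURES]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    return {n: FEATURES[n](r) for n in names}
```

Defaulting `features_list` to None keeps existing calls without arguments working as before.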

Migrate from travis-ci to github actions

travis-ci is not working because of a policy change affecting free plans. Therefore we should implement a github actions workflow to test, build and deploy.

Add velocity descriptors and FFT features to the GUI

The following new features need to be included in the GUI:

velocity
velocity description
- mean, median, mode, standard deviation, variance, range, kurtosis, skewness
frequency spectrum
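
A sketch of how most of these descriptors could be computed with NumPy alone, assuming the velocity is estimated by finite differences and summarised through its magnitude. The mode is left out because it is not well defined for continuous values without binning. The function name is hypothetical:

```python
import numpy as np

def velocity_descriptors(t, r):
    """Summary statistics of the speed estimated by finite differences."""
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)          # shape (n_points, n_dimensions)
    speed = np.linalg.norm(np.diff(r, axis=0), axis=1) / np.diff(t)
    centered = speed - speed.mean()
    std = speed.std()                       # a constant speed gives std == 0
    return {
        "mean": speed.mean(),
        "median": np.median(speed),
        "standard_deviation": std,
        "variance": speed.var(),
        "range": speed.max() - speed.min(),
        "skewness": (centered ** 3).mean() / std ** 3,
        "kurtosis": (centered ** 4).mean() / std ** 4 - 3.0,  # excess kurtosis
    }
```

Skewness and kurtosis are undefined when the speed is constant, so the caller should handle that case separately.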

We should start using black.

New tests required for object tracking functions

Several new functions were implemented for object tracking and analysis of live animal trajectories.

We need to write unit tests for these functions.

