jeschkelab / deerlab Goto Github PK

View Code? Open in Web Editor NEW

14.0 5.0 10.0 97.46 MB

Python package for data analysis for dipolar EPR spectroscopy

Home Page: https://jeschkelab.github.io/DeerLab

License: MIT License

Python 99.30% Makefile 0.09% Batchfile 0.13% TeX 0.01% HTML 0.47%

deer peldor epr esr data-analysis python

deerlab's Introduction

DeerLab

About

DeerLab is a comprehensive free scientific software package for Python focused on modeling, penalized least-squares regression, and uncertainty quantification. It provides highly specialized on the analysis of dipolar EPR (electron paramagnetic resonance) spectroscopy data. Dipolar EPR spectroscopy techniques include DEER (double electron-electron resonance), RIDME (relaxation-induced dipolar modulation enhancement), and others.

The documentation can be found here.

The early versions of DeerLab (up to version 0.9.2) are written in MATLAB. The old MATLAB codebase is archived and can be found here.

Requirements

DeerLab is available for Windows, Mac and Linux systems and requires Python 3.8, 3.9, 3.10, or 3.11.

All additional dependencies are automatically downloaded and installed during the setup.

Setup

A pre-built distribution can be installed from the PyPI repository using pip.

From a terminal (preferably with admin privileges) use the following command to install from PyPI:

python -m pip install deerlab

More details on the installation and updating of DeerLab can be found here.

Citing DeerLab

When you use DeerLab in your work, please cite the following publication:

DeerLab: a comprehensive software package for analyzing dipolar electron paramagnetic resonance spectroscopy data
Luis Fábregas Ibáñez, Gunnar Jeschke, Stefan Stoll
Magn. Reson., 1, 209–224, 2020
doi.org/10.5194/mr-1-209-2020

Here is the citation in bibtex format:

@article{FabregasIbanez2020_DeerLab,
  title = {{DeerLab}: a comprehensive software package for analyzing dipolar electron paramagnetic resonance spectroscopy data},
  author = {Fábregas Ibáñez, Luis and Jeschke, Gunnar and Stoll, Stefan},
  journal = {Magnetic Resonance},
  year = {2020},
  volume = {1},
  number = {2},
  pages = {209--224},
  doi = {10.5194/mr-1-209-2020}
}

License

DeerLab is licensed under the MIT License.

deerlab's People

Contributors

Stargazers

Watchers

Forkers

erichustedt yevpol laenan8466 thcasey3 zwbond mtessmer edmundxcvi hkaras luisfabib sergeqzin mirror-dump

deerlab's Issues

Documentation violates PEP8 guidelines

In the documentation, and especially in the examples, wildcard imports are used for the deerlab module:

from deerlab import *

This is considered a bad practice as stated in the PEP8 guidelines. The documentation should use the alias dl. for all DeerLab functions as is common practice.

import deerlab as dl

pip installation fails in a least-privilege system

When trying to install DeerLab via

pip install --user deerlab

on a cluster with only user-level privileges.

The setup.py script crashes when trying to install all other dependencies in the installation script:

# Custom OS-specific installation script
def install_dependencies():
#-----------------------------------------------------------------------
    subprocess.run(['pip','install','memoization'],check=True)
    subprocess.run(['pip','install','matplotlib'],check=True)

    # Dependencies with OS-specific BLAS
    if platform.system() == 'Windows':  
        # Install Numpy,SciPy, CVXopt linked to MKL
        subprocess.run(['pip','install','pipwin'],check=True)
        subprocess.run(['pipwin','install','numpy'],check=False)
        subprocess.run(['pipwin','install','scipy'],check=False)
        subprocess.run(['pipwin','install','cvxopt'],check=False)
    else:
        # Install Numpy,SciPy, CVXopt linked to OpenBLAS
        subprocess.run(['pip','install','numpy'],check=True)
        subprocess.run(['pip','install','scipy'],check=True)
        subprocess.run(['pip','install','cvxopt'],check=True)

    subprocess.run(['pip','install','numdifftools'],check=True)
    subprocess.run(['pip','install','tqdm'],check=True)
    subprocess.run(['pip','install','joblib'],check=True)
#-----------------------------------------------------------------------

Since pip is called without the --user flag, the installation fails due to the lack of privilege to install dependencies system-wide.

The easiest fix is to install DeerLab dependencies only at user-level not system-wide, e.g.

subprocess.run(['pip','--user','install','memoization'],check=True)

but maybe there is a more elegant way to do this.

Fitsignal does not adjust default parameters to accommodate bounds.

DeerLab will crash when using fitsignal with bounded parameters that exclude the default par0 parameter, requiring the user to indicate a par0. Perhaps the program should automatically choose a par0 within the range when a user defined par0 is not provided.

import deerlab as dl
import numpy as np
from scipy.stats import norm

r = np.linspace(1.5, 8, 256)
t = np.linspace(-0.1, 4.5, 256)
lam, param = 0.4, 80

K = dl.dipolarkernel(t, r)
P = norm(3.5, 0.2).pdf(r)
V = K@P
B = dl.bg_hom3d(t, param, lam)

Vexp = (1 - lam + lam * V) * B + np.random.normal(0, 0.005, len(B))

fit = dl.fitsignal(V, t, r, bg_lb=70)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-5fb2330e3f87> in <module>
     17 
     18 
---> 19 fit = dl.fitsignal(V, t, r, bg_lb=70)

e:\07)programming\python\deerlab\deerlab\fitsignal.py in fitsignal(Vexp, t, r, dd_model, bg_model, ex_model, dd_par0, bg_par0, ex_par0, verbose, dd_lb, bg_lb, ex_lb, dd_ub, bg_ub, ex_ub, weights, uqanalysis, regparam, regtype)
    279     lb = _parcombine(lb,[lower_dd,lower_bg,lower_ex])
    280     ub = _parcombine(ub,[upper_dd,upper_bg,upper_ex])
--> 281     _checkbounds(lb,ub,par0)
    282 
    283     # Build index vectors for accessing parameter subsets

e:\07)programming\python\deerlab\deerlab\fitsignal.py in _checkbounds(lb, ub, par0)
    699         raise ValueError('Lower bounds cannot be larger than upper bounds.')
    700     if np.any(lb>par0) or np.any(par0>ub):
--> 701         raise ValueError('Inital values for parameters must lie between lower and upper bounds.')
    702 # ==============================================================================
    703 

ValueError: Inital values for parameters must lie between lower and upper bounds.

Provide DeerLab through Anaconda Repo

Hey,
first congrats moving to python. 😍

From my experience, many researchers use Anaconda (https://www.anaconda.com/) to organize their python packages. So it could be helpful to provide DeerLab via Anaconda as well.
https://docs.anaconda.com/anaconda-cloud/user-guide/getting-started/

If not an option for you feel free to close this issue, I would appreciate consideration, as it would simplify the workflow, especially for python-n00bs. 😉

Cheers,
Dennis

[BUG] DeerLoad import time Broken

Describe the bug

DeerLoad import imports time trace as single arrays per timestep. This cannot be used in all further functions.

Environment

DeerLab Version: 0.10.0
Python Version: 3.7
OS: Windows

How to reproduce it?

Steps to reproduce the behavior:

Change experimental dataset to one you know in 'file'
Run script.

Expected behavior

Import as single numpy array. Further improvement would be using pandas for all data.

If I can spare the time I will look into the source lateron to propose a fix, but expect nothing.

Code Snippet

import numpy as np
import matplotlib.pyplot as plt
import deerlab as dl

file = r'dataset_401528.DSC'

[t_raw,V_raw,pars] = dl.deerload(file)

[V_real, V_imag, phase, imoffset] = dl.correctphase(V_raw,full_output=True)

tcorr = dl.correctzerotime(V_real,t_raw)

Files

I used the ring test data form Gunnar, should be same error for other experimental data.

Remove pytest dependency for deployed package

In setup.py

[BUG] fitsignal not providing uncertainty analysis with dipolar evolution functions

Describe the bug

When calling fitsignal wihout ex and bg models (fitting a dipolar evolution function), the signal is fitted with fitregmodel and no uncertainty analysis is done for the dipolar signal Vfit. This results in a crash when trying to display.

Environment

DeerLab Version: 0.10.dev
Python Version: 3.6.8
OS: Windows

Code Snippet

The input

fit = fitsignal([V1,V2],[t1,t2],r,'P',None,None,uqanalysis=True,display=True)

Error traceback:

d:\lufa\projects\deerlab\deerlab\deerlab\fitsignal.py in fitsignal(Vexp, t, r, dd_model, bg_model, ex_model, par0, lb, ub, weights, uqanalysis, display)
    556     # --------
    557     if DisplayResults:
--> 558         _display_results()
    559 
    560     # Return numeric arrays and not lists if there is only one signal

d:\lufa\projects\deerlab\deerlab\deerlab\fitsignal.py in _display_results()
    493         for i in range(nSignals):
    494             # Get confidence intervals for the signal
--> 495             Vci95 = Vfit_uq[i].ci(95)
    496             Vci50 = Vfit_uq[i].ci(50)
    497             # Plot the signal

AttributeError: 'NoneType' object has no attribute 'ci'

svd crash during HCCM Jacobian inversion

Hi, while fitting a series of data, one dataset somehow seemed to be problematic, and I got the error report that one sub-routine in fitsignal did not converge (see below):

What can cause this? Is there anything I can do to 'help' the fit algorithm?
Would be nice that even when this error is caught somehow, such that the fit of the series continues anyways with the remaining data in the set (otherwise I have to figure out where it crashed, remove and restart manually).

python errormessage

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
     95 
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")
     98 
     99 def _raise_linalgerror_lstsq(err, flag):

LinAlgError: SVD did not converge

Remove old MATLAB releases and tags

All tagged releases 0.9.2 and prior should be removed, including the associated assets, since they are MATLAB-based and are available in the archived MATLAB repo.

Simplify parameter boundaries interface for fisignal

Another remnant of the old MATLAB interface. In fitsignal the lower/upper boundaries of the different parameter subsets are specified as a single keyword

fitsignal(__, lb = [lb_dd,lb_bg,lb_ex], ub = [ub_dd,ub_bg,ub_ex])

Since one seldom needs to re-define all boundaries for all subsets, it would make sense to simplify this to multiple keywords

fitsignal(__, lb_dd = lb_dd, lb_bg = lb_bg lb_ex = lb_ex)

this way a quick assignment can be done without needing to worry about the order of the parameter subsets.

FitResult.plot() cannot be modified after plotting

A small change is applied to fitsignal.py: Remove/Comment line 611:
python # plt.show()
This is important so that axs can be manipulated afterwards. I would recommend to skip the plt.show() command anyway, or manipulation of the returned axis isn't possible (at least for me). In my IDE (Spyder) plots are shown after running a script/command by default.

Originally posted by @laenan8466 in #74 (comment)

Clean up issue templates for bug reports

Right now, there are two slightly different issue templates for bug reports. Consolidate into one.

(custom) ex_models

Dear DeerLab developers, here goes my first official issue (finally now via git, sorry again, Luis, for annoying you previously via other channels):

I think future users might be happy if the documentation on the usage of the ex_models would be extended a bit more. In particular, I was writing a model for a high-spin RIDME experiment, and it took me a while (admittedly being rather new to python) to figure out what the main requirements of an experiment are, in terms of what a newly written experiment must fulfil, e.g. whether it is possible to pass parameters to the experiment (I know now that it is not), how the program knows how many arguments are required, ... I realise this will probably not be a heavy-use functionality by most future users, but it is a very unique feature of DeerLab to be able to define completely custom pathways during experiments, so it might be worth documenting well :)

Feature request (?): continuing on this, it would actually be great if constant parameters could be passed on to the ex_models when they are being used in fitsignal. Otherwise even quite simple modification of the Deer-Kernel, like adjusting for the g-values, or including (fixed-value) harmonics need to be hard-coded (annoying and error-prone), or fitted via a custom routine (which I find less intuitive to use).

P.S. sorry, while I am familiar with the concept and general principles of version control via github, but I am shamefully unexperienced with actually using it, especially in collaborations, so please don't hesitate to point out when I'm doing something wrong

Cheers, Irina

Documentation: optional vs required parameters of functions

Would it be possible to distinguish better in the documentation between optional input of a DeerLab fuction, and strictly required input? For example, different formatting in the call description at the top of the documentation page, or similar might be helpful. Currently, the only clue is (from what I understood) that the parameters, which are listed below in 'Parameters' are all strictly required, whereas the stuff listed in 'Other Parameters' is optional, right? (I think this is not stated anywhere explicitly.)

I noticed also that there seem to be some cases, where parameters are only conditionally optional. For example, when fitting multiple datasets with fitparamodel, the parameter weights is suddenly required to be given with the correct length (even if all are 1), otherwise (at least in my hands) it crahsed. This should maybe be explained in the documentation as well.

Unconstistent parameter definitions in fitparamodel

When running the following script an error is prompted requesting the initial values of the parameters.

import numpy as np 
import deerlab as dl 

x = np.linspace(-5,5,500)
ymodel = lambda p: p[0] + p[1]*x + p[2]*x**2

ptrue = [10000,0.05,5]
y = ymodel(ptrue) + dl.whitegaussnoise(x,0.01)

fit = dl.fitparamodel(y,ymodel,rescale=False)

fit.plot()
print(fit.param)

In order for it to run it requires for example

`fit = dl.fitparamodel(y,ymodel,par0=[1000,0.5,0.5],rescale=False)

The issue is that par0 is listed as a keyword argument in the function definition making it optional by Python standards. While in the documentation it is stated as required, the lb and ub arguments are listed as required and the script above shows that they are optional (as should be).

Hence the documentation of fitparamodel needs to be fixed as well as the function definition and make par0 a positional argument rather than a keyword argument.

deerload parameters

With 0.10, I was able to use deerload with:

t, V, parms = dl.deerload('20200828_41R1_211R1_apoMBP_4PDEER_50K_PhSUM.DTA')

In version 0.11, I get this error unless I get rid of parms:
ValueError: not enough values to unpack (expected 3, got 2)

Documentation example bug `fitparamodel`

I think the fitparamodel example in the documentation has a bug:

In all examples the 'Vmodel' fuction at the point where the actual fit is calculated has an invalid call to np resp. newaxis (see bold below). I'm not sure what the actual code is supposed to be, I am guessing this ensures that the background has the same dimensions as the rest, so maybe np.newaxis?? However, if I use the latter instead, I still get (a different) errormessage (see below Vmodel):

example code from fitparamodeldocumentation
def Vmodel(par):
# Extract parameters
rmean,fwhm,lam,conc = par
B = bg_hom3d(t,conc)
P = dd_gauss(r,[rmean,fwhm])
V = (1-lam + lam*K0@P)*B[:,np,newaxis]
return V

lowest level error code, when changing np,newaxisto np.newaxis in the example above
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/deerlab/fitparamodel.py in lsqresiduals(p)
161
162 # Check if there are invalid values...
--> 163 if any(np.isnan(Vsim)) or any(np.isinf(Vsim)):
164 res = np.zeros_like(Vsim) # ...can happen when Jacobian is evaluated outside of bounds
165 return res

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

bug with scales after 'fitsignal'

Hi, I upgraded to DeerLab 0.12.0, and (as it mentions in the release notes) there are changes in how the fit output is now saved in terms of model scaling. However, the release mode don't say, how the final output is now scaled exactly (would be nice, because I had to adjust all my plot files accordingly...). And I think there is also a bug, because the scaling is differnet, when I used `fitsignal' with free distribtution or with model fit
In oder to get all lines on top of each other I need to scale differnt outputs differently, e.g.:

plt.plot(t,V/deerfit[i].scale,'.b')#/deerfit[i].scale
plt.plot(t,deerfit[i].V,color='r') # free distribution
plt.plot(t,deerfit_gauss[i].V/deerfit[i].scale,color='k')# gauss model

Separate plotting from fitsignal

Currently, fitsignal contains a function for plotting the fit result, which can be called via the display keyword.

I think it should be possible to put a fit result again (for example, after loading it from a stored file) without having to rerun the fit.

This means the API should provide a separate plotting function.

Scale invariability of pathway amplitudes and redundancy with V0

In the design of DeerLab seems to be some redundancy and invariability issues related to how the dipolar pathway amplitudes are defined and handled.

Scale invariability of dipolar pathways

In experiment models (except ex_4pdeer) where the unmodulated contribution is parametrized (Lam0), the absolute values of the pathway amplitudes result in a scale-invariant response in DeerLab. For example the simulations for V1 and V2 here

    V1= dl.dipolarkernel(t,r,dl.ex_5pdeer([0.8, 0.4, 0.2, 4]))@P
    V2= dl.dipolarkernel(t,r,dl.ex_5pdeer([0.4, 0.2, 0.1, 4]))@P
    np.array_equal(V1,V2)
    >> True

will result in exactly the same signal. The reason behind this is the default behavior of dipolarkernel, which normalizes the dipolar kernel after calculations (controlled by the option renormalize). BTW there is an additional option called renormpaths which has no effect on the output and will be deprecated in a future PR).

As expected by setting renormalize=False the scale-invariability of the dipolar pathways is lifted

    V1= dl.dipolarkernel(t,r,dl.ex_5pdeer([0.8, 0.4, 0.2, 4]),renormalize=False)@P
    V2= dl.dipolarkernel(t,r,dl.ex_5pdeer([0.4, 0.2, 0.1, 4]),renormalize=False)@P
    np.array_equal(V1,V2)
    >> False

Redundancy with V0

Setting renormalize=False reveals a second issue: the absolute values of the pathway amplitudes and the dipolar signals scale 'V0' become redundant. For example the simulations for V1 and V2 here

    V0 = 1
    V1= V0*dl.dipolarkernel(t,r,dl.ex_5pdeer([0.8, 0.4, 0.2, 4]),renormalize=False)@P
    V0 = 2
    V2= V0*dl.dipolarkernel(t,r,dl.ex_5pdeer([0.4, 0.2, 0.1, 4]),renormalize=False)@P
    np.array_equal(V1,V2)
    >> True

the signals are again equal. Only by reinstating the kernel normalization can these be distinguished again.

Summary

Normalization of the dipolar kernel (i.e. enforcing K(0,r)=1) is required for V0 in the separation V(t)=V0*Vintra(t)*Vinter(t) to be distinguishable.
Under normalization, pathway amplitudes in models of the form V0*(Lam0 + sum_i lam_i*Vintra_i) exhibit scale invariability.

The question now is: what is the optimal and correct way to treat this?
On one hand, pathway amplitudes are probabilistic quantities so it would make sense to think of them as being confined to a range [0,1]. On the other hand, we could imagine them relative contributions to V0, i.e. such that the sum over all dipolar pathways equals V0. This second interpretation might be more efficient due to the fact that the ex_4pdeer does not suffer from the issues described here, as the contributions are also define in a relative way.

This requires further discussion.

Here is a full working example with a graphical representation of these issues:

import deerlab as dl
import numpy as np 
import matplotlib.pyplot as plt 

t = np.linspace(-0.1,7,200)
r = np.linspace(2,6,200)
P = dl.dd_gauss(r,[3,0.2])

def V5pdeer_default(lam):
    K = dl.dipolarkernel(t,r,dl.ex_5pdeer([lam[0],lam[1],lam[2],4.23]),lambda lam:dl.bg_hom3d(t,50,lam),renormpaths=False)
    return K@P

def V5pdeer(lam):
    K = dl.dipolarkernel(t,r,dl.ex_5pdeer([lam[0],lam[1],lam[2],4.23]),lambda lam:dl.bg_hom3d(t,50,lam),renormalize=False,renormpaths=False)
    return K@P

plt.figure(figsize=[7,6])
plt.subplot(221)
plt.plot(t,V5pdeer_default([0.8,0.6,0.4]),'-',linewidth=2)
plt.plot(t,V5pdeer_default([0.4,0.3,0.2]),'--',linewidth=2)
plt.xlabel('t [us]')
plt.ylabel('V(t)')
plt.title('$V_0$=1, Kernel normalized')
plt.legend(['[$\Lambda_0$,$\lambda_1$,$\lambda_2$]','$\\frac{1}{2}$[$\Lambda_0$,$\lambda_1$,$\lambda_2$]'])

plt.subplot(222)
plt.plot(t,V5pdeer([0.8,0.6,0.4]),'-',linewidth=2)
plt.plot(t,V5pdeer([0.4,0.3,0.2]),'--',linewidth=2)
plt.xlabel('t [us]')
plt.ylabel('V(t)')
plt.title('$V_0$=1, Kernel not normalized')


plt.subplot(223)
V0 = 2
plt.plot(t,V5pdeer_default([0.8,0.6,0.4]),'-',linewidth=2)
plt.plot(t,V0*V5pdeer_default([0.4,0.3,0.2]),'--',linewidth=2)
plt.xlabel('t [us]')
plt.ylabel('V(t)')
plt.title('$V_0$=2, Kernel normalized')

plt.subplot(224)
V0 = 2
plt.plot(t,V5pdeer([0.8,0.6,0.4]),'-',linewidth=2)
plt.plot(t,V0*V5pdeer([0.4,0.3,0.2]),'--',linewidth=2)
plt.xlabel('t [us]')
plt.ylabel('V(t)')
plt.title('$V_0$=2, Kernel not normalized')

plt.tight_layout()
plt.show()

Simplify the pathinfo interface

In dipolarkernel the dipolar pathways are specified as a list of amplitude, refocusing time, and harmonic parameters:

pathinfo = [[],[],[]]
pathinfo[0] = [lam0,   nan, nan]    # unmodulated part, gives offset
pathinfo[1] = [lama,   0,   1]      # main modulation, refocusing at time zero
pathinfo[2] = [lam21, tau2, 1]      # 2+1 modulation, refocusing at time tau2

for the unmodulated pathway the refocusing time and harmonic must be specified as nan-values. This is a leftover from the MATLAB interface. Since the Python interface is list-based, the unmodulated pathway could be specified as a list with just the amplitude:

pathinfo = [[],[],[]]
pathinfo[0] = [lam0]             # unmodulated part, gives offset
pathinfo[1] = [lama,   0,   1]   # main modulation, refocusing at time zero
pathinfo[2] = [lam21, tau2, 1]   # 2+1 modulation, refocusing at time tau2

Wrong fit parameter for background-only fit

The following script fits a background-only signal with fitsignal. The fit is succesfull, and fit.B looks correct. But fit.bgparam is wrong.

#%%
import deerlab as dl
import numpy as np
import matplotlib.pyplot as plt
#%%
t = np.linspace(0,5,100)
r = np.linspace(2,6,200)
k = 0.2
B = dl.bg_exp(t,k)

fit = dl.fitsignal(B,t,r,None,dl.bg_exp,dl.ex_4pdeer,uqanalysis=False)

B2 = dl.bg_exp(t,fit.bgparam)

# %%
plt.plot(t,B,'.',label='data (bgparam = {})'.format(k))
plt.plot(t,fit.B,label='fit.B')
plt.plot(t,B2,label='fit.bgparam={}'.format(fit.bgparam))
plt.legend()

Parallelize bootan

bootan is slow. It should be easy to get a significant speedup by parallelizing the loop over iSample.

deerload should return time in microseconds

Currently, deerload returns the time axis in nanoseconds.

It should return it in microseconds, since all other DeerLab functions use microseconds. (This was first noted and suggested in #14).

FitResult.cost returns half the expected value of the objective function

The FitResult.cost attribute of the fit result objects should be returning the value of the objective function:

cost = sum(residual**2)

which right now is taken directly from the output of the different least-squares solvers in Scipy. However, these return half of the value. For example the least_squares function returns (as per the documentation):

cost = 1/2*sum(residual**2)

This must be corrected and made consistent throughout DeerLab.

[CONSISTENCY] Rice distributions still use sigma

Whereas all other parametrized distance distributions use full width at half maximum (FWHM) as a width parameter, Rice distributions still use the standard deviation of their Gaussian factor.

Fitting a background-only trace with `fitsignal` fails if scale is not 1

The following script shows that fitsignal is not able to fit a background-only signal with scale different from 1. This should work.

import numpy as np
import matplotlib.pyplot as plt
import deerlab as dl

t = np.linspace(0,5,100)
r = np.linspace(2,6,200)
B = 42*dl.bg_exp(t,0.2)

fit = dl.fitsignal(B,t,r,None,dl.bg_exp,dl.ex_4pdeer,uqanalysis=False)

plt.plot(t,B,t,fit.B)

Plot method of FitResult not working properly in MacOS

The plot() method of the FitResult object seems to not display properly on MacOS systems as reported by a user.

The following MWE:

filepath = ‘/xx/.DTA'
t,V = dl.deerload(filepath)
V = dl.correctphase(V)
t = dl.correctzerotime(V,t)
r = np.linspace(1.5,7,len(t))
 
fit = dl.fitsignal(V,t,r,'P',dl.bg_hom3d,dl.ex_4pdeer, verbose=True)
fit.plot()

seems to result in only the fit being shown, but not the original data and confidence intervals.

set parameter search resolution in `fitparamodel`

Wishlist entry :)

Would be nice to have an option to set the resolution of individual parameters in a parametric model fit with fitparamodel.
Should either be in percent of the upper bound value, and/or individually set for each entry in parameter varible (since they might have comepletly different ranges, e.g. dimensionality, and pathway contribution).

Provide distance mean etc

Currently, the user has to explicitly calculate the mean distance via r_mean = np.sum(r*fit1.P*(r[1]-r[0])) (or something more sophisticated). Same for the distance mode and median.

It would be quite useful if a DeerLab fit can provide basic statistics about the distance, such as the mean and modal distance, including uncertainty estimates for all these quantities.

This could be in the form of a function that takes r and P as input and returns these values, including uncertainties. Or it could be integrated into the fit functions.

correctzerotime crashes for positive-only times

When trying to process a time-axis where the first element is already the zero-time, the function crashes.

regparamrange crashes due to gsvd

The regparamrange employs the deerlab.utils.gsvd function which under certain circumstances leads to a crash. Until now crashes have been caused:

Numerical errors internally of Numpy:

~\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
   2704     """
   2705     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
-> 2706                           keepdims=keepdims, initial=initial, where=where)
   2707 
   2708 

~\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85                 return reduction(axis=axis, out=out, **passkwargs)
     86 
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 
     89 

ValueError: zero-size array to reduction operation maximum which has no identity

Invalid values being sent to specific memory adresses in Scipy's Fortran subroutines:

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/scipy/linalg/decomp_qr.py:140: in qr
    qr, tau = safecall(geqrf, "geqrf", a1, lwork=lwork,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

f = <fortran object>, name = 'geqrf'
args = (array([], shape=(100, 0), dtype=float64),)
kwargs = {'lwork': 0, 'overwrite_a': False}, lwork = None
ret = (array([], shape=(100, 0), dtype=float64), array([], dtype=float64), array([0.]), -7)

    def safecall(f, name, *args, **kwargs):
        """Call a LAPACK routine, determining lwork automatically and handling
        error return values"""
        lwork = kwargs.get("lwork", None)
        if lwork in (None, -1):
            kwargs['lwork'] = -1
            ret = f(*args, **kwargs)
            kwargs['lwork'] = ret[-2][0].real.astype(numpy.int_)
        ret = f(*args, **kwargs)
        if ret[-1] < 0:
>           raise ValueError("illegal value in %dth argument of internal %s"
                             % (-ret[-1], name))
E           ValueError: illegal value in 7th argument of internal geqrf

Thus far these errors seem to depend on the OS, the precision of the hardware and the BLAS linked to Numpy.

Since neither numpy nor scipy have a working gsvd algorithm (that I could find), there are two options:

Invest time into debugging and fixing deerlab.utils.gsvd
Deprecate regparamrange. The function has lost some utility now that the regularization parameter is optimized via Golden-search algorithms rather than grid searches.

For now the best approach is to just patch regparamrange and catch the exception when running gsvd(K,L). Once caught we can just run svd(K) as an approximating alternative.

Bug: FitResult.plot doesn't scale all data.

The very basic example was used and only the Vexp was multiplied by 1000 to imitate experimental data. This results in a wrong plot (fit is in height of Vexp and datapoints are scaled to one).
This could be connected to #74 .

Example code:

import numpy as np
import matplotlib.pyplot as plt
import deerlab as dl

t = np.linspace(-0.1,4,250)        # time axis, µs
r = np.linspace(1.5,6,len(t))      # distance axis, ns
param = [3, 0.1, 0.2, 3.5, 0.1, 0.65, 3.8, 0.05, 0.15] # parameters for three-Gaussian model
P = dl.dd_gauss3(r,param)          # model distance distribution
lam = 0.5                          # modulation depth
B = dl.bg_hom3d(t,300,lam)         # background decay
K = dl.dipolarkernel(t,r,lam,B)    # kernel matrix
Vexp = (K@P + dl.whitegaussnoise(t,0.01,seed=0))*1000

fit = dl.fitsignal(Vexp,t,r,'P',dl.bg_hom3d,dl.ex_4pdeer)
fit.plot()

Output graphic:

Use objects to aggregate experimental data and parameters.

It would be nice if DeerLab operated somewhat like EasySpin in the sense that you could group chunks of data together and pass them to DeerLab functions. Some benefits include:

Alleviating issues like #29
Prevent users from overwriting or mixing common variables like t, r, V
Reduce the number of variables passed to DeerLab functions

For example:

import numpy as np
import matplotlib.pyplot as plt
import deerlab as dl

# Load and process input data
Exp = dl.deerload('MyData.DTA')
Exp = dl.correctzerotime(Exp)
Exp = dl.correctphase(Exp)
print(Exp.params)

# Fit data
r = np.linspace(1.5, 6.0, 256)
fit = dl.fitsignal(Exp, r)

# Plot data
fig, (ax1, ax2) = plt.subplots(2)
ax1.plot(Exp.t, Exp.V)
ax1.plot(fit.t, fit.V)
ax2.plot(fit.r, fit.P)
plt.show()

Merge `penalty` and `regType` keywords?

In snlls and fitsignal, one can switch regularization on/off with the penalty keyword argument (True/False), and one can choose the type of regularization with the regType keyword argument. What about eliminating penalty and using regType=None in place of penalty=False? Seems like this would preserve functionality and simplify the interface.

Documentation: 2D-datafiles, data architecture

Hi, would be nice to expand the data load documentation to descibe, what the data format is, when 2D data are loaded (i.e. is V then a list of lists, how is indexing?)

Return FitResult.cost as float instead of list

The FitResult.cost results are returned as a list even though the objective function is a scalar value.

Must be fixed throughout.

release notes: pathway definition

Minor issue, but please clarify in release notes (I know I am probably currently still almost the only persons to read the realease notes, but still ;-) :

Currently in release notes of 0.12.0 it says:
pathways = [[Lam0,np.nan], [lam1,T0]]. Now the unmodulated pathway can just be defined by its amplitude and does not require the use of np.nan, e.g. pathways = [Lam0, [lam1,T0]].

The problem is it must be defined in this format now, or it will crash, and the error message was not really helping me much. (So perhaps point out that old scripts needed to be adjusted.)

GHA Windows test suite crash due to missing DLL

During the latest automatic test run on GHA, the following exception is raised during collection of the tests via pytest:

c:\hostedtoolcache\windows\python\3.8.6\x64\lib\site-packages\deerlab\dd_models.py:8: in <module>
    import scipy.special as spc
c:\hostedtoolcache\windows\python\3.8.6\x64\lib\site-packages\scipy\special\__init__.py:633: in <module>
    from . import _ufuncs
E   ImportError: DLL load failed while importing _ufuncs: The specified module could not be found.

This issue is present for all Python versions on Windows.

Upon closer inspection it seems that there was a non crashing error reported during the numpy installation:

Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.4
    Uninstalling numpy-1.19.4:
      Successfully uninstalled numpy-1.19.4
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'C:\\hostedtoolcache\\windows\\Python\\3.8.6\\x64\\Lib\\site-packages\\~umpy\\.libs\\libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll'
Consider using the `--user` option or check the permissions.

where there seems to be a permissions error. This is the first time that has happened for this workflow on GHA. This would explain the missing DLL when importing a Scipy module.

The quick way would be to just setup the DeerLab installation to be on a user-level with reduced privileges. This would also solve issues like #52.

dd_triangle and dd_skewgauss seem not be working properly

When simulating the distributions for these models evaluated at the start values defined in the built-in functions, it appear that these are not working properly:

dd_triangle

import deerlab as dl
import matplotlib.pyplot as plt 
import numpy as np 
model = dl.dd_triangle
r = np.linspace(0,20,400)
info = model() 
par0 = info['Start']
P = model(r,par0)
plt.figure(figsize=[6,3])
plt.plot(r,P)
plt.xlabel('r [nm]',fontsize=13)
plt.ylabel('P(r) [nm$^{-1}$]',fontsize=13)
plt.grid(alpha=0.4)
plt.tick_params(labelsize=12)
plt.tick_params(labelsize=12)
plt.tight_layout()

dd_skewgauss

import deerlab as dl
import matplotlib.pyplot as plt 
import numpy as np 
model = dl.dd_skewgauss
r = np.linspace(0,20,400)
info = model() 
par0 = info['Start']
P = model(r,par0)
plt.figure(figsize=[6,3])
plt.plot(r,P)
plt.xlabel('r [nm]',fontsize=13)
plt.ylabel('P(r) [nm$^{-1}$]',fontsize=13)
plt.grid(alpha=0.4)
plt.tick_params(labelsize=12)
plt.tick_params(labelsize=12)
plt.tight_layout()

bootstrap samples validation needed

In bootan the number of samples can be set to 1 or 0 which then lead to errors when requesting uncertainties. A short control statement is needed in the function.

Internalize bootstrapping for fitsignal quantities

At the moment fitsignal does only return uncertainty quantification based on asymptotic approximation (covariance-based). If the user wants to get bootstrapped confidence intervals the bootan must be used. While for advanced users this might be simple:

fit = dl.fitsignal(Vexp,t,rfit)
def myfcn(Vexp):
    fit = dl.fitsignal(Vexp,t,rfit)
    return fit.P,fit.V,fit.B,fit.exparam,fit.bgparam,fit.ddparam

bootuq = dl.bootan(myfcn,Vexp,fit.V,samples=N,verbose=True)

For routine use this is cumbersome. This could be easily internalized into fitsignal since the output quantities are hard-coded. The bootstrapping could then be exposed just as an option, for example:

fit = dl.fitsignal(Vexp,t,rfit,uq='covariance')
fit = dl.fitsignal(Vexp,t,rfit,uq='bootstrap')

2D datasets

Loading a 2D dataset would be neat. If I load a 2D dataset:

t, V = dl.deerload('20200828_41R1_211R1_apoMBP_4PDEER_50K.DTA')

I get the following error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-14-a93ba8068173> in <module>
----> 1 t, V = dl.deerload('20200828_41R1_211R1_apoMBP_4PDEER_50K.DTA')

~/Library/Python/3.8/lib/python/site-packages/deerlab/deerload.py in deerload(fullbasename, Scaling, plot, full_output, *args, **kwargs)
     46
     47     # Read descriptor file (contains key-value pairs)
---> 48     parameters = read_description_file(filename_dsc)
     49     parDESC = parameters["DESC"]
     50     parSPL = parameters["SPL"]

~/Library/Python/3.8/lib/python/site-packages/deerlab/deerload.py in read_description_file(DSCFileName)
   283     """
   284     with open(DSCFileName,"r") as f:
--> 285         allLines = f.readlines()
    286 
    287     # Remove lines with comments

/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 3426: invalid start byte

Deerload 2D dataset dimension mapping

Hey,
I encounter a problem using deerload. Using this script

dataImport = dict() # introduce as dictionary
dataImport['xyAxisArray'],dataImport['intensity'],dataImport['parameters'] = dl.deerload(filepath,full_output=True)

I get returned the correct data. When taking the xyAxisArray apart I receive the correct axises.
The dimension of the intensity array matches, but the values seem to be populated the wrong direction. So my data get shiftet with every scan (here time vs scan vise).
Data is included for testing, working on a fix right now.

20200621_DB261_theta-spatial.zip

Simplify branching workflow

The current branching workflow is based on the Git flow.

I suggest we move to a simpler model (GitHub flow), with the following setup:

The default branch is called main and contains the current bleeding-edge code that should be deployable.
Small bugfixes and improvements go directly to main.
For new features and bigger refactoring, use feature branches and PRs.
For releases, tag with a release number and start a new release branch, e.g. v0.12.0.

GitHub is renaming the default branch from master to main We should follow this as well.

Handle UncertQuant objects for case uqanalysis=False

Tried it, when using: ridmefit[i] = dl.fitsignal(Vrid[i],trid[i],r[i],'P',dl.bg_strexp,dl.ex_4pdeer,uqanalysis=False), I get a new error:

AttributeError                            Traceback (most recent call last)
<ipython-input-480-3f535c4cc6fc> in <module>
     28     plt.plot(r[i],ridmefit[i].P,color='k')
     29     Puq = ridmefit[i].Puncert# Uncertainty quantification of the distance distribution
---> 30     Puq_95=Puq.ci(95)
     31     plt.fill_between(r[i],Puq_95[:,0],Puq_95[:,1],color='r',alpha=0.5)
     32     #plot RIDME-kernel result

AttributeError: 'list' object has no attribute 'ci'

Originally posted by @ritsch-i in #50 (comment)

fitsignal should return regularization parameter

fit = fitsignal(...) should return fit.alpha - this is important if automatic alpha selection is specified via the regparam keyword.

It looks like fitregmodel does, but snnls doesn't, and therefore fitsignal doesn't.

Include distance range edges in regoperator

Currently, the L matrix returned by regoperator for an N-element distance vector returns an N-2 by N matrix. This ignores the distance range edges and leads to edge artifacts, because P smoothness is not imposed at the edges. But it should. Therefore, the L matrix needs to be N by N, with [-2,1,0…] in the first row for ghe second-order difference case etc.

Remove lambda parameter from phenomenological background models

Current background models fall into two categories:

physical: bg_hom3d, bg_hom3dex, and bg_homfractal.
phenomenological: bg_exp, bg_strexp, bg_sumstrexp, bg_prodstrexp, bg_poly1, bg_poly2, bg_poly3.
All background models take lambda as a third input parameter and include it in some way into the calculation.

Physical models are derived from first-principles spin dynamics. Therefore, they do depend on the bulk spin concentration c and on the inversion efficiency lambda.

However, the phenomenological models do not have any spin-physical basis and are just shapes used to fit the time-domain decay attributed to the matrix spins. Therefore, they should not depend on lambda.

fitmultimodel is no longer deterministic

With the changes introduced in 05f304e the results of fitmultimodel are no longer deterministic as the parameter start values are changed randomly between runs. This has some disadvantages:

Reproducibility is broken if the results do not correspond to a global minimum found by multi-start approaches.
Testing can become an issue if pass conditions are tight.
Future bugs or issues related to this randomness can become a high nuisance to debug.

My proposal would be to redefine the way the start values of fitmultimodel are computed, e.g. setting them to the center of the box constraints or at intervals of those.

jeschkelab / deerlab Goto Github PK

deerlab's Introduction

DeerLab

About

Requirements

Setup

Citing DeerLab

License

deerlab's People

Contributors

Stargazers

Watchers

Forkers

deerlab's Issues

Describe the bug

Environment

How to reproduce it?

Expected behavior

Code Snippet

Files

Describe the bug

Environment

Code Snippet

Scale invariability of dipolar pathways

Redundancy with V0

Summary

Recommend Projects

Recommend Topics

Recommend Org