philfitters / kafe2

Karlsruhe Fit Environment 2: a Python package for parameter estimation

License: GNU General Public License v3.0

Python 99.90% Makefile 0.10%

kafe2's Introduction

kafe2 - Karlsruhe Fit Environment 2

kafe2 is an open-source Python package for the likelihood-based estimation of model parameters from measured data. As the spiritual successor to the original kafe package, it aims to provide state-of-the-art statistical methods in a way that is still easy to use. More information here.

If you have pip installed, just run

pip install kafe2

to install the latest stable version and you're (mostly) ready to go. The Python package iminuit, which kafe2 uses internally for numerical optimization, may fail to install automatically if no C++ compiler is available on your system. While iminuit is, strictly speaking, not required, its use is strongly recommended. Make sure to read the pip installation log. As of kafe2 v2.4.0, only Python 3 is supported. kafe2 works with matplotlib version 3.4 and newer.
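
Since the iminuit installation can fail silently, it may help to check afterwards which minimizer packages can actually be imported. The following is a minimal sketch using only the standard library; the helper name minimizer_backends is hypothetical and not part of kafe2, and the list of candidate packages is an assumption.

```python
import importlib.util


def minimizer_backends(candidates=("iminuit", "scipy")):
    """Return the candidate packages that can be imported in this environment.

    Hypothetical helper, not part of kafe2.
    """
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    available = minimizer_backends()
    if "iminuit" not in available:
        print("iminuit is not importable; check the pip installation log.")
    print("importable minimizer packages:", available)
```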

The documentation at kafe2.readthedocs.io has more detailed installation instructions. It also explains kafe2 usage as well as the mathematical foundations upon which kafe2 is built.

If you prefer a more practical approach, you can instead look at the various examples. In addition to the regular Python/kafe2go files, there are also Jupyter notebook tutorials (in English and in German) that mostly cover the same topics.

If you encounter any bugs or have an improvement proposal, please let us know by opening an issue here.

kafe2's People

kafetante johannesgaessler cverstege guenterquast ralfulrich a-monsch paraskoundal guyntl passion4energy mpresill mitchilaser

kafe2's Issues

Fit results are being cached in two separate locations

Currently fit results are being cached in kafe2 fit objects as well as in nexus objects.
Should all caching be outsourced to the nexus?

Inconsistent behavior between iminuit and scipy if fit.do_fit has not been called

If fit.do_fit has not been called, fit.parameter_cov_mat returns None for the iminuit minimizer.
For the scipy minimizer, however, a covariance matrix is calculated anyway.

configuration options for graphics: need functionality to override default configs, e.g. in ~/.config/kafe2 and in the working directory

Configuration options for graphics:
these must be settable by the user.
Recommendation: if a config file kafe2.matplotlibrc.conf exists
in ~/.config/kafe2 or in the working directory, it
overrides the configs in the kafe2 installation directory.

2ln(L) for Histogram fits not meaningful - think about normalisation (e.g. with reference to "fully saturated model")

2 ln(L)/ndf for histogram fits: as it stands, this cannot be equated
with chi2/ndf (not even in the limit of large samples). We need to
think about this again ... the value of the likelihood
at the minimum must be normalised appropriately; usually one takes
the value that results when all data points lie exactly
on the model, i.e. ln(L_norm) = ln(L(n_i|m_i)) - ln(L(m_i|m_i))
In the limit, chi2 should come out!
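
A normalised likelihood of this kind can be sketched as follows. This is an illustration under stated assumptions, not kafe2 code: the bins are taken to be independent Poisson counts, and the reference is the common "saturated model" convention in which the predictions are set equal to the observed counts. The function names are hypothetical. For large counts the result approaches the Pearson chi2.

```python
import math


def poisson_deviance(counts, predictions):
    """-2 ln(L(n|m) / L(n|n)) for independent Poisson bins.

    Assumption: saturated-model reference (predictions equal to the counts).
    """
    total = 0.0
    for n, m in zip(counts, predictions):
        if m <= 0:
            raise ValueError("model predictions must be positive")
        total += m - n
        if n > 0:  # the n * ln(n / m) term vanishes for empty bins
            total += n * math.log(n / m)
    return 2.0 * total


def pearson_chi2(counts, predictions):
    """Pearson chi2 with the model prediction as the variance estimate."""
    return sum((n - m) ** 2 / m for n, m in zip(counts, predictions))


if __name__ == "__main__":
    counts = [1010, 990, 1005]
    predictions = [1000.0, 1000.0, 1000.0]
    print(poisson_deviance(counts, predictions), pearson_chi2(counts, predictions))
```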

Template fits for Histogram fits

For a Bachelor Thesis, template fits for histograms are needed.
This is essentially the same as an indexed fit.
Should we implement something like this or say they have to use the indexed fit?
Another feature request from this bachelor thesis was weights for histograms.

x and y-labels in kafe2go

In kafe2go the yaml file is read with _fit = FitBase.from_file(_filename, format=_input_format) at line 62, which creates a fit object. When we want to add labels to kafe2go there are 2 possibilities:

Option 1: Pass the labels as kwargs when generating the plot with _plot = _fit.generate_plot(), or set them afterwards with _plot.x_label = "some string". This requires the file to be read again.
Option 2: Store the labels in the fit object when calling _fit = FitBase.from_file(_filename, format=_input_format) and then set the labels of the plot when calling _plot = _fit.generate_plot(). This requires the labels to be stored in the fit class as well, but the file is only read once.

Some feedback on how to implement this is appreciated @JohannesGaessler

Custom cost functions must contain parameter_constraints

Currently parameter constraints are integrated into the cost function itself.
If a user were to specify a cost function themselves they would also have to add a term for constraints or else the fit result will be wrong.
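
To illustrate what a user would have to add by hand, here is a minimal sketch of a chi2 cost function for a line fit with an explicit Gaussian penalty term for a constrained parameter. All names are hypothetical and the penalty form is an assumption; this is not how kafe2 implements constraints internally.

```python
def chi2(data, model, errors):
    """Plain chi2 for uncorrelated uncertainties."""
    return sum(((d - m) / e) ** 2 for d, m, e in zip(data, model, errors))


def constraint_penalty(value, expected, uncertainty):
    """Gaussian constraint term; leaving it out silently ignores the constraint."""
    return ((value - expected) / uncertainty) ** 2


def cost_with_constraint(a, b, x_data, y_data, y_err, a_expected, a_uncertainty):
    """Cost for the line model a * x + b with a constraint on a."""
    model = [a * x + b for x in x_data]
    return chi2(y_data, model, y_err) + constraint_penalty(a, a_expected, a_uncertainty)
```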

Only use one curly brace for latex expressions

Currently, two curly braces are needed when creating a latex expression, e.g. \\frac{{1}}{{x}} for 1/x.
If only one curly brace is used, kafe will look up the associated param.
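
The doubling follows the same escaping rule as Python's str.format, where {{ and }} produce literal braces and {name} is a replacement field. The sketch below shows only that rule with plain strings; whether kafe2 uses str.format internally is an assumption.

```python
# "{{" and "}}" become literal braces, "{name}" is replaced by a keyword argument.
template = "{fbg} \\frac{{1}}{{{b}-{a}}}"
rendered = template.format(fbg="f", a="a", b="b")
print(rendered)
```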
Example:

# assign latex names for the parameters for nicer display
fit.assign_parameter_latex_names(tau=r'\tau', fbg='f', a='a', b='b')
# assign a latex expression for the fit function for nicer display
# Currently, two curly braces are needed when creating a latex expression, e.g. \\frac{{1}}{{x}} for 1/x
# If only one curly brace is used, kafe will look up the associated param, e.g. \\tau for {tau} or f for {fbg}
fit.assign_model_function_latex_expression("(1-{fbg}) \\frac{{e^{{-{x}/{tau}}}}}"
                                           "{{{tau}(e^{{-{a}/{tau}}}-e^{{-{b}/{tau}}})}}"
                                           "+ {fbg} \\frac{{1}}{{{b}-{a}}}")

result graphics: no (obvious) way to change axis labels; such functionality must be provided

Result graphic - setting the axis labels:
these cannot always be x or y, but must be
definable by the user; they are properties of the
data and should be defined there, e.g.
(x-data_label: 'time (s)', y-data_label: 'Amplitude (cm)')

suggestion of simpler user interface

Compared to kafe(1), the user interface of kafe2 is quite complicated.

suggestions:

a wrapper class "Fit" avoiding XYFit, IndexedFit, etc. - the kind of fit is determined by the container object
a single method add_error(); again, add_simple_error or add_matrix_error are clearly distinguishable by the argument list (err_matrix=None)
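
The two suggestions above could be sketched roughly as follows. The container classes here are hypothetical stand-ins defined only for this example, and the function names fit_kind and error_kind are likewise assumptions rather than existing kafe2 API.

```python
class XYContainerStub:
    """Hypothetical stand-in for an xy data container."""


class IndexedContainerStub:
    """Hypothetical stand-in for an indexed data container."""


_FIT_KIND_BY_CONTAINER = {
    XYContainerStub: "xy",
    IndexedContainerStub: "indexed",
}


def fit_kind(container):
    """Pick the kind of fit from the type of the data container."""
    for container_type, kind in _FIT_KIND_BY_CONTAINER.items():
        if isinstance(container, container_type):
            return kind
    raise TypeError("unsupported container type: " + type(container).__name__)


def error_kind(err_val=None, err_matrix=None):
    """Decide between a simple and a matrix error from the argument list."""
    if (err_val is None) == (err_matrix is None):
        raise ValueError("give exactly one of err_val or err_matrix")
    return "matrix" if err_matrix is not None else "simple"
```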

plot layout: x and y plot range

Most plots look very "squeezed" because the plot range in x and y is chosen to just fit the
data points +/- error bars; the 2-sigma contours are often even cropped.

request: make the default 10% larger in (xmin, xmax) and (ymin, ymax) than is presently set
(this should then be ok in most cases and will look much nicer)
suggestion: provide a simple(!) method to configure this, e.g. as an option to the plot() method
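
A minimal sketch of such a widening step is shown below. The function name padded_range is hypothetical, and "10% larger" is interpreted here as enlarging the total span by 10%, i.e. 5% on each side; that interpretation is an assumption.

```python
def padded_range(low, high, fraction=0.10):
    """Widen the interval (low, high) so that its span grows by `fraction`."""
    if high < low:
        raise ValueError("high must not be smaller than low")
    pad = 0.5 * fraction * (high - low)
    return low - pad, high + pad


if __name__ == "__main__":
    x_min, x_max = padded_range(0.0, 10.0)
    print(x_min, x_max)
```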

kafe2/matplotlib import order

Some examples state that it's important to import matplotlib after kafe2.
However, I did not notice a difference or problem when I imported matplotlib first.

Implement to_file() and from_file() for unbinned fits

The two functions are currently not supported for unbinned fits.
This needs to be implemented.
This feature should also be checked for all other fit types.

please provide a function to calculate chi2 probability

Add a function for the chi2 probability:
from scipy import stats

def chi2prob(chi2, ndf):
    """chi2 probability

    Args:
      * chi2: chi2 value
      * ndf: number of degrees of freedom

    Returns:
      * float: chi2 probability
    """
    return 1. - stats.chi2.cdf(chi2, ndf)

Histogram fit constructor docstring is wrong

The docstring for the HistFit constructor states that data should be an iterable of float.
However, the constructor actually expects data to be a HistContainer object.
Presumably the docstring was just copied from another fit class and not changed.

What is the intended data format for HistFit?
Should the docstring or the code be changed?

Error band undersampling for large datasets

Currently the error bands for xy fits always use 50 data points for their calculation.
For model functions with > 50 data points and nonlinear behavior this causes undersampling, and the error band can look like it doesn't fit the data at all.
Sine and cosine functions are particularly affected.

Warning when plotting without fitting first

fit.report displays a warning when no fit has been performed yet. Plots should probably do this too.

Unexpected behavior when x is not first argument

If you specify an xy model function that has an argument other than x as its first argument, it will treat the first argument as x.
For example, the model function f(x, a, b) = a * x + b is interpreted as follows when changing the order of the parameters:
f(b, a, x) = a * x + b = c + b, where b is now treated as the independent variable and c = a * x is just a real number.
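
A check for this situation can be sketched with the standard library's inspect module. The helper names are hypothetical and not part of kafe2; the sketch only reports which argument comes first so that a caller could warn about it.

```python
import inspect


def argument_names(model_function):
    """Return the argument names of a model function in declaration order."""
    return list(inspect.signature(model_function).parameters)


def x_is_first(model_function, x_name="x"):
    """True if the first argument carries the expected independent-variable name."""
    names = argument_names(model_function)
    return bool(names) and names[0] == x_name


def line(x, a, b):
    return a * x + b


def reordered_line(b, a, x):
    return a * x + b


if __name__ == "__main__":
    for func in (line, reordered_line):
        print(func.__name__, argument_names(func), x_is_first(func))
```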

Multifit for all fit types

Currently only XY-Fit has a multifit option. It would be very nice if every fit type had the option to perform a multifit.

Asymmetric and symmetric errors seem to differ drastically with Multifit

This needs more investigation!
When using plot.plot(with_asymmetric_parameter_errors=True) for a Multifit, the value of the cost functions, as well as the scale of the parameter uncertainties seem to change quite drastically.

For example the uncertainty for omega is about ten times higher when using the asymmetric parameter errors. The error for A_0 is about 100 times bigger with the symmetric errors.

The following pictures show the exact same fit; the only difference is the asymmetric parameter error option when plotting the results.

The fit was performed using the iminuit minimizer.

Ability to convolve multiple pdfs in unbinned fits

Günter wants the feature to convolve multiple pdfs for an unbinned fit.

Needed features:

check for normed pdfs
convolve function
check if all args of all pdfs are present in nexus when minimizing

Better spacing between function and parameters for asymmetric errors

When using a latex expression for the fit function including a fraction, the expression overlaps with the upper asymmetric error from the first parameter.

fit.report() and plot.plot() use different keywords for asymmetric errors

This should be unified to the same keyword in order to avoid confusion.
fit.report(asymmetric_parameter_errors=True)
plot.plot(with_asymmetric_parameter_errors=True)
Which option is the preferred one? I personally would remove with, but this would affect the rest of the function keywords as well. E.g. plot.plot(with_ratio=True) should then be named ratio.

Custom cost function can't be called cost

If you try to add a custom cost function that is called cost the nexus will throw an exception since it can't create an alias called cost.

Show asymmetric_parameter_errors fails with fixed parameters

When a parameter is fixed when performing the fit, fit.report(asymmetric_parameter_errors=True) fails with a KeyError.

Traceback when performing a line fit with a fixed to 1 (iminuit)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/cverstege/opt/pycharm-2019.1.3/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/cverstege/src/kafe2/examples/001_linear_regression/linear_regression.py", line 38, in <module>
    line_fit.report(asymmetric_parameter_errors=True)
  File "/home/cverstege/src/kafe2/kafe2/fit/xy/fit.py", line 1109, in report
    super(XYFit, self).report(output_stream=output_stream, asymmetric_parameter_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 599, in report
    self._update_parameter_formatters(update_asymmetric_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 111, in _update_parameter_formatters
    for _fpf, _ape in zip(self._model_function.argument_formatters, self.asymmetric_parameter_errors):
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 189, in asymmetric_parameter_errors
    return self._fitter.asymmetric_fit_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/fitters/nexus_fitter.py", line 145, in asymmetric_fit_parameter_errors
    return self._minimizer.asymmetric_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 111, in asymmetric_parameter_errors
    self._par_asymm_err = self._calculate_asymmetric_parameter_errors()
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/iminuit_minimizer.py", line 84, in _calculate_asymmetric_parameter_errors
    _asymm_par_errs[_index, 0] = _minos_result_dict[_par_name]['lower']
KeyError: 'a'

Same for scipy:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/cverstege/opt/pycharm-2019.1.3/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/cverstege/src/kafe2/examples/001_linear_regression/linear_regression.py", line 38, in <module>
    line_fit.report(asymmetric_parameter_errors=True)
  File "/home/cverstege/src/kafe2/kafe2/fit/xy/fit.py", line 1109, in report
    super(XYFit, self).report(output_stream=output_stream, asymmetric_parameter_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 599, in report
    self._update_parameter_formatters(update_asymmetric_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 111, in _update_parameter_formatters
    for _fpf, _ape in zip(self._model_function.argument_formatters, self.asymmetric_parameter_errors):
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 189, in asymmetric_parameter_errors
    return self._fitter.asymmetric_fit_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/fitters/nexus_fitter.py", line 145, in asymmetric_fit_parameter_errors
    return self._minimizer.asymmetric_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 111, in asymmetric_parameter_errors
    self._par_asymm_err = self._calculate_asymmetric_parameter_errors()
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 66, in _calculate_asymmetric_parameter_errors
    _min_parameters)
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 81, in _find_chi_2_cut
    return brentq(f=_profile, a=low, b=high, xtol=self.tolerance)
  File "/home/cverstege/.virtualenvs/kafe2/local/lib/python2.7/site-packages/scipy/optimize/zeros.py", line 756, in brentq
    r = _zeros._brentq(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
ValueError: f(a) and f(b) must have different signs
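
One conceivable way to avoid the KeyError in the iminuit case is to skip parameters for which no asymmetric error result exists. The sketch below is not the kafe2 implementation: the function name is hypothetical, the result dictionary is a plain dict, and the 'upper' key is assumed by analogy with the 'lower' key visible in the traceback.

```python
import math


def collect_asymmetric_errors(parameter_names, minos_results):
    """Return (lower, upper) per parameter, NaN for parameters without a result.

    Fixed parameters are assumed to be simply absent from minos_results.
    """
    errors = []
    for name in parameter_names:
        result = minos_results.get(name)
        if result is None:
            errors.append((math.nan, math.nan))
        else:
            errors.append((result["lower"], result["upper"]))
    return errors
```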

[kafe2go] Support for customizing things like plots and contours via YAML

In kafe2go, some aspects of the plots created can currently be customized via a limited number of CLI arguments (e.g. noband, noinfobox, etc.), but there is no way to do this via the YAML file.

It would be good to add support for this, possibly via additional YAML namespaces (e.g. plot, contours), which would then contain keywords for customizing the respective objects.

This would be particularly useful for implementing things like axis labels or other plot annotations, but is probably quite tricky to achieve in the current implementation (see related issue on axis labels: #34)

One possible way of implementing this is to add YAML deserialization (and validation) functionality to the "algorithm classes" (Plot, ContoursProfiler) themselves.

In addition, a dedicated kafe2go YAML reader would probably be needed instead of the current FitBase.from_file. This "meta-reader" could then delegate the deserialization of the dataset, plot and contours namespaces to the deserialization methods of the appropriate objects (in this case FitBase, Plot and ContoursProfiler, respectively).
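
The delegation idea can be sketched with plain dictionaries standing in for the parsed YAML document. Everything here is hypothetical: the function name read_namespaces, the registry of readers, and the lambda stand-ins that merely tag the content instead of building real FitBase, Plot or ContoursProfiler objects.

```python
def read_namespaces(document, readers):
    """Hand each top-level namespace to the reader registered for it."""
    unknown = sorted(set(document) - set(readers))
    if unknown:
        raise KeyError("no reader registered for: " + ", ".join(unknown))
    return {name: readers[name](content) for name, content in document.items()}


# Hypothetical stand-ins for the deserialization methods of the real classes.
readers = {
    "dataset": lambda spec: ("fit", spec),
    "plot": lambda spec: ("plot", spec),
    "contours": lambda spec: ("contours", spec),
}

document = {"dataset": {"x_data": [1, 2]}, "plot": {"x_label": "time (s)"}}
objects = read_namespaces(document, readers)
```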

default name for uncertainty band

In the latest version in branch labels, the default name for the error band is
self.update_plot_kwargs('model_error_band', dict(label="{} error".format(self._fit.model_label)))
As "error" depends on the language, this is not a very suitable name. I suggest using
self.update_plot_kwargs('model_error_band', dict(label="{}".format(self._fit.model_label) + r'$\pm 1\sigma$'))

Likelihood with x-errors

Likelihood needs to be checked for whether or not it handles x-errors correctly.

to_file() not working for Py3.6

Trying to save a kafe2 object to a file via the to_file() method throws a TypeError when using Python 3.6.

Matplotlib deprecation warning

When calling cpf.plot_profiles_contours_matrix(show_grid_for='contours') this deprecation warning shows up. The same happens for cpf.plot_profiles_contours_matrix(show_grid_for='all').

MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.

get_result_dict_for_robots(): please rename to get_results_dict()

Please rename get_result_dict_for_robots() to
get_result_dict()

ROOT installation instructions not working

The readme gives installation instructions for ROOT via the package manager.
However, when I tried installing ROOT this way it didn't work because apt could not find the described packages.
Is it perhaps necessary to add a repository first?

Preferred matplotlib frontend

The kafe2 description previously listed PyQT4 as the default matplotlib backend.
However, the backend specified in kafe2/config/kafe2.matplotlibrc.conf is TkAgg.
I changed the description to list tkinter as the default backend.
Is this correct?

Error when updating data of fit objects

Currently it's possible to update the data for all fit objects.
When the new data has a different dimension than the original data, this causes problems when performing the fit as the parametric model is not updated.
This currently affects indexed and XYMulti fits.
I was able to fix the rest in #46.
This probably needs some more attention from other people.
Please refer to #46 for the discussion.

Add label modifiers

By default the x and y labels are $x$ and $y$ as set in the default configuration. The user should be able to easily modify those labels, e.g. to Time $t$ [s].
Generally: quantity $symbol$ [unit] as in kafe1.
In kafe1 this was done when creating the dataset class kafe.dataset.Dataset(data=None, title='Untitled Dataset', axis_labels=['x', 'y'], axis_units=['', ''], **kwargs), where the user had to set quantity $symbol$ in axis_labels and the unit in axis_units. The brackets around the unit were added by the program.
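
The label convention quantity $symbol$ [unit] can be sketched as a small helper that adds the brackets itself. The function name axis_label is hypothetical and not taken from kafe or kafe2.

```python
def axis_label(quantity, symbol=None, unit=None):
    """Build a label of the form: quantity $symbol$ [unit]."""
    parts = [quantity]
    if symbol:
        parts.append("${}$".format(symbol))
    if unit:
        parts.append("[{}]".format(unit))
    return " ".join(parts)


if __name__ == "__main__":
    print(axis_label("Time", "t", "s"))
```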

The entries model 0 and data 0 on the right side of the plot should be nameable as well.

To Do:

add function to modify x- and y-labels (#32) needs to be reworked for the new Plotting Class
add function to modify the name of model and data

Questions:

where to set the x- and y-labels? In the XYContainer and get it from the container object when plotting? --> Yes
modify data 0 similarly to kafe1 by setting a title in the XYContainer?
modify model 0 by specifying a model name in the XYFit object?

Unit tests

The unit tests are currently all failing on the Travis system.
The reason seems to be that Travis runs the tests twice. The first run is fine, but when it tries to run the tests a second time, a numpy import error makes the tests fail.

plot.customize() needs iterable value(s)

The plot.customize() function needs iterable values.
For example the call plot.customize('data', 'markersize', 3) will fail with:

  File "/home/cedric/.local/lib/python3.7/site-packages/kafe2/fit/_base/plot.py", line 1071, in customize
    for _val in values:
TypeError: 'int' object is not iterable

As a workaround plot.customize('data', 'markersize', [3]) can be used.
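
A possible fix is to wrap scalar values in a list before iterating over them. The sketch below shows that normalisation step in isolation; the helper name as_value_list is hypothetical, and treating strings as single values rather than as sequences of characters is a design assumption.

```python
def as_value_list(values):
    """Return values as a list, wrapping scalars and strings in a one-element list."""
    if isinstance(values, (str, bytes)):
        return [values]
    try:
        return list(values)
    except TypeError:  # not iterable, e.g. an int or a float
        return [values]
```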

kafe formatting annotations

kafe has implemented annotations for formatting functions and their parameters.
Should this system be copied for kafe2?

example 101 multifit error

Example 201_kafe2go multifit raises an error:
No PlotAdapter configured for fit type 'XYMultiFit'!

Jupyter notebooks

Explore use of kafe2 in conjunction with Jupyter notebooks

macOS Python 2.7 compatibility

macOS 10.15.2 with Python 2.7 via MacPorts

kafe2go was installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin and was not automatically on the path
Examples 001, 002, 006, 101 (at which point I stopped testing) have problems with the tkagg backend:
from matplotlib.backends import _tkagg
ImportError: cannot import name _tkagg
→ apparently this is caused by the import order. If I import matplotlib.pyplot before kafe2, the examples work (at least 001 and 002).
I then also installed kafe2 for Python 3.8. The tkagg problem described above does not occur there.

set name of independent variable (x) in graphical output

In most practical examples the name of the variable on the x-axis is not "x".
This convention in kafe2 is ~ok for specifying the code of the model function, but not acceptable in the graphical output - here, a possibility must be provided to specify another symbol reflecting
the real nature of the x-data (e.g. A for amplitude, E for energy, Phi for phase, ...)

access to minimizer debug output

access to the output of the minimizer is needed as an option to be able to debug non-converging fits

Preferred rst line length

The rst files written by @dsavoiu are formatted to have very short lines.
I do not understand the reason, particularly since code comments seem to have longer lines.

show_info_box LaTeX default

PlotFigureBase.show_info_box does not format the info box as LaTeX by default.
Should this behaviour be changed?

Cost functions do not have unit tests

There are no unit tests for kafe2 cost functions.
This is particularly problematic for non-standard cost functions, e.g. neg log likelihood.

@abstractproperty with Py2 and Py3

Since Python 3.3 the decorator @abc.abstractproperty is deprecated. Instead the usage of @property with @abc.abstractmethod is recommended.
This, however, is not supported with Python 2.7.
The code currently contains both options. This behavior should be unified.

One possible solution is to add a new decorator like this:

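import sys
from abc import abstractmethod, abstractproperty

def compatibleabstractproperty(func):
    # property + abstractmethod is the recommended form from Python 3.3 onwards
    if sys.version_info >= (3, 3):
        return property(abstractmethod(func))
    else:
        return abstractproperty(func)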

or we could stick to the deprecated @abc.abstractproperty.
See https://stackoverflow.com/questions/5960337/how-to-create-abstract-properties-in-python-abstract-classes for more details.

@dsavoiu and @JohannesGaessler, what's your opinion on this matter?

Chi_Square cost functions aren't guarded for NaN

The nll cost functions in base/cost.py all have

        # guard against returning NaN
        if np.isnan(_log_likelihood_ratio):
            return np.inf

Those checks are not performed for the chi-square cost functions. Is this intentional? Without such a check, the results sometimes differ from those obtained with the check in place.
This is a quick fix. Can @dsavoiu or @JohannesGaessler comment, if this is intentional or not? If it's a bug, I'll fix it.
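
For illustration, the same guard applied to a chi-square cost could look like the sketch below. It is written in plain Python with hypothetical names and uncorrelated uncertainties; it is not the kafe2 cost function.

```python
import math


def guarded_chi2(data, model, errors):
    """Chi-square cost that returns infinity instead of NaN."""
    cost = sum(((d - m) / e) ** 2 for d, m, e in zip(data, model, errors))
    # guard against returning NaN, mirroring the nll cost functions
    if math.isnan(cost):
        return math.inf
    return cost
```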

Status updates for long tasks

Some of the kafe2 tasks like calculating contours can take relatively long.
Some sort of feedback regarding the current progress would be useful in such cases.

kafe2go doesn't support cost functions

kafe2go currently does not support saving or loading the cost functions used in a fit.

When the user uses any cost function other than the default cost function, the fit can't be reproduced solely by loading the fit file again.