
kafe2's Introduction

kafe2 - Karlsruhe Fit Environment 2


kafe2 is an open-source Python package for the likelihood-based estimation of model parameters from measured data. As the spiritual successor to the original kafe package, it aims to provide state-of-the-art statistical methods in a way that is still easy to use. More information here.

If you have installed pip just run

pip install kafe2

to install the latest stable version and you're (mostly) ready to go. The Python package iminuit, which kafe2 uses internally for numerical optimization, may fail to install automatically if no C++ compiler is available on your system. While iminuit is, strictly speaking, not required, its use is strongly recommended. Make sure to read the pip installation log. As of kafe2 v2.4.0 only Python 3 is supported. kafe2 works with matplotlib version 3.4 and newer.

The documentation under kafe2.readthedocs.io has more detailed installation instructions. It also explains kafe2 usage as well as the mathematical foundations upon which kafe2 is built.

If you prefer a more practical approach you can instead look at the various examples. In addition to the regular Python/kafe2go files there are also Jupyter notebook tutorials (in English and in German) that mostly cover the same topics.

If you encounter any bugs or have an improvement proposal, please let us know by opening an issue here.

kafe2's People

Contributors

alexanderkaschta, cverstege, dsavoiu, guenterquast, johannesgaessler, kafetante, mitchilaser

kafe2's Issues

2ln(L) for Histogram fits not meaningful - think about normalisation (e.g. with reference to "fully saturated model")

2 ln(L)/ndf for histogram fits: in its current form this cannot be equated
with chi2/ndf, not even in the limit of large samples. We need to think
about this again ... the value of the likelihood at the minimum has to be
normalised appropriately; the usual choice is the value obtained when all
data points lie exactly on the model, i.e.
ln(L_norm) = ln(L(n_i|m_i)) - ln(L(m_i|m_i))
In the limit this should yield chi2!
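
The large-sample behaviour described above can be checked numerically. The following is a minimal sketch, not kafe2 code: it assumes independent Poisson-distributed bin contents and uses the common "saturated model" reference L(n_i|n_i), which differs slightly from the reference L(m_i|m_i) written above. The function names poisson_deviance and pearson_chi2 are made up for this illustration.

```python
import numpy as np
from scipy.special import xlogy


def poisson_deviance(counts, model):
    """Return -2 ln(L(n|m) / L(n|n)) for independent Poisson bins."""
    n = np.asarray(counts, dtype=float)
    m = np.asarray(model, dtype=float)
    # xlogy(n, n / m) evaluates n * ln(n / m) and returns 0 for empty bins (n = 0).
    return 2.0 * np.sum(m - n + xlogy(n, n / m))


def pearson_chi2(counts, model):
    """Return the Pearson chi2 with the Poisson variance taken from the model."""
    n = np.asarray(counts, dtype=float)
    m = np.asarray(model, dtype=float)
    return np.sum((n - m) ** 2 / m)


# Illustrative bin contents and model predictions (assumed values).
counts = [1000, 1050, 980]
model = [1010, 1020, 1000]
print(poisson_deviance(counts, model), pearson_chi2(counts, model))
```

For bins with many entries the two printed numbers agree to within a few percent, and the normalised value is exactly zero when the data coincide with the model.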

Template fits for Histogram fits

For a Bachelor Thesis, template fits for histograms are needed.
This is essentially the same as an indexed fit.
Should we implement something like this or say they have to use the indexed fit?
Another feature request from this bachelor thesis was weights for histograms.

x and y-labels in kafe2go

In kafe2go the yaml file is read with _fit = FitBase.from_file(_filename, format=_input_format) at line 62, which creates a fit object. When we want to add labels to kafe2go there are 2 possibilities:

  1. When generating the plot with _plot = _fit.generate_plot() with kwargs or setting it later with _plot.x_label = "some string", but this requires the file to be read again.
  2. Store the labels in the fit object when calling _fit = FitBase.from_file(_filename, format=_input_format) and then set the labels of the plot when calling _plot = _fit.generate_plot(). This requires the labels to be stored in the fit class as well, but the file is only read once.

Some feedback on how to implement this is appreciated @JohannesGaessler

Only use one curly brace for latex expressions

Currently, two curly braces are needed when creating a LaTeX expression, e.g. \\frac{{1}}{{x}} for 1/x.
If only one curly brace is used, kafe will look up the associated parameter.
Example:

# assign latex names for the parameters for nicer display
fit.assign_parameter_latex_names(tau=r'\tau', fbg='f', a='a', b='b')
# assign a latex expression for the fit function for nicer display
# Currently, two curly braces are needed when creating a latex expression, e.g. \\frac{{1}}{{x}} for 1/x
# If only one curly brace is used, kafe will look up the associated parameter, e.g. \\tau for {tau} or f for {fbg}
fit.assign_model_function_latex_expression("(1-{fbg}) \\frac{{e^{{-{x}/{tau}}}}}"
                                           "{{{tau}(e^{{-{a}/{tau}}}-e^{{-{b}/{tau}}})}}"
                                           "+ {fbg} \\frac{{1}}{{{b}-{a}}}")

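The brace doubling matches the escaping rules of Python's built-in str.format. Whether kafe2 uses str.format internally is an assumption here; the sketch only shows the standard-library behaviour, and the variable names are illustrative.

```python
# Doubled braces are literal braces for str.format; single braces are placeholders.
template = r"\frac{{1}}{{{tau}}}"
rendered = template.format(tau=r"\tau")
print(rendered)  # \frac{1}{\tau}
```

The triple brace `{{{tau}}}` is therefore a literal brace pair wrapped around the substituted parameter name.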
suggestion of simpler user interface

Compared to kafe(1), the user interface of kafe2 is quite complicated.

Suggestions:

  • a wrapper class "Fit" avoiding XYFit, IndexedFit, etc. - the kind of fit is determined
    by the container object

  • a single method add_error(); again, add_simple_error or add_matrix_error are clearly
    distinguishable by the argument list (err_matrix=None)
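
A single add_error() method that dispatches on its arguments could look roughly like the sketch below. The class ErrorStore and the argument name err_val are hypothetical stand-ins invented for this illustration; only err_matrix is taken from the suggestion above, and none of this is the kafe2 API.

```python
import numpy as np


class ErrorStore:
    """Hypothetical stand-in for a fit object; not part of kafe2."""

    def __init__(self, size):
        self.size = size
        self.cov_mat = np.zeros((size, size))

    def add_error(self, err_val=None, err_matrix=None):
        """Add an uncertainty given either per-point values or a full matrix."""
        if (err_val is None) == (err_matrix is None):
            raise ValueError("specify exactly one of err_val and err_matrix")
        if err_matrix is not None:
            contribution = np.asarray(err_matrix, dtype=float)
            if contribution.shape != (self.size, self.size):
                raise ValueError("err_matrix has the wrong shape")
        else:
            # A scalar is broadcast to all points; uncorrelated errors are diagonal.
            sigma = np.broadcast_to(np.asarray(err_val, dtype=float), (self.size,))
            contribution = np.diag(sigma ** 2)
        self.cov_mat = self.cov_mat + contribution
        return self.cov_mat


store = ErrorStore(2)
store.add_error(err_val=0.5)
store.add_error(err_matrix=[[1.0, 0.5], [0.5, 1.0]])
print(store.cov_mat)
```

The two calls differ only in the keyword used, which is the property the suggestion relies on.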

plot layout: x and y plot range

Most plots look very "squeezed" because the plot range in x and y is chosen to just fit the
data points +/- error bars; the 2-sigma contours are often even cropped.

Request: make the default range 10% larger in (xmin, xmax) and (ymin, ymax) than is presently set
(this should then be ok in most cases and will look much nicer).
Suggestion: provide a simple(!) method to configure this, e.g. as an option to the plot() method.
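
One way to read the 10% request is to grow the total width of each axis range by 10%, split evenly between both ends. The helper below is a sketch under that assumption; padded_range is a made-up name and not a kafe2 function.

```python
def padded_range(low, high, fraction=0.1):
    """Widen the interval (low, high) so its total width grows by `fraction`."""
    pad = 0.5 * fraction * (high - low)
    return low - pad, high + pad


x_min, x_max = padded_range(0.0, 10.0)
print(x_min, x_max)
```

The resulting pair can be passed on to matplotlib, for example through Axes.set_xlim and Axes.set_ylim.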

kafe2/matplotlib import order

Some examples state that it's important to import matplotlib after kafe2.
However, I did not notice a difference or problem when I imported matplotlib first.

please provide a function to calculate chi2 probability

  • add a function for the chi2 probability:

    from scipy import stats

    def chi2prob(chi2, ndf):
        """Return the chi2 probability.

        Args:
          * chi2: chi2 value
          * ndf: number of degrees of freedom

        Returns:
          * float: chi2 probability
        """
        return 1.0 - stats.chi2.cdf(chi2, ndf)
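
As a possible variant, scipy also offers the survival function stats.chi2.sf, which is mathematically identical to 1 - cdf but keeps precision for very large chi2 values where the subtraction rounds to zero. The name chi2prob_sf is chosen here only to distinguish it from the proposal above.

```python
from scipy import stats


def chi2prob_sf(chi2, ndf):
    """chi2 probability via the survival function, equal to 1 - cdf."""
    return stats.chi2.sf(chi2, ndf)


# chi2 = 3.84 with one degree of freedom corresponds to a probability of about 5%
print(chi2prob_sf(3.84, 1))
```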

Histogram fit constructor docstring is wrong

The docstring for the HistFit constructor states that data should be an iterable of float.
However, the constructor actually expects data to be a HistContainer object.
Presumably the docstring was just copied from another fit class and not changed.

What is the intended data format for HistFit?
Should the docstring or the code be changed?

Error band undersampling for large datasets

Currently the error bands for xy fits always use 50 data points for their calculation.
For model functions with > 50 data points and nonlinear behavior this causes undersampling and the error band can look like it doesn't fit the data at all.
Sine and cosine functions in particular are being affected.
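
The effect can be reproduced without kafe2. The sketch below compares a fast sine with its linear interpolation on 50 and on 2000 support points; it illustrates undersampling in general and does not reproduce how kafe2 computes error bands. All names and the chosen frequency are assumptions.

```python
import numpy as np


def max_interpolation_error(func, x_min, x_max, n_support, n_check=20001):
    """Largest deviation between func and its linear interpolation on n_support points."""
    support = np.linspace(x_min, x_max, n_support)
    check = np.linspace(x_min, x_max, n_check)
    return np.max(np.abs(np.interp(check, support, func(support)) - func(check)))


def fast_sine(x):
    # 40 full periods on the interval [0, 1]
    return np.sin(2.0 * np.pi * 40.0 * x)


coarse = max_interpolation_error(fast_sine, 0.0, 1.0, 50)
fine = max_interpolation_error(fast_sine, 0.0, 1.0, 2000)
print(coarse, fine)
```

With 50 support points the interpolated curve misses entire oscillations, while 2000 points follow the function closely. Scaling the number of support points with the size of the dataset would be one way to address this.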

Unexpected behavior when x is not first argument

If you specify an xy model function that has an argument other than x as its first argument, it will treat the first argument as x.
For example the model function f(x, a, b) = a * x + b is interpreted as follows when changing the order of the parameters:
f(b, a, x) = a * x + b = c + b, with c being a real number.
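
A warning for this case could be based on the function signature. The sketch below uses only the standard library module inspect; first_argument_name and the two model functions are illustrative names, and emitting a warning is just one possible way to handle the situation.

```python
import inspect


def first_argument_name(model_function):
    """Return the name of the first positional parameter of a function."""
    parameters = list(inspect.signature(model_function).parameters)
    if not parameters:
        raise ValueError("model function takes no arguments")
    return parameters[0]


def linear_model(x, a, b):
    return a * x + b


def swapped_model(b, a, x):
    return a * x + b


for model in (linear_model, swapped_model):
    if first_argument_name(model) != "x":
        print("warning: '%s' is treated as the independent variable of %s"
              % (first_argument_name(model), model.__name__))
```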

Multifit for all Fit types

Currently only XY-Fit has a multifit option. It would be very nice if every fit type had the option to perform a multifit.

Asymmetric and symmetric errors seem to differ drastically with Multifit

This needs more investigation!
When using plot.plot(with_asymmetric_parameter_errors=True) for a Multifit, the value of the cost functions, as well as the scale of the parameter uncertainties seem to change quite drastically.

For example the uncertainty for omega is about ten times higher when using the asymmetric parameter errors. The error for A_0 is about 100 times bigger with the symmetric errors.

The following pictures show the exact same fit; the only difference is the asymmetric parameter error option used when plotting the results:
[Images: lande_fit_sym, lande_fit_asym]

The fit was performed using the iminuit minimizer.

fit.report() and plot.plot() use different keywords for asymmetric errors

This should be unified to the same keyword in order to avoid confusion.
fit.report(asymmetric_parameter_errors=True)
plot.plot(with_asymmetric_parameter_errors=True)
Which option is the preferred one? I personally would remove with, but this would affect the rest of the function keywords as well. E.g. plot.plot(with_ratio=True) should then be named ratio.

Show asymmetric_parameter_errors fails with fixed parameters

When a parameter is fixed when performing the fit, fit.report(asymmetric_parameter_errors=True) fails with a KeyError.

Traceback when performing a line fit with the parameter a fixed to 1 (iminuit):

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/cverstege/opt/pycharm-2019.1.3/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/cverstege/src/kafe2/examples/001_linear_regression/linear_regression.py", line 38, in <module>
    line_fit.report(asymmetric_parameter_errors=True)
  File "/home/cverstege/src/kafe2/kafe2/fit/xy/fit.py", line 1109, in report
    super(XYFit, self).report(output_stream=output_stream, asymmetric_parameter_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 599, in report
    self._update_parameter_formatters(update_asymmetric_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 111, in _update_parameter_formatters
    for _fpf, _ape in zip(self._model_function.argument_formatters, self.asymmetric_parameter_errors):
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 189, in asymmetric_parameter_errors
    return self._fitter.asymmetric_fit_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/fitters/nexus_fitter.py", line 145, in asymmetric_fit_parameter_errors
    return self._minimizer.asymmetric_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 111, in asymmetric_parameter_errors
    self._par_asymm_err = self._calculate_asymmetric_parameter_errors()
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/iminuit_minimizer.py", line 84, in _calculate_asymmetric_parameter_errors
    _asymm_par_errs[_index, 0] = _minos_result_dict[_par_name]['lower']
KeyError: 'a'

The same call also fails with the scipy minimizer, here with a ValueError:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/cverstege/opt/pycharm-2019.1.3/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/cverstege/src/kafe2/examples/001_linear_regression/linear_regression.py", line 38, in <module>
    line_fit.report(asymmetric_parameter_errors=True)
  File "/home/cverstege/src/kafe2/kafe2/fit/xy/fit.py", line 1109, in report
    super(XYFit, self).report(output_stream=output_stream, asymmetric_parameter_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 599, in report
    self._update_parameter_formatters(update_asymmetric_errors=asymmetric_parameter_errors)
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 111, in _update_parameter_formatters
    for _fpf, _ape in zip(self._model_function.argument_formatters, self.asymmetric_parameter_errors):
  File "/home/cverstege/src/kafe2/kafe2/fit/_base/fit.py", line 189, in asymmetric_parameter_errors
    return self._fitter.asymmetric_fit_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/fitters/nexus_fitter.py", line 145, in asymmetric_fit_parameter_errors
    return self._minimizer.asymmetric_parameter_errors
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 111, in asymmetric_parameter_errors
    self._par_asymm_err = self._calculate_asymmetric_parameter_errors()
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 66, in _calculate_asymmetric_parameter_errors
    _min_parameters)
  File "/home/cverstege/src/kafe2/kafe2/core/minimizers/minimizer_base.py", line 81, in _find_chi_2_cut
    return brentq(f=_profile, a=low, b=high, xtol=self.tolerance)
  File "/home/cverstege/.virtualenvs/kafe2/local/lib/python2.7/site-packages/scipy/optimize/zeros.py", line 756, in brentq
    r = _zeros._brentq(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
ValueError: f(a) and f(b) must have different signs

[kafe2go] Support for customizing things like plots and contours via YAML

In kafe2go, some aspects of the plots created can currently be customized via a limited number of CLI arguments (e.g. noband, noinfobox, etc.), but there is no way to do this via the YAML file.

It would be good to add support for this, possibly via additional YAML namespaces (e.g. plot, contours), which would then contain keywords for customizing the respective objects.

This would be particularly useful for implementing things like axis labels or other plot annotations, but is probably quite tricky to achieve in the current implementation (see related issue on axis labels: #34)

One possible way of implementing this is to add YAML deserialization (and validation) functionality to the "algorithm classes" (Plot, ContoursProfiler) themselves.

In addition, a dedicated kafe2go YAML reader would probably be needed instead of the current FitBase.from_file. This "meta-reader" could then delegate the deserialization of the dataset, plot and contours namespaces to the deserialization methods of the appropriate objects (in this case FitBase, Plot and ContoursProfiler, respectively).
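
The delegation step of such a "meta-reader" could be sketched as follows. The example works on an already parsed mapping, so no YAML library is needed; split_namespaces and all keys in the sample mapping are hypothetical and do not describe the actual kafe2go file format.

```python
def split_namespaces(document, namespaces=("plot", "contours")):
    """Split a parsed YAML mapping into per-object sections.

    Returns (fit_section, extra_sections); the input mapping is not modified.
    """
    fit_section = {key: value for key, value in document.items()
                   if key not in namespaces}
    extra_sections = {name: dict(document.get(name) or {}) for name in namespaces}
    return fit_section, extra_sections


# Stand-in for the result of parsing a kafe2go YAML file; all keys are illustrative.
parsed = {
    "x_data": [1.0, 2.0, 3.0],
    "y_data": [2.1, 3.9, 6.2],
    "plot": {"x_label": "Time $t$ [s]"},
}
fit_section, extra = split_namespaces(parsed)
print(fit_section)
print(extra)
```

Each section could then be handed to the deserialization method of the object it belongs to, while the remaining keys go to the fit as before.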

default name for uncertainty band

In the latest version in the branch labels, the default name for the error band is
self.update_plot_kwargs('model_error_band', dict(label="{} error".format(self._fit.model_label)))
As "error" depends on the language, this is not a very suitable name. I suggest using
self.update_plot_kwargs('model_error_band', dict(label="{}".format(self._fit.model_label) + r'$\pm 1\sigma$'))

Matplotlib deprecation warning

When calling cpf.plot_profiles_contours_matrix(show_grid_for='contours') this deprecation warning shows up. The same happens for cpf.plot_profiles_contours_matrix(show_grid_for='all')

MatplotlibDeprecationWarning: Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.

ROOT installation instructions not working

The readme gives installation instructions for ROOT via the package manager.
However, when I tried installing ROOT this way it didn't work because apt could not find the described packages.
Is it perhaps necessary to add a repository first?

Preferred matplotlib frontend

The kafe2 description previously listed PyQT4 as the default matplotlib backend.
However, the backend specified in kafe2/config/kafe2.matplotlibrc.conf is TkAgg.
I changed the description to list tkinter as the default backend.
Is this correct?

Error when updating data of fit objects

Currently it's possible to update the data for all fit objects.
When the new data has a different dimension than the original data, this causes problems when performing the fit as the parametric model is not updated.
This currently affects indexed and XYMulti fits.
I was able to fix the rest in #46.
This probably needs some more attention from other people.
Please refer to #46 for the discussion.

Add label modifiers

By default the x and y labels are $x$ and $y$ as set in the default configuration. The user should be able to easily modify those labels, e.g. to Time $t$ [s].
Generally: quantity $symbol$ [unit] as in kafe1.
In kafe1 this was done when creating the dataset class kafe.dataset.Dataset(data=None, title='Untitled Dataset', axis_labels=['x', 'y'], axis_units=['', ''], **kwargs), which required the user to set quantity $symbol$ as the axis label and the unit separately. The brackets around the unit were added by the program.
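
The label convention quantity $symbol$ [unit] can be expressed as a small helper. This is only a sketch of the string handling; axis_label is a made-up name and not part of kafe or kafe2.

```python
def axis_label(quantity, unit=""):
    """Combine 'quantity $symbol$' with an optional unit in square brackets."""
    quantity = quantity.strip()
    unit = unit.strip()
    return "%s [%s]" % (quantity, unit) if unit else quantity


print(axis_label("Time $t$", "s"))  # Time $t$ [s]
```

An empty unit leaves the label unchanged, so dimensionless quantities do not get empty brackets.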

The names model 0 and data 0 on the right side of the plot should be customizable as well.

To Do:

  • add function to modify x- and y-labels (#32) needs to be reworked for the new Plotting Class
  • add function to modify the name of model and data

Questions:

  • where to set the x- and y-labels? In the XYContainer and get it from the container object when plotting? --> Yes
  • modify data 0 similarly to kafe1 by setting a title in the XYContainer?
  • modify model 0 by specifying a model name in the XYFit object?

Unit tests

The unit tests are currently all failing on the travis system.
The reason seems to be that Travis runs the tests twice. The first run is fine, but when it tries to run the tests a second time, a numpy import error makes the tests fail.

plot.customize() needs iterable value(s)

The plot.customize() function needs iterable values.
For example the call plot.customize('data', 'markersize', 3) will fail with:

  File "/home/cedric/.local/lib/python3.7/site-packages/kafe2/fit/_base/plot.py", line 1071, in customize
    for _val in values:
TypeError: 'int' object is not iterable

As a workaround plot.customize('data', 'markersize', [3]) can be used.
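
A possible fix is to normalise the argument before iterating over it. The helper below is a sketch of that idea, not the actual kafe2 implementation; as_value_list is a hypothetical name. Strings are treated as single values because iterating over them would yield individual characters.

```python
def as_value_list(values):
    """Wrap scalars and strings in a list; convert other iterables to a list."""
    if isinstance(values, (str, bytes)):
        return [values]
    try:
        return list(values)
    except TypeError:
        # not iterable, e.g. an int or a float
        return [values]


print(as_value_list(3), as_value_list([3, 4]), as_value_list("o"))
```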

kafe formatting annotations

kafe has implemented annotations for formatting functions and their parameters.
Should this system be copied for kafe2?

macOS Python 2.7 compatibility

macOS 10.15.2 with Python 2.7 via MacPorts

  • kafe2go was installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin and was not automatically on the PATH
  • Examples 001, 002, 006, 101 (I stopped testing at that point) have problems with the tkagg backend:
    from matplotlib.backends import _tkagg
    ImportError: cannot import name _tkagg
    → apparently this is caused by the order of the imports. If I import matplotlib.pyplot before kafe2, the examples work (at least 001 and 002).
  • I then also installed kafe2 for Python 3.8. The tkagg problem described above does not occur there.

set name of independent variable (x) in graphical output

In most practical examples the name of the variable on the x-axis is not "x".
This convention in kafe2 is ~ok for specifying the code of the model function, but not acceptable in the graphical output - here, a possibility must be provided to specify another symbol reflecting
the real nature of the x-data (e.g. A for amplitude, E for energy, Phi for phase, ...)

Preferred rst line length

The rst files written by @dsavoiu are formatted to have very short lines.
I do not understand the reason, particularly since code comments seem to have longer lines.

show_info_box LaTeX default

PlotFigureBase.show_info_box does not format the info box as LaTeX by default.
Should this behaviour be changed?

@abstractproperty with Py2 and Py3

Since Python 3.3 the decorator @abc.abstractproperty is deprecated. Instead the usage of @property with @abc.abstractmethod is recommended.
This, however, is not supported with Python 2.7.
The code currently contains both options. This behavior should be unified.

One possible solution is to add a new decorator like this:

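import sys
from abc import abstractmethod, abstractproperty

def compatibleabstractproperty(func):
    # Python >= 3.3: stack property on top of abstractmethod
    if sys.version_info >= (3, 3):
        return property(abstractmethod(func))
    else:
        # older versions, including Python 2.7: fall back to abstractproperty
        return abstractproperty(func)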

or we could stick to the deprecated @abc.abstractproperty.
See https://stackoverflow.com/questions/5960337/how-to-create-abstract-properties-in-python-abstract-classes for more details.
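
For reference, the recommended Python 3 form stacks @property on top of @abc.abstractmethod. The sketch below uses Python 3 only syntax, and the class names Shape and Square are made up for the example.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @property
    @abstractmethod
    def area(self):
        """Area of the shape."""


class Square(Shape):
    def __init__(self, side):
        self._side = side

    @property
    def area(self):
        return self._side ** 2


try:
    Shape()
    instantiated = True
except TypeError:
    # abstract classes with unimplemented abstract properties cannot be instantiated
    instantiated = False
print(instantiated, Square(3).area)  # False 9
```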

@dsavoiu and @JohannesGaessler, what's your opinion on this matter?

Chi_Square cost functions aren't guarded for NaN

The nll cost functions in base/cost.py all have

        # guard against returning NaN
        if np.isnan(_log_likelihood_ratio):
            return np.inf

Those checks are not performed for Chi Square Cost functions. Is this intentional? If no such check is in place, there are sometimes different results than with a similar check.
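
Applied to a chi-square cost function, the same guard could look like the sketch below. It assumes uncorrelated uncertainties and is not the kafe2 implementation; chi2_guarded is a name invented for this example.

```python
import numpy as np


def chi2_guarded(data, model, sigma):
    """Sum of squared, sigma-scaled residuals; returns inf instead of NaN."""
    residuals = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    chi2 = np.sum((residuals / np.asarray(sigma, dtype=float)) ** 2)
    # guard against returning NaN
    if np.isnan(chi2):
        return np.inf
    return chi2


print(chi2_guarded([1.0, 2.0], [1.0, 3.0], [1.0, 1.0]))
print(chi2_guarded([1.0], [np.nan], [1.0]))
```

Returning infinity lets a minimizer treat the offending parameter values as a very bad fit instead of propagating NaN through its comparisons.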
This is a quick fix. Can @dsavoiu or @JohannesGaessler comment, if this is intentional or not? If it's a bug, I'll fix it.

Status updates for long tasks

Some of the kafe2 tasks like calculating contours can take relatively long.
Some sort of feedback regarding the current progress would be useful in such cases.

kafe2go doesn't support cost functions

kafe2go currently does not support saving or loading the cost functions used in a fit.

When the user uses any cost function other than the default cost function, the fit can't be reproduced solely by loading the fit file again.
