
pyhf's Introduction

scikit-hep: metapackage for Scikit-HEP


Project info

The Scikit-HEP project is a community-driven and community-oriented project with the aim of providing Particle Physics at large with an ecosystem for data analysis in Python embracing all major topics involved in a physicist's work. The project started in Autumn 2016 and its packages are actively developed and maintained.

It is not just about providing core and common tools for the community. It is also about improving the interoperability between HEP tools and the Big Data scientific ecosystem in Python, and about improving the discoverability of utility packages and projects.

As far as the project's overall structure is concerned, it should be seen as a toolset rather than a toolkit.

Getting in touch

There are various ways to get in touch with project admins and/or users and developers.

scikit-hep package

scikit-hep is a metapackage for the Scikit-HEP project.

Installation

You can install this metapackage from PyPI with `pip`:
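pip install scikit-hep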

or you can use Conda through conda-forge:
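conda install -c conda-forge scikit-hep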

All the normal best-practices for Python apply; you should be in a virtual environment, etc.

Package version and dependencies

Please check the setup.cfg and requirements.txt files for, respectively, the list of supported Python versions and the list of Scikit-HEP project packages and dependencies included.

For any installed version of scikit-hep, the following displays the actual versions of all installed Scikit-HEP dependent packages, for example:
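(Assuming the metapackage's show_versions helper; check the package documentation if the entry point differs.)

>>> import skhep
>>> skhep.show_versions()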

Note on the versioning system:

This package uses Calendar Versioning (CalVer).

pyhf's People

Contributors

1nf0rmed, actions-user, alexander-held, aryan26roy, beojan, dependabot[bot], eschanet, henryiii, jzf2101, kanishk16, kratsg, lhenkelm, lnielsen, lorenzennio, lukasheinrich, marcogorelli, masonproffitt, matthewfeickert, moelf, nikoladze, ntadej, phinate, pineappleslikei, pre-commit-ci[bot], rhnsharma, saransh-cpp, sauerburger, tirkarthi, wernerd-cern, wiso


pyhf's Issues

implement staterror

Should be mostly straightforward, but requires additional bookkeeping of which samples participate. Essentially one additional constraint term per bin.

Bug Report: Optimizer not set when backend is changed

Description

If using a backend other than numpy_backend, we currently have to set the optimizer manually. However, this should be done automatically when the backend is changed.

Otherwise this causes an error, as shown below with pyhf.runOnePoint().

Expected Behavior

When pyhf.tensorlib is changed, this change should be detected automatically and pyhf.optimizer should also be updated appropriately.

Actual Behavior

When pyhf.tensorlib is changed, pyhf.optimizer must be set manually. If it is not, errors can occur, as seen below.

Steps to Reproduce

import pyhf
from pyhf.tensor.tensorflow_backend import tensorflow_backend
from pyhf.simplemodels import hepdata_like
import tensorflow as tf

if __name__ == '__main__':
    default_backend = pyhf.tensorlib
    pyhf.tensorlib = tensorflow_backend(session=tf.Session())
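    # Note: only pyhf.tensorlib is swapped here; pyhf.optimizer is left as the
    # default SciPy optimizer, which is what triggers the error shown below.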

    n_bins = 5
    binning = [n_bins, -0.5, n_bins + 0.5]
    data = [120.0] * n_bins
    bkg = [100.0] * n_bins
    bkgerr = [10.0] * n_bins
    sig = [30.0] * n_bins
    source = {
        'binning': binning,
        'bindata': {
            'data': data,
            'bkg': bkg,
            'bkgerr': bkgerr,
            'sig': sig
        }
    }

    pdf = hepdata_like(source['bindata']['sig'],
                       source['bindata']['bkg'],
                       source['bindata']['bkgerr'])
    data = source['bindata']['data'] + pdf.config.auxdata

    pyhf.runOnePoint(1.0, data, pdf,
                     pdf.config.suggested_init(),
                     pdf.config.suggested_bounds())

    # Reset backend
    pyhf.tensorlib = default_backend

Traceback:

Traceback (most recent call last):
  File "/home/mcf/anaconda3/envs/pyhf/lib/python3.6/site-packages/scipy/optimize/slsqp.py", line 380, in _minimize_slsqp
    fx = float(np.asarray(fx))
TypeError: float() argument must be a string or a number, not 'Tensor'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "runOnePoint_example.py", line 36, in <module>
    pdf.config.suggested_bounds())
  File "/home/mcf/Code/GitHub/pyhf/pyhf/__init__.py", line 412, in runOnePoint
    pdf, init_pars, par_bounds))
  File "/home/mcf/Code/GitHub/pyhf/pyhf/__init__.py", line 380, in generate_asimov_data
    loglambdav, asimov_mu, data, pdf, init_pars, par_bounds)
  File "/home/mcf/Code/GitHub/pyhf/pyhf/optimize/opt_scipy.py", line 26, in constrained_bestfit
    method='SLSQP', args=(data, pdf), bounds=par_bounds)
  File "/home/mcf/anaconda3/envs/pyhf/lib/python3.6/site-packages/scipy/optimize/_minimize.py", line 495, in minimize
    constraints, callback=callback, **options)
  File "/home/mcf/anaconda3/envs/pyhf/lib/python3.6/site-packages/scipy/optimize/slsqp.py", line 382, in _minimize_slsqp
    raise ValueError("Objective function must return a scalar")
ValueError: Objective function must return a scalar

From the traceback it can be seen that opt_scipy is being used instead of opt_tflow.

Checklist

  • Run git fetch to get the most up to date version of master
  • Searched through existing Issues to confirm this is not a duplicate issue
  • Filled out the Description, Expected Behavior, Actual Behavior, and Steps to Reproduce sections above or have edited/removed them in a way that fully describes the issue

increase coveralls tolerance to avoid false positives

Description

Coveralls fails if the test coverage drops by even 0.01%, which usually happens just because some code was removed. This makes it hard to assess a PR at first glance due to the red crosses. It should be possible for the check to fail only if the drop is larger than, say, 1%.

Make sure pipenv-based development workflow works

Description

We support a number of extras, like the various tensor backends, but as far as I know none of them are mutually exclusive and all of them are needed for running the unit tests.

We should therefore make sure pip install -e .[develop] installs all packages necessary to run the tests. I think pytest-benchmark is also missing (@kratsg mentioned that).

Add docstrings

The classes and methods of pyhf need docstrings for documentation of the code.

In anticipation of using Sphinx for the docs, the docstrings should follow something along the lines of PyTorch's style (maybe dipping into TensorFlow's style guide at times too). For an example, cf. PyTorch's Bernoulli distribution.
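As a rough reference, a hypothetical docstring in that style could look like the following (the method name and signature here are illustrative, not an existing pyhf API):

def normal_logpdf(self, x, mu, sigma):
    r"""
    Compute the log of the probability density of the Normal distribution.

    Example:

        >>> backend.normal_logpdf(0.5, mu=0.0, sigma=1.0)  # doctest: +SKIP
        -1.0439385332046727

    Args:
        x (tensor or float): The value at which to evaluate the density.
        mu (tensor or float): The mean of the Normal distribution.
        sigma (tensor or float): The standard deviation of the Normal distribution.

    Returns:
        tensor: The log density evaluated at ``x``.
    """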

Make Issue and Pull Request Templates

Having uniform templates for feature requests, bug reports, feature additions, and bug patches will make things easier for maintainers to parse and organize quickly.

Bug Report: Understand and fix TensorFlow scaling behavior

When benchmarking the performance of the TensorFlow backend and optimizer, the performance decreases (the run time of the test increases) with the number of iterations performed. This should not happen and needs to be understood and fixed.

This has been noticed in Issue #77.

Expected Behavior

The performance of the backend should be independent of the number of iterations, and should be distributed about some central value for a particular configuration of a fit.

Actual Behavior

The performance depends on the number of iterations at a given benchmark point.

(Benchmark timing histograms for n_runs=5 and n_runs=8: benchmark_times_5_log, benchmark_times_8_log.)

Steps to Reproduce

Enable the TensorFlow backend in tests/test_benchmark.py and run

pytest --benchmark-sort=mean --benchmark-histogram=benchmark_tf tests/test_benchmark.py

Checklist

  • Run git fetch to get the most up to date version of master
  • Searched through existing Issues to confirm this is not a duplicate issue
  • Filled out the Description, Expected Behavior, Actual Behavior, and Steps to Reproduce sections above or have edited/removed them in a way that fully describes the issue

create benchmarking code

We'll be interested in scaling behaviour w.r.t. the number of bins / channels / systematics / etc. As a start, it should not be too hard to measure e.g. the pdf evaluation time for a simple hepdata-like model.

For example, the code snippet below sets up a 2-bin likelihood, and we can generate N-bin likelihoods with an easy loop that spits out these JSON specs.

@matthewfeickert maybe this is something for you?

source = {
  "binning": [2,-0.5,1.5],
  "bindata": {
    "data":    [120.0, 180.0],
    "bkg":     [100.0, 150.0],
    "bkgerr":     [10.0, 10.0],
    "sig":     [30.0, 95.0]
  }
}

from pyhf.simplemodels import hepdata_like
pdf  = hepdata_like(source['bindata']['sig'], source['bindata']['bkg'], source['bindata']['bkgerr'])
data = source['bindata']['data'] + pdf.config.auxdata

#now the call we want to benchmark:
pdf.logpdf(pdf.config.suggested_init(), data)

the timeit module will be useful for this

https://docs.python.org/2/library/timeit.html
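A minimal sketch of what such a benchmark could look like, building only on the hepdata_like snippet above (generate_source and the choice of bin counts here are illustrative, not existing pyhf benchmarking code):

import timeit
from pyhf.simplemodels import hepdata_like

def generate_source(n_bins):
    # spit out an N-bin version of the JSON spec above
    return {
        'binning': [n_bins, -0.5, n_bins + 0.5],
        'bindata': {
            'data': [120.0] * n_bins,
            'bkg': [100.0] * n_bins,
            'bkgerr': [10.0] * n_bins,
            'sig': [30.0] * n_bins,
        },
    }

for n_bins in [1, 10, 100]:
    source = generate_source(n_bins)
    pdf = hepdata_like(source['bindata']['sig'], source['bindata']['bkg'],
                       source['bindata']['bkgerr'])
    data = source['bindata']['data'] + pdf.config.auxdata
    init = pdf.config.suggested_init()
    # time the call we want to benchmark: a single logpdf evaluation
    t = timeit.timeit(lambda: pdf.logpdf(init, data), number=100)
    print(n_bins, t / 100)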

allow custom setting of POI

Right now we hardcode the POI (the parameter associated with the only normfactor in the model), but in general there are multiple normfactors and we should allow annotating the model to select which parameter is the POI.

shapesys breaks in pytorch

For some reason models with shapesys break in pytorch (infinite dimension-counting loop). A good test case is hepdata_like(...).

Back-end submodule structuring should be less convoluted?

Description

The current way of importing backends is a little redundant:

from pyhf.tensor.numpy_backend import numpy_backend
from pyhf.tensor.pytorch_backend import pytorch_backend
from pyhf.tensor.tensorflow_backend import tensorflow_backend
from pyhf.tensor.mxnet_backend import mxnet_backend

Something more pythonic would be

from pyhf.tensor.backends import numpy_backend, pytorch_backend, tensorflow_backend, mxnet_backend

Or is there something technically difficult about doing it this way?
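A minimal sketch of what such a re-export module could look like (a hypothetical pyhf/tensor/backends.py, not the actual pyhf layout); one wrinkle is that the non-numpy backends pull in optional heavy dependencies, so the re-exports would have to tolerate missing packages:

# pyhf/tensor/backends.py -- hypothetical flatter import layout (sketch only)
from pyhf.tensor.numpy_backend import numpy_backend  # always available

try:
    from pyhf.tensor.tensorflow_backend import tensorflow_backend
except ImportError:  # tensorflow not installed
    tensorflow_backend = None

try:
    from pyhf.tensor.pytorch_backend import pytorch_backend
except ImportError:  # torch not installed
    pytorch_backend = None

try:
    from pyhf.tensor.mxnet_backend import mxnet_backend
except ImportError:  # mxnet not installed
    mxnet_backend = None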

Notebooks failing as result of no 'auxdata' attribute

Issue

While testing the Jupyter notebooks in Binder it was noticed that they fail in the prep_data() function as a result of the line

data = source['bindata']['data'] + pdf.auxdata

throwing an error as pdf (defined at pdf = hfpdf(spec)) does not have an 'auxdata' attribute.

Stack Trace

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-133b58fcb35a> in <module>()
     10 }
     11 
---> 12 d,pdf = prep_data(source)
     13 init_pars = pdf.config.suggested_init()
     14 par_bounds = pdf.config.suggested_bounds()

<ipython-input-2-b59dcf407366> in prep_data(source)
     30     }
     31     pdf  = hfpdf(spec)
---> 32     data = source['bindata']['data'] + pdf.auxdata
     33     return data, pdf

AttributeError: 'hfpdf' object has no attribute 'auxdata'

Fix

This can be fixed by the following line replacement:

data = source['bindata']['data'] + pdf.config.auxdata

implement shared shapesys

Unclear how the ROOT implementation deals with incompatible shapesys definitions that share a name, e.g. definitions that have different histograms of relative uncertainty per bin but the same name. Maybe this is undefined behaviour anyway and we can ignore it?
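For concreteness, a hypothetical spec fragment illustrating the clash: two samples declare a shapesys modifier with the same name but different per-bin relative-uncertainty histograms.

"samples": [
  {"name": "background1", "data": [100.0, 150.0],
   "mods": [{"type": "shapesys", "name": "shared_unc", "data": [10.0, 10.0]}]},
  {"name": "background2", "data": [50.0, 50.0],
   "mods": [{"type": "shapesys", "name": "shared_unc", "data": [5.0, 20.0]}]}
]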

speed up CI tests (do we need all conda packages?)

By using Conda, the setup phase of the CI jobs has unfortunately become a bit slower than without Conda. Maybe we can speed the jobs up again by checking whether we actually need all the packages that we install during CI.

Understand differences in hypotest() test statistic across different backends

As noted in Issue #77, when running pyhf.utils.hypotest() (formerly pyhf.runOnePoint()) the pyhf backends do not all agree on the value of the test statistic, q_μ, that they return for the same model. Specifically, the PyTorch backend returns a test statistic that seems to be consistently smaller than the NumPy and TensorFlow backends. This needs to be investigated and understood/fixed.

First items:

  • Plot the differences in the test statistic for the backends as a function of the number of nuisance parameters (bins in this specific case) in the fit
  • Test and validate the unconstrained_bestfit() method
  • Test and validate the constrained_bestfit() method

normalize tensorlib behaviour in `.sum(...)`

There are slightly different semantics between pytorch and numpy for sums that result in a scalar. I notice @pablodecm also did something specific in the TF backend. We need to check what exactly the semantics are in numpy.

In [2]: import pyhf.tensor.numpy_backend
In [3]: import pyhf.tensor.tensorflow_backend
In [4]: import pyhf.tensor.pytorch_backend
In [6]: backends = [pyhf.tensor.numpy_backend.numpy_backend(), pyhf.tensor.tensorflow_backend.tensorflow_backend(), pyhf.tensor.pytorch_backend.pytorch_backend()]
In [8]: for b in backends: print b.sum(b.astensor([1,2,3])).shape
()
()
(1L,)
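A small sketch of the numpy reference semantics that the other backends could be normalized to (plain numpy only, no pyhf specifics):

import numpy as np

a = np.asarray([1.0, 2.0, 3.0])
total = np.sum(a)

print(total.shape)   # () -- a 0-d, scalar-shaped result, not (1,)
print(float(total))  # 6.0, so float() conversion works

# A backend whose full reduction yields a 1-element tensor could normalize by
# reshaping, e.g. returning result.reshape(()) (or the backend's equivalent).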

Update schema to separate modifiers out

Need to group the definitions of modifiers/data into separate definitions, as each modifier expects different data. Something like

oneOf:
- {$ref: '#/definitions/modifier_definitions/histosysdata'}
- {$ref: '#/definitions/modifier_definitions/normsysdata'}
- {$ref: '#/definitions/modifier_definitions/shapesysdata'}

is desired. This was raised in #113 and is related to #105.

Define API pdf sampling via e.g. probabilistic frameworks like edward

Description

We want to have the ability to sample from the pdf. A nice way to do this is via native probabilistic programming frameworks like Edward that hook somewhat natively into the tensor backends (not sure if there are similar projects for PyTorch, MXNet @cranmer ?). It is not yet clear to me how to do this cleanly across numpy/TF/PyTorch/MXNet.

For reference I added this super-simplified notebook to show how to sample something like

p(n, a | α) = Pois(n | ν(α)) · Gaus(a | α)

which is the core structure of HF right now:

https://github.com/diana-hep/pyhf/blob/master/examples/experiments/edwardpyhf.ipynb

optimize loops in hfpdf

At various stages, python loops are used in the hfpdf class that could feasibly be converted into more efficient numpy operations

For example:

Right now there is a python loop where we apply the same function for each bin; this can be batched:
https://github.com/lukasheinrich/pyhf/blob/master/pyhf/__init__.py#L220

@matthewfeickert might look into this
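A minimal sketch of the kind of change meant here, with hypothetical helper names (piecewise-linear interpolation is just an illustrative per-bin operation, not the actual pyhf code at the link above):

import numpy as np

def interpolate_per_bin_loop(nominal, up, down, alpha):
    # per-bin python loop: apply the same piecewise-linear shift bin by bin
    return [n + (u - n) * alpha if alpha > 0 else n + (n - d) * alpha
            for n, u, d in zip(nominal, up, down)]

def interpolate_batched(nominal, up, down, alpha):
    # same operation expressed as vectorized numpy arithmetic over all bins at once
    nominal, up, down = (np.asarray(x, dtype=float) for x in (nominal, up, down))
    delta = np.where(alpha > 0, up - nominal, nominal - down)
    return nominal + delta * alpha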

Understand Python 3.5 Failures

In Python 3.5 only, resetting the TensorFlow backend graph and session after each run (identified in Issue #102) results in test failures. This can be seen in the Python 3.5 output in Travis CI build #333.2 for PR #92.

A brief excerpt from the trace follows:

tests/test_benchmark.py ...FFF...                                        [ 29%]
tests/test_import.py F                                                   [ 32%]
tests/test_notebooks.py .                                                [ 35%]
tests/test_optim.py ...                                                  [ 45%]
tests/test_pdf.py FFFFFFF                                                [ 67%]
tests/test_tensor.py ...                                                 [ 77%]
tests/test_validation.py FFFFF.F                                         [100%]
=================================== FAILURES ===================================
_____________________ test_runOnePoint[tensorflow-10_bins] _____________________
self = <tensorflow.python.client.session.Session object at 0x7f06b412c160>
fn = <function BaseSession._do_run.<locals>._run_fn at 0x7f0699f82ea0>
args = ({b'concat:0': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], dtype=float32)}, [b'strided_slice_8:0'], [], None, None)
message = 'Input is not invertible.\n\t [[Node: MatrixInverse = MatrixInverse[T=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Reshape_1)]]'
m = <_sre.SRE_Match object; span=(27, 50), match='[[Node: MatrixInverse ='>
    def _do_call(self, fn, *args):
      try:
>       return fn(*args)
../../../miniconda/envs/test-environment/lib/python3.5/site-packages/tensorflow/python/client/session.py:1327: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

As this does not occur for Python 3.6, it would be worth understanding why. Additionally, it is worth asking whether Python 3.5 should be supported at all, given that most Python 3 users follow releases and have Python 3.6 installed.

add pythonic API for spec building

Description

We want an API similar to (or ideally the same as) the ROOT HistFactory python bindings for iteratively building up a JSON spec for pyhf.

http://ghl.web.cern.ch/ghl/html/HistFactoryDoc.html

  #!/usr/bin/env python

  #
  # A pyROOT script demonstrating
  # an example of writing a HistFactory
  # model using python
  #
  # This example was written to match
  # the example.xml analysis in
  # $ROOTSYS/tutorials/histfactory/
  #
  # Written by George Lewis
  #


  def main():

      try:
          import ROOT
      except ImportError:
          print("It seems that pyROOT isn't properly configured")
          return

      """
      Create a HistFactory measurement from python
      """

      InputFile = "./data/example.root"

      # Create the measurement
      meas = ROOT.RooStats.HistFactory.Measurement("meas", "meas")

      meas.SetOutputFilePrefix("./results/example_UsingPy")
      meas.SetPOI("SigXsecOverSM")
      meas.AddConstantParam("Lumi")
      meas.AddConstantParam("alpha_syst1")

      meas.SetLumi(1.0)
      meas.SetLumiRelErr(0.10)
      meas.SetExportOnly(False)

      # Create a channel

      chan = ROOT.RooStats.HistFactory.Channel("channel1")
      chan.SetData("data", InputFile)
      chan.SetStatErrorConfig(0.05, "Poisson")

      # Now, create some samples

      # Create the signal sample
      signal = ROOT.RooStats.HistFactory.Sample("signal", "signal", InputFile)
      signal.AddOverallSys("syst1", 0.95, 1.05)
      signal.AddNormFactor("SigXsecOverSM", 1, 0, 3)
      chan.AddSample(signal)

      # Background 1
      background1 = ROOT.RooStats.HistFactory.Sample("background1", "background1", InputFile)
      background1.ActivateStatError("background1_statUncert", InputFile)
      background1.AddOverallSys("syst2", 0.95, 1.05)
      chan.AddSample(background1)

      # Background 2
      background2 = ROOT.RooStats.HistFactory.Sample("background2", "background2", InputFile)
      background2.ActivateStatError()
      background2.AddOverallSys("syst3", 0.95, 1.05)
      chan.AddSample(background2)

      # Done with this channel
      # Add it to the measurement:

      meas.AddChannel(chan)

      # Collect the histograms from their files,
      # print some output
      meas.CollectHistograms()
      meas.PrintTree()

      # One can print XML code to an
      # output directory:
      # meas.PrintXML("xmlFromCCode", meas.GetOutputFilePrefix())

      meas.PrintXML("xmlFromPy", meas.GetOutputFilePrefix())

      # Now, do the measurement
      ROOT.RooStats.HistFactory.MakeModelAndMeasurementFast(meas)


  if __name__ == "__main__":
      main()

Add spec validation on hfpdf.__init__

Description

We now have a spec, which should allow us to do a lint-type check for validity. Passing that doesn't guarantee correctness, just that the spec is well-formed.

We might implement a validation routine whose first step would be linting; later we can add other checks (e.g. checking that all sample data have the same length). A sketch of the linting step follows below.
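A minimal sketch of the linting step using the jsonschema package (the schema path here is hypothetical):

import json
import jsonschema

def validate_spec(spec, schema_path='validation/spec.json'):
    # lint-type check: raises jsonschema.ValidationError if the spec is not
    # well-formed; it does not guarantee the spec is physically sensible
    with open(schema_path) as f:
        schema = json.load(f)
    jsonschema.validate(spec, schema)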

Pythonic Specification Generation

I strongly suggest we keep the most stringent JSON spec in #105 as the only JSON schema we have. A "lite JSON" would probably confuse newcomers and those wishing to use an API. Instead, a more user-friendly, pythonic generation layer can be built. Something like the following is certainly possible [inspired by how constructions does it]:

NPs = [NormSys("JES1"), HistoSys("JES2")]
sample = Sample(
    NormSys("JES1") / Data(.....),
    HistoSys("JES2") / Data(....)
)
chan = "singlechannel" / Channel( "signal" / sample)

then doing something like json.dumps(chan) would work out of the box, as you can define how to serialize such an object. chan can implement vars(chan), which returns the simple python structure that can be passed into hfpdf -- similar to how argparse.Namespace does it.

Caveat: the division operator is not strictly needed. One could just as easily do NormSys("JES1").Data("....") and so on.

JSON Schema / Spec discussion

As initiated in #104, there are questions raised about the spec and the way forward with two overarching goals in mind:

  • intuitive and clean for the user
  • fully-specified and documentable by a schema for an API

There are two main issues raised as described below.

Fully-Specified

A schema like

{
  "singlechannel": {
    "background": {
      "data": [1,2,3,4],
      "mods": [...]
    },
    "signal": {
      "data": [1,2,3,4],
      "mods": [...]
    }
  }
}

is not fully-specified as it contains a dictionary with variable key-names (singlechannel, background, signal). A more fully-specified spec looks like so

[
  {
    "name": "singlechannel",
    "type": "channel",
    "samples": [
      {
        "name": "background",
        "data": [1,2,3,4],
        "mods": [...]
      },
      {
        "name": "signal",
        "data": [1,2,3,4],
        "mods": [...]
      }
    ]
  }
]

where an array of channels, and samples are specified. This is a first proposal, but still has a nested array which may or may not be useful for many -- and flattening the array is a possibility, through a process of denormalization (see firebase docs).

Intuitive-ness

Currently, modifications are defined as an array

        "mods": [
          {"type": "shapesys", "name": "mod_JES1", "data": [1,2,3,4]},
          {"type": "shapesys", "name": "mod_JES2", "data": [1,2,3,4]},
          {"type": "shapesys", "name": "mod_FlavTag", "data": [1,2,3,4]}
        ]

however, one of the drawbacks is that it makes a user think of each modification as an entire "object". That is, this would define three modification objects, which is not necessarily true. In spirit, a modification refers to a nuisance parameter, such as mod_JES1, along with configurations for it.

first proposal

A first proposal to make this more intuitive was to structure the modifications as a dictionary, with each key name referring to the nuisance parameter that is of interest

        "mods": {
          "mod_JES1": {"type": "shapesys", "data": [1,2,3,4]},
          "mod_JES2": {"type": "shapesys", "data": [1,2,3,4]},
          "mod_flavTag": {"type": "shapesys", "data": [1,2,3,4]}
        }

A drawback is that we now have configurable dictionary key names, which does not help with JSON Schema / API specification.

second proposal

The second proposal separates the nuisance parameter from the actual modification for a given sample/channel:

        "NPs": [
          {"name": "mod_JES1", "mod": {"type": "shapesys", "data": [1,2,3,4]}},
          {"name": "mod_JES2", "mod": {"type": "shapesys", "data": [1,2,3,4]}},
          {"name": "mod_flavTag", "mod": {"type": "shapesys", "data": [1,2,3,4]}},
        ]

implement sampling from pdf

Description

We would like to be able to sample from the pdf. For that we need to

  1. sample values for auxiliary measurements
  2. based on aux data, derive poisson rate parameters
  3. sample from poissons

e.g.

pdf = pyhf.pdf(...)
data  = pdf.sample()
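A minimal numpy sketch of the three steps above for a hepdata_like-style counting model (the Gaussian constraint and the function name are illustrative simplifications, not the pyhf implementation):

import numpy as np

def sample_toy(signal, bkg, bkgerr, mu=1.0, rng=np.random):
    signal, bkg, bkgerr = map(np.asarray, (signal, bkg, bkgerr))
    # 1. sample values for the auxiliary measurements (per-bin constraint terms)
    gammas = rng.normal(loc=1.0, scale=bkgerr / bkg)
    # 2. based on the sampled aux data, derive the Poisson rate parameters
    rates = mu * signal + gammas * bkg
    # 3. sample the observed main measurement from the Poissons
    return rng.poisson(rates)

# e.g. one toy for the 2-bin example spec used earlier in this thread
print(sample_toy(signal=[30.0, 95.0], bkg=[100.0, 150.0], bkgerr=[10.0, 10.0]))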
