center's Issues

Demystifying fixed-point differentiation

The ‘magic sauce’ that allows neos to differentiate through the fitting process is based on an implementation of fixed-point differentiation. As I understand it, the gist of how this works is that if a function has a fixed point, i.e. f(x) = x for some x (e.g. a minimize(F, x_init) routine evaluated at x_init = the minimum of F), then one can evaluate the gradients through a second pass of the function, evaluated close to the fixed point.

It would be nice to consolidate some thoughts (perhaps in a notebook) on the technical details for those interested. The specific algorithm used in neos can be found in section 2.3 of this paper (two-phase method).
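To sketch the two-phase idea concretely, here is a minimal, hypothetical JAX example (an illustration of the general technique, not the actual neos implementation): phase one runs the forward fixed-point iteration, and phase two solves a second fixed-point problem for the adjoint instead of differentiating through the unrolled loop.

```python
import jax
import jax.numpy as jnp
from functools import partial

# Phase 1: iterate x <- f(theta, x) to (approximately) reach the fixed point x*.
# Phase 2 (backward pass): instead of differentiating through the unrolled loop,
# solve the adjoint fixed point u = x_bar + (df/dx)^T u, then push u through df/dtheta.

@partial(jax.custom_vjp, nondiff_argnums=(0,))
def fixed_point(f, theta, x_init):
    x = x_init
    for _ in range(500):
        x = f(theta, x)
    return x

def fixed_point_fwd(f, theta, x_init):
    x_star = fixed_point(f, theta, x_init)
    return x_star, (theta, x_star)

def fixed_point_bwd(f, res, x_bar):
    theta, x_star = res
    _, vjp_x = jax.vjp(lambda x: f(theta, x), x_star)
    u = x_bar
    for _ in range(500):                      # second pass, evaluated at the fixed point
        u = x_bar + vjp_x(u)[0]
    _, vjp_theta = jax.vjp(lambda t: f(t, x_star), theta)
    return vjp_theta(u)[0], jnp.zeros_like(x_star)

fixed_point.defvjp(fixed_point_fwd, fixed_point_bwd)

# Example: one gradient-descent step on F(theta, x) = (x - theta)^2 has the
# minimiser x* = theta as its fixed point, so dx*/dtheta should be ~1.
F = lambda theta, x: (x - theta) ** 2
step = lambda theta, x: x - 0.1 * jax.grad(F, argnums=1)(theta, x)
print(jax.grad(lambda th: fixed_point(step, th, jnp.array(0.0)))(2.0))  # ~1.0
```

The appealing property is that the backward pass only needs f evaluated at (or near) the fixed point, so its cost does not depend on how many forward iterations were run.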

Road-map and milestones

Scope

From the discussions (particularly library versus framework in #7) I'm starting to grasp the scope of what we mean by "differentiable analysis". I had thought the aim was to package Neos/Inferno into a loss function to allow it to be used easily in HEP analyses. It seems, though, that the aim more broadly is to make the full analysis chain, from reco. ntuples to result, fully differentiable.

Milestones

Given the scale of this work, I'm wondering if it might be best to break the effort down into milestones which gradually work backwards from result to ntuple skimming. This would offer a clearer scope for each stage of development and allow us to constantly monitor performance in a realistic benchmark analysis (rather than differentiating ALL TEH THINGS and finding out that it doesn't beat a more traditional approach).

An example could be:

  1. Differentiably optimise a 1D cut on a summary stat (e.g. @alexander-held's example)
  2. Differentiably optimise the binning of the whole summary stat.
  3. Differentiably optimise the summary stat (e.g. Inferno/Neos) on a fixed set of training samples
  4. Differentiably optimise the skimming of the training samples

This would allow us to continually evaluate the gain in sensitivity at every step, and help convince ourselves and others of the advantage of DA.

Framework versus library

At the end of the day, we want other researchers to use DA for official analyses. These are already time-consuming affairs, and on top of that we are then expecting the people involved to completely change their approach to something with which they are most likely unfamiliar. Whilst every analysis will be slightly different, I think it would make the transition much smoother (pun mildly intended) if we were to offer a framework with a set series of steps to help walk new researchers through the process of making an analysis differentiable.

This framework could of course call our own internal libraries, but something with an intuitive API that abstracts away the technicalities and provides a recommended workflow would (I think) be much more appealing than being handed a large library of new methods and classes and being expected to piece everything together from a few limited examples.

As the community becomes more au fait with DA, the importance of our particular framework may diminish as various groups either build their own frameworks or write their own libraries in place of ours. I think that getting to this point would be good; however, reaching it requires a critical mass of experience within the community, which an introductory framework could help accelerate. An example might be how Keras made DL vastly more accessible to new practitioners, but as community knowledge has grown, people are now moving to the more low-level libraries that Keras previously abstracted away.

White paper organization

We agreed at the kick-off meeting to write a white paper that highlights this fairly new analysis paradigm for people new to it.

I think this may end up becoming two papers: the ‘whys’ (initial motivations, existing efforts) and the ‘hows’ (evaluation and comparisons of the methods implemented with common tools in a realistic setting).

What should we include (in paper 1?) :)

Smooth histograms — which methods are best?

Following up on the discussion that took place during the kick-off meeting, an interesting point arose on the issue of which methods are best used to approximate a set of binned counts.

In the case where one has a neural network-based observable, one could apply the softmax function exp(nn / T) / sum(exp(nn / T)) (T = temperature hyperparameter) to the nn output, as one would for a classification problem. This essentially puts every event in all bins, but weights each bin with the corresponding nn output (normalized to 1). The histogram is then the elementwise sum of the softmaxed output over all events.
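A minimal sketch of this recipe (illustrative only; names and shapes are assumptions):

```python
import jax
import jax.numpy as jnp

# nn_out: shape (n_events, n_bins), one network output node per bin;
# T is the temperature hyperparameter controlling how "hard" the binning is.
def softmax_hist(nn_out, T=0.1):
    weights = jax.nn.softmax(nn_out / T, axis=-1)  # each event spread over all bins, summing to 1
    return weights.sum(axis=0)                     # elementwise sum over events -> soft bin counts
```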

An alternative to this is to take an approach using a kernel density estimate (kde), which is defined by instantiating a distribution (kernel) centered around each data point (e.g. a standard normal distribution) and averaging their contributions. The smoothness of the resulting estimate is controlled with a 'bandwidth' hyperparameter, which scales the widths of the kernels. Once you choose a bandwidth, you can get 'counts' by integrating the pdf over a set of intervals (bins). One can equivalently use the cumulative distribution function.
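A minimal sketch of the kde recipe (again illustrative, not any particular library's API), using differences of the Gaussian cdf at the bin edges:

```python
import jax.numpy as jnp
from jax.scipy.stats import norm

def kde_hist(data, bin_edges, bandwidth=0.1):
    # kde cdf at each bin edge: average of the per-event Gaussian kernel cdfs
    cdf = norm.cdf(bin_edges[None, :], loc=data[:, None], scale=bandwidth).mean(axis=0)
    # integrated density per bin, rescaled back to event counts
    return (cdf[1:] - cdf[:-1]) * data.shape[0]
```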

Revisiting the case with the nn-based observable, one could then make the network output one value (regression net), and then define a kde over the set of outputs from a batch of events. The question is then: is there any clear advantage in expressivity or principle from using a kde, softmax, or other methods?

Another interesting question is where these approaches break down, e.g. using a kde in this way seems to be very sensitive to the bandwidth.

It's also worth noting that a kde may make more sense for general use as a candidate for smooth histogramming, but the case of the nn observable may be more nuanced.

Automatic differentiation for data analysis

Hi all,

I was pointed to this working group by a colleague. First, some background: I am a researcher in Lattice QCD. I have found that the typical data analysis that we usually do is greatly simplified using automatic differentiation. The most relevant reference is:

Automatic differentiation for error analysis of Monte Carlo data
https://inspirehep.net/literature/1692759

The basic idea is to use AD to keep track of the derivatives of your data analysis results with respect to the input. This is all that is needed for error propagation, and it turns out to be useful for sensitivity analysis (i.e. how much does my result depend on this parameter that I have fixed). A complete implementation of these ideas can be found in:

https://gitlab.ift.uam-csic.es/alberto/aderrors.jl

This package (in Julia) keeps track of the derivatives w.r.t. the input in your code. It is quite general, and although the main purpose is error analysis in Lattice QCD, I am convinced that the ideas explained in the reference and implemented there might have many other applications. In my case it has changed how I do data analysis. An overview of the capabilities of the software (it only touches the basics) is available in a recent proceedings contribution:

Automatic differentiation for error analysis
https://inspirehep.net/literature/1837727
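Purely to illustrate the basic idea (a hypothetical JAX sketch, not the aderrors.jl API), linear error propagation via AD might look like:

```python
import jax
import jax.numpy as jnp

# Linear error propagation via AD: for a result f(x) of correlated inputs x
# with covariance C, sigma_f^2 = J C J^T, where J = df/dx comes from autodiff.
def propagate_error(analysis, inputs, covariance):
    value = analysis(inputs)
    jac = jax.jacobian(analysis)(inputs)   # derivatives of the result w.r.t. every input
    variance = jac @ covariance @ jac      # the same Jacobian drives sensitivity analysis
    return value, jnp.sqrt(variance)

# toy "analysis" combining two measured quantities
analysis = lambda x: x[0] / x[1] + jnp.log(x[1])
x = jnp.array([1.2, 3.4])
cov = jnp.diag(jnp.array([0.01**2, 0.02**2]))
value, error = propagate_error(analysis, x, cov)
```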

I would be happy to join this working group. I am also happy to give an overview of how we have been using these techniques for the past years, and of course, very interested to learn what you have been doing.

Thanks!

A.

Differentiable analysis: from scratch vs existing methods

An interesting point from the kickoff meeting was that there are broadly two directions that any new tools should try to consider:

  • Creating a differentiable workflow from scratch
  • Interfacing with existing approaches

It would be very constructive to highlight the considerations one should make for each case :)

E.g. Kyle Cranmer pointed out that to be truly ‘optimal’, regardless of approach, one should learn the true likelihood function rather than objectives like those targeted by INFERNO (inverse Fisher information) or neos (p-value from a hypothesis test).

List of HEP primitives to make differentiable

It was pointed out in the IRIS-HEP analysis systems meeting today that it would be good to compile a list of operations in HEP workflows that are not normally differentiable, e.g. cutting and histogramming data.

The (evolving) list — please add below!

  • Histograms
  • Cuts
  • Fitting
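As an example of relaxing one of these, here is a hypothetical sketch of a sigmoid-based soft cut in JAX, where the hard step function pT > threshold is replaced by per-event weights:

```python
import jax
import jax.numpy as jnp

def soft_cut(pt, threshold, steepness=10.0):
    # weights approach 1 well above the cut and 0 well below it;
    # larger steepness recovers the hard cut in the limit
    return jax.nn.sigmoid(steepness * (pt - threshold))

pt = jnp.array([20.0, 35.0, 50.0])
soft_yield = lambda thr: soft_cut(pt, thr).sum()
print(jax.grad(soft_yield)(30.0))  # sensitivity of the selected yield to the cut value
```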

Advantages of differentiable analyses

One great aspect of a differentiable analysis workflow is that it allows the use of gradient-based methods to optimize the analysis. This might mean, for example, optimizing a jet pT cut to minimize the uncertainty on a parameter of interest.

There are likely other interesting things that become possible with differentiable workflows. It is worth thinking about what might specifically be useful for HEP.

An example might be the following: an analysis observes a mis-modelling in the form of a slope in the ratio of data vs. model prediction as a function of jet pT. A question might be "is this related to the high lepton pT cut we apply?". This could be studied by scanning over lepton pT cuts and evaluating the data/model slope, or more efficiently by evaluating its gradient w.r.t. the lepton pT cut.

It would be especially interesting if it turns out that there are things that only become possible with a differentiable workflow (maybe when brute-force parameter scans would be computationally prohibitive).
