rootinteractive's Introduction

RootInteractive

Code for the interactive statistical aggregation and visualisation of multidimensional data in ROOT or native Python formats (pandas, NumPy).

Support for ROOT data structures:

  • TTree and TTreeFormula, aliases ...
  • TFormula, or any static ROOT/AliRoot function.
  • RDataFrame <-> awkward - work in progress

ROOT and PyROOT (AliRoot/O2) data structures can be used as input data sources. However, the code also works with pandas alone, without the need to have ROOT installed. Internally, these data structures are converted into the Bokeh CDS (ColumnDataSource) or into the RootInteractive CDS for N-dimensional histograms, projections and aggregated information.

RootInteractive content:

  • Interactive, easily configurable visualisation of non-binned and binned data.
  • Interactive n-dimensional histogramming/projection and derived aggregated information extraction
  • Client/server applications (Jupyter, Bokeh)
  • Standalone client application (Bokeh standalone dashboard)
  • Lossy and lossless data compression (server --> client)
  • ROOT and RDataFrame tools and interfaces

Interactive visualization, histogramming and data aggregation in N dimensions on the client

The figure array declaration is passed as an argument to bokehDrawSA to create an array of figures/graphs/scatter plots. Unbinned or binned (N-dimensional histogram and derived statistics/projection) Bokeh data sources, derived variables and aggregated statistics can be used for drawing.

The declarative programming used in bokehDrawSA is a type of coding where developers express the computational logic without having to programme the control flow of each process. This can help simplify coding, as developers only need to describe what they want the programme to achieve, rather than explicitly prescribing the steps or commands required to achieve the desired result.

The interactive visualization is declared in the 6 arrays, as in the example below:

bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, tooltips=tooltips,
                      parameterArray=parameterArray, widgetLayout=widgetLayoutDesc, sizing_mode="scale_width",
                      nPointRender=300, aliasArray=aliasArray, histogramArray=histoArray,
                      arrayCompression=arrayCompression)

figureArray - figure parameterization

  • see READMEfigure
  • Defining scatter/histogram/derived figures using input data source
  • Example declaration of figures from a data source with columns A, B, C, D
    figureArray = [
    [['A'], ['A*A-C*C'], {"size": 2, "colorZvar": "A", "errY": "errY", "errX":"0.01"}],
    [['A'], ['C+A', 'C-A', 'A/A']],
    [['B'], ['C+B', 'C-B'], { "colorZvar": "colorZ", "errY": "errY", "rescaleColorMapper": True}],
    [['D'], ['(A+B+C)*D'], {"colorZvar": "colorZ", "size": 10, "errY": "errY"} ],
    [['D'], ['D*10'], {"errY": "errY"}],
    {"size":"size", "legend_options": {"label_text_font_size": "legendFontSize"}}
    ]
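A minimal sketch of how such a figureArray could be split into per-figure specifications and shared options, assuming (as the example suggests) that each list entry is [xVariables, yVariables, optional options dict] and that a bare dict holds options common to all figures. The helper split_figure_array is hypothetical and not part of RootInteractive, and the rule that per-figure options override shared ones is an assumption.

```python
def split_figure_array(figure_array):
    """Split a figureArray-like list into per-figure specs and shared options.

    Hypothetical helper for illustration only, not RootInteractive code.
    """
    figures = []
    shared = {}
    for entry in figure_array:
        if isinstance(entry, dict):
            # A bare dict applies to every figure in the array
            shared.update(entry)
            continue
        x_vars, y_vars = entry[0], entry[1]
        local = entry[2] if len(entry) > 2 else {}
        figures.append({"x": x_vars, "y": y_vars, "options": dict(local)})
    # Merge after the loop so a trailing shared dict still reaches every figure;
    # assumption: per-figure options take priority over shared ones
    for fig in figures:
        fig["options"] = {**shared, **fig["options"]}
    return figures, shared


figureArray = [
    [['A'], ['A*A-C*C'], {"size": 2, "colorZvar": "A"}],
    [['A'], ['C+A', 'C-A', 'A/A']],
    {"size": "size"},
]
figures, shared = split_figure_array(figureArray)
print(figures[0]["options"])  # {'size': 2, 'colorZvar': 'A'}
print(figures[1]["options"])  # {'size': 'size'}
```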

histogramArray - interactive histogramming parameterization and examples

  • Defining interactive ND histograms and derived statistics, updated based on the user selection, resp. by parametrization
  • see READMEhistogram
  • Example of creating a 3D histogram showing the mean, sum and standard deviation of the projection, with the colour code given by dimension 2 (bin_center_2)
    histoArray = [
            {"name": "histoABC", "variables": ["(A+C)/2", "B", "C"], "nbins": [8, 10, 12], "weights": "D", "axis": [0], "sum_range": [[.25, .75]]},
        ]
    figureArray = [
            [['bin_center_1'], ['mean']],
            [['bin_center_1'], ['sum_0']],
            [['bin_center_1'], ['std']],
            {"source": "histoABC_0", "colorZvar": "bin_center_2", "size": 7}
    ]
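To make the projection statistics concrete, here is a NumPy sketch that builds the same 8 x 10 x 12 histogram of ((A+C)/2, B, C) weighted by D, on random toy data, with numpy.histogramdd, and then computes for every (bin 1, bin 2) cell the weighted mean and standard deviation along axis 0. It assumes that mean and std are moments of the axis-0 bin centers; it is not RootInteractive's client-side code, the sum_range statistic is not reproduced, and the column names only mimic the bin_center_1/bin_center_2 convention used above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
A, B, C, D = (rng.uniform(0.0, 1.0, n) for _ in range(4))  # toy data

# Weighted 3D histogram of ((A+C)/2, B, C) with weights D, 8 x 10 x 12 bins
H, edges = np.histogramdd(np.column_stack([(A + C) / 2, B, C]), bins=[8, 10, 12], weights=D)
centers = [0.5 * (e[:-1] + e[1:]) for e in edges]

# Project along axis 0: one mean/std per (bin_1, bin_2) cell
c0 = centers[0][:, None, None]          # axis-0 bin centers, broadcastable to H
weight_sum = H.sum(axis=0)              # shape (10, 12)
with np.errstate(invalid="ignore", divide="ignore"):
    mean = (H * c0).sum(axis=0) / weight_sum                 # NaN for empty cells
    std = np.sqrt((H * (c0 - mean) ** 2).sum(axis=0) / weight_sum)

# Flatten into a column table, similar in spirit to a projection CDS
b1, b2 = np.meshgrid(centers[1], centers[2], indexing="ij")
projection = {
    "bin_center_1": b1.ravel(),
    "bin_center_2": b2.ravel(),
    "mean": mean.ravel(),
    "std": std.ravel(),
}
```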

aliasArray - alias/client-side function parameterization

  • see READMEaliase
  • JavaScript functions with which derived variables can be defined on the client. Used e.g. to parameterise the selection, histogram weights and efficiencies
  • newly created variables can be used in histogramArray, figureArray and aliasArray
  • dependency trees ensure the consistency of aliases, the correct order of evaluation of derived variables and their use in visualisation.
  • Example declaration:
        aliasArray = [
            # They can also be used as selections (boolean), e.g. for histogram weights
            {
                "name": "C_accepted",
                "variables": ["C"],
                "parameters": ["C_cut"],
                "func": "return C < C_cut"
            },
            # User-defined JS columns can also be created in histograms by specifying the context (CDS) parameter
            {
                "name": "efficiency_A",
                "variables": ["entries", "entries_C_cut"],
                "func": "return entries_C_cut / entries",
                "context": "histoA"
            },
            # Shorthand notation - only for scalar functions
            ("effC", "entries_C_cut / bin_count", "histoAC"),
        ]
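The dependency handling mentioned above can be sketched with Python's graphlib (standard library, Python 3.9+): only references to other aliases count as dependencies, while plain columns and parameters are assumed to be available already. The helper alias_evaluation_order is hypothetical, handles only the dict form (not the tuple shorthand) and ignores the context field; RootInteractive's actual dependency resolution may differ.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def alias_evaluation_order(alias_array):
    """Order alias names so that each alias comes after the aliases it uses.

    Hypothetical helper: dict-form aliases only, "context" is ignored.
    Raises graphlib.CycleError if aliases depend on each other circularly.
    """
    names = {alias["name"] for alias in alias_array}
    graph = {
        alias["name"]: {v for v in alias.get("variables", []) if v in names}
        for alias in alias_array
    }
    return list(TopologicalSorter(graph).static_order())


aliasArray = [
    # "weight" uses the alias "C_accepted", so it must be evaluated second
    {"name": "weight", "variables": ["C_accepted", "D"], "func": "return C_accepted ? D : 0"},
    {"name": "C_accepted", "variables": ["C"], "parameters": ["C_cut"], "func": "return C < C_cut"},
]
print(alias_evaluation_order(aliasArray))  # ['C_accepted', 'weight']
```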

layout - layout of the figures

  • see READMElayout
  • Layout declared by a dictionary (tabs) or an array of figure IDs (index or name ID)
  • Properties per row/simple layout/tab layout can be specified. More local properties have priority.
  • Example declaration:
    layout = {
        "A": [
            [0, 1, 2, {'commonX': 1, 'y_visible': 1, 'x_visible':1, 'plot_height': 300}],
            {'plot_height': 100, 'sizing_mode': 'scale_width', 'y_visible' : 2}
            ],
        "B": [
            [3, 4, {'commonX': 1, 'y_visible': 3, 'x_visible':1, 'plot_height': 100}],
            {'plot_height': 100, 'sizing_mode': 'scale_width', 'y_visible' : 2}
            ]
    }

widgetLayout - layout of the widgets

  • see READMElayoutWidget
  • Layout declared by a dictionary (tabs) or an array of widget IDs (widget names)
  • Properties per row/simple layout/tab layout can be specified. More local properties have priority.
  • Example declaration:
    • simple layout
       widgetLayoutKine=[
           ["dca0","tgl","qPt","ncl"],
           ["dEdxtot","dEdxmax","mTime0"], 
           ["hasA","Run","IR","isMC"], 
           {'sizing_mode': 'scale_width'}
       ]
    • composed layout:
      widgetLayoutDesc={
          "Select":widgetLayoutKine,
          "Histograms":[["nbinsX","nbinsY", "varX","yAxisTransform"], {'sizing_mode': 'scale_width'}],
          "Legend": figureParameters['legend']['widgetLayout'],
          "Markers":["markerSize"]
      }
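The priority rule stated for both layouts (more local properties win) can be sketched for one tab of the figure layout example: the tab's trailing dict provides defaults and a dict inside a row overrides them. The helper resolve_rows is hypothetical and only mirrors the structure of the examples; how RootInteractive merges options internally is not shown here.

```python
def resolve_rows(tab_layout):
    """Return (figure_ids, options) for each row of one tab; row options win.

    Hypothetical helper mirroring the layout examples, not RootInteractive code.
    """
    rows = list(tab_layout)  # copy, so the caller's layout is not modified
    tab_options = rows.pop() if rows and isinstance(rows[-1], dict) else {}
    resolved = []
    for row in rows:
        ids = [item for item in row if not isinstance(item, dict)]
        row_options = {}
        for item in row:
            if isinstance(item, dict):
                row_options.update(item)
        # More local (row) properties override tab-level properties
        resolved.append((ids, {**tab_options, **row_options}))
    return resolved


layout = {
    "A": [
        [0, 1, 2, {'commonX': 1, 'y_visible': 1, 'x_visible': 1, 'plot_height': 300}],
        {'plot_height': 100, 'sizing_mode': 'scale_width', 'y_visible': 2},
    ],
}
for ids, options in resolve_rows(layout["A"]):
    print(ids, options["plot_height"], options["y_visible"], options["sizing_mode"])
# [0, 1, 2] 300 1 scale_width
```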

arrayCompression -

Machine learning part - work in progress

  • Wrappers for decision trees and neural networks
  • Provides an interface for the reducible and irreducible errors and the probability density function
  • Local linear forest, resp. local kernel regression

RootInteractive Information

Tutorials

ALICE RootInteractive tutorial

Several ALICE use cases (detector calibration, QA/QC)

Gallery material in the ALICE agenda () and document server

rootinteractive's People

Contributors

bulukutlu, ehellbar, martinkroe, miranov25, pl0xz0rz

rootinteractive's Issues

bokehTools sliders not interactive after creating a new interactive bokeh figure

Notebook Example

  • first figure:
vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
sliders="meanTRDCurrent(0,0.5,0.05,0,1):H2O(0,5,0.2,0,5):deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDrawTree(dfsplit.sample(200),"time>0","drphiMean",vars,"H2O",sliders,p3,ncols=3, commonX=1, commonY=1,tooltip=tooltips,size=5)
  • second figure:
vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
varserr="drphiSector2err:drphiSector4err:drphiSector6err:drphiSector7err:drphiSector9err:drphiSector16err:drphiSector20err:drphiSector30err"
sliders="meanTRDCurrent(0,0.5,0.05,0,1):H2O(0,5,0.2,0,5):deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDrawTree(dfsplit.sample(200),"time>0","drphiMean",vars,"H2O",sliders,p3,ncols=3, commonX=1, commonY=1,tooltip=tooltips,size=5,errX="drphiMeanErr",errY=varserr)

When the second figure is created, the sliders in the first figure stop working

Definition of bokeh markers missing underscore

Hello Marian,
it seems that a few of the marker definitions in bokehTools.py are missing an underscore:
I think it should be:
bokehMarkers = ["square", "circle", "triangle", "diamond", "square_cross", "circle_cross", "diamond_cross", "cross", "dash", "hex", "inverted_triangle", "asterisk", "square_x", "x"]

You can check the exact names for the markers using
from bokeh.plotting import markers
markers()

Regards
Matthias

Tensorflow 2.5+ compatibility

Hello Marian,
when using tf > 2.4, Keras is accessed through TensorFlow, i.e. all
from keras import ...
have to be replaced by
from tensorflow.keras import ...
in RootInteractive/MLpipeline/NDFunctionInterface.py, or the tf version has to be pinned to 2.4. I guess there are more urgent issues, so I just want to point it out so that you are aware of it.
Regards,
Ernst
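A version-tolerant import is one way to support both setups without pinning TensorFlow: try tensorflow.keras first and fall back to the standalone keras package. This is a minimal sketch using importlib; the helper name import_first is hypothetical, and whether the rest of NDFunctionInterface.py works unchanged with either Keras variant is not checked here.

```python
import importlib


def import_first(*module_names):
    """Import and return the first importable module from module_names.

    Hypothetical helper, e.g. import_first("tensorflow.keras", "keras").
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


try:
    # Prefer the Keras bundled with TensorFlow (tf > 2.4), fall back to standalone keras
    keras = import_first("tensorflow.keras", "keras")
except ImportError:
    keras = None  # neither TensorFlow nor Keras is installed
```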

Some pytest interactive drawings are not responding properly

In the current RootInteractive, some interactive plots are not responding:

InteractiveDrawing/bokeh/test_bokehDrawSAArray_fromTTree.html - OK
InteractiveDrawing/bokeh/test_BokehDrawArrayWidgetNoScale.html - not responding
InteractiveDrawing/bokeh/test_BokehDrawArrayWidget.html - not responding
InteractiveDrawing/bokeh/test_bokehDrawSAArray.html - not responding
InteractiveDrawing/bokeh/test_bokehDrawSAOldInterface.html - not responding
InteractiveDrawing/bokeh/test_BokehRDrawArray_DrawSAfromArray.html - OK
InteractiveDrawing/bokeh/test_BokehDrawArray_test_DrawfromArray.html - OK

RootInteractive - with root_numpy not supported in ROOT v6.20

Until recently we were using root_numpy and root_pandas. Since we moved to root6-20xxx, this no longer works:
the RootInteractive test is failing.
Creating an issue in RootInteractive.

See discussion:
https://root-forum.cern.ch/t/lxplus-root-numpy-not-supported-for-root-6-18-00/35585

you probably need to recompile root_numpy against the new ROOT version that is active on lxplus.
(or activate a ROOT version from CMS sw stack that is compatible w/ your root_numpy version)
alternatively, you could use root2npy or a combination of uproot + numpy.

Optimization of makePdfMaps implementation

Follow-up of issue #37 and PR #40

Observed time to make a PDF map from the histogram: ~5 minutes (180 x 33 x 40 x 8)

To be profiled and optimized.

The compressed input histograms can be found here:

(base) hellbaer@lxir128:/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation$ ls /lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/*gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_derMeanDistR_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_derMeanSC_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistRDiff_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistR_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistRPred_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucSC_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip

Inside each file there is a histogram:
{"H": H, "axes": axis, "name": histoInfo[0], "varNames": varList}

They can be opened with

import gzip
import pickle
with gzip.open(inputFileStr, 'rb') as inputFile:
    histo = pickle.load(inputFile)

Integrate dynamic down-sampling of the number of rendered points in bokeh visualization + bug fix in preserving selected points in interactive selection

In the routine makeJScallback:

RootInteractive/InteractiveDrawing/bokeh/bokehTools.py:def makeJScallback(widgetDict, **kwargs)
user-defined widgets (sliders, ranges, ...) define the subset of the data to visualize.
The speed of the visualization depends on the number of points to render. With more than a critical amount (browser dependent),
the visualization on the client gets slower.
Even for a small data sample (e.g. 100000 points) it can already be problematic.

Solution:

  • define a threshold on the number of points to be shipped to the client, e.g. NpointsMax=10^6
  • define the number of points to be rendered on the client, NrenderMax=10^5
    -- downsampling on the client to be done in the JavaScript defined by makeJScallback

Downsampling algorithm:

  • starting from a random subset
  • sophistication (e.g. keeping previously rendered points) to be implemented later

Bug fix in selection

  • in makeJScallback - selected indices are not reassigned
  • we should keep the selection also after the jscallback query
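The downsampling idea can be sketched in Python for clarity, although in RootInteractive it would live in the JavaScript generated by makeJScallback. The sketch starts from a random subset and already includes the "keep previously rendered points" refinement, so that narrowing a selection does not reshuffle the points on screen. The function downsample and its parameters are hypothetical.

```python
import random


def downsample(selected_indices, n_render_max, previous=None, seed=None):
    """Choose at most n_render_max of the selected point indices to render.

    Points rendered in the previous update are kept first (if still selected),
    the rest is a random sample of the other selected points.
    Hypothetical helper, not the makeJScallback implementation.
    """
    selected = sorted(set(selected_indices))
    if len(selected) <= n_render_max:
        return selected
    selected_set = set(selected)
    # Keep previously rendered points that survive the new selection (deduplicated)
    kept = list(dict.fromkeys(i for i in (previous or []) if i in selected_set))[:n_render_max]
    kept_set = set(kept)
    remaining = [i for i in selected if i not in kept_set]
    kept += random.Random(seed).sample(remaining, n_render_max - len(kept))
    return sorted(kept)


first = downsample(range(0, 100_000, 2), 1000, seed=1)
# Narrow the selection: previously rendered points below 50000 stay on screen
second = downsample(range(0, 50_000, 2), 1000, previous=first, seed=2)
```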

Histogram derived information - efficiency/integral/mean/rms in user-defined ranges resp. quantiles

Functionality for 1D, 2D, ND histograms:

The user interface should use the existing histograms; additional statistics will be calculated and used later, following naming conventions

Examples:

  • {"name": "histoA", "variables": ["A"],"nbins":20, "quantiles":[0.05,0.5,0.95], "sumRange": [[-5,5]]}

    • quantiles 0.05, 0.5 and 0.95 will be added to the stat table
    • stat table common for all histograms - if not calculated, NaN to be added
  • {"name": "histoAB", "variables": ["A","B"],"nbins":20, "quantiles":[0.05,0.5,0.95], "sumRange": [[-5,5]], "axis":1}

  • export a new 1D CDS with columns defined by naming convention

  • quantileArray =[], rangeArray=[],

    • quantiles and ranges can use fixed numbers (floats) or parameters (strings indicating widgets), controlled by a user slider widget
    • quantileArray inputs are quantiles (0-1), e.g. 0.05, 0.1; the output is the linearly interpolated position (not nearest)
    • rangeArray inputs are ranges; the output is the integral and the fraction in range, linearly interpolated between bins
    • quantiles to be added as _Q
      • for 1D to be added to the statInfo
        • will be added to the stat table
      • for 2D (ND) to be added to the newly created 1D CDS
        • axis to be specified by number (0,1,2 ....)
        • could be used in subsequent function visualization - using function names
  • exported CDS for 2D, ND can be used in figure arrays

  • exported statInfo for 1D will be used in stat tables
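The two interpolation rules for quantileArray and rangeArray can be sketched in plain Python, assuming entries are uniformly distributed inside each bin: the quantile position is interpolated linearly inside the bin where the cumulative count crosses q * total, and a range integral takes the overlapping fraction of each partially covered bin. Function names are hypothetical and the _Q naming convention is not reproduced; the real computation is meant to run on the client.

```python
def histogram_quantile(edges, counts, q):
    """Position of quantile q (0-1), linearly interpolated inside the bin where it falls."""
    total = sum(counts)
    if total <= 0:
        return float("nan")
    target = q * total
    cumulative = 0.0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            fraction = (target - cumulative) / c
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += c
    return edges[-1]


def histogram_range_sum(edges, counts, lo, hi):
    """Integral in [lo, hi] and fraction of the total, assuming entries are
    uniformly distributed inside each bin (linear interpolation between edges)."""
    integral = 0.0
    for i, c in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        overlap = min(hi, right) - max(lo, left)
        if overlap > 0:
            integral += c * overlap / (right - left)
    total = sum(counts)
    return integral, (integral / total if total > 0 else float("nan"))


edges = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [0, 10, 10, 0]
print([histogram_quantile(edges, counts, q) for q in (0.05, 0.5, 0.95)])  # approximately [1.1, 2.0, 2.9]
print(histogram_range_sum(edges, counts, 1.5, 2.5))  # (10.0, 0.5)
```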

Structures:

  • histogramArray
  • functionArray (array of java script functions)
  • system will create dependencies

Questions:

In the future we can use composition - histogram-derived functions to be added to the function array

  • dependencies for derived variables
  • at the beginning we can use a linear structure
  • hierarchical structure
  • edit: change name intRange -> sumRange

Bokeh wrappers - add support for categorical data in bokehDraw

New switches factor_mark and factor_cmap to be provided to handle categorical data.

See bokeh user_guide https://docs.bokeh.org/en/latest/docs/user_guide/data.html
section markers:

from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark

SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Sepal Width'

p.scatter("petal_length", "sepal_width", source=flowers, legend_field="species", fill_alpha=0.4, size=12,
          marker=factor_mark('species', MARKERS, SPECIES),
          color=factor_cmap('species', 'Category10_3', SPECIES))

show(p)

Question regarding license

Hey,

I would like to try out your PyTorch histogramdd implementation for a research project. Unfortunately, there's no license file in your repository which clarifies how this code may (not) be used by others (no license means no rights for others).

Could you let me know if it's possible to copy-paste and modify the code, or if it's intentionally all rights reserved? Or will it even be available soon in PyTorch (I saw your PR)?

Thanks in advance

bokehVisJS3DGraph.ts not copied to site-packages

Hello Marian,

this issue is not critical and basically everything seems to work as it should.

When using the bokeh tools in a jupyter notebook, I do the following:

from bokeh.io import output_notebook
output_notebook()
from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *

Output:

BokehJS 2.2.3 successfully loaded.
Welcome to JupyROOT 6.20/02
x bokehVisJS3DGraph.ts

You already see the issue about bokehVisJS3DGraph.ts.

When switching the order of the imports to

from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *
from bokeh.io import output_notebook
output_notebook()

one gets:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-eeb3637b94e2> in <module>
     17 from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *
     18 from bokeh.io import output_notebook
---> 19 output_notebook()
...
...
further traceback
...
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/hellbaer/software/miniconda3/lib/python3.6/site-packages/RootInteractive/InteractiveDrawing/bokeh/bokehVisJS3DGraph.ts'

I installed the latest RootInteractive using

python3 -m pip install rootinteractive

(base) hellbaer@hellbaer-Thinkpad:~/software/TPCwithDNN/tpcwithdnn$ pip3 show bokeh rootinteractive
Name: bokeh
Version: 2.2.3
Summary: Interactive plots and applications in the browser from Python
Home-page: http://github.com/bokeh/bokeh
Author: Bokeh Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/hellbaer/software/miniconda3/lib/python3.6/site-packages
Requires: typing-extensions, python-dateutil, pillow, tornado, packaging, PyYAML, numpy, Jinja2
Required-by: RootInteractive
---
Name: RootInteractive
Version: 0.0.24
Summary: UNKNOWN
Home-page: https://github.com/miranov25/RootInteractive
Author: Marian Ivanov
Author-email: [email protected]
License: Not defined yet. Most probably similar to ALICE (CERN)  license
Location: /home/hellbaer/software/miniconda3/lib/python3.6/site-packages
Requires: pandas, bqplot, requests, numpy, anytree, scikit-hep, keras, bokeh, beakerx, scipy, qgrid, uproot, ipywidgets, plotly, nb-clean, iminuit, sklearn, runtime, pytest, matplotlib, nbval, forestci, tabulate, tensorflow
Required-by: TPCwithDNN

Cheers,
Ernst

Problem with bokeh transition 1.34 --> 2.0

Pytest starts to fail after the transition to bokeh 2.0.

Starting with a new python environment installed from scratch, bokeh 2.0 is taken.
Bokeh 2.0 changed the policy for CDS.
Should be fixed as soon as possible.
Using an older version of bokeh, the tests run properly.
Options to fix:

  • Fix the code itself
    • should work for old and new bokeh
  • Restrict bokeh version
    • can be used as a temporary solution

Test failure

(virtualenv3Test) miranov@miranov-Strix-17-GL703GE:~/github/RootInteractive3/RootInteractive/InteractiveDrawing$ pytest
====================================================================================== test session starts ======================================================================================
platform linux -- Python 3.6.9, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home2/miranov/github/RootInteractive3
plugins: nbval-0.9.5
collected 5 items / 2 errors / 1 skipped / 2 selected                                                                                                                                           

============================================================================================ ERRORS =============================================================================================
___________________________________________________________ ERROR collecting RootInteractive/InteractiveDrawing/bokeh/test_Layout.py ____________________________________________________________
bokeh/test_Layout.py:41: in <module>
   test_Draw()
bokeh/test_Layout.py:15: in test_Draw
   bokehFigure=bokehDraw(df, "A>0", "A", "A:B:C:D", "C", "slider.A(0,100.1,0,1),slider.B(0,100,100,100,300)", None, layout=testLayout)
bokeh/bokehDraw.py:106: in __init__
   self.updateInteractive("")
bokeh/bokehDraw.py:295: in updateInteractive
   self.bokehSource.data = newSource.data
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/has_props.py:273: in __setattr__
   super().__setattr__(name, value)
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/property/descriptors.py:969: in __set__
   raise ValueError(_CDS_SET_FROM_CDS_ERROR)
E   ValueError: 
E   ColumnDataSource.data properties may only be set from plain Python dicts,
E   not other ColumnDataSource.data values.
E   
E   If you need to copy set from one CDS to another, make a shallow copy by
E   calling dict: s1.data = dict(s2.data)
_______________________________________________________ ERROR collecting RootInteractive/InteractiveDrawing/bokeh/test_bokehDrawArray.py ________________________________________________________
bokeh/test_bokehDrawArray.py:47: in <module>
   test_DrawfromArray()
bokeh/test_bokehDrawArray.py:35: in test_DrawfromArray
   fig=bokehDraw.fromArray(df, "A>0", figureArray,"slider.A(0,100,0,0,100)",tooltips=tooltips, layout=figureLayout)
bokeh/bokehDraw.py:165: in fromArray
   self.updateInteractive("")
bokeh/bokehDraw.py:295: in updateInteractive
   self.bokehSource.data = newSource.data
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/has_props.py:273: in __setattr__
   super().__setattr__(name, value)
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/property/descriptors.py:969: in __set__
   raise ValueError(_CDS_SET_FROM_CDS_ERROR)
E   ValueError: 
E   ColumnDataSource.data properties may only be set from plain Python dicts,
E   not other ColumnDataSource.data values.
E   
E   If you need to copy set from one CDS to another, make a shallow copy by
E   calling dict: s1.data = dict(s2.data)

Using only the columns used in interactive drawing and exporting functions to the client

The current bokehDrawSA algorithm sends all columns of the input data source to the client

  • In the new approach only the columns used in figures resp. widgets will be exported
    • Internally, a map with the column usage status will be used
  • In case function columns can be evaluated on the client, the function representation (string) is to be sent
    • cdsCompress should recognize functions (eventually test it)
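A sketch of the column-usage map, assuming the figure and widget expressions are valid Python expressions: parse each one with the standard ast module and keep every name that is a column of the input table. Columns not returned (here B and colorZ) would not need to be shipped to the client. The function used_columns is hypothetical, and ROOT-style operators such as "&&" would not parse with this approach.

```python
import ast


def used_columns(expressions, available_columns):
    """Return the input columns referenced by figure/widget expressions.

    Hypothetical helper; assumes Python-compatible expression syntax such as "A*A-C*C".
    """
    used = set()
    for expression in expressions:
        tree = ast.parse(expression, mode="eval")
        for node in ast.walk(tree):
            # Function names such as sqrt are Names too, but are not columns
            if isinstance(node, ast.Name) and node.id in available_columns:
                used.add(node.id)
    return used


columns = {"A", "B", "C", "D", "errY", "colorZ"}
print(sorted(used_columns(["A", "A*A-C*C", "D*10", "errY"], columns)))  # ['A', 'C', 'D', 'errY']
```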

ATO-459 - MLpipeline code to be restructured - models to be constructed/loaded/saved/reused independently of fitters

In order to include new regression and classification models, the MLpipeline code is to be restructured
https://alice.its.cern.ch/jira/browse/ATO-459

Current version (TO BE deprecated)

  • design influenced by TMVA - does not scale
  • fitter, regressor created in the fit function based on the names and options
    • method parameter defined in Register_Method
    • model created during the fit method
    • many ifs, does not scale

New version - to be implemented

  • models (regression, quantile regression wrappers) to be constructed by users
  • wrappers implement additional common functionality
  • models registered in Register_model
  • models reused for fits

tree2Panda - extended usage of "hidden" branches and functions

Current interface

def tree2Panda(tree, include, selection, **kwargs):
    r"""
    Convert selected items from the tree into a pandas DataFrame
    TODO:
        * to consult with uproot
            * currently not able to work with friend trees
        * check the latest version of RDataFrame (in AliRoot latest v16.16.00)
        * Add filter on metadata - e.g class of variables
    :param tree:            input tree
    :param include:         regular expression array - processing Tree+Friends, branches, aliases
    :param selection:       tree selection ()
    :param kwargs:
        * exclude           exclude array
        * firstEntry        first entry to process
        * nEntries          number of entries to convert
        * column mask
    :return:                pandas DataFrame
    """

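The include/exclude parameters can be illustrated on a plain list of branch and alias names; in practice the names would come from the tree, its friends and its aliases. The sketch assumes include and exclude are lists of regular expressions matched from the start of each name (re.match); the exact matching rules of tree2Panda are not given in its docstring, and select_names is a hypothetical helper.

```python
import re


def select_names(names, include, exclude=()):
    """Keep names matching any include regex and none of the exclude regexes.

    Hypothetical helper illustrating the include/exclude idea of tree2Panda.
    """
    include_patterns = [re.compile(p) for p in include]
    exclude_patterns = [re.compile(p) for p in exclude]
    return [
        name for name in names
        if any(p.match(name) for p in include_patterns)
        and not any(p.match(name) for p in exclude_patterns)
    ]


branches = ["track.pt", "track.eta", "track.fBits", "event.run", "hiddenAlias"]
print(select_names(branches, [r"track\..*", r"event\.run"], exclude=[r".*fBits"]))
# ['track.pt', 'track.eta', 'event.run']
```
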
Adding prediction intervals for GradientBoostingRegressor and review RandomForest prediction intervals - https://alice.its.cern.ch/jira/browse/ATO-459

The GradientBoostingRegressor wrapper should be added to the list of wrappers in RootInteractive:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
https://medium.com/@qucit/a-simple-technique-to-estimate-prediction-intervals-for-any-regression-model-2dd73f630bcb

A QuantileRegressionForest clone is to be integrated in a similar way:
https://scikit-garden.github.io/examples/QuantileRegressionForests/
https://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf

Finally, quantile regression is not available for all types of regression models. In scikit-learn, the only 
model that implements it is the Gradient Boosted Regressor. Sometimes, such as in the case of XGBoost, 
you can customize the model’s cost-function to  obtain quantile regressor. You can read the details of how to do it here.

LoadTrees is failing when used multiple times (sometimes)

When the same input data sets are loaded a second time, the code gets stuck.
Test to be added.

tree, treeList, fileList = LoadTrees(inputStream, "(.*)", "XXX", ".*root", 0)
for ftree in tree.GetListOfFriends():
    name = ftree.GetName()
    isOK = '{0}.entries>50&&{0}.isFitValid&&abs({0}.vecLTM.fElements[1]-{0}.binMedian)<5&&abs({0}.vecLTM.fElements[1]-{0}.meanG)<5'.format(name)
    tree.SetAlias(name + ".isOK", isOK)

Automatic update of the colz plot range - triggered by a change of selection

Currently, when points are selected using sliders, the color map is not adjusted.
E.g. when a small momentum range is selected using the sliders, the color code of the associated points
is not readjusted.

This should be an option:

Callback function to be implemented - to be called after makeJScallback for widget selection

Integration of the 3D graphics into RootInteractive using Bokeh

Starting Example:
http://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/wrapping.html#userguide-extensions-examples-wrapping

According to the main author of Bokeh, there is no native support for 3D. The example is a wrapper for vis.js.
See: https://groups.google.com/a/continuum.io/g/bokeh/c/u_j46VHRyt0

Hi,

The mpl.to_bokeh function was never able to convert 3D plots, so it would not have helped here. In fact, it was barely ever able to do much at all. The long-discussed MPL JSON spec that might have made maintainable MPL compatibility possible never materialized, which is why the whole idea was ultimately abandoned.

In any case, the core Bokeh library does not focus on or provide any built-in 3d capability, and it's unlikely that it ever will. The "surface3d" example is mostly intended as a demonstration that Bokeh can be extended with custom models and external JS libraries, but the underlying vis.js library that it wraps is purely a toy, in my opinion. If anyone wanted to create a custom extension to integrate a better 3d JS library with Bokeh, that would be great. Something like three.js is probably a good candidate, though there may be others I am not aware of. It could be a little work or a lot of work, depending on whether the extension was very specialized and narrow, or more general. For reference, the chapter on creating custom extensions is here:

https://bokeh.pydata.org/en/latest/docs/user_guide/extensions.html

Thanks,

Bryan

Hints for histogram

Standard variables:

  • bin range
  • bin center
  • bin content
  • bin error

Histogram info:

  • name
  • Mean, RMS, entries
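A NumPy sketch of these standard variables for a 1D histogram: per-bin range, center, content and Poisson error, plus name, mean, RMS and entries. Mean and RMS are computed here from the bin centers, and RMS is taken as the standard deviation (as ROOT's TH1::GetRMS returns); the function and column names are illustrative assumptions, not RootInteractive's naming.

```python
import numpy as np


def histogram_info(values, bins, value_range=None, name="histo"):
    """Standard per-bin variables plus summary info for a 1D histogram.

    Hypothetical helper; mean and RMS are computed from bin centers.
    """
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    centers = 0.5 * (edges[:-1] + edges[1:])
    table = {
        "bin_bottom": edges[:-1],
        "bin_top": edges[1:],
        "bin_center": centers,
        "bin_count": counts,
        "bin_error": np.sqrt(counts),  # Poisson error, valid for unweighted entries
    }
    entries = int(counts.sum())
    if entries > 0:
        mean = float(np.average(centers, weights=counts))
        rms = float(np.sqrt(np.average((centers - mean) ** 2, weights=counts)))
    else:
        mean = rms = float("nan")
    info = {"name": name, "mean": mean, "rms": rms, "entries": entries}
    return table, info


table, info = histogram_info([0.5, 1.5, 1.5, 2.5], bins=3, value_range=(0, 3), name="histoA")
print(info)  # mean 1.5, rms ~0.707, entries 4
```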

add better documentation and examples in aliTreePlayer.py and bokehDrawSA - kwargs documentation and examples

The declarative programming used in bokehDrawSA is a type of coding where developers express the computational logic without having to programme the control flow of each process. This can help simplify coding, as developers only need to describe what they want the programme to achieve, rather than explicitly prescribing the steps or commands required to achieve the desired result.

bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, tooltips=tooltips, parameterArray=parameterArray, widgetLayout=widgetLayoutDesc, sizing_mode="scale_width", nPointRender=300, aliasArray=aliasArray, histogramArray=histoArray)

The documentation for individual aspects can be long and is not suitable for inline code documentation. Separate README files are to be created.

- READMEhistograms
- READMEaliases
- READMEfigures
- ...

Problems with x_visible and y_visible

To share the y axis and x axis, the options y_visible and x_visible can be used
y_visible:

  • 0 - do not show the y axis
  • 1 - show the y axis (default)
  • 2 - show the y axis only on the left side

The older implementation did not implement this feature properly.
To do:
Modify the source code and improve the tests
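The y_visible semantics listed above can be pinned down as a small, testable function that returns, for each figure in a row, whether its y axis is drawn. The helper y_axis_flags is hypothetical and covers only the three documented values; with Bokeh figures, each flag could then be applied as fig.yaxis.visible = flag.

```python
def y_axis_flags(n_figures_in_row, y_visible=1):
    """Return, for each figure in a row, whether its y axis should be drawn.

    0 - no y axis, 1 - y axis on every figure (default), 2 - only the leftmost figure.
    Hypothetical helper, not the RootInteractive implementation.
    """
    if y_visible == 0:
        return [False] * n_figures_in_row
    if y_visible == 1:
        return [True] * n_figures_in_row
    if y_visible == 2:
        return [i == 0 for i in range(n_figures_in_row)]
    raise ValueError(f"y_visible must be 0, 1 or 2, got {y_visible!r}")


print(y_axis_flags(3, y_visible=2))  # [True, False, False]
```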

failure in makePdfMaps in case of empty slices

Posting bug report from JIRA - https://alice.its.cern.ch/jira/browse/ATO-495

Hello Marian Ivanov,

I would like to do multi-dimensional fits of the ML validation data, e.g. diff(Predicted-Calculated):r:z:meanSC. Here I would like to use the makePdfMaps function from RootInteractive.

By design, for the additional variables, like meanSC, meanDist, flucDist, derivativeDist, there will be empty bins in some of the voxels when histogramming the full volume of the TPC. I think that's why I get a crash like this:

Traceback (most recent call last):
  File "/lustre/alice/users/hellbaer/TPCwithDNN/tpcwithdnn/analyser.py", line 159, in <module>
    makeNDFits()
  File "/lustre/alice/users/hellbaer/TPCwithDNN/tpcwithdnn/analyser.py", line 69, in makeNDFits
    dfNDfits = makePdfMaps(histoND, slices, dimI)
  File "/lustre/alice/users/hellbaer/python/venv3/lib/python3.6/site-packages/RootInteractive/Tools/makePDFMaps.py", line 92, in makePdfMaps
    means.append(np.average(centerI, weights=iHisto))
  File "<__array_function__ internals>", line 6, in average
  File "/lustre/alice/users/hellbaer/python/venv3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 420, in average
    "Weights sum to zero, can't be normalized")
ZeroDivisionError: Weights sum to zero, can't be normalized
When bins have no entries, the weights iHisto will all be zero.

Now, is there a quick workaround for this, e.g. skip the statistics calculation for the empty bins and mask them in the data frame? Otherwise I will have to manually select regions in the TPC.

Cheers,
Ernst
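A minimal sketch of the skip-and-mask workaround: return NaN for slices whose weights sum to zero, so that the corresponding rows can be masked in the data frame afterwards. Both helpers are hypothetical and are not the makePdfMaps code; the vectorized variant computes all slice means at once instead of looping, which might also help performance, though that has not been measured.

```python
import numpy as np


def safe_weighted_mean(centers, weights):
    """Weighted mean that returns NaN for an empty slice instead of raising."""
    weights = np.asarray(weights, dtype=float)
    if weights.sum() == 0:
        return float("nan")  # empty slice: mark it, do not crash
    return float(np.average(centers, weights=weights))


def weighted_mean_along_axis(histo, centers, axis):
    """Vectorized variant: mean of the bin centers along `axis` of an N-D
    histogram, NaN wherever the slice along `axis` is empty."""
    histo = np.asarray(histo, dtype=float)
    shape = [1] * histo.ndim
    shape[axis] = -1
    c = np.asarray(centers, dtype=float).reshape(shape)
    sums = histo.sum(axis=axis)
    weighted = (histo * c).sum(axis=axis)
    return np.divide(weighted, sums, out=np.full_like(sums, np.nan), where=sums > 0)


histo = np.array([[0.0, 0.0], [1.0, 3.0]])  # first row is an empty slice
print(weighted_mean_along_axis(histo, [0.0, 1.0], axis=1))  # NaN for the empty row, 0.75 for the second
```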

Catch errors and write clearer error messages and hints in figure array parsing

Example:

/usr/local/lib/python3.6/dist-packages/RootInteractive/InteractiveDrawing/bokeh/bokehTools.py in processBokehLayoutArray(widgetLayoutDesc, widgetArray)
    292         rowWidgetArray0 = []
    293         for i, iWidget in enumerate(rowWidget):
--> 294             figure = widgetArray[iWidget]
    295             rowWidgetArray0.append(figure)
    296             if hasattr(figure, 'x_range'):

IndexError: list index out of range

The error was caused by more figures being used in the layout description than defined in the figure array
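A sketch of such a check, run before the figures are looked up, so that the user sees which layout row and index are wrong instead of a bare IndexError. The function check_layout_indices is hypothetical; it assumes integer figure indices (name IDs are skipped) and the row structure used in the layout examples of the README.

```python
def check_layout_indices(layout_rows, n_figures):
    """Raise a descriptive IndexError if a layout row refers to a missing figure.

    Hypothetical validation helper; n_figures is the number of figure entries
    in the figure array (a trailing options dict is not a figure).
    """
    for row_number, row in enumerate(layout_rows):
        if isinstance(row, dict):
            continue  # layout-level options
        for item in row:
            if isinstance(item, dict):
                continue  # row options, not a figure reference
            if isinstance(item, int) and not 0 <= item < n_figures:
                raise IndexError(
                    f"layout row {row_number} refers to figure {item}, but the figure "
                    f"array defines only {n_figures} figures (valid indices 0..{n_figures - 1}); "
                    "check that the layout and the figure array match"
                )


check_layout_indices([[0, 1, {'commonX': 1}], {'plot_height': 100}], n_figures=2)  # passes
```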

Custom query using edit window not working

In the new code, the custom query function is disabled because it does not work properly.
A new custom query function is to be implemented:

  • widgets as (JavaScript) functions on the client
    • e.g. log(A)
  • custom boolean JavaScript expressions
    • e.g. A+B<C
  • logical expressions on top of sliders/ranges/select/multiselect/checkbox
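The intended logic would run in JavaScript on the client. As an illustration only, the same combination of range selections and a user-typed boolean expression such as A+B<C can be sketched on the server side with pandas DataFrame.eval. The function name and the format of the ranges dictionary are assumptions.

```python
import numpy as np
import pandas as pd

def combined_selection(df, ranges, custom_query=None):
    """AND together range-widget selections and an optional boolean expression."""
    mask = np.ones(len(df), dtype=bool)
    for column, (low, high) in ranges.items():
        # equivalent of a range slider on `column` (inclusive bounds)
        mask &= df[column].between(low, high).to_numpy()
    if custom_query:
        # user-typed boolean expression, e.g. "A+B<C"
        mask &= df.eval(custom_query).to_numpy(dtype=bool)
    return mask
```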

Multidimensional density estimation using a forest of kd-trees - an alternative to histograms for sparse data

A forest of kd-trees could be used as an approximator for density estimation (https://en.wikipedia.org/wiki/Density_estimation).

The approach is inspired by random forests in machine learning:

  • a set of kd-trees built with bootstrapping to approximate a smooth function
  • user-defined/custom local metric
  • randomly selected

Example usage of the kd-trees (to be extended):

Client-side histogramming in the Bokeh interface - unbinned and binned data

Proposal: syntax for histogramming unbinned data on the client

Standard syntax example: https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/InteractiveDrawing/bokeh/test_bokehDrawSA.py

figureArray = [
#   ['A'], ['C-A'], {"color": "red", "size": 7, "colorZvar":"C", "filter": "A<0.5"}],
    [['A'], ['A*A-C*C'], {"color": "red", "size": 2, "colorZvar": "A", "varZ": "C", "errY": "errY", "errX":"0.01"}],
    [['A'], ['C+A', 'C-A', 'A/A']],
...
    [['D'], ['D*10'], {"size": 10, "errY": "errY"}],
]

->

figureArray = [
   [['A'], ['histo'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
   [['A', 'B'], ['histo'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
   [['A', 'B'], ['histo2D'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
]
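To illustrate what the proposed histo and histo2D entries would compute, here is a NumPy sketch; a client-side implementation would differ. It makes three assumptions:

  • it uses the keys nBins and range from the proposal;
  • an empty range means "use the data range";
  • ['A','B'], ['histo'] is read as one 1D histogram per variable. This is an interpretation, not a confirmed syntax.

histo_from_spec is a hypothetical name.

```python
import numpy as np

def histo_from_spec(data, variables, kind, spec):
    """Hypothetical evaluation of a proposed 'histo'/'histo2D' figure-array entry."""
    n_bins = spec.get("nBins", 10)
    value_range = spec.get("range") or None  # empty range -> take min/max of the data
    if kind == "histo":
        # one 1D histogram (counts, edges) per listed variable
        return {v: np.histogram(data[v], bins=n_bins, range=value_range) for v in variables}
    if kind == "histo2D":
        x, y = variables
        # returns (counts, x_edges, y_edges)
        return np.histogram2d(data[x], data[y], bins=n_bins, range=value_range)
    raise ValueError(f"unknown histogram kind: {kind!r}")
```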

Clean-up of the old user interface: move to v0-01-xx

The old string-based interface is to be cleaned up, as it is no longer possible to support it.
The user interface/configuration will be based on Python data structures:

  • Dictionaries
  • Arrays

Before the clean-up, a backward-compatible release v0-00-25 is to be created and distributed.

Data compression server -> client for CDS

Data compression to reduce the amount of data transferred from server to client:

  • lossy compression first
  • lossless compression

Lossy compression of CDS columns

  • for integers: not needed if lossless compression follows
    • until lossless compression is implemented, cast to the smallest sufficient type
    • e.g. int8, int16
    • or categories
  • identify whether a column is categorical
    • could be booleans or distinct float values
    • to check: how categories are sent in Bokeh
  • optional user-defined strategies for floats:
    • relative precision
    • absolute precision
    • interval - linear transform
      • [xbegin, xend, nbins]
  • optional delta compression - can be automatic
    • store value - previous value
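The lossy strategies listed above can be sketched with NumPy as follows. All function names are hypothetical. The relative-precision variant keeps a fixed number of mantissa bits, which is one possible way to implement a relative precision. The interval encoding maps [xbegin, xend] linearly onto nbins integer codes stored in the smallest sufficient integer type.

```python
import numpy as np

def smallest_int_dtype(values):
    """Smallest signed integer dtype able to hold every value (e.g. int8, int16)."""
    low, high = int(np.min(values)), int(np.max(values))
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if info.min <= low and high <= info.max:
            return dtype
    raise OverflowError("values do not fit into int64")

def round_absolute(x, precision):
    """Absolute precision: round to the nearest multiple of `precision`."""
    return np.round(np.asarray(x, dtype=float) / precision) * precision

def round_relative(x, n_bits):
    """Relative precision: keep n_bits of the mantissa (relative error <= 2**-n_bits)."""
    mantissa, exponent = np.frexp(np.asarray(x, dtype=float))
    return np.ldexp(np.round(mantissa * 2.0**n_bits) / 2.0**n_bits, exponent)

def encode_interval(x, xbegin, xend, nbins):
    """Interval [xbegin, xend, nbins]: linear transform to integer bin codes 0..nbins-1."""
    scaled = (np.clip(x, xbegin, xend) - xbegin) / (xend - xbegin) * nbins
    codes = np.minimum(np.floor(scaled), nbins - 1)
    return codes.astype(smallest_int_dtype([0, nbins - 1]))

def decode_interval(codes, xbegin, xend, nbins):
    """Client-side inverse: bin centres (lossy, error <= half a bin width)."""
    return xbegin + (codes + 0.5) * (xend - xbegin) / nbins

def delta_encode(codes):
    """Delta compression: first value, then value - previous value (same dtype)."""
    return np.diff(codes, prepend=np.zeros(1, dtype=codes.dtype))

def delta_decode(deltas):
    """Inverse of delta_encode: running sum."""
    return np.cumsum(deltas)
```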

Lossless compression

Each compression applied on the server requires a corresponding decompression on the client.
Still to be defined: which compression to use.
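A minimal lossless round trip, assuming zlib (deflate) plus base64 so that the payload can travel inside JSON. This is an illustrative choice only, since the compression algorithm is still to be defined. The client would need a matching inflate implementation in JavaScript; the Python decompress_column below only mirrors it for testing. Both function names are hypothetical.

```python
import base64
import zlib
import numpy as np

def compress_column(values):
    """Serialize a numpy column losslessly: raw bytes -> zlib -> base64 text."""
    array = np.ascontiguousarray(values)
    payload = zlib.compress(array.tobytes(), 9)  # level 9 = best compression
    return {
        "dtype": array.dtype.str,  # e.g. '<i2', includes byte order for the client
        "shape": list(array.shape),
        "data": base64.b64encode(payload).decode("ascii"),
    }

def decompress_column(message):
    """Inverse operation (what the client would have to do)."""
    raw = zlib.decompress(base64.b64decode(message["data"]))
    return np.frombuffer(raw, dtype=np.dtype(message["dtype"])).reshape(message["shape"])
```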

Default markers fail when an improper marker type is specified

The marker can be specified in the figure array with the key "markers".
If the variable type is not categorical, the drawing gets stuck.
To be fixed and added to the unit tests:

Example use case:

figureArray = [
    [['rangeCm'], ["dEdx"], {"colorZvar": "Z", "markers": "A"}],
    [['rangeCm'], ["rangeSigmaCm"], {"colorZvar": "Z"}],
    # ["tableHisto", {"rowwise": True}],
    {"size": 5}
]
widgetParams = [
    ['range', ["Z"]],
    ['range', ["A"]],
    ['range', ["p"]],
]
tooltips = [("Z", "@Z"), ("A", "@A")]
widgetLayoutDesc = [
    [0, 1],
    {'sizing_mode': 'scale_width'}
]
figureLayoutDesc = [
    [0, 1, {'plot_height': 450}],
    # [6, {'plot_height': 25}],
    {'plot_height': 240, 'sizing_mode': 'scale_width', "legend_visible": False}
]
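One possible guard is sketched below. It assumes that the column named by "markers" should hold only a small number of distinct values. Each distinct value is mapped to a marker name, and if there are more distinct values than available markers, an explicit error is raised instead of the drawing hanging. The helper name and the default list are hypothetical; the list entries are standard Bokeh marker type names.

```python
import pandas as pd

DEFAULT_MARKERS = ["circle", "square", "triangle", "diamond", "inverted_triangle"]

def marker_column(df, column, markers=DEFAULT_MARKERS):
    """Map each distinct value of `column` to a marker name, or fail with a clear message."""
    values = df[column]
    categories = pd.unique(values)
    if len(categories) > len(markers):
        raise ValueError(
            f"'markers' uses column {column!r} with {len(categories)} distinct values, "
            f"but only {len(markers)} marker types are available; "
            "use a categorical variable or a longer marker list"
        )
    # distinct values in order of first appearance -> marker names
    mapping = {value: markers[i] for i, value in enumerate(categories)}
    return values.map(mapping)
```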
