RootInteractive

Code for the interactive statistical aggregation and visualisation of multidimensional data in ROOT or native Python formats (pandas, numpy).

Support for ROOT data structures:

TTree and TTreeFormula, aliases ...
TFormula, or any static Root/AliRoot functions.
RDataFrame <-> awkward - work in progress

Root and PyRoot (AliRoot/O2) data structures can be used as input data sources. However, the code also works with pandas only, without the need to have the ROOT package installed. Internally, these data structures are converted into the Bokeh CDS (ColumnDataSource) or into our RootInteractive CDS for the N-dimensional histograms, projections and aggregated information.

RootInteractive content:

Interactive, easily configurable visualisation of non-binned and binned data.
Interactive n-dimensional histogramming/projection and derived aggregated information extraction
Client/server application Jupyter, Bokeh
Standalone client application - (Bokeh Standalone Dashboard)
Lossy and lossless data compression (server --> client)
ROOT and RDataFrame tools and interfaces

Interactive visualization, histogramming and data aggregation in N-dimensions on client

The figure array declaration is passed as an argument to bokehDrawSA to create an array of figures, graphs or scatter plots. Unbinned or binned (N-dimensional histogram and derived statistics/projection) Bokeh data sources, derived variables and aggregated statistics can be used for drawing.

The declarative programming used in bokehDrawSA is a type of coding where developers express the computational logic without having to programme the control flow of each process. This can help simplify coding, as developers only need to describe what they want the programme to achieve, rather than explicitly prescribing the steps or commands required to achieve the desired result.

The interactive visualization is declared in the 6 arrays, as in the example below

bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, tooltips=tooltips, parameterArray=parameterArray,
                          widgetLayout=widgetLayoutDesc, sizing_mode="scale_width", nPointRender=300,
                           aliasArray=aliasArray, histogramArray=histoArray,arrayCompression=arrayCompression)

figureArray - figure parameterization

see READMEfigure
Defining scatter/histogram/derived figures using input data source

Example declaration of the figure from data source with columns ABCD

figureArray = [
[['A'], ['A*A-C*C'], {"size": 2, "colorZvar": "A", "errY": "errY", "errX":"0.01"}],
[['A'], ['C+A', 'C-A', 'A/A']],
[['B'], ['C+B', 'C-B'], { "colorZvar": "colorZ", "errY": "errY", "rescaleColorMapper": True}],
[['D'], ['(A+B+C)*D'], {"colorZvar": "colorZ", "size": 10, "errY": "errY"} ],
[['D'], ['D*10'], {"errY": "errY"}],
{"size":"size", "legend_options": {"label_text_font_size": "legendFontSize"}}
]

histogramArray - interactive histogramming parameterization and examples

Defining interactive ND histograms and derived statistics, updated based on the user selection or by parameterization
see READMEhistogram

Example of creating a 3D histogram showing the mean, sum and standard deviation in the projection, with the colour code given by the second dimension

histoArray = [
        {"name": "histoABC", "variables": ["(A+C)/2", "B", "C"], "nbins": [8, 10, 12], "weights": "D", "axis": [0], "sum_range": [[.25, .75]]},
    ]
figureArray = [
        [['bin_center_1'], ['mean']],
        [['bin_center_1'], ['sum_0']],
        [['bin_center_1'], ['std']],
        {"source": "histoABC_0", "colorZvar": "bin_center_2", "size": 7}
]
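
As a rough illustration of what projection statistics such as the mean and standard deviation represent, the sketch below computes them along one axis of a small 2D histogram in plain Python. The function name `project_stats` and the exact definition of the statistics are assumptions made for this example; they are not taken from the RootInteractive implementation, which evaluates these quantities on the client.

```python
import math

def project_stats(counts, centers):
    """Mean and std along axis 0 of a 2D histogram.

    counts[i][j] is the (weighted) content of bin i on axis 0 and bin j on axis 1;
    centers[i] is the bin centre on axis 0. Returns one (mean, std) pair per bin j.
    """
    n_other = len(counts[0])
    result = []
    for j in range(n_other):
        column = [counts[i][j] for i in range(len(centers))]
        total = sum(column)
        if total == 0:
            # Empty projection bin: the statistics are undefined.
            result.append((math.nan, math.nan))
            continue
        mean = sum(c * w for c, w in zip(centers, column)) / total
        var = sum(w * (c - mean) ** 2 for c, w in zip(centers, column)) / total
        result.append((mean, math.sqrt(var)))
    return result

# Hypothetical 3 x 2 histogram: only the first column on axis 1 is filled.
counts = [[1, 0], [2, 0], [1, 0]]
centers = [0.0, 1.0, 2.0]
stats = project_stats(counts, centers)
```

Each entry of `stats` corresponds to one row of the projected data source, which is what a figure such as `[['bin_center_1'], ['mean']]` would then plot.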

aliasArray - alias/client side function parameterization

see READMEaliase
JavaScript functions with which you can define derived variables on the client. Used e.g. to parameterise the selection, histogram weights, efficiencies
newly created variables can be used in histogramArray, figureArray, aliasArray
Dependency trees ensure the consistency of aliases, the correct order of evaluation of derived variables and their use in visualisation.

Example declaration:

    aliasArray = [
        # They can also be used as a selection (boolean), e.g. for histogram weights
        {
            "name": "C_accepted",
            "variables": ["C"],
            "parameters": ["C_cut"],
            "func": "return C < C_cut"
        },
        # User-defined JS columns can also be created in histograms by specifying the context (CDS) parameter
        {
            "name": "efficiency_A",
            "variables": ["entries", "entries_C_cut"],
            "func": "return entries_C_cut / entries",
            "context": "histoA"
        },
        # Shorthand notation - only for scalar functions
        ("effC", "entries_C_cut / bin_count", "histoAC"),
    ]
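
The dependency handling mentioned above can be sketched with a topological sort. This is only an illustration under assumptions: the helper `evaluation_order` and the alias names are hypothetical, and the real ordering is done inside RootInteractive, possibly in a different way. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter  # Python 3.9+

def evaluation_order(aliases, base_columns):
    """Return alias names ordered so that every alias comes after the aliases it uses."""
    names = {a["name"] for a in aliases}
    graph = {}
    for a in aliases:
        deps = set(a.get("variables", []))
        unknown = deps - names - set(base_columns)
        if unknown:
            raise KeyError(f"{a['name']} uses undefined variables: {sorted(unknown)}")
        # Only other aliases are graph edges; base columns are always available.
        graph[a["name"]] = deps & names
    # static_order() raises graphlib.CycleError on circular definitions.
    return list(TopologicalSorter(graph).static_order())

# Hypothetical aliases, deliberately declared in the "wrong" order.
aliases = [
    {"name": "efficiency", "variables": ["n_accepted", "n_all"], "func": "return n_accepted / n_all"},
    {"name": "n_accepted", "variables": ["C_accepted"], "func": "return C_accepted ? 1 : 0"},
    {"name": "C_accepted", "variables": ["C"], "func": "return C < 1"},
]
order = evaluation_order(aliases, base_columns=["C", "n_all"])
```

With such an ordering the declaration order in `aliasArray` does not matter, and circular definitions are detected before anything is sent to the client.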

layout - layout of the figures

see READMElayout
Layout declared by a dictionary (tabs) or an array of figure IDs (index or name ID)
Properties per row/simple layout/tab layout can be specified. More local properties have priority.

Example declaration:

layout = {
    "A": [
        [0, 1, 2, {'commonX': 1, 'y_visible': 1, 'x_visible':1, 'plot_height': 300}],
        {'plot_height': 100, 'sizing_mode': 'scale_width', 'y_visible' : 2}
        ],
    "B": [
        [3, 4, {'commonX': 1, 'y_visible': 3, 'x_visible':1, 'plot_height': 100}],
        {'plot_height': 100, 'sizing_mode': 'scale_width', 'y_visible' : 2}
        ]
}
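
The rule that more local properties have priority can be illustrated with a small sketch. The helper `resolve_layout` is hypothetical and only mimics the merging order for a simple layout (a list of rows with an optional trailing options dictionary); it is not the RootInteractive code.

```python
def resolve_layout(rows):
    """Return [(figure_ids, options)] for one simple layout (a list of rows)."""
    rows = list(rows)
    # A trailing dict on the layout holds options shared by all rows.
    global_opts = rows.pop() if rows and isinstance(rows[-1], dict) else {}
    resolved = []
    for row in rows:
        row = list(row)
        # A trailing dict on a row holds options for that row only.
        row_opts = row.pop() if row and isinstance(row[-1], dict) else {}
        # Row options are applied last, so they override the shared ones.
        resolved.append((row, {**global_opts, **row_opts}))
    return resolved

tab_a = [
    [0, 1, 2, {"commonX": 1, "y_visible": 1, "x_visible": 1, "plot_height": 300}],
    {"plot_height": 100, "sizing_mode": "scale_width", "y_visible": 2},
]
resolved = resolve_layout(tab_a)
```

For tab "A" the row keeps `plot_height` 300 and `y_visible` 1 from its own dictionary and inherits `sizing_mode` from the layout-level dictionary.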

widgetLayout - layout of the widgets

see READMElayoutWidget
Layout declared by a dictionary (tabs) or an array of widget IDs (index or name ID)
Properties per row/simple layout/tab layout can be specified. More local properties have priority.

Example declaration:

simple layout

 widgetLayoutKine=[
     ["dca0","tgl","qPt","ncl"],
     ["dEdxtot","dEdxmax","mTime0"], 
     ["hasA","Run","IR","isMC"], 
     {'sizing_mode': 'scale_width'}
 ]

composed layout:

widgetLayoutDesc={
    "Select":widgetLayoutKine,
    "Histograms":[["nbinsX","nbinsY", "varX","yAxisTransform"], {'sizing_mode': 'scale_width'}],
    "Legend": figureParameters['legend']['widgetLayout'],
    "Markers":["markerSize"]
}

arrayCompression - compression parameterization

see READMEcompression
- https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/Tools/compressArray.py#L141-L196
- https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/tutorial/bokehDraw/compression.ipynb
Significant compression of the data (down to O(5-10%) of the original size) for server -> client transmission and for storage in the html file.
Compression depends heavily on the entropy of the data after lossy compression and on data repetition
Lossy and lossless compression, selected per column by regular expressions
In realistic use cases a compression factor of 10-100 is achieved
further compression - using JavaScript aliases on the client instead of data transfer

Example declaration:

arrayCompressionParam=[(".*conv.*Sigma.*",[("relative",7), ("code",0), ("zip",0), ("base64",0)]),
                           (".*delta.*",[("relative",10), ("code",0), ("zip",0), ("base64",0)]),
                           (".*i2.*",[("relative",7), ("code",0), ("zip",0), ("base64",0)]),
                           (".*",[("relative",8), ("code",0), ("zip",0), ("base64",0)])]
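
To make the idea concrete, here is a minimal standard-library sketch of a per-column pipeline: a regular expression selects the action list, a lossy step reduces the relative precision, and the result is deflated and base64-encoded. Several things here are assumptions: that the first matching pattern wins, that the number after "relative" is a count of mantissa bits, and the helper names `select_pipeline`, `round_relative` and `zip_base64`. The "code" step is left out. The actual implementation is in compressArray.py and may differ.

```python
import base64
import math
import random
import re
import struct
import zlib

def select_pipeline(column, params):
    """Return the action list of the first pattern that matches the column name."""
    for pattern, actions in params:
        if re.match(pattern, column):
            return actions
    return []

def round_relative(x, nbits):
    """Lossy step (assumed meaning of "relative"): keep about nbits of mantissa."""
    if x == 0.0 or not math.isfinite(x):
        return x
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent
    scale = 1 << nbits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def zip_base64(values):
    """Lossless steps: pack as float64, deflate, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(zlib.compress(raw))

params = [(".*delta.*", [("relative", 10), ("zip", 0), ("base64", 0)]),
          (".*", [("relative", 8), ("zip", 0), ("base64", 0)])]

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic column "deltaY"
nbits = dict(select_pipeline("deltaY", params))["relative"]
lossy = [round_relative(x, nbits) for x in data]
size_raw, size_lossy = len(zip_base64(data)), len(zip_base64(lossy))
```

Rounding zeroes most mantissa bits, so the deflate step finds far more repetition; this is the entropy dependence mentioned above. The relative error introduced by `round_relative` stays below `2**-nbits`.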

Machine learning part - work in progress

Wrappers for decision trees and Neural Net
Provides interface for the reducible, irreducible errors, probability density function
Local linear forest, resp. local kernel regression

RootInteractive Information

RootInteractive github (source code)
- https://github.com/miranov25/RootInteractive
- JIRA: https://alice.its.cern.ch/jira/browse/PWGPP-485
Documentation server at CERN (TODO - add regular update)
- https://rootinteractive.web.cern.ch/RootInteractive/html/
- Not yet regularly updated - TODO
- /eos/user/r/rootinteractive/www/html/

Tutorials

1.) Bokeh draw standalone (graphs,compression, down-sampling)
- https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/tutorial/bokehDraw/standAlone.ipynb
2.) N dimensional histogramming on client (data aggregation)
- https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/tutorial/bokehDraw/test_bokehClientHistogram.ipynb
3.) Custom function on client:
- https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/tutorial/bokehDraw/customJsColumns.ipynb

ALICE RootInteractive tutorial

Several ALICE use cases (detector calibration, QA/QC)

https://indico.cern.ch/event/1135398/

Gallery material in the ALICE agenda and document server

Support material for RCU note [N2]
- [D1] Visualization of the common-mode effect dependencies using ROOT interactive ( 11 Dimensions)
  - https://gitlab.cern.ch/aliceeb/TPC/-/blob/master/SignalProcessing/commonModeFractionML.html
- [D2] Visualization of the ion-tail fit parameters and correction graphs using ROOT interactive (12 Dimensions)
  - https://gitlab.cern.ch/aliceeb/TPC/-/blob/master/SignalProcessing/ionTailFitParameters_sectorScan.html
- [D3] Visualization of the toy MC results using ROOT interactive (13 Dimensions)
  - https://gitlab.cern.ch/aliceeb/TPC/-/blob/master/simulationScan/toyMCParameterScan.html
Support material for V0 reconstruction studies [P1]
- [D4] Interactive invariant mass histogram dashboards (6+2 Dimensions)
  - https://indico.cern.ch/event/1088044/#sc-1-3-interactive-histograms
- [D5] Pt and invariant mass performance maps dashboards
  - https://indico.cern.ch/event/1088044/#sc-1-2-gamma-dashboards
  - https://indico.cern.ch/event/1088044/#sc-1-4-k0-dashboards
QA and production preparation :
- [D6] QA comparison of ongoing MC and raw data production (LHC18q,r, LHC18c,LHC16f,LHC17g..) See interactive dashboards in agenda of calibration/tracking meeting:
  - https://indico.cern.ch/event/991449/ , https://indico.cern.ch/event/991450/ , https://indico.cern.ch/event/991451/
PID
- [D7] TPC PID calibration and QA
  - https://indico.cern.ch/event/983778
    - https://alice.its.cern.ch/jira/secure/attachment/53371/qaPlotPion_test1.html
    - https://indico.cern.ch/event/991451/contributions/4220782/attachments/2184007/3689893/qaPlotPion_Delta.html
Fast MCkalman and event display
- [D8] Space charge distortion calibration (Run3) and performance optimization (Run2, Alice3) - [P9]
  - https://indico.cern.ch/event/1091510/contributions/4599999/attachments/2338476/3986580/residualTrackParam.html
  - https://indico.cern.ch/event/1087849/contributions/4577709/attachments/2331293/3973338/residual_track_parameter_Dist_GainIBF.html
- [D9] High dEdx (spallation product) reconstruction and magnetic monopole tracking
  - https://indico.cern.ch/event/991452/contributions/4222204/attachments/2184856/3691411/seed1Display2.html
Space charge distortion calibration
- [D10] digital current grouping and factorization studies
  - https://indico.cern.ch/event/1091510/
  - https://indico.cern.ch/event/1087849/

rootinteractive's Issues

bokehTools sliders not interactive after new bokeh interactive create

Notebook Example

first figure:

vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
sliders="meanTRDCurrent(0,0.5,0.05,0,1):H2O(0,5,0.2,0,5):deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDrawTree(dfsplit.sample(200),"time>0","drphiMean",vars,"H2O",sliders,p3,ncols=3, commonX=1, commonY=1,tooltip=tooltips,size=5)

second figure:

vars="drphiSector2:drphiSector4:drphiSector6:drphiSector7:drphiSector9:drphiSector16:drphiSector20:drphiSector30"
varserr="drphiSector2err:drphiSector4err:drphiSector6err:drphiSector7err:drphiSector9err:drphiSector16err:drphiSector20err:drphiSector30err"
sliders="meanTRDCurrent(0,0.5,0.05,0,1):H2O(0,5,0.2,0,5):deltaTRDCurrentNorm(0.0,10.,0.1,0,10)"
plot2=bokehDrawTree(dfsplit.sample(200),"time>0","drphiMean",vars,"H2O",sliders,p3,ncols=3, commonX=1, commonY=1,tooltip=tooltips,size=5,errX="drphiMeanErr",errY=varserr)

After creating the second figure, the sliders in the first figure stop working

Definition of bokeh markers missing underscore

Hello Marian,
it seems that a few of the marker definitions in bokehTools.py are missing an underscore.
I think it should be:
bokehMarkers = ["square", "circle", "triangle", "diamond", "square_cross", "circle_cross", "diamond_cross", "cross", "dash", "hex", "inverted_triangle", "asterisk", "square_x", "x"]

You can check the exact names for the markers using
from bokeh.plotting import markers
markers()

Regards
Matthias

Tensorflow 2.5+ compatibility

Hello Marian,
when using tf > 2.4, keras is directly invoked from tensorflow, i.e. all
from keras import ...
have to be replaced by
from tensorflow.keras import ...
in RootInteractive/MLpipeline/NDFunctionInterface.py or the tf version has to be fixed at 2.4. I guess that there are more urgent issues, so I just would like to point it out to be aware of that.
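
One possible way to support both setups without pinning the version is an import fallback. The sketch below is an assumption about how this could be done, not what NDFunctionInterface.py does; the helper `import_first_available` is hypothetical and is demonstrated with standard-library modules so that it runs without TensorFlow installed.

```python
import importlib

def import_first_available(*module_names):
    """Return the first module that can be imported from the given names."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {module_names} could be imported")

# Intended use (assumption): prefer the Keras bundled with TensorFlow,
# fall back to the standalone keras package.
# keras = import_first_available("tensorflow.keras", "keras")

# Demonstration with standard-library modules only:
json_module = import_first_available("module_that_does_not_exist", "json")
```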
Regards,
Ernst

Some pytest interactive drawing are not properly responding

In the current RootInteractive some interactive plots are not responding

InteractiveDrawing/bokeh/test_bokehDrawSAArray_fromTTree.html - OK
InteractiveDrawing/bokeh/test_BokehDrawArrayWidgetNoScale.html - not responding
InteractiveDrawing/bokeh/test_BokehDrawArrayWidget.html - not responding
InteractiveDrawing/bokeh/test_bokehDrawSAArray.html - not responding
InteractiveDrawing/bokeh/test_bokehDrawSAOldInterface.html - not responding
InteractiveDrawing/bokeh/test_BokehRDrawArray_DrawSAfromArray.html - OK
InteractiveDrawing/bokeh/test_BokehDrawArray_test_DrawfromArray.html - OK

tree2Panda fix - conversion fails if the query contains too many variables

tree->Draw with the "goffpara" option is used internally to fill the pandas DataFrame
if there are too many variables (how many is to be checked), an empty pandas DataFrame is created

Todo:

catch error state
try to load variables in groups
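
The two todo items could look roughly like the sketch below. It is only an outline under assumptions: `load_group` stands in for the real conversion of one group of variables, and `chunked` and `load_in_groups` are hypothetical helper names; the group size that actually works is not known.

```python
def chunked(items, size):
    """Split a list into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

def load_in_groups(columns, load_group, group_size):
    """Load columns group by group and merge the partial results.

    load_group is a callable returning {column: values} for one group;
    it stands in for the real conversion step.
    """
    table = {}
    for group in chunked(columns, group_size):
        part = load_group(group)
        if not part:
            # Catch the error state instead of silently returning an empty table.
            raise RuntimeError(f"conversion returned nothing for {group}")
        table.update(part)
    return table

# Dummy loader used only to make the sketch runnable.
def fake_loader(group):
    return {name: [0.0, 1.0] for name in group}

table = load_in_groups([f"var{i}" for i in range(7)], fake_loader, group_size=3)
```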

RootInteractive - widgets do not work properly for bool and integer columns

For sliders on integers and booleans we should automatically use

min, max as before
bin width = 1 if not specified otherwise

RootInteractive - with root_numpy not supported in ROOT v6.20

Until recently we were using root_numpy and root_pandas. Since we moved to root6-20xxx , this does not work anymore
RootInteractive test is failing.
Creating issue in RootInteractive

See discussion:
https://root-forum.cern.ch/t/lxplus-root-numpy-not-supported-for-root-6-18-00/35585

you probably need to recompile root_numpy against the new ROOT version that is active on lxplus.
(or activate a ROOT version from CMS sw stack that is compatible w/ your root_numpy version)
alternatively, you could use root2npy or a combination of uproot + numpy.

Optimization of makePdfMaps implementation

Follow up of issue #37 and PR #40

Observed time to make PDF map from histogram ~ 5 minutes (180 x 33 x 40 x 8 )

To be profiled and optimized.

The compressed input histograms can be found here:

(base) hellbaer@lxir128:/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation$ ls /lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/*gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_derMeanDistR_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_derMeanSC_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistRDiff_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistR_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucDistRPred_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip
/lustre/alice/users/hellbaer/NOTESData/JIRA/ATO-495/validation/outputNDHistos_flucSC_phi90_r17_z17_filter4_poo0_drop0.00_depth4_batch0_scaler0_useSCMean1_useSCFluc1_pred_doR1_dophi0_doz0.gzip

Inside each file there is a histogram:
{"H": H, "axes": axis, "name": histoInfo[0], "varNames": varList}

They can be opened with

import gzip
import pickle
with gzip.open(inputFileStr, 'rb') as inputFile:  # inputFileStr: path to one of the .gzip files listed above
    histo = pickle.load(inputFile)

Integrate dynamic down-sampling in number of points in bokeh visualization + bug fix in preserving selected points in interactive selection

In routine makeJScallback:

RootInteractive/InteractiveDrawing/bokeh/bokehTools.py:def makeJScallback(widgetDict, **kwargs)
user-defined widgets (sliders, ranges ...) define the subset of the data to visualize.
The speed of the visualization depends on the number of points to render. Above a critical amount (browser dependent)
the visualization on the client gets slower.
Even a small data sample (e.g. 100000 points) can already be problematic.

Solution:

define a threshold on the number of points to be shipped to the client, e.g. NpointsMax=10^6
define the number of points to be rendered on the client, NrenderMax=10^5
-- downsampling on the client to be done in the JavaScript defined in makeJScallback

Downsampling algorithm:

starting from a random subset
sophistication (e.g. keep previously rendered points) to be implemented later
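
The random-subset step can be sketched as follows. In RootInteractive the downsampling is meant to run in JavaScript on the client; the Python version below only illustrates the logic, and the function name `downsample` is hypothetical.

```python
import random

def downsample(selected, n_render_max, rng=random):
    """Return at most n_render_max of the selected row indices, chosen uniformly."""
    selected = list(selected)
    if len(selected) <= n_render_max:
        return selected
    # Sampling without replacement gives every point the same chance of being drawn.
    return sorted(rng.sample(selected, n_render_max))

rng = random.Random(42)
selected = [i for i in range(100_000) if i % 2 == 0]  # rows passing the widget cuts
rendered = downsample(selected, 1_000, rng)
```

Sampling from the selected indices, rather than from the first or last rows, avoids favouring any region of the data.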

Bug fix in selection

in makeJScallback the selected indices are not reassigned
we should keep the selection also after the JS callback query

Histogram derived information - efficiency/integral/mean/rms in user-defined ranges or quantiles

Functionality 1D, 2D, ND histogram:

user interface should use existing histogram, additional statistics will be calculated and used later following naming conventions

Examples:

{"name": "histoA", "variables": ["A"],"nbins":20, "quantiles":[0.05,0.5,0.95], "sumRange": [[-5,5]]}
- quantiles 0.05, 0.5 and 0.95 will be added to the stat table
- stat table common for all histogram - if not calculated NaN to be added
{"name": "histoAB", "variables": ["A","B"],"nbins":20, "quantiles":[0.05,0.5,0.95], "sumRange": [[-5,5]], "axis":1}
export new 1D CDS with columns defined by naming convention
quantileArray =[], rangeArray=[],
- quantile and ranges could use fixed numbers (floats) or parameters (string indicated widgets) (controlled in user slider widget)
- quantile array input are quantiles (0-1) e.g 0.05, 0.1 output linearly interpolated position (not nearest)
- rangeArray input are ranges output integral and fraction in range linearly interpolated between bins
- quantiles to be added as _Q
  - for 1D to be added to the statInfo
    - will be added to the stat table
  - for 2D(ND) to be added to new create 1D CDS
    - axis to be specified by number (0,1,2 ....)
    - could be used in following function visualization - using function names
exported CDS for 2D, ND can be used in figure arrays
exported statInfo for 1D will be used in stat tables
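
A minimal sketch of the two derived quantities described above, assuming a uniform distribution of entries inside each bin: the quantile position is interpolated linearly within the bin, and the sum in a range counts partially covered bins by their overlap fraction. The function names are hypothetical and the client-side implementation may differ in detail.

```python
def histogram_quantile(counts, edges, q):
    """Position of quantile q (0..1), interpolated linearly inside the bin."""
    total = sum(counts)
    if total <= 0:
        return float("nan")
    target = q * total
    cumulative = 0.0
    for count, low, high in zip(counts, edges[:-1], edges[1:]):
        if count > 0 and cumulative + count >= target:
            return low + (target - cumulative) / count * (high - low)
        cumulative += count
    return edges[-1]

def histogram_sum_range(counts, edges, low, high):
    """Integral of the histogram in [low, high], counting partial bins by overlap."""
    total = 0.0
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        overlap = min(high, right) - max(low, left)
        if overlap > 0:
            total += count * overlap / (right - left)
    return total

counts = [1, 1, 1, 1]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
median = histogram_quantile(counts, edges, 0.5)
in_range = histogram_sum_range(counts, edges, 0.5, 2.5)
fraction = in_range / sum(counts)
```

For this flat histogram the median falls at the edge between the second and third bin, and the range [0.5, 2.5] contains half of the entries.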

Structures:

histogramArray
functionArray (array of JavaScript functions)
system will create dependencies

Questions:

In the future we can use composition - histograms derived functions to be added to the function array

dependencies for derived variables
at the beginning we can use linear structure
hierarchical structure

edit: change name intRange -> sumRange

Bokeh wrappers - add support for categorical data in bokehDraw

New switch factor_mark and factor_cmap to be provided to handle categorical data.

See bokeh user_guide https://docs.bokeh.org/en/latest/docs/user_guide/data.html
section markers:

from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark

SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['hex', 'circle_x', 'triangle']

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Sepal Width'

p.scatter("petal_length", "sepal_width", source=flowers, legend_field="species", fill_alpha=0.4, size=12,
          marker=factor_mark('species', MARKERS, SPECIES),
          color=factor_cmap('species', 'Category10_3', SPECIES))

show(p)

Question regarding license

Hey,

I would like to try out your PyTorch histogramdd implementation for a research project. Unfortunately, there's no license file in your repository which clarifies how this code may (not) be used by others (no license means no rights for others).

Thanks in advance

bokehVisJS3DGraph.ts not copied to site-packages

Hello Marian,

this issue is not critical and basically everything seems to work as it should.

When using the bokeh tools in a jupyter notebook, I do the following:

from bokeh.io import output_notebook
output_notebook()
from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *

Output:

BokehJS 2.2.3 successfully loaded.
Welcome to JupyROOT 6.20/02
x bokehVisJS3DGraph.ts

You already see the issue about bokehVisJS3DGraph.ts.

When switching the order of the imports to

from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *
from bokeh.io import output_notebook
output_notebook()

, one gets

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-eeb3637b94e2> in <module>
     17 from RootInteractive.InteractiveDrawing.bokeh.bokehDrawSA import *
     18 from bokeh.io import output_notebook
---> 19 output_notebook()
...
...
further traceback
...
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/hellbaer/software/miniconda3/lib/python3.6/site-packages/RootInteractive/InteractiveDrawing/bokeh/bokehVisJS3DGraph.ts'

I installed the latest RootInteractive using

python3 -m pip install rootinteractive

(base) hellbaer@hellbaer-Thinkpad:~/software/TPCwithDNN/tpcwithdnn$ pip3 show bokeh rootinteractive
Name: bokeh
Version: 2.2.3
Summary: Interactive plots and applications in the browser from Python
Home-page: http://github.com/bokeh/bokeh
Author: Bokeh Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/hellbaer/software/miniconda3/lib/python3.6/site-packages
Requires: typing-extensions, python-dateutil, pillow, tornado, packaging, PyYAML, numpy, Jinja2
Required-by: RootInteractive
---
Name: RootInteractive
Version: 0.0.24
Summary: UNKNOWN
Home-page: https://github.com/miranov25/RootInteractive
Author: Marian Ivanov
Author-email: [email protected]
License: Not defined yet. Most probably similar to ALICE (CERN)  license
Location: /home/hellbaer/software/miniconda3/lib/python3.6/site-packages
Requires: pandas, bqplot, requests, numpy, anytree, scikit-hep, keras, bokeh, beakerx, scipy, qgrid, uproot, ipywidgets, plotly, nb-clean, iminuit, sklearn, runtime, pytest, matplotlib, nbval, forestci, tabulate, tensorflow
Required-by: TPCwithDNN

Cheers,
Ernst

tree2Panda - number of branches/columns is limited

If there are too many variables (to be quantified), the export fails

The code fails if a column is defined twice in the pandas DataFrame

...

Problem with bokeh transition 1.34 --> 2.0

Pytest starts to fail after transition to bokeh 2.0.

Starting with a new Python environment installed from scratch, bokeh 2.0 is taken.
Bokeh 2.0 changed the policy for CDS.
Should be fixed as soon as possible.
With an older version of bokeh the tests run properly.
Options to fix:

Fix the code itself
- should work for old and new bokeh
Restrict bokeh version
- can be used as a temporary solution

Test failure

(virtualenv3Test) miranov@miranov-Strix-17-GL703GE:~/github/RootInteractive3/RootInteractive/InteractiveDrawing$ pytest
====================================================================================== test session starts ======================================================================================
platform linux -- Python 3.6.9, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /home2/miranov/github/RootInteractive3
plugins: nbval-0.9.5
collected 5 items / 2 errors / 1 skipped / 2 selected                                                                                                                                           

============================================================================================ ERRORS =============================================================================================
___________________________________________________________ ERROR collecting RootInteractive/InteractiveDrawing/bokeh/test_Layout.py ____________________________________________________________
bokeh/test_Layout.py:41: in <module>
   test_Draw()
bokeh/test_Layout.py:15: in test_Draw
   bokehFigure=bokehDraw(df, "A>0", "A", "A:B:C:D", "C", "slider.A(0,100.1,0,1),slider.B(0,100,100,100,300)", None, layout=testLayout)
bokeh/bokehDraw.py:106: in __init__
   self.updateInteractive("")
bokeh/bokehDraw.py:295: in updateInteractive
   self.bokehSource.data = newSource.data
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/has_props.py:273: in __setattr__
   super().__setattr__(name, value)
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/property/descriptors.py:969: in __set__
   raise ValueError(_CDS_SET_FROM_CDS_ERROR)
E   ValueError: 
E   ColumnDataSource.data properties may only be set from plain Python dicts,
E   not other ColumnDataSource.data values.
E   
E   If you need to copy set from one CDS to another, make a shallow copy by
E   calling dict: s1.data = dict(s2.data)
_______________________________________________________ ERROR collecting RootInteractive/InteractiveDrawing/bokeh/test_bokehDrawArray.py ________________________________________________________
bokeh/test_bokehDrawArray.py:47: in <module>
   test_DrawfromArray()
bokeh/test_bokehDrawArray.py:35: in test_DrawfromArray
   fig=bokehDraw.fromArray(df, "A>0", figureArray,"slider.A(0,100,0,0,100)",tooltips=tooltips, layout=figureLayout)
bokeh/bokehDraw.py:165: in fromArray
   self.updateInteractive("")
bokeh/bokehDraw.py:295: in updateInteractive
   self.bokehSource.data = newSource.data
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/has_props.py:273: in __setattr__
   super().__setattr__(name, value)
../../../../software/virtualenv3Test/lib/python3.6/site-packages/bokeh/core/property/descriptors.py:969: in __set__
   raise ValueError(_CDS_SET_FROM_CDS_ERROR)
E   ValueError: 
E   ColumnDataSource.data properties may only be set from plain Python dicts,
E   not other ColumnDataSource.data values.
E   
E   If you need to copy set from one CDS to another, make a shallow copy by
E   calling dict: s1.data = dict(s2.data)

Add column mask and option for table visualization in bokeh table

By default all elements of the pandas table are visualized

mask - as for the panda creation
array
all used

RootInteractive fails in case NaN in panda used. Protection to be added

failure in the Python (master) part
sometimes the failure happens on the client side during unzipping

errX and errY options do not work in the array draw

These options were not implemented in the array interface RootInteractive/InteractiveDrawing/bokeh/bokehTools.py

TODO:

enable errX,errY
include it in default test
- RootInteractive/InteractiveDrawing/bokeh/test_bokehDrawArray.py

https://alice.its.cern.ch/jira/browse/PWGPP-548

bokehDrawArray on bokeh server -server callback function

Should work on Jupyter notebook and standalone bokeh server

Using only columns used in interactive drawing and export function on client

The current bokehDrawSA algorithm sends all columns of the input data source to the client

In the new approach only columns used in figures or widgets will be exported
- Internally, a map with the column usage status will be used
If function columns can be evaluated on the client, the function representation (string) is to be sent
- cdsCompress should recognize function (eventually test it)
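
A possible way to build the column usage map is to scan the expressions and string-valued options of the figure array for identifiers that match available columns. This is a sketch under assumptions: the helper `used_columns` is hypothetical, widgets and histograms are not covered, and the identifier matching is deliberately simple.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def used_columns(figure_array, available):
    """Collect the data-source columns referenced by the figure expressions and options."""
    used = set()
    for entry in figure_array:
        if isinstance(entry, dict):
            # Trailing dict of global options: only string values can name columns.
            texts = [v for v in entry.values() if isinstance(v, str)]
        else:
            x_exprs, y_exprs, *opts = entry
            texts = list(x_exprs) + list(y_exprs)
            for opt in opts:
                texts += [v for v in opt.values() if isinstance(v, str)]
        for text in texts:
            used.update(name for name in IDENTIFIER.findall(text) if name in available)
    return used

figure_array = [
    [["A"], ["A*A-C*C"], {"size": 2, "colorZvar": "A", "errY": "errY"}],
    [["B"], ["C+B", "C-B"]],
    {"size": 7},
]
columns = used_columns(figure_array, available={"A", "B", "C", "D", "errY"})
```

Here column D is never referenced, so it would not need to be shipped to the client.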

ATO-459 - MLpipeline code to be restructured - models to be constructed/loaded/saved/reused independently of fitters

In order to include new regression, classifiers - MLpipeline code to be restructured
https://alice.its.cern.ch/jira/browse/ATO-459

Current version (TO BE deprecated)

design influenced by TMVA - does not scale
fitter, regressor created in fit function based on the names and options
- method parameter defined in Register_Method
- model created during the fit method
- many if, does not scale

New version - to be implemented

models (regression, quantile regression wrappers) to be constructed by users
wrappers implement additional common functionality
models registered in Register_model
models reused for fits

tree2Panda - Extension usage of "hidden branches and functions"

Current interface

def tree2Panda(tree, include, selection, **kwargs):
    r"""
    Convert selected items from the tree into panda table
    TODO:
        * to  consult with uproot
            * currently not able to work with friend trees
        * check the latest version of RDataFrame (in AliRoot latest v16.16.00)
        * Add filter on metadata - e.g class of variables
    :param tree:            input tree
    :param include:         regular expression array - processing Tree+Friends, branches, aliases
    :param selection:       tree selection ()
    :param kwargs:
        * exclude           exclude array
        * firstEntry        first entry to enter
        * nEntries          number of entries to convert
        * column mask
    :return:                panda data frame
    """

Downsampling is biased towards the edge

When nPointRender is specified, the drawn points do not look randomly sampled; they are biased

Adding prediction intervals for GradientBoostingRegressor and review RandomForest prediction intervals - https://alice.its.cern.ch/jira/browse/ATO-459

GradientBoostingRegressor wrapper should be added to the list of wrappers in RootInteractive:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
https://medium.com/@qucit/a-simple-technique-to-estimate-prediction-intervals-for-any-regression-model-2dd73f630bcb

To be integrated in similar way also QuantileRegressionForest -clone
https://scikit-garden.github.io/examples/QuantileRegressionForests/
https://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf

Finally, quantile regression is not available for all types of regression models. In scikit-learn, the only 
model that implements it is the Gradient Boosted Regressor. Sometimes, such as in the case of XGBoost, 
you can customize the model’s cost-function to  obtain quantile regressor. You can read the details of how to do it here.

LoadTrees sometimes fails when used multiple times

When it is invoked a second time with the same input data sets, the code gets stuck.
test to be added

tree, treeList, fileList = LoadTrees(inputStream,"(.*)","XXX",".*root",0)
for ftree in tree.GetListOfFriends():
    isOK='{0}.entries>50&&{0}.isFitValid&&abs({0}.vecLTM.fElements[1]-{0}.binMedian)<5&&abs({0}.vecLTM.fElements[1]-{0}.meanG)<5'.format(ftree.GetName())
    tree.SetAlias(ftree.GetName()+".isOK",isOK)

Automatic update of the colz plot range - triggered by change of selection

When points are selected using sliders, the color map is currently not adjusted.
E.g. when selecting a small subset of the momentum range using sliders, the color code of the associated points
is not readjusted.

This should be an option:

fixed color code for points
automatically adjusted color code in <min,max> as in the example below
https://stackoverflow.com/questions/53825387/python-bokeh-update-scatter-plot-colors-on-callback

Callback function to be implemented - to be called after makeJScallback for widget selection

Integration of the 3D graphics into RootInteractive using Bokeh

Starting Example:
http://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/wrapping.html#userguide-extensions-examples-wrapping

According to the main author of Bokeh, there is no native support for 3D. The example is a wrapper for vis.js.
See: https://groups.google.com/a/continuum.io/g/bokeh/c/u_j46VHRyt0

Hi,

The mpl.to_bokeh function was never able to convert 3D plots, so it would not have helped here. In fact, it was barely ever able to do much at all. The long-discussed MPL JSON spec that might have made maintainable MPL compatibility possible never materialized, which is why the whole idea was ultimately abandoned.

In any case, the core Bokeh library does not focus on or provide any built-in 3d capability, and it's unlikely that it ever will. The "surface3d" example is mostly intended as a demonstration that Bokeh can be extended with custom models and external JS libraries, but the underlying vis.js library that it wraps is purely a toy, in my opinion. If anyone wanted to create a custom extension to integrate a better 3d JS library with Bokeh, that would be great. Something like three.js is probably a good candidate, though there may be others I am not aware of. It could be a little work or a lot of work, depending on whether the extension was very specialized and narrow, or more general. For reference, the chapter on creating custom extensions is here:

https://bokeh.pydata.org/en/latest/docs/user_guide/extensions.html

Thanks,

Bryan

Hints for histogram

Standard variables:

bin range
bin center
bin content
bin error

Histogram info:

name
Mean, RMS, entries

tree2Panda - use data types as in TTree branches - possible to specify compression syntax

Compression - check:

boolean
char
int32
int64
float

Singularity container for RootInteractive in https://cloud.sylabs.io + CERN software

Validated Singularity containers should be regularly distributed via https://cloud.sylabs.io

Prototype in
https://cloud.sylabs.io/library/miranov25/default/rootinteractive

add better documentation and examples in aliTreePlayer.py and bokehDrawSA - kwargs documentation and examples

bokehDrawSA.fromArray(df, None, figureArray, widgetParams, layout=figureLayoutDesc, tooltips=tooltips, parameterArray=parameterArray, widgetLayout=widgetLayoutDesc, sizing_mode="scale_width", nPointRender=300, aliasArray=aliasArray, histogramArray=histoArray)

The documentation for individual aspects can be long and is not suitable for inline code documentation. Separate README files are to be created.

- READMEhistograms
- READMEaliases
- READMEfigures
- ...

Problems with x_visible and y_visible

To share the y axis and x axis, the options y_visible and x_visible can be used.
y_visible:

0 - do not show y axis
1 - show y axis (default)
2 - show y axis only on the left side

The older implementation did not implement this feature properly.
To do:
Modify source code and improve tests

Make client configurable layout - enabling/disabling columns and rows

https://stackoverflow.com/questions/52415577/bokeh-widget-to-show-hide-figures

Figure configuration, e.g. via a checkbox

AppendStatPandas - replace columns in the pandas DataFrame if they already exist

failure in makePdfMaps in case of empty slices

Posting bug report from JIRA - https://alice.its.cern.ch/jira/browse/ATO-495

Hello Marian Ivanov,

I would like to do multi-dimensional fits of the ML validation data, e.g. diff(Predicted-Calculated):r:z:meanSC. Here I would like to use the makePdfMaps function from RootInteractive.

By design, for the additional variables, like meanSC, meanDist, flucDist, derivativeDist, there will be empty bins in some of the voxels when histogramming the full volume of the TPC. I think that's why I get a crash like this:

Traceback (most recent call last):
  File "/lustre/alice/users/hellbaer/TPCwithDNN/tpcwithdnn/analyser.py", line 159, in <module>
    makeNDFits()
  File "/lustre/alice/users/hellbaer/TPCwithDNN/tpcwithdnn/analyser.py", line 69, in makeNDFits
    dfNDfits = makePdfMaps(histoND, slices, dimI)
  File "/lustre/alice/users/hellbaer/python/venv3/lib/python3.6/site-packages/RootInteractive/Tools/makePDFMaps.py", line 92, in makePdfMaps
    means.append(np.average(centerI, weights=iHisto))
  File "<__array_function__ internals>", line 6, in average
  File "/lustre/alice/users/hellbaer/python/venv3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 420, in average
    "Weights sum to zero, can't be normalized")
ZeroDivisionError: Weights sum to zero, can't be normalized
When bins have no entries, the weights iHisto will all be zero.

Now, is there a quick workaround for this, e.g. skip the statistics calculation for the empty bins and mask them in the data frame? Otherwise I will have to manually select regions in the TPC.

Cheers,
Ernst
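One possible workaround, sketched here under the assumption that an empty slice should simply be masked with NaN, is to guard the weighted average against zero total weight. The helper `weighted_mean_or_nan` is hypothetical and mirrors the `np.average(centerI, weights=iHisto)` call from the traceback in plain Python; it is not the actual fix applied in makePdfMaps.

```python
import math


def weighted_mean_or_nan(centers, weights):
    """Weighted mean of bin centers; NaN instead of an exception for an empty slice."""
    total = sum(weights)
    if total == 0:
        # Empty slice: no entries in any bin, so the mean is undefined
        return math.nan
    return sum(c * w for c, w in zip(centers, weights)) / total


bin_centers = [1.0, 2.0, 3.0]
slices = [[1, 1, 1], [0, 0, 0], [0, 2, 2]]   # the second slice is empty
means = [weighted_mean_or_nan(bin_centers, s) for s in slices]
# Rows with NaN can later be masked or dropped in the output data frame
valid = [m for m in means if not math.isnan(m)]
```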

LoadTrees - loading too many trees - bug fix

LoadTrees was loading a tree for each line of the input specification.
For a metadata specification line (Title), the previous file was used again.

Catch the error and write a clearer error message with hints in figure array parsing

Example:

/usr/local/lib/python3.6/dist-packages/RootInteractive/InteractiveDrawing/bokeh/bokehTools.py in processBokehLayoutArray(widgetLayoutDesc, widgetArray)
    292         rowWidgetArray0 = []
    293         for i, iWidget in enumerate(rowWidget):
--> 294             figure = widgetArray[iWidget]
    295             rowWidgetArray0.append(figure)
    296             if hasattr(figure, 'x_range'):

IndexError: list index out of range

The error was caused by the layout description referencing more figures than the figure array contains.
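A check of this kind could run before the layout is processed. The sketch below is an assumption about how such a validation might look; `check_layout_indices` is a hypothetical helper, not an existing function in bokehTools.py, and it only handles the list-based layout description shown in this document.

```python
def check_layout_indices(layout_desc, n_figures, path="layout"):
    """Raise ValueError with a readable hint if the layout references a missing figure."""
    for pos, item in enumerate(layout_desc):
        where = "{}[{}]".format(path, pos)
        if isinstance(item, dict):
            continue  # option dictionaries carry no figure indices
        if isinstance(item, (list, tuple)):
            check_layout_indices(item, n_figures, where)  # nested row or column
        elif isinstance(item, int):
            if not 0 <= item < n_figures:
                raise ValueError(
                    "{} refers to figure {}, but the figure array has only {} "
                    "entries (valid indices 0..{})".format(
                        where, item, n_figures, n_figures - 1))
        else:
            raise ValueError("{}: unexpected entry {!r}".format(where, item))


# Passes silently: two figures, layout uses indices 0 and 1
check_layout_indices([[0, 1, {"plot_height": 450}], {"plot_height": 240}], 2)
```

With a layout such as `[[0, 2]]` and two figures, the message names the offending position (`layout[0][1]`) instead of the bare `IndexError` shown above.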

Custom query using edit window not working

In the new code, the custom query function is disabled because it was not working properly.
A new custom query function is to be implemented:

widgets as (javascript) function on clients
- e.g log(A)
boolean javascript - custom
- e.g A+B<C
logical expression on top of sliders/ranges/select/multiselect/checkbox

Multidimensional density estimation using forest of kd-trees -alternative for histograms for sparse data

Forest of kdtrees could be used as an approximator of the density estimation (https://en.wikipedia.org/wiki/Density_estimation).

This can be used as an analogue inspired by random forest for machine learning:

set of the kdtrees with bootstrapping to approximate smooth function
user defined/custom local metric
randomly selected

Example usage of the kdtrees (to be extended):

Client side histogramming in bokeh interface - unbinned and binned data

Proposal: syntax for unbinned data histogramming on client

Standard syntax example: https://github.com/miranov25/RootInteractive/blob/master/RootInteractive/InteractiveDrawing/bokeh/test_bokehDrawSA.py

figureArray = [
#   ['A'], ['C-A'], {"color": "red", "size": 7, "colorZvar":"C", "filter": "A<0.5"}],
    [['A'], ['A*A-C*C'], {"color": "red", "size": 2, "colorZvar": "A", "varZ": "C", "errY": "errY", "errX":"0.01"}],
    [['A'], ['C+A', 'C-A', 'A/A']],
...
    [['D'], ['D*10'], {"size": 10, "errY": "errY"}],
]

figureArray = [
    [['A'], ['histo'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
    [['A', 'B'], ['histo'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
    [['A', 'B'], ['histo2D'], {"color": "red", "size": 7, "nBins": 10, "range": ()}],
]

Adding value errors or weights to the ML interface

Directory named aux makes the repository impossible to clone under Windows.

I can't use git clone to clone the repository under Windows, and when trying to unpack the archive, I get a Windows error caused by the aux directory name being invalid under Windows.
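To catch this class of problem before a release, the tracked paths could be scanned for names that Windows reserves for devices (these names are invalid with or without a file extension). The following is a minimal sketch; the helper name `windows_unsafe_paths` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# Device names reserved by Windows
RESERVED = ({"CON", "PRN", "AUX", "NUL"}
            | {"COM{}".format(i) for i in range(1, 10)}
            | {"LPT{}".format(i) for i in range(1, 10)})


def windows_unsafe_paths(paths):
    """Return the paths that contain a component Windows cannot create."""
    bad = []
    for path in paths:
        for part in PurePosixPath(path).parts:
            # The reserved name is compared case-insensitively, ignoring any extension
            stem = part.split(".")[0].rstrip(" ").upper()
            if stem in RESERVED:
                bad.append(path)
                break
    return bad


example_paths = ["pkg/aux/helper.py", "pkg/Tools/auxiliary.py", "docs/nul.txt"]
unsafe = windows_unsafe_paths(example_paths)
```

In practice the list of paths could come from `git ls-files`, so the check can run as part of the test suite.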

Joins and queries on client #96

To reduce data transfer server -> client and to minimize data size on the client, joins should be used

relates to #96

Standard Bokeh supports only a limited number of operations:

bitmask
...

Options to investigate:

row wise DB- sql on client
- e.g. alasql
column wise DB on client
custom logic

Benchmarks to be done within another project:

https://github.com/miranov25/Hallo-worldTypeScript

Make the documentation script work again and integrate it into the release creation

See some comment in:
https://alice.its.cern.ch/jira/browse/PWGPP-535

RootInteractive3/sphinx

Clean up of the old user interface. Move to v0-01-xx

The old string-based interface is to be cleaned up, as it is not possible to support it.
The user interface/configuration will be based on Python data structures:

Dictionaries
Array

Before the clean-up, a backward-compatible Release v0-00-25 is to be created and distributed

Bokeh legend options - hide/ set font size/set box size

parameters in draw array should be parsed and moved to the Legend options

Data compression server -> client for CDS

Data compression - to reduce data transfer between server -> client

lossy compression first
lossless compression

Lossy compression of CDS columns

for integers: not needed if lossless compression follows
- before lossless compression is implemented, cast to the smallest sufficient type
- e.g. int8, int16
- or categories
identify whether a column is categorical
- could be boolean or a small set of distinct float values
- to check: how categories are sent in bokeh
optional user defined strategies for floats:
- relative precision
- absolute precision
- interval - linear transform
  - [xbegin, xend, nbins]
optional delta compression - can be automatic
- use value - previous value
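The interval (linear transform) strategy and the delta compression listed above can be sketched in plain Python as follows. All function names are hypothetical and only illustrate the idea; this is not the RootInteractive implementation.

```python
def quantize_linear(values, x_begin, x_end, n_bins):
    """Map floats to integer codes 0..n_bins-1 on a uniform grid over [x_begin, x_end]."""
    width = (x_end - x_begin) / n_bins
    codes = []
    for v in values:
        code = int((v - x_begin) / width)
        codes.append(min(max(code, 0), n_bins - 1))  # clamp out-of-range values
    return codes


def dequantize_linear(codes, x_begin, x_end, n_bins):
    """Restore approximate values as bin centers; the error is at most half a bin width."""
    width = (x_end - x_begin) / n_bins
    return [x_begin + (c + 0.5) * width for c in codes]


def delta_encode(codes):
    """Store each code as the difference to the previous one (the first relative to 0)."""
    previous, deltas = 0, []
    for c in codes:
        deltas.append(c - previous)
        previous = c
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total, codes = 0, []
    for d in deltas:
        total += d
        codes.append(total)
    return codes


values = [0.0, 0.1, 0.26, 0.99]
codes = quantize_linear(values, 0.0, 1.0, 4)
deltas = delta_encode(codes)
restored = dequantize_linear(delta_decode(deltas), 0.0, 1.0, 4)
```

For slowly varying columns the deltas are small integers, which a subsequent lossless compression step can pack efficiently.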

Lossless compression

The compression applied on the server must have a corresponding decompression on the client.
Which compression to use is still to be defined.
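As one candidate, a column could be packed into a typed byte buffer, compressed with zlib and base64-encoded for transport. The sketch below shows the round trip in Python only; the function names and dictionary keys are hypothetical, the choice of zlib is an assumption, and the client would need a matching inflate step in JavaScript.

```python
import array
import base64
import zlib


def pack_column(values, typecode="h"):
    """Pack integers into a typed buffer (native byte order), deflate and base64-encode."""
    raw = array.array(typecode, values).tobytes()
    encoded = base64.b64encode(zlib.compress(raw, 9)).decode("ascii")
    return {"dtype": typecode, "length": len(values), "data": encoded}


def unpack_column(packed):
    """Inverse of pack_column: base64-decode, inflate and unpack to a list."""
    raw = zlib.decompress(base64.b64decode(packed["data"]))
    out = array.array(packed["dtype"])
    out.frombytes(raw)
    return out.tolist()


column = [0, 0, 1, 2] * 1000            # e.g. delta-encoded int16 codes
packed = pack_column(column)
roundtrip = unpack_column(packed)
```

Typecode "h" corresponds to a 16-bit signed integer, so the uncompressed buffer here has two bytes per value; the repetitive content compresses to a small fraction of that.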

Continuous integration - test suite + documentation using TensorFlow+Pytorch+RootInteractive container

Goal:

RootInteractive should be tested on a regular basis in a Singularity container built from scratch.

All the non-AliRoot tests should run from scratch for each master update
All tests including AliRoot/ROOT should run for each Release

Default markers failing when an improper marker type is specified

The marker can be specified in the figure array with the key markers.
If the variable type is not categorical, the drawing gets stuck.
To be fixed and added to the unit test:

Example use case:

figureArray = [
[['rangeCm'], ["dEdx"], {"colorZvar":"Z","markers":"A"}],
[['rangeCm'], ["rangeSigmaCm"], {"colorZvar":"Z"}],
#["tableHisto", {"rowwise": True}],
{"size": 5}
]
widgetParams=[
['range', ["Z"]],
['range', ["A"]],
['range', ["p"]],
]
tooltips = [("Z", "@z"),("A","@A")]
widgetLayoutDesc=[
[0,1],
{'sizing_mode':'scale_width'}
]
figureLayoutDesc=[
[0,1,{'plot_height':450}],
#[6,{'plot_height':25}],
{'plot_height':240,'sizing_mode':'scale_width',"legend_visible":False}
]
