Additional problem observed in the test: the median is shifted by 1 bin.
from rootinteractive.
I have to correct a statement in the first comment #41 (comment):
Observed time to make PDF map from histogram ~ 5 minutes (180 x 33 x 40 x 8)
I think this applied to a first version in which the histograms were smaller. With the histogram size quoted above, creating one map takes on the order of 30 min on the GSI batch farm.
Commit:
d96b620
- speeding up median calculation - caching quantiles
- fixing median bug
Benchmark
In the old implementation, the sum was recalculated inside a loop.
In the new implementation (commit d96b620), the cumulative function is calculated only once.
The code is still to be sped up further.
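A minimal sketch of the difference described above, assuming a 1D histogram of counts and defining the quantile bin as the first bin whose cumulative count reaches q times the total. The function names are hypothetical and not taken from makePDFMaps.py. The loop version re-sums the prefix for every candidate bin, which is consistent with the very large number of `sum` calls in the old profile below; the cumulative version builds the CDF once.

```python
import numpy as np


def quantile_bin_loop(hist, q):
    """Old-style sketch: re-sum the histogram prefix for every candidate bin."""
    total = hist.sum()
    for i in range(len(hist)):
        # O(n) sum inside an O(n) loop -> O(n^2) work per quantile
        if hist[: i + 1].sum() >= q * total:
            return i
    return len(hist) - 1


def quantile_bin_cumsum(hist, q):
    """New-style sketch: build the cumulative distribution once, then scan it."""
    cdf = np.cumsum(hist)
    total = cdf[-1]
    for i in range(len(cdf)):
        if cdf[i] >= q * total:
            return i
    return len(cdf) - 1


hist = np.array([1, 2, 3, 4])
print(quantile_bin_cumsum(hist, 0.5))  # 2
```

Both functions return the same bin; only the amount of repeated summation differs.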
python -m cProfile -s tottime RootInteractive/Tools/test_makePDFMaps.py
Old:
-
147250797 function calls (147219265 primitive calls) in 138.137 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
47292148 86.467 0.000 86.467 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1 26.240 26.240 133.420 133.420 makePDFMaps.py:3(makePdfMaps)
47291767 10.851 0.000 104.146 0.000 {method 'sum' of 'numpy.ndarray' objects}
47291767 6.990 0.000 93.295 0.000 _methods.py:34(_sum)
12 2.665 0.222 2.665 0.222 {method 'astype' of 'numpy.ndarray' objects}
77 1.210 0.016 1.210 0.016 {method 'copy' of 'numpy.ndarray' objects}
New:
-
5377715 function calls (5346183 primitive calls) in 17.280 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 9.322 9.322 12.237 12.237 makePDFMaps.py:3(makePdfMaps)
12 2.896 0.241 2.896 0.241 {method 'astype' of 'numpy.ndarray' objects}
77 1.298 0.017 1.298 0.017 {method 'copy' of 'numpy.ndarray' objects}
1 0.538 0.538 0.835 0.835 completer.py:103(<module>)
237600 0.455 0.000 0.455 0.000 {method 'cumsum' of 'numpy.ndarray' objects}
208 0.292 0.001 0.292 0.001 {method 'flatten' of 'numpy.ndarray' objects}
1102 0.176 0.000 0.176 0.000 {method 'reduce' of 'numpy.ufunc' objects}
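The same kind of profile can also be collected from inside Python, which is handy when only one function should be profiled rather than a whole script. A sketch using the standard-library `cProfile` and `pstats` modules; `workload` is a hypothetical stand-in for `makePdfMaps`, not the real function.

```python
import cProfile
import io
import pstats

import numpy as np


def workload():
    # Hypothetical stand-in for makePdfMaps: many cumsum calls on small rows
    rng = np.random.default_rng(0)
    hist = rng.poisson(5.0, size=(1000, 40))
    return [np.cumsum(row) for row in hist]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime")
stats.print_stats(5)  # top 5 entries ordered by internal time ("tottime")
report = buffer.getvalue()
print(report)
```

Sorting by `"tottime"` corresponds to the `-s tottime` option of `python -m cProfile`.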
Reducing memory usage and speeding up:
commit 409d70b (HEAD -> master, miranov25/master)
Author: miranov25
Date: Fri Apr 24 13:00:23 2020 +0200
Removing unnecessary copy of data
* smaller memory usage and faster
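The commit message does not say which copy was removed, so the following is only a generic illustration, not the actual change: in NumPy, slicing and `reshape` of a contiguous array return views, while `.copy()` and `.flatten()` always allocate a new buffer, and `astype(..., copy=False)` skips the copy when the dtype already matches. `np.shares_memory` makes the difference visible.

```python
import numpy as np

hist = np.arange(24, dtype=np.float64).reshape(2, 3, 4)  # toy 3D histogram

# Views: no new buffer is allocated
view = hist[:, 1, :]
flat_view = hist.reshape(-1)  # contiguous input, so reshape returns a view

# Copies: a new buffer of the same size is allocated every time
copied = hist.copy()
flat_copy = hist.flatten()

# astype with copy=False returns the input itself when no conversion is needed
same = hist.astype(np.float64, copy=False)

print(np.shares_memory(view, hist))       # True
print(np.shares_memory(flat_view, hist))  # True
print(np.shares_memory(copied, hist))     # False
print(np.shares_memory(flat_copy, hist))  # False
print(same is hist)                       # True
```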
Benchmark:
Now the time is spent mostly in the Python interpreter.
A further factor of ~10 could be gained by optimizing the loops.
-
5377627 function calls (5346095 primitive calls) in 15.739 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 9.400 9.400 11.122 11.122 makePDFMaps.py:3(makePdfMaps)
12 2.679 0.223 2.679 0.223 {method 'astype' of 'numpy.ndarray' objects}
1 0.566 0.566 0.868 0.868 completer.py:103(<module>)
237600 0.430 0.000 0.430 0.000 {method 'cumsum' of 'numpy.ndarray' objects}
208 0.325 0.002 0.325 0.002 {method 'flatten' of 'numpy.ndarray' objects}
1102 0.272 0.000 0.272 0.000 {method 'reduce' of 'numpy.ufunc' objects}
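One way to remove the Python loop over histogram cells is to let NumPy process all cells at once. A sketch, assuming the distribution of interest lies along the last axis of the histogram and all leading axes are bin indices (the actual axis layout of the 180 x 33 x 40 x 8 histogram is not stated in this thread); `quantile_bins` is a hypothetical name.

```python
import numpy as np


def quantile_bins(hist, q):
    """Index of the first bin where the CDF along the last axis reaches q * total.

    hist has shape (..., n_bins); all leading axes are processed together,
    without a Python loop over histogram cells.
    """
    cdf = np.cumsum(hist, axis=-1)        # one cumulative sum per cell
    threshold = q * cdf[..., -1:]         # shape (..., 1), broadcasts over bins
    # argmax on a boolean array returns the index of the first True
    return np.argmax(cdf >= threshold, axis=-1)


print(quantile_bins(np.array([[1, 2, 3, 4], [4, 3, 2, 1]]), 0.5))  # [2 1]
```

For an empty cell (total count 0) this returns bin 0, so such cells may need to be masked separately.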
Pull request #43
Using np.searchsorted for median calculation
Timing improvement: factor ~2
-
6324457 function calls (6292925 primitive calls) in 7.651 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
12 2.798 0.233 2.798 0.233 {method 'astype' of 'numpy.ndarray' objects}
1 0.736 0.736 2.938 2.938 makePDFMaps.py:3(makePdfMaps)
1 0.578 0.578 0.883 0.883 completer.py:103(<module>)
237600 0.424 0.000 0.424 0.000 {method 'cumsum' of 'numpy.ndarray' objects}
208 0.336 0.002 0.336 0.002 {method 'flatten' of 'numpy.ndarray' objects}
592 0.289 0.000 0.289 0.000 {method 'reduce' of 'numpy.ufunc' objects}
237605 0.255 0.000 0.255 0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
5 0.173 0.035 0.173 0.035 {pandas._libs.lib.maybe_convert_objects}
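A sketch of the `np.searchsorted` approach for a single 1D histogram, with a hypothetical helper name. `side="left"` returns the smallest index i with cdf[i] >= value, i.e. the first bin where the cumulative count reaches the threshold, found by binary search instead of a linear scan. `side="right"` returns the first index with cdf[i] > value, so whenever the CDF hits the threshold exactly the result moves one bin up. This is one way a median can end up shifted by one bin; it is not confirmed to be the cause of the bug reported at the top of this issue.

```python
import numpy as np


def quantile_bin_searchsorted(hist, q):
    """First bin whose cumulative count reaches q * total (binary search on the CDF)."""
    cdf = np.cumsum(hist)
    # side="left": smallest i with cdf[i] >= q * total
    return int(np.searchsorted(cdf, q * cdf[-1], side="left"))


hist = np.array([1, 4, 5])  # CDF = [1, 5, 10], total = 10, median threshold = 5
cdf = np.cumsum(hist)
print(quantile_bin_searchsorted(hist, 0.5))          # 1
print(int(np.searchsorted(cdf, 5.0, side="right")))  # 2 (one bin higher)
```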
Closing issue
In the benchmark above, a speed-up factor of ~20 was achieved (138 s -> 7.7 s).
Next improvement: PyTorch GPU implementation + fitting