
ntuple_processor's Introduction

ntuple_processor

Submodule of ntuple-analysis containing the actual software to run the analysis

Structure

Examples

Tests

Before merging, check that all the tests are green by running

$ python -m unittest -v

ntuple_processor's People

Contributors

maxgalli, mburkart, harrypuuter, stwunsch, ralfschmieder, conformist89, nshadskiy

Watchers

James Cloos, Artur Gottmann, cheideck, and four others

ntuple_processor's Issues

Improvement of unbalanced datasets in multiprocessing

As noticed during the last benchmark runs, unbalanced datasets are handled suboptimally when multiprocessing is enabled: if one of the RDataFrames is built on top of a dataset that is much larger than the others, the worker that processes it becomes a bottleneck for the entire analysis. Several approaches (to be investigated and implemented separately) could fix this issue:

  • combine multiprocessing and multithreading: detect the larger datasets in advance and split the workers that process them into multiple threads; to avoid increasing the number of cores used, the overall number of workers is reduced accordingly;
  • use only multiprocessing: detect the larger datasets in advance and split each of them into several RDataFrames, so that they are processed by different workers; the partial results can easily be merged at the end to obtain the proper histograms. This solution also requires a mechanism that ensures the largest RDataFrames are sent to the workers first.
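The second option (split large datasets, dispatch the largest chunks first, merge at the end) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: each dataset is reduced to a hypothetical entry count used as a proxy for processing cost, a chunk is a hypothetical `(name, start, stop)` entry range (in practice it might be a subset of input files instead), and worker assignment is simulated with the greedy "largest task first, to the least-loaded worker" heuristic (longest processing time first) rather than an actual process pool.

```python
import heapq
from collections import Counter


def split_dataset(name, n_entries, max_entries):
    """Split one dataset into (name, start, stop) entry ranges of at most max_entries."""
    return [
        (name, start, min(start + max_entries, n_entries))
        for start in range(0, n_entries, max_entries)
    ]


def plan_tasks(datasets, max_entries):
    """Split every dataset, then order all chunks largest first."""
    tasks = []
    for name, n_entries in datasets.items():
        tasks.extend(split_dataset(name, n_entries, max_entries))
    return sorted(tasks, key=lambda task: task[2] - task[1], reverse=True)


def simulate_loads(tasks, n_workers):
    """Greedy scheduling: each task goes to the currently least-loaded worker."""
    loads = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(loads)
    for _, start, stop in tasks:
        load, worker = heapq.heappop(loads)
        heapq.heappush(loads, (load + stop - start, worker))
    return sorted(load for load, _ in loads)


def merge_histograms(partials):
    """Sum per-chunk bin contents; filling is additive, so chunking loses nothing."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


datasets = {"big": 1000, "a": 100, "b": 100}  # hypothetical entry counts
print(simulate_loads(plan_tasks(datasets, 10**6), 3))  # [100, 100, 1000]
print(simulate_loads(plan_tasks(datasets, 200), 3))    # [400, 400, 400]
```

Without splitting, one worker carries almost all the load; with chunks of at most 200 entries dispatched largest first, the three simulated workers end up evenly loaded.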

Measure time in each process for the event loop

Put a time measurement in the _run_multiprocess function and return the measured time. This should then be available to the user to see how imbalanced the runs are. Also, add a debug message in the _run_multiprocess function for that purpose. That would clear up many cases of "bad scaling".
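A minimal sketch of the proposed measurement, assuming each worker calls a per-task function and its return value can be extended to a `(result, elapsed)` pair. The names `timed_run`, `imbalance_report`, and the logger name are hypothetical and do not reflect the actual signature of `_run_multiprocess`. If `timed_run` is handed to a process pool, it has to be a module-level function (for example wrapped with `functools.partial`) so that it can be pickled.

```python
import logging
import time

logger = logging.getLogger("ntuple_processor")  # hypothetical logger name


def timed_run(run, task):
    """Run one task and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = run(task)
    elapsed = time.perf_counter() - start
    # Debug message so imbalanced runs show up in the log.
    logger.debug("Event loop for %s took %.2f s", task, elapsed)
    return result, elapsed


def imbalance_report(timings):
    """Summarise per-task wall times collected from the workers."""
    slowest = max(timings.values())
    fastest = min(timings.values())
    ratio = slowest / fastest if fastest > 0 else float("inf")
    return {"slowest": slowest, "fastest": fastest, "ratio": ratio}
```

A ratio far above 1 points to an imbalanced workload rather than to poor parallel scaling of the framework itself.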

Branch cleanup

If I see it correctly, all changes in the abcd_method branch are also in the dev branch, which I would like to merge into the master branch to keep a clean set of branches.

This would mean the abcd_method branch is obsolete and could be removed to clean up stale branches we don't plan to develop further anyway. In the past, we often opened branches and then never merged them into the main branch.

Could someone please double-check whether the code from the abcd_method branch is already included in the dev branch and then comment here? I would then delete the abcd_method branch.

Proposal: Use hashes as tags in names

We could use hashes as tags appended to the names of the results. Each hash identifies the unique path taken through the computation graph, which means we can distinguish results that have the same name but were produced by different paths. Such a case should then throw a warning.
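One way the proposal could work, sketched under assumptions: a path is represented as an ordered list of node descriptions (the strings below are hypothetical), the hash is a truncated SHA-256 of that list, and the `name#tag` format and the `ResultRegistry` class are illustrative choices, not existing ntuple_processor API. Truncating to 8 hex characters keeps names short at the cost of a small collision probability.

```python
import hashlib
import warnings


def path_hash(path, length=8):
    """Hash the ordered node descriptions along one computation-graph path."""
    # Join with an unlikely separator so ["ab", "c"] and ["a", "bc"] differ.
    digest = hashlib.sha256("\x1f".join(path).encode("utf-8")).hexdigest()
    return digest[:length]


class ResultRegistry:
    """Hypothetical bookkeeping of which paths produced each result name."""

    def __init__(self):
        self._tags_by_name = {}

    def tagged_name(self, name, path):
        tag = path_hash(path)
        seen = self._tags_by_name.setdefault(name, set())
        if seen and tag not in seen:
            warnings.warn(
                f"Result name {name!r} is produced by {len(seen) + 1} different paths"
            )
        seen.add(tag)
        return f"{name}#{tag}"
```

The same path always yields the same tag, so repeated requests stay silent, while a second, different path for an existing name triggers the warning.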

Level 2 optimization

When I attempt a level 2 optimization, I get the following error message:

Traceback (most recent call last):
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/shapes/produce_shapes_tauID.py", line 1071, in <module>
    main(args)
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/shapes/produce_shapes_tauID.py", line 1036, in main
    g_manager.optimize(args.optimization_level)
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 116, in optimize
    self.optimize_selections()
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 140, in optimize_selections
    self._merge_children(merged_graph)
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 165, in _merge_children
    self._merge_children(child)
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 165, in _merge_children
    self._merge_children(child)
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 165, in _merge_children
    self._merge_children(child)
  [Previous line repeated 18 more times]
  File "/work/sbrommer/smhtt_ul/tauID/smhtt_ul/ntuple_processor/optimization.py", line 157, in _merge_children
    if child not in merged_children:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It is unclear how this happens.
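One mechanism that produces exactly this error, offered as an assumption rather than a diagnosis of `optimization.py`: a membership test such as `child not in merged_children` on a list calls `==` on every element that is not the very same object and converts the result to `bool`. If node equality compares an array-valued attribute, `==` returns an array, and converting a multi-element NumPy array to `bool` raises this `ValueError`. The sketch below uses a hypothetical `ArrayLike` class that mimics NumPy's behaviour (to stay dependency-free) and a hypothetical `Node` class; `contains_identical` shows an identity-based check that never calls `__eq__`.

```python
class ArrayLike:
    """Hypothetical stand-in for a NumPy array: == is element-wise, bool() is ambiguous."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return ArrayLike(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        if len(self.values) > 1:
            raise ValueError(
                "The truth value of an array with more than one element is ambiguous. "
                "Use a.any() or a.all()"
            )
        return bool(self.values and self.values[0])


class Node:
    """Hypothetical graph node whose __eq__ compares an array-valued attribute."""

    def __init__(self, name, binning):
        self.name = name
        self.binning = binning

    def __eq__(self, other):
        # `and` returns its second operand when the first is true, so this
        # hands an ArrayLike back to the caller instead of a plain bool.
        return self.name == other.name and self.binning == other.binning


def contains_identical(items, obj):
    """Identity-based membership test that never calls __eq__."""
    return any(item is obj for item in items)
```

In this sketch the error only appears when two distinct nodes share the same name, because the `and` chain stops early otherwise; whether the real graph nodes behave this way would need to be checked.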

Add error handling to variations

I'm missing basic error handling in the variations. For example, if you use ReplaceCut, an error should be thrown when the requested cut is not found. The same goes for all other classes.
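A minimal sketch of the requested check, assuming a selection's cuts can be represented as a list of `(name, expression)` pairs. `replace_cut` and `CutNotFoundError` are hypothetical names; the real `ReplaceCut` class and its data layout may differ. The point is to fail loudly, listing the available cuts, instead of silently returning the selection unchanged.

```python
class CutNotFoundError(ValueError):
    """Raised when a variation refers to a cut the selection does not contain."""


def replace_cut(cuts, name, expression):
    """Return a copy of `cuts` with the cut called `name` replaced by `expression`.

    `cuts` is assumed to be a list of (name, expression) pairs.
    """
    names = [cut_name for cut_name, _ in cuts]
    if name not in names:
        raise CutNotFoundError(
            f"Cannot replace cut {name!r}: available cuts are {names}"
        )
    return [
        (cut_name, expression if cut_name == name else cut_expr)
        for cut_name, cut_expr in cuts
    ]
```

Returning a new list leaves the nominal selection untouched, which matters when the same selection feeds several variations.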

Bug in printout of sample validations

The sample validation of the crown inputs produces output that looks as if the check passed successfully, even though it did not run at all. This happens because an empty dict is created and left unfilled when no check is required, but the dict is then used for the printout regardless of whether sample validation was requested.
I think we should improve this and make the output conditional as well.
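A minimal sketch of a conditional printout, assuming the validation results end up in a dict mapping check names to pass/fail booleans and that a flag records whether validation was requested. The function name and messages are hypothetical. The key idea is that an empty results dict is never reported as "passed".

```python
def validation_summary(required, results):
    """Build the printout; an empty result dict only counts if checks actually ran."""
    if not required:
        return "Sample validation skipped (not requested)."
    if not results:
        return "Sample validation requested, but no checks were run."
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        return "Sample validation FAILED for: " + ", ".join(failed)
    return f"Sample validation passed: {len(results)} check(s) OK."
```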
