showerpipe's People

Contributors

jacanchaplais

showerpipe's Issues

Implement no LHE runs

PythiaGenerator is currently untested when run without an LHE file. I suspect it would not work. This should be investigated and implemented.

PyTorch compatible dataloaders

Enable online learning with PyTorch compatible dataloaders.

These should use the same underlying observer pattern of the CLI, with options to write out asynchronously to disk, and more.

They should also provide an interface to stack pre-processing steps in a pipeline before exposing the data to neural networks.
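As a rough illustration of stacking pre-processing steps behind an iterable interface, here is a minimal sketch in plain Python. The class name `PreprocessedStream` and its arguments are hypothetical and not part of showerpipe. For use with PyTorch, a class like this would presumably subclass `torch.utils.data.IterableDataset`, which is left out so the sketch runs without PyTorch installed.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence


class PreprocessedStream:
    """Lazily applies a stack of pre-processing steps to generated events.

    Hypothetical sketch: `make_events` is a zero-argument callable that
    returns a fresh iterable of events, so every pass over the stream
    restarts generation, as online learning would require.
    """

    def __init__(
        self,
        make_events: Callable[[], Iterable[Any]],
        steps: Sequence[Callable[[Any], Any]],
    ) -> None:
        self.make_events = make_events
        self.steps = list(steps)

    def __iter__(self) -> Iterator[Any]:
        for event in self.make_events():
            # apply each step in the order it was stacked
            for step in self.steps:
                event = step(event)
            yield event


# toy usage: integers stand in for events
stream = PreprocessedStream(
    lambda: iter([1, 2, 3]),
    steps=[lambda x: x + 1, lambda x: x * 2],
)
first_pass = list(stream)
```

Taking a factory rather than an iterator is one possible design choice: it lets the same object be iterated once per epoch without exhausting the underlying generator.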

Parallel event generators

Provide an API for parallel event generation.

May require I/O update to heparchy, to enable distributed data writing.

mpi4py is compatible with h5py parallel I/O, and gets around the GIL. However, I am unsure whether this can be integrated easily into PyTorch dataloaders down the line.

Remove structure of original observer pattern

The pipeline architecture ended up superseding the observer pattern #4. The ability to provide more than one observer is made redundant by the branched pipe. Clean up and simplify the code, removing the traces of the original observer pattern.

LHE module consistency updates

Refactor the lhe module so that its routines no longer output bytestrings by default. Either output LheData objects, or possibly io.BytesIO objects (perhaps everything should use LheData, and wherever bytestrings would be exported, they should always be written to a BytesIO buffer?).

Also look at object instantiation, and improve the type hinting.

Increase type annotation coverage

Review the library and check where there are weak points. The running list is:

  • investigate what "library stubs" are, and add them if applicable
  • allow PythiaGenerator to take a Path object for the pythia settings
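For the second item, one common way to annotate a parameter that accepts either a string or a Path is sketched below. The helper `normalise_settings_path` and the file name `pythia.cmnd` are hypothetical, used only for illustration, and do not reflect PythiaGenerator's actual signature.

```python
import os
from pathlib import Path
from typing import Union

# accepts plain strings and any object implementing __fspath__ (e.g. Path)
PathLike = Union[str, "os.PathLike[str]"]


def normalise_settings_path(path: PathLike) -> Path:
    """Return the settings location as a Path, whatever form was passed."""
    return Path(os.fspath(path))


from_str = normalise_settings_path("pythia.cmnd")
from_path = normalise_settings_path(Path("cards") / "pythia.cmnd")
```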

Edge calculation efficiency

The current method of calculating the edges uses a fairly expensive and confusing set of pandas operations, tracking inconsistently sized numbers of parents / children in tuples, exploding them, etc.

It occurs to me that this may be unnecessary, as well as ugly and inefficient. Instead, an array representing the adjacency matrix could be pre-allocated, and a simple mapping established between particle ids and its rows / columns. Each particle could then use its list of parents as a numpy fancy index, flipping the element values to True on a given row. This would likely be much more efficient and readable.
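A minimal sketch of this idea follows. The function name `adjacency_from_parents` and the toy ids are hypothetical. The convention that rows are children and columns are parents, and the choice to silently skip parents missing from the record, are assumptions made for illustration.

```python
import numpy as np


def adjacency_from_parents(ids, parents):
    """Build a boolean adjacency matrix from per-particle parent lists.

    adj[i, j] is True when particle ids[j] is a parent of particle ids[i].
    """
    ids = list(ids)
    # simple mapping from particle id to row / column position
    index_of = {pid: pos for pos, pid in enumerate(ids)}
    # pre-allocate the full matrix, all False
    adj = np.zeros((len(ids), len(ids)), dtype=bool)
    for row, parent_ids in enumerate(parents):
        # jagged parent list -> integer index array (may be empty)
        cols = np.fromiter(
            (index_of[p] for p in parent_ids if p in index_of),
            dtype=np.intp,
        )
        adj[row, cols] = True  # fancy index sets the whole row at once
    return adj


# toy record: particle 1 has no parents, 4 has two
ids = [1, 2, 3, 4]
parents = [[], [1], [1], [2, 3]]
adj = adjacency_from_parents(ids, parents)
edges = np.argwhere(adj)  # (child position, parent position) pairs
```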

Efficiency could potentially be boosted further if used with scipy sparse arrays, or accelerated with numba (but I think numba may struggle with the jagged input, so preprocessing would probably be required).

Edit: acceleration using compiled libraries etc. has been abandoned for the simplicity and readability of a pure Python solution. This is easier to work with and understand, and is still 10× faster than the original pre-optimal pandas solution.

Implement pipeline plugin architecture

The observer pattern needs adjusting, so that observers are actually pipelines.

Pipelines are constructed of sources, filters, and sinks. The subject in the original observer pattern is the source. The interface of the data structure passing through the pipeline should be of the same type at all times.

A plugin architecture will allow users to create their own filters and sinks, although the package will ship with generic filters to transform and cluster the data, as well as generic sinks to write the data to disk, or create visualisations.

Pipelines will then be defined by users in yaml files, and passed to the CLI.
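The source / filter / sink split described above could look roughly like the following sketch. All names here (`Event`, `run_pipeline`, `toy_source`, `cut_pt`) are hypothetical and do not reflect showerpipe's actual API. The yaml definition and plugin discovery are omitted, and a plain dict stands in for the event data structure.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

# hypothetical event type: the same at every stage of the pipeline
Event = Dict[str, List[float]]
Filter = Callable[[Event], Event]
Sink = Callable[[Event], None]


def run_pipeline(
    source: Iterable[Event],
    filters: Sequence[Filter],
    sinks: Sequence[Sink],
) -> int:
    """Push every event from the source through the filters into all sinks."""
    count = 0
    for event in source:
        for apply_filter in filters:
            event = apply_filter(event)  # Event in, Event out
        for sink in sinks:
            sink(event)  # several sinks give a branched pipe
        count += 1
    return count


def toy_source(num_events: int) -> Iterator[Event]:
    """Stand-in for an event generator."""
    for i in range(num_events):
        yield {"pt": [float(i), float(i) + 10.0]}


def cut_pt(threshold: float) -> Filter:
    """Filter factory: keep only entries at or above the threshold."""
    def apply(event: Event) -> Event:
        return {"pt": [pt for pt in event["pt"] if pt >= threshold]}
    return apply


collected: List[Event] = []
num_processed = run_pipeline(toy_source(3), [cut_pt(5.0)], [collected.append])
```

Because every filter maps an `Event` to an `Event`, user-written filters can be inserted or reordered without the sinks needing to know about them.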
