jacanchaplais / showerpipe
Pythonic data pipeline for Monte-Carlo showering and hadronisation programs.
Home Page: https://showerpipe.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
Heparchy is a dependency of showerpipe anyway, and the cohesion makes more sense given that heparchy is for I/O.
Set up a jacanchaplais channel on conda, and publish showerpipe. Ideally, submit it to conda-forge after it has been successfully published.
Guide here:
https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-pkgs.html
PythiaGenerator currently goes untested when no LHE file is used. I suspect it would not work. This should be investigated and implemented.
Enable online learning with PyTorch compatible dataloaders.
These should use the same underlying observer pattern of the CLI, with options to write out asynchronously to disk, and more.
Also should provide an interface to stack pre-processing steps in a pipeline before exposing the data to NNs.
Provide an API for parallel event generation.
May require I/O update to heparchy, to enable distributed data writing.
mpi4py is compatible with h5py parallel I/O, and gets around the GIL. However, unsure if I can integrate this easily into PyTorch dataloaders down the line.
Pipeline architecture ended up superseding the observer pattern #4. The functionality of providing more than one observer is made redundant by the branched pipe. Clean up and simplify the code, removing the traces of the original observer pattern.
Refactor the lhe module so that the output of routines is no longer bytestrings by default. Either output LheData objects, or possibly io.BytesIO objects (maybe make it so that everything uses LheData, but even when bytestrings would be exported, always do this in a BytesIO buffer?).
Also look at object instantiation, and improve the type hinting.
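The BytesIO option could look roughly like the sketch below. The LheData field, the to_buffer method, and the strip_blank_lines routine are all hypothetical names chosen for illustration. They are not showerpipe's actual API.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class LheData:
    """Hypothetical container for the raw contents of an LHE file."""

    content: bytes

    def to_buffer(self) -> io.BytesIO:
        """Expose the bytes through a seekable, file-like buffer."""
        return io.BytesIO(self.content)


def strip_blank_lines(data: LheData) -> LheData:
    """Example routine: accepts LheData and returns LheData, never bytes."""
    lines = data.content.splitlines(keepends=True)
    kept = [line for line in lines if line.strip()]
    return LheData(b"".join(kept))
```

With this shape, routines compose on LheData alone, and callers that need a file-like object ask for one explicitly through to_buffer.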
Review the library and check where there are weak points. Running list is:
The current method of calculating the edges uses a fairly expensive and confusing set of pandas operations, tracking inconsistently sized numbers of parents / children in tuples, exploding them, etc.
It occurs to me that this may be unnecessary, as well as ugly and inefficient. Instead, an array representing the adjacency matrix could be pre-allocated, and a simple mapping between particle ids and the rows / columns established. Then, each particle could use its list of parents as a numpy fancy index, flipping the element values to True on a given row. This would likely be much more efficient, and more readable.
Efficiency could potentially be boosted further if used with scipy sparse arrays, or accelerated with numba (but I think numba may struggle with the jagged input, so preprocessing would probably be required).
Edit: acceleration using compiled libraries etc. has been abandoned for the simplicity and readability of a pure Python solution. This is easier to work with and understand, and is still 10× faster than the original pandas solution.
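The id-to-index mapping idea can be sketched in pure Python as below. This is an illustration only, not the implementation that was adopted. The function name and the input format (a dict from particle id to its list of parent ids) are assumptions, and a plain loop over nested lists stands in for the numpy fancy index described above.

```python
from typing import Dict, List, Tuple


def adjacency_matrix(
    parents_by_id: Dict[int, List[int]],
) -> Tuple[List[int], List[List[bool]]]:
    """Build a dense adjacency matrix from per-particle parent lists.

    Assumes every parent id also appears as a key of ``parents_by_id``.
    """
    ids = sorted(parents_by_id)
    # Map each particle id to its row / column position.
    index_of = {pid: pos for pos, pid in enumerate(ids)}
    size = len(ids)
    # Pre-allocate the full matrix with every element set to False.
    matrix = [[False] * size for _ in range(size)]
    for pid, parents in parents_by_id.items():
        row = matrix[index_of[pid]]
        # Flip the column of each parent to True on this particle's row.
        for parent in parents:
            row[index_of[parent]] = True
    return ids, matrix


# Toy event: particle 1 splits into 2 and 3, which merge into 4.
ids, matrix = adjacency_matrix({1: [], 2: [1], 3: [1], 4: [2, 3]})
```

Jagged parent lists need no padding or exploding here, since each row is filled independently.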
Need to adjust observer pattern, so observers are actually pipelines.
Pipelines are constructed of sources, filters, and sinks. The subject in the original observer pattern is the source. The data structure's interface, passing through this pipeline, should be of the same type at all times.
A plugin architecture will allow users to create their own filters and sinks, although the package will ship with generic filters to transform and cluster the data, as well as generic sinks to write the data to disk, or create visualisations.
Pipelines will then be defined by users in yaml files, and passed to the CLI.
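A minimal sketch of the source / filter / sink arrangement, using generators, might look like the following. Every name here (Event, toy_source, pt_cut, ListSink, run_pipeline) is hypothetical, and the plugin and yaml layers are left out. The point shown is that the same Event type flows through every stage.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]  # hypothetical stand-in for the shared data interface
Filter = Callable[[Iterator[Event]], Iterator[Event]]


def toy_source(num_events: int) -> Iterator[Event]:
    """Source: plays the role of the subject in the observer pattern."""
    for i in range(num_events):
        yield {"id": float(i), "pt": 10.0 * i}


def pt_cut(min_pt: float) -> Filter:
    """Filter factory: keep only events at or above a pt threshold."""

    def apply(events: Iterator[Event]) -> Iterator[Event]:
        return (event for event in events if event["pt"] >= min_pt)

    return apply


class ListSink:
    """Sink: collects events in memory instead of writing them to disk."""

    def __init__(self) -> None:
        self.received: List[Event] = []

    def consume(self, events: Iterable[Event]) -> None:
        self.received.extend(events)


def run_pipeline(
    source: Iterator[Event], filters: List[Filter], sink: ListSink
) -> None:
    """Chain the filters lazily onto the source, then drain into the sink."""
    stream = source
    for stage in filters:
        stream = stage(stream)
    sink.consume(stream)


sink = ListSink()
run_pipeline(toy_source(4), [pt_cut(15.0)], sink)
```

Because each filter takes and returns an iterator of Event, user-defined filters can be inserted at any position without the neighbouring stages needing to change.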
Currently the error is an XMLSyntaxError thrown by lxml - not very professional!
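One common remedy is to catch the parser's low-level exception and re-raise a domain-specific one. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so it catches ET.ParseError rather than lxml's XMLSyntaxError. LheParseError and parse_lhe are hypothetical names.

```python
import xml.etree.ElementTree as ET


class LheParseError(ValueError):
    """Hypothetical domain-specific error for malformed LHE input."""


def parse_lhe(content: bytes) -> ET.Element:
    """Parse XML content, translating parser errors for the caller."""
    try:
        return ET.fromstring(content)
    except ET.ParseError as err:
        # Chain the original error so the parser's detail is not lost.
        raise LheParseError(f"input is not well-formed XML: {err}") from err
```

Raising with "from err" keeps the original parser message available for debugging, while callers only need to handle the one error type.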