jacanchaplais / graphicle
Utilities for representing high energy physics data as graphs / networks.
Home Page: https://graphicle.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
This issue is a log of the interface updates which will be made between version 0.1 -> 0.2.
.data -> .values to obtain underlying array data
PdgArray.name attribute
.from_numpy classmethods, automatically handle conversions behind the scenes
The calculate module should be able to attribute final state properties in terms of the flow of properties from their ancestors. This may be done via a vector diffusion from "sources" in a directed "flow graph", explored extensively for electricity grids, see https://iopscience.iop.org/article/10.1088/1367-2630/17/10/105002.
Implement a function to calculate "colour" vectors (no relation to QCD) of final state particles. These colour vectors express the ratio of a given property which a particle possesses as a linear combination of contributions from its direct ancestors. However, the basis of this colour vector shall be one-hot vectors, representing only a selected set of "particles of interest", eg. hard partons + background. The function is therefore a natural example of recursion, with each parent colour vector being expressed in terms of its parents' colour vectors, until the particles forming the basis are reached.
Use functools.cache to save the repeated computation.
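The recursion described above can be sketched in plain Python. This is only an illustration under stated assumptions, not graphicle's implementation: the PARENTS mapping, the particle labels, the BASIS tuple, and the equal split of contributions between parents are all hypothetical.

```python
from functools import cache

# Hypothetical toy ancestry: each particle maps to its direct parents.
PARENTS = {
    "a": (),
    "b": (),
    "c": ("a",),
    "d": ("a", "b"),
    "e": ("c", "d"),
}
# Particles of interest, which form the one-hot basis.
BASIS = ("a", "b")


@cache
def colour(particle: str) -> tuple:
    """Return the colour vector of ``particle`` expressed in ``BASIS``."""
    if particle in BASIS:
        # Basis particles are one-hot vectors, which ends the recursion.
        return tuple(1.0 if particle == b else 0.0 for b in BASIS)
    parents = PARENTS[particle]
    if not parents:
        # No ancestry inside the basis, so nothing is attributed.
        return tuple(0.0 for _ in BASIS)
    # Assumption: every parent contributes an equal share.
    parent_colours = [colour(p) for p in parents]
    return tuple(sum(comps) / len(parents) for comps in zip(*parent_colours))


colour_e = colour("e")  # "c" and "d" are each computed once, then reused
```

Note that functools.cache needs Python 3.9 or later; functools.lru_cache(maxsize=None) behaves the same way on earlier versions.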
Unfortunately fields with converter functions in attrs appear not to be well supported by pyright, which messes up intellisense (see microsoft/pyright#1782). Need to fix, as users will see incorrect warnings about expected inputs.
One idea could be to use a Union between all allowed inputs, and then to create a getter method which is only annotated with the target type.
Implement &, |, and ~ operations for convenience on both MaskArray and MaskGroup. Might be worth making MaskGroup contain an enum specifying its default aggregate representation, as bitwise AND is a little too opinionated. Would be good if it could be switched to bitwise OR, if desired.
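A rough sketch of how the operators and the switchable aggregation might fit together. ToyMask, ToyGroup, and the Agg enum are hypothetical stand-ins written for this example, storing plain lists of booleans rather than the arrays that MaskArray and MaskGroup wrap.

```python
import operator
from enum import Enum
from functools import reduce


class Agg(Enum):
    """Hypothetical switch for how a group combines its masks."""
    AND = "and"
    OR = "or"


class ToyMask:
    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):
        return ToyMask(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        return ToyMask(a or b for a, b in zip(self.values, other.values))

    def __invert__(self):
        return ToyMask(not a for a in self.values)


class ToyGroup:
    def __init__(self, masks, agg=Agg.AND):
        self.masks = dict(masks)
        self.agg = agg  # default aggregation, changeable by the user

    def aggregate(self):
        # Fold all child masks together with the selected operator.
        op = operator.and_ if self.agg is Agg.AND else operator.or_
        return reduce(op, self.masks.values())


a = ToyMask([True, True, False])
b = ToyMask([True, False, False])
either = ToyGroup({"a": a, "b": b}, agg=Agg.OR).aggregate()
```

Keeping the aggregation mode as data on the group, rather than hard-coding it, is what lets the default stay AND while still allowing OR.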
Centroid value for phi can be a bit confusing. Easy to implement, just sum the MomentumArray._xy_pol attribute, and find the angle. Feel like this should be a function, rather than a method, since it wouldn't really make much sense for a full generation DAG.
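A minimal sketch of the centroid calculation as a free function. The name phi_centroid and its arguments are hypothetical, and Python complex numbers stand in for the contents of MomentumArray._xy_pol, whose exact layout is not assumed here.

```python
import cmath
import math


def phi_centroid(pt, phi):
    """Azimuthal angle of the summed transverse momentum vectors."""
    # Represent each particle as pt * exp(i * phi) in the transverse plane.
    total = sum(cmath.rect(p, a) for p, a in zip(pt, phi))
    return cmath.phase(total)  # angle in [-pi, pi]


# Two particles sitting either side of the +/- pi boundary.
centre = phi_centroid([1.0, 1.0], [3.0, -3.0])
naive = (3.0 + -3.0) / 2.0  # arithmetic mean of phi, points the wrong way
on_boundary = math.isclose(abs(centre), math.pi)
```

Summing the vectors before taking the angle is what avoids the confusion: the arithmetic mean of 3.0 and -3.0 is 0, while the particles actually cluster around pi.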
Seems like a fundamental bit of functionality. Should just be a simple generalisation where the differences and conjugates occur on two arrays, rather than the same one. Result would be a square matrix (though not symmetric, right @GiorgioCerro?).
Make second MomentumArray an optional parameter, where the default behaviour is the same as the current behaviour, to prevent breaking changes.
Also be sure to check that the number of elements is the same in both particle sets.
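One way the two-array generalisation might look, sketched with NumPy. The function below is a standalone illustration, not the existing graphicle routine: its signature is an assumption, it takes raw eta and phi sequences instead of MomentumArray objects, and the size validation between the two sets is left out.

```python
import numpy as np


def delta_R(eta_a, phi_a, eta_b=None, phi_b=None):
    """Pairwise delta R between set a (rows) and set b (columns)."""
    eta_a = np.asarray(eta_a, dtype=float)
    phi_a = np.asarray(phi_a, dtype=float)
    # Default to comparing set a with itself, as in the single-array case.
    eta_b = eta_a if eta_b is None else np.asarray(eta_b, dtype=float)
    phi_b = phi_a if phi_b is None else np.asarray(phi_b, dtype=float)
    if eta_a.shape != phi_a.shape or eta_b.shape != phi_b.shape:
        raise ValueError("eta and phi must have matching lengths per set")
    deta = eta_a[:, np.newaxis] - eta_b[np.newaxis, :]
    # Multiplying by the conjugate subtracts the angles, and np.angle
    # returns the difference already wrapped into (-pi, pi].
    unit_a = np.exp(1j * phi_a)
    unit_b = np.exp(1j * phi_b)
    dphi = np.angle(unit_a[:, np.newaxis] * np.conj(unit_b)[np.newaxis, :])
    return np.hypot(deta, dphi)


same_set = delta_R([0.0, 1.0], [0.0, 0.0])
two_sets = delta_R([0.0], [3.0], [0.0], [-3.0])
```

Because the second set is optional and defaults to the first, existing single-array calls would keep their current behaviour.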
Can extract this information from Pythia. Integrate it into ParticleSet.
StatusArray can also have a method to provide the hard mask, and retrieve the hard vertex.
I needed to install manually the following packages:
Update to work with graphicle objects.
It may be worth moving the functionality into the transform module. However, I think having a dedicated module for converting graphicle intrinsic data structures to numpy arrays is a good idea for code cohesion and orthogonality.
MaskGroup should emulate dictionaries more closely, with iteration etc.
I would also like to add a check during subscripting to see if it is being passed a boolean array. If so, it should return a MaskGroup with each of the children masked by that array.
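A sketch of the dictionary emulation and the boolean subscripting check, using collections.abc.MutableMapping from the standard library. ToyMaskGroup is a hypothetical stand-in that stores masks as tuples of booleans; treating every non-string key as a boolean sequence is an assumption made to keep the example short.

```python
from collections.abc import MutableMapping


class ToyMaskGroup(MutableMapping):
    """Hypothetical stand-in for MaskGroup, holding tuples of bools."""

    def __init__(self, masks=None):
        self._masks = {}
        # update() goes through __setitem__, so constructor input gets
        # the same conversion as individual item assignment.
        self.update(masks or {})

    def __setitem__(self, key, mask):
        self._masks[key] = tuple(bool(v) for v in mask)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._masks[key]
        # Assumption: any other key is a boolean sequence, which is
        # applied to every child mask in the group.
        select = [bool(k) for k in key]
        return ToyMaskGroup(
            {
                name: [v for v, keep in zip(mask, select) if keep]
                for name, mask in self._masks.items()
            }
        )

    def __delitem__(self, key):
        del self._masks[key]

    def __iter__(self):
        return iter(self._masks)

    def __len__(self):
        return len(self._masks)


group = ToyMaskGroup({"foo": [True, True, False], "bar": [False, True, True]})
subset = group[[True, False, True]]
```

Implementing the five abstract methods is enough for MutableMapping to supply keys(), items(), update(), and the rest of the dictionary interface.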
CuPy, CuDF and CuGraph appear to offer all functionality needed to perform the same operations on GPU. Use strategy pattern or the like to offer CPU or GPU data structures, and create GPU alternative algorithms.
Don't make it a dependency on any given package, but instead create a protocol which defines the interface that must be satisfied for the automatic initialisation of graphicle.Graphicle objects. This way the projects don't need to be coupled, and other objects are compatible.
Potential heparchy issue if the data has not been stored, but can just use a try / except block.
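A sketch of what such a protocol could look like with typing.Protocol. The EventLike interface, its pdg and edges attributes, and the from_event function are hypothetical names chosen for illustration; the real interface would need to list whatever graphicle.Graphicle actually requires.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class EventLike(Protocol):
    """Hypothetical interface; the attribute names are illustrative only."""

    @property
    def pdg(self) -> Sequence[int]: ...

    @property
    def edges(self) -> Sequence[tuple]: ...


def from_event(event: EventLike) -> dict:
    """Collect the data needed for initialisation from any conforming object."""
    # runtime_checkable only verifies that the attributes exist,
    # not that they hold the annotated types.
    if not isinstance(event, EventLike):
        raise TypeError("event does not provide the required attributes")
    return {"pdg": list(event.pdg), "edges": list(event.edges)}


class FakeEvent:
    """Conforms structurally, without importing or subclassing anything."""

    def __init__(self):
        self.pdg = [21, -1]
        self.edges = [(0, 1), (1, 2)]


data = from_event(FakeEvent())
```

Since conformance is structural, a third party class such as FakeEvent never needs to know the protocol exists, which is what keeps the projects decoupled.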
Allow access to the individual MaskArray objects.
graphicle.select.hard_descendants() includes particles entering the same hadronisation vertex as the hard parton, even if they did not themselves descend from it. Fix this!
.data was a poor choice, as numpy arrays have an attribute called .data as well, which refers to their memoryview. This is quite an inconvenient change which will break backwards compatibility a lot, but I think it's best to do it sooner rather than later.
Minor issue, just thought I'd mention that currently graphicle requires Python version >= 3.8 to avoid the error Module 'functools' has no attribute 'cached_property', since cached_property didn't exist before 3.8. Apologies if this dependency is already mentioned somewhere and I missed it.
Initially replace with numpy outer implementation, and then replace with numba.
There are a number of changes that would be worthwhile for the 0.2 release. These include the merging and / or renaming of modules, the removal of outdated or wrong routines, removing overzealous use of @property decorators on data structures, etc.
Each of these will be documented as checkbox lists in the comments as and when they present themselves. Each of these will cause breaking changes, so do not merge any branches into main, or include them as patch releases. This issue will be closed upon the release of version 0.2.0.
Set up at least one unit test, along with a CI/CD pipeline. Preferably using tox.
Add functionality to initialise from square 2D numpy arrays, eg.
import graphicle as gcl
delta_R = gcl.matrices.delta_R(...) # returns float square matrix
adj = gcl.AdjacencyList.from_affinity(delta_R)
Two classmethods:
.from_affinity(): floating point square array populates adj.edges and adj.weights
.from_adjacency(): integer square array populates adj.edges only
Need to complete #13 to enable this.
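The conversion behind the two classmethods could be sketched as below with NumPy. These are free functions with hypothetical names, not AdjacencyList methods, and treating zero entries as absent edges is an assumption; a dense affinity matrix such as delta R would otherwise yield an edge for every off-diagonal pair.

```python
import numpy as np


def _check_square(matrix):
    """Convert to an array and insist on a square 2D shape."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError("expected a square 2D array")
    return matrix


def edges_from_adjacency(adjacency):
    """Return an (n_edges, 2) array of (source, destination) indices."""
    adjacency = _check_square(adjacency)
    # Assumption: a nonzero entry at [i, j] means an edge from i to j.
    src, dst = np.nonzero(adjacency)
    return np.column_stack((src, dst))


def edges_from_affinity(affinity):
    """Return the edges together with their floating point weights."""
    affinity = _check_square(affinity).astype(float)
    edges = edges_from_adjacency(affinity)
    weights = affinity[edges[:, 0], edges[:, 1]]
    return edges, weights


edges, weights = edges_from_affinity([[0.0, 0.5], [0.2, 0.0]])
```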
Add option to perform flow tracing which averages over all hadronisation vertices.
graph.hard_mask currently returns an empty MaskGroup if no status codes are provided, which may be confusing for people who don't know it uses status codes. Either raise an exception or warning to let people know what's wrong.
Enable EdgeList.to_networkx() to produce topological structure, even without data embedded.
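The warning option might look like the sketch below. This toy hard_mask is a hypothetical free function returning a plain list, not the graph.hard_mask property, and the selection rule assumes the Pythia 8 convention in which absolute status codes 21 to 29 mark particles of the hardest subprocess.

```python
import warnings


def hard_mask(status_codes):
    """Boolean mask of hard process particles (toy version)."""
    if status_codes is None or len(status_codes) == 0:
        # Tell the user why the result is empty instead of failing silently.
        warnings.warn(
            "hard_mask needs status codes, but none were provided; "
            "returning an empty mask.",
            UserWarning,
            stacklevel=2,
        )
        return []
    # Assumption: Pythia 8 convention, |status| in 21..29 is the hard process.
    return [21 <= abs(code) <= 29 for code in status_codes]


mask = hard_mask([-21, 23, 1])
```

Raising an exception instead would be a one line change, at the cost of breaking code that currently relies on the empty result.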
Functionality is comprised of two parts:
Due to #14, to improve computational efficiency delta_R_aff uses protected vector.Momentum4D view on data. This dependency couples the function to the third party vendor, rather than using the graphicle interface to these calculation techniques.
Referred to positive p_val twice, when it should have described the effect of a negative value.
graphicle/graphicle/calculate.py, line 267 in a2719b4
Currently numpy arrays only get converted to MaskArrays when using the individual assignment, eg.
masks = MaskGroup()
masks['foo'] = np.array([True, True, False])
but not
masks = MaskGroup({
'foo': np.array([True, True, False]),
})
Consider replacing edge weights with edge embeddings, and node embeddings as well. Would provide ability to add custom data to graphs, which may be helpful for users, eg. if they want to store coordinate data.
Add this, and also throw an exception when non-graphicle objects are passed to the initialiser, since currently this is allowed and absolutely shouldn't be. That goes for all of the constructor methods for composite objects!
Use __future__.annotations and typing.TYPE_CHECKING to prevent unnecessary static typing imports.
delta_R_aff uses a pure Python for loop over the final state particles. May get a big speed boost by implementing the loop in Cython.
Explicitly handle the event that a user passes a numpy array to the edge_data or node_data fields.
Add dunders for rich comparison, eg. .__eq__() for MaskBase objects.
See the Python docs.
Weights for the edges, double precision.
Set the exports in __all__ for each module to prevent modules exporting scipy namespace, etc.
eg. should prevent this documentation from being autogenerated.
Bug in delta_R_aff function.
Patch so that theta1 = 3.0 and theta2 = -3.0 results in delta_theta = 0.28, not 6.
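The expected behaviour can be illustrated with a small wrapping helper. The function name wrap_angle_diff is hypothetical, and this is a standalone sketch of the arithmetic, not the patch applied to delta_R_aff.

```python
import math


def wrap_angle_diff(theta1, theta2):
    """Signed difference theta1 - theta2, wrapped into [-pi, pi)."""
    # Shift by pi, reduce modulo one full turn, then shift back.
    return (theta1 - theta2 + math.pi) % (2.0 * math.pi) - math.pi


raw = 3.0 - (-3.0)  # 6.0, the unwrapped difference
delta_theta = abs(wrap_angle_diff(3.0, -3.0))  # 2 * pi - 6, roughly 0.28
```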
Remove duplicate particles from the shower.
MaskArray objects are missing proper docstrings, and also __setitem__() dunders.
Duplicate removal implemented in #6 uses for loops and nested if statements. Vectorise and remove branching, where possible.
Use the numpy.typing module where possible.
Charge tracing appears to be hit and miss - I believe this is because if incident charges on a vertex cancel out, any charged products will carry no ancestry information.
Explore potential solution in propagating positive and negative charge separately, then adding contributions at the end.
Particles can already be grouped by their parents in a topological sense, using graphicle.select.hard_descendants(). At hadronisation vertices, the ancestry becomes mixed, so using only the topological information of the generation DAG results in particles outgoing from a hadronisation vertex being attributed equally to each of the hard partons whose descendants enter it.
However, the final products of hadronisation form distinct clusters whose centres match the momenta of the hard partons incident to the hadronisation vertex.
Take descendants from DAG, and where there is overlap in ancestry, provide functionality to attribute particles to their nearest parent in delta R.
The interface and data representations for this need careful thought. Considerations are:
The speed of subscripting or slicing graphicle objects needs improvement. The reduced performance is probably due to the need to cast data upon setting attributes (and perhaps object instantiation to wrap the data).
Start by profiling the process to see what is happening.
Particles in generation graphs are represented as edges, each end of which is terminated by an interaction vertex. So, parents in this regime will be the incoming edges of the incoming vertex ID for an edge, and children will be the outgoing edges of the outgoing vertex ID.
If these graphs are directed, parents (children) will simply be the incoming (outgoing) vertex ID of edges incident on the node. If not, there isn't really a clear definition for this.
InteractionVertex object
Graphicle objects, with incoming Graphicle and outgoing Graphicle
The foundation of these can be lifted from the heparchy project jacanchaplais/heparchy#1.
Further features to be determined.