
graphicle's People

Contributors

giorgiocerro, jacanchaplais


graphicle's Issues

Interface updates

This issue is a log of the interface updates which will be made between versions 0.1 and 0.2.

  • .data -> .values to obtain underlying array data
  • object arrays -> tuple of strings for PdgArray.name attribute
  • scrap the .from_numpy classmethods, automatically handle conversions behind the scenes

Add flow tracing

The calculate module should be able to attribute final state properties in terms of the flow of those properties from their ancestors. This may be done via a vector diffusion from "sources" in a directed "flow graph", an approach explored extensively for electricity grids; see https://iopscience.iop.org/article/10.1088/1367-2630/17/10/105002.

Implement a function to calculate "colour" vectors (no relation to QCD) of final state particles. A colour vector expresses the share of a given property which a particle possesses as a linear combination of contributions from its direct ancestors. However, the basis of this colour vector shall be one-hot vectors, representing only a selected set of "particles of interest", eg. hard partons + background. The function is therefore a natural example of recursion: each parent's colour vector is expressed in terms of its own parents' colour vectors, until the particles forming the basis are reached.

Use functools.cache to save the repeated computation.
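
The recursion described above could be sketched roughly as follows. This is only an illustration: the toy DAG in `PARENTS`, the `BASIS` tuple, and the `colour` function are hypothetical names, not graphicle API, and each parent is assumed to contribute an equal share (a real implementation would weight contributions by the property being traced).

```python
from functools import cache  # requires Python 3.9 or later
from typing import Tuple

# Hypothetical toy DAG, mapping each particle to its direct parents.
PARENTS = {
    "q": (),
    "g": (),
    "a": ("q",),
    "b": ("q", "g"),
    "c": ("a", "b"),
}
BASIS = ("q", "g")  # the selected "particles of interest"


@cache
def colour(particle: str) -> Tuple[float, ...]:
    """Colour vector of a particle, expressed in the one-hot basis."""
    if particle in BASIS:
        return tuple(1.0 if particle == b else 0.0 for b in BASIS)
    parents = PARENTS[particle]
    if not parents:
        # A root outside the basis carries no traced ancestry.
        return tuple(0.0 for _ in BASIS)
    # Assumption: every parent contributes an equal share.
    share = 1.0 / len(parents)
    vecs = [colour(parent) for parent in parents]
    return tuple(share * sum(v[i] for v in vecs) for i in range(len(BASIS)))
```

Because of the cache, the vector for "b" is computed once even when many descendants share it as an ancestor.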

Type annotations from converter fields not correct

Unfortunately, fields with converter functions in attrs appear not to be well supported by pyright, which messes up intellisense (see microsoft/pyright#1782). This needs fixing, as users will see incorrect warnings about expected inputs.

One idea could be to use a Union between all allowed inputs, and then to create a getter method which is only annotated with the target type.
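
A minimal sketch of that idea in plain Python, without attrs, so that it only illustrates the typing pattern. `NameInput`, `NamesSketch`, and the choice of a tuple of strings as the target type are assumptions made for the example.

```python
from typing import Sequence, Tuple, Union

# Hypothetical union of every input type the field accepts.
NameInput = Union[str, Sequence[str]]


class NamesSketch:
    """Accepts several input types, but always exposes a tuple of strings."""

    def __init__(self, names: NameInput) -> None:
        # Stored as passed, so the annotation matches what callers provide.
        self._names = names

    @property
    def names(self) -> Tuple[str, ...]:
        # The getter is annotated with the target type only.
        if isinstance(self._names, str):
            return (self._names,)
        return tuple(self._names)
```

A type checker then sees the permissive union on input and the single target type on output, which is the behaviour the converter annotation fails to convey.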

Add binary operations to MaskBase

Implement &, |, and ~ operations for convenience on both MaskArray and MaskGroup. It might be worth making MaskGroup contain an enum specifying its default aggregate representation, as bitwise AND is a little too opinionated. It would be good if it could be switched to bitwise OR, if desired.
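
One way this might look, sketched with stand-in classes that wrap plain lists of bools. `MaskSketch`, `GroupSketch`, and the `Agg` enum are hypothetical names for illustration, not the existing graphicle classes.

```python
from enum import Enum
from typing import Dict, List


class MaskSketch:
    """Hypothetical stand-in for MaskArray, wrapping a list of bools."""

    def __init__(self, values: List[bool]) -> None:
        self.values = list(values)

    def __and__(self, other: "MaskSketch") -> "MaskSketch":
        return MaskSketch([a and b for a, b in zip(self.values, other.values)])

    def __or__(self, other: "MaskSketch") -> "MaskSketch":
        return MaskSketch([a or b for a, b in zip(self.values, other.values)])

    def __invert__(self) -> "MaskSketch":
        return MaskSketch([not a for a in self.values])


class Agg(Enum):
    AND = "and"
    OR = "or"


class GroupSketch:
    """Hypothetical stand-in for MaskGroup with a switchable aggregate."""

    def __init__(self, masks: Dict[str, MaskSketch], agg: Agg = Agg.AND) -> None:
        self.masks = masks
        self.agg = agg

    def aggregate(self) -> MaskSketch:
        # Assumes the group holds at least one mask.
        masks = iter(self.masks.values())
        result = next(masks)
        for mask in masks:
            result = (result & mask) if self.agg is Agg.AND else (result | mask)
        return result
```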

Add function for getting the centroid from a MomentumArray

Centroid value for phi can be a bit confusing. Easy to implement, just sum the MomentumArray._xy_pol attribute, and find the angle. Feel like this should be a function, rather than a method, since it wouldn't really make much sense for a full generation DAG.
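
A rough sketch of the phi part, assuming the transverse components are available as complex numbers x + iy (which is what _xy_pol is taken to hold here). `centroid_phi` is a hypothetical name.

```python
import cmath
from typing import Iterable


def centroid_phi(xy_pol: Iterable[complex]) -> float:
    """Azimuth of the summed transverse momenta, in the range [-pi, pi]."""
    return cmath.phase(sum(xy_pol))
```

Summing the vectors before taking the angle avoids the wrap-around problem of averaging phi directly: two particles at phi = 3.1 and phi = -3.1 have a naive mean of 0, whereas their summed momentum points along the negative x axis.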

Allow the delta_R_aff to take two inputs

Seems like a fundamental bit of functionality. Should just be a simple generalisation where the differences and conjugates occur on two arrays, rather than the same one. Result would be a square matrix (though not symmetric, right @GiorgioCerro?).

Make second MomentumArray an optional parameter, where the default behaviour is the same as the current behaviour, to prevent breaking changes.

Also be sure to check that the number of elements is the same in both particle sets.
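
A pure-Python sketch of the generalisation, assuming each particle is given as a hypothetical (pseudorapidity, azimuth) pair rather than a MomentumArray. The signature is illustrative only and the loop is not optimised.

```python
import math
from typing import List, Optional, Sequence, Tuple

EtaPhi = Tuple[float, float]  # hypothetical (pseudorapidity, azimuth) pair


def delta_R_aff(
    pmu: Sequence[EtaPhi], other: Optional[Sequence[EtaPhi]] = None
) -> List[List[float]]:
    """Matrix of delta R distances between the particles of two sets."""
    if other is None:
        other = pmu  # default reproduces the single-input behaviour
    elif len(other) != len(pmu):
        raise ValueError("both particle sets must have the same length")
    matrix = []
    for eta_a, phi_a in pmu:
        row = []
        for eta_b, phi_b in other:
            # Wrap the azimuthal difference into [-pi, pi).
            dphi = (phi_a - phi_b + math.pi) % (2.0 * math.pi) - math.pi
            row.append(math.hypot(eta_a - eta_b, dphi))
        matrix.append(row)
    return matrix
```

With a single input the result is symmetric. With two different inputs, element [i][j] is the distance from particle i of the first set to particle j of the second, so the matrix is generally not symmetric.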

Refactor matrices module with new data structures

Update to work with graphicle objects.

It may be worth moving the functionality into the transform module. However, I think having a dedicated module for converting graphicle intrinsic data structures to numpy arrays is a good idea for code cohesion and orthogonality.

Advanced subscripting over MaskGroup

MaskGroup should emulate dictionaries more closely, with iteration etc.

I would also like to add a check during subscripting to see if it is being passed a boolean array. If so, it should return a MaskGroup with each of the children masked by that array.
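
Both behaviours could be sketched as below, using collections.abc.Mapping to get dictionary-style iteration for free. `MaskGroupSketch` is a hypothetical stand-in holding plain lists of bools, not the real MaskGroup.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, Sequence


class MaskGroupSketch(Mapping):
    """Hypothetical stand-in for MaskGroup, holding named lists of bools."""

    def __init__(self, masks: Dict[str, Sequence[bool]]) -> None:
        self._masks = {name: list(mask) for name, mask in masks.items()}

    def __iter__(self) -> Iterator[str]:
        return iter(self._masks)

    def __len__(self) -> int:
        return len(self._masks)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._masks[key]
        # Otherwise treat the key as a boolean array, and mask every child.
        keep = list(key)
        return MaskGroupSketch({
            name: [val for val, flag in zip(mask, keep) if flag]
            for name, mask in self._masks.items()
        })
```

Inheriting from Mapping supplies keys(), values(), items(), and membership tests once the three methods above are defined.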

Add GPU compatibility using RAPIDS

CuPy, CuDF and CuGraph appear to offer all functionality needed to perform the same operations on GPU. Use strategy pattern or the like to offer CPU or GPU data structures, and create GPU alternative algorithms.

Add classmethod which reads from showerpipe / heparchy compatible interfaces

Don't make it a dependency on any given package, but instead create a protocol which defines the interface that must be satisfied for the automatic initialisation of graphicle.Graphicle objects. This way the projects don't need to be coupled, and other objects are compatible.

Potential heparchy issue if the data has not been stored, but can just use a try / except block.
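
A sketch of what such a protocol might look like, using typing.Protocol. The attribute names (pdg, pmu, edges), the `EventInterface` name, and the `from_event` function are all assumptions for illustration. The function returns a plain dict here in place of a graphicle.Graphicle object.

```python
from typing import Any, Dict, Protocol


class EventInterface(Protocol):
    """Hypothetical attributes an event object must expose."""

    @property
    def pdg(self) -> Any: ...

    @property
    def pmu(self) -> Any: ...

    @property
    def edges(self) -> Any: ...


def from_event(event: EventInterface) -> Dict[str, Any]:
    """Stand-in for a classmethod accepting any conforming object."""
    data = {"pdg": event.pdg, "pmu": event.pmu}
    try:
        data["edges"] = event.edges
    except (AttributeError, KeyError):
        # Assumption: missing stored data surfaces as one of these errors.
        data["edges"] = None
    return data
```

Since the protocol is structural, neither showerpipe nor heparchy would need to import graphicle or inherit from anything for their objects to be accepted.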

Change numpy array getter from .data to .values

.data was a poor choice, as numpy arrays have an attribute called .data as well, which refers to their memoryview. This is quite an inconvenient change which will break backwards compatibility a lot, but I think it's best to do it sooner rather than later.

Dependency on Python >=3.8

Minor issue, just thought I'd mention that graphicle currently requires Python version >=3.8 to avoid the error Module 'functools' has no attribute 'cached_property', since cached_property didn't exist before 3.8. Apologies if this dependency is already mentioned somewhere and I missed it.

API updates

There are a number of changes that would be worthwhile for the 0.2 release. These include the merging and / or renaming of modules, the removal of outdated or wrong routines, removing overzealous use of @property decorators on data structures, etc.

Each of these will be documented as checkbox lists in the comments as and when they present themselves. Each of these will cause breaking changes, so do not merge any branches into main, or include them as patch releases. This issue will be closed upon the release of version 0.2.0.

Unit testing

Set up at least one unit test, along with a CI/CD pipeline, preferably using tox.

Add classmethods to initialise AdjacencyList from matrices

Add functionality to initialise from square 2D numpy arrays, eg.

import graphicle as gcl


delta_R = gcl.matrices.delta_R(...)  # returns float square matrix
adj = gcl.AdjacencyList.from_affinity(delta_R)

Two classmethods:

  • .from_affinity(): floating point square array populates adj.edges and adj.weights *
  • .from_adjacency(): integer square array populates adj.edges only

* need to complete #13 to enable this.
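
The conversion behind the two classmethods might be sketched as follows, using nested lists in place of numpy arrays and returning plain edge and weight lists instead of an AdjacencyList. Dropping the diagonal in from_affinity is an assumption, made to avoid self-loops.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def _check_square(matrix: Sequence[Sequence[float]]) -> int:
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("matrix must be square")
    return size


def from_adjacency(matrix: Sequence[Sequence[int]]) -> List[Edge]:
    """Edges (row, column) for every non-zero entry of a square matrix."""
    size = _check_square(matrix)
    return [(i, j) for i in range(size) for j in range(size) if matrix[i][j]]


def from_affinity(
    matrix: Sequence[Sequence[float]],
) -> Tuple[List[Edge], List[float]]:
    """Edges and weights for every off-diagonal pair of a square matrix."""
    size = _check_square(matrix)
    # Assumption: the diagonal is skipped, so no self-loops are created.
    edges = [(i, j) for i in range(size) for j in range(size) if i != j]
    weights = [matrix[i][j] for i, j in edges]
    return edges, weights
```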

delta_R_aff function coupled to third party vendor

Due to #14, to improve computational efficiency delta_R_aff uses a protected vector.Momentum4D view on the data. This dependency couples the function to the third party vendor, rather than using the graphicle interface to these calculation techniques.

Cast numpy arrays passed as dict to MaskGroup

Currently numpy arrays only get converted to MaskArrays when using the individual assignment, eg.

import numpy as np
from graphicle import MaskGroup
masks = MaskGroup()
masks['foo'] = np.array([True, True, False])

but not

masks = MaskGroup({
    'foo': np.array([True, True, False]),
})
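
One possible fix is to route the constructor through the same item assignment, so the cast lives in a single place. The classes below are hypothetical stand-ins that use plain lists rather than numpy arrays, meant only to show the pattern.

```python
from typing import Dict, Iterable, Optional


class MaskArraySketch:
    """Hypothetical stand-in for MaskArray."""

    def __init__(self, values: Iterable[bool]) -> None:
        self.values = [bool(val) for val in values]


class MaskGroupSketch:
    """Hypothetical stand-in for MaskGroup."""

    def __init__(self, masks: Optional[Dict[str, Iterable[bool]]] = None) -> None:
        self._masks: Dict[str, MaskArraySketch] = {}
        for name, mask in (masks or {}).items():
            self[name] = mask  # reuse the casting in __setitem__

    def __setitem__(self, name: str, mask) -> None:
        if not isinstance(mask, MaskArraySketch):
            mask = MaskArraySketch(mask)
        self._masks[name] = mask

    def __getitem__(self, name: str) -> MaskArraySketch:
        return self._masks[name]
```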

Add node and edge embeddings

Consider replacing edge weights with edge embeddings, and node embeddings as well. Would provide ability to add custom data to graphs, which may be helpful for users, eg. if they want to store coordinate data.

ParticleSet does not have .from_numpy classmethod

Add this, and also throw an exception when non-graphicle objects are passed to the initialiser, since currently this is allowed and absolutely shouldn't be. That goes for all of the constructor methods for composite objects!

Cythonise delta_R_aff function

delta_R_aff uses a Pythonic for loop over the final state particles. May get a big speed boost by implementing the loop in Cython.

Fix flow_trace algorithm when dealing with charge

Charge tracing appears to be hit and miss - I believe this is because if incident charges on a vertex cancel out, any charged products will carry no ancestry information.

Explore potential solution in propagating positive and negative charge separately, then adding contributions at the end.

Add ability to tag particles

Particles can already be grouped by their parents in a topological sense, using graphicle.select.hard_descendants(). At hadronisation vertices the ancestry becomes mixed, so using only the topological information of the generation DAG results in particles outgoing from a hadronisation vertex being attributed equally to every hard parton whose descendants enter it.

However, the final products of hadronisation form distinct clusters whose centres match the momenta of the hard partons incident to the hadronisation vertex.

Take descendants from DAG, and where there is overlap in ancestry, provide functionality to attribute particles to their nearest parent in delta R.

The interface and data representations for this need careful thought. Considerations are:

  1. overlap only occurs after hadronisation
  2. hadronisation vertices are clearly important, how should these be represented?
  3. hard partons may have descendants incident on multiple hadronisation vertices: should this be transparent in the heritage represented using the masks?
  4. when a W decays into quarks, the quarks are also part of the hard process and may produce distinct jets: should MaskGroups be used to represent the composite nature?
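
Leaving the interface questions aside, the nearest-parent attribution itself could be sketched as below. The (pseudorapidity, azimuth) tuples, the parton labels, and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

EtaPhi = Tuple[float, float]  # hypothetical (pseudorapidity, azimuth) pair


def delta_R(a: EtaPhi, b: EtaPhi) -> float:
    """Distance in the eta-phi plane, with the azimuth wrapped to [-pi, pi)."""
    dphi = (a[1] - b[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a[0] - b[0], dphi)


def nearest_parent(
    particles: Sequence[EtaPhi], partons: Dict[str, EtaPhi]
) -> List[str]:
    """Label each particle with the name of its closest parton in delta R."""
    return [
        min(partons, key=lambda name: delta_R(particle, partons[name]))
        for particle in particles
    ]
```

In practice this would only be applied to the particles whose ancestry overlaps, with the candidate partons restricted to those ancestors.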

Subscripting speed not fast

The speed of subscripting or slicing graphicle objects needs improvement. The reduced performance is probably due to the need to cast data upon setting attributes (and perhaps to object instantiation when wrapping the data).

Start by profiling the process to see what is happening.

Add way of obtaining children and parents of a given particle

Problem statement

Generation graphs: particles as edges

Particles in generation graphs are represented as edges, each end of which is terminated by an interaction vertex. So, in this regime the parents of an edge are the incoming edges of its incoming vertex ID, and its children are the outgoing edges of its outgoing vertex ID.
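
As a minimal sketch of this definition, assuming the edges are given as a list of hypothetical (incoming vertex ID, outgoing vertex ID) pairs, with particles identified by their position in that list:

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]  # (incoming vertex ID, outgoing vertex ID)


def parents(edges: Sequence[Edge], idx: int) -> List[int]:
    """Indices of the edges which end on the vertex where edge idx starts."""
    start = edges[idx][0]
    return [i for i, (_, dst) in enumerate(edges) if dst == start]


def children(edges: Sequence[Edge], idx: int) -> List[int]:
    """Indices of the edges which start from the vertex where edge idx ends."""
    end = edges[idx][1]
    return [i for i, (src, _) in enumerate(edges) if src == end]
```

This linear scan is only meant to pin down the definition; an index from vertex ID to incident edges would be the natural optimisation.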

Final state graphs: particles as nodes

If these graphs are directed, parents (children) will simply be the incoming (outgoing) vertex ID of edges incident on the node. If not, there isn't really a clear definition for this.

Ideas

  • Generation graphs and final state graphs may benefit from distinct identities
    • Could use a Factory or Strategy Pattern
    • Strategy Pattern may result in less duplication
  • Could implement an InteractionVertex object
    • Already have numerical IDs for each vertex (defined in the node)
    • Would naturally lead to idea of parentage
    • Could make it a composite of Graphicle objects, with incoming Graphicle and outgoing Graphicle
    • Very powerful, but seems expensive
