Giter Club home page Giter Club logo

Comments (4)

ramankhurana avatar ramankhurana commented on June 13, 2024 1

Dear Jim

Thanks a lot for all the ideas. They are very helpful and enlightening. I must confess I did not thought of using cross product. I think the numba technique might be very useful for some other instances when I have no other choice other than explicitly looping, but in the present case I think I can work with awkward0 following second solution from old message.

Thanks a lot for quick response.

from awkward-0.x.

jpivarski avatar jpivarski commented on June 13, 2024

It's definitely possible, though I need to think of a better way to present it. It's not only needed in ATLAS, but also CMS NanoAOD has flat branches instead of TLorentzVector objects— worse still (for me), the NanoAOD has them as pT, eta, phi, and E, rather than cartesian coordinates. Cartesian coordinates can be loaded zero-copy into an array or jagged array of TLorentzVector objects, but cylindrical coordinates require a conversion.

Here's how to do it now (without tidying up the interface). Suppose you have four arrays representing px, py, pz, and E:

>>> import numpy
>>> px = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> py = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> pz = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> E = numpy.array([10.1, 20.2, 30.3, 40.4])

To make them look like an array of TLorentzVector objects (though in fact, no conversion occurs), wrap them as a TLorentzVectorArray:

>>> import uproot_methods.classes.TLorentzVector
>>> flat = uproot_methods.classes.TLorentzVector.TLorentzVectorArray(px, py, pz, E)
>>> flat
<TLorentzVectorArray [TLorentzVector(1.1, 1.1, 1.1, 10.1)
                      TLorentzVector(2.2, 2.2, 2.2, 20.2)
                      TLorentzVector(3.3, 3.3, 3.3, 30.3)
                      TLorentzVector(4.4, 4.4, 4.4, 40.4)] at 7b31c798e828>

There are other constructors for different coordinate systems, such as TLorentzVectorArray.from_ptetaphim, but these perform a conversion, creating new internal arrays. The plain constructor above takes the arrays you give it without copying or modification.

As in the case of getting TLorentzVectors from ROOT files, this only creates the TLorentzVector Python class instances when __getitem__ on the array returns a single element, and you can do efficient calculations across the whole array.

>>> flat[0]
TLorentzVector(1.1, 1.1, 1.1, 10.1)
>>> flat[0].pt
1.5556349186104046
>>> flat.pt
array([1.55563492, 3.11126984, 4.66690476, 6.22253967])

Now if you want an array of TLorentzVectors that is also a jagged array, you need to create the class on the fly. All of the physics properties defined in uproot-methods are mix-ins that merely add new methods to existing classes. TLorentzVector.TLorentzVectorArray is a mix of TLorentzVector.ArrayMethods (defining pt, mass, boosts, etc.) and ObjectArray (generates Python class instances on __getitem__), the most common case, but in principle any array-like object (chunked arrays, masked arrays, sparse arrays, etc.) might want to have physics methods.

Here's how to make a jagged array of TLorentzVectors. It assumes that you have an array of counts ([3, 0, 1] in this example) that adds up to the total number of elements in the flat array. JaggedArray has other constructors named from*, and you can use any of them here. (Since those constructors are all @classmethods, they'll make JaggedTLorentzVectorArrays, rather than plain JaggedArrays.)

>>> # make the class
>>> JaggedTLorentzVectorArray = awkward.Methods.mixin(
...     uproot_methods.classes.TLorentzVector.ArrayMethods, awkward.JaggedArray)
>>> # make the object
>>> jagged = JaggedTLorentzVectorArray.fromcounts([3, 0, 1], flat)
>>> jagged
<JaggedArrayMethods [[TLorentzVector(1.1, 1.1, 1.1, 10.1)
                      TLorentzVector(2.2, 2.2, 2.2, 20.2)
                      TLorentzVector(3.3, 3.3, 3.3, 30.3)]
                     []
                     [TLorentzVector(4.4, 4.4, 4.4, 40.4)]] at 7b31cb8eec88>

Like flat, this jagged executes all of the physics methods efficiently across the underlying Numpy arrays. It only creates subarrays or Python instances when requested through __getitem__.

>>> jagged.pt
<JaggedArray [[1.55563492 3.11126984 4.66690476] [] [6.22253967]] at 7b31bc7ad8d0>
>>> jagged[0]
<TLorentzVectorArray [TLorentzVector(1.1, 1.1, 1.1, 10.1)
                      TLorentzVector(2.2, 2.2, 2.2, 20.2)
                      TLorentzVector(3.3, 3.3, 3.3, 30.3)] at 7b31bcac1748>
>>> jagged[0][0]
TLorentzVector(1.1, 1.1, 1.1, 10.1)

from awkward-0.x.

ramankhurana avatar ramankhurana commented on June 13, 2024

Hello,

(not really an issue, but more of a doubt/question)

I am new to uproot and stuck in following situation, suppose i have 2 electron and 3 jets and I want to clean my jets against electrons based on deltar, how can i do it without looping over jets or events, expecting a result in the form:

def deltar(eta1, eta2, phi1, phi2):
return ( numpy.sqrt((eta1-eta2)**2 + (phi1-phi2)**2))

dr = deltar(jeteta, eleeta, jetphi, elephi)

dr should be [[dr1,dr2,dr3], [dr4,dr5,dr6]] or [ [dr1,dr2], [dr3,dr4], [dr5,dr6]]

or even if i convert them in to TLorentzVectorArrays can i get the desired results, or it it not possible.

The problem is actually to subtract jeteta and eleeta, jaggedarrays don't know how to respond to this, it is possible to add something in my local code which can give me all possible delta eta.

Thanks.

from awkward-0.x.

jpivarski avatar jpivarski commented on June 13, 2024

The "loop" is a call to a function that produces per-element Cartesian products, which in Awkward 0 is called cross and in Awkward 1 is called ak.cartesian. You're using Awkward 0, which is no longer in development, but since you're also using Lorentz vectors, you'd need the vector package to get Lorentz vectors in Awkward 1, and I don't know if that's ready.

What cross/ak.cartesian does is it aligns two arrays element by element, and those elements have to be lists. Then it makes a new array of lists that are pairs, the first part of each pair is from the first array, the second part of each pair is from the second array, and they form all combinations of pairs drawn from the two lists, list by list.

>>> import awkward1 as ak
>>> first = ak.Array([[1, 2, 3], [], [4, 5]])
>>> second = ak.Array([["a", "b"], ["c"], ["d", "e"]])
>>> ak.cartesian([first, second]).tolist()
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')], [], [(4, 'd'), (4, 'e'), (5, 'd'), (5, 'e')]]
>>> ak.unzip(ak.cartesian([first, second]))
(<Array [[1, 1, 2, 2, 3, 3, ... [4, 4, 5, 5]] type='3 * var * int64'>,
 <Array [['a', 'b', 'a', ... 'e', 'd', 'e']] type='3 * var * string'>)

The value for jet cleaning is that this can make all electron-jet pairs in each event (without mixing events) so that you can calculate the ΔR and make choices about what to keep based on those values.

I found this from an old message about how to do it in Awkward 0:

def delta_r(pair):
    return pair.i0.delta_r(pair.i1)

combinations = events.jets.cross(events.electrons, nested=True)

events.jets[~(delta_r(combinations) < 0.5).any()]

The nested=True means to increase the depth of nesting by grouping pairs by their left-hand value. For instance,

>>> ak.cartesian([first, second]).tolist()                # same level of nesting as the input
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')],
 [],
 [(4, 'd'), (4, 'e'), (5, 'd'), (5, 'e')]]
>>> ak.cartesian([first, second], nested=True).tolist()   # new brackets around each unique *number*
[[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')], [(3, 'a'), (3, 'b')]],
 [],
 [[(4, 'd'), (4, 'e')], [(5, 'd'), (5, 'e')]]]

Here's a complete example in Awkward 1 without Lorentz vectors. (I don't have a copy of Awkward 0 around anymore.)

>>> # we want to find the "one" values that are close to a "two" value
>>> one = ak.Array([[1.1, 2.1, 3.1], [], [4.1, 5.1]])
>>> two = ak.Array([[3.5, 1.3, 2.2, 0.9], [1.0], [5.1]])
>>> one_nested, two_nested = ak.unzip(ak.cartesian([one, two], nested=True))
>>> one_nested.tolist()
[[[1.1, 1.1, 1.1, 1.1], [2.1, 2.1, 2.1, 2.1], [3.1, 3.1, 3.1, 3.1]], [], [[4.1], [5.1]]]
>>> two_nested.tolist()
[[[3.5, 1.3, 2.2, 0.9], [3.5, 1.3, 2.2, 0.9], [3.5, 1.3, 2.2, 0.9]], [], [[5.1], [5.1]]]
>>> # instead of a ΔR cut, I'll just do an absolute difference
>>> abs(one_nested - two_nested).tolist()
[[[2.4, 0.2, 1.1, 0.2], [1.4, 0.8, 0.1, 1.2], [0.4, 1.8, 0.9, 2.2]], [], [[1.0], [0.0]]]
>>> # asking for the ones that are less than 0.2 gives me nested booleans
>>> (abs(one_nested - two_nested) < 0.2).tolist()
[[[False, True, False, False], [False, False, True, False], [False, False, False, False]], [], [[False], [True]]]
>>> # but we want to know if a first-level nesting has *any* matches; axis=-1 means "apply to deepest level"
>>> cut = ak.any(abs(one_nested - two_nested) < 0.2, axis=-1)
>>> cut.tolist()   # note: same depth and number of elements as "one"
[[True, True, False], [], [False, True]]
>>> # now we can apply this cut to "one", either directly (dropping items) or as a mask (replace with None)
>>> one
<Array [[1.1, 2.1, 3.1], [], [4.1, 5.1]] type='3 * var * float64'>
>>> one[cut]
<Array [[1.1, 2.1], [], [5.1]] type='3 * var * float64'>
>>> one.mask[cut]
<Array [[1.1, 2.1, None], [], [None, 5.1]] type='3 * var * ?float64'>

It is very, very useful to do this out interactively, with small examples, printing with .tolist() to see and understand the data before taking the next step.


As a completely different alternative, you could make the cut (above) using Numba. You have to use Awkward 1 to use Numba, but you can apply the cut that you make with Numba to an Awkward 0 array, working around the fact that vector might not be ready yet. The ak.from_awkward0/ak.to_awkward0 functions translate between the two libraries, but won't preserve Lorentz-ness.

Suppose you have electrons and jets that, after ak.from_awkward0, look like this:

>>> electrons = ak.Array([[{"x": 1, "y": 1, "z": 1, "t": 1},
...                        {"x": 2, "y": 2, "z": 2, "t": 2},
...                        {"x": 3, "y": 3, "z": 3, "t": 3}],
...                       [],
...                       [{"x": 4, "y": 4, "z": 4, "t": 4},
...                        {"x": 5, "y": 5, "z": 5, "t": 5}]])
>>> jets = ak.Array([[{"x": 1.1, "y": 1.1, "z": 1.1, "t": 1.1},
...                   {"x": 3.3, "y": 3.3, "z": 3.3, "t": 3.3}],
...                  [{"x": 999, "y": 999, "z": 999, "t": 999}],
...                  [{"x": 5.5, "y": 5.5, "z": 5.5, "t": 5.5},
...                   {"x": 4.4, "y": 4.4, "z": 4.4, "t": 4.4}]])

Then you can write a function with explicit loops over these structures, build a new Awkward array with the ak.ArrayBuilder, and compile that function with Numba so that there's no performance loss from the explicit loops.

>>> @nb.jit
... def make_cut(electrons, jets, builder):
...     for event_electrons, event_jets in zip(electrons, jets):
...         builder.begin_list()
...         for electron in event_electrons:
...             keep = False
...             for jet in event_jets:
...                 if abs(electron.x - jet.x) < 0.45:   # calculate ΔR here
...                     keep = True
...                     break
...             builder.append(keep)
...         builder.end_list()
...     return builder
... 
>>> cut = make_cut(electrons, jets, ak.ArrayBuilder()).snapshot()
>>> cut
<Array [[True, False, True], ... [True, False]] type='3 * var * bool'>
>>> cut.tolist()
[[True, False, True], [], [True, False]]
>>> electrons[cut].tolist()
[[{'x': 1, 'y': 1, 'z': 1, 't': 1}, {'x': 3, 'y': 3, 'z': 3, 't': 3}],
 [],
 [{'x': 4, 'y': 4, 'z': 4, 't': 4}]]

This new cut can be turned into an Awkward 0 array with

>>> ak.to_awkward0(cut)
<JaggedArray [[True False True] [] [True False]] at 0x7fd3c7d21d00>

and you can apply it to your Awkward 0 Lorentz vectors. But this is just a work-around until the vector library is ready and you can do everything in Awkward 1.

from awkward-0.x.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.