Comments (4)
Dear Jim
Thanks a lot for all the ideas. They are very helpful and enlightening. I must confess I did not thought of using cross product. I think the numba technique might be very useful for some other instances when I have no other choice other than explicitly looping, but in the present case I think I can work with awkward0 following second solution from old message.
Thanks a lot for quick response.
from awkward-0.x.
It's definitely possible, though I need to think of a better way to present it. It's not only needed in ATLAS, but also CMS NanoAOD has flat branches instead of TLorentzVector objects— worse still (for me), the NanoAOD has them as pT, eta, phi, and E, rather than cartesian coordinates. Cartesian coordinates can be loaded zero-copy into an array or jagged array of TLorentzVector objects, but cylindrical coordinates require a conversion.
Here's how to do it now (without tidying up the interface). Suppose you have four arrays representing px, py, pz, and E:
>>> import numpy
>>> px = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> py = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> pz = numpy.array([1.1, 2.2, 3.3, 4.4])
>>> E = numpy.array([10.1, 20.2, 30.3, 40.4])
To make them look like an array of TLorentzVector objects (though in fact, no conversion occurs), wrap them as a TLorentzVectorArray
:
>>> import uproot_methods.classes.TLorentzVector
>>> flat = uproot_methods.classes.TLorentzVector.TLorentzVectorArray(px, py, pz, E)
>>> flat
<TLorentzVectorArray [TLorentzVector(1.1, 1.1, 1.1, 10.1)
TLorentzVector(2.2, 2.2, 2.2, 20.2)
TLorentzVector(3.3, 3.3, 3.3, 30.3)
TLorentzVector(4.4, 4.4, 4.4, 40.4)] at 7b31c798e828>
There are other constructors for different coordinate systems, such as TLorentzVectorArray.from_ptetaphim
, but these perform a conversion, creating new internal arrays. The plain constructor above takes the arrays you give it without copying or modification.
As in the case of getting TLorentzVectors
from ROOT files, this only creates the TLorentzVector
Python class instances when __getitem__
on the array returns a single element, and you can do efficient calculations across the whole array.
>>> flat[0]
TLorentzVector(1.1, 1.1, 1.1, 10.1)
>>> flat[0].pt
1.5556349186104046
>>> flat.pt
array([1.55563492, 3.11126984, 4.66690476, 6.22253967])
Now if you want an array of TLorentzVectors
that is also a jagged array, you need to create the class on the fly. All of the physics properties defined in uproot-methods are mix-ins that merely add new methods to existing classes. TLorentzVector.TLorentzVectorArray
is a mix of TLorentzVector.ArrayMethods
(defining pt, mass, boosts, etc.) and ObjectArray
(generates Python class instances on __getitem__
), the most common case, but in principle any array-like object (chunked arrays, masked arrays, sparse arrays, etc.) might want to have physics methods.
Here's how to make a jagged array of TLorentzVectors. It assumes that you have an array of counts
([3, 0, 1]
in this example) that adds up to the total number of elements in the flat array. JaggedArray
has other constructors named from*
, and you can use any of them here. (Since those constructors are all @classmethods
, they'll make JaggedTLorentzVectorArrays
, rather than plain JaggedArrays
.)
>>> # make the class
>>> JaggedTLorentzVectorArray = awkward.Methods.mixin(
... uproot_methods.classes.TLorentzVector.ArrayMethods, awkward.JaggedArray)
>>> # make the object
>>> jagged = JaggedTLorentzVectorArray.fromcounts([3, 0, 1], flat)
>>> jagged
<JaggedArrayMethods [[TLorentzVector(1.1, 1.1, 1.1, 10.1)
TLorentzVector(2.2, 2.2, 2.2, 20.2)
TLorentzVector(3.3, 3.3, 3.3, 30.3)]
[]
[TLorentzVector(4.4, 4.4, 4.4, 40.4)]] at 7b31cb8eec88>
Like flat
, this jagged
executes all of the physics methods efficiently across the underlying Numpy arrays. It only creates subarrays or Python instances when requested through __getitem__
.
>>> jagged.pt
<JaggedArray [[1.55563492 3.11126984 4.66690476] [] [6.22253967]] at 7b31bc7ad8d0>
>>> jagged[0]
<TLorentzVectorArray [TLorentzVector(1.1, 1.1, 1.1, 10.1)
TLorentzVector(2.2, 2.2, 2.2, 20.2)
TLorentzVector(3.3, 3.3, 3.3, 30.3)] at 7b31bcac1748>
>>> jagged[0][0]
TLorentzVector(1.1, 1.1, 1.1, 10.1)
from awkward-0.x.
Hello,
(not really an issue, but more of a doubt/question)
I am new to uproot and stuck in following situation, suppose i have 2 electron and 3 jets and I want to clean my jets against electrons based on deltar, how can i do it without looping over jets or events, expecting a result in the form:
def deltar(eta1, eta2, phi1, phi2):
return ( numpy.sqrt((eta1-eta2)**2 + (phi1-phi2)**2))
dr = deltar(jeteta, eleeta, jetphi, elephi)
dr should be [[dr1,dr2,dr3], [dr4,dr5,dr6]] or [ [dr1,dr2], [dr3,dr4], [dr5,dr6]]
or even if i convert them in to TLorentzVectorArrays can i get the desired results, or it it not possible.
The problem is actually to subtract jeteta and eleeta, jaggedarrays don't know how to respond to this, it is possible to add something in my local code which can give me all possible delta eta.
Thanks.
from awkward-0.x.
The "loop" is a call to a function that produces per-element Cartesian products, which in Awkward 0 is called cross
and in Awkward 1 is called ak.cartesian. You're using Awkward 0, which is no longer in development, but since you're also using Lorentz vectors, you'd need the vector package to get Lorentz vectors in Awkward 1, and I don't know if that's ready.
What cross
/ak.cartesian does is it aligns two arrays element by element, and those elements have to be lists. Then it makes a new array of lists that are pairs, the first part of each pair is from the first array, the second part of each pair is from the second array, and they form all combinations of pairs drawn from the two lists, list by list.
>>> import awkward1 as ak
>>> first = ak.Array([[1, 2, 3], [], [4, 5]])
>>> second = ak.Array([["a", "b"], ["c"], ["d", "e"]])
>>> ak.cartesian([first, second]).tolist()
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')], [], [(4, 'd'), (4, 'e'), (5, 'd'), (5, 'e')]]
>>> ak.unzip(ak.cartesian([first, second]))
(<Array [[1, 1, 2, 2, 3, 3, ... [4, 4, 5, 5]] type='3 * var * int64'>,
<Array [['a', 'b', 'a', ... 'e', 'd', 'e']] type='3 * var * string'>)
The value for jet cleaning is that this can make all electron-jet pairs in each event (without mixing events) so that you can calculate the ΔR and make choices about what to keep based on those values.
I found this from an old message about how to do it in Awkward 0:
def delta_r(pair):
return pair.i0.delta_r(pair.i1)
combinations = events.jets.cross(events.electrons, nested=True)
events.jets[~(delta_r(combinations) < 0.5).any()]
The nested=True
means to increase the depth of nesting by grouping pairs by their left-hand value. For instance,
>>> ak.cartesian([first, second]).tolist() # same level of nesting as the input
[[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')],
[],
[(4, 'd'), (4, 'e'), (5, 'd'), (5, 'e')]]
>>> ak.cartesian([first, second], nested=True).tolist() # new brackets around each unique *number*
[[[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')], [(3, 'a'), (3, 'b')]],
[],
[[(4, 'd'), (4, 'e')], [(5, 'd'), (5, 'e')]]]
Here's a complete example in Awkward 1 without Lorentz vectors. (I don't have a copy of Awkward 0 around anymore.)
>>> # we want to find the "one" values that are close to a "two" value
>>> one = ak.Array([[1.1, 2.1, 3.1], [], [4.1, 5.1]])
>>> two = ak.Array([[3.5, 1.3, 2.2, 0.9], [1.0], [5.1]])
>>> one_nested, two_nested = ak.unzip(ak.cartesian([one, two], nested=True))
>>> one_nested.tolist()
[[[1.1, 1.1, 1.1, 1.1], [2.1, 2.1, 2.1, 2.1], [3.1, 3.1, 3.1, 3.1]], [], [[4.1], [5.1]]]
>>> two_nested.tolist()
[[[3.5, 1.3, 2.2, 0.9], [3.5, 1.3, 2.2, 0.9], [3.5, 1.3, 2.2, 0.9]], [], [[5.1], [5.1]]]
>>> # instead of a ΔR cut, I'll just do an absolute difference
>>> abs(one_nested - two_nested).tolist()
[[[2.4, 0.2, 1.1, 0.2], [1.4, 0.8, 0.1, 1.2], [0.4, 1.8, 0.9, 2.2]], [], [[1.0], [0.0]]]
>>> # asking for the ones that are less than 0.2 gives me nested booleans
>>> (abs(one_nested - two_nested) < 0.2).tolist()
[[[False, True, False, False], [False, False, True, False], [False, False, False, False]], [], [[False], [True]]]
>>> # but we want to know if a first-level nesting has *any* matches; axis=-1 means "apply to deepest level"
>>> cut = ak.any(abs(one_nested - two_nested) < 0.2, axis=-1)
>>> cut.tolist() # note: same depth and number of elements as "one"
[[True, True, False], [], [False, True]]
>>> # now we can apply this cut to "one", either directly (dropping items) or as a mask (replace with None)
>>> one
<Array [[1.1, 2.1, 3.1], [], [4.1, 5.1]] type='3 * var * float64'>
>>> one[cut]
<Array [[1.1, 2.1], [], [5.1]] type='3 * var * float64'>
>>> one.mask[cut]
<Array [[1.1, 2.1, None], [], [None, 5.1]] type='3 * var * ?float64'>
It is very, very useful to do this out interactively, with small examples, printing with .tolist()
to see and understand the data before taking the next step.
As a completely different alternative, you could make the cut
(above) using Numba. You have to use Awkward 1 to use Numba, but you can apply the cut
that you make with Numba to an Awkward 0 array, working around the fact that vector might not be ready yet. The ak.from_awkward0/ak.to_awkward0 functions translate between the two libraries, but won't preserve Lorentz-ness.
Suppose you have electrons and jets that, after ak.from_awkward0
, look like this:
>>> electrons = ak.Array([[{"x": 1, "y": 1, "z": 1, "t": 1},
... {"x": 2, "y": 2, "z": 2, "t": 2},
... {"x": 3, "y": 3, "z": 3, "t": 3}],
... [],
... [{"x": 4, "y": 4, "z": 4, "t": 4},
... {"x": 5, "y": 5, "z": 5, "t": 5}]])
>>> jets = ak.Array([[{"x": 1.1, "y": 1.1, "z": 1.1, "t": 1.1},
... {"x": 3.3, "y": 3.3, "z": 3.3, "t": 3.3}],
... [{"x": 999, "y": 999, "z": 999, "t": 999}],
... [{"x": 5.5, "y": 5.5, "z": 5.5, "t": 5.5},
... {"x": 4.4, "y": 4.4, "z": 4.4, "t": 4.4}]])
Then you can write a function with explicit loops over these structures, build a new Awkward array with the ak.ArrayBuilder, and compile that function with Numba so that there's no performance loss from the explicit loops.
>>> @nb.jit
... def make_cut(electrons, jets, builder):
... for event_electrons, event_jets in zip(electrons, jets):
... builder.begin_list()
... for electron in event_electrons:
... keep = False
... for jet in event_jets:
... if abs(electron.x - jet.x) < 0.45: # calculate ΔR here
... keep = True
... break
... builder.append(keep)
... builder.end_list()
... return builder
...
>>> cut = make_cut(electrons, jets, ak.ArrayBuilder()).snapshot()
>>> cut
<Array [[True, False, True], ... [True, False]] type='3 * var * bool'>
>>> cut.tolist()
[[True, False, True], [], [True, False]]
>>> electrons[cut].tolist()
[[{'x': 1, 'y': 1, 'z': 1, 't': 1}, {'x': 3, 'y': 3, 'z': 3, 't': 3}],
[],
[{'x': 4, 'y': 4, 'z': 4, 't': 4}]]
This new cut
can be turned into an Awkward 0 array with
>>> ak.to_awkward0(cut)
<JaggedArray [[True False True] [] [True False]] at 0x7fd3c7d21d00>
and you can apply it to your Awkward 0 Lorentz vectors. But this is just a work-around until the vector library is ready and you can do everything in Awkward 1.
from awkward-0.x.
Related Issues (20)
- dynamically created methods are confusing for users HOT 1
- Achieve masking HOT 8
- AssertionError when Table is part of a list HOT 5
- Potential bug with subsequent masking HOT 2
- Reduction of empty elements HOT 2
- TLorentzVectorArray yields different values depending on masking order HOT 4
- IndexError when masking empty jaggedArray made from offsets HOT 1
- awkward method names HOT 9
- TypeError when using array.mean(weights) HOT 2
- Cyclic array? HOT 1
- broken link in readme HOT 1
- Installing awkward-numba in usermode breaks awkward HOT 3
- Syntax warning due to comparison of literals using is in Python 3.8 HOT 1
- Inconsistent Filesizes with .awkd Files HOT 6
- Bug in string comparison in StringArray HOT 1
- mean, std fail on ChunkedArrays HOT 1
- AttributeError when trying to read a particular format of awkward array HOT 5
- JaggedArray.fromiter() functions fails for python lists HOT 2
- Small detail; broadcasting seems to work a little different to what is implied in the documentation. HOT 6
- Accumulate numpy arrays inside the loop HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from awkward-0.x.