Comments (10)
It's designed to let you do mu_pt_sel * np.sin(mu_phi_sel), did you try it?
The starts can change without changing the number of items per subarray. Masks do as little work as possible, dropping items from starts and stops but without consolidating the content. Numpy ufunc application consolidates the content, if necessary, so that the ufunc can be applied directly to the contents of two arrays.
Their physical starts and stops may adjust to get aligned, but their logical counts should be unchanged, so that there's still the right number of mu_pt_sel for each np.sin(mu_phi_sel).
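As a rough illustration of the masking behavior described above, here is a minimal sketch in plain NumPy. It does not use awkward itself; the names content, starts, stops and the sample values are assumptions chosen for the example, not the library's internals.

```python
import numpy as np

# Hypothetical jagged array [[0.0, 1.1], [2.2], [3.3, 4.4, 5.5]] stored as
# flat content plus per-subarray start/stop positions.
content = np.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5])
starts = np.array([0, 2, 3])
stops = np.array([2, 3, 6])

# Masking only filters starts and stops; content is left untouched.
mask = np.array([True, False, True])
masked_starts = starts[mask]
masked_stops = stops[mask]

# The logical counts of the surviving subarrays are unchanged.
masked_counts = masked_stops - masked_starts

# The element 2.2 is still stored in content but is no longer reachable.
masked_lists = [content[a:b].tolist() for a, b in zip(masked_starts, masked_stops)]
print(masked_counts.tolist(), masked_lists)
```

The mask costs one pass over starts and stops, regardless of how large content is.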
from awkward-0.x.
There's a theoretical overview of jaggedness representations here.
If mu_pt_sel * np.sin(mu_phi_sel) doesn't actually work, that would be a bug. I'd need to see a minimal example to debug it, though.
Also, note that (s.counts == 2) & (s.sum() == 2) means "events that have exactly two muons and those two muons pass the cut." In HEP, I'm more familiar with s.sum() >= 2 for "events in which at least two muons pass the cut."
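To make the difference between the two selections concrete, here is a small sketch using plain Python lists and NumPy rather than awkward. The per-event pt values are made up for illustration, and the 20 threshold is taken from the cut used elsewhere in this thread.

```python
import numpy as np

# Hypothetical muon pt values, one inner list per event.
events = [[25.0, 30.0], [25.0, 30.0, 10.0], [25.0, 5.0], [40.0, 22.0, 35.0], []]

# Number of muons per event, and number of muons passing pt > 20 per event.
counts = np.array([len(e) for e in events])
n_pass = np.array([sum(pt > 20 for pt in e) for e in events])

# Exactly two muons in the event, and both pass the cut.
exactly_two_all_pass = (counts == 2) & (n_pass == 2)

# At least two muons pass the cut, however many muons the event has.
at_least_two_pass = n_pass >= 2

print(exactly_two_all_pass.tolist())
print(at_least_two_pass.tolist())
```

Only the first event satisfies the stricter selection, while the looser one also keeps the events with a third muon.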
Thanks for the quick reply! Unfortunately mu_pt_sel * np.sin(mu_phi_sel) doesn't work. I'm using awkward.__version__ = '0.6.0':
mu_pt_sel * np.sin(mu_phi_sel)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-179-2415a9b63e15> in <module>
----> 1 mu_pt_sel * np.sin(mu_phi_sel)
/usr/local/lib/python3.5/dist-packages/numpy/lib/mixins.py in func(self, other)
23 if _disables_array_ufunc(other):
24 return NotImplemented
---> 25 return ufunc(self, other)
26 func.__name__ = '__{}__'.format(name)
27 return func
/usr/local/lib/python3.5/dist-packages/awkward/array/jagged.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
778 starts, stops = inputs[i]._starts, inputs[i]._stops
779 else:
--> 780 inputs[i] = inputs[i]._tojagged(starts, stops, copy=False)
781
782 elif isinstance(inputs[i], (awkward.util.numpy.ndarray, awkward.array.base.AwkwardArray)):
/usr/local/lib/python3.5/dist-packages/awkward/array/jagged.py in _tojagged(self, starts, stops, copy)
757 index = self._starts[parents]
758 index += increase
--> 759 out = self.copy(starts=starts, stops=stops, content=self._content[index])
760 out._parents = parents
761 return out
IndexError: index 853180 is out of bounds for axis 1 with size 853176
Here's the minimal example (the input file is available at https://cernbox.cern.ch/index.php/s/YnqZg3XeZWrqQRk):
import uproot
import numpy as np
# Open the "Events" tree and read the muon branches as jagged arrays.
tt = uproot.open("E07AC933-F342-E811-A68D-3417EBE5354A.root").get("Events")
arrs = tt.arrays(["nMuon", "Muon_pt", "Muon_eta", "Muon_phi", "Muon_mass"])
mu_pt = arrs[b'Muon_pt']
mu_phi = arrs[b'Muon_phi']
# Per-muon cut, then events with exactly two muons that both pass it.
xx = (mu_pt>20)
dimuon = (xx.counts == 2) & (xx.sum()==2)
mu_pt_sel = mu_pt[dimuon]
mu_phi_sel = mu_phi[dimuon]
# This is the line that raises the IndexError shown above.
mu_pt_sel * np.sin(mu_phi_sel)
I'm indeed looking for events with exactly two muons at this moment. What I actually want to do later is select the two passing muons and use only those from all events that have them.
Can you make that file visible to me? (CERN username "pivarski")
I have 5 minutes to download it before I have to catch a train. If not, I'll have to fix this later. It would be too bad: I want to release 0.7.0 (I know, a big jump), and it would be nice to include this fix. Otherwise, it would be a quick 0.7.1 follow-up (with uproot depending on >= 0.7.0).
The link is here: https://cernbox.cern.ch/index.php/s/YnqZg3XeZWrqQRk
If you can give some hints I can try to debug too.
Downloading now... The progress bar doesn't tell me how big it will be, so I might have to cut it off and run to catch my train.
What I suspect is an error in _tojagged. That's what changes the physical layout of a jagged array to a different physical layout with the same logical structure. I recently generalized that function, so if this is a new error, it could be there. (We have to understand the new implementation; the old one is insufficient and had its own bugs.)
I'm going to look for the first differing element and understand why things go wrong there.
Or maybe not! I don't think it's going to finish downloading before I have to go (it's at 300 MB).
I'll have to do it tomorrow. But what I'll do is make the current release 0.6.1, fix this bug, and then release that as 0.7.0 with uproot taking it as a new minimum version.
Here's a more minimal example (still working on it):
import numpy, awkward
a = awkward.JaggedArray.fromcounts([2, 2, 2, 1, 2, 2], numpy.round(numpy.arange(0, 12, 1.1), 2))
m = numpy.array([True, True, True, False, False, True])
a[m] + (a[m] + 0)
This last line raises the exception and it's small enough to try to diagnose. (It's based on the first six events in your sample. Six because that's how many get printed to the screen without "...".)
Okay, that was a silly oversight: the problem didn't run as deep as I worried it might.
In the specification, there's a full description of how an offsets array can describe dense content (no gaps between subarrays) but you need the full generality of starts/stops to describe content with gaps (unreachable data) between the subarrays. When we do a mask, it's quickest to only filter the starts and stops arrays, leaving the content untouched, which leads to gaps. If the data are very deeply nested, filtering does not trigger a recursive collapse of nested data (which could involve hundreds of columns, even columns that haven't been loaded from disk; it's a big deal).
However, the logical array described by starts and stops with gaps is not uniquely described by that combination of starts, stops, and content. Different physical combinations can have different numbers of unreachable elements. When you're doing mathematical operations on these arrays, you care about what the array logically means, not how it's physically arranged. So if you applied a Numpy operation on two arrays with the same logical structure but different physical structure, the second one was rearranged to have the same physical structure as the first.
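The non-uniqueness can be seen in a small plain-NumPy sketch. The helper to_lists and the two sample layouts are hypothetical, written only for this illustration; they are not part of awkward.

```python
import numpy as np

def to_lists(starts, stops, content):
    """Return the logical nested lists described by starts, stops, content."""
    return [content[a:b].tolist() for a, b in zip(starts, stops)]

# Layout A: content with an unreachable element (99.0) between the subarrays.
content_a = np.array([1.0, 2.0, 99.0, 3.0])
starts_a, stops_a = np.array([0, 3]), np.array([2, 4])

# Layout B: dense content with no gaps.
content_b = np.array([1.0, 2.0, 3.0])
starts_b, stops_b = np.array([0, 2]), np.array([2, 3])

logical_a = to_lists(starts_a, stops_a, content_a)
logical_b = to_lists(starts_b, stops_b, content_b)

# Same logical array, different physical arrangement.
same_logical = logical_a == logical_b
same_physical = np.array_equal(starts_a, starts_b)
print(same_logical, same_physical)
```

Adding content_a and content_b directly would fail or misalign, which is why one layout has to be brought into line with the other before any elementwise math.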
The bug happened for a[m] + (a[m] + 0) but not (a[m] + 0) + a[m]: two Numpy calls (both numpy.add) on a[m], which has gaps. The first addition collapsed the array to have no gaps, and the second addition tried to combine an array with gaps with an array without gaps. Since the array with gaps came first, it tried to add gaps to the second. The _tojagged code didn't assume that you'd ever want to do that and it broke.
With this change, I inserted a blanket policy of always fully collapsing all arrays that go into a Numpy call, regardless of whether we really need to. Then the restructurings only collapse, never expand (the fully collapsed form is unique). This has an added bonus that math will only be calculated on the reachable elements, so it effectively becomes non-lazy when you use it for math.
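A rough sketch of what "fully collapsing" means, in plain NumPy: the function compact below is a hypothetical stand-in written for this illustration, not the code in awkward, and it assumes a single level of jaggedness.

```python
import numpy as np

def compact(starts, stops, content):
    """Collapse (starts, stops, content) to gap-free offsets and content."""
    counts = stops - starts
    offsets = np.zeros(len(counts) + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(counts)
    # For each reachable element, find which subarray it belongs to and its
    # position inside that subarray, then map back to the old content index.
    parents = np.repeat(np.arange(len(counts)), counts)
    local = np.arange(offsets[-1]) - offsets[:-1][parents]
    index = starts[parents] + local
    return offsets, content[index]

# Two physical layouts of the same logical array [[1.0, 2.0], [3.0]].
offsets_a, dense_a = compact(np.array([0, 3]), np.array([2, 4]),
                             np.array([1.0, 2.0, 99.0, 3.0]))
offsets_b, dense_b = compact(np.array([0, 2]), np.array([2, 3]),
                             np.array([1.0, 2.0, 3.0]))

# After collapsing, both layouts agree, so the math is a plain elementwise add
# over reachable elements only (the unreachable 99.0 is never touched).
total = dense_a + dense_b
print(offsets_a.tolist(), total.tolist())
```

Because both inputs end up in the same gap-free form, the order of the operands no longer matters.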
This is awkward 0.6.2, soon to be replaced by 0.7.0, but you can try it now.