I wonder if this is my misunderstanding of the API, but I'm running in to the followin


numpy functions on jagged arrays don't produce compatible arrays about awkward-0.x HOT 10 CLOSED

jpata commented on June 3, 2024

numpy functions on jagged arrays don't produce compatible arrays

from awkward-0.x.

Comments (10)

jpivarski commented on June 3, 2024

It's designed to let you do mu_pt_sel * np.sin(mu_phi_sel), did you try it?

The starts can change without changing the number of items per subarray. Masks do as little work as possible, dropping items from starts and stops but without consolidating the content. Numpy ufunc application consolidates the content, if necessary, so that the ufunc can be applied directly to the contents of two arrays.

Their physical starts and stops may adjust to get aligned, but their logical counts should be unchanged, so that there's still the right number of mu_pt_sel for each np.sin(mu_phi_sel).
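To make the starts/stops bookkeeping concrete, here is a minimal sketch in plain Python lists. It is a toy model, not awkward's actual implementation, and the names `content`, `starts`, `stops`, `apply_mask`, and `counts` are illustrative assumptions. It shows a mask dropping entries from starts and stops while leaving the content untouched, so every surviving subarray keeps its count.

```python
# Toy model of a jagged array: subarray i is content[starts[i]:stops[i]].
# (Illustrative only; this is not how awkward stores or masks its arrays.)
content = [10.0, 11.0, 20.0, 30.0, 31.0, 32.0, 40.0]
starts = [0, 2, 3, 6]
stops = [2, 3, 6, 7]

def apply_mask(starts, stops, mask):
    """Keep only the subarrays where mask is True; content is never touched."""
    kept = [(s, e) for s, e, keep in zip(starts, stops, mask) if keep]
    return [s for s, _ in kept], [e for _, e in kept]

def counts(starts, stops):
    """Number of items in each subarray."""
    return [e - s for s, e in zip(starts, stops)]

mask = [True, False, True, True]
new_starts, new_stops = apply_mask(starts, stops, mask)

print(new_starts, new_stops)           # [0, 3, 6] [2, 6, 7]
print(counts(new_starts, new_stops))   # [2, 3, 1]
```

The element at index 2 of `content` is still stored but no longer reachable; only the index arrays changed.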


jpivarski commented on June 3, 2024

There's a theoretical overview of jaggedness representations here.

If mu_pt_sel * np.sin(mu_phi_sel) doesn't actually work, that would be a bug. I'd need to see a minimal example to debug it, though.

Also, note that (s.counts == 2) & (s.sum() == 2) means "events that have exactly two muons and those two muons pass the cut." In HEP, I'm more familiar with s.sum() >= 2 for "events in which at least two muons pass the cut."
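The difference between the two selections can be sketched with plain Python lists standing in for the jagged boolean array. The per-event values below are made up for illustration, and `passes`, `n_muons`, and `n_passing` are hypothetical names playing the roles of `s`, `s.counts`, and `s.sum()`.

```python
# Per-event booleans: True where a muon passes the cut (made-up values).
passes = [
    [True, True],         # two muons, both pass
    [True, False],        # two muons, one passes
    [True, True, False],  # three muons, two pass
    [],                   # no muons at all
]

n_muons = [len(event) for event in passes]    # plays the role of s.counts
n_passing = [sum(event) for event in passes]  # plays the role of s.sum()

# "Exactly two muons, and both pass the cut."
exactly_two = [n == 2 and p == 2 for n, p in zip(n_muons, n_passing)]

# "At least two muons pass the cut", regardless of how many there are.
at_least_two = [p >= 2 for p in n_passing]

print(exactly_two)   # [True, False, False, False]
print(at_least_two)  # [True, False, True, False]
```

The third event is the one that separates the two definitions: it has two passing muons but three muons in total.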


jpata commented on June 3, 2024

Thanks for the quick reply! Unfortunately, mu_pt_sel * np.sin(mu_phi_sel) doesn't work. I'm using awkward.__version__ == '0.6.0'.

mu_pt_sel * np.sin(mu_phi_sel)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-179-2415a9b63e15> in <module>
----> 1 mu_pt_sel * np.sin(mu_phi_sel)

/usr/local/lib/python3.5/dist-packages/numpy/lib/mixins.py in func(self, other)
     23         if _disables_array_ufunc(other):
     24             return NotImplemented
---> 25         return ufunc(self, other)
     26     func.__name__ = '__{}__'.format(name)
     27     return func

/usr/local/lib/python3.5/dist-packages/awkward/array/jagged.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    778                     starts, stops = inputs[i]._starts, inputs[i]._stops
    779                 else:
--> 780                     inputs[i] = inputs[i]._tojagged(starts, stops, copy=False)
    781 
    782             elif isinstance(inputs[i], (awkward.util.numpy.ndarray, awkward.array.base.AwkwardArray)):

/usr/local/lib/python3.5/dist-packages/awkward/array/jagged.py in _tojagged(self, starts, stops, copy)
    757             index = self._starts[parents]
    758             index += increase
--> 759             out = self.copy(starts=starts, stops=stops, content=self._content[index])
    760             out._parents = parents
    761             return out

IndexError: index 853180 is out of bounds for axis 1 with size 853176

Here's the minimal example, the input file is available at https://cernbox.cern.ch/index.php/s/YnqZg3XeZWrqQRk.

import numpy as np
import uproot
tt = uproot.open("E07AC933-F342-E811-A68D-3417EBE5354A.root").get("Events")
arrs = tt.arrays(["nMuon", "Muon_pt", "Muon_eta", "Muon_phi", "Muon_mass"])

mu_pt = arrs[b'Muon_pt']
mu_phi = arrs[b'Muon_phi']

xx = (mu_pt>20)
dimuon = (xx.counts == 2) & (xx.sum()==2)

mu_pt_sel = mu_pt[dimuon]
mu_phi_sel = mu_phi[dimuon]
mu_pt_sel * np.sin(mu_phi_sel)

I'm indeed looking for events with exactly two muons at this moment. What I actually want to do later is select the two passing muons and use only those from all events that have them.


jpivarski commented on June 3, 2024

Can you make that file visible to me? (CERN username "pivarski")

I have 5 minutes to download it before I have to catch a train. If not, I'll have to fix this later. It would be too bad: I want to release 0.7.0 (I know, a big jump), and it would be nice to include this fix. Otherwise, it would be a quick 0.7.1 follow-up (with uproot depending on >= 0.7.0).


jpata commented on June 3, 2024

The link is here: https://cernbox.cern.ch/index.php/s/YnqZg3XeZWrqQRk
If you can give some hints I can try to debug too.


jpivarski commented on June 3, 2024

Downloading now... The progress bar doesn't tell me how big it will be, so I might have to cut it off and run to catch my train.

What I suspect is an error in _tojagged. That's what changes the physical layout of a jagged array to a different physical layout with the same logical structure. I recently generalized that function, so if this is a new error, it could be there. (We have to understand the new implementation; the old one is insufficient and had its own bugs.)

I'm going to look for the first differing element and understand why things go wrong there.

Or maybe not! I don't think it's going to finish downloading before I have to go (it's at 300 MB).


jpivarski commented on June 3, 2024

I'll have to do it tomorrow. But what I'll do is make the current release 0.6.1, fix this bug, and then release that as 0.7.0 with uproot taking it as a new minimum version.


jpivarski commented on June 3, 2024

Here's a more minimal example (still working on it):

import numpy, awkward

a = awkward.JaggedArray.fromcounts([2, 2, 2, 1, 2, 2], numpy.round(numpy.arange(0, 12, 1.1), 2))
m = numpy.array([True, True, True, False, False, True])
a[m] + (a[m] + 0)

This last line raises the exception and it's small enough to try to diagnose. (It's based on the first six events in your sample. Six because that's how many get printed to the screen without ...)


jpivarski commented on June 3, 2024

Okay, that was a silly oversight: the problem didn't run as deep as I worried it might.

In the specification, there's a full description of how an offsets array can describe dense content (no gaps between subarrays) but you need the full generality of starts/stops to describe content with gaps (unreachable data) between the subarrays. When we do a mask, it's quickest to only filter the starts and stops arrays, leaving the content untouched, which leads to gaps. If the data are very deeply nested, filtering does not trigger a recursive collapse of nested data (which could involve hundreds of columns, even columns that haven't been loaded from disk, so it's a big deal).
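As a rough illustration of offsets versus starts/stops, the sketch below uses plain Python lists with the counts and mask from the minimal example in this thread. It is a toy model, not awkward's code; `is_dense` and the other names are hypothetical helpers introduced only for this sketch.

```python
counts = [2, 2, 2, 1, 2, 2]  # counts from the minimal example in this thread

# Offsets: a single array, one entry longer than the number of subarrays.
offsets = [0]
for c in counts:
    offsets.append(offsets[-1] + c)

# Dense content can always be rewritten as starts/stops...
starts, stops = offsets[:-1], offsets[1:]

# ...but filtering only starts and stops leaves gaps that offsets cannot express.
mask = [True, True, True, False, False, True]
m_starts = [s for s, keep in zip(starts, mask) if keep]
m_stops = [e for e, keep in zip(stops, mask) if keep]

def is_dense(starts, stops):
    """True if subarrays begin at 0 and each one begins where the last ended."""
    if not starts:
        return True
    return starts[0] == 0 and all(s == e for s, e in zip(starts[1:], stops[:-1]))

# Content indices that no surviving subarray can reach.
reachable = {i for s, e in zip(m_starts, m_stops) for i in range(s, e)}
unreachable = [i for i in range(offsets[-1]) if i not in reachable]

print(offsets)                      # [0, 2, 4, 6, 7, 9, 11]
print(m_starts, m_stops)            # [0, 2, 4, 9] [2, 4, 6, 11]
print(is_dense(starts, stops))      # True
print(is_dense(m_starts, m_stops))  # False
print(unreachable)                  # [6, 7, 8]
```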

However, a logical array with gaps does not have a unique physical representation: different combinations of starts, stops, and content, with different numbers of unreachable elements, can describe the same logical array. When you're doing mathematical operations on these arrays, you care about what the array logically means, not how it's physically arranged. So if you apply a Numpy operation to two arrays with the same logical structure but different physical structure, the second one is rearranged to have the same physical structure as the first.
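A small sketch of "same logical array, different physical layout", again with plain Python lists rather than awkward itself. The values and the helper `to_lists` are made up for illustration.

```python
def to_lists(content, starts, stops):
    """Materialize the logical array as a list of lists."""
    return [content[s:e] for s, e in zip(starts, stops)]

# Layout A: content has one unreachable element (99.0) between the subarrays.
content_a = [1.0, 2.0, 99.0, 3.0, 4.0, 5.0]
starts_a, stops_a = [0, 3], [2, 6]

# Layout B: the same logical array with no gaps.
content_b = [1.0, 2.0, 3.0, 4.0, 5.0]
starts_b, stops_b = [0, 2], [2, 5]

logical_a = to_lists(content_a, starts_a, stops_a)
logical_b = to_lists(content_b, starts_b, stops_b)

print(logical_a)                         # [[1.0, 2.0], [3.0, 4.0, 5.0]]
print(logical_a == logical_b)            # True: same meaning
print(len(content_a), len(content_b))    # 6 5: different physical content
```

Because the two content lists have different lengths, they cannot be combined element by element until one layout has been rearranged to match the other.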

The bug happened for a[m] + (a[m] + 0) but not (a[m] + 0) + a[m]: two Numpy calls (both numpy.add) on a[m], which has gaps. The first addition collapsed the array to have no gaps, and the second addition tried to combine an array with gaps with an array without gaps. Since the array with gaps came first, it tried to add gaps to the second. The _tojagged code didn't assume that you'd ever want to do that and it broke.

With this change, I inserted a blanket policy of always fully collapsing all arrays that go into a Numpy call, regardless of whether we really need to. Then the restructurings only collapse, never expand (the fully collapsed form is unique). This has an added bonus that math will only be calculated on the reachable elements, so it effectively becomes non-lazy when you use it for math.
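The "always collapse first" policy can be sketched as follows. This is a plain-Python toy under stated assumptions, not the code that went into awkward: `compact` and `add` are hypothetical helpers, and each jagged array is modeled as a `(content, starts, stops)` tuple.

```python
def compact(content, starts, stops):
    """Return an equivalent (content, starts, stops) with no gaps."""
    new_content, new_starts, new_stops = [], [], []
    for s, e in zip(starts, stops):
        new_starts.append(len(new_content))
        new_content.extend(content[s:e])  # copy only the reachable elements
        new_stops.append(len(new_content))
    return new_content, new_starts, new_stops

def add(a, b):
    """Element-wise sum of two jagged arrays, collapsing both inputs first."""
    ca, sa, ea = compact(*a)
    cb, sb, eb = compact(*b)
    if (sa, ea) != (sb, eb):
        raise ValueError("jagged arrays have different counts")
    return [x + y for x, y in zip(ca, cb)], sa, ea

# One input has a gap (99.0 is unreachable); the other is already dense.
gapped = ([1.0, 2.0, 99.0, 3.0, 4.0, 5.0], [0, 3], [2, 6])
dense = ([10.0, 20.0, 30.0, 40.0, 50.0], [0, 2], [2, 5])

result = add(gapped, dense)
print(result)  # ([11.0, 22.0, 33.0, 44.0, 55.0], [0, 2], [2, 5])

# The operand order no longer matters, because nothing is ever expanded.
print(add(dense, gapped) == result)  # True
```

Since the collapsed form is unique, comparing the collapsed starts and stops is enough to check that two inputs have the same logical structure, and the unreachable 99.0 never takes part in the arithmetic.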


jpivarski commented on June 3, 2024

This is awkward 0.6.2, soon to be replaced by 0.7.0, but you can try it now.

