Comments (7)
This is actually useful enough that I wrote a whole special class to encapsulate this logic for xray, LazilyIndexedArray
. There may be some relevant logic there for task 1 (e.g., how to slice a slice object).
from dask.
Well, matching can easily find these terms, RewriteRule
could be generalized to take in an arbitrary right hand side. If it's a function, it accepts a dict representing the match. If it's a term, a specific subs function is created and linked in, so that rewriting is always match -> validate (for non linear patterns) -> apply func. I have something like this already written up for SymPy, I'll port it over the weekend/early next week.
from dask.
Here is a first pass on problem 1
from operator import getitem
def test_fuse_getitem():
pairs = [((getitem, (getitem, 'x', slice(1000, 2000)), slice(15, 20)),
(getitem, 'x', slice(1015, 1020))),
((getitem, (getitem, 'x', (slice(1000, 2000), slice(100, 200))),
(slice(15, 20), slice(50, 60))),
(getitem, 'x', (slice(1015, 1020), slice(150, 160)))),
((getitem, (getitem, 'x', slice(1000, 2000)), 10),
(getitem, 'x', 1010)),
((getitem, (getitem, 'x', (slice(1000, 2000), 10)),
(slice(15, 20),)),
(getitem, 'x', (slice(1015, 1020), 10))),
((getitem, (getitem, 'x', (10, slice(1000, 2000))),
(slice(15, 20),)),
(getitem, 'x', (10, slice(1015, 1020))))
]
for inp, expected in pairs:
result = fuse_getitem(inp)
assert result == expected
def fuse_getitem(term):
(_, (_, x, a), b) = term
c = fuse_slice(a, b)
return (getitem, x, c)
def normalize_slice(s):
start, stop, step = s.start, s.stop, s.step
if start is None:
start = 0
if step is None:
step = 1
return slice(start, stop, step)
def fuse_slice(a, b):
"""
>>> fuse_slice(slice(1000, 2000), slice(10, 15))
slice(1010, 1015, None)
>>> fuse_slice(slice(1000, 2000), 10)
1010
>>> fuse_slice(slice(1000, 2000, 5), 10)
1050
>>> fuse_slice(slice(1000, 2000, 5), slice(10, 20, 2))
slice(1050, 1100, 10)
"""
if isinstance(a, slice):
a = normalize_slice(a)
if isinstance(b, slice):
b = normalize_slice(b)
if isinstance(a, slice) and isinstance(b, int):
return a.start + b*(a.step if a.step is not None else 1)
if isinstance(a, slice) and isinstance(b, slice):
start = a.start + a.step * b.start
if b.stop is not None:
stop = a.start + a.step * b.stop
else:
stop = None
if a.stop is not None:
if stop is not None:
stop = min(a.stop, stop)
else:
stop = a.stop
stop = stop - ((stop - start) % b.step)
step = a.step * b.step
if step == 1:
step = None
return slice(start, stop, step)
if isinstance(a, tuple) and not isinstance(b, tuple):
b = (b,)
if isinstance(a, tuple) and isinstance(b, tuple):
j = 0
result = list()
for i in range(len(a)):
if isinstance(a[i], int):
result.append(a[i])
continue
result.append(fuse_slice(a[i], b[j]))
j += 1
return tuple(result)
raise NotImplementedError()
from dask.
Doesn't handle the stranger slicing conditions (e.g. negative steps) nor is it exhaustively tested.
from dask.
Addressed problem 2 with #106. Just make a function that accepts a dict of matches and returns a term, and use it as the rhs of the rule.
from dask.
I have a pretty exhaustive test suite that I wrote for xray -- it basically just checks all permutations of a long set of slices against the output that would be generated from numpy array:
https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/test/test_indexing.py#L85
Happy to help put things together but it looks like you already have a good start.
RE rewrite rules: what about optimizing cases like x.mean(axis=1)[:5]
? Ideally, there would be some notion of operations that can commute along a given axis.
from dask.
I'm fooling around with this in #107 . Feedback and test cases heartily welcomed there. It's getting a bit crazy.
from dask.
Related Issues (20)
- Deprecate axis keyword in rolling
- Non deterministic tokenization for empty numpy arrays after pickle roundtrip HOT 2
- Deprecate method and limit from fillna
- Deprecate convert_dtype in apply
- Doc build broken HOT 1
- GPU CI failing HOT 3
- `dask.array.tensordot` bug: negative values in argument `axes` give wrong results.
- Wrong `string[pyarrow]` dtype after using `unique()` with pyarrow as an engine
- _compute_sum_of_squares uses pandas.DataFrame.groupby without `observed` HOT 1
- `SeriesGroupBy.agg` does not accept `Aggregate`s when passed as kwargs HOT 3
- `make_meta` over a Dask Dataframe returns a reference, not a new object
- pandas 2.2: dask-expr failure in test_groupby_reduction_split[median] HOT 1
- pandas 2.2: changed behaviour in `test_multi.py::test_concat5` HOT 1
- SeriesGroupBy.agg does not accept `Aggregate` when paired with `'median'`
- ⚠️ Upstream CI failed ⚠️
- Tests for dummy data generation failing HOT 1
- pyright: "read_parquet" is not exported from module "dask.dataframe" HOT 1
- Moto 5 results in timeouts in s3 tests:
- ``test_tokenize_function_cloudpickle`` is very flaky
- applying tuple with pyarrow HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dask.