Comments (5)
Hrm, so the very easy thing to do is to assert the result of a Python function on each block, without the barrier. In multi-threaded contexts this might get weird, though.
x = da.assert_(x, lambda block: block > 0, ValueError('x must be positive'))
Doing general full-array assertions is also doable (with a bit more complex graph magic) but would probably fill up cache space with the intermediate variables.
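For reference, a per-block check like the one above can probably be approximated today with `map_blocks`. This is only a rough sketch: `assert_blocks` is a hypothetical helper name, not an existing dask function, and it assumes the predicate returns a boolean array for each block.

```python
import numpy as np
import dask.array as da


def assert_blocks(x, predicate, exc):
    """Hypothetical helper: lazily check `predicate` on every block of `x`.

    Returns an array with the same values as `x`. The check only runs
    when the returned array (or something built on it) is computed.
    """
    def check(block):
        # `predicate` is assumed to return a boolean array for the block
        if not np.all(predicate(block)):
            raise exc
        return block

    return x.map_blocks(check, dtype=x.dtype)


x = da.from_array(np.array([1.0, 2.0, 3.0, 4.0]), chunks=2)
x = assert_blocks(x, lambda block: block > 0, ValueError("x must be positive"))

# Nothing has been checked yet; the assertion fires during compute.
total = x.sum().compute()
```

Because the check is attached block by block, no barrier is needed, but the assertion only fires if the returned array is actually used downstream.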
from dask.
@eric-czech brought this up today in a call. CC'ing him here.
@shoyer do you have any thoughts on how we would do this today? FWIW I think that @eric-czech is operating on Dask under Xarray.
👍
Yep, using Xarray over Dask. I'd love to be able to use lazy, runtime checks like that.
Can you say a bit more about your needs @eric-czech ? Do you mostly need elementwise checks, or something more complicated? How would you like to spell these checks?
Elementwise checks would cover the majority of cases I can think of as being useful. Checks on reductions would also be nice (e.g. sums along an axis equal 1), but not critical. I can see some value in making the assertion a terminal task as well, e.g.:
data = da.array(..., dtype=int)
mask = da.array(..., dtype=bool)
da.assertion(
    data[mask].min(),
    lambda v: v >= 0,
    lambda v: ValueError(f'Data values must be >= 0 (found min value {v})')
)
# do stuff with data and mask but not data[mask]
as opposed to:
res = da.assertion(
    data[mask].min(),
    lambda v: v >= 0,
    lambda v: ValueError(f'Data values must be >= 0 (found min value {v})')
)
# now I need to use `res` elsewhere in the graph for the
# assertion to fire, but I don't necessarily want to
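Something close to the terminal form might already be possible by wrapping the check in `dask.delayed` and computing it alongside the real result. A minimal sketch, assuming that is acceptable: `lazy_assertion` and `_run_check` are hypothetical names, not dask API, and the sample data is made up for illustration.

```python
import numpy as np
import dask
import dask.array as da


def _run_check(v, predicate, make_error):
    # Runs inside the graph, once `v` has been computed
    if not predicate(v):
        raise make_error(v)
    return True


def lazy_assertion(value, predicate, make_error):
    """Hypothetical helper: return a Delayed that raises if the check fails."""
    return dask.delayed(_run_check)(value, predicate, make_error)


data = da.from_array(np.array([3, -1, 4, 1, 5, 9]), chunks=3)
mask = da.from_array(np.array([True, False, True, True, False, True]), chunks=3)

check = lazy_assertion(
    data[mask].min(),
    lambda v: v >= 0,
    lambda v: ValueError(f"Data values must be >= 0 (found min value {v})"),
)

# Do stuff with data and mask, but not data[mask]
result = (data * 2).sum()

# The check is a separate output, so `result` never has to depend on it.
total, _ = dask.compute(result, check)
```

The cost is that the caller has to remember to pass `check` to `dask.compute`; if it is left out, the assertion never runs.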