Comments (11)
I got here by searching "Must install dask-expr to activate query planning", which is the error you get if you try to use dask.dataframe under dask>=2024.3.0. Solved by requiring dask[dataframe] (actually in my case, dask[dataframe,diagnostics,distributed]).
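For projects that hit this error, one way to check ahead of time whether the query-planning backend is installed is to look for the dask_expr module without importing it. This is a minimal, standard-library-only sketch; the helper name has_dask_expr is hypothetical, and it only confirms the module can be found, not that its version matches the installed dask.

```python
import importlib.util


def has_dask_expr() -> bool:
    """Return True if the dask_expr package can be located on this interpreter.

    Hypothetical helper: find_spec only locates the module, it does not import it.
    """
    return importlib.util.find_spec("dask_expr") is not None


if __name__ == "__main__":
    if has_dask_expr():
        print("dask_expr found; dask.dataframe query planning can be used")
    else:
        # Per the comment above, installing the dataframe extra pulls it in
        print("dask_expr missing; try: pip install 'dask[dataframe]'")
```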
(to be clear I'm saying "this concerns me / seems odd" not "change this now please")
I don't mind having an option that much, but we don't have a way to easily set this for users. I doubt that most users actually maintain a dask config in their home directory.
I do like the idea of supporting a dask config set CLI command that does this for the user.
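To make the idea concrete, here is a rough sketch of the core of what a config set command has to do: turn a dotted key such as dataframe.query-planning-warning into nested mappings and interpret the value string. Everything here is hypothetical (set_dotted, parse_value, and the use of ast.literal_eval for values); it is not dask's implementation, and persisting the result to a YAML file in the user's config directory is left out.

```python
import ast
from typing import Any


def parse_value(text: str) -> Any:
    """Interpret a CLI string as a Python literal (False, 3, 'x'); fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def set_dotted(config: dict, key: str, value: Any) -> dict:
    """Set config[a][b][c] = value for key 'a.b.c', creating intermediate dicts as needed."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


if __name__ == "__main__":
    cfg: dict = {}
    set_dotted(cfg, "dataframe.query-planning-warning", parse_value("False"))
    print(cfg)  # {'dataframe': {'query-planning-warning': False}}
```

Values that are not Python literals, like the bar in the Coiled example, stay plain strings.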
Coiled implements this FWIW
(test-env) benchmarks:~$ coiled config set foo bar
Updated [foo] to [bar], config saved to /Users/mrocklin/.config/dask/coiled.yaml
If we're having to update our own CI due to our warning then I imagine that we're affecting any other project's CI that's depending on us. If an upstream project did this to us I'd feel pretty sad.
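Downstream projects that cannot change the upstream default can usually limit the damage in their own CI by filtering the one warning by message instead of silencing everything. This is a minimal sketch with the standard warnings module; noisy_import and its message text are stand-ins rather than dask's actual warning, and the FutureWarning category is an assumption.

```python
import warnings


def noisy_import() -> None:
    """Stand-in for a library import that emits a deprecation-style warning (hypothetical)."""
    warnings.warn(
        "The legacy DataFrame implementation is deprecated",  # placeholder text
        FutureWarning,
        stacklevel=2,
    )


def import_quietly() -> list:
    """Run noisy_import with only the matching warning ignored; return any warnings still raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every other warning
        # Inserted in front of the "always" filter, so it wins for matching warnings.
        # The message argument is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message="The legacy DataFrame", category=FutureWarning)
        noisy_import()
    return caught
```

In a pytest suite the same filter can usually be expressed declaratively through the filterwarnings ini option, in the form ignore:<message regex>:FutureWarning.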
The deprecation warning is indeed a problem for RAPIDS and will likely require us to pin to dask-2024.1.1 until after the April release. Although dask-cudf was already planning to provide full support for "dataframe.query-planning": True for RAPIDS-24.04, we cannot support a True default just yet (it must be opt-in for the first release, since downstream RAPIDS libraries have negative developer cycles to deal with the regressions).
I can't say that I have any brilliant ideas to mitigate this problem for us at the moment. It certainly makes sense to be "loud" about the legacy dask.dataframe deprecation. However, it would be really nice if there were a way for us to still use dask>=2024.2.0 (I'd be very open to suggestions).
I agree wholeheartedly with the spirit of @mrocklin reporting this issue. While I realize that there's no perfect solution to evolving APIs and handling deprecations, I had a similar reaction to this change. I want to dutifully stay on near-latest versions of dask because it's the right thing to do, but this deprecation warning pollutes all the CLIs I've written, and there doesn't appear to be a great way to address it.
Will there soon be a version of dask/dask-expr that makes the suggested configuration the default? I'm fine with that, and then writing "hello world" programs would again be noise-free, right?
I believe this is closed now by
Please reopen if I'm mistaken. :)
I'm sorry @milesgranger but I haven't kept up with those PRs and looking briefly at the descriptions I'm not sure what they do. Can you help me understand what the new behavior is?
Apologies, I should have been more explicit about how those are actually helpful. :)
#10925 adds the dataframe.query-planning-warning config option to avoid the warning altogether.
#10921 adds the convenience dask config set CLI mentioned in the last bit of the opening comment, i.e. dask config set dataframe.query-planning-warning False
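As a rough model of the behaviour described for #10925, here is a warning that is only emitted while a config flag is truthy. The flat config mapping, the function name maybe_warn_query_planning, and the message wording are hypothetical stand-ins; dask itself reads the value from its own config system, which this sketch does not use.

```python
import warnings
from typing import Mapping


def maybe_warn_query_planning(config: Mapping[str, object]) -> bool:
    """Emit a FutureWarning unless 'dataframe.query-planning-warning' is set to a falsy value.

    Returns True if a warning was issued. Flat dotted keys are used for brevity (assumption).
    """
    if not config.get("dataframe.query-planning-warning", True):
        return False  # user opted out, stay silent
    warnings.warn(
        "Placeholder deprecation message. To silence it, run: "
        "dask config set dataframe.query-planning-warning False",
        FutureWarning,
        stacklevel=2,
    )
    return True
```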
Cool. Looking now at the code for those two PRs, should we add the dask config set language to the warning message?
Also, FWIW I'd still prefer that we just not present this warning message at all. I don't think it's necessary and I suspect that it's somewhat harmful.
Related Issues (20)
- Add a `dask.array.sample` functionality mirroring `dask.dataframe.sample` with an optional `ignore_nan` argument
- Inconsistency in ddf.astype(Arrow Dict)
- CI is Failing
- ddf.drop is inconsistent when passed a set of columns
- test_division_or_partition in test_sql is failing for pandas 3
- Sorting by a categorical column doesn't always work
- Use case focused docs pages
- TypeError: can only concatenate str (not "traceback") to str
- ⚠️ Upstream CI failed ⚠️
- Add support for `pip install dask[jobqueue]`
- Mean fails to compute for very large column of pyarrow type
- Previously working time series resampling breaks in new version of Dask
- When using PyArrow dtypes, aggregations create NaNs of unexpected type
- Column with object dtype get converted to string when selecting the column
- aggregate function that operates on vector(array of numeric) data
- Dask .head() returns error as .compute returns ok!
- API docs missing for `read_csv`, `read_fwf` and `read_table`
- New CI failure showing up in fsspec
- Overlap with `new_axis` option is not trimmed correctly
- ValueError: An error occurred while calling the read_csv method registered to the pandas backend