Comments (2)
Yeah, that might make more sense; the current keyword is just weird.
Yeah, that code path does seem like an old optimization to me. When I first saw this issue, I thought the compute arg might have something to do with the new divisions calculation. However, as far as I can tell, this block is indeed persisting the shuffled collection to disk.
Personal Thoughts
I can actually imagine many cases where a disk-based persist mechanism like this would be useful. However, (1) I don't think many people are using "disk" for the cases that would benefit, and (2) I don't think this kind of mechanism makes much sense inside the shuffle API. Instead, it should be exposed to the user in a way that makes it clear the query is being segmented. For example, the user should need to call something like df.persist(storage=..., shuffle_method=...) (just an example, not necessarily a suggested API).