Comments (8)

fjetter commented on June 6, 2024

Thanks for opening the issue. First of all, this is all up for debate. Nothing here has been definitively decided.

Our intention is currently to enable query planning as soon as possible. We feel good about its current performance and stability. However, we won't be able to release it with full API coverage and are instead focusing on the most important APIs (e.g. dask-expr currently does not support something like DataFrame.melt or named groupby aggregations). Considering that users can always opt out, we believe this approach benefits most users. At this point we already believe that dask-expr is better for most users than the legacy DataFrame API.

There are two missing features that could lock out more users than we'd feel comfortable with:

  • Annotations #10937 / dask/dask-expr#13 which are important for all users who are relying on worker resources, worker/host restrictions, etc.
  • Distributed Scheduler integration dask/dask-expr#14 which would hit users with very large graphs

Committing to a specific release is difficult, but given the current release schedule I consider 2024.3.0 (release date 2024-03-01) possible, if a little optimistic. I am confident that we can manage 2024.3.1 (release date 2024-03-15). If we were to cut both features, we could flip the switch right away.

Edit: I got my calendar mixed up. I suspect the most realistic release will be 2024.3.0, which should happen on 2024-03-08. If we reduce scope and are fine without annotations and scheduler integration, we can go sooner.

What is the earliest date that "dataframe.query-planning": "False" will be disabled entirely?

There hasn't been any decision about this yet. My current assumption is that we'll hold on to it for a while, until we're certain that we won't cut out larger user groups.
While it would be nice to delete the old DataFrame code, we're not in a rush, considering that the old HLG backend is still in use for Arrays and Bags.
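For readers wondering what opting out looks like in practice, here is a minimal sketch. It assumes the `dataframe.query-planning` key quoted in the question above is the switch that ships, and relies on dask's convention of mapping `DASK_`-prefixed environment variables onto config keys (a double underscore marks nesting). Either form has to be in place before `dask.dataframe` is first imported.

```python
import os

# Opt out of query planning (dask-expr) for this process. Dask reads DASK_*
# environment variables when its config module is first imported, so this must
# run before `import dask`. DASK_DATAFRAME__QUERY_PLANNING corresponds to the
# nested key "dataframe.query-planning"; dask parses "False" into a boolean.
os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"

try:
    import dask

    # Equivalent in-process form. Like the environment variable, it only has an
    # effect if applied before dask.dataframe is imported for the first time.
    dask.config.set({"dataframe.query-planning": False})
    print("query planning:", dask.config.get("dataframe.query-planning"))
except ImportError:
    print("dask is not installed; the environment variable is still set")
```

Setting the environment variable is the safer choice when you don't control import order (e.g. in worker processes), since it is picked up wherever dask is imported.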

Please let us know if anything here sounds concerning or problematic. We're also interested if this all sounds too careful or too reckless :)

from dask.

fjetter commented on June 6, 2024

The conversation about annotations is happening over in #10937

mrocklin commented on June 6, 2024

In conversation, @fjetter mentioned to me that we should probably try things out with xgboost.dask and make sure that project is OK after the transition.

mrocklin commented on June 6, 2024

For context, xgboost does specify workers, but only after the data has already been converted to futures, which seems pretty safe for dask-xgboost.

rjzamora commented on June 6, 2024

Linking dask/community#361 (sorry - just saw that issue now)

fjetter commented on June 6, 2024

(sorry - just saw that issue now)

My fault. I only just opened that one 😅

fjetter commented on June 6, 2024

We're currently seeing a couple of weird recursive import errors when using dask-expr in our coiled benchmarks test suite, see coiled/benchmarks#1419. This is something we definitely want to fix, or at least understand better, before moving forward.
This test suite also runs xgboost, and from what we can tell it runs as expected. We encountered an error in dask-ml related to incorrect imports, but otherwise no issues have popped up yet.

Therefore, I suggest not blocking on either of the above issues, i.e. neither on the annotations #10937 nor on the scheduler integration dask/dask-expr#14.

This would mean that the next release would have dask-expr enabled by default. In preparation for this, I propose changing the default on main as soon as possible to give downstream projects a chance to test against it. If any medium-sized blockers pop up, we'd postpone the release until they are fixed. If anything major comes up, we could still revert the toggle.

This leaves the question of what to do with pandas 1.X support. I opened a separate issue for this: #10962

phofl commented on June 6, 2024

We're currently seeing a couple of weird recursive import errors when using dask-expr in our coiled benchmarks test suite, see coiled/benchmarks#1419. This is something we definitely want to fix or at least have better understood before moving forward.

This is now fixed on main.
