Comments (11)
This is dask/dask-expr#932
@phofl is this expected when enabling copy on write?
from dask.
FWIW, the setting we're enabling here is something that will be enabled by default (if not enforced) in pandas 3.0, which is soon to be released. I suspect that this is desired behavior, and that you'll need to create a copy with to_numpy(copy=True).
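To illustrate the suggestion: with copy-on-write, the array returned by to_numpy() can be a read-only view of the DataFrame's data, while to_numpy(copy=True) always hands back a fresh, writable array. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Under copy-on-write, to_numpy() may return a read-only view of the
# DataFrame's data, so mutating the result in place can raise.
# to_numpy(copy=True) always returns a fresh, writable array.
arr = df["a"].to_numpy(copy=True)
assert arr.flags.writeable
arr[0] = 10

# The copy is detached: the original DataFrame is untouched.
assert df.loc[0, "a"] == 1
```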
I agree that we're going to eventually need to support pandas with copy-on-write, and I am also working on a patch that just works with copy-on-write enabled. But:
- I expect we're going to have to support pandas 2 for a while longer (we have optional dependencies and dependents pinning pandas pretty low)
- I think this is an unintuitive side effect of importing dask.dataframe.
Ran into a deeper issue where numcodecs doesn't like being passed a read-only buffer, so I think we'll end up needing to pin dask for a bit on our end.
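Until numcodecs accepts read-only buffers, one pragmatic workaround is to hand the codec a writable copy. Note that np.ascontiguousarray is not sufficient here: for an already-contiguous input it returns the same read-only array without copying, so an explicit .copy() is needed. A sketch of the distinction:

```python
import numpy as np

arr = np.arange(4.0)
arr.flags.writeable = False  # what a CoW-backed to_numpy() can hand you

# np.ascontiguousarray is NOT enough: for an already-contiguous input
# it returns the same read-only array, with no copy made.
assert not np.ascontiguousarray(arr).flags.writeable

# An explicit copy is always writable and safe to pass to code that
# requires a writable buffer (e.g. the affected numcodecs releases).
safe = arr.copy()
assert safe.flags.writeable
```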
I understand the problem. We'll likely want to continue using COW in dask, but I can offer to add a toggle to control this, e.g.
import dask
dask.config.set({"dataframe.copy-on-write": False})
import dask.dataframe
Would this be a feasible workaround for you?
dask.config.set({"dataframe.copy-on-write": False})
Would that do something much different than me using pd.set_option?
I do think it makes sense that you want to opt-in to this. I was just hoping to be able to address this during the pandas 3.0 release candidate period when our canary would pick it up.
I also don't think we'll need to do any configuration if we can figure out the numcodecs issue, but unfortunately I don't know enough Cython to figure out what incantation it wants.
Would that do something much different than me using pd.set_option?
We want to enable this by default for dask users. The config option would be a way to opt-out of this opinionated choice.
I also don't think we'll need to do any configuration if we can figure out the numcodecs issue, but unfortunately I don't know enough Cython to figure out what incantation it wants.
Their Cython code can't deal with read-only arrays (we had similar issues in pandas); a PR that addressed this is here:
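For context, the failure mode and the usual Cython-side fix can be seen at the Python level as well: a function typed `double[:]` requests a writable buffer and rejects read-only input, while `const double[:]` only requests read access. This is a sketch of that general pattern, not of the linked PR specifically:

```python
import numpy as np

arr = np.arange(4.0)
arr.flags.writeable = False  # what a CoW-backed to_numpy() can return

# A Cython signature like `f(double[:] buf)` requests a *writable*
# buffer and fails on this array; `f(const double[:] buf)` requests
# read access only and succeeds.  The same distinction is visible in
# plain Python: writing raises, while a read-only view is fine.
try:
    arr[0] = 1.0
    raised = False
except ValueError:
    raised = True
assert raised

view = memoryview(arr)  # read access acquires the buffer without error
assert view.readonly
```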
@phofl, thank you for the pointer! zarr-developers/numcodecs#515
No worries, the PR looks good and should address those issues.
We want to enable this by default for dask users. The config option would be a way to opt-out of this opinionated choice.
Please reconsider that philosophy. Package-wide settings are intended for users and applications, not for libraries, and import in Python should be free of side effects.
You can make your own APIs return only COW pd.DataFrames, as that's part of your API, but you should make sure an import dask.dataframe doesn't modify how pd.DataFrames behave in a different part of a user's codebase that doesn't use Dask at all.