Comments (5)

phofl commented on September 26, 2024

cc @MarcoGorelli: is this PDEP-6 related? It looks like a case of upcasting.

@erezinman all the cases that don't work set the values in place instead of swapping out the underlying data, so they can end up with different semantics.

from pandas.

MarcoGorelli commented on September 26, 2024

thanks for the ping

looks like it's been like this since at least 2.0.2, so I don't think it's related to any pdep-6 work (which only started in 2.1):

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
   ...:                    'value': [1, 2, 3]})

In [4]: df.iloc[:, 0] = df.iloc[:, 0].astype('M8[us]')

In [5]: df.dtypes
Out[5]:
time     datetime64[ns]
value             int64
dtype: object

In [6]: pd.__version__
Out[6]: '2.0.2'

tagyieh commented on September 26, 2024

take

tagyieh commented on September 26, 2024

Hello @MarcoGorelli and @phofl

I believe I have fixed this bug; however, one of the tests (pandas/tests/copy_view/test_indexing.py::test_subset_set_column_with_loc) fails with my solution. The output is as follows:

    @pytest.mark.parametrize(
        "dtype", ["int64", "float64"], ids=["single-block", "mixed-block"]
    )
    def test_subset_set_column_with_loc(backend, dtype):
        # Case: setting a single column with loc on a viewing subset
        # -> subset.loc[:, col] = value
        _, DataFrame, _ = backend
        df = DataFrame(
            {"a": [1, 2, 3], "b": [4, 5, 6], "c": np.array([7, 8, 9], dtype=dtype)}
        )
        df_orig = df.copy()
        subset = df[1:3]

        subset.loc[:, "a"] = np.array([10, 11], dtype="int64")

        subset._mgr._verify_integrity()
        expected = DataFrame(
            {"a": [10, 11], "b": [5, 6], "c": np.array([8, 9], dtype=dtype)},
            index=range(1, 3),
        )
>       tm.assert_frame_equal(subset, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: Int64

If I switch the indexing method to subset["a"] = np.array([10, 11], dtype="int64") (instead of subset.loc[:, "a"] = ...) and run the test against the original code (without my changes), it fails with exactly the same error as with my solution.

My question is: according to the issue, the only indexing method that gives the correct output is the one using the column name itself, i.e. subset["a"], yet the test fails when that method is used. Could this test be wrong?
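
One possible reading of the `int64` vs `Int64` mismatch, offered as an assumption rather than a confirmed diagnosis: the failing parametrization may be one where the `backend` fixture builds frames with nullable `Int64` columns. The sketch below stands in for that case with an explicitly nullable frame (the fixture itself is not reproduced) and compares the two assignment styles. The names `via_loc` and `via_getitem` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: a frame with nullable Int64 columns stands in for the
# nullable variant of the test's `backend` fixture.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2, 3], dtype="Int64"),
        "b": pd.array([4, 5, 6], dtype="Int64"),
    }
)
new_values = np.array([10, 11], dtype="int64")

# loc with a full-column indexer, as in the test.
via_loc = df[1:3].copy()
via_loc.loc[:, "a"] = new_values

# Key-based assignment: the column is replaced by the NumPy array.
via_getitem = df[1:3].copy()
via_getitem["a"] = new_values

print("loc:", via_loc["a"].dtype)
print("[] :", via_getitem["a"].dtype)
```

If this reading is right, the test is checking that `loc` keeps the existing column dtype, so it would be expected to fail once `loc` behaves like `subset["a"] = ...`, and the question becomes which behaviour is intended rather than whether the test is broken.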

Thank you in advance

asishm commented on September 26, 2024

@MarcoGorelli I think this is a duplicate of #52593 since the int equivalent of

df = pd.DataFrame({'a': [1,2,3]}, dtype='int64')
df.loc[:, 'a'] = df.loc[:, 'a'].astype('int32')
print(df.dtypes) # a is still int64

also doesn't change the dtype
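
Until the `loc` behaviour is settled, a few ways to get the narrower dtype that do not rely on in-place setting are sketched below. `DataFrame.isetitem` (available since pandas 1.5) is documented to insert a new array rather than set values in place. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, dtype="int64")

# Option 1: key-based assignment replaces the column.
by_key = df.copy()
by_key["a"] = by_key["a"].astype("int32")

# Option 2: astype on the frame returns a new frame with the requested dtype.
by_astype = df.astype({"a": "int32"})

# Option 3: positional replacement with isetitem, which inserts a new array.
by_isetitem = df.copy()
by_isetitem.isetitem(0, by_isetitem.iloc[:, 0].astype("int32"))

for name, frame in [
    ("[]", by_key),
    ("astype", by_astype),
    ("isetitem", by_isetitem),
]:
    print(name, frame["a"].dtype)
```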
