Comments (5)
cc @MarcoGorelli is this pdep6 related? It looks like this is a case of upcasting
@erezinman all the cases that don't work set the values in place instead of swapping out the underlying data, so they can have different semantics.
thanks for the ping
looks like it's been like this since at least 2.0.2, so I don't think it's related to any pdep-6 work (which only started in 2.1):
In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
   ...:                    'value': [1, 2, 3]})

In [4]: df.iloc[:, 0] = df.iloc[:, 0].astype('M8[us]')

In [5]: df.dtypes
Out[5]:
time     datetime64[ns]
value             int64
dtype: object

In [6]: pd.__version__
Out[6]: '2.0.2'
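For anyone who needs the dtype change in the meantime, here is a minimal sketch of two ways to swap out the column instead of setting into it. It assumes pandas 2.x (non-nanosecond datetime support); the names by_name and by_pos are only illustrative. DataFrame.isetitem is the positional counterpart of df[col] = ... and always installs a new array.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2021-01-01 12:00:00", "2021-01-01 12:00:01", "2021-01-01 12:00:02"]
    ),
    "value": [1, 2, 3],
})

# Replace the column by name: the old array is swapped out for the new one.
by_name = df.copy()
by_name["time"] = by_name["time"].astype("M8[us]")

# Replace the column by position: isetitem also swaps the array
# rather than writing into the existing one.
by_pos = df.copy()
by_pos.isetitem(0, by_pos.iloc[:, 0].astype("M8[us]"))

print(by_name.dtypes)
print(by_pos.dtypes)
```

Both frames should report the microsecond dtype for the time column, unlike the iloc assignment in the session above.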
take
Hello @MarcoGorelli and @phofl
I believe I have fixed this bug. However, one of the tests (pandas/tests/copy_view/test_indexing.py::test_subset_set_column_with_loc) fails with my solution. The output is as follows:
    @pytest.mark.parametrize(
        "dtype", ["int64", "float64"], ids=["single-block", "mixed-block"]
    )
    def test_subset_set_column_with_loc(backend, dtype):
        # Case: setting a single column with loc on a viewing subset
        # -> subset.loc[:, col] = value
        _, DataFrame, _ = backend
        df = DataFrame(
            {"a": [1, 2, 3], "b": [4, 5, 6], "c": np.array([7, 8, 9], dtype=dtype)}
        )
        df_orig = df.copy()
        subset = df[1:3]

        subset.loc[:, "a"] = np.array([10, 11], dtype="int64")

        subset._mgr._verify_integrity()
        expected = DataFrame(
            {"a": [10, 11], "b": [5, 6], "c": np.array([8, 9], dtype=dtype)},
            index=range(1, 3),
        )
>       tm.assert_frame_equal(subset, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: Int64
If I change the assignment to subset["a"] = np.array([10, 11], dtype="int64") (instead of subset.loc[:, "a"]) and run the test against the original code (without my changes), it fails with exactly the same error that my change produces.
My question is this: according to the issue, the only indexing method that gives the correct output is assigning by column name, i.e. subset["a"]. Since the test fails when it uses that method, could the test itself be wrong?
Thank you in advance
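For readers following along, the dtype mismatch in that assertion can be reproduced outside the test suite. This is only a sketch, assuming pandas 2.x and a nullable Int64 column like the one the failing parametrization appears to use; via_loc and via_setitem are illustrative names, not anything from the pandas code base.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": pd.array([1, 2, 3], dtype="Int64"), "b": [4, 5, 6]})
new_values = np.array([10, 11, 12], dtype="int64")

# .loc writes the values into the existing column, so the column's
# nullable dtype is kept.
via_loc = df.copy()
via_loc.loc[:, "a"] = new_values

# Assigning by name swaps the whole column for the NumPy array,
# so the column takes the array's dtype.
via_setitem = df.copy()
via_setitem["a"] = new_values

print(via_loc["a"].dtype, via_setitem["a"].dtype)
```

Under those assumptions the two assignments are expected to disagree on the dtype, which matches the [left]/[right] difference in the output above.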
@MarcoGorelli I think this is a duplicate of #52593 since the int equivalent of
df = pd.DataFrame({'a': [1,2,3]}, dtype='int64')
df.loc[:, 'a'] = df.loc[:, 'a'].astype('int32')
print(df.dtypes) # a is still int64
also doesn't change the dtype