Comments (7)
What I find weirder is that if I instead do ix2.to_timestamp(freq="MS") (on main) then that fails with the message:
ValueError("for Period, please use 'MS' instead of 'MS'")
Ah this makes no sense, you're right, thanks. @natmokval fancy taking a look at this error message?
thanks for fixing the error message so quickly.
from pandas.
Thanks for the report!
This sequence seems logical and I'd expect it to work. Although this does work if you don't pass freq (e.g. ix2.to_timestamp()
)
What I find weirder is that if I instead do ix2.to_timestamp(freq="MS")
(on main) then that fails with the message:
ValueError("for Period, please use 'MS' instead of 'MS'")
cc @natmokval
from pandas.
I believe passing strings stopped working in the relevant places since pandas 2.2.0
, this also caused a failure in a core module of sktime
, see failure and fix here:
Are you not testing the various round trips and conversions?
I have no good view of the testing framework, but I would just test to_period
and to_timestamp
with the exhaustive set of all frequencies supported, including more exotic ones like multiples, start, end, weekday, etc.
Including:
- round trips, there are two round trips, period-timestamp-period and timestamp-period-timestamp
- passing frequencies in the two parts of the round trip or not, so four variations
- using strings vs using freq objects (the latter still seems to work in places where strings have stopped working)
We have seen so many changes of behaviour around freq
, it is not very pleasant - pandas
has a huge user base, and ensuring stable contracts for sth like freq
imo is crucial.
Very related, imo a key issue is "unified API" across dfferent frequencies. E.g., if you do the round trip and pass freq
for one type of frequency, the same sequence should work for all others, or you end up having to write case distinctions in dependent code.
from pandas.
thanks @fkiraly for the report
Just for a bit of context: for offsets, pandas distinguishes between the start of the period and the end. 'MS'
means "month start", and 'ME'
means "month end" (the latter used to be 'M'
, which everyone would interpret as "month" and would cause a lot of headaches). For period, there's no distinction between start and end, which is why for Period
, the alias is just 'M'
See #9586 for further discussion
As for this issue...let's take a look. The docs for to_timestamp say
Cast to DatetimeIndex of timestamps, at beginning of period.
So, it looks correct (or at least, as-documented) that the input should be a Period alias, and that how
determines whether we go to the start (default) or end of the Period.
The start and end time for a month Period are at the beginning and end of the month:
In [13]: pd.Period('2000-01-01', freq='M').start_time
Out[13]: Timestamp('2000-01-01 00:00:00')
In [14]: pd.Period('2000-01-01', freq='M').end_time
Out[14]: Timestamp('2000-01-31 23:59:59.999999999')
Based on that, I'd have expected pd.period_range('2000', periods=3, freq=MonthEnd()).to_timestamp('M')
to output
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01'], dtype='datetime64[ns]', freq='MS')
But, it doesn't - it outputs
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='ME')
This can be observed even going back to pandas 2.0.3, before #9586 started to be addressed
Are you not testing the various round trips
Based on the docstring of to_timestamp
, I don't think this one was ever meant to round-trip
The road forwards
It looks like to_timestamp
was never respecting its own documentation. What should be done now, going forwards? Not really sure, gonna have to think this one through
@natmokval did you look into this?
What I find weirder is that if I instead do ix2.to_timestamp(freq="MS") (on main) then that fails with the message:
ValueError("for Period, please use 'MS' instead of 'MS'")
Ah this makes no sense, you're right, thanks. @natmokval fancy taking a look at this error message?
from pandas.
@fkiraly if you want something which round-trips, I think this should do the trick:
ix = pd.date_range("1/1/1870", periods=20, freq=freq)
ix2 = ix.to_period()
ix3 = ix2.to_timestamp() + pd.tseries.frequencies.to_offset(f'0{freq}')
ix3.freq = freq
though I'd suggest considering whether you really need Period at all
from pandas.
The road forwards
It looks like
to_timestamp
was never respecting its own documentation. What should be done now, going forwards? Not really sure, gonna have to think this one through @natmokval did you look into this?
Yeah, this is odd behavior. I missed this, unfortunately.
Also, the how
parameter for converting a period to a timestamp doesnโt seem to work properly:
>> ix1 = pd.period_range('2000', periods=3, freq='M').to_timestamp('M', how='s')
>>> ix1
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='ME')
>>>
>>> ix2 = pd.period_range('2000', periods=3, freq='M').to_timestamp('M', how='e')
>>> ix2
DatetimeIndex(['2000-01-31 23:59:59.999999999',
'2000-02-29 23:59:59.999999999',
'2000-03-31 23:59:59.999999999'],
dtype='datetime64[ns]', freq=None)
Whyix2.freq = None
, but not <MonthEnd>
? it is confusing to me. @MarcoGorelli, what do you think?
In the meantime, maybe we can update the documentation for to_timestamp
until this issue is resolved?
from pandas.
In the meantime, maybe we can update the documentation for to_timestamp until this issue is resolved?
yeah maybe - let's take this discussion over to #59371, and leave this one for how to round-trip
from pandas.
Related Issues (20)
- ENH: Adding `skipna:bool` to Rolling.sum HOT 2
- BUG: Replace fails after NaN in a Series of `string` HOT 2
- DEPR: future.no_silent_downcasting option HOT 1
- BUILD: Python Docker Build Issues HOT 4
- DOC: Incorrect Documentation: pd.Dataframe.dropna HOT 2
- BUG: Spurious `FutureWarning` when using `pd.read_json()` HOT 4
- BUG: read_json silently ignores the dtype when engine=pyarrow HOT 2
- ENH: support the Arrow PyCapsule Interface on pandas.Series (export) HOT 14
- BUG: HOT 3
- BUG: "Bad CRC-32 for file'docProps/core.xml" for Read large Excel file HOT 1
- BUG: Incorrect float32/float64 comparison result HOT 7
- BUG: import has error HOT 3
- BUG: subtracting datetime series from datetime dataframe, or datetime dataframe from datetime series, raises TypeError or UFuncTypeError HOT 2
- BUG: read_csv's usecols type hint isn't match with list of strings HOT 2
- BUG: OverflowError: value too large to convert to int when manipulating very large dataframes HOT 6
- DOC: Wrong bug number in what's new v3.0.0 HOT 3
- ENH: Checking gaps for time series HOT 2
- ENH: Support non-categorical values for pandas bar plots when x axis is datetime values HOT 2
- ENH: In pandas.testing.assert_frame_equal, support per-column configuration HOT 5
- DOC: read_csv: date_format HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pandas.