Giter Club home page Giter Club logo

Comments (7)

natmokval avatar natmokval commented on September 27, 2024 2

What I find weirder is that if I instead do ix2.to_timestamp(freq="MS") (on main) then that fails with the message:
ValueError("for Period, please use 'MS' instead of 'MS'")

Ah this makes no sense, you're right, thanks. @natmokval fancy taking a look at this error message?

thanks for fixing the error message so quickly.

from pandas.

Aloqeely avatar Aloqeely commented on September 27, 2024 1

Thanks for the report!
This sequence seems logical and I'd expect it to work. Although this does work if you don't pass freq (e.g. ix2.to_timestamp())

What I find weirder is that if I instead do ix2.to_timestamp(freq="MS") (on main) then that fails with the message:

ValueError("for Period, please use 'MS' instead of 'MS'")

cc @natmokval

from pandas.

fkiraly avatar fkiraly commented on September 27, 2024

I believe passing strings stopped working in the relevant places since pandas 2.2.0, this also caused a failure in a core module of sktime, see failure and fix here:

sktime/sktime#6500

Are you not testing the various round trips and conversions?

I have no good view of the testing framework, but I would just test to_period and to_timestamp with the exhaustive set of all frequencies supported, including more exotic ones like multiples, start, end, weekday, etc.

Including:

  • round trips, there are two round trips, period-timestamp-period and timestamp-period-timestamp
  • passing frequencies in the two parts of the round trip or not, so four variations
  • using strings vs using freq objects (the latter still seems to work in places where strings have stopped working)

We have seen so many changes of behaviour around freq, it is not very pleasant - pandas has a huge user base, and ensuring stable contracts for sth like freq imo is crucial.

Very related, imo a key issue is "unified API" across dfferent frequencies. E.g., if you do the round trip and pass freq for one type of frequency, the same sequence should work for all others, or you end up having to write case distinctions in dependent code.

from pandas.

MarcoGorelli avatar MarcoGorelli commented on September 27, 2024

thanks @fkiraly for the report

Just for a bit of context: for offsets, pandas distinguishes between the start of the period and the end. 'MS' means "month start", and 'ME' means "month end" (the latter used to be 'M', which everyone would interpret as "month" and would cause a lot of headaches). For period, there's no distinction between start and end, which is why for Period, the alias is just 'M'

See #9586 for further discussion


As for this issue...let's take a look. The docs for to_timestamp say

Cast to DatetimeIndex of timestamps, at beginning of period.

So, it looks correct (or at least, as-documented) that the input should be a Period alias, and that how determines whether we go to the start (default) or end of the Period.

The start and end time for a month Period are at the beginning and end of the month:

In [13]: pd.Period('2000-01-01', freq='M').start_time
Out[13]: Timestamp('2000-01-01 00:00:00')

In [14]: pd.Period('2000-01-01', freq='M').end_time
Out[14]: Timestamp('2000-01-31 23:59:59.999999999')

Based on that, I'd have expected pd.period_range('2000', periods=3, freq=MonthEnd()).to_timestamp('M') to output

DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01'], dtype='datetime64[ns]', freq='MS')

But, it doesn't - it outputs

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='ME')

This can be observed even going back to pandas 2.0.3, before #9586 started to be addressed

Are you not testing the various round trips

Based on the docstring of to_timestamp, I don't think this one was ever meant to round-trip

The road forwards

It looks like to_timestamp was never respecting its own documentation. What should be done now, going forwards? Not really sure, gonna have to think this one through
@natmokval did you look into this?


What I find weirder is that if I instead do ix2.to_timestamp(freq="MS") (on main) then that fails with the message:
ValueError("for Period, please use 'MS' instead of 'MS'")

Ah this makes no sense, you're right, thanks. @natmokval fancy taking a look at this error message?

from pandas.

MarcoGorelli avatar MarcoGorelli commented on September 27, 2024

@fkiraly if you want something which round-trips, I think this should do the trick:

ix = pd.date_range("1/1/1870", periods=20, freq=freq)
ix2 = ix.to_period()

ix3 = ix2.to_timestamp() + pd.tseries.frequencies.to_offset(f'0{freq}')
ix3.freq = freq

though I'd suggest considering whether you really need Period at all

from pandas.

natmokval avatar natmokval commented on September 27, 2024

The road forwards

It looks like to_timestamp was never respecting its own documentation. What should be done now, going forwards? Not really sure, gonna have to think this one through @natmokval did you look into this?

Yeah, this is odd behavior. I missed this, unfortunately.
Also, the how parameter for converting a period to a timestamp doesnโ€™t seem to work properly:

>> ix1 = pd.period_range('2000', periods=3, freq='M').to_timestamp('M', how='s')
>>> ix1
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]', freq='ME')
>>>
>>> ix2 = pd.period_range('2000', periods=3, freq='M').to_timestamp('M', how='e')
>>> ix2
DatetimeIndex(['2000-01-31 23:59:59.999999999',
               '2000-02-29 23:59:59.999999999',
               '2000-03-31 23:59:59.999999999'],
              dtype='datetime64[ns]', freq=None)

Whyix2.freq = None, but not <MonthEnd>? it is confusing to me. @MarcoGorelli, what do you think?

In the meantime, maybe we can update the documentation for to_timestamp until this issue is resolved?

from pandas.

MarcoGorelli avatar MarcoGorelli commented on September 27, 2024

In the meantime, maybe we can update the documentation for to_timestamp until this issue is resolved?

yeah maybe - let's take this discussion over to #59371, and leave this one for how to round-trip

from pandas.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.