Giter Club home page Giter Club logo

Comments (16)

soerenwolfers avatar soerenwolfers commented on June 2, 2024

Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 2, 2024

If your problem is that python_plus60 gives the "wrong" result, then that is not a duckdb issue, since it can be reproduced as follows

import datetime 
import pandas
import dateutil 

print(pandas.Timestamp("2018-02-14 12:00:00-05:00") + datetime.timedelta(days=60))

print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))

print(pandas.Timestamp("2018-02-14 12:00:00", tz=dateutil.tz.gettz("America/New_York")) + datetime.timedelta(days=60))
2018-04-15 12:00:00-05:00

2018-04-15 13:00:00-04:00

2018-04-15 13:00:00-04:00

from duckdb.

paultiq avatar paultiq commented on June 2, 2024

Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?

I interpret as the opposite: that the correct timezone aware offset is the one given by the python. And that duckdb is returning the non normalized version. But, let me mull over the pandas version you sent!

from duckdb.

paultiq avatar paultiq commented on June 2, 2024

The first example (UTC-05:00) isn't reproducing, since it's a fixed offset w/ no daylight savings component.

The behavior I'm expecting is given by your second and third examples:

  • print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))
  • 2018-04-15 13:00:00-04:00

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024

What time zone are you in?

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 2, 2024

Could you summarize what you think is wrong in duckdb, at the moment? (I still don't think anything is wrong.)

Also, note that your results depend on duckdb's timezone setting, so you should explicitly set that in your example via

conn.query("SET TimeZone='America/New_York'")

If you use UTC instead, you get different results (*):

import duckdb

conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT ('2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day) - '2018-04-15 17:00:00+00'::TIMESTAMPTZ"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)…  │
│                                                       interval                                                       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 00:00:00                                                                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)…  │
│                                                       interval                                                       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ -01:00:00                                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

(*) and I believe correctly so: a sensible definition of TIMESTAMPTZ + INTERVAL is: (1) convert the TIMESTAMPTZ to TIMESTAMP (i.e.: how would someone in the configured timezone write down the point in time specified by the given TIMESTAMPTZ) (2) add the interval (3) convert to TIMESTAMPTZ (i.e., what point in time would be denoted by someone in the configured timezone with the given TIMESTAMP)

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024
from duckdb_settings() where name='TimeZone';
name value description input_type scope
TimeZone America/Los_Angeles The current time zone VARCHAR GLOBAL
select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ as d, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day  as base_date;
d base_date
2018-02-14 09:00:00-08 2018-04-15 09:00:00-07

This looks correct for me (at least this is what ICU does when you cross a DST boundary: preserve wall clock time).

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024

Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.

I'd also like to point out that pandas uses pytz, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)

from duckdb.

paultiq avatar paultiq commented on June 2, 2024

Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.

I'd also like to point out that pandas uses pytz, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)

I'll answer the other questions this evening, but doesn't DuckDB also use pytz? Or am I misreading the discussion here: #11974

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024

No, we use ICU internally (which doesn't have these problems). We only use pytz when producing pandas output because that its what pandas requires.

from duckdb.

paultiq avatar paultiq commented on June 2, 2024

No, we use ICU internally (which doesn't have these problems). We only use pytz when producing pandas output because that its what pandas requires.

Yup, that's what I was referring to. I was expecting it (a timezone aware date plus an interval 60 day) to behave the same as the second example above.

I haven't (yet) had a chance to review your subsequent comments which seems to contradict the examples I had at the beginning.

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024

Not sure what you meant by "the second example above"?

It sounds to me like you are expecting the date arithmetic to add exactly 60 days to the instant, which is what pandas does? ICU does not behave that way, so duckdb does not either. Moreover, this is not what Postgres does:

hawkfish=# select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval '60 day'  as base_date;
      timestamptz       |       base_date        
------------------------+------------------------
 2018-02-14 09:00:00-08 | 2018-04-15 09:00:00-07
(1 row)

(BTW I am in America/Los_Angeles, which is why the displayed offsets are different.)

from duckdb.

hawkfish avatar hawkfish commented on June 2, 2024

Yes that does what @paultiq is expecting:

select 
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ d, 
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval (24 * 60) hours  as base_date;
d base_date
2018-02-14 09:00:00-08 2018-04-15 10:00:00-07

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 2, 2024

Sorry, didn't meant to make you look talking to yourself; deleted my comment so I could check first whether it actually works because I was on my phone at the time. Anyway, glad it works, so duckdb doesn't only do the theologically right thing from the perspective of the blog post, but also offers the heretic option in a simple way.

Only sad thing to be taken from this is that duckdb can't currently do "add X wall clock days to timestamptz Y from the perspective of timezone Z" with any Z other than the one on the settings (in particular not with a Z that's variable across a table). Not that that's surprising given that TIMESTAMPTZ doesn't store timezones, and not that I can see myself needing it anytime soon, but potentially something for a future separate feature request. (Which would strictly speaking also require specifying what to do if the result lands in the midst of a daylight savings fold or a gap, so really should be a request for a function add(TIMESTAMPTZ, INTERVAL, TIMEZONE, FOLD, GAP) and there might be more complications I haven't thought about. )

from duckdb.

paultiq avatar paultiq commented on June 2, 2024

Yes that does what @paultiq is expecting: ...

Yes, thanks... that's the part I was hung up on. That this version is (24*60 - 1) hours, but the Pandas version is 24*60 hours.

I totally accept this is a. a question for the theologians and b. that this is how icu works. So, feel free to close as "as designed" :) Thanks for the discussion.

import duckdb

conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ start_date , '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day end_date"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))
│        start_date        │         end_date         │
│ timestamp with time zone │ timestamp with time zone │
├──────────────────────────┼──────────────────────────┤
│ 2018-02-14 23:00:00-05   │ 2018-04-15 23:00:00-04   │
└──────────────────────────┴──────────────────────────┘

Yet this version:

import datetime 
import pandas
pts = pandas.Timestamp("2018-02-14 23:00:00", tz="America/New_York")
print(pts)
print(pts + datetime.timedelta(days=60))

is ```
2018-02-14 23:00:00-05:00
2018-04-16 00:00:00-04:00

from duckdb.

soerenwolfers avatar soerenwolfers commented on June 2, 2024

(You can close issues yourself as "won't do')

from duckdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.