What happens? I noticed pytz mentioned in a recent issue and noted

<div class="highlight highlight-source-sql notranslate position-relative overflow-auto" dir="auto" d

Inconsistent date offsets with timezone handling about duckdb HOT 16 CLOSED

paultiq commented on June 2, 2024

Inconsistent date offsets with timezone handling

from duckdb.

Comments (16)

soerenwolfers commented on June 2, 2024

Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?

from duckdb.

soerenwolfers commented on June 2, 2024

If your problem is that python_plus60 gives the "wrong" result, then that is not a duckdb issue, since it can be reproduced as follows

import datetime 
import pandas
import dateutil 

print(pandas.Timestamp("2018-02-14 12:00:00-05:00") + datetime.timedelta(days=60))

print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))

print(pandas.Timestamp("2018-02-14 12:00:00", tz=dateutil.tz.gettz("America/New_York")) + datetime.timedelta(days=60))

2018-04-15 12:00:00-05:00

2018-04-15 13:00:00-04:00

2018-04-15 13:00:00-04:00

from duckdb.

paultiq commented on June 2, 2024

Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?

I interpret as the opposite: that the correct timezone aware offset is the one given by the python. And that duckdb is returning the non normalized version. But, let me mull over the pandas version you sent!

from duckdb.

paultiq commented on June 2, 2024

The first example (UTC-05:00) isn't reproducing, since it's a fixed offset w/ no daylight savings component.

The behavior I'm expecting is given by your second and third examples:

print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))
2018-04-15 13:00:00-04:00

from duckdb.

hawkfish commented on June 2, 2024

What time zone are you in?

from duckdb.

soerenwolfers commented on June 2, 2024

Could you summarize what you think is wrong in duckdb, at the moment? (I still don't think anything is wrong.)

Also, note that your results depend on duckdb's timezone setting, so you should explicitly set that in your example via

conn.query("SET TimeZone='America/New_York'")

If you use UTC instead, you get different results (*):

import duckdb

conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT ('2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day) - '2018-04-15 17:00:00+00'::TIMESTAMPTZ"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))


┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)…  │
│                                                       interval                                                       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 00:00:00                                                                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)…  │
│                                                       interval                                                       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ -01:00:00                                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

(*) and I believe correctly so: a sensible definition of TIMESTAMPTZ + INTERVAL is: (1) convert the TIMESTAMPTZ to TIMESTAMP (i.e.: how would someone in the configured timezone write down the point in time specified by the given TIMESTAMPTZ) (2) add the interval (3) convert to TIMESTAMPTZ (i.e., what point in time would be denoted by someone in the configured timezone with the given TIMESTAMP)

from duckdb.

hawkfish commented on June 2, 2024

from duckdb_settings() where name='TimeZone';

name	value	description	input_type	scope
TimeZone	America/Los_Angeles	The current time zone	VARCHAR	GLOBAL

select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ as d, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day  as base_date;

d	base_date
2018-02-14 09:00:00-08	2018-04-15 09:00:00-07

This looks correct for me (at least this is what ICU does when you cross a DST boundary: preserve wall clock time).

from duckdb.

hawkfish commented on June 2, 2024

Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.

I'd also like to point out that pandas uses pytz, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)

from duckdb.

paultiq commented on June 2, 2024

Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.

I'd also like to point out that pandas uses pytz, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)

I'll answer the other questions this evening, but doesn't DuckDB also use pytz? Or am I misreading the discussion here: #11974

from duckdb.

hawkfish commented on June 2, 2024

No, we use ICU internally (which doesn't have these problems). We only use pytz when producing pandas output because that its what pandas requires.

from duckdb.

paultiq commented on June 2, 2024

No, we use ICU internally (which doesn't have these problems). We only use pytz when producing pandas output because that its what pandas requires.

Yup, that's what I was referring to. I was expecting it (a timezone aware date plus an interval 60 day) to behave the same as the second example above.

I haven't (yet) had a chance to review your subsequent comments which seems to contradict the examples I had at the beginning.

from duckdb.

hawkfish commented on June 2, 2024

Not sure what you meant by "the second example above"?

It sounds to me like you are expecting the date arithmetic to add exactly 60 days to the instant, which is what pandas does? ICU does not behave that way, so duckdb does not either. Moreover, this is not what Postgres does:

hawkfish=# select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval '60 day'  as base_date;
      timestamptz       |       base_date        
------------------------+------------------------
 2018-02-14 09:00:00-08 | 2018-04-15 09:00:00-07
(1 row)

(BTW I am in America/Los_Angeles, which is why the displayed offsets are different.)

from duckdb.

hawkfish commented on June 2, 2024

Yes that does what @paultiq is expecting:

select 
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ d, 
    '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval (24 * 60) hours  as base_date;

d	base_date
2018-02-14 09:00:00-08	2018-04-15 10:00:00-07

from duckdb.

soerenwolfers commented on June 2, 2024

Sorry, didn't meant to make you look talking to yourself; deleted my comment so I could check first whether it actually works because I was on my phone at the time. Anyway, glad it works, so duckdb doesn't only do the theologically right thing from the perspective of the blog post, but also offers the heretic option in a simple way.

Only sad thing to be taken from this is that duckdb can't currently do "add X wall clock days to timestamptz Y from the perspective of timezone Z" with any Z other than the one on the settings (in particular not with a Z that's variable across a table). Not that that's surprising given that TIMESTAMPTZ doesn't store timezones, and not that I can see myself needing it anytime soon, but potentially something for a future separate feature request. (Which would strictly speaking also require specifying what to do if the result lands in the midst of a daylight savings fold or a gap, so really should be a request for a function add(TIMESTAMPTZ, INTERVAL, TIMEZONE, FOLD, GAP) and there might be more complications I haven't thought about. )

from duckdb.

paultiq commented on June 2, 2024

Yes that does what @paultiq is expecting: ...

Yes, thanks... that's the part I was hung up on. That this version is (24*60 - 1) hours, but the Pandas version is 24*60 hours.

I totally accept this is a. a question for the theologians and b. that this is how icu works. So, feel free to close as "as designed" :) Thanks for the discussion.

import duckdb

conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ start_date , '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day end_date"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))

│        start_date        │         end_date         │
│ timestamp with time zone │ timestamp with time zone │
├──────────────────────────┼──────────────────────────┤
│ 2018-02-14 23:00:00-05   │ 2018-04-15 23:00:00-04   │
└──────────────────────────┴──────────────────────────┘

Yet this version:

import datetime 
import pandas
pts = pandas.Timestamp("2018-02-14 23:00:00", tz="America/New_York")
print(pts)
print(pts + datetime.timedelta(days=60))

is ```
2018-02-14 23:00:00-05:00
2018-04-16 00:00:00-04:00

from duckdb.

soerenwolfers commented on June 2, 2024

(You can close issues yourself as "won't do')

from duckdb.

Inconsistent date offsets with timezone handling about duckdb HOT 16 CLOSED

Comments (16)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent