Comments (16)
Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?
from duckdb.
If your problem is that python_plus60
gives the "wrong" result, then that is not a duckdb issue, since it can be reproduced as follows
import datetime
import pandas
import dateutil
print(pandas.Timestamp("2018-02-14 12:00:00-05:00") + datetime.timedelta(days=60))
print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))
print(pandas.Timestamp("2018-02-14 12:00:00", tz=dateutil.tz.gettz("America/New_York")) + datetime.timedelta(days=60))
2018-04-15 12:00:00-05:00
2018-04-15 13:00:00-04:00
2018-04-15 13:00:00-04:00
from duckdb.
Isn't duckdb's duckdb_plus60 the correct answer according to the linked blog post?
I interpret as the opposite: that the correct timezone aware offset is the one given by the python. And that duckdb is returning the non normalized version. But, let me mull over the pandas version you sent!
from duckdb.
The first example (UTC-05:00) isn't reproducing, since it's a fixed offset w/ no daylight savings component.
The behavior I'm expecting is given by your second and third examples:
print(pandas.Timestamp("2018-02-14 12:00:00", tz="America/New_York") + datetime.timedelta(days=60))
2018-04-15 13:00:00-04:00
from duckdb.
What time zone are you in?
from duckdb.
Could you summarize what you think is wrong in duckdb, at the moment? (I still don't think anything is wrong.)
Also, note that your results depend on duckdb's timezone setting, so you should explicitly set that in your example via
conn.query("SET TimeZone='America/New_York'")
If you use UTC instead, you get different results (*):
import duckdb
conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT ('2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day) - '2018-04-15 17:00:00+00'::TIMESTAMPTZ"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)… │
│ interval │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 00:00:00 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ ((CAST('2018-02-14 12:00:00 America/New_York' AS TIMESTAMP WITH TIME ZONE) + to_days(CAST(trunc(CAST(60 AS DOUBLE)… │
│ interval │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ -01:00:00 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(*) and I believe correctly so: a sensible definition of TIMESTAMPTZ + INTERVAL
is: (1) convert the TIMESTAMPTZ
to TIMESTAMP
(i.e.: how would someone in the configured timezone write down the point in time specified by the given TIMESTAMPTZ
) (2) add the interval (3) convert to TIMESTAMPTZ
(i.e., what point in time would be denoted by someone in the configured timezone with the given TIMESTAMP
)
from duckdb.
from duckdb_settings() where name='TimeZone';
name | value | description | input_type | scope |
---|---|---|---|---|
TimeZone | America/Los_Angeles | The current time zone | VARCHAR | GLOBAL |
select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ as d, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day as base_date;
d | base_date |
---|---|
2018-02-14 09:00:00-08 | 2018-04-15 09:00:00-07 |
This looks correct for me (at least this is what ICU does when you cross a DST boundary: preserve wall clock time).
from duckdb.
Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.
I'd also like to point out that pandas uses pytz
, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)
from duckdb.
Pandas may do something different here, but what adding 60 days across a DST boundary should return is a matter for theologians really.
I'd also like to point out that pandas uses
pytz
, which is deprecated(!) and suffers from the Y2038 problem (something one of my clients tripped over the other day when doing long range forecasting.)
I'll answer the other questions this evening, but doesn't DuckDB also use pytz? Or am I misreading the discussion here: #11974
from duckdb.
No, we use ICU internally (which doesn't have these problems). We only use pytz
when producing pandas
output because that its what pandas
requires.
from duckdb.
No, we use ICU internally (which doesn't have these problems). We only use
pytz
when producingpandas
output because that its whatpandas
requires.
Yup, that's what I was referring to. I was expecting it (a timezone aware date plus an interval 60 day) to behave the same as the second example above.
I haven't (yet) had a chance to review your subsequent comments which seems to contradict the examples I had at the beginning.
from duckdb.
Not sure what you meant by "the second example above"?
It sounds to me like you are expecting the date arithmetic to add exactly 60 days to the instant, which is what pandas does? ICU does not behave that way, so duckdb does not either. Moreover, this is not what Postgres does:
hawkfish=# select '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ, '2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval '60 day' as base_date;
timestamptz | base_date
------------------------+------------------------
2018-02-14 09:00:00-08 | 2018-04-15 09:00:00-07
(1 row)
(BTW I am in America/Los_Angeles
, which is why the displayed offsets are different.)
from duckdb.
Yes that does what @paultiq is expecting:
select
'2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ d,
'2018-02-14 12:00:00 America/New_York'::TIMESTAMPTZ + interval (24 * 60) hours as base_date;
d | base_date |
---|---|
2018-02-14 09:00:00-08 | 2018-04-15 10:00:00-07 |
from duckdb.
Sorry, didn't meant to make you look talking to yourself; deleted my comment so I could check first whether it actually works because I was on my phone at the time. Anyway, glad it works, so duckdb doesn't only do the theologically right thing from the perspective of the blog post, but also offers the heretic option in a simple way.
Only sad thing to be taken from this is that duckdb can't currently do "add X wall clock days to timestamptz Y from the perspective of timezone Z" with any Z other than the one on the settings (in particular not with a Z that's variable across a table). Not that that's surprising given that TIMESTAMPTZ doesn't store timezones, and not that I can see myself needing it anytime soon, but potentially something for a future separate feature request. (Which would strictly speaking also require specifying what to do if the result lands in the midst of a daylight savings fold or a gap, so really should be a request for a function add(TIMESTAMPTZ, INTERVAL, TIMEZONE, FOLD, GAP)
and there might be more complications I haven't thought about. )
from duckdb.
Yes that does what @paultiq is expecting: ...
Yes, thanks... that's the part I was hung up on. That this version is (24*60 - 1)
hours, but the Pandas version is 24*60
hours.
I totally accept this is a. a question for the theologians and b. that this is how icu works. So, feel free to close as "as designed" :) Thanks for the discussion.
import duckdb
conn=duckdb.connect()
conn.query("SET TimeZone='UTC'")
query = "SELECT '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ start_date , '2018-02-14 23:00:00 America/New_York'::TIMESTAMPTZ + interval 60 day end_date"
print(conn.query(query))
conn.query("SET TimeZone='America/New_York'")
print(conn.query(query))
│ start_date │ end_date │
│ timestamp with time zone │ timestamp with time zone │
├──────────────────────────┼──────────────────────────┤
│ 2018-02-14 23:00:00-05 │ 2018-04-15 23:00:00-04 │
└──────────────────────────┴──────────────────────────┘
Yet this version:
import datetime
import pandas
pts = pandas.Timestamp("2018-02-14 23:00:00", tz="America/New_York")
print(pts)
print(pts + datetime.timedelta(days=60))
is ```
2018-02-14 23:00:00-05:00
2018-04-16 00:00:00-04:00
from duckdb.
(You can close issues yourself as "won't do')
from duckdb.
Related Issues (20)
- ARRAY_VALUE(0.5, 0.4, 0.1) --> Unimplemented type "DECIMAL(2,1)[3]"
- Error parsing dataframe from variable name in 0.10.3 HOT 1
- C API: Writing (large) list with Appender fails HOT 7
- cannot define a macro inside a cursor - Cross catalog dependencies are not supported. HOT 5
- Why is the db.file size still large after deleting data or dropping tables? HOT 18
- Inconsistent/ unexpected error when using strptime() in combination with format-list option HOT 1
- `get_table_names` erroring on query using `generate_series` in CTE, but executes it without issue
- Aggregate Regression function SLOPE and INTERCEPT seem to produce wrong results HOT 8
- read_csv invalidates DuckDB instance HOT 7
- Polars example raises InvalidInputException error HOT 2
- Parquet hive partition write error with OVERWRITE_OR_IGNORE - Deleting other partition data HOT 3
- read_parquet defaults hive_partitioning to auto, not false HOT 2
- ATTACH does not show new table in attached DB on DBeaver HOT 1
- Segmentation fault during PIVOTING table (CLI) using struct field HOT 3
- Entry type not supported in PhysicalCopyDatabase HOT 3
- BM25 matching scores seems to be invalid HOT 1
- read_csv with with store_rejects=true does not work as intented with constraints HOT 3
- DuckDB goes into fatal mode when it fails to write parquet data in a mounted file system
- Issue with Python duckdb.read_csv not working with 0.10.2+ HOT 7
- Treatment of interval type in Python
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from duckdb.