Comments (21)
iMHO we shouldn't switch the default behaviour to suit Iceberg. Changing it will break behaviour for Delta and Hive.
Ideally it is solved in the engine itself. I have filed a PR on Trino to fix this but it still under review.
from dbt-trino.
Once this Trino pr is merged, we can remove those docs.
from dbt-trino.
I just tried making a dbt snapshot with a dbt-trino connector that eventually end up in iceberg files.
Underlying, dbt snapshots use current_timestamp
, so I also got the following error:
13:58:41 TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.", query_id=20221228_135841_01246_szyby)
I then modified the current_timestamp
macro in the dbt-trino adapter by casting to timestamp(6), as in RobbertDM@6e8d13d
Do I get correctly from this thread and the one linked above that this problem will soon be resolved, and this modification to the current_timestamp
will not be necessary anymore?
from dbt-trino.
I dove a bit deeper and had some additional issues with current_timestamp
. When taking a snapshot, it would always put a wrong updated_at
field in the snapshot table, always off by one hour. That is because it appears to take the UTC date and then cast it to timestamp(6)
(which is the same as timestamp(6) without time zone
).
When using with time zone
as in https://github.com/starburstdata/dbt-trino/pull/204/files , the timestamps are nicely stored as UTC. I opened a PR #204.
from dbt-trino.
Note that current_timestamp already returns timestamp with time zone. See trino docs.
To make it compatible with Iceberg you should just override the macro as mentioned in the docs.
{% macro trino__current_timestamp() %}
current_timestamp(6)
{% endmacro %}
from dbt-trino.
Thanks! Indeed, the cast is unnecessary. Changed it in the PR.
Do you think it makes sense to have current_timestamp(6)
there by default, so that not every iceberg user has to manually modify a macro file right after running pip install dbt-trino
? Would it break anything to have current_timestamp(6)
in there?
Or does the above thread (linked in your original post) resolve the issue altogether?
from dbt-trino.
Ok, makes sense. Thanks a lot for following up so quickly! :)
from dbt-trino.
I found another case where this timestamp(3)
vs. timestamp(6) with time zone
thing causes trouble. If you run dbt snapshot
on a source table, then add a timestamp(6) with time zone
column, run dbt snapshot
again, it will get into this do create_columns
statement at https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/snapshots/snapshot.sql#L44-L58
I believe that either there, or in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/snapshots/helpers.sql#L11 , a part of this data type gets lost, because the query that's eventually run on starburst is alter table "iceberg"."dbt_snapshots"."my_snapshot" add column "extract_ts" timestamp with time zone
.
So somewhere, this (6)
gets lost 🤔
Of course, that results in
TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.", query_id=20230127_150706_02226_b7axv)
Edit:
I narrowed it down to:
adapter.get_missing_columns(staging_table, target_relation)
is returning
columns[TrinoColumn(column='extract_ts', dtype='timestamp with time zone', char_size=6, numeric_precision=None, numeric_scale=None)]
While, when describe "iceberg"."dbt_robbert"."my_snapshot__dbt_tmp";
, it says for that missing column timestamp(6) with time zone
Is this a bug in get_missing_columns
? Should I make a new issue?
from dbt-trino.
Good find. I don't think dbt-trino
implements get_missing_columns
it looks like it's being taken from dbt-core, so I assume that you should create an issue there.
from dbt-trino.
I will, thank you!
from dbt-trino.
I don't think there is a bug in dbt-core. I think we now fill up char_size with the precision of time types while actually char_size is used for ensuring characters are not cut off and expanded. In the case of time types we should not alter the type.
@damian3031 : this is related to some code you changed recently
from dbt-trino.
Ah indeed, it seems like
dbt-trino/dbt/adapters/trino/column.py
Lines 32 to 36 in fa901fa
is the culprit.
I'm not sure why you're rebuilding the data type from the regex matches instead of passing on the raw one, @damian3031 ?
from dbt-trino.
The reason is that from_description
should return data_type
and char_size
(or scale
and precision
) as separate tuple elements. So, if raw_data_type argument is timestamp(6) with time zone, this method returns timestamp with time zone as data_type and 6 as char_size.
I think that it is correct implementation regarding purpose of this method.
I will take closer look at this specific use case, maybe we should somehow adjust this method or code which is invoking it.
from dbt-trino.
I think it will be safer to not alter the type if not string or numerical. I think dbt only widens string types. The variable name charsize seems also to point to that.
from dbt-trino.
So do you mean to extend this condition to sth like:
if raw_data_type.startswith(("array", "map", "row", "date", "time", "interval", "boolean")):
return cls(name, raw_data_type)
from dbt-trino.
Yes, i would also invert the logic to only do the size detection for numeric and string types.
from dbt-trino.
The reason is that
from_description
should returndata_type
andchar_size
(orscale
andprecision
) as separate tuple elements. So, if raw_data_type argument is timestamp(6) with time zone, this method returns timestamp with time zone as data_type and 6 as char_size. I think that it is correct implementation regarding purpose of this method. I will take closer look at this specific use case, maybe we should somehow adjust this method or code which is invoking it.
Is there a spec somewhere that describes this? I find it a bit weird to chop the datatype in pieces 🤔
I'm using https://github.com/calogica/dbt-expectations/blob/main/macros/schema_tests/column_values_basic/expect_column_values_to_be_in_type_list.sql right now, and again failing because I have columns that are DECIMAL(20,4)
, but in there, they also just call adapter.get_columns_in_relation
and access only the dtype
fields.
Could it be an option to fill in char_size
where needed, but always return raw_data_type
as data_type
?
from dbt-trino.
Btw I also have this problem with varchar(20)
.
from dbt-trino.
Hmmm, trino itself seems to indeed separate the datatype and precision:
https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/base/column.py#L124-L160
Then I guess dbt-expectations is wrong in only accessing dtype
from dbt-trino.
Apparently, the solution is to use column.data_type
, not column.dtype
.
See https://docs.getdbt.com/reference/dbt-classes#column and https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/base/column.py#L41
from dbt-trino.
^ this is me discovering the difference between data_type
and dtype
🤦
In the snapshot logic, it already uses data_type
so you can ignore my last 4 comments.
Sorry for the noise. Carry on :-)
from dbt-trino.
Related Issues (20)
- Invalid SQL generated for dbt_project_evaluator HOT 1
- dbt-trino adds comment into table create statement by default HOT 2
- dbt-Trino snapshot cannot create __dbt_tmp table after first run HOT 12
- Add tests for aliases
- Add copyright notices to files HOT 1
- Support elementary in improving data-quality capabilities HOT 3
- Extend Hive test coverage
- upgrade to support dbt-core v1.7.0 HOT 1
- Incorrect Schema Used When Renaming Materialized Views HOT 4
- Support CASCADE dropping relations
- Solving for large stage depths HOT 3
- Support `CREATE OR REPLACE` HOT 2
- get_relation not working as already_exists HOT 4
- deltalake rename managed table not allowed arised HOT 1
- Failed to connect to Trino cluster using LDAP auth and HTTP connection HOT 3
- Extra credentials in connection HOT 3
- Not able to create snapshot of a model using dbt-trino==1.7.1 HOT 3
- Refactor to use dbt-adapters interface layer HOT 2
- merge_exclude_columns doesn't work
- upgrade to support dbt-core v1.8.0 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-trino.