
dbt-trino


Introduction

dbt is a data transformation workflow tool that lets teams quickly and collaboratively deploy analytics code, following software engineering best practices like modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL to build production-grade data pipelines.

One frequently asked question about using dbt is:

Can I connect my dbt project to two databases?

(see the answered question on the dbt website).

TL;DR: dbt stands for transformation, the T in ELT pipelines; it doesn't move data from a source to a warehouse.

The dbt-trino adapter uses Trino as an underlying query engine to perform query federation across disparate data sources. Trino connects to multiple, diverse data sources (see the available connectors) through one dbt connection and processes SQL queries at scale. Transformations defined in dbt are passed to Trino, which executes these SQL transformation queries and translates them into queries specific to the systems it connects to, creating tables or views and manipulating data.
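
For example, a single dbt model can join tables that live in different systems through their Trino catalogs (catalog, schema, and table names below are illustrative):

select o.order_id, c.customer_name
from hive.sales.orders o
join postgresql.crm.customers c on o.customer_id = c.id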

This repository is a fork of dbt-presto, with adaptations to make it work with Trino.

Compatibility

This dbt plugin has been tested against Trino version 445, Starburst Enterprise version 435-e.4 and Starburst Galaxy.

Setup & Configuration

For information on installing and configuring your profile to authenticate to Trino or Starburst, please refer to Starburst and Trino Setup in the dbt docs.

Trino- and Starburst-specific configuration

For Trino- and Starburst-specific configuration, you can refer to Starburst (Trino) configurations on the dbt docs site.

Contributing

Release process

The first five steps are ONLY relevant when bumping the minor version:

  1. Create a 1.x.latest branch from the latest tag of the current minor version, e.g. git checkout -b 1.6.latest v1.6.2 (when bumping to 1.7). Push the branch to the remote. This branch will be used for potential backports.
  2. Create a new branch (do not push the commits below to 1.x.latest). Add a new entry in .changes/0.0.0.md that points to the newly created latest branch.
  3. Run changie merge to update README.md. After that, remove the changie files and folders related to the current minor version. Commit.
  4. Bump the version of dbt-tests-adapter. Commit.
  5. Merge these two commits into the master branch. Add a Skip Changelog label to the PR.

Continue with the next steps for a minor version bump. Start from this point for a patch version bump:

  1. Run the Version Bump workflow. The major and minor parts of the dbt version are used to associate dbt-trino's version with the dbt version.
  2. Merge the bump PR, making sure the test suite passes.
  3. Run the dbt-trino release workflow to release dbt-trino to PyPI and GitHub.

Backport process

Sometimes it is necessary to backport changes to older versions. In that case, create a branch from the relevant x.x.latest branch; there is an x.x.latest branch for each minor version, e.g. 1.3.latest. Make the fix and open a PR back to x.x.latest. Create a changelog entry with changie new as usual, as a separate changelog for each minor version is kept on every x.x.latest branch. After merging, to make a release of that version, follow the instructions from the Release process section, but run every workflow on the x.x.latest branch.
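
A minimal sketch of the flow, assuming a fix aimed at the 1.3 line (the working branch name is illustrative):

git checkout 1.3.latest
git checkout -b my-backport-fix   # hypothetical working branch
changie new                       # record the changelog entry for this minor version
# commit the fix, then open a PR targeting 1.3.latest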

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.

dbt-trino's People

Contributors

aezomz, austenlacy, bachng2017, beckjake, bendettasd, brunomurino, clausherther, cmcarthur, damian3031, dataders, dependabot[bot], drewbanin, findepi, findinpath, fishtownbuildbot, fokko, haunfelder, hovaesco, iknox-fa, jtcohen6, kalvinnchau, kateshopify, leahwicz, mdesmet, michelleark, posulliv, rahulj51, robbertdm, snyk-bot, starburstdata-automation


dbt-trino's Issues

support JWT authentication

It looks like Trino's Python client supports JWT. An easy lift would be to let a user pass a JWT token as a target field in their profiles.yml.

The user quoted below could then use a script to store the token as an environment variable, as in the sketch that follows.
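
A minimal profile sketch, assuming the adapter grows a jwt method with a jwt_token field (these names are illustrative, not a confirmed adapter API):

my_project:
  target: dev
  outputs:
    dev:
      type: trino
      method: jwt                                    # hypothetical auth method
      jwt_token: "{{ env_var('TRINO_JWT_TOKEN') }}"  # token injected via an env var
      host: trino.example.com
      port: 443
      http_scheme: https
      database: hive
      schema: analytics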

Happy to advise on how this might work, LMK!

relevant dbt Slack thread

I'm trying to connect dbt to Presto, as that is one of our data stores. The only way we are able to connect to it is through Okta verification, using a token that lasts only 24 hours. From the looks of the documentation on the profile setup, only ldap and kerberos are supported. Is there currently a way to fetch a token, or is this something that would need to be done on my end and passed to the presto profile?

Add Trino specific methods for casting types

As referenced in #35

In https://github.com/dbt-labs/dbt-utils/blob/73a65562bf3d9b31a1eca0b27118056ad048d8fc/macros/cross_db_utils/datatypes.sql#L3-L6 there is a utility method used when casting to string

Add Trino-specific type_xxx macro implementations in the Trino adapter, and add a corresponding unit test that creates a model using CAST for all these types. If that testing approach is wrong, please use https://github.com/dbt-labs/dbt-utils#union_relations-source to test this functionality instead.
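
A minimal sketch of such overrides, assuming dbt's adapter dispatch naming; the type mappings shown are plausible Trino choices, not the adapter's confirmed ones:

{# Trino-specific implementations dispatched from dbt-utils' type_* macros #}
{% macro trino__type_string() %} varchar {% endmacro %}
{% macro trino__type_timestamp() %} timestamp {% endmacro %}
{% macro trino__type_float() %} double {% endmacro %}
{% macro trino__type_int() %} integer {% endmacro %}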

Source Freshness is not implemented

When I add a source freshness configuration and run dbt source freshness with dbt-trino, I get:

Runtime Error in source mv_mantis_mfr (models/staging/sources.yml)
  ERROR: current_timestamp macro not implemented for adapter trino
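
The fix would presumably be a one-line adapter macro; a minimal sketch:

{# Provide the macro dbt dispatches to when computing source freshness #}
{% macro trino__current_timestamp() %}
    current_timestamp
{% endmacro %}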

Add incremental models

Opening this task to discuss the best approach to adding incremental models.

Trino is planning to add the MERGE operation (trinodb/trino#7708), but I'd say we can start by adding the feature with a delete+insert approach, as stated here.

For now, I'd like to focus on adding support for the Hive and Iceberg connectors.

Any thoughts on this?

dbt_utils package union_relations not working with Trino

In our project we make extensive use of the dbt_utils library. dbt_utils.union_relations doesn't work because it generates invalid SQL:

create table "dataproduct_levenslangleren"."zone_levenslangleren"."levenslang_leren__dbt_tmp"
    
  as (
    

        (
            select

                cast('"dataproduct_levenslangleren"."zone_levenslangleren"."levenslang_leren_ho"' as 
    string
) as _dbt_source_relation,
                
                    cast("laadoperatie" as character varying(256)) as "laadoperatie" ,
                    cast("record max laadoperatie" as character varying(256)) as "record max laadoperatie" ,

Two issues here:

This requires the adapter to provide a specific implementation when the default ('string') is not supported:

{% macro trino__type_string() %}
    varchar
{% endmacro %}

Will post PR shortly.

Support setting client_tags in profile configuration

Describe the feature

We plan to use client_tags in Trino to set resource group policies. It would be really helpful if we could set client tags in the Trino profile configuration, as sketched below.
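
A hedged sketch of what this could look like in profiles.yml; the client_tags field mirrors the trino-python-client connection argument, and adapter support for it is exactly what this issue requests:

dev:
  type: trino
  host: trino.example.com
  port: 443
  database: hive
  schema: analytics
  client_tags:      # hypothetical field, forwarded to trino.dbapi.connect(client_tags=...)
    - etl
    - high-memory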

Are you interested in contributing this feature?

If someone can point me to where profile configuration parameters are set/defined, I would really like to contribute.

Add testing for Starburst Galaxy

Describe the feature

Add testing for Starburst Galaxy to ensure dbt-trino compatibility.

Describe alternatives you've considered

No response

Who will benefit?

Starburst Galaxy users.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

upgrade to support dbt-core v1.2.0

We've just published the release cut of dbt-core 1.2.0, dbt-core 1.2.0rc1 (PyPI | GitHub release notes).

dbt-labs/dbt-core#5468 is an open discussion with more detailed information, and dbt-labs/dbt-core#5474 is for keeping track of the community's progress on releasing 1.2.0.

Below is a checklist of work that would enable a successful 1.2.0 release of your adapter.

  • migrate necessary cross-db macros into adapter and ensure they're tested accordingly
  • remove any copy-and-pasted materialization (if your adapter inherits from another adapter)
  • add new basic tests BaseDocsGenerate and BaseDocsGenReferences
  • consider checking and testing support for Python 3.10

dbt-labs/dbt-core#5432 might make it into the second release cut in the next week, in which case you might also want to:

  • implement method and tests for connection retry logic

oidc support for dbt-trino

A great and unique feature of Trino is that it allows users to authenticate through OIDC. This is not something commonly seen with databases (only Snowflake and Denodo support this feature), and it's much safer, as it doesn't require storing any local credentials.

I saw that this feature is implemented in the trino-python-client but not yet supported in the dbt adapter.

Some gotchas:

  • As you can specify the number of threads, and each model usually generates lots of queries, it might be annoying to see lots of browser windows opening. So ideally the access token is cached and reused for every query.
  • Some dbt jobs may take a long time; we need to ensure that queries don't fail because of token expiration while a dbt project is running.

Migrate tests from memory connector to PostgreSQL

To test advanced features of dbt-trino, tests should be able to execute queries like ALTER TABLE ADD/DROP COLUMN on the test catalog. Currently, all tests run against the memory connector, which has only limited support for SQL statements. A PostgreSQL instance could easily be spun up in the docker-compose setup used in tests, alongside the Trino/Starburst cluster. The migration process will include:

  • spinning up a PostgreSQL database in docker-compose
  • creating new catalogs and removing the memory catalogs in Trino and Starburst
  • updating the test profile to the new catalog
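
The Trino-side piece would be a catalog file along these lines (a sketch; the service name, database, and credentials are illustrative):

# etc/catalog/postgresql.properties -- test catalog pointing at the docker-compose service
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/dbt_tests
connection-user=test
connection-password=test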

Fails with DBT 1.0.0

I think this needs an update to support the current version of DBT.

When running "dbt debug" the following error is returned:

dbt was unable to connect to the specified database.
The database returned the following error:

Runtime Error
'str' object has no attribute '_message'

dbt is set to use the trino driver, which in turn is configured to talk to an AWS Data Catalog and S3.

CVE: 2020-14343 found in PyYAML - Version: 5.3.1 [PYTHON]

Veracode Software Composition Analysis

Attribute Details

Library: PyYAML
Description: YAML parser and emitter for Python
Language: PYTHON
Vulnerability: Arbitrary Code Execution
Vulnerability description: PyYAML is vulnerable to arbitrary code execution. The .yaml files parsed by FullLoader use the unsafe yaml.load() by default. This vulnerability exists due to a bypass of the fix for CVE-2020-1747.
CVE: 2020-14343
CVSS score: 10
Vulnerability present in version(s): 3.12-5.3.1
Found library version(s): 5.3.1
Vulnerability fixed in version: 5.4b1
Library latest version: 6.0
Fix:

Links:

Incremental loads don't work where schema is different. Implement on_schema_change

Hey guys,

If your destination table doesn't have the same columns as your model, the whole model fails.
Example

If Table A has ID, Text1, Date1 in its field list and the dbt model selects ID, Text1 from Table A, then on dbt run the model will fail regardless of the on_schema_change setting. It seems the incremental macro retrieves the list of columns to insert from the destination table and never checks them against the model for schema diffs?
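
For reference, a minimal sketch of the configuration involved; on_schema_change is set per incremental model (project and path names are illustrative):

models:
  my_project:
    +materialized: incremental
    +on_schema_change: append_new_columns   # other dbt values: ignore (default), fail, sync_all_columns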

Transaction timeout for longer queries

Background

I have been experiencing this issue for a long time. It happens when we execute a query that takes more than 5 minutes. To reproduce it, use dbt seed to insert enough rows that the statement takes more than 5 minutes.

Reason

The reason for this issue is that, in Trino, the default value of transaction.idle-timeout is 5 minutes, and dbt executes commands in the following flow, which causes the error:

  1. START TRANSACTION
  2. Some queries which will take more than 5min
  3. COMMIT <- This will throw an error

Error Log

trino.exceptions.DatabaseError: failed to commit transaction 73dc7443-ec53-4911-95be-a372792743b4: TrinoUserError(type=USER_ERROR, name=UNKNOWN_TRANSACTION, message="Unknown transaction ID: 73dc7443-ec53-4911-95be-a372792743b4. Possibly expired? Commands ignored until end of transaction block", query_id=20211121_145311_00013_7rttf)

Is there any specific reason why transactions are explicitly started in dbt-trino despite the Python connection using auto-commit?

isolation_level=IsolationLevel.AUTOCOMMIT,
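
A server-side stopgap, assuming access to the coordinator configuration, is to raise the timeout (the value is illustrative):

# etc/config.properties on the Trino coordinator
transaction.idle-timeout=30m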

Other reference:

dbt-labs/dbt-presto#75

Glue Catalog compatible materialization

I have an idea for a feature and wonder if it makes sense to be part of dbt-trino or not.

Trino is commonly deployed on AWS with the AWS Glue Catalog; you see people referencing it a lot. AWS Glue Catalog doesn't support renaming tables, so the normal table materialization doesn't work. So anyone using Trino, dbt, and Glue has to override the materialization macro to be able to materialize tables.

The way we do this is to reorder things so that instead of create new, rename old, rename new, drop old, we just do drop old, create new (sketched below). That causes the table to be unavailable for a period of time, but it works as a quick fix. We also have to maintain our custom macro and ship it with every dbt project.
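
In SQL terms, the reordered flow amounts to something like this (catalog, schema, and table names are illustrative):

-- Glue-compatible flow: no rename step involved
drop table if exists "glue"."analytics"."my_model";
create table "glue"."analytics"."my_model" as
select * from "glue"."analytics"."stg_source";  -- stands in for the compiled model SQL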

Could this be looked at as part of dbt-trino? Maybe creating an official "table_norename" materialization, or looking into materialized views, insert overwrite, or something similar? It would make the lives of everyone using Trino with AWS much easier.

Migrate to new framework for adapter testing during v1.1 upgrade

Hey dbt-trino maintainers!

We've been developing a new framework for all integration/functional testing in dbt-core. One of our goals in the initiative has been to replace the suite of basic test cases for adapter plugins.

For now, the framework consists of:

  • Pytest utilities included in dbt-core==1.1.0b1
  • A set of "basic" test cases, at exact parity with the test cases built into the old .dbtspec / pytest-dbt-adapter

I have some draft docs (dbt-labs/docs.getdbt.com#1263) for using the new framework: https://deploy-preview-1263--docs-getdbt-com.netlify.app/docs/contributing/testing-a-new-adapter

It should be much easier to:

  • Override/reimplement test cases (as you've done here)
  • "Opt into" test cases for more-advanced functionality, which we'll be gradually adding into the same package of adapter-facing test cases
  • Add your own functional test cases for Trino-specific behavior, using the same built-in utilities, and reusing project fixtures if appropriate

If it proves helpful, you can also check out the way we've implemented the framework in some of our own adapter plugins:

Thanks for all the great work :)

issue with big seed file

Error while seeding a big CSV file:

Runtime Error in seed com_event_log_info (data/<filename>csv)
  TrinoUserError(type=USER_ERROR, name=QUERY_TEXT_TOO_LARGE, message="Query text length (1380022) exceeds the maximum length (1000000)", query_id=20211116_121112_10193_2mbb4)

The same file could be seeded without problems with the dbt-presto driver.

discussion: dbt-core's feedback from initial review for adapter verification

** would have been a Discussion if they were enabled for this repo **

As part of our new adapter verification process, the dbt Core adapters engineering team and I met today to do a high-level review of the dbt-trino codebase. Good news: your adapter passes the sniff test! That means that while the engineers haven't yet had the time to review every single line of code (though they plan to in the next few sprints), there are no glaring red flags that prevent the adapter from being verified.

That said, here are some notes on things we noticed:

  • It's great to see that Trino works with hardly any SQLAdapter method overrides.
  • It took a moment to realize why TrinoCredentialsFactory needed to exist. But using dataclass parameter validation to guide users to the correct sets of profile fields is a great choice. Perhaps, though, there's a way we can work to make this easier.

    class TrinoCredentialsFactory:
        @classmethod
        def _create_trino_profile(cls, profile):
            if "method" in profile:
                method = profile["method"]
                if method == "ldap":
                    return TrinoLdapCredentials
                elif method == "certificate":
                    return TrinoCertificateCredentials
                elif method == "kerberos":
                    return TrinoKerberosCredentials
                elif method == "jwt":
                    return TrinoJwtCredentials
                elif method == "oauth":
                    return TrinoOauthCredentials
            return TrinoNoneCredentials
  • is there anything unique about Trino today w.r.t. prepared statements & bindings? or is that legacy code?

    @classmethod
    def _escape_value(cls, value):
        """A not very comprehensive system for escaping bindings.
        I think "'" (a single quote) is the only character that matters.
        """
        numbers = (decimal.Decimal, int, float)
        if value is None:
            return "NULL"
        elif isinstance(value, str):
            return "'{}'".format(value.replace("'", "''"))
        elif isinstance(value, numbers):
            return value
        elif isinstance(value, datetime):
            time_formatted = value.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
            return "TIMESTAMP '{}'".format(time_formatted)
        elif isinstance(value, date):
            date_formatted = value.strftime("%Y-%m-%d")
            return "DATE '{}'".format(date_formatted)
        else:
            raise ValueError("Cannot escape {}".format(type(value)))
  • can you give more context on why you needed to parse the DatabaseError message?

    def exception_handler(self, sql):
        try:
            yield
            # TODO: introspect into `DatabaseError`s and expose `errorName`,
            # `errorType`, etc instead of stack traces full of garbage!
        except Exception as exc:
            logger.debug("Error while running:\n{}".format(sql))
            logger.debug(exc)
            raise dbt.exceptions.RuntimeException(str(exc))

For example:

  • do you think that some of the logic in TrinoAdapter.add_query() could upstream into dbt-core?

    def add_query(self, sql, auto_begin=True, bindings=None, abridge_sql_log=False):
        connection = None
        cursor = None

        # TODO: is this sufficient? Largely copy+pasted from snowflake, so
        # there's some common behavior here we can maybe factor out into the
        # SQLAdapter?
        queries = [q.rstrip(";") for q in sqlparse.split(sql)]

        for individual_query in queries:
            # hack -- after the last ';', remove comments and don't run
            # empty queries. this avoids using exceptions as flow control,
            # and also allows us to return the status of the last cursor
            without_comments = re.sub(
                re.compile("^.*(--.*)$", re.MULTILINE), "", individual_query
            ).strip()

            if without_comments == "":
                continue

            parent = super(TrinoConnectionManager, self)
            connection, cursor = parent.add_query(
                individual_query, auto_begin, bindings, abridge_sql_log
            )

        if cursor is None:
            raise dbt.exceptions.RuntimeException(
                "Tried to run an empty query on model '{}'. If you are "
                "conditionally running\nsql, eg. in a model hook, make "
                "sure your `else` clause contains valid sql!\n\n"
                "Provided SQL:\n{}".format(connection.name, sql)
            )

        return connection, cursor

Please let us know if there are any other particular areas you'd like us to review more closely. Looking forward to collaborating!

request for enhanced properties feature

The properties feature implemented recently is very useful, but it would be better if it supported the following use cases:

  1. Combine the properties from the general configuration (dbt_project.yml) and the model-specific configuration. For now, if we want to set transactional=true for all models but partitioning for just a few tables, we must also include transactional=true in every model where we only need to change the partitions, because only one set of properties is used. What we want is to set transactional=true at the general level and only the partitioning in specific models.
  2. The properties should ignore key/value pairs where the key or value (or both) is null. The use case is that we sometimes need to control a property dynamically, depending on other variables. It is easier to use Jinja2 for the key/value here to control whether the pair should be included in the properties. For example, we have two environments and only one needs transactions; utilizing a DBT_TRANSACTION environment variable would let us use the same code in both environments:
+properties:
    transactional: "{{ env_var('DBT_TRANSACTION','true') }}"

Support delete+insert materialization

Describe the feature

Trino at this moment does not support MERGE statements, which is why this delete+insert materialization can be implemented by first using a DELETE statement to remove the records in the target table that are to be updated, and then an INSERT statement to add the new rows.
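
A minimal SQL sketch of the pattern, assuming a unique key named id and the new batch staged in a temporary table (all names are illustrative):

-- 1) delete rows that are about to be replaced
delete from target_table
where id in (select id from new_rows_tmp);

-- 2) insert the fresh versions
insert into target_table
select * from new_rows_tmp;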

Trino can't read table when there is no files on S3

Hi, we have a scenario where the data is written to S3 with a Glue metastore. There is a tricky situation where sometimes tables are created with no data, i.e. the query returns 0 rows, so the table is created but contains no data.
When we try to query this table later we get the error:

SQL Error [16777220]: Query failed (#20220711_175341_09350_zawua): Partition location does not exist: s3://bucket/table...

Is there a way of pointing out to Trino that the table is empty? I have no issues querying this table using Athena.
Thank you

CVE: 0000-0000 found in Jinja2 - Version: 2.11.3 [PYTHON]

Veracode Software Composition Analysis

Attribute Details

Library: Jinja2
Description: A very fast and expressive template engine.
Language: PYTHON
Vulnerability: Cross-Site Scripting (XSS)
Vulnerability description: Jinja2 is vulnerable to cross-site scripting. An attacker is able to inject and execute arbitrary JavaScript through the gettext and ngettext functions due to the lack of output sanitization.
CVE: null
CVSS score: 5.8
Vulnerability present in version(s): 2.5-3.0.0
Found library version(s): 2.11.3
Vulnerability fixed in version:
Library latest version: 3.1.2
Fix: There is no fix version. Apply the following fix:

Links:

dbt-trino can't handle mixed case schema in profiles.yaml

Expected behavior

When using a schema with mixed case, I expect dbt-trino to be able to overwrite my existing tables.

Actual behavior

The first dbt run succeeds, however subsequent runs fail with

Compilation Error in model base_orders (models/base/base_orders.sql)
20:23:53    When searching for a relation, dbt found an approximate match. Instead of guessing 
20:23:53    which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_orders", or rename it to be less ambiguous.

Steps To Reproduce

Use a mixed-case profile, e.g.

my_dbt_trino_project:
  outputs:

    dev:
      type: trino
      method: none  # optional, one of {none | ldap | kerberos}
      user: admin
      database: datalake
      host: localhost
      port: 8080
      schema: analyticsTEST
      threads: 1

  target: dev

Run dbt run multiple times.

Log output/Screenshots

❯ dbt run
20:23:44 Running with dbt=1.2.1
20:23:44 Unable to do partial parsing because profile has changed
20:23:45 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:

  • models.my_dbt_trino_project.example

20:23:45 Found 4 models, 0 tests, 1 snapshot, 0 analyses, 272 macros, 0 operations, 0 seed files, 3 sources, 0 exposures, 0 metrics
20:23:45
20:23:46 Concurrency: 1 threads (target='dev')
20:23:46
20:23:46 1 of 4 START view model analyticsTEST.base_customers ........................... [RUN]
20:23:47 1 of 4 OK created view model analyticsTEST.base_customers ...................... [SUCCESS in 0.30s]
20:23:47 2 of 4 START view model analyticsTEST.base_orders .............................. [RUN]
20:23:47 2 of 4 OK created view model analyticsTEST.base_orders ......................... [SUCCESS in 0.09s]
20:23:47 3 of 4 START view model analyticsTEST.base_payments ............................ [RUN]
20:23:47 3 of 4 OK created view model analyticsTEST.base_payments ....................... [SUCCESS in 0.11s]
20:23:47 4 of 4 START table model analyticsTEST.orders .................................. [RUN]
20:23:49 4 of 4 OK created table model analyticsTEST.orders ............................. [SUCCESS in 2.03s]
20:23:49
20:23:49 Finished running 3 view models, 1 table model in 0 hours 0 minutes and 4.24 seconds (4.24s).
20:23:49
20:23:49 Completed successfully
20:23:49
20:23:49 Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4
❯ dbt run
20:23:53 Running with dbt=1.2.1
20:23:53 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:

  • models.my_dbt_trino_project.example

20:23:53 Found 4 models, 0 tests, 1 snapshot, 0 analyses, 272 macros, 0 operations, 0 seed files, 3 sources, 0 exposures, 0 metrics
20:23:53
20:23:53 Concurrency: 1 threads (target='dev')
20:23:53
20:23:53 1 of 4 START view model analyticsTEST.base_customers ........................... [RUN]
20:23:53 1 of 4 ERROR creating view model analyticsTEST.base_customers .................. [ERROR in 0.01s]
20:23:53 2 of 4 START view model analyticsTEST.base_orders .............................. [RUN]
20:23:53 2 of 4 ERROR creating view model analyticsTEST.base_orders ..................... [ERROR in 0.00s]
20:23:53 3 of 4 START view model analyticsTEST.base_payments ............................ [RUN]
20:23:53 3 of 4 ERROR creating view model analyticsTEST.base_payments ................... [ERROR in 0.00s]
20:23:53 4 of 4 SKIP relation analyticsTEST.orders ...................................... [SKIP]
20:23:53
20:23:53 Finished running 3 view models, 1 table model in 0 hours 0 minutes and 0.25 seconds (0.25s).
20:23:53
20:23:53 Completed with 3 errors and 0 warnings:
20:23:53
20:23:53 Compilation Error in model base_customers (models/base/base_customers.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_customers", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_customers"
20:23:53 Found: "datalake"."analyticstest"."base_customers"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_customers (models/base/base_customers.sql)
20:23:53
20:23:53 Compilation Error in model base_orders (models/base/base_orders.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_orders", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_orders"
20:23:53 Found: "datalake"."analyticstest"."base_orders"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_orders (models/base/base_orders.sql)
20:23:53
20:23:53 Compilation Error in model base_payments (models/base/base_payments.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_payments", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_payments"
20:23:53 Found: "datalake"."analyticstest"."base_payments"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_payments (models/base/base_payments.sql)
20:23:53
20:23:53 Done. PASS=0 WARN=0 ERROR=3 SKIP=1 TOTAL=4

Operating System

macos

dbt version

1.2.2

Trino Server version

396

Python version

3.9.10

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

failure when enabling store_failures with a boolean parameter

Expected behavior

By default, dbt works like this (ref: dbt-labs/dbt-core#1558):

When dbt creates a relation in the db, add a special flag to the relation (defaults to False)
When checking a node name match:

if that flag is present and quoting is disabled, do a lowercase compare
otherwise remain case sensitive

The default behavior fails in the following case:

When store_failures is enabled and a test includes a Boolean parameter, dbt will try to save a test model with the name False. The first save will be fine, but the second will fail because dbt compares the False model with the false model (the one that was already saved).

The error looks like this:

06:21:21  Compilation Error in test accepted_values_model____False__0__1 (models/xxx/model.yml)
06:21:21    When searching for a relation, dbt found an approximate match. Instead of guessing
06:21:21    which relation to use, dbt will move on. Please delete "hive"."lab_dbt_xxx"."accepted_values_model____false__0__1", or rename it to be less ambiguous.
06:21:21    Searched for: "hive"."lab_dbt_xxx"."accepted_values_model____False__0__1"   <-- False is in upper case
06:21:21    Found: "hive"."lab_dbt_xxx"."accepted_values_model____false__0__1"

Actual behavior

No error; the test model (result) should be saved correctly.

Steps To Reproduce

Set store_failures to true and use any test containing a Boolean value, like this:

      - name: some_column_name
        quote: true
        tests:
          - accepted_values:
              values:
                - 0
                - 1
              quote: false

Log output/Screenshots

No response

Operating System

linux

dbt version

1.1.0/1.2.1

Trino Server version

1.1.0/1.2.2

Python version

3.8

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Ability to set session properties

Using the trino (or presto) Python client allows for setting session properties; however, there is currently no way to set these and have them persist into the transaction when running dbt. It also appears the Python client needs this set on the actual connection. If a pre-hook is run, it too runs outside of the transaction, of course, and does not affect the running queries. Having the option to set session properties can significantly increase the performance of pipelines through Trino.

The connection __init__ in trino-python-client does include session_properties:

def __init__(
        self,
        host,
        port=constants.DEFAULT_PORT,
        user=None,
        source=constants.DEFAULT_SOURCE,
        catalog=constants.DEFAULT_CATALOG,
        schema=constants.DEFAULT_SCHEMA,
        session_properties=None,
        http_headers=None,
        http_scheme=constants.HTTP,
        auth=constants.DEFAULT_AUTH,
        redirect_handler=None,
        max_attempts=constants.DEFAULT_MAX_ATTEMPTS,
        request_timeout=constants.DEFAULT_REQUEST_TIMEOUT,
        isolation_level=IsolationLevel.AUTOCOMMIT,
        verify=True
    ):
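
A hedged sketch of the profile-level support being requested (the session_properties mapping is hypothetical until implemented; the property names are illustrative):

dev:
  type: trino
  host: trino.example.com
  port: 443
  database: hive
  schema: analytics
  session_properties:          # hypothetical field, passed through to the connection
    query_max_run_time: 5d
    exchange_compression: true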

Remove the `trino__get_merge_sql` macro

Describe the feature

The following issues prevent us from using the default dbt merge macro:

Once these are resolved we should remove our macro override.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Enable snapshot and merge tests for Galaxy

Describe the feature

Enable snapshot and merge tests for Galaxy once Galaxy is upgraded to 393.
Check the TODOs in the code in test_incremental_schema, test_incremental_merge and test_snapshot.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Authentication with certificate

I'm transferring this issue over, as I previously incorrectly posted to the presto adapter repo.

As far as I can tell, there is no way to pass in a path to a certificate for an SSL connection to the API. I believe this needs to be set on the trino connection that is created.

trino_conn = trino.dbapi.connect(

Support `Password file` authentication type?

I can only find the ldap method for authenticating to a Trino server (with user and password). Does it support a Trino server configured with password file authentication? If not, can you add this support?
I've tried but got the following error:

$ dbt run
05:33:21  Running with dbt=1.0.1
05:33:21  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
05:33:21  
05:33:23  Encountered an error:
Runtime Error
  Runtime Error
    error 401: b'Unauthorized'

Add support for DECIMAL, REAL, DOUBLE, TIME in dbt seed

The dbt seed command does not currently escape some data types supported by Trino.

Perform the changes in the method dbt.adapters.trino.connections.ConnectionWrapper._escape_value (a sketch of the kind of additions follows).
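
A minimal sketch, assuming the _escape_value structure quoted earlier on this page; the literal formats are illustrative, and these branches would need to sit before the generic numbers check, since decimal.Decimal is currently caught there:

# Hypothetical extra branches inside ConnectionWrapper._escape_value
# (requires: import decimal; from datetime import time)
elif isinstance(value, decimal.Decimal):
    return "DECIMAL '{}'".format(value)
elif isinstance(value, time):
    return "TIME '{}'".format(value.strftime("%H:%M:%S.%f")[:-3])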

Make sure to add a corresponding model which contains all these types to the automated tests:

seeds: 
  jaffle_shop:
    seed_data_types:
      +column_types:
        c_timestamp: timestamp
        c_date: date
        c_time: time
        c_char: char(10)
        short_decimal: decimal(10,2)
        long_decimal: decimal(10,2)
        c_real: real
        c_float: float

Check https://docs.getdbt.com/docs/building-a-dbt-project/seeds (the FAQ section as well) to get a better understanding of seeds in dbt. Also perform tests for the seeds (as documented in "How do I test and document seeds?" in the FAQ section) to ensure compatibility with the functionality provided by dbt.

bump version

Is it possible to update the version available for download through PyPI? I am using some Docker images for dbt and would like to be able to pull the updated version of this package during build to get some recent updates. Currently it pulls 0.20.0.

Add comments to catalog macro

Describe the feature

Add comments to the catalog macro, since they are currently returned as NULLs. It will be a slight breaking change, since it will require access to the system catalog in Trino.

Who will this benefit?

Tools that use dbt metadata.

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3)

Describe the feature

The following Trino issue, trinodb/trino#6658, requires us to override the current_timestamp macro for the Iceberg catalog.

Trino uses 3 digits of subsecond precision in date and time functions, but Iceberg expects timestamps with 6 digits of subsecond precision.

As a result, we can't create an Iceberg table from Trino:

create table iceberg.examples.test
as select now() as n;
Exception:
Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.
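
The documented workaround is an override along these lines (a hedged sketch; the exact macro in the docs may differ):

{# Cast up to the 6-digit precision Trino's Iceberg connector requires #}
{% macro trino__current_timestamp() %}
    cast(current_timestamp as timestamp(6) with time zone)
{% endmacro %}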

Once this issue is resolved, we can remove this documented limitation.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Potential issue with the incremental materialization failing when used with a custom schema

I tried using the materialization with presto (not trino) and found that it works perfectly except if a custom schema is used.

This line

select {{dest_cols_csv}} from {{ tmp_relation.include(database=false, schema=false) }};

strips off the database and schema names from the temp relation, causing the query to fail.
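
A hedged sketch of the likely fix: keep the temp relation fully qualified instead of stripping its namespace (whether Presto/Trino needs only the database part dropped is unverified):

select {{ dest_cols_csv }} from {{ tmp_relation }};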

I haven't yet had the chance to test this with trino but I suspect that the behaviour would be the same.

Migrate outstanding tests to new testing framework

dbt-core v1.1 introduced a new testing framework (#57).

In dbt-trino it was introduced by #65, and some tests have already been moved to the new testing approach. Some tests from /docker/dbt/jaffle_shop/ are still using the old setup. The purpose of this issue is to move all outstanding tests to the new testing framework and completely get rid of jaffle_shop. Some tests could be covered by the actively evolving dbt-tests-adapter framework.

Useful links:

Inserts not working with seeds and typed columns

When using seed files and running dbt seed, the generated INSERT SQL does not work, as types are not specified.

For example a seed file with following contents:

data_date_trunc.csv

updated_at,day,month
2018-01-05 12:00:00,2018-01-05,2018-01-01
,,

with following config in dbt_project.yml

seeds:
    data_date_trunc:
      +column_types:
        updated_at: timestamp(3)
        day: date
        month: date

This creates the following CREATE TABLE statement:

CREATE TABLE memory.default.data_date_trunc (
   updated_at timestamp(3),
   day date,
   month date
)

The following INSERT is generated:

insert into "memory"."default"."data_date_trunc" (updated_at, day, month) values
          ('2018-01-05 12:00:00','2018-01-05','2018-01-01'),(NULL,NULL,NULL)

which fails with

io.trino.spi.TrinoException: Insert query has mismatched column types: Table: [timestamp(3), date, date], Query: [varchar(19), varchar(10), varchar(10)]
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:533)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:404)
	at io.trino.sql.tree.Insert.accept(Insert.java:68)
	at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:421)
	at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:384)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:79)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:71)
	at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:254)
	at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:179)
	at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:776)
	at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:132)
	at io.trino.$gen.Trino_369____20220206_204929_2.call(Unknown Source)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

More examples can be run via the tests at https://github.com/innoverio/dbt-trino-utils.
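
A hedged sketch of what a working, typed INSERT would look like, using SQL literal prefixes rather than relying on implicit varchar coercion:

insert into "memory"."default"."data_date_trunc" (updated_at, day, month)
values
  (TIMESTAMP '2018-01-05 12:00:00', DATE '2018-01-05', DATE '2018-01-01'),
  (NULL, NULL, NULL);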

array_agg doesn't work

I use trino->postgresql and can execute

select account_id, array_agg(c_id) from ( select * from postgresql.public.temp_out limit 1000) group by account_id;

but when I run the same in dbt I get:

TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Unsupported column type: array(bigint)")

$ pip freeze | grep dbt
dbt-core==0.21.0
dbt-extractor==0.4.0
dbt-trino==0.21.0
