
dbt-trino


Introduction

dbt is a data transformation workflow tool that lets teams quickly and collaboratively deploy analytics code, following software engineering best practices like modularity, CI/CD, testing, and documentation. It enables anyone who knows SQL to build production-grade data pipelines.

One frequently asked question about using dbt is:

Can I connect my dbt project to two databases?

(see the answered question on the dbt website).

TL;DR: dbt stands for transformation, the T in ELT pipelines; it doesn't move data from a source to a warehouse.

The dbt-trino adapter uses Trino as an underlying query engine to perform query federation across disparate data sources. Trino connects to multiple, diverse data sources (see the available connectors) through one dbt connection and processes SQL queries at scale. Transformations defined in dbt are passed to Trino, which executes these SQL transformation queries and translates them into queries specific to the systems it connects to, creating tables or views and manipulating data.
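
For example, a single dbt model can join tables that live in different systems through their Trino catalogs (catalog, schema, and table names below are illustrative):

select o.order_id, c.customer_name
from hive.sales.orders o
join postgresql.crm.customers c on o.customer_id = c.id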

This repository is a fork of dbt-presto, with adaptations to make it work with Trino.

Compatibility

This dbt plugin has been tested against Trino version 445, Starburst Enterprise version 435-e.4 and Starburst Galaxy.

Setup & Configuration

For information on installing and configuring your profile to authenticate to Trino or Starburst, please refer to Starburst and Trino Setup in the dbt docs.

Trino- and Starburst-specific configuration

For Trino- and Starburst-specific configuration, you can refer to Starburst (Trino) configurations on the dbt docs site.

Contributing

Release process

The first five steps are ONLY relevant when bumping the minor version:

  1. Create a 1.x.latest branch from the latest tag of the current minor version, e.g. git checkout -b 1.6.latest v1.6.2 (when bumping to 1.7). Push the branch to the remote. This branch will be used for potential backports.
  2. Create a new branch (do not push the commits below to 1.x.latest). Add a new entry in .changes/0.0.0.md that points to the newly created latest branch.
  3. Run changie merge to update README.md. After that, remove the changie files and folders related to the current minor version. Commit.
  4. Bump the version of dbt-tests-adapter. Commit.
  5. Merge these two commits into the master branch. Add a Skip Changelog label to the PR.

Continue with the next steps for a minor version bump. Start from this point for a patch version bump:

  1. Run the Version Bump workflow. The major and minor parts of the dbt version are used to associate dbt-trino's version with the dbt version.
  2. Merge the bump PR, making sure the test suite passes.
  3. Run the dbt-trino release workflow to release dbt-trino to PyPI and GitHub.

Backport process

Sometimes it is necessary to backport changes to older versions. In that case, create a branch from the relevant x.x.latest branch; there is an x.x.latest branch for each minor version, e.g. 1.3.latest. Make the fix and open a PR back to x.x.latest. Create a changelog entry with changie new as usual, as a separate changelog for each minor version is kept on every x.x.latest branch. After merging, to make a release of that version, follow the instructions from the Release process section, but run every workflow on the x.x.latest branch.
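
A minimal sketch of the flow, assuming a fix aimed at the 1.3 line (the working branch name is illustrative):

git checkout 1.3.latest
git checkout -b my-backport-fix   # hypothetical working branch
changie new                       # record the changelog entry for this minor version
# commit the fix, then open a PR targeting 1.3.latest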

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.

dbt-trino's People

Contributors

aezomz, austenlacy, bachng2017, beckjake, bendettasd, brunomurino, clausherther, cmcarthur, damian3031, dataders, dependabot[bot], drewbanin, findepi, findinpath, fishtownbuildbot, fokko, haunfelder, hovaesco, iknox-fa, jtcohen6, kalvinnchau, kateshopify, leahwicz, mdesmet, michelleark, posulliv, rahulj51, robbertdm, snyk-bot, starburstdata-automation


dbt-trino's Issues

support JWT authentication

It looks like Trino's Python client supports JWT. An easy lift would be to let a user pass a JWT token as a target field in their profiles.yml.

The user quoted below could then use a script to store the token as an environment variable, as in the sketch that follows.
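
A minimal profile sketch, assuming the adapter grows a jwt method with a jwt_token field (these names are illustrative, not a confirmed adapter API):

my_project:
  target: dev
  outputs:
    dev:
      type: trino
      method: jwt                                    # hypothetical auth method
      jwt_token: "{{ env_var('TRINO_JWT_TOKEN') }}"  # token injected via an env var
      host: trino.example.com
      port: 443
      http_scheme: https
      database: hive
      schema: analytics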

Happy to advise on how this might work, LMK!

relevant dbt Slack thread

I'm trying to connect dbt to Presto, as that is one of our data stores. The only way we are able to connect to it is through Okta verification, using a token that lasts only 24 hours. From the looks of the documentation on the profile setup, only ldap and kerberos are supported. Is there currently a way to fetch a token, or is this something that would need to be done on my end and passed to the presto profile?

Add Trino specific methods for casting types

As referenced in #35

In https://github.com/dbt-labs/dbt-utils/blob/73a65562bf3d9b31a1eca0b27118056ad048d8fc/macros/cross_db_utils/datatypes.sql#L3-L6 there is a utility method used when casting to string

Add Trino-specific type_xxx macro implementations in the Trino adapter, and add a corresponding unit test that creates a model using CAST for all these types. If that testing approach is wrong, please use https://github.com/dbt-labs/dbt-utils#union_relations-source to test this functionality instead.
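
A minimal sketch of such overrides, assuming dbt's adapter dispatch naming; the type mappings shown are plausible Trino choices, not the adapter's confirmed ones:

{# Trino-specific implementations dispatched from dbt-utils' type_* macros #}
{% macro trino__type_string() %} varchar {% endmacro %}
{% macro trino__type_timestamp() %} timestamp {% endmacro %}
{% macro trino__type_float() %} double {% endmacro %}
{% macro trino__type_int() %} integer {% endmacro %}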

Source Freshness is not implemented

When I add a source freshness configuration and run dbt source freshness with dbt-trino, I get:

Runtime Error in source mv_mantis_mfr (models/staging/sources.yml)
  ERROR: current_timestamp macro not implemented for adapter trino
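
The fix would presumably be a one-line adapter macro; a minimal sketch:

{# Provide the macro dbt dispatches to when computing source freshness #}
{% macro trino__current_timestamp() %}
    current_timestamp
{% endmacro %}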

Add incremental models

Opening this task to discuss the best approach to adding incremental models.

Trino is planning to add the MERGE operation (trinodb/trino#7708), but I'd say we can start by adding the feature with a delete+insert approach, as stated here.

For now, I'd like to focus on adding support for the Hive and Iceberg connectors.

Any thoughts on this?

dbt_utils package union_relations not working with Trino

In our project we make extensive use of the dbt_utils library. dbt_utils.union_relations doesn't work because it generates invalid SQL:

create table "dataproduct_levenslangleren"."zone_levenslangleren"."levenslang_leren__dbt_tmp"
    
  as (
    

        (
            select

                cast('"dataproduct_levenslangleren"."zone_levenslangleren"."levenslang_leren_ho"' as 
    string
) as _dbt_source_relation,
                
                    cast("laadoperatie" as character varying(256)) as "laadoperatie" ,
                    cast("record max laadoperatie" as character varying(256)) as "record max laadoperatie" ,

Two issues here:

This requires the adapter to provide a specific implementation when the default ('string') is not supported:

{% macro trino__type_string() %}
    varchar
{% endmacro %}

Will post PR shortly.

Support setting client_tags in profile configuration

Describe the feature

We plan to use client_tags in Trino to set resource group policies. It would be really helpful if we could set client tags in the Trino profile configuration, as sketched below.
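
A hedged sketch of what this could look like in profiles.yml; the client_tags field mirrors the trino-python-client connection argument, and adapter support for it is exactly what this issue requests:

dev:
  type: trino
  host: trino.example.com
  port: 443
  database: hive
  schema: analytics
  client_tags:      # hypothetical field, forwarded to trino.dbapi.connect(client_tags=...)
    - etl
    - high-memory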

Are you interested in contributing this feature?

If someone can point me to where profile configuration parameters are set/defined, I would really like to contribute.

Add testing for Starburst Galaxy

Describe the feature

Add testing for Starburst Galaxy to ensure dbt-trino compatibility.

Describe alternatives you've considered

No response

Who will benefit?

Starburst Galaxy users.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

upgrade to support dbt-core v1.2.0

We've just published the release cut of dbt-core 1.2.0, dbt-core 1.2.0rc1 (PyPI | GitHub release notes).

dbt-labs/dbt-core#5468 is an open discussion with more detailed information, and dbt-labs/dbt-core#5474 is for keeping track of the community's progress on releasing 1.2.0.

Below is a checklist of work that would enable a successful 1.2.0 release of your adapter.

  • migrate necessary cross-db macros into adapter and ensure they're tested accordingly
  • remove any copy-and-pasted materialization (if your adapter inherits from another adapter)
  • add new basic tests BaseDocsGenerate and BaseDocsGenReferences
  • consider checking and testing support for Python 3.10

dbt-labs/dbt-core#5432 might make it into the second release cut in the next week, in which case you might also want to:

  • implement method and tests for connection retry logic

oidc support for dbt-trino

A great and unique feature of Trino is that it allows users to authenticate through OIDC. This is not something commonly seen with databases (only Snowflake and Denodo support this feature), and it's much safer, as it doesn't require storing any local credentials.

I saw that this feature is implemented in the trino-python-client but not yet supported in the dbt adapter.

Some gotchas:

  • As you can specify the number of threads, and each model usually generates lots of queries, it might be annoying to see lots of browser windows opening. So ideally the access token is cached and reused for every query.
  • Some dbt jobs may take a long time; we need to ensure that queries don't fail because of token expiration while a dbt project is running.

Migrate tests from memory connector to PostgreSQL

To test advanced features of dbt-trino, tests should be able to execute queries like ALTER TABLE ADD/DROP COLUMN on the test catalog. Currently, all tests run against the memory connector, which has only limited support for SQL statements. A PostgreSQL instance could easily be spun up in the docker-compose setup used in tests, alongside the Trino/Starburst cluster. The migration process will include:

  • spinning up a PostgreSQL database in docker-compose
  • creating new catalogs and removing the memory catalogs in Trino and Starburst
  • updating the test profile to the new catalog
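
The Trino-side piece would be a catalog file along these lines (a sketch; the service name, database, and credentials are illustrative):

# etc/catalog/postgresql.properties -- test catalog pointing at the docker-compose service
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/dbt_tests
connection-user=test
connection-password=test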

Fails with DBT 1.0.0

I think this needs an update to support the current version of DBT.

When running "dbt debug" the following error is returned:

dbt was unable to connect to the specified database.
The database returned the following error:

Runtime Error
'str' object has no attribute '_message'

dbt is set to use the trino driver, which in turn is configured to talk to an AWS Data Catalog and S3.

CVE: 2020-14343 found in PyYAML - Version: 5.3.1 [PYTHON]

Veracode Software Composition Analysis

Attribute Details

Library: PyYAML
Description: YAML parser and emitter for Python
Language: PYTHON
Vulnerability: Arbitrary Code Execution
Vulnerability description: PyYAML is vulnerable to arbitrary code execution. The .yaml files parsed by FullLoader use the unsafe yaml.load() by default. This vulnerability exists due to a bypass of the fix for CVE-2020-1747.
CVE: 2020-14343
CVSS score: 10
Vulnerability present in version(s): 3.12-5.3.1
Found library version(s): 5.3.1
Vulnerability fixed in version: 5.4b1
Library latest version: 6.0
Fix:

Links:

Incremental loads don't work where schema is different. Implement on_schema_change

Hey guys,

If your destination table doesn't have the same columns as your model, the whole model fails.
Example

If Table A has ID, Text1, Date1 in its field list and the dbt model selects ID, Text1 from Table A, then on dbt run the model will fail regardless of the on_schema_change setting. It seems the incremental macro retrieves the list of columns to insert from the destination table and never checks them against the model for schema diffs?
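
For reference, a minimal sketch of the configuration involved; on_schema_change is set per incremental model (project and path names are illustrative):

models:
  my_project:
    +materialized: incremental
    +on_schema_change: append_new_columns   # other dbt values: ignore (default), fail, sync_all_columns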

Transaction timeout for longer queries

Background

I have been experiencing this issue for a long time. It happens when we execute a query that takes more than 5 minutes. To reproduce it, use dbt seed to insert enough rows that the statement takes more than 5 minutes.

Reason

The reason for this issue is that, in Trino, the default value of transaction.idle-timeout is 5 minutes, and dbt executes commands in the following flow, which causes the error:

  1. START TRANSACTION
  2. Some queries which will take more than 5min
  3. COMMIT <- This will throw an error

Error Log

trino.exceptions.DatabaseError: failed to commit transaction 73dc7443-ec53-4911-95be-a372792743b4: TrinoUserError(type=USER_ERROR, name=UNKNOWN_TRANSACTION, message="Unknown transaction ID: 73dc7443-ec53-4911-95be-a372792743b4. Possibly expired? Commands ignored until end of transaction block", query_id=20211121_145311_00013_7rttf)

Is there any specific reason why transactions are explicitly started in dbt-trino despite the Python connection using auto-commit?

isolation_level=IsolationLevel.AUTOCOMMIT,
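
A server-side stopgap, assuming access to the coordinator configuration, is to raise the timeout (the value is illustrative):

# etc/config.properties on the Trino coordinator
transaction.idle-timeout=30m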

Other reference:

dbt-labs/dbt-presto#75

Glue Catalog compatible materialization

I have an idea for a feature and wonder if it makes sense to be part of dbt-trino or not.

Trino is commonly deployed on AWS with the AWS Glue Catalog; you see people referencing it a lot. AWS Glue Catalog doesn't support renaming tables, so the normal table materialization doesn't work. So anyone using Trino, dbt, and Glue has to override the materialization macro to be able to materialize tables.

The way we do this is to reorder things so that instead of create new, rename old, rename new, drop old, we just do drop old, create new (sketched below). That causes the table to be unavailable for a period of time, but it works as a quick fix. We also have to maintain our custom macro and ship it with every dbt project.
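
In SQL terms, the reordered flow amounts to something like this (catalog, schema, and table names are illustrative):

-- Glue-compatible flow: no rename step involved
drop table if exists "glue"."analytics"."my_model";
create table "glue"."analytics"."my_model" as
select * from "glue"."analytics"."stg_source";  -- stands in for the compiled model SQL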

Could this be looked at as part of dbt-trino? Maybe creating an official "table_norename" materialization, or looking into materialized views, insert overwrite, or something similar? It would make the lives of everyone using Trino with AWS much easier.

Migrate to new framework for adapter testing during v1.1 upgrade

Hey dbt-trino maintainers!

We've been developing a new framework for all integration/functional testing in dbt-core. One of our goals in the initiative has been to replace the suite of basic test cases for adapter plugins.

For now, the framework consists of:

  • Pytest utilities included in dbt-core==1.1.0b1
  • A set of "basic" test cases, at exact parity with the test cases built into the old .dbtspec / pytest-dbt-adapter

I have some draft docs (dbt-labs/docs.getdbt.com#1263) for using the new framework: https://deploy-preview-1263--docs-getdbt-com.netlify.app/docs/contributing/testing-a-new-adapter

It should be much easier to:

  • Override/reimplement test cases (as you've done here)
  • "Opt into" test cases for more-advanced functionality, which we'll be gradually adding into the same package of adapter-facing test cases
  • Add your own functional test cases for Trino-specific behavior, using the same built-in utilities, and reusing project fixtures if appropriate

If it proves helpful, you can also check out the way we've implemented the framework in some of our own adapter plugins:

Thanks for all the great work :)

issue with big seed file

Error while seeding a big CSV file:

Runtime Error in seed com_event_log_info (data/<filename>csv)
  TrinoUserError(type=USER_ERROR, name=QUERY_TEXT_TOO_LARGE, message="Query text length (1380022) exceeds the maximum length (1000000)", query_id=20211116_121112_10193_2mbb4)

The same file could be seeded without problems with the dbt-presto driver.

discussion: dbt-core's feedback from initial review for adapter verification

** would have been a Discussion if they were enabled for this repo **

As part of our new adapter verification process, the dbt Core adapters engineering team and I met today to do a high-level review of the dbt-trino codebase. Good news: your adapter passes the sniff test! That means that while the engineers haven't yet had the time to review every single line of code (though they plan to in the next few sprints), there are no glaring red flags that prevent the adapter from being verified.

That said, here are some notes on things we noticed:

  • It's great to see that Trino works with hardly any SQLAdapter method overrides.
  • It took a moment to realize why TrinoCredentialsFactory needed to exist. But using dataclass parameter validation to guide users to the correct sets of profile fields is a great choice. Perhaps, though, there's a way we can work to make this easier.

    class TrinoCredentialsFactory:
        @classmethod
        def _create_trino_profile(cls, profile):
            if "method" in profile:
                method = profile["method"]
                if method == "ldap":
                    return TrinoLdapCredentials
                elif method == "certificate":
                    return TrinoCertificateCredentials
                elif method == "kerberos":
                    return TrinoKerberosCredentials
                elif method == "jwt":
                    return TrinoJwtCredentials
                elif method == "oauth":
                    return TrinoOauthCredentials
            return TrinoNoneCredentials
  • is there anything unique about Trino today w.r.t. prepared statements & bindings? or is that legacy code?

    @classmethod
    def _escape_value(cls, value):
        """A not very comprehensive system for escaping bindings.
        I think "'" (a single quote) is the only character that matters.
        """
        numbers = (decimal.Decimal, int, float)
        if value is None:
            return "NULL"
        elif isinstance(value, str):
            return "'{}'".format(value.replace("'", "''"))
        elif isinstance(value, numbers):
            return value
        elif isinstance(value, datetime):
            time_formatted = value.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
            return "TIMESTAMP '{}'".format(time_formatted)
        elif isinstance(value, date):
            date_formatted = value.strftime("%Y-%m-%d")
            return "DATE '{}'".format(date_formatted)
        else:
            raise ValueError("Cannot escape {}".format(type(value)))
  • can you give more context on why you needed to parse the DatabaseError message?

    def exception_handler(self, sql):
        try:
            yield
            # TODO: introspect into `DatabaseError`s and expose `errorName`,
            # `errorType`, etc instead of stack traces full of garbage!
        except Exception as exc:
            logger.debug("Error while running:\n{}".format(sql))
            logger.debug(exc)
            raise dbt.exceptions.RuntimeException(str(exc))

For example:

  • do you think that some of the logic in TrinoAdapter.add_query() could upstream into dbt-core?

    def add_query(self, sql, auto_begin=True, bindings=None, abridge_sql_log=False):
        connection = None
        cursor = None

        # TODO: is this sufficient? Largely copy+pasted from snowflake, so
        # there's some common behavior here we can maybe factor out into the
        # SQLAdapter?
        queries = [q.rstrip(";") for q in sqlparse.split(sql)]

        for individual_query in queries:
            # hack -- after the last ';', remove comments and don't run
            # empty queries. this avoids using exceptions as flow control,
            # and also allows us to return the status of the last cursor
            without_comments = re.sub(
                re.compile("^.*(--.*)$", re.MULTILINE), "", individual_query
            ).strip()

            if without_comments == "":
                continue

            parent = super(TrinoConnectionManager, self)
            connection, cursor = parent.add_query(
                individual_query, auto_begin, bindings, abridge_sql_log
            )

        if cursor is None:
            raise dbt.exceptions.RuntimeException(
                "Tried to run an empty query on model '{}'. If you are "
                "conditionally running\nsql, eg. in a model hook, make "
                "sure your `else` clause contains valid sql!\n\n"
                "Provided SQL:\n{}".format(connection.name, sql)
            )

        return connection, cursor

Please let us know if there are any other particular areas you'd like us to review more closely. Looking forward to collaborating!

request for enhanced properties feature

The properties feature implemented recently is very useful, but it would be better if it supported the following use cases:

  1. Combine the properties from the general configuration (dbt_project.yml) and the model-specific configuration. For now, if we want to set transactional=true for all models but partitioning for just a few tables, we must also include transactional=true in every model where we only need to change the partitions, because only one set of properties is used. What we want is to set transactional=true at the general level and only the partitioning in specific models.
  2. The properties should ignore key/value pairs where the key or value (or both) is null. The use case is that we sometimes need to control a property dynamically, depending on other variables. It is easier to use Jinja2 for the key/value here to control whether the pair should be included in the properties. For example, we have two environments and only one needs transactions; utilizing a DBT_TRANSACTION environment variable would let us use the same code in both environments:
+properties:
    transactional: "{{ env_var('DBT_TRANSACTION','true') }}"

Support delete+insert materialization

Describe the feature

Trino at this moment does not support MERGE statements, which is why this delete+insert materialization can be implemented by first using a DELETE statement to remove the records in the target table that are to be updated, and then an INSERT statement to add the new rows.
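
A minimal SQL sketch of the pattern, assuming a unique key named id and the new batch staged in a temporary table (all names are illustrative):

-- 1) delete rows that are about to be replaced
delete from target_table
where id in (select id from new_rows_tmp);

-- 2) insert the fresh versions
insert into target_table
select * from new_rows_tmp;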

Trino can't read table when there is no files on S3

Hi, we have a scenario where the data is written to S3 with a Glue metastore. There is a tricky situation where sometimes tables are created with no data, i.e. the query returns 0 rows, so the table is created but contains no data.
When we try to query this table later we get the error:

SQL Error [16777220]: Query failed (#20220711_175341_09350_zawua): Partition location does not exist: s3://bucket/table...

Is there a way of pointing out to Trino that the table is empty? I have no issues querying this table using Athena.
Thank you

CVE: 0000-0000 found in Jinja2 - Version: 2.11.3 [PYTHON]

Veracode Software Composition Analysis

Attribute Details

Library: Jinja2
Description: A very fast and expressive template engine.
Language: PYTHON
Vulnerability: Cross-Site Scripting (XSS)
Vulnerability description: Jinja2 is vulnerable to cross-site scripting. An attacker is able to inject and execute arbitrary JavaScript through the gettext and ngettext functions due to the lack of output sanitization.
CVE: null
CVSS score: 5.8
Vulnerability present in version(s): 2.5-3.0.0
Found library version(s): 2.11.3
Vulnerability fixed in version:
Library latest version: 3.1.2
Fix: There is no fix version. Apply the following fix:

Links:

dbt-trino can't handle mixed case schema in profiles.yaml

Expected behavior

When using a schema with mixed case, I expect dbt-trino to be able to overwrite my existing tables.

Actual behavior

The first dbt run succeeds, however subsequent runs fail with

Compilation Error in model base_orders (models/base/base_orders.sql)
20:23:53    When searching for a relation, dbt found an approximate match. Instead of guessing 
20:23:53    which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_orders", or rename it to be less ambiguous.

Steps To Reproduce

Use a mixed-case profile, e.g.

my_dbt_trino_project:
  outputs:

    dev:
      type: trino
      method: none  # optional, one of {none | ldap | kerberos}
      user: admin
      database: datalake
      host: localhost
      port: 8080
      schema: analyticsTEST
      threads: 1

  target: dev

Run dbt run multiple times.

Log output/Screenshots

❯ dbt run
20:23:44 Running with dbt=1.2.1
20:23:44 Unable to do partial parsing because profile has changed
20:23:45 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:

  • models.my_dbt_trino_project.example

20:23:45 Found 4 models, 0 tests, 1 snapshot, 0 analyses, 272 macros, 0 operations, 0 seed files, 3 sources, 0 exposures, 0 metrics
20:23:45
20:23:46 Concurrency: 1 threads (target='dev')
20:23:46
20:23:46 1 of 4 START view model analyticsTEST.base_customers ........................... [RUN]
20:23:47 1 of 4 OK created view model analyticsTEST.base_customers ...................... [SUCCESS in 0.30s]
20:23:47 2 of 4 START view model analyticsTEST.base_orders .............................. [RUN]
20:23:47 2 of 4 OK created view model analyticsTEST.base_orders ......................... [SUCCESS in 0.09s]
20:23:47 3 of 4 START view model analyticsTEST.base_payments ............................ [RUN]
20:23:47 3 of 4 OK created view model analyticsTEST.base_payments ....................... [SUCCESS in 0.11s]
20:23:47 4 of 4 START table model analyticsTEST.orders .................................. [RUN]
20:23:49 4 of 4 OK created table model analyticsTEST.orders ............................. [SUCCESS in 2.03s]
20:23:49
20:23:49 Finished running 3 view models, 1 table model in 0 hours 0 minutes and 4.24 seconds (4.24s).
20:23:49
20:23:49 Completed successfully
20:23:49
20:23:49 Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4
❯ dbt run
20:23:53 Running with dbt=1.2.1
20:23:53 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:

  • models.my_dbt_trino_project.example

20:23:53 Found 4 models, 0 tests, 1 snapshot, 0 analyses, 272 macros, 0 operations, 0 seed files, 3 sources, 0 exposures, 0 metrics
20:23:53
20:23:53 Concurrency: 1 threads (target='dev')
20:23:53
20:23:53 1 of 4 START view model analyticsTEST.base_customers ........................... [RUN]
20:23:53 1 of 4 ERROR creating view model analyticsTEST.base_customers .................. [ERROR in 0.01s]
20:23:53 2 of 4 START view model analyticsTEST.base_orders .............................. [RUN]
20:23:53 2 of 4 ERROR creating view model analyticsTEST.base_orders ..................... [ERROR in 0.00s]
20:23:53 3 of 4 START view model analyticsTEST.base_payments ............................ [RUN]
20:23:53 3 of 4 ERROR creating view model analyticsTEST.base_payments ................... [ERROR in 0.00s]
20:23:53 4 of 4 SKIP relation analyticsTEST.orders ...................................... [SKIP]
20:23:53
20:23:53 Finished running 3 view models, 1 table model in 0 hours 0 minutes and 0.25 seconds (0.25s).
20:23:53
20:23:53 Completed with 3 errors and 0 warnings:
20:23:53
20:23:53 Compilation Error in model base_customers (models/base/base_customers.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_customers", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_customers"
20:23:53 Found: "datalake"."analyticstest"."base_customers"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_customers (models/base/base_customers.sql)
20:23:53
20:23:53 Compilation Error in model base_orders (models/base/base_orders.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_orders", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_orders"
20:23:53 Found: "datalake"."analyticstest"."base_orders"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_orders (models/base/base_orders.sql)
20:23:53
20:23:53 Compilation Error in model base_payments (models/base/base_payments.sql)
20:23:53 When searching for a relation, dbt found an approximate match. Instead of guessing
20:23:53 which relation to use, dbt will move on. Please delete "datalake"."analyticstest"."base_payments", or rename it to be less ambiguous.
20:23:53 Searched for: "datalake"."analyticsTEST"."base_payments"
20:23:53 Found: "datalake"."analyticstest"."base_payments"
20:23:53
20:23:53 > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
20:23:53 > called by macro materialization_view_trino (macros/materializations/view.sql)
20:23:53 > called by model base_payments (models/base/base_payments.sql)
20:23:53
20:23:53 Done. PASS=0 WARN=0 ERROR=3 SKIP=1 TOTAL=4

Operating System

macos

dbt version

1.2.2

Trino Server version

396

Python version

3.9.10

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

failure when enabling store_failures with a boolean parameter

Expected behavior

By default, dbt works like this (ref: dbt-labs/dbt-core#1558):

When dbt creates a relation in the db, add a special flag to the relation (defaults to False)
When checking a node name match:

if that flag is present and quoting is disabled, do a lowercase compare
otherwise remain case sensitive

The default behavior fails in the following case:

When store_failures is enabled and a test includes a Boolean parameter, dbt will try to save a test model with the name False. The first save will be fine, but the second will fail because dbt compares the False model with the false model (the one that was already saved).

The error looks like this:

06:21:21  Compilation Error in test accepted_values_model____False__0__1 (models/xxx/model.yml)
06:21:21    When searching for a relation, dbt found an approximate match. Instead of guessing
06:21:21    which relation to use, dbt will move on. Please delete "hive"."lab_dbt_xxx"."accepted_values_model____false__0__1", or rename it to be less ambiguous.
06:21:21    Searched for: "hive"."lab_dbt_xxx"."accepted_values_model____False__0__1"   <-- False is in upper case
06:21:21    Found: "hive"."lab_dbt_xxx"."accepted_values_model____false__0__1"

Actual behavior

No error; the test model (result) should be saved correctly.

Steps To Reproduce

Set store_failures to true and use any test containing a Boolean value, like this:

      - name: some_column_name
        quote: true
        tests:
          - accepted_values:
              values:
                - 0
                - 1
              quote: false

Log output/Screenshots

No response

Operating System

linux

dbt version

1.1.0/1.2.1

Trino Server version

1.1.0/1.2.2

Python version

3.8

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Ability to set session properties

Using the trino (or presto) Python client allows for setting session properties; however, there is currently no way to set these and have them persist into the transaction when running dbt. It also appears the Python client needs this set on the actual connection. If a pre-hook is run, it too runs outside of the transaction, of course, and does not affect the running queries. Having the option to set session properties can significantly increase the performance of pipelines through Trino.

The connection __init__ in trino-python-client does include session_properties:

def __init__(
        self,
        host,
        port=constants.DEFAULT_PORT,
        user=None,
        source=constants.DEFAULT_SOURCE,
        catalog=constants.DEFAULT_CATALOG,
        schema=constants.DEFAULT_SCHEMA,
        session_properties=None,
        http_headers=None,
        http_scheme=constants.HTTP,
        auth=constants.DEFAULT_AUTH,
        redirect_handler=None,
        max_attempts=constants.DEFAULT_MAX_ATTEMPTS,
        request_timeout=constants.DEFAULT_REQUEST_TIMEOUT,
        isolation_level=IsolationLevel.AUTOCOMMIT,
        verify=True
    ):
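
A hedged sketch of the profile-level support being requested (the session_properties mapping is hypothetical until implemented; the property names are illustrative):

dev:
  type: trino
  host: trino.example.com
  port: 443
  database: hive
  schema: analytics
  session_properties:          # hypothetical field, passed through to the connection
    query_max_run_time: 5d
    exchange_compression: true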

Remove the `trino__get_merge_sql` macro

Describe the feature

The following issues prevent us from using the default dbt merge macro:

Once these are resolved we should remove our macro override.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Enable snapshot and merge tests for Galaxy

Describe the feature

Enable snapshot and merge tests for Galaxy once Galaxy is upgraded to 393.
Check the TODOs in the code in test_incremental_schema, test_incremental_merge and test_snapshot.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Authentication with certificate

I'm transferring this issue over, as I previously incorrectly posted to the presto adapter repo.

As far as I can tell, there is no way to pass in a path to a certificate for an SSL connection to the API. I believe this needs to be set on the trino connection that is created.

trino_conn = trino.dbapi.connect(

Support `Password file` authentication type?

I can only find the ldap method for authenticating to a Trino server (with user and password). Does it support a Trino server configured with password file authentication? If not, can you add this support?
I've tried but got the following error:

$ dbt run
05:33:21  Running with dbt=1.0.1
05:33:21  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
05:33:21  
05:33:23  Encountered an error:
Runtime Error
  Runtime Error
    error 401: b'Unauthorized'

Add support for DECIMAL, REAL, DOUBLE, TIME in dbt seed

The dbt seed command does not currently escape some data types supported by Trino.

Perform the changes in the method dbt.adapters.trino.connections.ConnectionWrapper._escape_value (a sketch of the kind of additions follows).
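
A minimal sketch, assuming the _escape_value structure quoted earlier on this page; the literal formats are illustrative, and these branches would need to sit before the generic numbers check, since decimal.Decimal is currently caught there:

# Hypothetical extra branches inside ConnectionWrapper._escape_value
# (requires: import decimal; from datetime import time)
elif isinstance(value, decimal.Decimal):
    return "DECIMAL '{}'".format(value)
elif isinstance(value, time):
    return "TIME '{}'".format(value.strftime("%H:%M:%S.%f")[:-3])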

Make sure to add a corresponding model which contains all these types to the automated tests:

seeds: 
  jaffle_shop:
    seed_data_types:
      +column_types:
        c_timestamp: timestamp
        c_date: date
        c_time: time
        c_char: char(10)
        short_decimal: decimal(10,2)
        long_decimal: decimal(10,2)
        c_real: real
        c_float: float

Check https://docs.getdbt.com/docs/building-a-dbt-project/seeds (the FAQ section as well) to get a better understanding of seeds in dbt. Also perform tests for the seeds (as documented in "How do I test and document seeds?" in the FAQ section) to ensure compatibility with the functionality provided by dbt.

bump version

Is it possible to update the version available for download through PyPI? I am using some Docker images for dbt and would like to be able to pull the updated version of this package during build to get some recent updates. Currently it pulls 0.20.0.

Add comments to catalog macro

Describe the feature

Add comments to the catalog macro, since they are currently returned as NULLs. It will be a slight breaking change, since it will require access to the system catalog in Trino.

Who will this benefit?

Tools that use dbt metadata.

Remove docs about overriding current_timestamp type when Trino's iceberg implementation supports TIMESTAMP(3)

Describe the feature

The following Trino issue, trinodb/trino#6658, requires us to override the current_timestamp macro for the Iceberg catalog.

Trino uses 3 digits of subsecond precision in date and time functions, but Iceberg expects timestamps with 6 digits of subsecond precision.

As a result, we can't create an Iceberg table from Trino:

create table iceberg.examples.test
as select now() as n;
Exception:
Timestamp precision (3) not supported for Iceberg. Use "timestamp(6) with time zone" instead.
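
The documented workaround is an override along these lines (a hedged sketch; the exact macro in the docs may differ):

{# Cast up to the 6-digit precision Trino's Iceberg connector requires #}
{% macro trino__current_timestamp() %}
    cast(current_timestamp as timestamp(6) with time zone)
{% endmacro %}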

Once this issue is resolved, we can remove this documented limitation.

Describe alternatives you've considered

No response

Who will benefit?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Potential issue with the incremental materialization failing when used with a custom schema

I tried using the materialization with presto (not trino) and found that it works perfectly except if a custom schema is used.

This line

select {{dest_cols_csv}} from {{ tmp_relation.include(database=false, schema=false) }};

strips off the database and schema names from the temp relation, causing the query to fail.
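
A hedged sketch of the likely fix: keep the temp relation fully qualified instead of stripping its namespace (whether Presto/Trino needs only the database part dropped is unverified):

select {{ dest_cols_csv }} from {{ tmp_relation }};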

I haven't yet had the chance to test this with trino but I suspect that the behaviour would be the same.

Migrate outstanding tests to new testing framework

dbt-core v1.1 introduced a new testing framework (#57).

In dbt-trino it was introduced by #65, and some tests have already been moved to the new testing approach. Some tests from /docker/dbt/jaffle_shop/ are still using the old setup. The purpose of this issue is to move all outstanding tests to the new testing framework and completely get rid of jaffle_shop. Some tests could be covered by the actively evolving dbt-tests-adapter framework.

Useful links:

Inserts not working with seeds and typed columns

When using seed files and running dbt seed, the generated INSERT SQL does not work, as types are not specified.

For example a seed file with following contents:

data_date_trunc.csv

updated_at,day,month
2018-01-05 12:00:00,2018-01-05,2018-01-01
,,

with following config in dbt_project.yml

seeds:
    data_date_trunc:
      +column_types:
        updated_at: timestamp(3)
        day: date
        month: date

This creates the following CREATE TABLE statement:

CREATE TABLE memory.default.data_date_trunc (
   updated_at timestamp(3),
   day date,
   month date
)

The following INSERT is generated:

insert into "memory"."default"."data_date_trunc" (updated_at, day, month) values
          ('2018-01-05 12:00:00','2018-01-05','2018-01-01'),(NULL,NULL,NULL)

which fails with

io.trino.spi.TrinoException: Insert query has mismatched column types: Table: [timestamp(3), date, date], Query: [varchar(19), varchar(10), varchar(10)]
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:533)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:404)
	at io.trino.sql.tree.Insert.accept(Insert.java:68)
	at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:421)
	at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:384)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:79)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:71)
	at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:254)
	at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:179)
	at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:776)
	at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:132)
	at io.trino.$gen.Trino_369____20220206_204929_2.call(Unknown Source)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

More examples can be run via the tests at https://github.com/innoverio/dbt-trino-utils.
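
A hedged sketch of what a working, typed INSERT would look like, using SQL literal prefixes rather than relying on implicit varchar coercion:

insert into "memory"."default"."data_date_trunc" (updated_at, day, month)
values
  (TIMESTAMP '2018-01-05 12:00:00', DATE '2018-01-05', DATE '2018-01-01'),
  (NULL, NULL, NULL);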

array_agg doesn't work

I use trino->postgresql and can execute

select account_id, array_agg(c_id) from ( select * from postgresql.public.temp_out limit 1000) group by account_id;

but when I run the same in dbt I get:

TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Unsupported column type: array(bigint)")

$ pip freeze | grep dbt
dbt-core==0.21.0
dbt-extractor==0.4.0
dbt-trino==0.21.0
