Giter Club home page Giter Club logo

Comments (19)

bachng2017 avatar bachng2017 commented on June 8, 2024 1

it might be a trino question than a dbt question, @findinpath, do you know if this kind of transaction works in trino ?

(for example seeding sequence looks like this):
transaction begin
insert 1
insert 2
insert 3
commit or rollback

as far as we've tested, if insert 3 fails, the result of insert 1 and 2 still in the seed table.

Maybe the transaction start/commit in dbt-presto/trino is just a place-holder or remains of a copy from other drivers. (btw, with snowflake, the rollback in previous sample seems to clear up the result of all inserts, as expected)

from dbt-trino.

findinpath avatar findinpath commented on June 8, 2024 1

@bachng2017

as far as we've tested, if insert 3 fails, the result of insert 1 and 2 still in the seed table.

Indeed, Trino doesn't support fully ACID semantics.
Trino makes sure that ONE INSERT statement doesn't create corrupt data by inserting data in one or more staging table(s) and then doing on the target database INSERT INTO target_table SELECT * FROM staging_table(s).

See https://www.trinoforum.org/t/insert-performance-on-trino-jdbc-connectors/99/3 for a more detailed explanation about the way that INSERT works on Trino.

from dbt-trino.

bachng2017 avatar bachng2017 commented on June 8, 2024 1

@hashhar , thanks for confirmation. Will file a issue for this.

And yes, that is what we are facing

If I understand correctly any query ran within a transaction that takes > transaction timeout will exhibit this issue?

from dbt-trino.

findinpath avatar findinpath commented on June 8, 2024

dbt is responsible for performing the changes in a transactional fashion.
If you think about it, this is meaningful in order to avoid having corrupt data after running dbt .

@aakashnand the PR trinodb/trino#9846 may be linked to your issue. It will be available only from Trino version 365.

from dbt-trino.

aakashnand avatar aakashnand commented on June 8, 2024

this is meaningful in order to avoid having corrupt data after running dbt

@findinpath even when we are connecting using AUTOCOMMIT?

isolation_level=IsolationLevel.AUTOCOMMIT,

My understanding here is that when we use AUTOCOMMIT, every query is committed as soon as it is finished. In this sense, even if we execute 1000s of queries the end result will not lead to data corruption.

Also, what is the correct solution to solve this issue?

from dbt-trino.

findinpath avatar findinpath commented on June 8, 2024

dbt can execute more than one statement corresponding to a materialization.
Maybe I am wrong, please double check this on dbt slack, but this is the reason why dbt works with transactions.

from dbt-trino.

aakashnand avatar aakashnand commented on June 8, 2024

@findinpath Do you have any solution in mind for this or should we discuss this on slack channel #dbt-presto-trino?
The temporary solution we have is to increase transaction.idle-timeout=8h in config.properties

from dbt-trino.

findinpath avatar findinpath commented on June 8, 2024

@aakashnand I have no better solution than what you just posted.

A slightly different approach is to set the sesssion properties on dbt and affect therefore only the dbt project and not all Trino clients with the transaction idle timeout setting.

my-trino-db:
  target: dev
  outputs:
    dev:
      type: trino
      user: commander
      host: 127.0.0.1
      port: 8080
      database: analytics
      schema: public
      threads: 8
      http_scheme: http
      session_properties:
        ....

from dbt-trino.

hovaesco avatar hovaesco commented on June 8, 2024

I believe transaction.idle-timeout cannot be set as a session property.

from dbt-trino.

aakashnand avatar aakashnand commented on June 8, 2024

As suggested by @hovaesco this property cannot be set using session property so we need to set it in the config.properties 🙁

from dbt-trino.

bachng2017 avatar bachng2017 commented on June 8, 2024

we might have understood what happens but want to find more evidences.

btw @findinpath , could you run this example ?
https://github.com/trinodb/trino-python-client#transactions

somehow it did not work in our env with TWO inserts. ONE insert is fine.
just want to make sure our env. is setup correctly.

from dbt-trino.

bachng2017 avatar bachng2017 commented on June 8, 2024

I think the reason the transaction does not work is on our side. Anyway, we think the reason the time-out occurred at the 1st place might be this:

dbt makes the connection with IsolationLevel.AUTOCOMMIT as default. So when the sql is executed, it creates a request (#a). And when the start_transaction is execute, #b create the transaction in another request instance.
The query and the transaction happens independently. This answers the question why we had the error at the COMMIT but the query has been succeed. And also a simple test with start_transaction -> wait long enough -> commit yields the same error.

We need more test to see how to improve this but there might be 2 options now:

  • use a different isolation level when dbt-trino/presto initializes the connection
  • change the way trino client implements #a to care about current transaction instance, not only isolation level

from dbt-trino.

findinpath avatar findinpath commented on June 8, 2024

@hashhar / @ebyhr can you please take a look at the comment #20 (comment) and help out with feedback?

from dbt-trino.

hashhar avatar hashhar commented on June 8, 2024

@bachng2017 Good catch. that indeed looks like a bug in how transactions are handled in the Trino python client.

Can you file the issue against the Python client with a simple reproduction code so that we can verify that the fix that gets implemented actually works?

If I understand correctly any query ran within a transaction that takes > transaction timeout will exhibit this issue?

from dbt-trino.

bachng2017 avatar bachng2017 commented on June 8, 2024

@hashhar created an issue for this at trino-python-client, pls check
trinodb/trino-python-client#129

from dbt-trino.

hovaesco avatar hovaesco commented on June 8, 2024

PR to address the issue: #30

from dbt-trino.

hovaesco avatar hovaesco commented on June 8, 2024

Merged #30

from dbt-trino.

bachng2017 avatar bachng2017 commented on June 8, 2024

hello @hovaesco
I would like to confirm about the #30 merge (sorry that I should contact earlier but did not have time)

the fix seems to eliminate (pass) all start_transaction/commit/rollback in the old codes with this reason

dbt-trino adapter uses connection in auto-commit mode, hence there is no need to separately start a new transaction in each query since it makes no effect. In auto-commit mode, the client will create a new transaction for each query.

with the old codes, the typical sequence happens like this:

start transaction -> create tmp table -> create real table ->  commit

in new codes, it just

create tmp table -> create real table

so, autocommit for each query might not be sufficient . What do you think ?

the affect is not too big but the concept is very different from the old codes.

from dbt-trino.

hovaesco avatar hovaesco commented on June 8, 2024

Could you please elaborate more on autocommit for each query might not be sufficient ?
Each query is executed with autocommit so each operation is committed immediately after performing it.

from dbt-trino.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.