decodableco / dbt-decodable
A dbt adapter for Decodable
Home Page: https://decodable.co
License: Apache License 2.0
As observed in #35, the current error classification could use improvement. All 409 errors are reported as ResourceAlreadyExists, which is not precise (see the linked issue for an example).
It is also unclear why other error codes (e.g. 403, 415, ...) do not appear here at all.
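One possible direction, sketched below under the assumption of hypothetical exception classes (only ResourceAlreadyExists matches a name the adapter actually uses), is to map each status code to its own error class and to split the ambiguous 409 case by inspecting the message:

```python
# Sketch: map HTTP status codes to distinct adapter exceptions instead of
# labelling every 409 as ResourceAlreadyExists. All class names here are
# hypothetical illustrations, not the adapter's real API.

class DecodableApiError(Exception):
    """Generic fallback carrying the raw status code and payload."""
    def __init__(self, status: int, payload: dict):
        super().__init__(f"HTTP {status}: {payload.get('message', '')}")
        self.status = status
        self.payload = payload

class ResourceAlreadyExists(DecodableApiError): ...
class OperationInProgress(DecodableApiError): ...
class Forbidden(DecodableApiError): ...
class UnsupportedMediaType(DecodableApiError): ...

def classify_error(status: int, payload: dict) -> DecodableApiError:
    # A 409 can mean "resource already exists" OR "conflicting operation
    # in progress" -- distinguish the two by the error message.
    if status == 409:
        msg = payload.get("message", "").lower()
        if "operation is in progress" in msg:
            return OperationInProgress(status, payload)
        return ResourceAlreadyExists(status, payload)
    if status == 403:
        return Forbidden(status, payload)
    if status == 415:
        return UnsupportedMediaType(status, payload)
    return DecodableApiError(status, payload)
```

The generic fallback means unknown status codes still surface with their raw payload instead of being mislabelled.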
Trying to run a materialized test leads to the following error:
12:09:48 Completed with 1 error and 0 warnings:
12:09:48
12:09:48 Compilation Error in macro decodable__create_table_as (macros/materializations/table/create_table_as.sql)
12:09:48 macro 'dbt_macro__decodable__create_table_as' takes not more than 3 argument(s)
12:09:48
12:09:48 > in macro create_table_as (macros/materializations/models/table/create_table_as.sql)
12:09:48 > called by macro statement (macros/etc/statement.sql)
12:09:48 > called by macro materialize_test_as_table (macros/materializations/test/test_as_table.sql)
12:09:48 > called by macro materialization_test_decodable (macros/materializations/test/test.sql)
12:09:48 > called by macro decodable__create_table_as (macros/materializations/table/create_table_as.sql)
Looking at the code, the create_table_as macro takes only 3 arguments, while the test_as_table macro tries to call it with 4.
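One low-risk direction for a fix (a sketch only; the parameter names, in particular `language`, are assumptions based on dbt 1.3's default macros, not the adapter's actual signature) is to give the fourth parameter a default so both 3- and 4-argument call sites keep working:

```sql
{# Sketch of a possible fix -- parameter names are illustrative, not the
   macro's actual signature. A default for the extra parameter keeps
   existing 3-argument call sites working while allowing the 4-argument
   call from materialize_test_as_table. #}
{% macro decodable__create_table_as(temporary, relation, sql, language=none) %}
  {# ...existing macro body unchanged... #}
{% endmacro %}
```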
There is currently no example for a seed. A basic .csv should be added to the existing example project, or a new example project should be created.
I have the following basic test_seed.csv file in the seeds folder:
field1,field2,field3
value11,value21,1
value12,value22,2
I ran dbt seed twice (testing for issue 33). The second time, I received this error:
15:50:53 Unhandled error while executing seed.example.test_seed
Decodable: ResourceAlreadyExists: {'timestamp': '2024-01-14T15:50:53.806+00:00', 'message': 'Could not change the job state from [STARTING] to [STOPPING], probably because an existing operation is in progress. Please wait for the operation to complete.'}
At the same time, the old records were gone and the stream was populated with two new records. So it looks like the second run tried to stop the connection while it was still in the STARTING state (this is where the unexpected behavior starts). While it is certainly unexpected that the connection already produced records while still being in the STARTING state, the dbt client should wait until it observes the RUNNING state before deactivation.
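The waiting step could look roughly like the following sketch. The helper, the injected `get_state` callable, and the state names are assumptions for illustration, not adapter API:

```python
import time

def wait_for_state(get_state, target="RUNNING", timeout_s=300, poll_s=2.0):
    """Poll a resource's state until it reaches `target`.

    `get_state` is an injected callable returning the current state
    string; the helper and the state names are illustrative, not the
    adapter's actual code.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state == target:
            return state
        if state == "FAILED":
            raise RuntimeError(f"unexpected state {state} while waiting for {target}")
        time.sleep(poll_s)
    raise TimeoutError(f"resource did not reach {target} within {timeout_s}s")
```

With such a helper, the seed materialization could activate the connection, wait until it observes RUNNING, and only then issue the deactivation, avoiding the STARTING-to-STOPPING transition the API rejects.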
Side note: Looks like all 409 errors are classified as ResourceAlreadyExists. That's not precise, as this example shows. Created #36 to cover that.
This fails:
dbt run
16:35:10 Running with dbt=1.3.3
16:35:10 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 273 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
16:35:10
16:35:11 Encountered an error:
Runtime Error
Status code: 403. Decodable connection failed. Try running 'decodable login' first
But I am definitely logged in; e.g. decodable connection list shows me all connections.
I created test_seed.csv in the seeds folder with the following content:
field1,field2,field3
value11,value21,1
value12,value22,2
Then I ran dbt seed, which created and populated the stream dbt_demo__test_seed as expected (where dbt_demo is my local namespace).
However, when running dbt seed --full-refresh afterwards, I get this error:
15:41:12 Unhandled error while executing seed.example.test_seed
Decodable: ResourceAlreadyExists: {'timestamp': '2024-01-14T15:41:12.112+00:00', 'message': 'Unable to delete stream id [cad36ab9] as it is referenced by connection [ff748cbd].'}
Non-materialized tests currently run a SQL preview and only produce output once the preview is complete. As of now, this can take a while and appear unresponsive (75s in some basic tests I did). It would be good to periodically output something to inform the user of the current state ("Submitted preview query", "Waiting for results", ...).
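The logging pattern could be as simple as the sketch below. The injected `submit`, `fetch_status`, and `fetch_results` callables stand in for the (hypothetical) preview client calls; only the periodic-output idea is the point:

```python
import time

def run_preview_with_progress(submit, fetch_status, fetch_results,
                              poll_s=5.0, log=print):
    """Run a SQL preview while keeping the user informed.

    The three callables are illustrative stand-ins for the preview API,
    not the adapter's actual client methods.
    """
    preview_id = submit()
    log(f"Submitted preview query (id={preview_id})")
    started = time.monotonic()
    while (status := fetch_status(preview_id)) != "COMPLETED":
        log(f"Waiting for results... (state={status}, "
            f"{time.monotonic() - started:.0f}s elapsed)")
        time.sleep(poll_s)
    log("Preview complete, fetching results")
    return fetch_results(preview_id)
```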
Similar to #26, stream clear should now be handled via the data plane.
If the SQL of a model changes(*) and dbt run is executed, then both the target stream and the pipeline are deleted and recreated - even if the stream schema matches the SQL. There should be no need to delete and recreate the stream if the schema matches.
(*) Changes can be as trivial as added whitespace. Maybe this could be an issue in itself, though I am not sure how hard it would be to implement a good solution.
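The decision could hinge on a schema comparison rather than on the SQL text. A minimal sketch, assuming schemas are modelled as ordered (name, type) pairs (an illustrative model, not the adapter's internal representation):

```python
def stream_needs_recreate(existing_schema, desired_schema):
    """Decide whether a target stream must be dropped and recreated.

    Schemas are modelled as ordered lists of (name, type) pairs -- an
    illustrative model only. A changed SQL text alone (e.g. added
    whitespace) would then at most update the pipeline, while only a
    real schema difference forces recreating the stream.
    """
    return list(existing_schema) != list(desired_schema)
```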
Import the internal code from the dbt adapter to this public repository.
When adding the following pipeline via dbt:
SELECT p.category, sum(o.price)
FROM gunnar_orders o
LEFT JOIN gunnar_products p ON o.product_id = p.id
LEFT JOIN gunnar_shipments s ON s.order_id = o.order_id
GROUP BY p.category
I'm getting this error:
(dbt-venv) ➜ example git:(main) ✗ dbt run
12:01:14 Running with dbt=1.3.3
12:01:14 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 273 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
12:01:14
12:01:16 Concurrency: 1 threads (target='dev')
12:01:16
12:01:16 1 of 1 START sql table model gunnar_orders_aggregated .......................... [RUN]
12:01:19 1 of 1 ERROR creating sql table model gunnar_orders_aggregated ................. [ERROR in 2.71s]
12:01:19
12:01:19 Finished running 1 table model in 0 hours 0 minutes and 5.28 seconds (5.28s).
12:01:19
12:01:19 Completed with 1 error and 0 warnings:
12:01:19
12:01:19 Compilation Error in model gunnar_orders_aggregated (models/example/gunnar_orders_aggregated.sql)
12:01:19 Error checking changes to the 'gunnar_orders_aggregated' stream: Compilation Error
12:01:19 Type 'STRING PRIMARY KEY' not recognized
12:01:19
12:01:19 > in macro materialization_table_decodable (macros/materializations/table/table.sql)
12:01:19 > called by model gunnar_orders_aggregated (models/example/gunnar_orders_aggregated.sql)
12:01:19
12:01:19 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
Currently, the pipeline and stream corresponding to a dbt table model always share the same name (as mentioned in the docs: https://github.com/decodableco/dbt-decodable#materializations). While this is fine as default behavior, it would be good to make the stream name configurable.
According to the docs, pipelines should be activated as part of dbt run: https://github.com/decodableco/dbt-decodable#materializations. However, this doesn't happen if the pipeline already exists but is in the STOPPED state.
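The expected behaviour can be sketched as follows; the injected callables and state names are assumptions for illustration, not the adapter's actual client calls:

```python
def ensure_pipeline_running(get_state, create, activate):
    """Make sure a pipeline exists and is active after `dbt run`.

    `get_state` returns the pipeline's state, or None if it does not
    exist; all three callables are illustrative stand-ins.
    """
    state = get_state()
    if state is None:
        create()
        activate()
        return "created-and-activated"
    if state == "STOPPED":
        # This is the branch the current adapter appears to skip:
        # an existing-but-stopped pipeline is left stopped.
        activate()
        return "reactivated"
    return "already-running"
```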
In order to prevent accidental deletion of existing streams, and to make users aware of the namespace feature, we should add a namespace to the Getting Started guide.
The adapter currently supports tests, but the example project does not contain any. We should add both singular and generic tests to the example project (if generic tests are possible without major effort). See https://docs.getdbt.com/docs/build/tests.
Getting this error:
dbt run
16:20:57 Running with dbt=1.3.3
16:20:57 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 273 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
16:20:57
16:20:58 Encountered an error:
Undefined 'default in decodable profile file ~/.decodable/auth
16:20:58 Traceback (most recent call last):
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/main.py", line 135, in main
results, succeeded = handle_and_check(args)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/main.py", line 198, in handle_and_check
task, res = run_from_args(parsed)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/main.py", line 245, in run_from_args
results = task.run()
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/task/runnable.py", line 472, in run
result = self.execute_with_hooks(selected_uids)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/task/runnable.py", line 434, in execute_with_hooks
self.before_run(adapter, selected_uids)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/task/run.py", line 426, in before_run
self.populate_adapter_cache(adapter, required_schemas)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/task/runnable.py", line 409, in populate_adapter_cache
adapter.set_relations_cache(self.manifest)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 437, in set_relations_cache
self._relations_cache_for_schemas(manifest, required_schemas)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 414, in _relations_cache_for_schemas
for relation in future.result():
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
return self.__get_result()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
raise self._exception
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/utils.py", line 480, in connected
return func(*args, **kwargs)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/adapters/decodable/impl.py", line 348, in list_relations_without_caching
stream_list: List[Dict[str, Any]] = self._client().list_streams().items
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/adapters/decodable/impl.py", line 641, in _client
self.get_thread_connection().handle
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/contracts/connection.py", line 94, in handle
self._handle.resolve(self)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/contracts/connection.py", line 116, in resolve
return self.opener(connection)
File "demo/dbt-venv/lib/python3.9/site-packages/dbt/adapters/decodable/connections.py", line 114, in open
client = DecodableClientFactory.create_client(
File "demo/dbt-venv/lib/python3.9/site-packages/decodable/client/client_factory.py", line 34, in create_client
raise Exception(
Exception: Undefined 'default in decodable profile file ~/.decodable/auth
My ~/.decodable/config looks like this (note the active profile is "decodable-demo"):
version: 1.0.0
active-profile: decodable-demo
profiles:
  decodable-demo:
    account: decodable-demo
  default:
    account: decodable-test
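The expected resolution, falling back to the file's active-profile instead of assuming a profile literally named "default", can be sketched like this (operating on the parsed config as a dict; this logic is an illustration of the expected behaviour, not the client's actual code):

```python
def resolve_profile(config, requested=None):
    """Pick the profile to use from a parsed ~/.decodable/config.

    Falls back to the file's `active-profile` rather than assuming a
    profile literally named "default". `config` is the parsed YAML as
    a dict; this is a sketch, not the client's actual code.
    """
    profiles = config.get("profiles", {})
    name = requested or config.get("active-profile") or "default"
    if name not in profiles:
        raise KeyError(
            f"profile '{name}' not found in decodable config "
            f"(available: {', '.join(sorted(profiles))})"
        )
    return profiles[name]
```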
I triggered a full cleanup using dbt run-operation cleanup. It failed because I had a seed and a materialized test that depended on the stream created by that seed.
Maybe simply switching the order in the cleanup operation will fix this issue.
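The ordering idea amounts to deleting dependents before their dependencies: streams must go last, since pipelines and seed connections reference them. A sketch (the three callables are illustrative stand-ins for the cleanup's delete steps):

```python
def cleanup(delete_pipelines, delete_connections, delete_streams):
    """Full-cleanup sketch: delete dependents before dependencies.

    Streams go last because pipelines (including materialized tests)
    and seed connections reference them. The callables are
    illustrative stand-ins for the operation's actual delete calls.
    """
    order = []
    for step, fn in [("pipelines", delete_pipelines),
                     ("connections", delete_connections),
                     ("streams", delete_streams)]:
        fn()
        order.append(step)
    return order
```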
Query in dbt:
select o.order_id, o.price, p.name, p.category, s.*
from gunnar_orders o
inner join gunnar_products p on o.product_id = p.product_id
left join gunnar_shipments s on s.order_id = o.order_id
Failure:
15:58:29 Decodable: InvalidRequest: {'timestamp': '2023-03-24T15:58:29.641+00:00', 'status': 400, 'error': 'Bad Request', 'message': "Invalid pipeline. Reason: Table sink 'default_catalog.default_database.gunnar_orders_joined' doesn't support consuming update and delete changes which is produced by node Join(joinType=[LeftOuterJoin], where=[(order_id0 = order_id)], select=[order_id, price, name, category, shipment_id, order_id0, origin, destination, is_arrived], leftInputSpec=[NoUniqueKey], rightInputSpec=[HasUniqueKey])", 'path': '/v1alpha2/pipelines'}
Example:
10:25:49 Error creating the mystream stream: Compilation Error
10:25:49 Type 'ROW<`id` STRING NOT NULL, `created` TIMESTAMP(3) NOT NULL> NOT NULL' not recognized
This one works in the UI but not via dbt:
select
after.product_id,
before.name as old_name,
after.name as new_name,
to_timestamp_ltz(ts_ms, 3) as update_time
from table(to_append(`gunnar_products`))
where op = 'u'
and before.name <> after.name
17:02:53 Completed with 1 error and 0 warnings:
17:02:53
17:02:53 Compilation Error in model gunnar_product_name_changes (models/example/gunnar_product_name_changes.sql)
17:02:53 Error creating the gunnar_product_name_changes stream: Compilation Error
17:02:53 Type 'TIMESTAMP(3) WITH LOCAL TIME ZONE' not recognized
17:02:53
17:02:53 > in macro decodable__create_table_as (macros/materializations/table/create_table_as.sql)
17:02:53 > called by macro create_table_as (macros/materializations/models/table/create_table_as.sql)
17:02:53 > called by macro statement (macros/etc/statement.sql)
17:02:53 > called by macro materialization_table_decodable (macros/materializations/table/table.sql)
17:02:53 > called by model gunnar_product_name_changes (models/example/gunnar_product_name_changes.sql)
The introduction of BYOC to Decodable brought along a change in architecture - a much clearer split between the control plane and data plane.
Previously, previews were started and fetched directly from the control plane API. After this change, authorization tokens have to be obtained from the control plane, while the preview is started and fetched from the data plane API.
See https://docs.decodable.co/reference/createpreviewtokens, https://docs.decodable.co/reference/createsqlpreview, https://docs.decodable.co/reference/getsqlpreview.
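The resulting flow is a two-step handshake, sketched below with injected callables standing in for the three documented endpoints (createpreviewtokens on the control plane, createsqlpreview and getsqlpreview on the data plane); the state names and response shape are assumptions:

```python
def run_preview(get_preview_token, start_preview, get_preview, max_polls=60):
    """Two-step preview flow under the control/data-plane split.

    1. Obtain a short-lived preview token from the control plane.
    2. Start and poll the preview against the data plane using it.
    The callables are illustrative stand-ins for the documented
    endpoints; the "COMPLETED" state and "data" key are assumptions.
    """
    token = get_preview_token()                  # control plane
    preview_id = start_preview(token)            # data plane
    for _ in range(max_polls):
        result = get_preview(token, preview_id)  # data plane
        if result.get("state") == "COMPLETED":
            return result["data"]
    raise TimeoutError("preview did not complete")
```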