elementary-data / dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Home Page: https://www.elementary-data.com/

License: Apache License 2.0

Languages: Python 99.90%, Shell 0.10%
Topics: data, dbt-packages, data-reliability, data-pipeline-monitoring, data-pipelines, analytics, analytics-engineering, data-observability, data-lineage, dbt

dbt-data-reliability's People

Contributors

arun-kc, avivzikeldy, civitaspo, dapollak, ecalzo, ekatia, ellakz, elongl, frannydelaney, github-actions[bot], haritamar, idoneshaveit, kkprab, kouridis, maayan-s, mossyyy, mossyyyy, noakurman, noyaarie, ofek1weiss, oravi, pei0804, roitabach, smitsrr, suelai, syou6162, tc-chrisbui, theromsh, tosborne-slalom, web-flow


dbt-data-reliability's Issues

Incorrect name for athena__generate_elementary_profile_args in macros/utils/cross_db_utils/generate_elementary_profile_args.sql

Testing out the Athena update for Elementary.
When running the dbt run-operation elementary.generate_elementary_cli_profile command, an error is returned:
Adapter "athena" is not supported on Elementary.

Zooming in on the macro that causes this shows that the Athena macro is named athena__generate_elementary_cli_profile, which I think is incorrect; it should be athena__generate_elementary_profile_args.
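
For context, dbt-style adapter dispatch resolves macros named <adapter>__<macro_name>, so the fix should just be a rename. A minimal sketch, assuming the macro body stays unchanged (the argument list here is illustrative, not the package's actual signature):

    {# The dispatching macro looks up <adapter>__generate_elementary_profile_args,
       so the Athena implementation must carry exactly that name. #}
    {% macro athena__generate_elementary_profile_args(method) %}
        {# ...existing body of athena__generate_elementary_cli_profile... #}
    {% endmacro %}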

dbt_invocations model > job_name column showing up as 'null'

Issue description
Hello, we have a client who is using the Elementary dbt package in dbt Cloud and reported that they are not getting a value for the job_name column in the dbt_invocations model. The entire column just has null values.

We tested and it seemed to be the case for us as well. Are we missing a config somewhere in order to have this value returned?

Steps to reproduce the behavior:

  1. Install the package in dbt Cloud's IDE as per the Quick Start instructions here
  2. Run a production job
  3. Query the data warehouse (in this case Snowflake) for the dbt_invocations model
  4. See that the job_name column has no value

Screenshots
[screenshot: dbt_invocations, job_name column]

Other documents
I have also attached the CSV download of the model here - dbt_invocations (test).csv, for reference.

Additional comments
On the Elementary docs here, it seems this data should be populated automatically if we are using an orchestrator that Elementary natively supports (which we are, with dbt Cloud).

Version
dbt package version: 0.14.1

Thank you!

[FEATURE] support postgresql

Thanks for all the work involved in creating a cool tool!

I would love to see support for "postgres".

  • OR -

I have set up my dbt profiles.yml with type "postgres". Elementary should check the current/active dbt profile for the configured type and throw an exception / log an error if the current type is unsupported.
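
As a rough sketch of that guard (the macro name and adapter list here are hypothetical, not Elementary's actual code), something run from the package's hooks could fail fast on unsupported adapters:

    {% macro assert_supported_adapter() %}
        {# Hypothetical list; the package would maintain its real set of adapters. #}
        {% set supported = ['snowflake', 'bigquery', 'redshift', 'databricks', 'spark'] %}
        {% if target.type not in supported %}
            {{ exceptions.raise_compiler_error("Adapter '" ~ target.type ~ "' is not supported by Elementary.") }}
        {% endif %}
    {% endmacro %}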

Reproduce

  1. Install Postgres 14 and configure your dbt profiles.yml to point to it
  2. Run dbt run -s elementary
    OR
  3. Run dbt test. I have a test set up which uses the accepted_values built-in test:
      - name: c_mktsegment
        tests:
          - not_null
          - accepted_values:
              values: ['MACHINERY', 'AUTOMOBILE', 'BUILDING', 'FURNITURE', 'HOUSEHOLD' ]

Log of step 2:

11:46:03  8 of 21 START table model dev_elementary.dbt_tests ............................. [RUN]
11:46:04  8 of 21 ERROR creating table model dev_elementary.dbt_tests .................... [ERROR in 0.51s]
11:46:04  9 of 21 START incremental model dev_elementary.elementary_test_results ......... [RUN]
[...]
11:46:05  13 of 21 START view model dev_elementary.metrics_anomaly_score ................. [RUN]
11:46:05  Warning: the `current_timestamp` macro is deprecated and will be removed in a future version of the package, once equivalent functionality is implemented in dbt Core. The elementary.metrics_anomaly_score model triggered this warning.
11:46:05  13 of 21 ERROR creating view model dev_elementary.metrics_anomaly_score ........ [ERROR in 0.11s]
[...]
11:46:05  Running 2 on-run-end hooks
11:46:05  1 of 2 START hook: elementary.on-run-end.0 ..................................... [RUN]
11:46:05  1 of 2 OK hook: elementary.on-run-end.0 ........................................ [OK in 0.00s]
11:46:06  Database error while running on-run-end
11:46:06  Encountered an error:
Database Error
  syntax error at or near "dummy_string"
  LINE 6: ...t\n            \n                \n        cast(\'dummy_stri...
[...]
dbt.exceptions.DatabaseException: Database Error
  syntax error at or near "dummy_string"
  LINE 6: ...t\n            \n                \n        cast(\'dummy_stri...

Log of step 3:

11:48:03  Running 2 on-run-end hooks
11:48:04  Database error while running on-run-end
11:48:04  Encountered an error:
Database Error
  syntax error at or near "MACHINERY"
  LINE 6: ...rom all_values\nwhere value_field not in (\n    \'MACHINERY\...
                                                               ^
11:48:04  Traceback (most recent call last):
  File ".conda\envs\dbt-play\lib\site-packages\dbt\adapters\postgres\connections.py", line 65, in exception_handler
    yield
  File ".conda\envs\dbt-play\lib\site-packages\dbt\adapters\sql\connections.py", line 70, in add_query
    cursor.execute(sql, bindings)
psycopg2.errors.SyntaxError: syntax error at or near "MACHINERY"
LINE 6: ...rom all_values\nwhere value_field not in (\n    \'MACHINERY\...

dbt_utils dependency is too strict

Hi! In reviewing this repo before adding it to the dbt hub, I had a look at your packages.yml and dbt_project.yml files.

Since you're pinning your require-dbt-version to [>= 0.20, < 1.1], locking your dependency to dbt-utils 0.7.6 exactly is too strict. Earlier 0.7.x versions are compatible with 0.20.x of dbt Core, and v0.8.x is compatible with 1.x.

I'd suggest changing your require-dbt-version to [">0.20.0", "<2.0.0"] and dbt_utils version to [">=0.7.0", "<0.9.0"], otherwise some users who install your package may run into dependency issues.

Let me know if you've got any questions πŸš€

on-run-end failed, error: value too long for type character varying(65535)

Hi all! πŸ‘‹

While running dbt test we got an error saying:

on-run-end failed, error:
 value too long for type character varying(65535)

After digging a bit, I found that the culprit is one of our unit tests for some macros, where we generate the test code using Jinja to assert several conditions.
The compiled code ends up in a really long file. I found that your on-run-end hook calls several macros, but I got a bit lost in there. I guess that one of these macros is trying to upload the compiled code into a VARCHAR or TEXT column (in Redshift), causing it to overflow (since the max size for a VARCHAR in Redshift is 65535).

Some more details about my env:

I'm running on Redshift using dbt core.

dbt version: 1.7.4
dbt-redshift: 1.7.1
elementary dbt-package: 0.13.0
edr version: 0.13.2

I'm just posting the issue and don't want to push any particular solution before it's confirmed this isn't a one-off, but an easy fix would be to truncate every string being saved to the maximum size of the VARCHAR or TEXT type for the adapter (a sketch follows below).
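
To illustrate the idea, a minimal sketch (hypothetical macro, not the package's code; note that Redshift's 65535 limit is in bytes, so multi-byte text would need a byte-aware cut):

    {% macro truncate_to_adapter_max(value) %}
        {# Cap strings before insertion; pass everything else through unchanged. #}
        {% if target.type == 'redshift' and value is string %}
            {{ return(value[:65535]) }}
        {% endif %}
        {{ return(value) }}
    {% endmacro %}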

On-run-end macros for storing dbt_tests throws an error

Hello,

I recently decided to try elementary-data for observability, but stumbled upon a problem: BigQuery throws the error Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.

This could be expected, because as of now we have ~1,600 tests, and trying to insert all that metadata about each of them seems to be too much for BigQuery.

I was wondering whether we could retrieve test resources from manifest.json based on the name that's provided after the -s flag (a sketch follows after this message). There should be multiple checks, I suppose, based on the different possibilities for model selection (concrete name, tag, etc.).

Do you think that could be a viable solution for monorepo lovers out here?

Many thanks,
Justas
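
Edit: a minimal sketch of the selection-aware idea, using dbt's built-in selected_resources context variable (macro name illustrative); the on-run-end upload could then be limited to these IDs instead of the whole manifest:

    {% macro get_selected_test_ids() %}
        {# selected_resources holds the unique IDs of the nodes picked by -s/--select. #}
        {% set test_ids = [] %}
        {% for unique_id in selected_resources %}
            {% if unique_id.startswith('test.') %}
                {% do test_ids.append(unique_id) %}
            {% endif %}
        {% endfor %}
        {{ return(test_ids) }}
    {% endmacro %}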

Test failing to run since upgrading to 0.13

Hey,

We're seeing a couple of tests failing to run since upgrading from 0.11.2 to 0.13. The tests in question have to use dot notation in the timestamp column to check volume anomalies, but this has been failing since the upgrade. FYI, we're using Databricks as our destination in dbt.

The test is displayed below.

      - name: transcription_jobs
        tests:
        - elementary.volume_anomalies:
            timestamp_column: updated_at.member0
            backfill_days: 1
            config:
              severity: warn
        - dbt_expectations.expect_table_row_count_to_be_between:
            min_value: 400000

I've also tried quoting the timestamp column, like timestamp_column: "updated_at.member0". When doing this I receive a Unicode error, which may or may not be related to this recent change on the Python version of Elementary.

elementary column anomalies tests fail on redshift with error 'Relation name is longer than 127 characters'

Describe the bug
elementary column_anomalies tests are unable to run on Redshift due to table naming length restrictions

To Reproduce
Steps to reproduce the behavior:

  1. Add elementary.column_anomalies tests to columns in a dbt model that builds in Redshift:
columns:
      - name: id
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_count
                - missing_count
                - average
        description: ''
        meta:
          hidden: true
  2. Test the model with dbt test -s my_model

The following error occurs:

Runtime Error in test elementary_column_anomalies_fact_table_null_count__missing_count__average__success_count (models/marts/fact_table.yml)
Relation name '"test_7ffaec620b_elementary_column_anomalies_fact_table_null_count__missing_count__av__metrics__tmp_20240423000522692548"' is longer than 127 characters

Expected behavior
Tests run successfully and elementary is able to create temp tables

Environment (please complete the following information):

  • dbt package Version: 0.14.1

Additional context
dbt running in prod redshift

Instruct users to explicitly override `view` and `incremental` materializations

Hey Elementary team!

Change to materialization search order

We're making a change to how dbt searches for implementations of the "builtin" materializations (view, table, etc). Previously, dbt would (implicitly) prefer adapter-specific implementations of materializations defined in packages over the out-of-the-box implementations within dbt, without the user necessarily knowing. We're changing this to be more explicit and less surprising.

I see that this package reimplements the table and incremental materializations. Starting with the latest patch releases of dbt Core, v1.6.14 and v1.7.14, users will see a deprecation message like the following:

$ dbt run -s my_table_model
16:41:27  Running with dbt=1.7.14
16:41:27  Registered adapter: snowflake=1.7.3
16:41:27  Found 31 models, 2 operations, 0 sources, 0 exposures, 0 metrics, 1232 macros, 0 groups, 0 semantic models
16:41:27
16:41:31
16:41:31  Running 1 on-run-start hook
16:41:31  1 of 1 START hook: elementary.on-run-start.0 ................................... [RUN]
16:41:31  1 of 1 OK hook: elementary.on-run-start.0 ...................................... [OK in 0.00s]
16:41:31
16:41:31  Concurrency: 8 threads (target='dev')
16:41:31
16:41:31  1 of 1 START sql table model dbt_jcohen_dev.my_table_model ..................... [RUN]
16:41:31  [WARNING]: Installed package 'elementary' is overriding the built-in materialization 'table'. Overrides of built-in materializations from installed packages will be deprecated in future versions of dbt. Please refer to https://docs.getdbt.com/reference/global-configs/legacy-behaviors#require_explicit_package_overrides_for_builtin_materializations for detailed documentation and suggested workarounds.
16:41:33  1 of 1 OK created sql table model dbt_jcohen_dev.my_table_model ................ [SUCCESS 1 in 2.32s]

Proposal

I'd recommend updating the "Quickstart" guide in your documentation & package README, adding a step where users explicitly opt into the materialization overrides by adding the following code to a macros/ file in their root projects:

{% materialization table, adapter='snowflake' %}
  {{ return(elementary.materialization_table_snowflake()) }}
{% endmaterialization %}

{% materialization incremental, adapter='snowflake' %}
  {{ return(elementary.materialization_incremental_snowflake()) }}
{% endmaterialization %}

(Replacing snowflake with bigquery, default, etc as appropriate)

In the meantime, users can opt out of this behavior by setting this flag explicitly in their project files:

# dbt_project.yml
flags:
  require_explicit_package_overrides_for_builtin_materializations: False

They will continue to see the deprecation warning.

Timeline

This flag was added (opt-in, disabled by default) in dbt Core v1.6.14 + v1.7.14.

The default value of this flag will switch from False to True in dbt Core v1.8.0 (release candidate; final release planned for ~May 8) and dbt Cloud (~May 15).


Error `The query is too large. The maximum standard SQL query length is 1024.00K characters` when using insert_rows_method `chunk`

Hi guys!

I added the chunk_size arg when calling insert_rows to fix the error Maximum number of resources referenced per query are 1,000 resources several days ago (#669).

However, a new error has occurred: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.

It appears that the compiled_code was too long when running the following query, even though we only have 400 records to insert.

insert into `my-project`.`elementary`.`dbt_run_results`
         (model_execution_id,unique_id,invocation_id,generated_at,created_at,name,message,status,resource_type,execution_time,execute_started_at,execute_completed_at,compile_started_at,compile_completed_at,rows_affected,full_refresh,compiled_code,failures,query_id,thread_id,materialization,adapter_response) values ...

Is there a way to make max_query_size work even when using insert_rows_method chunk?
If you have any ideas, please let me know, I can submit a PR.
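
For example, a rough sketch of what I have in mind (names are illustrative, not the package's actual code): split the rendered rows by a character budget as well as by dbt_artifacts_chunk_size, so no single INSERT can exceed the query length limit:

    {% macro split_rows_by_size(rendered_rows, max_rows=500, max_chars=900000) %}
        {# rendered_rows: already-rendered "(val1, val2, ...)" tuples.
           Start a new chunk when either the row count or the character
           budget for one INSERT statement would be exceeded. #}
        {% set chunks = [] %}
        {% set ns = namespace(current=[], size=0) %}
        {% for row in rendered_rows %}
            {% if ns.current and (ns.current | length >= max_rows
                                  or ns.size + (row | length) > max_chars) %}
                {% do chunks.append(ns.current) %}
                {% set ns.current = [] %}
                {% set ns.size = 0 %}
            {% endif %}
            {% do ns.current.append(row) %}
            {% set ns.size = ns.size + (row | length) %}
        {% endfor %}
        {% if ns.current %}
            {% do chunks.append(ns.current) %}
        {% endif %}
        {{ return(chunks) }}
    {% endmacro %}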

Thank you

Environment

- dbt=1.7.8
- adapter: bigquery=1.7.4
- elementary-data/elementary=0.14.1

Variables

elementary:
    "insert_rows_method": "chunk"
    "dbt_artifacts_chunk_size": 500

Feature request: Add installed packages information

I'm not sure where this fits best, maybe in the invocation model, maybe as a new model, but I would find it useful to know what versions of packages were installed on a given run of my project. Since v1.7, dbt has produced the package-lock.yml file, which may contain the relevant information, but there may be a way to identify this in older versions as well.

[BUG] elementary does not appear to work if Unity Catalog not configured

Context

Recently we were given the option to enable UC in our Databricks workspaces. I haven't enabled it in our workspace (yet) and thus am technically still using the hive_metastore. I started seeing this error - not sure if it's correlated.

Note: as you've probably figured by now, this is being done on Databricks.

Steps to reproduce

  1. Set up Databricks SQL warehouse
  2. Follow steps here to: set up App Registration; set up profiles.yml in your dbt project
  3. Set up elementary as per normal
    1. packages.yml
    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1
      - package: dbt-labs/dbt_external_tables
        version: 0.8.5
      - package: elementary-data/elementary
        version: 0.9.3
        ## Docs: https://docs.elementary-data.com
      - package: dbt-labs/audit_helper
        version: 0.9.0
    
    2. dbt_project.yml
    name: 'dbt_demo'
    version: '1.0.0'
    config-version: 2
    
    #Global Variables for the Project
    vars:
      unknown_key: -1
    
    # This setting configures which "profile" dbt uses for this project.
    profile: 'dbt_demo'
    
    # These configurations specify where dbt should look for different types of files.
    # The `model-paths` config, for example, states that models in this project can be
    # found in the "models/" directory. You probably won't need to change these!
    model-paths: ["dbt_demo/models"]
    analysis-paths: ["dbt_demo/analyses"]
    test-paths: ["dbt_demo/tests"]
    seed-paths: ["dbt_demo/seeds"]
    macro-paths: ["dbt_demo/macros"]
    snapshot-paths: ["dbt_demo/snapshots"]
    
    clean-targets:         # directories to be removed by `dbt clean`
      - "dbt_demo/target"
      - "dbt_demo/dbt_packages"
    
    
    # Configuring models
    # Full documentation: https://docs.getdbt.com/docs/configuring-models
    
    # In this example config, we tell dbt to build all models in the example/
    # directory as views. These settings can be overridden in the individual model
    # files using the `{{ config(...) }}` macro.
    models:
      dbt_demo:
        # Config indicated by + and applies to all files under models/example/
        some_project:
          +materialized: view
          +file_format: delta
          
          intermediate:
            some_model_type:
              schema: "some_schema"
          
          datamarts:
            +materialized: table
            some_datamart_model:
              schema: "some_mart_schema"
              location_root: "abfss://[email protected]/loc/some_mart_schema"
      elementary:
        +schema: "test_stats"
    
    snapshots:
      dbt_demo:
        +target_schema: 'some_snapshot_schema'
    
    
  4. Set up dbt project, deps etc
  5. Run dbt build -f -s elementary

Expected outcome: runs successfully

Actual outcome

Most models and tests pass, but one model doesn't. Snippet of output after it runs:

01:34:56  Finished running 15 incremental models, 1 table model, 14 view models, 2 hooks in 0 hours 1 minutes and 33.93 seconds (93.93s).
01:34:56  
01:34:56  Completed with 1 error and 0 warnings:
01:34:56  
01:34:56    Runtime Error in model information_schema_columns (models/edr/dbt_artifacts/information_schema_columns.sql)
  [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster.

System

  • OS: MacOS 14.3 (23D56) Sonoma
  • Python: 3.10.2
  • dbt: 1.7.9
  • dbt-databricks: 1.7.8
  • elementary: 0.14.1

Workaround

I'm just not using it, but it would be nice if we could. So I disabled the information_schema_columns model and the dependent enriched_columns model by adding this to dbt_project.yml:

  elementary:
    +schema: "elementary"
    # NOTE: disabling the below because they are empty tables but are erroring for projects that don't have unity catalog enabled
    edr:
        dbt_artifacts:
            information_schema_columns:
                enabled: false
            enriched_columns:
                enabled: false

[BUG] elementary.dbt_columns merge 0 rows, execution takes > 2500s

Hi Elementary team.

We have been using Elementary for a long time and love the work you do. I do not always look through our daily runs, but sometimes I do, and I noticed some heavy models. One of them was elementary.dbt_columns, which is an incremental model; it runs with our daily job at 05:00 together with a bunch of other models and tests. This incremental model (according to the logs, see screenshot) merges 0 rows, but takes over 2,500 seconds (~42 minutes) to build, which is kind of insane, so I think something is off here.
[screenshot]

our setup:

elementary 0.13.0
dbt-core 1.7.0
dbt-bigquery 1.7.0

I also tried bumping up to elementary 0.14.1 to test out the alerts functionality for Teams, and did a test run of the whole setup locally against our dev environment, which holds around 100 models/tests. It also had a similar model which took a tremendously long time.

[screenshot]

I haven't deep-dived into the logs in the VMs running these daily jobs, so I'm not sure exactly what is happening here. But when I ran edr monitor for alerting locally, I got a folder called edr_target/ containing an edr.log file that I looked inside. It gave me some indications of errors; could they be correlated? I'll attach the logfile to this issue.

edr.log

Thanks in advance.

// Christopher

PostgreSQL Replica Issue when Using Elementary

The issue happens when trying to set up Elementary on PostgreSQL with replicas.
We have AWS Aurora PostgreSQL. When I try to initialize Elementary, errors like "cannot delete from table "elementary_test_results" because it does not have a replica identity and publishes deletes" are thrown. Some Elementary models are not even created because of this issue. I'll try to create them manually and then set the replica identity to FULL for all of them (a sketch below), but that is not very convenient.
Is there a way to configure all these models for my use case?
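
For reference, the manual workaround I mean is along these lines (untested sketch; the schema name is illustrative, and it would need to run once per Elementary table):

    -- Postgres logical replication needs a replica identity to publish deletes;
    -- REPLICA IDENTITY FULL uses the whole row, since these tables have no primary key.
    ALTER TABLE elementary.elementary_test_results REPLICA IDENTITY FULL;
    ALTER TABLE elementary.data_monitoring_metrics REPLICA IDENTITY FULL;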

dbt version: 1.6.1
elementary version: 0.10.0

error "Pass a string for project" or "Column created_at is not present in table my_project.dbt_elementary.data_monitoring_metrics" if upgrade dbt package version from 0.7.3 to 0.14.0

  • Describe the bug:

I recently upgraded the Elementary dbt package from 0.7.3 to 0.14.0.
I get an error when running dbt test for the Elementary anomaly tests:

dbt test

08:19:43 Unhandled error while executing
Pass a string for project
08:19:43 4 of 26 ERROR elementary_source_table_anomalies_ileyshop_digdag_lobby_src_transaction_master_row_count [ERROR in 0.41s]

If I add a target flag to the command:

dbt test --target prod

that error disappears and becomes:

08:41:26 on-run-end failed, error:
Column created_at is not present in table my_project.dbt_elementary.data_monitoring_metrics at [18:9]

My dbt_project.yml settings for elementary:

models:
  elementary:
    +database: my_project
    +schema: dbt_elementary
    +enabled: "{{ (target.name == 'prod') | as_bool }}"

  • To Reproduce
    Steps to reproduce the behavior:

  1. Upgrade the dbt package from 0.7.3 to 0.14.0
  2. Run the Elementary anomaly dbt tests (dbt test or dbt test --target prod)

Expected behavior
All tests pass with no errors.

  • Environment (please complete the following information):

dbt package Version: 0.14.0
dbt core Version: 1.7.8

  • Additional context:
    I didn't change any config in dbt_project.yml.

Should I do anything, like dropping "data_monitoring_metrics" and other tmp tables, to upgrade to v0.14.0?

Error creating ELEMENTARY.DBT_METRICS model

10:41:52  Compilation Error in model dbt_metrics (models/edr/dbt_artifacts/dbt_metrics.sql)
10:41:52    macro 'dbt_macro__get_flatten_metric_callback' takes not more than 0 argument(s)
10:41:52
10:41:52    > in macro upload_artifacts_to_table (macros/edr/dbt_artifacts/upload_artifacts_to_table.sql)
10:41:52    > called by macro upload_dbt_metrics (macros/edr/dbt_artifacts/upload_dbt_metrics.sql)
10:41:52    > called by macro get_flatten_metric_callback (macros/edr/dbt_artifacts/upload_dbt_metrics.sql)
10:41:52    > called by macro run_hooks (macros/materializations/hooks.sql)
10:41:52    > called by macro materialization_table_snowflake (macros/materializations/table.sql)
10:41:52    > called by model dbt_metrics (models/edr/dbt_artifacts/dbt_metrics.sql)

This happens when running dbt run --select elementary.
