Giter Club home page Giter Club logo

Comments (11)

dbrtly avatar dbrtly commented on May 30, 2024 1

FWIW, I think is a bit too clever to be relevant at this phase of metriql. The proposal isn't in harmony with the python principle of “prefer explicit over implicit”.

from metriql.

buremba avatar buremba commented on May 30, 2024

@dbrtly thanks for the input. The rationale behind this feature is making the switch to metriql for the existing dbt projects seamless. The power users usually have a convention for the metrics inside dbt models (here is an example: https://emilyriederer.netlify.app/post/convo-dbt/) and I'm trying to provide a way for them to use metriql with changing only a few lines of their existing files.

I agree with you on prefer explicit over implicit but reducing the friction is also important in this context. We're unlikely to implement such a solution without having a consensus anyways. :)

from metriql.

dbrtly avatar dbrtly commented on May 30, 2024

I hadn’t seen this post before. I get dbt errors if I leave out the yml files.
edit: I thought I would get errors. Turns out dbt doesn't validate like that. :(

Is it feasible to go further than a naming convention and get better reach? Users don't necessarily having a naming convention in common but the ansi sql is common. Could metricql automatically add meticql config to a yml file after scanning the code for the full pattern in the second last cte:

select … sum(…) as column_name

and automatically classify all of those columns as measures?

from metriql.

buremba avatar buremba commented on May 30, 2024

I could not find a reliable way to do that yet and I don't think that I can. Here are the limitations:

  1. The first option is parsing SQL queries but every database has a different dialect so it's not easy as it seems. The parsers need to understand the full syntax of the database and I don't know any fault-tolerant parser that just skips the unknown parts of the query and just looks at the column definitions of the actual query.

  2. The other solution would be running some metadata queries in the target database to find out the aggregation function of the metrics. If your materialization is view, a few databases provide a way to inspect the views but if it's table or incremental, then all the metrics become regular columns and even the database doesn't know if they're metric or not.

from metriql.

muskirac avatar muskirac commented on May 30, 2024

Hey @buremba,

To me the metrics should be part of metadata and YML files, so that analysts who don't want to break the data flows can easily inject and manage their metrics. From this aspect I like how cube.js defines complex metrics within "sql" meta (https://cube.dev/docs/direction-of-joins#transitive-join-pitfall). Maybe metriql can introduce a new dbt macro that generates/propagates metric SQL defined in YAML into model sql files. For e.g., in select section of a model so that all auto-generated metrics are strictly managed in YAML files:

select
  .... some source to propagated expression,
  {{ metriql_add_metrics(model_name, metric_group, materialize='metriql_only', put_comma_before=False, ... ) }}
  ...
from ... some_source ...

Moreover, analysts probably wouldn't want to define metrics only on the sources but also on the model outputs. This will require introduction of model yaml definitions. Maybe metriql should not place metrics in source YAML files, but introduce its own YAML directory to add metrics to sources or models, and developers can integrate such metrics into data flow by calling metriql macros within their transformation queries.

from metriql.

dbrtly avatar dbrtly commented on May 30, 2024

from metriql.

buremba avatar buremba commented on May 30, 2024

Hey @muskirac, thanks for the insight. Indeed, metriql lets you create datasets from all your dbt resources including models, sources, and seeds. We also support sql relations and generate JOIN relations if you query a field from the using the dot notation just like Cube.js.

metriql generates all the SQL queries at runtime and all the consumers connect their data warehouse through our JDBC, REST API, etc. so I'm not sure if generating dbt models from metriql has a potential use-case with the metriql_add_metrics macro. How do plan to use the generated dbt model?

@dbrtly, clever idea! I agree with both of you, this approach seems to be a "black box" for the end-users and potentially be confusing since there will be multiple ways to define metrics. If people define the metrics using column names, it will be hard for them to switch to YML later on as the logic will be separated into two different parts of the codebase.

For now, I'm in favor of generating a macro just like dbt's codegen package as it will be a native dbt approach.

from metriql.

dbrtly avatar dbrtly commented on May 30, 2024

A less verbose syntax:

Eg macro sum_when_boolean

select
blah_blah,
{{ metrics.sum_when_boolean(‘my_value’, ‘bool_col’) }}

from my_table
group by 1

select
blah_blah,
sum(if(bool_col, my_value, null)
as total_my_value_when_my_boolean

from my_table
group by 1

from metriql.

dbrtly avatar dbrtly commented on May 30, 2024

Thinking further is the dbt docs feature extensible? Can you do your own implementation of exposures and make it dance with the documentation website? Maybe:

metriql docs generate
metriql docs serve

from metriql.

buremba avatar buremba commented on May 30, 2024

@dbrtly, metriql CLI comes with a dashboard that lets you connect to BI tools and audit all the queries at the moment. We're planning to add a new page that visualizes all the data marts and lets you run queries in a simple interface for testing purposes. I believe that rather than introducing separate static documentation, it can be included in the dashboard. What do you think?

from metriql.

buremba avatar buremba commented on May 30, 2024

We support dbt's native metric definitions at the moment so so we don't need to introduce an alternative for serving docs, running tests, or defining the metrics itself.

from metriql.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.