Comments (11)
FWIW, I think is a bit too clever to be relevant at this phase of metriql. The proposal isn't in harmony with the python principle of “prefer explicit over implicit”.
from metriql.
@dbrtly thanks for the input. The rationale behind this feature is making the switch to metriql for the existing dbt projects seamless. The power users usually have a convention for the metrics inside dbt models (here is an example: https://emilyriederer.netlify.app/post/convo-dbt/) and I'm trying to provide a way for them to use metriql with changing only a few lines of their existing files.
I agree with you on prefer explicit over implicit
but reducing the friction is also important in this context. We're unlikely to implement such a solution without having a consensus anyways. :)
from metriql.
I hadn’t seen this post before. I get dbt errors if I leave out the yml files.
edit: I thought I would get errors. Turns out dbt doesn't validate like that. :(
Is it feasible to go further than a naming convention and get better reach? Users don't necessarily having a naming convention in common but the ansi sql is common. Could metricql automatically add meticql config to a yml file after scanning the code for the full pattern in the second last cte:
select … sum(…) as column_name
and automatically classify all of those columns as measures?
from metriql.
I could not find a reliable way to do that yet and I don't think that I can. Here are the limitations:
-
The first option is parsing SQL queries but every database has a different dialect so it's not easy as it seems. The parsers need to understand the full syntax of the database and I don't know any fault-tolerant parser that just skips the unknown parts of the query and just looks at the column definitions of the actual query.
-
The other solution would be running some metadata queries in the target database to find out the aggregation function of the metrics. If your
materialization
isview
, a few databases provide a way to inspect the views but if it'stable
orincremental
, then all the metrics become regular columns and even the database doesn't know if they're metric or not.
from metriql.
Hey @buremba,
To me the metrics should be part of metadata and YML files, so that analysts who don't want to break the data flows can easily inject and manage their metrics. From this aspect I like how cube.js defines complex metrics within "sql" meta (https://cube.dev/docs/direction-of-joins#transitive-join-pitfall). Maybe metriql can introduce a new dbt macro that generates/propagates metric SQL defined in YAML into model sql files. For e.g., in select section of a model so that all auto-generated metrics are strictly managed in YAML files:
select
.... some source to propagated expression,
{{ metriql_add_metrics(model_name, metric_group, materialize='metriql_only', put_comma_before=False, ... ) }}
...
from ... some_source ...
Moreover, analysts probably wouldn't want to define metrics only on the sources but also on the model outputs. This will require introduction of model yaml definitions. Maybe metriql should not place metrics in source YAML files, but introduce its own YAML directory to add metrics to sources or models, and developers can integrate such metrics into data flow by calling metriql macros within their transformation queries.
from metriql.
from metriql.
Hey @muskirac, thanks for the insight. Indeed, metriql lets you create datasets from all your dbt resources including models, sources, and seeds. We also support sql relations and generate JOIN relations if you query a field from the using the dot notation just like Cube.js.
metriql generates all the SQL queries at runtime and all the consumers connect their data warehouse through our JDBC, REST API, etc. so I'm not sure if generating dbt models from metriql has a potential use-case with the metriql_add_metrics
macro. How do plan to use the generated dbt model?
@dbrtly, clever idea! I agree with both of you, this approach seems to be a "black box" for the end-users and potentially be confusing since there will be multiple ways to define metrics. If people define the metrics using column names, it will be hard for them to switch to YML later on as the logic will be separated into two different parts of the codebase.
For now, I'm in favor of generating a macro just like dbt's codegen package as it will be a native dbt approach.
from metriql.
A less verbose syntax:
Eg macro sum_when_boolean
select
blah_blah,
{{ metrics.sum_when_boolean(‘my_value’, ‘bool_col’) }}
from my_table
group by 1
select
blah_blah,
sum(if(bool_col, my_value, null)
as total_my_value_when_my_boolean
from my_table
group by 1
from metriql.
Thinking further is the dbt docs feature extensible? Can you do your own implementation of exposures and make it dance with the documentation website? Maybe:
metriql docs generate
metriql docs serve
from metriql.
@dbrtly, metriql CLI comes with a dashboard that lets you connect to BI tools and audit all the queries at the moment. We're planning to add a new page that visualizes all the data marts and lets you run queries in a simple interface for testing purposes. I believe that rather than introducing separate static documentation, it can be included in the dashboard. What do you think?
from metriql.
We support dbt's native metric definitions at the moment so so we don't need to introduce an alternative for serving docs, running tests, or defining the metrics itself.
from metriql.
Related Issues (20)
- Installation Error
- Does Metriql support connection with DBT-Vertica?
- Invalid query aliases when `materialize` is enabled
- Building Metriql With Local, Proprietary JAR File HOT 1
- How to eliminate double quotes from identifier names in metriql's automatically generated SELECT queries HOT 2
- Support `HAVING (COUNT(1) > 0)` in MQL
- Support for accessing Trino functions in SQL Context not working as expected
- Allow user-supplied names for precomputed tables (roll-up tables) HOT 3
- Unknown data type: Array(String) HOT 1
- unable to compile 0.6-SNAPSHOT HOT 2
- Does not support count_distinct dbt metric type HOT 3
- Metriql cannot find parent models of dbt metrics HOT 2
- Error Date-DateTime column in ClickHouse-Metriql
- Read timed out | socket_timeout does not work
- Repeated Alias HOT 1
- Neo4j support
- dbt 1.3.0 has changed raw_sql to raw_code HOT 1
- how to use pivots propertie?
- Superset Connection HOT 1
- profiles.yml does not exist in /root/.dbt/profiles.yml HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from metriql.