Comments (3)

Jstein77 commented on August 24, 2024

in progress

from metricflow.

Jstein77 commented on August 24, 2024

@tomkit.lento is this one still in progress?

Jstein77 commented on August 24, 2024

Predicate pushdown update. tl;dr: a 1-week spike on the post-plan-building optimizer approach, followed by a decision on whether to take another week to finish up the optimizer-based solution or to cut it off and move on to other priorities.
The problem here is that categorical dimension pushdown does some seriously silly things with simple metric queries: it repeats the same filter at more than one level of the query. Example SQL post-pushdown:

select sum(bookings) as instant_bookings
from (
  select bookings, bookings__is_instant
  from (
    select 1 as bookings, is_instant as bookings__is_instant
    from bookings
  ) a
  where bookings__is_instant
) b
where bookings__is_instant
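
To make the redundancy concrete, here is a small check that runs the query above against an in-memory SQLite database and compares it with a version that applies the filter only once. This is a sketch under assumptions: the sample rows are made up, SQLite stands in for whatever warehouse is actually used, and the single-filter query is hand-written for illustration rather than actual MetricFlow output.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the query above.
conn = sqlite3.connect(":memory:")
conn.execute("create table bookings (is_instant boolean)")
conn.executemany("insert into bookings values (?)", [(1,), (0,), (1,), (1,), (0,)])

# The query as rendered post-pushdown: the same filter appears at two levels.
replicated = """
select sum(bookings) as instant_bookings
from (
  select bookings, bookings__is_instant
  from (
    select 1 as bookings, is_instant as bookings__is_instant
    from bookings
  ) a
  where bookings__is_instant
) b
where bookings__is_instant
"""

# Hand-written equivalent with the filter applied exactly once.
filtered_once = """
select sum(bookings) as instant_bookings
from (
  select 1 as bookings, is_instant as bookings__is_instant
  from bookings
) a
where bookings__is_instant
"""

replicated_result = conn.execute(replicated).fetchone()[0]
filtered_once_result = conn.execute(filtered_once).fetchone()[0]
print(replicated_result, filtered_once_result)
```

Both queries return the same count on this data, so the outer filter adds nothing except noise in the rendered SQL.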
  

There's an easy hack we can put in place to disable it for the most obvious scenarios (mf query --metrics instant_bookings), which is to skip pushdown for queries sourced out of a single semantic model, but that won't cover slightly more complex but still fairly obvious cases (mf query --metrics instant_bookings,listings).
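
A minimal sketch of what such a guard might look like, purely for illustration. The function name, the mapping from metric name to semantic model names, and the model names themselves are all hypothetical; none of this is MetricFlow's actual API.

```python
from typing import Mapping, Set


def should_skip_pushdown(metric_sources: Mapping[str, Set[str]]) -> bool:
    """Return True when every queried metric reads from the same single semantic model.

    `metric_sources` is a hypothetical mapping of metric name to the names of the
    semantic models that metric is sourced from.
    """
    all_models: Set[str] = set()
    for models in metric_sources.values():
        all_models |= models
    return len(all_models) == 1


# Hypothetical inputs mirroring the two CLI examples above.
single_model_query = {"instant_bookings": {"bookings"}}
multi_model_query = {"instant_bookings": {"bookings"}, "listings": {"listings"}}

print(should_skip_pushdown(single_model_query))
print(should_skip_pushdown(multi_model_query))
```

The guard only fires for the first query. The second query spans two semantic models, so pushdown would still run and could still duplicate filters, which is the gap described above.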
A complete solution, which involves moving the predicates instead of replicating them, can be done via a DataflowPlanOptimizer. Doing this requires the following:

  1. Robust test coverage of existing join+filter scenarios (in progress, must happen no matter what)
  2. Consolidation of all pre-existing time-related pushdown operations (90% finished)
  3. A new optimizer that replaces our pushdown operations with a centralized handler (long pole)
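
The difference between replicating a predicate and moving it can be sketched on a toy plan. Everything here is an assumption made for illustration: the `Node` class, the node kinds, and the `move_filters` function are hypothetical and are not the DataflowPlanOptimizer interface. The sketch only handles a linear plan with no joins, and it assumes every predicate is valid directly above the source node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical plan node; `child` is the node this one reads from."""
    kind: str                       # e.g. "source", "filter", "project", "aggregate"
    detail: str                     # table name, predicate, or column list
    child: Optional["Node"] = None


def strip_filters(node: Node, found: List[str]) -> Node:
    """Copy the plan without filter nodes, collecting each distinct predicate."""
    if node.kind == "filter":
        if node.detail not in found:
            found.append(node.detail)
        return strip_filters(node.child, found)
    if node.child is None:
        return node
    return Node(node.kind, node.detail, strip_filters(node.child, found))


def place_above_source(node: Node, predicates: List[str]) -> Node:
    """Re-apply each collected predicate once, directly above the source node."""
    if node.child is None:
        wrapped = node
        for predicate in predicates:
            wrapped = Node("filter", predicate, wrapped)
        return wrapped
    return Node(node.kind, node.detail, place_above_source(node.child, predicates))


def move_filters(plan: Node) -> Node:
    """Move predicates to the source instead of replicating them."""
    found: List[str] = []
    stripped = strip_filters(plan, found)
    return place_above_source(stripped, found)


def render(node: Optional[Node]) -> str:
    """Render the plan from the outermost node down to the source."""
    parts = []
    while node is not None:
        parts.append(f"{node.kind}({node.detail})")
        node = node.child
    return " <- ".join(parts)


# Toy plan shaped like the post-pushdown SQL: the same filter appears twice.
source = Node("source", "bookings")
inner_filter = Node("filter", "bookings__is_instant", source)
project = Node("project", "bookings, bookings__is_instant", inner_filter)
outer_filter = Node("filter", "bookings__is_instant", project)
plan = Node("aggregate", "sum(bookings)", outer_filter)

optimized = move_filters(plan)
print(render(plan))
print(render(optimized))
```

A real optimizer would also have to check that the columns a predicate references exist at the point where the filter lands, and it would have to deal with joins. The existing join+filter test coverage listed in item 1 is what would catch mistakes there.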

After talking to @Jordan we decided to do the following:

  1. I will spike on the optimizer approach, with a tentative plan to present a prototype at Tuesday's MetricFlow team meeting. If it's still in progress, I'll present what I've got and push to Thursday standups.
  2. After Thursday standups we will make a call, based on this experience, about whether to finish up the optimizer now or to add the single-semantic-model hack to the existing pushdown operation, roll it out, and put the complete solution on our backlog in favor of focusing on custom calendar work.

We've decided to spike on this because:

  1. The optimizer approach will handle pushdown correctly - no more weird duplicate where filters, etc.
  2. The pushdown consolidation required for the optimizer to work will encapsulate our filter granularity adjustment logic, which will help speed up custom calendar development as well. Note that this consolidation will happen whether or not we are able to do the optimizer, so part of this week is effectively custom calendar pre-work.
  3. The optimizer approach will make it easier to expand our pushdown operations to more input types (entities, time dimensions), and, when we get around to enabling more robust filter expressions, will make it easier to be more aggressive about what types of filters we push down

If you have questions or concerns, fire away!
