Giter Club home page Giter Club logo

Comments (5)

jeremyfowers avatar jeremyfowers commented on June 21, 2024 1

Design Study

Eliminate: --build-only, --analyze-only, --rebuild, --skip, and turnkey cache benchmark. Then introduce the following:

The evaluation commands would be:

  • turnkey discover: discover the models in a file (using any operations necessary)
  • turnkey build: build the models in a file (using any operations necessary)
  • turnkey benchmark (default): benchmark the models in a file (using any operations necessary)

The big rule here is that if you ask for something, by default it always happens regardless of the cache state. turnkey build bert.py will always build BERT (ie, rebuild=always) because you the user are asking for a build.

  • turnkey build INPUT: always build INPUT
  • turnkey benchmark INPUT: build INPUT if needed, always benchmark

Ok now we just need a way to change the default policy to achieve some of our desired scenarios from the requirements.

  • Typical use case: just call discover, build, or benchmark
    • Bonus: benchmark a potentially stale build (current behavior of --rebuild never)
  • Debugging use case:
    • Retry a successful or failed build: turnkey build INPUT
    • Retry a successful or failed benchmark, without rebuilding: turnkey benchmark INPUT
    • Retry a model that both successfully built and successfully benchmarked (this is not one of our scenarios... just for the sake of demonstrating flexibility):
      • turnkey cache delete BUILD would do it
      • turnkey build INPUT followed by turnkey benchmark INPUT would also do it
      • Perhaps some POLICY would be helpful
  • Mass evaluation use case:
    • Attempt to build and benchmark each model exactly once (even when resuming): needs a POLICY
    • Attempt to build each model exactly once (even when resuming): needs a POLICY
    • Attempt to benchmark each [cloud] build exactly once (even when resuming): needs a POLICY
    • Run debugging workflow on one of these models: the debugging commands above would work fine.

Ok so based on this, our POLICY needs to change the default behavior in two main ways:

  • Alway re-attempt the specific action
  • Never re-attempt any action
  • Bonus: re-attempt both build and benchmark

Based on that requirement, the following seems reasonably intuitive to me (especially since it is only really relevant to mass-evaluation): a new flag named --retry which can have the following values:

  • --retry nothing: never re-attempt any action; this covers all the mass-evaluation scenarios
  • --retry benchmark: default behavior of turnkey benchmark (not available as a choice in other commands); loads build from cache if available
  • --retry build: default behavior of turnkey build; in turnkey benchmark it rebuilds and then re-benchmarks (bonus objective achieved!)
  • --retry no_builds: never build or rebuild, benchmark only if the build is cached (this is a bit confusing, but AFAIK this is a super niche usecase. I would be happy deprecating this use case as well).

How it works in practice

I really like this because it feels like everything is in plain english, and the behavior is pretty obvious in all cases:

I want to... I use the command... Under the hood, the default value of retry is...
Discover the models in some files turnkey disocover FILES -
Build the models in some files turnkey build FILES builds
Benchmark the models in some files turnkey benchmark FILES benchmarks
Benchmark the models in some files (with no risk of rebuilding) turnkey benchmark FILES --retry no_builds -
Debug: rebuild a model that already built turnkey build FILES builds
Debug: rebenchmark a model that built but failed benchmarking (without rebuilding) turnkey benchmark FILES benchmarks
Debug: rebenchmark a model that already built and benchmarked (without rebuilding) turnkey benchmark FILES benchmarks
Debug: rebuild and rebenchmark a model that already succeeded turnkey benchmark FILES --retry builds -
Mass evaluation: attempt to build each model exactly once (even when resuming) turnkey build FILES --retry nothing -
Mass evaluation: attempt to benchmark builds exactly once (even when resuming) turnkey benchmark cache/*.tkb --retry nothing -
Mass evaluation: attempt to build and benchmark each model exactly once (even when resuming) turnkey benchmark FILES --retry nothing -

from turnkeyml.

danielholanda avatar danielholanda commented on June 21, 2024

Agree that skip and rebuild should be combined. This proposal, however, is still not very intuitive and potentially misses a few important scenarios:

Example:
What if a model compiled, it failed to execute the first time because the HW was in a bad state, and the user simply wants to reattempt execution? Is that not possible anymore without recompiling the model?

This is a very challenging UI decision.

from turnkeyml.

danielholanda avatar danielholanda commented on June 21, 2024

We almost need something like:

turnkey build::if_needed -> Build if we never tried or failed (our current default behavior --build-only) [default]
turnkey build::always -> Build always (never benchmark)
turnkey build::if_not_attempted -> Build if we never tried (never benchmark)

turnkey benchmark::if_needed -> Benchmark if we never tried or failed benchmarking (never rebuild) [default]
turnkey benchmark::always -> Benchmark always (never rebuild)
turnkey benchmark::if_not_attempted -> Benchmark if we never tried benchmarking (never rebuild)

We can also allow any combination of the above

Example:
turnkey build::if_needed benchmark::if_needed (same as turnkey build benchmark) -> Same as our current default behavior

from turnkeyml.

jeremyfowers avatar jeremyfowers commented on June 21, 2024

@danielholanda maybe not as bad as you thought? Posting a full "truth table" below. It's a big table, but its intuitive to me what is supposed to happen in each case, and when I would use each setting.

Example: What if a model compiled, it failed to execute the first time because the HW was in a bad state, and the user simply wants to reattempt execution? Is that not possible anymore without recompiling the model?

In this specific example, the build state is S(uccessful), the benchmark state is F(ailed), and the user can apply skip=successful to load the build from cache and retry the benchmark. This would be the default behavior, since it is what is wanted in the general case.

Legend:

  • S = successful state
  • F = failed state
  • NA = action Not previously Attempted
Skip Policy Build State Benchmark State Build Action Benchmark Action Same as "rebuild=X" Useful in this common scenario
Successful (default) S S Load Load - Demos
Successful (default) S F Load Run if_needed (default) Debugging benchmarking
Successful (default) F NA Run Run if_needed (default) Debugging building
Successful (default) NA NA Run Run if_needed (default) Typical use
Successful (default) S NA Load Run if_needed (default)
Failed S S Run Run always
Failed S F Load Skip -
Failed F F Skip Skip never
Attempted S S Load Load - Benchmarking a cloud compile
Attempted S F Load Skip - Benchmarking a cloud compile
Attempted F NA Skip Skip - Benchmarking a cloud compile
Attempted NA NA Run Run if_needed (default)
Attempted S NA Load Run if_needed (default) Benchmarking a cloud compile
None X X Run Run always Debugging

PS. a significant flaw with the above is: there is no way to re-run a benchmark that already succeeded, if the build succeeded. AKA the current behavior of turnkey benchmark. So granting fine grained control, as you suggested, is ultimately the right thing.

from turnkeyml.

jeremyfowers avatar jeremyfowers commented on June 21, 2024

We almost need something like:

turnkey build::if_needed -> Build if we never tried or failed (our current default behavior --build-only) [default] turnkey build::always -> Build always (never benchmark) turnkey build::if_not_attempted -> Build if we never tried (never benchmark)

turnkey benchmark::if_needed -> Benchmark if we never tried or failed benchmarking (never rebuild) [default] turnkey benchmark::always -> Benchmark always (never rebuild) turnkey benchmark::if_not_attempted -> Benchmark if we never tried benchmarking (never rebuild)

We can also allow any combination of the above

Example: turnkey build::if_needed benchmark::if_needed (same as turnkey build benchmark) -> Same as our current default behavior

This would work too, and I like that it grants more fine-grained control to the user. It's sort of related to #20 so it would be nice if we could get two birds with one stone. However, I just tried to figure out how all the assumed commands would work (e.g., "discover" is always assumed unless its explicit, "build" is assumed when "benchmark" is used) and I got quite confused with respect to the details of your turnkey benchmark behavior above.

I'm also not super happy with the if_needed and if_not_attempted terms since I don't think our users have ever intuitively understood if_needed and if_not_attempted seems even less intuitive.

Requirements

Going back to first principles, the use cases we really want to support are:

  • Typical interactive use: cache as much as possible, require minimal user input / args
  • Debugging: grant the user fine grained control of what gets loaded from cache and what re-runs
  • Mass benchmarking: enable "resume" behavior (do not re-attempt anything)

In typical mode, commands like turnkey discover, turnkey build, and turnkey benchmark make a lot of sense to me. As in "I called turnkey benchmark because I want a benchmark and a tool named turnkey should just do whatever it takes to make that happen for me."

In debugging mode, something has gone wrong and I want control over everything so that I can quickly diagnose the problem. Something like turnkey build::load_cache benchmark::always makes a lot of sense here since it grants fine grained control and is very explicit. The specific scenarios, which are not covered by "typical use" are simply:

  • Retry a successful build
  • Retry a successful benchmark without rebuilding
  • Most other debugging scenarios, such as retrying a benchmark without rebuilding, are covered by typical use!

In mass evaluation mode, we need these behaviors:

  • Attempt to build and benchmark each model exactly once. If the command crashes, resume only the models that haven't been attempted yet.
  • Attempt to build each model exactly once. Same resume requirement as above.
  • Attempt to benchmark each [cloud] build exactly once. Same resume requirement as above.
  • Run debugging workflows on specific models (e.g., retry a failed benchmark to see if it fails deterministically).

And, finally, I never want to type discover except in the --analyze-only scenario.

Proposed solution

TBD...

from turnkeyml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.