Comments (5)
Design Study
Eliminate: `--build-only`, `--analyze-only`, `--rebuild`, `--skip`, and `turnkey cache benchmark`. Then introduce the following:
The evaluation commands would be:
- `turnkey discover`: discover the models in a file (using any operations necessary)
- `turnkey build`: build the models in a file (using any operations necessary)
- `turnkey benchmark` (default): benchmark the models in a file (using any operations necessary)
The big rule here is that if you ask for something, by default it always happens regardless of the cache state. `turnkey build bert.py` will always build BERT (i.e., `rebuild=always`) because you, the user, are asking for a build.
- `turnkey build INPUT`: always build INPUT
- `turnkey benchmark INPUT`: build INPUT if needed, always benchmark
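The two defaults above can be sketched as a tiny policy function (hypothetical names; this is not actual turnkeyml code, just an illustration of the rule):

```python
# Hypothetical sketch of the default policies described above; not actual
# turnkeyml code. `build_cached` means a successful build already exists
# in the cache.

def plan(command: str, build_cached: bool) -> list[str]:
    """Return the actions to run for a given command and cache state."""
    if command == "build":
        # `turnkey build INPUT` always builds, regardless of cache state.
        return ["build"]
    if command == "benchmark":
        # `turnkey benchmark INPUT` builds only if needed, then always benchmarks.
        actions = [] if build_cached else ["build"]
        return actions + ["benchmark"]
    raise ValueError(f"unknown command: {command}")
```

Note that the cache state only ever changes whether a *prerequisite* runs; the action you explicitly asked for always runs.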
Ok now we just need a way to change the default policy to achieve some of our desired scenarios from the requirements.
- Typical use case: just call `discover`, `build`, or `benchmark`
  - Bonus: benchmark a potentially stale build (current behavior of `--rebuild never`)
- Debugging use case:
  - Retry a successful or failed build: `turnkey build INPUT`
  - Retry a successful or failed benchmark, without rebuilding: `turnkey benchmark INPUT`
  - Retry a model that both successfully built and successfully benchmarked (this is not one of our scenarios... just for the sake of demonstrating flexibility): `turnkey cache delete BUILD` would do it; `turnkey build INPUT` followed by `turnkey benchmark INPUT` would also do it
  - Perhaps some `POLICY` would be helpful
- Mass evaluation use case:
  - Attempt to build and benchmark each model exactly once (even when resuming): needs a `POLICY`
  - Attempt to build each model exactly once (even when resuming): needs a `POLICY`
  - Attempt to benchmark each [cloud] build exactly once (even when resuming): needs a `POLICY`
  - Run debugging workflow on one of these models: the debugging commands above would work fine.
OK, so based on this, our `POLICY` needs to change the default behavior in two main ways:
- Always re-attempt the specific action
- Never re-attempt any action
- Bonus: re-attempt both build and benchmark
Based on that requirement, the following seems reasonably intuitive to me (especially since it is only really relevant to mass evaluation): a new flag named `--retry`, which can have the following values:
- `--retry nothing`: never re-attempt any action; this covers all the mass-evaluation scenarios
- `--retry benchmark`: default behavior of `turnkey benchmark` (not available as a choice in other commands); loads the build from cache if available
- `--retry build`: default behavior of `turnkey build`; in `turnkey benchmark` it rebuilds and then re-benchmarks (bonus objective achieved!)
- `--retry no_builds`: never build or rebuild; benchmark only if the build is cached (this is a bit confusing, but AFAIK this is a super niche use case, and I would be happy deprecating it as well).
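A minimal sketch of how the proposed `--retry` flag could be wired up with `argparse` (hypothetical; the per-command choice lists follow the text above, e.g. `retry benchmark` is only offered by the benchmark command — not actual turnkeyml code):

```python
# Hypothetical argparse sketch of the proposed --retry flag; not actual
# turnkeyml code. Each subcommand exposes only the retry values that make
# sense for it, with its own name as the default.
import argparse

parser = argparse.ArgumentParser(prog="turnkey")
subparsers = parser.add_subparsers(dest="command", required=True)

build = subparsers.add_parser("build")
build.add_argument("--retry", choices=["nothing", "build"], default="build")

benchmark = subparsers.add_parser("benchmark")
benchmark.add_argument(
    "--retry",
    choices=["nothing", "benchmark", "build", "no_builds"],
    default="benchmark",  # default behavior of `turnkey benchmark`
)

# e.g., the mass-evaluation resume scenario:
args = parser.parse_args(["benchmark", "--retry", "nothing"])
```

A nice property of this shape is that invalid combinations (like `turnkey build --retry benchmark`) are rejected by the parser itself rather than needing custom validation.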
How it works in practice
I really like this because it feels like everything is in plain English, and the behavior is pretty obvious in all cases:
| I want to... | I use the command... | Under the hood, the default value of `--retry` is... |
|---|---|---|
| Discover the models in some files | `turnkey discover FILES` | - |
| Build the models in some files | `turnkey build FILES` | `build` |
| Benchmark the models in some files | `turnkey benchmark FILES` | `benchmark` |
| Benchmark the models in some files (with no risk of rebuilding) | `turnkey benchmark FILES --retry no_builds` | - |
| Debug: rebuild a model that already built | `turnkey build FILES` | `build` |
| Debug: re-benchmark a model that built but failed benchmarking (without rebuilding) | `turnkey benchmark FILES` | `benchmark` |
| Debug: re-benchmark a model that already built and benchmarked (without rebuilding) | `turnkey benchmark FILES` | `benchmark` |
| Debug: rebuild and re-benchmark a model that already succeeded | `turnkey benchmark FILES --retry build` | - |
| Mass evaluation: attempt to build each model exactly once (even when resuming) | `turnkey build FILES --retry nothing` | - |
| Mass evaluation: attempt to benchmark builds exactly once (even when resuming) | `turnkey benchmark cache/*.tkb --retry nothing` | - |
| Mass evaluation: attempt to build and benchmark each model exactly once (even when resuming) | `turnkey benchmark FILES --retry nothing` | - |
from turnkeyml.
Agree that skip and rebuild should be combined. This proposal, however, is still not very intuitive and potentially misses a few important scenarios:
Example:
What if a model compiled, but it failed to execute the first time because the HW was in a bad state, and the user simply wants to re-attempt execution? Is that not possible anymore without recompiling the model?
This is a very challenging UI decision.
from turnkeyml.
We almost need something like:
- `turnkey build::if_needed` -> Build if we never tried or failed (our current default behavior, `--build-only`) [default]
- `turnkey build::always` -> Build always (never benchmark)
- `turnkey build::if_not_attempted` -> Build if we never tried (never benchmark)
- `turnkey benchmark::if_needed` -> Benchmark if we never tried or failed benchmarking (never rebuild) [default]
- `turnkey benchmark::always` -> Benchmark always (never rebuild)
- `turnkey benchmark::if_not_attempted` -> Benchmark if we never tried benchmarking (never rebuild)
We can also allow any combination of the above. Example: `turnkey build::if_needed benchmark::if_needed` (same as `turnkey build benchmark`) -> same as our current default behavior.
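Parsing these `command::mode` tokens would be straightforward; a sketch (hypothetical helper, not actual turnkeyml code), including the "bare command means `if_needed`" default described above:

```python
# Hypothetical sketch of parsing the `command::mode` tokens proposed above;
# not actual turnkeyml code.

DEFAULT_MODE = "if_needed"
VALID_MODES = {"if_needed", "always", "if_not_attempted"}

def parse_tokens(tokens: list[str]) -> dict[str, str]:
    """Map each command to its mode, e.g. ['build::always'] -> {'build': 'always'}."""
    plan = {}
    for token in tokens:
        command, sep, mode = token.partition("::")
        # A bare `build` is shorthand for `build::if_needed`.
        mode = mode if sep else DEFAULT_MODE
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        plan[command] = mode
    return plan
```

For instance, `parse_tokens(["build", "benchmark"])` and `parse_tokens(["build::if_needed", "benchmark::if_needed"])` would produce the same plan, matching the equivalence in the example above.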
from turnkeyml.
@danielholanda maybe not as bad as you thought? Posting a full "truth table" below. It's a big table, but it's intuitive to me what is supposed to happen in each case, and when I would use each setting.
> Example: What if a model compiled, it failed to execute the first time because the HW was in a bad state, and the user simply wants to reattempt execution? Is that not possible anymore without recompiling the model?
In this specific example, the build state is S(uccessful), the benchmark state is F(ailed), and the user can apply `skip=successful` to load the build from cache and retry the benchmark. This would be the default behavior, since it is what is wanted in the general case.
Legend:
- S = successful state
- F = failed state
- NA = action Not previously Attempted
| Skip Policy | Build State | Benchmark State | Build Action | Benchmark Action | Same as `rebuild=X` | Useful in this common scenario |
|---|---|---|---|---|---|---|
| Successful (default) | S | S | Load | Load | - | Demos |
| Successful (default) | S | F | Load | Run | `if_needed` (default) | Debugging benchmarking |
| Successful (default) | F | NA | Run | Run | `if_needed` (default) | Debugging building |
| Successful (default) | NA | NA | Run | Run | `if_needed` (default) | Typical use |
| Successful (default) | S | NA | Load | Run | `if_needed` (default) | |
| Failed | S | S | Run | Run | `always` | |
| Failed | S | F | Load | Skip | - | |
| Failed | F | F | Skip | Skip | `never` | |
| Attempted | S | S | Load | Load | - | Benchmarking a cloud compile |
| Attempted | S | F | Load | Skip | - | Benchmarking a cloud compile |
| Attempted | F | NA | Skip | Skip | - | Benchmarking a cloud compile |
| Attempted | NA | NA | Run | Run | `if_needed` (default) | |
| Attempted | S | NA | Load | Run | `if_needed` (default) | Benchmarking a cloud compile |
| None | X | X | Run | Run | `always` | Debugging |
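For reference, the table above can be written down directly as a lookup (a sketch with lowercased policy names; not actual turnkeyml code), which makes it easy to unit-test a policy implementation against every row:

```python
# Hypothetical encoding of the truth table above; not actual turnkeyml code.
# Keys: (skip_policy, build_state, benchmark_state).
# Values: (build_action, benchmark_action).
# States: "S" = successful, "F" = failed, "NA" = not previously attempted,
# "X" = any state. Actions: "Load" (from cache), "Run", "Skip".
TABLE = {
    ("successful", "S", "S"): ("Load", "Load"),
    ("successful", "S", "F"): ("Load", "Run"),
    ("successful", "F", "NA"): ("Run", "Run"),
    ("successful", "NA", "NA"): ("Run", "Run"),
    ("successful", "S", "NA"): ("Load", "Run"),
    ("failed", "S", "S"): ("Run", "Run"),
    ("failed", "S", "F"): ("Load", "Skip"),
    ("failed", "F", "F"): ("Skip", "Skip"),
    ("attempted", "S", "S"): ("Load", "Load"),
    ("attempted", "S", "F"): ("Load", "Skip"),
    ("attempted", "F", "NA"): ("Skip", "Skip"),
    ("attempted", "NA", "NA"): ("Run", "Run"),
    ("attempted", "S", "NA"): ("Load", "Run"),
    ("none", "X", "X"): ("Run", "Run"),
}
```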
P.S. A significant flaw with the above: there is no way to re-run a benchmark that already succeeded if the build also succeeded (AKA the current behavior of `turnkey benchmark`). So granting fine-grained control, as you suggested, is ultimately the right thing.
from turnkeyml.
> We almost need something like:
> - `turnkey build::if_needed` -> Build if we never tried or failed (our current default behavior, `--build-only`) [default]
> - `turnkey build::always` -> Build always (never benchmark)
> - `turnkey build::if_not_attempted` -> Build if we never tried (never benchmark)
> - `turnkey benchmark::if_needed` -> Benchmark if we never tried or failed benchmarking (never rebuild) [default]
> - `turnkey benchmark::always` -> Benchmark always (never rebuild)
> - `turnkey benchmark::if_not_attempted` -> Benchmark if we never tried benchmarking (never rebuild)
>
> We can also allow any combination of the above. Example: `turnkey build::if_needed benchmark::if_needed` (same as `turnkey build benchmark`) -> same as our current default behavior.
This would work too, and I like that it grants more fine-grained control to the user. It's sort of related to #20, so it would be nice if we could get two birds with one stone. However, I just tried to figure out how all the assumed commands would work (e.g., "discover" is always assumed unless it's explicit; "build" is assumed when "benchmark" is used) and I got quite confused with respect to the details of your `turnkey benchmark` behavior above.

I'm also not super happy with the `if_needed` and `if_not_attempted` terms, since I don't think our users have ever intuitively understood `if_needed`, and `if_not_attempted` seems even less intuitive.
Requirements
Going back to first principles, the use cases we really want to support are:
- Typical interactive use: cache as much as possible, require minimal user input / args
- Debugging: grant the user fine grained control of what gets loaded from cache and what re-runs
- Mass benchmarking: enable "resume" behavior (do not re-attempt anything)
In typical mode, commands like `turnkey discover`, `turnkey build`, and `turnkey benchmark` make a lot of sense to me. As in "I called `turnkey benchmark` because I want a benchmark, and a tool named `turnkey` should just do whatever it takes to make that happen for me."
In debugging mode, something has gone wrong and I want control over everything so that I can quickly diagnose the problem. Something like `turnkey build::load_cache benchmark::always` makes a lot of sense here, since it grants fine-grained control and is very explicit. The specific scenarios, which are not covered by "typical use", are simply:
- Retry a successful build
- Retry a successful benchmark without rebuilding
- Most other debugging scenarios, such as retrying a failed benchmark without rebuilding, are covered by typical use!
In mass evaluation mode, we need these behaviors:
- Attempt to build and benchmark each model exactly once. If the command crashes, resume only the models that haven't been attempted yet.
- Attempt to build each model exactly once. Same resume requirement as above.
- Attempt to benchmark each [cloud] build exactly once. Same resume requirement as above.
- Run debugging workflows on specific models (e.g., retry a failed benchmark to see if it fails deterministically).
And, finally, I never want to type `discover` except in the `--analyze-only` scenario.
Proposed solution
TBD...
from turnkeyml.