Comments (5)
Design Study
Eliminate: `--build-only`, `--analyze-only`, `--rebuild`, `--skip`, and `turnkey cache benchmark`. Then introduce the following:
The evaluation commands would be:
- `turnkey discover`: discover the models in a file (using any operations necessary)
- `turnkey build`: build the models in a file (using any operations necessary)
- `turnkey benchmark` (default): benchmark the models in a file (using any operations necessary)
The big rule here is that if you ask for something, by default it always happens regardless of the cache state. `turnkey build bert.py` will always build BERT (i.e., `rebuild=always`) because you, the user, are asking for a build.
- `turnkey build INPUT`: always build INPUT
- `turnkey benchmark INPUT`: build INPUT if needed, always benchmark
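The two defaults above can be sketched as a tiny policy function (hypothetical names; this is not actual turnkeyml code, just an illustration of the rule):

```python
# Hypothetical sketch of the default policies described above; not actual
# turnkeyml code. `build_cached` means a successful build already exists
# in the cache.

def plan(command: str, build_cached: bool) -> list[str]:
    """Return the actions to run for a given command and cache state."""
    if command == "build":
        # `turnkey build INPUT` always builds, regardless of cache state.
        return ["build"]
    if command == "benchmark":
        # `turnkey benchmark INPUT` builds only if needed, then always benchmarks.
        actions = [] if build_cached else ["build"]
        return actions + ["benchmark"]
    raise ValueError(f"unknown command: {command}")
```

Note that the cache state only ever changes whether a *prerequisite* runs; the action you explicitly asked for always runs.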
Ok now we just need a way to change the default policy to achieve some of our desired scenarios from the requirements.
- Typical use case: just call `discover`, `build`, or `benchmark`
  - Bonus: benchmark a potentially stale build (current behavior of `--rebuild never`)
- Debugging use case:
  - Retry a successful or failed build: `turnkey build INPUT`
  - Retry a successful or failed benchmark, without rebuilding: `turnkey benchmark INPUT`
  - Retry a model that both successfully built and successfully benchmarked (this is not one of our scenarios... just for the sake of demonstrating flexibility): `turnkey cache delete BUILD` would do it; `turnkey build INPUT` followed by `turnkey benchmark INPUT` would also do it
  - Perhaps some `POLICY` would be helpful
- Mass evaluation use case:
  - Attempt to build and benchmark each model exactly once (even when resuming): needs a `POLICY`
  - Attempt to build each model exactly once (even when resuming): needs a `POLICY`
  - Attempt to benchmark each [cloud] build exactly once (even when resuming): needs a `POLICY`
  - Run debugging workflow on one of these models: the debugging commands above would work fine.
OK, so based on this, our `POLICY` needs to change the default behavior in two main ways:
- Always re-attempt the specific action
- Never re-attempt any action
- Bonus: re-attempt both build and benchmark
Based on that requirement, the following seems reasonably intuitive to me (especially since it is only really relevant to mass evaluation): a new flag named `--retry`, which can have the following values:
- `--retry nothing`: never re-attempt any action; this covers all the mass-evaluation scenarios
- `--retry benchmark`: default behavior of `turnkey benchmark` (not available as a choice in other commands); loads the build from cache if available
- `--retry build`: default behavior of `turnkey build`; in `turnkey benchmark` it rebuilds and then re-benchmarks (bonus objective achieved!)
- `--retry no_builds`: never build or rebuild; benchmark only if the build is cached (this is a bit confusing, but AFAIK this is a super niche use case, and I would be happy deprecating it as well).
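A minimal sketch of how the proposed `--retry` flag could be wired up with `argparse` (hypothetical; the per-command choice lists follow the text above, e.g. `retry benchmark` is only offered by the benchmark command — not actual turnkeyml code):

```python
# Hypothetical argparse sketch of the proposed --retry flag; not actual
# turnkeyml code. Each subcommand exposes only the retry values that make
# sense for it, with its own name as the default.
import argparse

parser = argparse.ArgumentParser(prog="turnkey")
subparsers = parser.add_subparsers(dest="command", required=True)

build = subparsers.add_parser("build")
build.add_argument("--retry", choices=["nothing", "build"], default="build")

benchmark = subparsers.add_parser("benchmark")
benchmark.add_argument(
    "--retry",
    choices=["nothing", "benchmark", "build", "no_builds"],
    default="benchmark",  # default behavior of `turnkey benchmark`
)

# e.g., the mass-evaluation resume scenario:
args = parser.parse_args(["benchmark", "--retry", "nothing"])
```

A nice property of this shape is that invalid combinations (like `turnkey build --retry benchmark`) are rejected by the parser itself rather than needing custom validation.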
How it works in practice
I really like this because it feels like everything is in plain English, and the behavior is pretty obvious in all cases:
| I want to... | I use the command... | Under the hood, the default value of `--retry` is... |
|---|---|---|
| Discover the models in some files | `turnkey discover FILES` | - |
| Build the models in some files | `turnkey build FILES` | `build` |
| Benchmark the models in some files | `turnkey benchmark FILES` | `benchmark` |
| Benchmark the models in some files (with no risk of rebuilding) | `turnkey benchmark FILES --retry no_builds` | - |
| Debug: rebuild a model that already built | `turnkey build FILES` | `build` |
| Debug: re-benchmark a model that built but failed benchmarking (without rebuilding) | `turnkey benchmark FILES` | `benchmark` |
| Debug: re-benchmark a model that already built and benchmarked (without rebuilding) | `turnkey benchmark FILES` | `benchmark` |
| Debug: rebuild and re-benchmark a model that already succeeded | `turnkey benchmark FILES --retry build` | - |
| Mass evaluation: attempt to build each model exactly once (even when resuming) | `turnkey build FILES --retry nothing` | - |
| Mass evaluation: attempt to benchmark builds exactly once (even when resuming) | `turnkey benchmark cache/*.tkb --retry nothing` | - |
| Mass evaluation: attempt to build and benchmark each model exactly once (even when resuming) | `turnkey benchmark FILES --retry nothing` | - |
from turnkeyml.
Agree that skip and rebuild should be combined. This proposal, however, is still not very intuitive and potentially misses a few important scenarios:
Example:
What if a model compiled, but it failed to execute the first time because the HW was in a bad state, and the user simply wants to re-attempt execution? Is that not possible anymore without recompiling the model?
This is a very challenging UI decision.
from turnkeyml.
We almost need something like:
- `turnkey build::if_needed` -> Build if we never tried or failed (our current default behavior, `--build-only`) [default]
- `turnkey build::always` -> Build always (never benchmark)
- `turnkey build::if_not_attempted` -> Build if we never tried (never benchmark)
- `turnkey benchmark::if_needed` -> Benchmark if we never tried or failed benchmarking (never rebuild) [default]
- `turnkey benchmark::always` -> Benchmark always (never rebuild)
- `turnkey benchmark::if_not_attempted` -> Benchmark if we never tried benchmarking (never rebuild)
We can also allow any combination of the above. Example: `turnkey build::if_needed benchmark::if_needed` (same as `turnkey build benchmark`) -> same as our current default behavior.
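Parsing these `command::mode` tokens would be straightforward; a sketch (hypothetical helper, not actual turnkeyml code), including the "bare command means `if_needed`" default described above:

```python
# Hypothetical sketch of parsing the `command::mode` tokens proposed above;
# not actual turnkeyml code.

DEFAULT_MODE = "if_needed"
VALID_MODES = {"if_needed", "always", "if_not_attempted"}

def parse_tokens(tokens: list[str]) -> dict[str, str]:
    """Map each command to its mode, e.g. ['build::always'] -> {'build': 'always'}."""
    plan = {}
    for token in tokens:
        command, sep, mode = token.partition("::")
        # A bare `build` is shorthand for `build::if_needed`.
        mode = mode if sep else DEFAULT_MODE
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        plan[command] = mode
    return plan
```

For instance, `parse_tokens(["build", "benchmark"])` and `parse_tokens(["build::if_needed", "benchmark::if_needed"])` would produce the same plan, matching the equivalence in the example above.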
from turnkeyml.
@danielholanda maybe not as bad as you thought? Posting a full "truth table" below. It's a big table, but it's intuitive to me what is supposed to happen in each case, and when I would use each setting.
> Example: What if a model compiled, it failed to execute the first time because the HW was in a bad state, and the user simply wants to reattempt execution? Is that not possible anymore without recompiling the model?
In this specific example, the build state is S(uccessful), the benchmark state is F(ailed), and the user can apply `skip=successful` to load the build from cache and retry the benchmark. This would be the default behavior, since it is what is wanted in the general case.
Legend:
- S = successful state
- F = failed state
- NA = action Not previously Attempted
| Skip Policy | Build State | Benchmark State | Build Action | Benchmark Action | Same as `rebuild=X` | Useful in this common scenario |
|---|---|---|---|---|---|---|
| Successful (default) | S | S | Load | Load | - | Demos |
| Successful (default) | S | F | Load | Run | `if_needed` (default) | Debugging benchmarking |
| Successful (default) | F | NA | Run | Run | `if_needed` (default) | Debugging building |
| Successful (default) | NA | NA | Run | Run | `if_needed` (default) | Typical use |
| Successful (default) | S | NA | Load | Run | `if_needed` (default) | |
| Failed | S | S | Run | Run | `always` | |
| Failed | S | F | Load | Skip | - | |
| Failed | F | F | Skip | Skip | `never` | |
| Attempted | S | S | Load | Load | - | Benchmarking a cloud compile |
| Attempted | S | F | Load | Skip | - | Benchmarking a cloud compile |
| Attempted | F | NA | Skip | Skip | - | Benchmarking a cloud compile |
| Attempted | NA | NA | Run | Run | `if_needed` (default) | |
| Attempted | S | NA | Load | Run | `if_needed` (default) | Benchmarking a cloud compile |
| None | X | X | Run | Run | `always` | Debugging |
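For reference, the table above can be written down directly as a lookup (a sketch with lowercased policy names; not actual turnkeyml code), which makes it easy to unit-test a policy implementation against every row:

```python
# Hypothetical encoding of the truth table above; not actual turnkeyml code.
# Keys: (skip_policy, build_state, benchmark_state).
# Values: (build_action, benchmark_action).
# States: "S" = successful, "F" = failed, "NA" = not previously attempted,
# "X" = any state. Actions: "Load" (from cache), "Run", "Skip".
TABLE = {
    ("successful", "S", "S"): ("Load", "Load"),
    ("successful", "S", "F"): ("Load", "Run"),
    ("successful", "F", "NA"): ("Run", "Run"),
    ("successful", "NA", "NA"): ("Run", "Run"),
    ("successful", "S", "NA"): ("Load", "Run"),
    ("failed", "S", "S"): ("Run", "Run"),
    ("failed", "S", "F"): ("Load", "Skip"),
    ("failed", "F", "F"): ("Skip", "Skip"),
    ("attempted", "S", "S"): ("Load", "Load"),
    ("attempted", "S", "F"): ("Load", "Skip"),
    ("attempted", "F", "NA"): ("Skip", "Skip"),
    ("attempted", "NA", "NA"): ("Run", "Run"),
    ("attempted", "S", "NA"): ("Load", "Run"),
    ("none", "X", "X"): ("Run", "Run"),
}
```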
P.S. A significant flaw with the above: there is no way to re-run a benchmark that already succeeded if the build also succeeded (AKA the current behavior of `turnkey benchmark`). So granting fine-grained control, as you suggested, is ultimately the right thing.
from turnkeyml.
> We almost need something like:
> - `turnkey build::if_needed` -> Build if we never tried or failed (our current default behavior, `--build-only`) [default]
> - `turnkey build::always` -> Build always (never benchmark)
> - `turnkey build::if_not_attempted` -> Build if we never tried (never benchmark)
> - `turnkey benchmark::if_needed` -> Benchmark if we never tried or failed benchmarking (never rebuild) [default]
> - `turnkey benchmark::always` -> Benchmark always (never rebuild)
> - `turnkey benchmark::if_not_attempted` -> Benchmark if we never tried benchmarking (never rebuild)
>
> We can also allow any combination of the above. Example: `turnkey build::if_needed benchmark::if_needed` (same as `turnkey build benchmark`) -> same as our current default behavior.
This would work too, and I like that it grants more fine-grained control to the user. It's sort of related to #20, so it would be nice if we could get two birds with one stone. However, I just tried to figure out how all the assumed commands would work (e.g., "discover" is always assumed unless it's explicit; "build" is assumed when "benchmark" is used) and I got quite confused with respect to the details of your `turnkey benchmark` behavior above.

I'm also not super happy with the `if_needed` and `if_not_attempted` terms, since I don't think our users have ever intuitively understood `if_needed`, and `if_not_attempted` seems even less intuitive.
Requirements
Going back to first principles, the use cases we really want to support are:
- Typical interactive use: cache as much as possible, require minimal user input / args
- Debugging: grant the user fine grained control of what gets loaded from cache and what re-runs
- Mass benchmarking: enable "resume" behavior (do not re-attempt anything)
In typical mode, commands like `turnkey discover`, `turnkey build`, and `turnkey benchmark` make a lot of sense to me. As in "I called `turnkey benchmark` because I want a benchmark, and a tool named `turnkey` should just do whatever it takes to make that happen for me."
In debugging mode, something has gone wrong and I want control over everything so that I can quickly diagnose the problem. Something like `turnkey build::load_cache benchmark::always` makes a lot of sense here, since it grants fine-grained control and is very explicit. The specific scenarios, which are not covered by "typical use", are simply:
- Retry a successful build
- Retry a successful benchmark without rebuilding
- Most other debugging scenarios, such as retrying a failed benchmark without rebuilding, are covered by typical use!
In mass evaluation mode, we need these behaviors:
- Attempt to build and benchmark each model exactly once. If the command crashes, resume only the models that haven't been attempted yet.
- Attempt to build each model exactly once. Same resume requirement as above.
- Attempt to benchmark each [cloud] build exactly once. Same resume requirement as above.
- Run debugging workflows on specific models (e.g., retry a failed benchmark to see if it fails deterministically).
And, finally, I never want to type `discover` except in the `--analyze-only` scenario.
Proposed solution
TBD...
from turnkeyml.