mlcommons / policies
General policies for MLPerf™ including submission rules, coding standards, etc.
Home Page: https://mlcommons.org/en/get-involved
License: Apache License 2.0
The results table has a Notes column. However, at the moment it is filled in manually.
To populate this column automatically, I propose allowing submitters to create a Tweet-sized file called NOTES.txt under measurements/. Multiple files for different benchmarks and scenarios could be concatenated, as in the sketch below.
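A minimal sketch of how such notes could be gathered, assuming hypothetical NOTES.txt files placed next to each benchmark/scenario under measurements/; the directory layout and function name are illustrative only, not part of the rules:

```python
from pathlib import Path

def collect_notes(measurements_dir: str) -> str:
    """Concatenate every NOTES.txt found under measurements/ into one Notes entry."""
    notes = []
    for path in sorted(Path(measurements_dir).rglob("NOTES.txt")):
        text = path.read_text().strip()
        if text:
            # Prefix each note with its benchmark/scenario sub-path for context.
            notes.append(f"[{path.parent.relative_to(measurements_dir)}] {text}")
    return " | ".join(notes)

# Hypothetical usage for a single system under a submitter's measurements tree.
print(collect_notes("closed/ExampleOrg/measurements/example_system"))
```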
Generating the trace can bottleneck LoadGen, and the log can be gigabytes (or, when compressed, hundreds of megabytes) for a single benchmark/scenario combination.
We suggest it be removed from the required results data for inference submissions.
The rules appear to want the submitter's name as the top-level directory, while the checking script wants "open", "closed", ... as the top level.
"Inference benchmark directory names must be one of { mobilenet, ssd-small, resnet, ssd-large, gnmt }."
This needs to be updated for v0.7. Note that we are using '-99.9' and '-99' suffixes for the benchmarks where those apply.
For the next inference submission, Intel cannot disclose host_processor_model_name and host_processor_core_count in the system description JSON mentioned in the submission rules for a product that has not launched yet. Intel is requesting an exemption from including host_processor_model_name and host_processor_core_count for the Preview submission only.
We want everything loadgen leaves there so we can check for errors in the accuracy run:
mlperf_log_summary.txt
mlperf_log_detail.txt
mlperf_log_accuracy.json
This is one of the suggested improvements from this doc
Among the minigo hyperparameters, min_games_per_iteration (=8192) only restricts the minimum number of games that must be played before a new training iteration can sample from them.
https://github.com/mlperf/training/blob/master/reinforcement/tensorflow/minigo/ml_perf/flags/19/train_loop.flags#L9
However, the reference implementation allows selfplay to continue on the same model, creating more games, until the training iteration is finished. So the real number of games played for each model is an open HP in [8192, +inf), which affects the variety of samples.
Thus the real number of games played should be printed in the log to make reproducing a result possible, and it should be considered a tunable HP that falls under the HP stealing rule. The output in the logging could be something like:
"X games played for model 18"
"Y games played for model 19"
"Z games played for model 20"
...
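A minimal sketch of what such logging could look like, assuming a hypothetical per-model counter maintained by the selfplay loop; the function and variable names are illustrative and not taken from the reference implementation:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_games_played(games_per_model: dict) -> None:
    """Emit one log line per model with the real number of selfplay games."""
    for model_id in sorted(games_per_model):
        logging.info("%d games played for model %d", games_per_model[model_id], model_id)

# Example: counters accumulated by the (hypothetical) selfplay loop.
log_games_played({18: 8192, 19: 9034, 20: 8455})
```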
The current submission process is inadequate in two ways:
Proposal for next round:
Shouldn't be blank. @petermattson would be good to discuss on Thursday.
Please see PR based on the WG proposal.
We have this software requirement for submission.
"source code must be sufficient to reproduce the results of the submission"
Here, what does "results of the submission" imply?
While the first option is enough to ensure a check of accuracy/performance, the second one provides an easy way for anyone to replicate the results and ensures submission compliance.
It would be good to clarify the exact requirement in the rules, as the effort required on the side of the submitter is different for the two cases.
In submission rules, we need to update the link to
https://github.com/mlperf/policies/blob/master/TERMS%20OF%20USE.md
Moving Training Policies issue #250 here:
Example language for rules:
Every submission gets a review by another submitter. If you submit, you may have to do a review as assigned by the review committee. The review committee will strive to match the amount of review work to the amount of work put into the submission. Any submitter is free to review any other submitter, in addition to any assigned reviews.
Example Algorithm: the review committee will:
Stack rank by number of submissions
Assign reviewers in pairs walking down the stack rank
If there is an odd number of reviewers, the bottom 3 in the stack rank will review each other (see the sketch below).
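A minimal sketch of this assignment algorithm, assuming a hypothetical mapping from submitter to number of submissions; the names and return format are illustrative only:

```python
def assign_reviews(submission_counts: dict) -> list:
    """Return (reviewer, reviewee) pairs walking down a stack rank by submission count."""
    # Stack rank: most submissions first.
    ranked = sorted(submission_counts, key=submission_counts.get, reverse=True)
    pairs = []
    # With an odd number of submitters, the bottom three review each other in a cycle.
    if len(ranked) % 2 == 1 and len(ranked) >= 3:
        a, b, c = ranked[-3:]
        pairs += [(a, b), (b, c), (c, a)]
        ranked = ranked[:-3]
    # Assign the rest in adjacent pairs, each member of a pair reviewing the other.
    for x, y in zip(ranked[0::2], ranked[1::2]):
        pairs += [(x, y), (y, x)]
    return pairs

print(assign_reviews({"org_a": 40, "org_b": 25, "org_c": 12, "org_d": 7, "org_e": 3}))
```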
We need to update to reflect current instructions, CLAs, etc.
e.g., using https://mlcommons.org/en/get-involved/
Propose replacing
When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified.
with
When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified. When quoting an Open division result, any reduction in accuracy vs the requirements for a Closed result must be identified even if not making a comparison vs other submissions.
We should probably update this ahead of inference v0.7.
We should remove the 2019 and 2020 schedules and update with 2021 (or remove the section).
The current MLCommons rules do not allow a Preview submission just because a software component used is in a "preview" (not yet available) stage. This restriction is not desirable.
One option currently is to submit such a system under the RDI category, but the rules are a bit ambiguous on software RDI components. So my proposal is to restrict RDI components to hardware components only, and to allow "preview software" in the Preview category.
@guschmue @christ1ne @tjablin
Supposing an open submission with a given SUT/benchmark/scenario combination (it's a SW submission, so the SUTs are the same), what directory is appropriate for results?
I suggest another level in results, so it's results/closed/system_desc_id/... and then one for open. This leaves closed results uncluttered, which seems useful to me; see the layout sketch below.
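For illustration, the proposed layout might look like the following, with system_desc_id, benchmark, and scenario as placeholders:

```
results/
├── closed/
│   └── <system_desc_id>/
│       └── <benchmark>/
│           └── <scenario>/...
└── open/
    └── <system_desc_id>/
        └── <benchmark>/
            └── <scenario>/...
```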
MLPerf's public-facing material (e.g., press releases, slide decks for press) should be available for review by submitters as part of the submission schedule. Let's add this for future rounds.
https://github.com/mlcommons/policies/blob/master/MLPerf_Compatibility_Table.adoc needs to be updated for retinanet! Need to do ASAP prior to release of results.
As a first time user I am profoundly confused by the structure of the submission directories. It appears to me that there are situations where there may be name clashes - in particular if a participant has run the same benchmark on the same system using different implementations.
Also, I am confused about the allowed benchmark names for the training benchmarks. In particular, bert seems to be missing.
I submitted some pull requests with possible changes to avoid this: 129, 130, 131.
As we are in the middle of a submission period, changes are probably not good to make now, but it would perhaps be a good idea to reach out to the current participants if you think there is reason to clarify these matters.
Some entities (e.g., system OEMs) rely on third-party software (e.g., from HW/SW vendors). In some cases, the third-party software release may not align exactly with benchmark submission deadlines. Do we want to allow "software" borrowing, similar to hyperparameter borrowing? This would have the benefit of setting a level playing field between vendors and their OEMs.
Should contain compliance checker log plus Tom's checklist
https://github.com/mlperf/policies/blob/master/submission_rules.adoc appears to apply to both inference and training, but should be updated, e.g., renumbered and relationship to inference and training clarified.
Training rules https://github.com/mlperf/training_policies/blob/master/training_rules.adoc
Inference rules https://github.com/mlperf/inference_policies/blob/master/inference_rules.adoc
We have rules for submitting training results using patched frameworks. Can we write them down clearly?
https://github.com/mlperf/policies/blob/master/submission_rules.adoc#schedule
The schedule here has not been updated for COVID.
Potential submitters are emailing me in confusion.
Section 2 contains the review committee details, many are outdated. We should revise.
It looks like we were not clear on this, as I don't see it in the rules doc, but I believe the goal in the past was that software must be publicly available on submission day (not publication day).
I see the submission date in the rules is the first Friday of August for training, but I am not sure whether that is correct or needs an update.
https://github.com/mlperf/policies/blob/master/submission_rules.adoc
At the moment, we place user.conf under measurements/, e.g. closed/Intel/measurements/clx_9282-2s_openvino-linux/ssd-small/Server. It can be argued, however, that it belongs under results/, e.g. closed/Intel/results/clx_9282-2s_openvino-linux/ssd-small/Server.
I don't think we should strictly require all performance and accuracy runs to happen with exactly the same LoadGen parameters. For example, the submitter may decide to do an accuracy Server run under a higher QPS than any of the corresponding performance runs to increase the throughput. (We don't measure the latency for accuracy runs, therefore latency constraints don't have to be obeyed.)
As another example, the submitter may have a bunch of VALID performance Server runs at slightly different QPS values, e.g. [99.1, 99.2, 99.5, 99.3, 99.1]. In principle, they should have no problem obtaining 3 more VALID runs at QPS=99.1, but that would just be contributing to global warming. Instead, they should be allowed to submit these 5 runs and claim QPS=99.1 as achieved.
In such cases, the user.conf files may be slightly different, and it's probably undecidable which one should be stored under measurements/. However, we won't be losing any information if we allow storing them next to the LoadGen logs for each run.
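For example, two VALID Server runs might legitimately carry slightly different user.conf files like the following; the benchmark name and values are illustrative, assuming the usual model.scenario.key = value LoadGen config format:

```
# run_1/user.conf
ssd-small.Server.target_qps = 99.1

# run_2/user.conf
ssd-small.Server.target_qps = 99.5
```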
Currently just TODO. This includes FFs.
To help keep track of how MLPerf is being used, we should ask people to cite our work. I already created PRs for the training and inference repos.
Similar PRs would also be good for the following:
You could even include the "MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance" paper on the policies repo.
Do we want to update TOU to mention availability classes (available, preview, research)?
The rules state: "accuracy.txt" is the only required file in the accuracy folder...
The script wants 4 files: ["mlperf_log_accuracy.json", "mlperf_log_summary.txt", "mlperf_log_detail.txt", "accuracy.txt"]
The rules call for "accuracy/mlperf_log_accuracy.json" while the submission script calls for "accuracy/accuracy.txt"
The rules look for a file named "<system_desc_id><implementation_id>.json", while the submission checker looks for "<system_desc_id>_<implementation_id>.json":
impl = system_file[len(system_desc) + 1:-5]  # skips "<system_desc>" plus one extra character for the underscore, and strips the ".json" suffix
"audit" renamed to "compliance"
also moving directory one level deeper
Where should implementation_id go in results/? I can find implementation_id under code/ and under measurements/. Where should we put the results corresponding to each implementation_id in measurements/?
The benchmark names: {"resnet", "ssd-small", "ssd-large"} or {"resnet50", "ssd-mobilenet", "ssd-resnet34"}
In v0.5, these three benchmarks were called {"resnet", "ssd-small", "ssd-large"}. In v0.7, both the reference implementation and mlperf.conf use {"resnet50", "ssd-mobilenet", "ssd-resnet34"}.
I think we should settle on one set of names, since having two sets for the same benchmarks is confusing.
I will create an MR to update the benchmark names for these three benchmarks to the new names, and we can discuss in the WG whether this is agreed upon.
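For reference, the implied mapping between the old and new names, as a hypothetical snippet a checker could use to accept either spelling:

```python
# Old (v0.5) benchmark directory names mapped to the new (v0.7) names.
BENCHMARK_RENAMES = {
    "resnet": "resnet50",
    "ssd-small": "ssd-mobilenet",
    "ssd-large": "ssd-resnet34",
}

def normalize_benchmark(name: str) -> str:
    """Return the new-style name, accepting either the old or the new spelling."""
    return BENCHMARK_RENAMES.get(name, name)
```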
Please add the steps on when and how to use
https://github.com/mlperf/inference/blob/master/tools/submission/truncate_accuracy_log.py
The JSON file field names listed in the table here are formatted incorrectly:
https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#system_desc_id_implementation_id_scenario-json-metadata
A submitter must have the field names formatted correctly for the submission checker to not throw an error. Please update the table to use the correct field names, which are:
starting_weights_filename
weight_transformations
weight_data_types
input_data_types
retraining
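For illustration, a system_desc_id_implementation_id_scenario.json using these field names might look like the following; the values are placeholders, not taken from any real submission:

```json
{
  "starting_weights_filename": "https://example.com/resnet50_fp32.onnx",
  "weight_transformations": "quantization",
  "weight_data_types": "int8",
  "input_data_types": "int8",
  "retraining": "no"
}
```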