
policies' Introduction

This repo contains MLPerf™ policies, e.g., rules for submitting benchmark results for training and inference (see the inference and training policies for the rules specific to each).

policies' People

Contributors

anirban-ghosh, bitfort, christ1ne, dilipsequeira, erichan1, georgelyuan, guschmue, hiwotadese, johntran-nv, liorkhe, matthew-frank, morphine00, mrmhodak, nathanw-mlc, nv-rborkar, nvpaulius, nvpohanh, peladodigital, petermattson, pgmpablo157321, pmattson, profvjreddi, rnaidu02, s-idgunji, sgpyc, shriyapalsamudram, sparticlesteve, thekanter, tjablin, willc2010

policies' Issues

Update table's field names in measurements JSON file

The JSON field names listed in the table here are formatted incorrectly:
https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#system_desc_id_implementation_id_scenario-json-metadata

A submitter must have the field names formatted correctly so that the submission checker does not throw an error. Please update the table to use the correct field names, which are:

starting_weights_filename
weight_transformations
weight_data_types
input_data_types
retraining
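
For reference, a sketch of generating a measurements metadata file with the corrected field names; only the keys come from the list above, while the values and the output file name are hypothetical placeholders.

import json

# Hypothetical measurements metadata; only the keys are taken from this issue,
# the values and the output file name are placeholders.
metadata = {
    "starting_weights_filename": "resnet50_v1.onnx",
    "weight_transformations": "quantization",
    "weight_data_types": "int8",
    "input_data_types": "int8",
    "retraining": "no",
}

with open("system_desc_id_implementation_id_scenario.json", "w") as f:
    json.dump(metadata, f, indent=2)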

Inference Submission: required files in accuracy

The rules state: "accuracy.txt" is the only required file in the accuracy folder...

The script wants 4 files: ["mlperf_log_accuracy.json", "mlperf_log_summary.txt", "mlperf_log_detail.txt", "accuracy.txt"]
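
For illustration, a minimal sketch of the check the script appears to perform; the four file names are the ones listed above, while the example path and the function name are assumptions.

import os

REQUIRED_ACCURACY_FILES = [
    "mlperf_log_accuracy.json",
    "mlperf_log_summary.txt",
    "mlperf_log_detail.txt",
    "accuracy.txt",
]

def missing_accuracy_files(accuracy_dir):
    # Return the required files that are not present in accuracy_dir.
    return [name for name in REQUIRED_ACCURACY_FILES
            if not os.path.isfile(os.path.join(accuracy_dir, name))]

# Hypothetical path; the real layout follows the submission rules document.
print(missing_accuracy_files("closed/OrgName/results/system_desc_id/resnet/Offline/accuracy"))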

Where do open submission results go?

@guschmue @christ1ne @tjablin
Supposing an open submission with a given SUT/benchmark/scenario combination (it's a SW submission, so the SUTs are the same), what directory is appropriate for the results?

I suggest another level under results, so it's results/closed/system_desc_id/... and then one for open. This leaves the closed results uncluttered, which seems useful to me.
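
For concreteness, the proposed layout would look roughly like this, where system_desc_id, benchmark, and scenario are placeholders and the lower levels are assumed to follow the existing per-scenario layout:

results/closed/system_desc_id/benchmark/scenario/performance/run_1/...
results/closed/system_desc_id/benchmark/scenario/accuracy/...
results/open/system_desc_id/benchmark/scenario/performance/run_1/...
results/open/system_desc_id/benchmark/scenario/accuracy/...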

Allowing Software Preview Submissions

The current MLCommons rules do not allow a preview submission just because a software component used is in the "preview" (not yet available) stage. This restriction is not desirable because:

  • The Available category requires the software to be available as of the submission date, but there can be late modifications to the software that leave insufficient time for it to be released even as a beta.

One option currently is to submit such a system under the RDI category, but the rules are somewhat ambiguous about software RDI components. So my proposal is to restrict RDI components to hardware components only and to allow "preview software" in the Preview category.

Minigo: real games played for each model should be logged and considered a tunable hyperparameter

Among the Minigo hyperparameters, min_games_per_iteration (=8192) only restricts the minimum number of games that must be played before a new training iteration can sample from them.
https://github.com/mlperf/training/blob/master/reinforcement/tensorflow/minigo/ml_perf/flags/19/train_loop.flags#L9

However, the reference implementation allows selfplay to continue on the same model, creating more games, until the training iteration is finished. So the real number of games played for each model is an open hyperparameter in [8192, +inf), which affects the variety of samples.

Thus the real number of games played should be printed in the log to make reproducing a result possible, and should be considered a tunable hyperparameter covered by the hyperparameter borrowing ("stealing") rule. The logging output could be something like:
"X games played for model 18"
"Y games played for model 19"
"Z games played for model 20"
...
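
A minimal sketch of what such logging could look like in the selfplay loop; the function names and the place where the hook is called are assumptions, not part of the reference implementation.

# Hypothetical bookkeeping for selfplay games; not from the reference code.
games_per_model = {}

def record_selfplay_game(model_index):
    # Called once per completed selfplay game for the given model.
    games_per_model[model_index] = games_per_model.get(model_index, 0) + 1

def log_games_played():
    # Emits lines in the format suggested above, e.g. "8192 games played for model 18".
    for model_index in sorted(games_per_model):
        print("%d games played for model %d" % (games_per_model[model_index], model_index))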

Software updates for submission

Some entities (e.g., system OEMs) rely on third-party software (e.g., from HW/SW vendors). In some cases, the third-party software release may not align exactly with benchmark submission deadlines. Do we want to allow "software" borrowing, similar to hyperparameter borrowing? This would have the benefit of setting a level playing field between vendors and their OEMs.

Where should `implementation_id` go in results/?

Where should implementation_id go in results/? I can find implementation_id under code/ and under measurements/. Where should we put the results for the corresponding implementation_id found in measurements/?

The benchmark names: {"resnet", "ssd-small", "ssd-large"} or {"resnet50", "ssd-mobilenet", "ssd-resnet34"}

In v0.5, these three benchmarks were called {"resnet", "ssd-small", "ssd-large"}. In v0.7, both the reference implementation and mlperf.conf use {"resnet50", "ssd-mobilenet", "ssd-resnet34"}.

I think we should settle on one set of names since

  1. The model name is crucial when LoadGen loads the mlperf.conf and user.conf files. If the model name in those files differs from what LoadGen expects, LoadGen silently ignores the mismatch, which leads to invalid submission configurations.
  2. Having two names for the same benchmark in the same version (v0.7 in this case) is confusing to external readers.

I will create an MR to update the names of these three benchmarks to the new names, and we can discuss in the WG whether this change is agreeable.
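
As a sketch of the kind of guard that would make the mismatch loud rather than silent (the canonical set and aliases below simply restate the two name sets from the title and are illustrative):

# Proposed v0.7 names and their v0.5 aliases, per the title of this issue.
CANONICAL_NAMES = {"resnet50", "ssd-mobilenet", "ssd-resnet34"}
V05_ALIASES = {"resnet": "resnet50", "ssd-small": "ssd-mobilenet", "ssd-large": "ssd-resnet34"}

def canonical_benchmark_name(name):
    # Map an old alias to the new name, or fail loudly instead of silently ignoring it.
    if name in CANONICAL_NAMES:
        return name
    if name in V05_ALIASES:
        return V05_ALIASES[name]
    raise ValueError("unknown benchmark name: %s" % name)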

Proposal for improved submission process

The current submission process is inadequate in two ways:

  1. It allows submitters who have not yet submitted to access the submission repo and preview other submitters' results. This gives potential submitters an unfair competitive advantage if they wait until the last minute before deciding to submit. The unfairness is further compounded by the submission deadline not being friendly to all time zones: companies in Asia typically submit the night before. It is also difficult for companies that coordinate with worldwide partners to synchronize their submission time, putting them at a further disadvantage.
  2. It gives companies that have no interest in submitting early access to the results.

Proposal for next round:

  • Submitters mirror the submission repo (https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/duplicating-a-repository) and push their submission to their mirror. Before the submission deadline, each submitter must notify a designated non-submitting chair (Vijay/David?) of their intention to submit, along with a link and access to their submission mirror.
  • After the submission deadline, the non-submitting chairs will merge all of the submission mirrors' contents into a new 'results_preview' mirror of the submission repo. This ensures submitters will not be able to preview other submitters' submissions.
  • After the 'results_preview' mirror has been populated, access is granted simultaneously to only those submitters that have actually submitted.

@DilipSequeira @petermattson @TheKanter

Remove the requirement to submit trace.json

Generating the trace can bottleneck LoadGen, and the log can be gigabytes (hundreds of megabytes even when compressed) for a single benchmark/scenario combination.

We suggest it be removed from the required results data for inference submissions.

Update TOU?

Do we want to update the TOU to mention the availability classes (available, preview, research)?

Add a few required files to the accuracy run

We want everything LoadGen leaves there so we can check for errors in the accuracy run:

mlperf_log_summary.txt
mlperf_log_detail.txt
mlperf_log_accuracy.json

Allow placing user.conf under results/

At the moment, we place user.conf under measurements/ e.g. closed/Intel/measurements/clx_9282-2s_openvino-linux/ssd-small/Server. It can be argued, however, that it belongs under results e.g. closed/Intel/results/clx_9282-2s_openvino-linux/ssd-small/Server.

I don't think we should strictly require all performance and accuracy runs to happen with exactly the same LoadGen parameters. For example, the submitter may decide to do an accuracy Server run at a higher QPS than any of the corresponding performance runs to increase the throughput. (We don't measure latency for accuracy runs, so the latency constraints don't have to be obeyed.)

As another example, the submitter may have a bunch of VALID performance Server runs at slightly different QPS values, e.g. [ 99.1, 99.2, 99.5, 99.3, 99.1 ]. In principle, they should have no problem obtaining three more VALID runs at QPS=99.1, but that would just be contributing to global warming. Instead, they should be allowed to submit these five runs and claim QPS=99.1 as achieved.

In such cases, the user.conf files may be slightly different, and it's probably undecidable which one should be stored under measurements. However, we won't be losing any information if we allow storing them next to the LoadGen logs for each run.
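
A tiny sketch of the claim rule described above, assuming the per-run target QPS values have been collected from the run logs (the numbers are the ones from the example):

# QPS values of the five VALID Server runs from the example above.
valid_run_qps = [99.1, 99.2, 99.5, 99.3, 99.1]

# Under this proposal the submitter claims the lowest QPS among the VALID runs
# instead of re-running everything at exactly that value.
claimed_qps = min(valid_run_qps)
print("claimed Server QPS:", claimed_qps)  # 99.1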

Require disclosure of Accuracy when quoting Open division results

Propose replacing

When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified.

with

When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified. When quoting an Open division result, any reduction in accuracy vs the requirements for a Closed result must be identified even if not making a comparison vs other submissions.

Clarity regarding source code requirement in Available category

We have this software requirement for submission.
"source code must be sufficient to reproduce the results of the submission"

Here, does "the results of the submission" refer to:

  1. loadgen logs or
  2. the submission tarball?

While the first is enough to enable a check of accuracy/performance, the second provides an easy way for anyone to replicate the results and ensures submission compliance.

It would be good to clarify the exact requirement in the rules, as the effort required on the side of the submitter is different for the two cases.

Update section 5.6.2

"Inference benchmark directory names must be one of { mobilenet, ssd-small, resnet, ssd-large, gnmt }."
This needs to be updated for v0.7. Note that we are using '-99.9' and '-99' suffixes for the benchmarks where those apply.

training submission directory structure in benchmarks/

As a first-time user I am profoundly confused by the structure of the submission directories. It appears to me that there are situations where name clashes may occur, in particular if a participant has run the same benchmark on the same system using different implementations.

Also, I am confused about the allowed benchmark names for the training benchmarks. In particular, bert appears to be missing.

I submitted some pull requests with possible changes to address this: 129, 130, 131

As we are in the middle of a submission period, changes are probably best not made right now, but it might be worth reaching out to the current participants if you think there is reason to clarify these matters.

Add Paper Citations to Repos

To help keep track of how MLPerf is being used, we should ask people to cite our work. I already created PRs for the training and inference repos.

Similar PRs would also be good for the following:

You could even include the "MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance" paper on the policies repo.

Reviewer assignment guarantee #250

Moving Training Policies issue #250 here:

Example language for rules:

Every submission gets a review by another submitter. If you submit, you may have to do a review as assigned by the review committee. The review committee will strive to match the amount of review work to the amount of work put into the submission. Any submitter is free to review any other submitter, in addition to any assigned reviews.

Example Algorithm: the review committee will:

  • Stack rank submitters by number of submissions.
  • Assign reviewers in pairs, walking down the stack rank.
  • If there is an odd number of reviewers, the bottom 3 in the stack rank will review each other.
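
A minimal sketch of this example algorithm; tie-breaking within the stack rank and the exact pairing direction are not specified in the issue and are assumptions here.

def assign_reviews(submission_counts):
    # submission_counts: dict mapping submitter name -> number of submissions.
    # Stack rank by number of submissions, highest first.
    ranked = sorted(submission_counts, key=submission_counts.get, reverse=True)
    assignments = {}
    # Walk down the stack rank, assigning reviewers in pairs.
    pair_end = len(ranked) if len(ranked) % 2 == 0 else len(ranked) - 3
    for i in range(0, pair_end, 2):
        a, b = ranked[i], ranked[i + 1]
        assignments[a], assignments[b] = [b], [a]
    # With an odd number of submitters, the bottom 3 review each other.
    if len(ranked) % 2 == 1:
        bottom = ranked[-3:]
        for s in bottom:
            assignments[s] = [other for other in bottom if other != s]
    return assignments

# Example: four submitters with different submission counts.
print(assign_reviews({"A": 10, "B": 7, "C": 3, "D": 1}))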

Allow placing NOTES.txt under measurements/

The results table has a Notes column. However, at the moment it is filled in manually.

To populate this column automatically, I propose allowing a tweet-sized file called NOTES.txt under measurements/. Multiple files for different benchmarks and scenarios could be concatenated.
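
A minimal sketch of how the Notes column could then be assembled; the measurements/<system>/<benchmark>/<scenario>/ layout and the 280-character cap are assumptions for illustration.

import glob
import os

MAX_NOTE_CHARS = 280  # "tweet-sized"; the exact limit is an assumption

def collect_notes(measurements_dir):
    # Concatenate NOTES.txt files found at measurements/<system>/<benchmark>/<scenario>/NOTES.txt.
    notes = []
    for path in sorted(glob.glob(os.path.join(measurements_dir, "*", "*", "*", "NOTES.txt"))):
        with open(path) as f:
            text = f.read().strip()[:MAX_NOTE_CHARS]
        if text:
            notes.append(text)
    return " ".join(notes)

print(collect_notes("closed/OrgName/measurements"))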

Schedule deadline for public-facing materials

MLPerf's public-facing material (e.g., press releases, slide decks for press, etc.) should be available for review by submitters as part of the submission schedule. Let's add this for future rounds.
