
policies' Introduction

This repo contains MLPerf™ policies, e.g., rules for submitting benchmark results for training and inference (see the inference and training policies for the rules specific to each).

policies' People

Contributors

anirban-ghosh, bitfort, christ1ne, dilipsequeira, erichan1, georgelyuan, guschmue, hiwotadese, johntran-nv, liorkhe, matthew-frank, morphine00, mrmhodak, nathanw-mlc, nv-rborkar, nvpaulius, nvpohanh, peladodigital, petermattson, pgmpablo157321, pmattson, profvjreddi, rnaidu02, s-idgunji, sgpyc, shriyapalsamudram, sparticlesteve, thekanter, tjablin, willc2010

policies' Issues

Update table's field names in measurements JSON file

The JSON field names listed in the table here are formatted incorrectly:
https://github.com/mlcommons/policies/blob/master/submission_rules.adoc#system_desc_id_implementation_id_scenario-json-metadata

A submitter must have the field names formatted correctly so that the submission checker does not throw an error. Please update the table to use the correct field names, which are:

starting_weights_filename
weight_transformations
weight_data_types
input_data_types
retraining
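
For reference, a sketch of generating a measurements metadata file with the corrected field names; only the keys come from the list above, while the values and the output file name are hypothetical placeholders.

import json

# Hypothetical measurements metadata; only the keys are taken from this issue,
# the values and the output file name are placeholders.
metadata = {
    "starting_weights_filename": "resnet50_v1.onnx",
    "weight_transformations": "quantization",
    "weight_data_types": "int8",
    "input_data_types": "int8",
    "retraining": "no",
}

with open("system_desc_id_implementation_id_scenario.json", "w") as f:
    json.dump(metadata, f, indent=2)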

Inference Submission: required files in accuracy

The rules state: "accuracy.txt" is the only required file in the accuracy folder...

The script wants 4 files: ["mlperf_log_accuracy.json", "mlperf_log_summary.txt", "mlperf_log_detail.txt", "accuracy.txt"]
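
For illustration, a minimal sketch of the check the script appears to perform; the four file names are the ones listed above, while the example path and the function name are assumptions.

import os

REQUIRED_ACCURACY_FILES = [
    "mlperf_log_accuracy.json",
    "mlperf_log_summary.txt",
    "mlperf_log_detail.txt",
    "accuracy.txt",
]

def missing_accuracy_files(accuracy_dir):
    # Return the required files that are not present in accuracy_dir.
    return [name for name in REQUIRED_ACCURACY_FILES
            if not os.path.isfile(os.path.join(accuracy_dir, name))]

# Hypothetical path; the real layout follows the submission rules document.
print(missing_accuracy_files("closed/OrgName/results/system_desc_id/resnet/Offline/accuracy"))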

Where do open submission results go?

@guschmue @christ1ne @tjablin
Supposing an open submission with a given SUT/benchmark/scenario combination (it's a SW submission, so the SUTs are the same), what directory is appropriate for the results?

I suggest another level under results, so it's results/closed/system_desc_id/... and then one for open. This leaves the closed results uncluttered, which seems useful to me.
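
For concreteness, the proposed layout would look roughly like this, where system_desc_id, benchmark, and scenario are placeholders and the lower levels are assumed to follow the existing per-scenario layout:

results/closed/system_desc_id/benchmark/scenario/performance/run_1/...
results/closed/system_desc_id/benchmark/scenario/accuracy/...
results/open/system_desc_id/benchmark/scenario/performance/run_1/...
results/open/system_desc_id/benchmark/scenario/accuracy/...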

Allowing Software Preview Submissions

The current MLCommons rules do not allow a preview submission just because a software component used is in the "preview" (not yet available) stage. This restriction is not desirable because:

  • The Available category requires the software to be available as of the submission date, but there can be late modifications to the software that leave insufficient time for it to be released even as a beta.

One option currently is to submit such a system under the RDI category, but the rules are somewhat ambiguous about software RDI components. So my proposal is to restrict RDI components to hardware components only and to allow "preview software" in the Preview category.

Minigo: real games played for each model should be logged and considered a tunable hyperparameter

Among the Minigo hyperparameters, min_games_per_iteration (=8192) only restricts the minimum number of games that must be played before a new training iteration can sample from them.
https://github.com/mlperf/training/blob/master/reinforcement/tensorflow/minigo/ml_perf/flags/19/train_loop.flags#L9

However, the reference implementation allows selfplay to continue on the same model, creating more games, until the training iteration is finished. So the real number of games played for each model is an open hyperparameter in [8192, +inf), which affects the variety of samples.

Thus the real number of games played should be printed in the log to make reproducing a result possible, and should be considered a tunable hyperparameter covered by the hyperparameter borrowing ("stealing") rule. The logging output could be something like:
"X games played for model 18"
"Y games played for model 19"
"Z games played for model 20"
...
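
A minimal sketch of what such logging could look like in the selfplay loop; the function names and the place where the hook is called are assumptions, not part of the reference implementation.

# Hypothetical bookkeeping for selfplay games; not from the reference code.
games_per_model = {}

def record_selfplay_game(model_index):
    # Called once per completed selfplay game for the given model.
    games_per_model[model_index] = games_per_model.get(model_index, 0) + 1

def log_games_played():
    # Emits lines in the format suggested above, e.g. "8192 games played for model 18".
    for model_index in sorted(games_per_model):
        print("%d games played for model %d" % (games_per_model[model_index], model_index))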

Software updates for submission

Some entities (e.g., system OEMs) rely on third-party software (e.g., from HW/SW vendors). In some cases, the third-party software release may not align exactly with benchmark submission deadlines. Do we want to allow "software" borrowing, similar to hyperparameter borrowing? This would have the benefit of setting a level playing field between vendors and their OEMs.

Where should `implementation_id` go in results/?

Where should implementation_id go in results/? I can find implementation_id under code/ and under measurements/. Where should we put the results for the corresponding implementation_id found in measurements/?

The benchmark names: {"resnet", "ssd-small", "ssd-large"} or {"resnet50", "ssd-mobilenet", "ssd-resnet34"}

In v0.5, these three benchmarks were called {"resnet", "ssd-small", "ssd-large"}. In v0.7, both the reference implementation and mlperf.conf use {"resnet50", "ssd-mobilenet", "ssd-resnet34"}.

I think we should settle on one set of names since

  1. The model name is crucial when LoadGen loads the mlperf.conf and user.conf files. If the model name in those files differs from what LoadGen expects, LoadGen silently ignores the mismatch, which leads to invalid submission configurations.
  2. Having two names for the same benchmark in the same version (v0.7 in this case) is confusing to external readers.

I will create an MR to update the names of these three benchmarks to the new names, and we can discuss in the WG whether this change is agreeable.
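
As a sketch of the kind of guard that would make the mismatch loud rather than silent (the canonical set and aliases below simply restate the two name sets from the title and are illustrative):

# Proposed v0.7 names and their v0.5 aliases, per the title of this issue.
CANONICAL_NAMES = {"resnet50", "ssd-mobilenet", "ssd-resnet34"}
V05_ALIASES = {"resnet": "resnet50", "ssd-small": "ssd-mobilenet", "ssd-large": "ssd-resnet34"}

def canonical_benchmark_name(name):
    # Map an old alias to the new name, or fail loudly instead of silently ignoring it.
    if name in CANONICAL_NAMES:
        return name
    if name in V05_ALIASES:
        return V05_ALIASES[name]
    raise ValueError("unknown benchmark name: %s" % name)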

Proposal for improved submission process

The current submission process is inadequate in two ways:

  1. It allows submitters who have not yet submitted to access the submission repo and preview other submitters' results. This gives potential submitters an unfair competitive advantage if they wait until the last minute before deciding to submit. The unfairness is further compounded by the submission deadline not being friendly to all time zones: companies in Asia typically submit the night before. It is also difficult for companies that coordinate with worldwide partners to synchronize their submission time, putting them at a further disadvantage.
  2. It gives companies that have no interest in submitting early access to the results.

Proposal for next round:

  • Submitters mirror the submission repo (https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/duplicating-a-repository) and push their submission to their mirror. Before the submission deadline, each submitter must notify a designated non-submitting chair (Vijay/David?) of their intention to submit, along with a link and access to their submission mirror.
  • After the submission deadline, the non-submitting chairs will merge all of the submission mirrors' contents into a new 'results_preview' mirror of the submission repo. This ensures submitters will not be able to preview other submitters' submissions.
  • After the 'results_preview' mirror has been populated, access is granted simultaneously to only those submitters that have actually submitted.

@DilipSequeira @petermattson @TheKanter

Remove the requirement to submit trace.json

Generating the trace can bottleneck LoadGen, and the log can be gigabytes (hundreds of megabytes even when compressed) for a single benchmark/scenario combination.

We suggest it be removed from the required results data for inference submissions.

Update TOU?

Do we want to update the TOU to mention the availability classes (available, preview, research)?

Add a few required files to the accuracy run

We want everything LoadGen leaves there so we can check for errors in the accuracy run:

mlperf_log_summary.txt
mlperf_log_detail.txt
mlperf_log_accuracy.json

Allow placing user.conf under results/

At the moment, we place user.conf under measurements/ e.g. closed/Intel/measurements/clx_9282-2s_openvino-linux/ssd-small/Server. It can be argued, however, that it belongs under results e.g. closed/Intel/results/clx_9282-2s_openvino-linux/ssd-small/Server.

I don't think we should strictly require all performance and accuracy runs to happen with exactly the same LoadGen parameters. For example, the submitter may decide to do an accuracy Server run at a higher QPS than any of the corresponding performance runs to increase the throughput. (We don't measure latency for accuracy runs, so the latency constraints don't have to be obeyed.)

As another example, the submitter may have a bunch of VALID performance Server runs at slightly different QPS values, e.g. [ 99.1, 99.2, 99.5, 99.3, 99.1 ]. In principle, they should have no problem obtaining three more VALID runs at QPS=99.1, but that would just be contributing to global warming. Instead, they should be allowed to submit these five runs and claim QPS=99.1 as achieved.

In such cases, the user.conf files may be slightly different, and it's probably undecidable which one should be stored under measurements. However, we won't be losing any information if we allow storing them next to the LoadGen logs for each run.
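
A tiny sketch of the claim rule described above, assuming the per-run target QPS values have been collected from the run logs (the numbers are the ones from the example):

# QPS values of the five VALID Server runs from the example above.
valid_run_qps = [99.1, 99.2, 99.5, 99.3, 99.1]

# Under this proposal the submitter claims the lowest QPS among the VALID runs
# instead of re-running everything at exactly that value.
claimed_qps = min(valid_run_qps)
print("claimed Server QPS:", claimed_qps)  # 99.1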

Require disclosure of Accuracy when quoting Open division results

Propose replacing

When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified.

with

When comparing Open and Closed division results any ways in which the Open result would not qualify as a Closed result must be identified. When quoting an Open division result, any reduction in accuracy vs the requirements for a Closed result must be identified even if not making a comparison vs other submissions.

Clarity regarding source code requirement in Available category

We have this software requirement for submission.
"source code must be sufficient to reproduce the results of the submission"

Here, does "the results of the submission" refer to:

  1. loadgen logs or
  2. the submission tarball?

While the first is enough to enable a check of accuracy/performance, the second provides an easy way for anyone to replicate the results and ensures submission compliance.

It would be good to clarify the exact requirement in the rules, as the effort required on the side of the submitter is different for the two cases.

Update section 5.6.2

"Inference benchmark directory names must be one of { mobilenet, ssd-small, resnet, ssd-large, gnmt }."
This needs to be updated for v0.7. Note that we are using '-99.9' and '-99' suffixes for the benchmarks where those apply.

training submission directory structure in benchmarks/

As a first-time user I am profoundly confused by the structure of the submission directories. It appears to me that there are situations where name clashes may occur, in particular if a participant has run the same benchmark on the same system using different implementations.

Also, I am confused about the allowed benchmark names for the training benchmarks. In particular, bert appears to be missing.

I submitted some pull requests with possible changes to address this: 129, 130, 131

As we are in the middle of a submission period, changes are probably best not made right now, but it might be worth reaching out to the current participants if you think there is reason to clarify these matters.

Add Paper Citations to Repos

To help keep track of how MLPerf is being used, we should ask people to cite our work. I already created PRs for the training and inference repos.

Similar PRs would also be good for the following:

You could even include the "MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance" paper on the policies repo.

Reviewer assignment guarantee #250

Moving Training Policies issue #250 here:

Example language for rules:

Every submission gets a review by another submitter. If you submit, you may have to do a review as assigned by the review committee. The review committee will strive to match the amount of review work to the amount of work put into the submission. Any submitter is free to review any other submitter, in addition to any assigned reviews.

Example Algorithm: the review committee will:

  • Stack rank submitters by number of submissions.
  • Assign reviewers in pairs, walking down the stack rank.
  • If there is an odd number of reviewers, the bottom 3 in the stack rank will review each other.
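
A minimal sketch of this example algorithm; tie-breaking within the stack rank and the exact pairing direction are not specified in the issue and are assumptions here.

def assign_reviews(submission_counts):
    # submission_counts: dict mapping submitter name -> number of submissions.
    # Stack rank by number of submissions, highest first.
    ranked = sorted(submission_counts, key=submission_counts.get, reverse=True)
    assignments = {}
    # Walk down the stack rank, assigning reviewers in pairs.
    pair_end = len(ranked) if len(ranked) % 2 == 0 else len(ranked) - 3
    for i in range(0, pair_end, 2):
        a, b = ranked[i], ranked[i + 1]
        assignments[a], assignments[b] = [b], [a]
    # With an odd number of submitters, the bottom 3 review each other.
    if len(ranked) % 2 == 1:
        bottom = ranked[-3:]
        for s in bottom:
            assignments[s] = [other for other in bottom if other != s]
    return assignments

# Example: four submitters with different submission counts.
print(assign_reviews({"A": 10, "B": 7, "C": 3, "D": 1}))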

Allow placing NOTES.txt under measurements/

The results table has a Notes column. However, at the moment it is filled in manually.

To populate this column automatically, I propose allowing a tweet-sized file called NOTES.txt under measurements/. Multiple files for different benchmarks and scenarios could be concatenated.
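
A minimal sketch of how the Notes column could then be assembled; the measurements/<system>/<benchmark>/<scenario>/ layout and the 280-character cap are assumptions for illustration.

import glob
import os

MAX_NOTE_CHARS = 280  # "tweet-sized"; the exact limit is an assumption

def collect_notes(measurements_dir):
    # Concatenate NOTES.txt files found at measurements/<system>/<benchmark>/<scenario>/NOTES.txt.
    notes = []
    for path in sorted(glob.glob(os.path.join(measurements_dir, "*", "*", "*", "NOTES.txt"))):
        with open(path) as f:
            text = f.read().strip()[:MAX_NOTE_CHARS]
        if text:
            notes.append(text)
    return " ".join(notes)

print(collect_notes("closed/OrgName/measurements"))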

Schedule deadline for public-facing materials

MLPerf's public-facing material (e.g., press releases, slide decks for press, etc.) should be available for review by submitters as part of the submission schedule. Let's add this for future rounds.
