benchmarks-tekton's Issues
When running CPU version of the benchmarking, it still requests GPUs
In tf_cnn/task/benchmark.yaml:
resources:
  limits:
    memory: "12Gi"
    nvidia.com/gpu: 4
  requests:
    memory: "10Gi"
    nvidia.com/gpu: 4
These GPU limits and requests are applied even during CPU-only runs.
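One possible direction, sketched here under assumptions rather than taken from the repository: generate the `resources` mapping from the requested GPU count, so that the `nvidia.com/gpu` entries are left out for CPU-only runs. The function name `build_resources` and its parameters are hypothetical; the memory values are the ones quoted above.

```python
def build_resources(num_gpus, memory_limit="12Gi", memory_request="10Gi"):
    """Return a Kubernetes-style resources mapping for the benchmark container.

    Hypothetical helper: GPU entries are added only when num_gpus is a
    positive integer, so a CPU-only run (None or 0) requests no GPUs.
    """
    resources = {
        "limits": {"memory": memory_limit},
        "requests": {"memory": memory_request},
    }
    if num_gpus is not None and num_gpus > 0:
        # GPU limits and requests are set to the same value, as in the
        # snippet quoted in this issue.
        resources["limits"]["nvidia.com/gpu"] = num_gpus
        resources["requests"]["nvidia.com/gpu"] = num_gpus
    return resources
```

With this shape, `build_resources(4)` reproduces the mapping quoted in the issue, while `build_resources(0)` contains only the memory entries.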
The benchmark does not wait for the ImageStream build to complete before running the next test
When doing multiple runs, I noticed that as long as the ImageStream has a tag present, the tests run regardless of whether the new image build has completed. That's not a problem per se, but it seemed like odd behavior.
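If waiting for the build were wanted, a polling loop along the following lines could gate the test run. This is a minimal sketch, not the project's code: `wait_for_build` and its `get_phase` callable are hypothetical, and it assumes the build reports OpenShift Build phases such as `Running`, `Complete`, `Failed`, `Error`, and `Cancelled`. In practice `get_phase` might wrap a lookup of the build's `status.phase`.

```python
import time

# Phases after which the build will never reach "Complete".
TERMINAL_FAILURES = {"Failed", "Error", "Cancelled"}


def wait_for_build(get_phase, timeout_s=1800.0, interval_s=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until get_phase() returns "Complete".

    get_phase is a caller-supplied function returning the current build
    phase as a string. Raises RuntimeError if the build ends in a failure
    phase and TimeoutError if it does not complete within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase == "Complete":
            return
        if phase in TERMINAL_FAILURES:
            raise RuntimeError(f"build ended in phase {phase!r}")
        if clock() >= deadline:
            raise TimeoutError(f"build still in phase {phase!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.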
CPU benchmark has null num_gpus, code uses it for number of CPUs
When using CPU-only benchmarking, the task sets num_gpus, but setting it this way causes an error. Setting it to 0 instead causes a different error because of:
https://github.com/tensorflow/benchmarks/blob/master/scripts/tf_cnn_benchmarks/benchmark_cnn.py#L750,L752
The code there uses num_gpus as the number of CPUs to run on.
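Given that behavior, one way to guard against both errors is to validate the value before it reaches the benchmark script. The sketch below is an assumption-laden illustration: `device_flags` is a hypothetical helper, and it assumes the script is invoked with its `--device` and `--num_gpus` flags. Which positive value is appropriate for a CPU-only run is not stated in this issue.

```python
def device_flags(device, num_gpus):
    """Build the device-related arguments for a benchmark invocation.

    Hypothetical helper: rejects a missing or non-positive num_gpus up
    front, since this issue reports errors for both a null value and 0.
    """
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unknown device {device!r}")
    if num_gpus is None or int(num_gpus) < 1:
        raise ValueError("num_gpus must be a positive integer, even for CPU runs")
    return [f"--device={device}", f"--num_gpus={int(num_gpus)}"]
```

Failing early in the task wrapper gives a clearer message than an error raised deep inside the benchmark.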
Fix the broken master branch
A recent change to support GPUs has broken the master branch.
There is no easy fix other than reverting to commit 485736f.
Remove buildconfig tasks
Refactor cpu and gpu benchmark tasks
BuildConfig references wrong ImageStream tag
When attempting to run the benchmarks, one issue I found was that the ImageStream tag and the tag referenced in the BuildConfig do not match:
ImageStream Tag: tfc-benchmark
BuildConfig Uses: tf-cnn-benchmark:latest
This is addressed by my PR, but I am filing the issue here for tracking.
Convert IntelAI / models into Tekton Pipelines
Use the models in the following repo, https://github.com/IntelAI/models/tree/master/benchmarks,
to create Tekton Pipelines that run the TensorFlow models.
Dockerfile doesn't build, devtoolset-7 not valid for scl_enable
When running the dockerfile build, it fails on the line:
RUN source scl_source enable devtoolset-7 rh-python36 && \
pip install --upgrade pip setuptools wheel && \
The error says devtoolset-7 is invalid. I removed it and the build appears to work.
Benchmark fails almost immediately on training start with OOM error
https://github.com/tensorflow/benchmarks is no longer maintained; https://github.com/tensorflow/models/tree/master/official is listed as the alternative
According to the README.md at https://github.com/tensorflow/benchmarks/blob/master/scripts/tf_cnn_benchmarks/README.md, the repo appears to be no longer maintained, and the README directs users to https://github.com/tensorflow/models/tree/master/official
Should I look at refactoring the tests to use maintained models?
Convert GNMT benchmark into Tekton pipelines
Use the following repo https://github.com/mlperf/training_results_v0.6/tree/master/NVIDIA/benchmarks
to create a Tekton pipeline for the GNMT model.
The pipeline should use 1-4 GPUs.