Is there an option to only run the steps that download the data set without actually


option to prefetch the data about ck HOT 8 CLOSED

jdesfossez commented on August 13, 2024

option to prefetch the data

from ck.

Comments (8)

arjunsuresh commented on August 13, 2024

Yes, all CM scripts are modular, so we can do this. For example, the command below downloads the full ImageNet validation set and exports the downloaded paths.

cm run script --tags=get,dataset,val,imagenet,original,_full -j


jdesfossez commented on August 13, 2024

Thank you !
I tried it but it is complaining, am I missing something ?

$ cm run script --tags=get,dataset,val,imagenet,original, _full -j
* cm run script _full --tags="get,dataset,val,imagenet,original"

CM error: no scripts were found with above tags (when variations ignored)!


jdesfossez commented on August 13, 2024

Ok, the space before _full had to be removed, that one is working now.
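The failure above comes from the shell splitting `original, _full` into two arguments, so `_full` was no longer part of the `--tags` value. As a minimal sketch of one way to guard against that, the helper below joins whatever it is given into a single comma-separated value with no whitespace. The name `cm_tags` is hypothetical and not part of CM.

```shell
#!/bin/sh
# Hypothetical helper (not part of CM): join all arguments into one
# comma-separated tag list, so a stray space cannot split the --tags value.
cm_tags() {
  # Turn spaces and tabs into commas, then squeeze repeated commas into one.
  printf '%s' "$*" | tr ' \t' ',,' | tr -s ','
}
```

It could then be used as `cm run script --tags="$(cm_tags get,dataset,val,imagenet,original, _full)" -j`, which passes the same tag list as the corrected command.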

But I am trying this example and nothing is happening, do you know why or how I can debug further ?
https://github.com/mlcommons/ck/tree/master/cm-mlops/script/get-ml-model-retinanet#cm-cli

$ cm run script --tags=get,ml-model,raw,resnext50,retinanet,object-detection
* cm run script "get ml-model raw resnext50 retinanet object-detection"

Thanks !


gfursin commented on August 13, 2024

Hi @jdesfossez,

CM scripts install artifacts into the CM cache and make them available to other CM scripts via API and/or ENV variables.

You can see the cache with all artifacts including above model as follows:

cm show cache
cm show cache --tags=get,ml-model,resnext50

You can find your model and extra CM meta files as follows:

cm find cache --tags=get,ml-model,raw,resnext50,retinanet,object-detection
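For the prefetch use case in this thread, the run and cache lookup commands can be chained. The following is only a sketch: the function name `cm_prefetch` and the `DRY_RUN` variable are hypothetical, and it assumes the cache entry can be found with the same tags that were passed to `cm run script`, as in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of CM): run a CM script to populate the
# cache, then look up the resulting cache entry using the same tags.
# With DRY_RUN=1 the commands are only printed, not executed.
cm_prefetch() {
  tags="$1"
  for cmd in "cm run script --tags=$tags" "cm find cache --tags=$tags"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "$cmd"
    else
      # Unquoted on purpose: the command string is split into words here.
      $cmd || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 cm_prefetch get,ml-model,raw,resnext50,retinanet,object-detection` prints the two commands without touching the network, which is handy for checking the tags before a long download.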

Basically CM is a database of objects connected by tags, UIDs and ENV variables ...

Please check these 2 tutorials that may give you more ideas behind CM:

That's how we reuse individual CM scripts (and workflows assembled from those scripts) for reproducibility initiatives at conferences and for other efforts to make it easier to run AI on different platforms ...

We are interested to know your use cases and how CM can help - please feel free to talk to us via Discord server or we can set up a conf-call ...

Thank you for your interest and feedback!


jdesfossez commented on August 13, 2024

Hi !
Thank you, that helps a lot, I missed the first tutorial, I am glad you linked it here.

My current goal is to automate performance testing of GPUs in a public cloud environment. I need to easily and quickly compare the impact of various hypervisor-level changes, so this project seems perfect for that purpose. Eventually I will use it as well to submit results.
Another quick question, is there a clean way for me to specify at run-time the location of the data ? For example if I wanted to make a local mirror and have the VM download from there instead of hitting the public servers.
Thanks again !


arjunsuresh commented on August 13, 2024

Sorry @jdesfossez for the typo -- I was typing on mobile :(

"is there a clean way for me to specify at run-time the location of the data "

I believe you want to use a private URL here, right? Currently we support multiple download sources like this but not custom URLs - we can add this by the next release.

But for most of the large datasets, there is an option to provide the path via an input/env variable and skip the download, as done here


arjunsuresh commented on August 13, 2024

This solution can work for using custom URLs.


jdesfossez commented on August 13, 2024

ah perfect, thank you so much !
I will close this as it's not really an issue, but I appreciate the guidance !
