Model snapshots about clearml HOT 8 CLOSED

allegroai commented on May 19, 2024

Model snapshots


Comments (8)

bmartinn commented on May 19, 2024

I actually think this is a great approach!

My thinking is that we will add it to trains.conf as follows:
sdk.development.default_output_uri=""

Then, when no specific output_uri is passed to Task.init(...), it will use the default value from the configuration file. Obviously, when running with trains-agent you will be overriding both in the UI, but the value appearing in the UI will be the last value used when executed "manually".
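To make the proposed fallback concrete, here is a minimal sketch of the lookup order described above: an explicit output_uri wins, otherwise the configured default is used, and an empty default means "do not copy". The helper name `resolve_output_uri` and the nested-dict representation of trains.conf are assumptions made for illustration only; this is not the Trains implementation.

```python
from typing import Optional


def resolve_output_uri(explicit: Optional[str], config: dict) -> Optional[str]:
    """Hypothetical helper: pick the effective output destination.

    `config` is an illustrative nested dict standing in for trains.conf.
    """
    # An explicit argument (as passed to Task.init) always takes precedence.
    if explicit:
        return explicit
    # Otherwise fall back to sdk.development.default_output_uri, if present.
    default = (
        config.get("sdk", {})
        .get("development", {})
        .get("default_output_uri", "")
    )
    # An empty string means no default destination is configured.
    return default or None


config = {"sdk": {"development": {"default_output_uri": "/tmp/data"}}}

print(resolve_output_uri(None, config))                         # configured default
print(resolve_output_uri("https://files.example.com", config))  # explicit value
print(resolve_output_uri(None, {}))                             # nothing configured
```

The `https://files.example.com` address is a placeholder, not a real file server.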

Sounds good?


bmartinn commented on May 19, 2024

Hi @elinep ,

Your observation is correct: model snapshots (and also artifacts) will be automatically copied if an experiment is initialized with an output_uri destination.

In the following example, all model files and artifacts will be copied to sub-folders in /tmp/data:

task = Task.init('examples', 'model test', output_uri='/tmp/data')

And here we will upload a copy of the models / artifacts to an http/s server using http post:

task = Task.init('examples', 'model test', output_uri='https://demofiles.trains.allegro.ai')

Note that if you are uploading over HTTP POST, I recommend upgrading to the latest RC, as we increased the upload timeouts after receiving feedback that uploads sometimes fail too quickly.

$ pip install trains==0.12.2rc0

Is this what you were looking for?


elinep commented on May 19, 2024

Thanks for your reply.
Indeed, this is the feature I was looking for.

I still have some questions:

Is it possible to put the output_uri in the trains.conf file?
The default artifact destination is set in trains.conf under api.files_server. Why do models behave differently?


bmartinn commented on May 19, 2024

You can set api.files_server in trains.conf. This will change the default artifacts upload destination, as well as the debug images destination. It will not cause Trains to store a copy of the model file in that destination, though...
I guess the reason for that is the thin line between auto-magic and being creepy :)

Now for a longer, more tedious explanation of what we designed and why.

Artifacts & debug images are uploaded by the user through an active function call. This makes it fully transparent that they are being sent and stored somewhere (i.e. api.files_server, but this can be changed in the SDK).

Models are copied auto-magically, i.e. you still call model.save, but Trains will catch this call and copy the model file to some central storage.

One option we had was to always have this behavior and constantly copy models to the trains-server. But we received feedback from users that during "debugging" they usually had very little use for these models, and constantly storing made little sense.
This is why we opted for logging the location of the model files stored, but not for copying them somewhere.

That said we allowed this behavior to be controlled through the UI, so when automating the training process with trains-agent (right-click Clone + right-click Enqueue) you can set the output_uri destination by editing the "Output Destination" in the "Execution" tab of the experiment.

This will cause the remotely executed experiment to auto-magically copy all the model files the experiment creates to the desired output destination.
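The two behaviors described above (always record where a model was saved, copy it only when an output destination is set) can be sketched roughly as follows. This is an illustration of the idea, not how Trains is implemented: `SnapshotTracker`, `wrap` and `save_model` are hypothetical names, and only local folder destinations are handled here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


class SnapshotTracker:
    """Hypothetical stand-in for the tracking layer (not the Trains code)."""

    def __init__(self, output_uri: Optional[str] = None):
        self.output_uri = output_uri      # local folder, or None for "log only"
        self.registered: List[str] = []   # model locations recorded so far

    def wrap(self, save_fn: Callable[[bytes, str], None]) -> Callable[[bytes, str], None]:
        """Return a save function that also records (and maybe copies) the file."""
        def wrapper(obj: bytes, path: str) -> None:
            save_fn(obj, path)  # the user's save call runs unchanged
            if self.output_uri:
                # Destination set: copy the snapshot and record the copy.
                dest_dir = Path(self.output_uri)
                dest_dir.mkdir(parents=True, exist_ok=True)
                dest = dest_dir / Path(path).name
                shutil.copy2(path, dest)
                self.registered.append(str(dest))
            else:
                # No destination: only log where the file was written.
                self.registered.append(str(path))
        return wrapper


def save_model(weights: bytes, path: str) -> None:
    """Toy 'framework' save function used for the demonstration."""
    Path(path).write_bytes(weights)


work = Path(tempfile.mkdtemp())

# "Debugging" mode: the location is logged, nothing is copied.
debug = SnapshotTracker()
debug.wrap(save_model)(b"weights", str(work / "model.bin"))

# With an output destination: the snapshot is also copied to central storage.
central = work / "central"
tracked = SnapshotTracker(output_uri=str(central))
tracked.wrap(save_model)(b"weights", str(work / "model2.bin"))

print(debug.registered)
print(tracked.registered)
```

Wrapping the save function keeps the training script unchanged, which is the point of the auto-magic approach; the cost is that copying happens implicitly, hence the opt-in destination.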

@elinep feel free to suggest other strategies for logging/storing models, we always welcome new ideas :)


elinep commented on May 19, 2024

Thank you for your detailed answer.

In principle, I would not make any distinction between images, curve data or models. In my opinion, it makes sense for Trains to intercept and save every output.
In practice, I understand that models can be heavy and that systematic copying might cause issues (disk space, latency when downloading/uploading from/to the file server).

Still, I feel it would make sense to have a default model output_uri parameter in the config. Then we could optionally disable (or enable) the automagical model upload during the task init. I guess in practice users don't change this address very often, so it would be convenient to set it once and for all.

What do you think?


elinep commented on May 19, 2024

👍
Thanks for your time.


bmartinn commented on May 19, 2024

Hi @elinep ,
I'm happy to say the default_output_uri feature is already in the RC, so you can start using it :)
$ pip install trains==0.12.2rc2


elinep commented on May 19, 2024

well done, thanks
