Comments (8)

bmartinn commented on May 19, 2024

I actually think this is a great approach!

My thinking is that we will add it to trains.conf as follows:
sdk.development.default_output_uri=""

Then, when no specific output_uri is passed to Task.init(...), it will use the default value from the configuration file. Obviously, when running with trains-agent you will override both in the UI, but the value shown in the UI will be the last value used when the experiment was executed "manually".
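
The fallback described above can be sketched roughly as follows. This is only an illustration, not the Trains implementation: resolve_output_uri and the plain dict standing in for trains.conf are hypothetical, and only the key name sdk.development.default_output_uri comes from the proposal.

```python
from typing import Mapping, Optional

# Key name taken from the proposal above; everything else here is hypothetical.
DEFAULT_KEY = "sdk.development.default_output_uri"


def resolve_output_uri(explicit: Optional[str], config: Mapping[str, str]) -> Optional[str]:
    """Return the explicit output_uri if given, else the configured default.

    An empty default (the proposed "" value) is treated as "no destination".
    """
    if explicit:
        return explicit
    default = config.get(DEFAULT_KEY, "")
    return default or None


conf = {DEFAULT_KEY: "/tmp/data"}
print(resolve_output_uri(None, conf))               # falls back to the config default
print(resolve_output_uri("/mnt/models", conf))      # explicit argument wins
print(resolve_output_uri(None, {DEFAULT_KEY: ""}))  # nothing configured
```

Treating the empty string as "no destination" keeps today's behaviour for anyone who does not touch the new setting.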

Sounds good?

from clearml.

bmartinn commented on May 19, 2024

Hi @elinep ,

Your observation is correct: model snapshots (and also artifacts) will be copied automatically if an experiment is initialized with an output_uri destination.

In the following example, all model files and artifacts will be copied to sub-folders in /tmp/data:

from trains import Task
task = Task.init('examples', 'model test', output_uri='/tmp/data')

And here we will upload a copy of the models / artifacts to an http/s server using http post:

task = Task.init('examples', 'model test', output_uri='https://demofiles.trains.allegro.ai')
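
To make the difference between the two destinations concrete, here is a small sketch that classifies an output_uri by its scheme. It is an assumption about how one might tell them apart, not a description of what Trains does internally; describe_destination is a hypothetical helper.

```python
from urllib.parse import urlparse


def describe_destination(output_uri: str) -> str:
    """Classify an output_uri as a local folder or an HTTP(S) upload target."""
    scheme = urlparse(output_uri).scheme.lower()
    if scheme in ("http", "https"):
        return "http post"
    if scheme in ("", "file"):
        return "local copy"
    # Anything else (e.g. an object-store scheme) is outside this sketch.
    return "other (" + scheme + ")"


print(describe_destination('/tmp/data'))
print(describe_destination('https://demofiles.trains.allegro.ai'))
```

Note that this naive check would misread a Windows path such as C:\data, since the drive letter parses as a scheme.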

Note that if you are using http post, I recommend upgrading to the latest RC: we increased the upload timeouts after receiving feedback that uploads sometimes failed too quickly.

$ pip install trains==0.12.2rc0

Is this what you were looking for?

elinep commented on May 19, 2024

Thanks for your reply.
Indeed, this is the feature I was looking for.

I still have some questions:

  1. Is it possible to put the output_uri in the trains.conf file?
  2. The default artifact destination is set in trains.conf via api.files_server. Why do models behave differently?

bmartinn commented on May 19, 2024

  1. You can set api.files_server in trains.conf. This changes the default upload destination for artifacts as well as for debug images. It will not cause Trains to store a copy of the model file in that destination, though ...

  2. I guess the reason for that is the thin line between auto-magic and being creepy :)

Now for a longer, more tedious explanation of what we designed and why.

Artifacts & debug images are uploaded by the user through an explicit function call. This makes it fully transparent that they are being sent and stored somewhere (by default api.files_server, but this can be changed in the SDK).

Models are copied auto-magically, i.e. you still call model.save, but Trains catches this call and copies the model file to some central storage.

One option we had was to always have this behavior and constantly copy models to the trains-server. But we received feedback from users that during "debugging" they usually had very little use for these models, and constantly storing them made little sense.
This is why, by default, we opted for logging the location of the stored model files without copying them anywhere.
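
As a rough illustration of the "catch the save call, log the location, copy only on request" idea, here is a minimal sketch. It is not how Trains hooks into frameworks: save_model is a stand-in for a framework's model.save, track_saves is a hypothetical wrapper, and only a local folder is handled as the destination.

```python
import os
import shutil
import tempfile
from typing import Callable, List, Optional


def save_model(path: str) -> None:
    """Stand-in for a framework's model.save; writes dummy bytes to path."""
    with open(path, "wb") as f:
        f.write(b"weights")


def track_saves(save_fn: Callable[[str], None], logged: List[str],
                output_uri: Optional[str] = None) -> Callable[[str], None]:
    """Wrap save_fn so every saved path is logged, and copied only if output_uri is set."""
    def wrapper(path: str) -> None:
        save_fn(path)                          # the user's call still does the real save
        logged.append(os.path.abspath(path))   # always record where the file lives
        if output_uri:                         # copying is opt-in
            os.makedirs(output_uri, exist_ok=True)
            shutil.copy2(path, output_uri)
    return wrapper


logged_paths: List[str] = []
work_dir = tempfile.mkdtemp()
store_dir = os.path.join(tempfile.mkdtemp(), "central")

# Default behaviour: log the location, copy nothing.
save_model_logged = track_saves(save_model, logged_paths)
save_model_logged(os.path.join(work_dir, "debug.bin"))

# With a destination: log and copy.
save_model_copied = track_saves(save_model, logged_paths, output_uri=store_dir)
save_model_copied(os.path.join(work_dir, "final.bin"))

print(logged_paths)
print(os.listdir(store_dir))
```

Only the second save ends up in the central folder, while both locations are logged, which mirrors the trade-off described above.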

That said we allowed this behavior to be controlled through the UI, so when automating the training process with trains-agent (right-click Clone + right-click Enqueue) you can set the output_uri destination by editing the "Output Destination" in the "Execution" tab of the experiment.

This will cause the remotely executed experiment to auto-magically copy all the model files it creates to the desired output destination.

@elinep feel free to suggest other strategies for logging/storing models, we always welcome new ideas :)

elinep commented on May 19, 2024

Thank you for your detailed answer.

In principle, I would not distinguish between images, curve data, and models. In my opinion, it makes sense for Trains to intercept and save every output.
In practice, I understand that models can be heavy and that systematically copying them might cause issues (disk space, latency when downloading from or uploading to the file server).

Still, I feel it would make sense to have a default model output_uri parameter in the config. We could then optionally disable (or enable) the automagical model upload at task init. I guess in practice users don't change this address very often, so it would be convenient to set it once and for all.

What do you think?

elinep commented on May 19, 2024

👍
Thanks for your time.

bmartinn commented on May 19, 2024

Hi @elinep ,
I'm happy to say the default_output_uri feature is already in the RC, so you can start using it :)
$ pip install trains==0.12.2rc2

elinep commented on May 19, 2024

well done, thanks
