I actually think this is a great approach!
My thinking is that we add it to trains.conf as follows:
sdk.development.default_output_uri=""
Then, when no specific output_uri is provided to Task.init(...), it will use the default value from the configuration file. Obviously, when running with trains-agent you can override both values from the UI, and the value shown in the UI will be the last one used when the task was executed "manually".
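The lookup order proposed above can be sketched as follows. This is a minimal illustration that assumes trains.conf has already been parsed into a nested dict. `resolve_output_uri` and its `ui_override` argument are hypothetical names and are not part of the trains API.

```python
from typing import Optional


def resolve_output_uri(
    explicit: Optional[str] = None,
    config: Optional[dict] = None,
    ui_override: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical sketch: pick where model files should be copied, or None to only log them."""
    # 1. A value edited in the UI ("Output Destination") wins when trains-agent runs the task.
    if ui_override:
        return ui_override
    # 2. Next comes an output_uri passed explicitly to Task.init(...).
    if explicit:
        return explicit
    # 3. Otherwise fall back to sdk.development.default_output_uri from the config file.
    default = (
        (config or {})
        .get("sdk", {})
        .get("development", {})
        .get("default_output_uri", "")
    )
    # An empty string (the proposed default) means: do not copy models anywhere.
    return default or None
```

Treating an empty string as "no destination" makes the proposed default (`""`) behave the same as the current behavior, where only the location of saved models is logged.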
Sounds good?
Hi @elinep,
Your observation is correct: model snapshots (and also artifacts) are automatically copied if an experiment is initialized with an output_uri destination.
In the following example, all model files and artifacts will be copied to sub-folders in /tmp/data:
from trains import Task
task = Task.init('examples', 'model test', output_uri='/tmp/data')
And here a copy of the models/artifacts is uploaded to an http/s server using HTTP POST:
task = Task.init('examples', 'model test', output_uri='https://demofiles.trains.allegro.ai')
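Both examples pass output_uri as a single string, either a plain folder path or an http/s URL. The hypothetical helper below is a rough sketch of how such a string can be told apart using Python's urllib.parse. It assumes POSIX-style paths and is not the logic trains uses internally.

```python
from urllib.parse import urlparse


def destination_kind(output_uri: str) -> str:
    """Hypothetical helper: classify an output_uri as a local folder or an HTTP(S) upload target."""
    scheme = urlparse(output_uri).scheme.lower()
    if scheme in ("http", "https"):
        # Files would be sent to the server with an HTTP POST.
        return "http"
    if scheme in ("", "file"):
        # Plain POSIX paths such as /tmp/data have no scheme and are treated as a local folder.
        return "local"
    # Any other scheme falls outside the two cases shown above.
    return "other"
```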
If you are uploading via HTTP POST, I recommend upgrading to the latest RC, since we increased the upload timeouts after receiving feedback that uploads sometimes failed too quickly:
$ pip install trains==0.12.2rc0
Is this what you were looking for?
Thanks for your reply.
Indeed, this is the feature I was looking for.
I still have some questions:
- Is it possible to put the output_uri in the trains.conf file?
- The default artifact destination is set by api.files_server in trains.conf. Why do models behave differently?
- You can set api.files_server in trains.conf. This changes the default upload destination for artifacts and debug images, but it will not cause Trains to store a copy of the model files there.
- I guess the reason for that is the thin line between auto-magic and being creepy :)
Now for a longer, more tedious explanation of what we designed and why.
Artifacts and debug images are uploaded by the user through an explicit function call, which makes it fully transparent that they are being sent and stored somewhere (by default api.files_server, though this can be changed in the SDK).
Models are copied auto-magically: you still call model.save, but Trains catches that call and copies the model file to central storage.
One option we had was to always enable this behavior and constantly copy models to the trains-server. However, users told us that during "debugging" they usually had very little use for these models, so constantly storing them made little sense.
This is why we opted to log the location where model files are stored, rather than copy them somewhere.
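The design described above can be illustrated as a wrapper around the user's save function. The user's call runs unchanged. If a destination is configured, the file is also copied there; otherwise only its location is recorded. This is a hypothetical sketch: `make_tracked_save` and `registry` are invented names, only local-folder destinations are handled, and it is not how Trains actually hooks framework save calls.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def make_tracked_save(
    save_fn: Callable[[str], None],
    output_uri: Optional[str],
    registry: List[str],
) -> Callable[[str], None]:
    """Hypothetical sketch: wrap a save call so the model file is copied or just logged."""

    def tracked_save(path: str) -> None:
        save_fn(path)  # the user's own save call still runs as before
        if output_uri:
            # A destination is configured: copy the model file there and record the copy.
            dest_dir = Path(output_uri)
            dest_dir.mkdir(parents=True, exist_ok=True)
            copied = shutil.copy2(path, dest_dir / Path(path).name)
            registry.append(str(copied))
        else:
            # No destination: only log where the model file was written.
            registry.append(str(Path(path).resolve()))

    return tracked_save
```

Keeping the "log only" branch as the default matches the trade-off described in the thread: the location is always known, but heavy model files are only duplicated when a destination is explicitly set.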
That said, we made this behavior controllable from the UI. When automating the training process with trains-agent (right-click Clone + right-click Enqueue), you can set the output_uri destination by editing "Output Destination" in the experiment's "Execution" tab.
This causes the remotely executed experiment to auto-magically copy every model file it creates to the desired output destination.
@elinep feel free to suggest other strategies for logging/storing models. We always welcome new ideas :)
Thank you for your detailed answer.
In principle, I would not distinguish between images, curve data and models. In my opinion, it makes sense for Trains to intercept and save everything a task produces.
In practice, I understand that models can be heavy and that copying them systematically might cause issues (disk space, latency when uploading to or downloading from the file server).
Still, I feel it would make sense to have a default model output_uri parameter in the config, so that the automagical model upload can optionally be disabled (or enabled) in Task.init. In practice, I guess users rarely change this address, so it would be convenient to set it once and for all.
What do you think?
👍
Thanks for your time.
Hi @elinep,
I'm happy to say the default_output_uri feature is already in the RC, so you can start using it :)
$ pip install trains==0.12.2rc2
well done, thanks