avsecz / gin-train
Tracking ML experiments using gin-config, wandb, comet.ml and S3.
License: MIT License
It seems that, due to the way argh returns the serialized dictionary, the exit code is 120 instead of 0, which doesn't work well with Snakemake.
Since there are multiple platforms available for tracking ML experiments, it would be nice to have an abstract ExperimentLogger() class with platform-specific implementations wrapping each tracking service.
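A minimal sketch of what such an abstraction could look like (all class and method names here are hypothetical, not part of the current codebase); concrete wandb / comet.ml implementations would wrap their respective client objects:

```python
from abc import ABC, abstractmethod


class ExperimentLogger(ABC):
    """Platform-agnostic interface for experiment tracking."""

    @abstractmethod
    def log_params(self, params: dict):
        """Record the run's hyper-parameters."""

    @abstractmethod
    def log_metric(self, name: str, value: float):
        """Record one metric value (possibly per-epoch)."""


class DictLogger(ExperimentLogger):
    """In-memory implementation, useful for tests; wandb or comet.ml
    implementations would forward these calls to the client library."""

    def __init__(self):
        self.params, self.metrics = {}, {}

    def log_params(self, params):
        self.params.update(params)

    def log_metric(self, name, value):
        self.metrics.setdefault(name, []).append(value)
```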
Add a module external (not imported by default), where all the classes from external libraries such as fastai would be imported.
External libraries
The main goal would be to define an Objective class, analogous to kopt's CompileFN, but using gin-config. The arguments of Objective would be the same as those of gin_train, except that the gin-config files would be normal gin files which are overridden using gin bindings (https://github.com/Avsecz/gin-train/blob/master/gin_train/cli/gin_train.py#L187). E.g. either pass them to parse_config_files_and_bindings as bindings, or specify them with:
gin.bind_parameter('supernet.num_layers', 5)
gin.bind_parameter('supernet.weight_decay', 1e-3)
where the values can be any valid Python object (lists, tuples, dicts, strings). Note that if we use bind_parameter, then gin.finalize() should only be called after all the parameters have been bound.
import json

def config2bindings(config):
    """Serialize a flat config dict into gin binding strings."""
    return [f"{k} = {json.dumps(v)}" for k, v in config.items()]

config = {"asd": [1, 2, 3],
          "dsa": 10,
          "dsads": "11",
          "dsasdsadas": {"a": 1}}

bindings = config2bindings(config)
for p in bindings:
    print(p)
# asd = [1, 2, 3]
# dsa = 10
# dsads = "11"
# dsasdsadas = {"a": 1}
Both approaches assume that the dictionary is solely a key-value mapping; values may be dictionaries or lists, but these will not be interpreted as nested variables.
Note: note_params should be used to keep track of the hyper-parameter optimization study and the run id.
Additional arguments to Objective
Multiple different Objective versions would need to be implemented, one for each hyper-parameter optimization system.
Supported backends:
RayObjective(...)
HyperoptObjective
For more advanced scenarios we would probably need to implement the Trainable class ourselves.
Instead of Experiment.log_multiple_params, use Experiment.log_parameters.
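One way to stay compatible with both comet_ml versions is a small shim that prefers the newer method name and falls back to the old one (the helper name is hypothetical; `experiment` is a comet_ml Experiment-like object):

```python
def log_params_compat(experiment, params):
    """Call Experiment.log_parameters if it exists, otherwise fall back
    to the older Experiment.log_multiple_params."""
    fn = getattr(experiment, "log_parameters", None) or experiment.log_multiple_params
    fn(params)
```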
output_dir should allow a full S3 or GCS path.
output_dir can be a comma-separated list of output directories, including S3 paths.
Implement a wrapper to write the output to multiple locations.
Use pyfilesystem2 for easier writes (pass the directories):
https://www.pyfilesystem.org/page/s3fs/
from fs import open_fs

# S3 (pip install fs-s3fs)
s3fs = open_fs('s3://mybucket')
s3fs.listdir('/')
# GCS (pip install fs-gcsfs)
gcsfs = open_fs("gs://mybucket/root_path?strict=False")
# SSH (pip install fs.sshfs)
my_fs = open_fs("ssh://[user[:password]@]host[:port]/[directory]")
Allow the user to specify only the local folder where to save the results, while auto-generating the final folder name.
Flag name: --auto-subdir
copy_fs(): copy the results to the final destination(s).
output_file: create a temporary directory in /tmp/ first.
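The steps above (auto-generated sub-directory, write to a temporary directory, then copy everywhere) can be sketched with the standard library alone; with pyfilesystem2 the copy step would instead use fs.copy.copy_fs so that s3://, gs:// and ssh:// destinations work the same way. The function names and the sub-directory naming scheme are assumptions:

```python
import os
import shutil
import tempfile
import datetime
import uuid


def auto_subdir(base_dir):
    """Generate a unique run sub-directory name under base_dir
    (timestamp + random suffix; the exact format is an assumption)."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(base_dir, f"{stamp}_{uuid.uuid4().hex[:8]}")


def write_outputs(write_fn, output_dirs):
    """Run write_fn against a temporary directory, then copy the result
    to every destination in output_dirs (local paths here; pyfilesystem2's
    copy_fs would generalize this to remote filesystems)."""
    with tempfile.TemporaryDirectory() as tmp:
        write_fn(tmp)
        for dest in output_dirs:
            shutil.copytree(tmp, dest, dirs_exist_ok=True)
```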
Add a command which gathers all the experiments in a folder into a single table (CSV file), similar to the table in kopt. This can then be easily imported into Google Sheets.
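A sketch of such a gathering command, assuming each run directory stores its results in a metrics.json file (the file name, layout, and function name are assumptions):

```python
import csv
import json
from pathlib import Path


def gather_experiments(root_dir, out_csv):
    """Collect per-run metrics.json files under root_dir into one CSV."""
    rows = []
    for metrics_file in sorted(Path(root_dir).glob("*/metrics.json")):
        row = {"run": metrics_file.parent.name}
        row.update(json.loads(metrics_file.read_text()))
        rows.append(row)
    # union of all columns, so runs with differing metrics still align
    fieldnames = sorted({k for r in rows for k in r})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```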
https://github.com/IDSIA/sacred
sacred is an alternative to wandb.io for managing training runs.
It can store the model together with a source code copy in mongodb.
[Omniboard](https://github.com/vivekratnavel/omniboard) is a very good web frontend for it.
Advantages:
Add https://github.com/anderskm/gputil to auto-schedule model training:
import GPUtil

if args.gpu == -1:
    # pick the first available GPU
    gpu = GPUtil.getFirstAvailable(attempts=3, includeNan=True)[0]
else:
    gpu = args.gpu
Also add the GPU memory fraction to use; the value is currently hard-coded to 0.5.
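A sketch of how the fraction could be exposed as a CLI flag instead of the hard-coded 0.5 (the flag names are assumptions); the parsed value would then be forwarded to the framework's session configuration, e.g. TensorFlow's per_process_gpu_memory_fraction:

```python
import argparse


def build_parser():
    """CLI flags for GPU selection and memory allocation (sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU id; -1 picks the first available one")
    parser.add_argument("--memory-fraction", type=float, default=0.5,
                        help="fraction of GPU memory to allocate")
    return parser
```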