hi-ml's People

Contributors

ant0nsc, dccastro, dependabot[bot], dumbledad, fepegar, harshita-s, javier-alvarez, jonathantripp, jubinmathew1995, kenza-bouzid, markpinnock, maxilse, mebristo, microsoftopensource, peterhessey, pre-commit-ci[bot], sangamswadik, vale-salvatelli

hi-ml's Issues

Basic Unit Tests

Sketch basic unit tests & mocks for:

  1. Uploading a single Python file to AzureML, running it, and returning the run_id; then waiting for run completion and downloading the output files and stdout (see the sketch after this list)

  2. Uploading a single Python file and a data file, e.g. data.csv, rest as above.

  3. Uploading two Python files, to check imports, rest as above.
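
A minimal sketch of how the first of these could be mocked, assuming a hypothetical helper submit_and_wait(script, experiment) that submits the script and returns the run id once the run has completed (the helper name and signature are assumptions, not the hi-ml API):

    # Sketch only: submit_and_wait is a hypothetical helper; the mocks stand in
    # for the AzureML SDK Experiment and Run objects.
    from pathlib import Path
    from unittest import mock


    def test_submit_single_file(tmp_path: Path) -> None:
        script = tmp_path / "hello.py"
        script.write_text("print('hello world')")

        mock_run = mock.MagicMock()
        mock_run.id = "run_123"
        mock_run.get_status.return_value = "Completed"

        mock_experiment = mock.MagicMock()
        mock_experiment.submit.return_value = mock_run

        # The helper under test would be exercised like this:
        # run_id = submit_and_wait(script, mock_experiment)
        # assert run_id == "run_123"
        # mock_run.download_files.assert_called_once()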

Create a daily build for hi-ml that tests docker support

Building a Docker image takes about 20 minutes, which is too long for a PR build.

  • In the PR builds, only execute normal conda runs without docker.
  • In the daily build, test docker image generation, and in particular test if the docker_shm_size is respected
  • Ensure that, if the build fails, people get an email that is not the standard github email

Add all features required to support InnerEye

  • run_config has a field max_run_duration_seconds; make that an argument of submit_if_needed
  • Add a raise_on_error argument: wait_for_completion(self, ..., raise_on_error=True) (see the sketch below)
  • Add an argument for returning after submitting the job, rather than exiting
  • Add an argument for experiment name
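
A minimal sketch of the raise_on_error behaviour, assuming the surrounding class holds the AzureML Run as self.run (the class, argument names and defaults are assumptions):

    # Sketch only: self.run is assumed to be an azureml.core.Run held by the
    # surrounding class; argument names and defaults are assumptions.
    def wait_for_completion(self, show_output: bool = True, raise_on_error: bool = True) -> str:
        # Block until the AzureML run has finished, then return its final status.
        details = self.run.wait_for_completion(show_output=show_output, raise_on_error=False)
        status = details["status"]
        if raise_on_error and status != "Completed":
            raise ValueError(f"Run {self.run.id} in experiment {self.run.experiment.name} "
                             f"finished with status {status}")
        return status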

Use an alternative pypi server to upload the test packages

We have sporadic test pipeline failures when building a Docker image in the AzureML jobs (example here). The job claims that package version post282 does not exist, but the GitHub build agents successfully pulled that version a few minutes earlier. This probably comes from a package mirror that AML uses, which is not completely up to date with test.pypi.
As a workaround, we could publish to an MS-internal package feed. AML would probably not have a cache of that, and would hence always use the latest version.

Add tensorboard monitoring script

  • Script that starts a tensorboard server on the local box (see the sketch after this list)
  • should pick up most recent run automatically, or accept a run_id argument
  • See Word document for a link to examples from the chestXray project
  • This script should be installed automatically in /bin when installing the package.
  • Check with Mel whether there are any extra gadgets that would be helpful here. For example: should we store not only most_recent_run.txt, but a full submission history (most_recent_runs.txt), so that tensorboard monitoring can pick up, say, the last 3 runs?
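
A minimal sketch of the script's core, assuming the last submission's run id is stored in most_recent_run.txt and that the run's logs have already been downloaded or mounted locally (file names, arguments and folder layout are assumptions):

    # Sketch only: file names, arguments and the log folder layout are assumptions.
    from argparse import ArgumentParser
    from pathlib import Path

    from tensorboard import program


    def main() -> None:
        parser = ArgumentParser()
        parser.add_argument("--run_id", default="",
                            help="AzureML run to monitor; defaults to the most recent submission.")
        parser.add_argument("--logdir", default="outputs/tensorboard")
        args = parser.parse_args()
        run_id = args.run_id or Path("most_recent_run.txt").read_text().strip()
        print(f"Monitoring run {run_id}")
        tb = program.TensorBoard()
        tb.configure(argv=[None, "--logdir", args.logdir])
        print(f"TensorBoard is running at {tb.launch()}")
        input("Press Enter to stop monitoring")


    if __name__ == "__main__":
        main()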

Switch tests to use a separate AzureML workspace

  • Create a new AzureML workspace and storage account used only for the hi-ml package. Ideally this should be done with the Azure CLI tools, so that we have a completely script-driven way of creating:
    • resource group
    • workspace
    • storage account
    • service principal
    • data store
    • compute cluster
  • Document all the necessary steps in an MD file, and store the script(s) that create everything
  • Switch the github workflow to this new workspace and service principal

Add/check the `LGTM.UploadSnapshot` element of our security checks

Potential omission from #55 where it says

For CodeQL, please ensure the following (detailed instructions for CodeQL can be found here):

  • Select the source code language in the CodeQL task.
  • If your application was developed using multiple languages, add multiple CodeQL tasks.
  • Define the build variable LGTM.UploadSnapshot=true.
  • Configure the build to allow scripts to access the OAuth token.
  • If the code is hosted in GitHub, create an Azure DevOps PAT token with code read scope for the dev.azure.com/Microsoft (or 'all') organization and set the local task variable System_AccessToken with it. (Note: this only works for YAML-based pipelines.)
  • Review security issues by navigating to semmleportal.azurewebsites.net/lookup. It may take up to one day to process results.

This is not done yet (unless it happens automatically) and I cannot find any mention of LGTM in InnerEye to crib from.

The CodeQL Portal says "Only Visual Studio Team System and Azure Dev Ops URLs are supported" and will not upload a snapshot from GitHub

Add all items required for making the repository public

Ensure that all files have copyright notices, and that editors are set up to automatically insert them (PyCharm does it correctly on InnerEye)

You must run the following source code analysis tools:

  • CredScan
  • CodeQL (Semmle)
  • Component Governance Detection

The easiest way to run these tools is to add them in your build pipeline in a Microsoft-managed Azure DevOps account.

For CodeQL, please ensure the following (detailed instructions for CodeQL can be found here):

  • Select the source code language in the CodeQL task.
  • If your application was developed using multiple languages, add multiple CodeQL tasks.
  • Define the build variable LGTM.UploadSnapshot=true.
  • Configure the build to allow scripts to access the OAuth token.
  • If the code is hosted in GitHub, create an Azure DevOps PAT token with code read scope for the dev.azure.com/Microsoft (or 'all') organization and set the local task variable System_AccessToken with it. (Note: this only works for YAML-based pipelines.)
  • Review security issues by navigating to semmleportal.azurewebsites.net/lookup. It may take up to one day to process results.

Switch InnerEye to using hi-ml as a package

  • Branch on InnerEye-DeepLearning: antonsc/himl. This code uses hi-ml as a submodule
  • Remove the submodule hi-ml
  • Add hi-ml as a package in environment.yml
  • At present, there are two test failures that should vanish when switching to hi-ml as a package:
    • test_register_and_score_model in the TrainEnsemble leg
    • test_submit_for_inference in the TrainInAzureMLViaSubmodule leg

Set up a PR build and other pipelines

From the old repo, copy over everything that makes sense. This would include:

  • flake8 and mypy checks as github workflows
  • Running pytest and publishing pytest results to ADO
  • Running credscan and component governance
  • Copying issues to Azure DevOps

Handle the "v" in version numbering

Our code in setup.py will trigger on new tags. setuptools.setup will reject tags that are not release versions, but we could do more to make that explicit by checking for the leading "v".

Also, when we tag releases as, say, "v0.1.1", the leading "v" is carried through setuptools.setup, so it becomes part of the pip test download:

    Successfully installed pip-21.2.4
    Collecting hi-ml==v0.1.0
    Downloading hi_ml-0.1.0-py3-none-any.whl (25 kB)

(from here)

This works, but it would be cleaner to submit the version number using the public version identifier format mandated in PEP 440, i.e. without the leading "v".
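
A minimal sketch of normalizing the tag in setup.py, assuming the tag name arrives via the GITHUB_REF_NAME environment variable (how the tag reaches setup.py is an assumption):

    # Sketch only: GITHUB_REF_NAME as the source of the tag is an assumption;
    # the normalization itself follows the PEP 440 public version identifier format.
    import os
    import re


    def version_from_tag(tag: str) -> str:
        # "v0.1.1" -> "0.1.1"; reject anything that is not a plain release version.
        version = tag[1:] if tag.startswith("v") else tag
        if not re.fullmatch(r"\d+(\.\d+)*", version):
            raise ValueError(f"Tag {tag!r} is not a release version")
        return version


    package_version = version_from_tag(os.environ.get("GITHUB_REF_NAME", "v0.0.0"))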

Update tooling

  • Add vscode config files
  • Add more testing to makefile
  • Add code coverage comment bot

Add options for input and output datasets

submit_if_needed needs arguments for

  • default data store
  • input datasets
  • output datasets

Datasets can be specified either programmatically via a DatasetConfig, or as a string. DatasetConfig should contain

  • dataset name. If the dataset does not yet exist, create it
  • datastore name - if missing, use the default data store
  • version (optional) - only makes sense for input datasets
  • should the dataset be mounted or downloaded when running in AML?
  • folder location for mount or download when running in AML
  • Optional: folder location to use when running outside AML

Datasets as strings:

  • "foo": use dataset foo
  • "foo:2": Use version 2 of dataset "foo"
  • "mount:foo:/tmp/nix": Mount dataset at the given path. Create an equivalent for "download". Add options for versions

Depending on whether the dataset is in the inputs or the outputs list, we can create an input or output config from it.
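
A minimal sketch of the two forms described above; the field and function names follow the bullet points and are assumptions rather than the final API:

    # Sketch only: field and function names are assumptions based on the lists above.
    from dataclasses import dataclass
    from pathlib import Path
    from typing import Optional


    @dataclass
    class DatasetConfig:
        name: str                             # created in AML if it does not exist yet
        datastore: str = ""                   # empty means: use the default data store
        version: Optional[int] = None         # only meaningful for input datasets
        use_mounting: bool = False            # mount rather than download when running in AML
        target_folder: Optional[Path] = None  # mount/download location when running in AML
        local_folder: Optional[Path] = None   # folder to use when running outside AML


    def parse_dataset_string(spec: str) -> DatasetConfig:
        # Accepts "foo", "foo:2", "mount:foo:/tmp/nix" (and "download:..." analogously).
        parts = spec.split(":", 2)
        if parts[0] in ("mount", "download"):
            return DatasetConfig(name=parts[1], use_mounting=(parts[0] == "mount"),
                                 target_folder=Path(parts[2]))
        if len(parts) == 2:
            return DatasetConfig(name=parts[0], version=int(parts[1]))
        return DatasetConfig(name=parts[0])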

Keeping private functions private

We are not consistent about prefixing private functions with one underscore (module functions) or two (instance methods). That doesn't matter much for an internal project, but as we hope to have external developers using these packages, we should clearly signal the difference between private functions, which should only be called by code inside the package, and public functions, which may be useful to consumers of the package.
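
A small illustration of the intended convention (all names here are hypothetical):

    # Illustration only; names are hypothetical.
    def _read_settings(path: str) -> dict:
        """Module-private helper: single leading underscore, not part of the public API."""
        return {}


    class RunSubmitter:
        def __validate(self) -> None:
            """Instance-private method: double leading underscore (name-mangled)."""

        def submit(self) -> None:
            """Public method, intended for consumers of the package."""
            self.__validate()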

When not providing ignored_folders, code crashes

amlignore_path is not assigned if ignored_folders is empty:

    if ignored_folders:
        amlignore_path = snapshot_root_directory or Path.cwd()
        amlignore_path = amlignore_path / ".amlignore"
        lines_to_append = [str(path) for path in ignored_folders] if ignored_folders else []
    with append_to_amlignore(
            amlignore=amlignore_path,
            lines_to_append=lines_to_append):
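
A minimal sketch of a possible fix is to bind both variables unconditionally before entering the context manager:

    # Possible fix (sketch only): make sure both variables are always assigned.
    amlignore_path = (snapshot_root_directory or Path.cwd()) / ".amlignore"
    lines_to_append = [str(path) for path in ignored_folders] if ignored_folders else []
    with append_to_amlignore(
            amlignore=amlignore_path,
            lines_to_append=lines_to_append):
        ...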

AML Entry script path as Linux-style path; setting data store

from old code:

    # AzureML seems to sometimes expect the entry script path in Linux format, hence convert to posix path
    entry_script_relative_path = source_config.entry_script.relative_to(source_config.root_folder).as_posix()

In the same washup:

    # Use blob storage for storing the source, rather than the FileShares section of the storage account.
    run_config.source_directory_data_store = workspace.datastores.get(WORKSPACE_DEFAULT_BLOB_STORE_NAME).name

Create documentation and usage examples

  • Start with the simplest possible script. Explain what is being uploaded and executed
  • Then add components one by one: Conda environment, pip extra index, docker
  • Use of existing environment (can skip that)
  • Input datasets
  • Output datasets
  • From the InnerEye documentation, copy over the section that explains how we are dealing with datasets. Extend and polish that. What are AML datasets, how do they relate to datasets in blob storage?

Better Requirements

Add a new file: run_requirements.txt with the package run requirements.
Parse this and add it to the setup.py install_requires array.
Add a new shell script to pip install all the requirements for ease of development.
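
A minimal sketch of the parsing step in setup.py (the file name run_requirements.txt is as proposed above; the rest of the setup() call is an assumption):

    # Sketch only: the surrounding setup() arguments are assumptions, not the real setup.py.
    from pathlib import Path

    from setuptools import find_packages, setup

    run_requirements = [
        line.strip()
        for line in Path("run_requirements.txt").read_text().splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]

    setup(
        name="hi-ml",
        packages=find_packages("src"),
        package_dir={"": "src"},
        install_requires=run_requirements,
    )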

Improve coverage for `submit_to_azure_if_needed`

  • Break submit_to_azure_if_needed into smaller functions
  • Write unit tests for those smaller functions
  • Write unit tests for the various utility functions they rely on (copied from Inner Eye where possible)
  • Where appropriate (hopefully everywhere!) mark as @pytest.mark.fast

Remove random post number from PyPi publication process

To publish a new package to PyPi (i.e. the public package repository, not the test one) we run these commands:

    make clean
    make build
    twine upload dist/*

That runs setup.py. To get local testing working, we added code to setup.py that checks whether it is running in GitHub; if it is not, it changes the post number to a string of nine random digits. Unfortunately that code also runs when packaging for real, i.e. not as part of testing, and so instead of build numbers like 0.0.1post1, 0.0.1post2, 0.0.1post3 etc. we get ones like 0.0.1post5725449762.
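
A minimal sketch of one way to make the random suffix opt-in rather than the default, assuming a hypothetical HIML_LOCAL_TEST_BUILD flag and BUILD_NUMBER variable (the existing setup.py instead keys off whether it runs inside GitHub):

    # Sketch only: HIML_LOCAL_TEST_BUILD and BUILD_NUMBER are hypothetical; GITHUB_ACTIONS
    # is the standard variable that GitHub sets to "true" on its build agents.
    import os
    import random

    is_github_build = os.environ.get("GITHUB_ACTIONS", "") == "true"
    wants_random_suffix = os.environ.get("HIML_LOCAL_TEST_BUILD", "") == "1"

    post_number = os.environ.get("BUILD_NUMBER", "1")
    if not is_github_build and wants_random_suffix:
        # Only randomize when a developer explicitly asks for a throwaway local test package.
        post_number = "".join(random.choice("0123456789") for _ in range(9))

    package_version = f"0.1.0.post{post_number}"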

Test Against Editable or Installed Package

When running the unit tests, some folders are created using the pytest tmp_path fixture and the src folder is copied into them. This means that the coverage tool thinks they are different from the installed package. When running as a test in a GitHub action, the package is already installed, so this should not be necessary. When running locally, it should be possible to install the src folder as an editable package with the -e option and still run the tests.

Upload and Run Very simple script

To get things started, upload and run a script that just prints out a message handed in as an argument, using our new submit_to_azure_if_needed function.
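
A minimal sketch of such a script; the import path and the parameter names of submit_to_azure_if_needed are assumptions and need checking against the actual hi-ml API:

    # Sketch only: the import path and parameters are assumptions.
    from argparse import ArgumentParser

    from health.azure.himl import submit_to_azure_if_needed


    def main() -> None:
        # Submits this very script to AzureML when run locally; inside AzureML the call
        # returns immediately and the rest of the script runs as the job.
        submit_to_azure_if_needed(compute_cluster_name="testing-nc12", wait_for_completion=True)
        parser = ArgumentParser()
        parser.add_argument("--message", default="Hello from AzureML")
        args = parser.parse_args()
        print(args.message)


    if __name__ == "__main__":
        main()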

Switch PR pipeline to use make

In the Makefile, we now have code to run the mypy and flake8 checks and to build the environment. Use those as building blocks in the PR build. This way we can ensure that the local dev environment is the same as the one used in the cloud.

Improve logic for waiting for job completion

Move this segment into the AzureML layer:

        # For PR builds where we wait for job completion, the job must have ended in a COMPLETED state.
        if self.azure_config.wait_for_completion and not is_run_and_child_runs_completed(azure_run):
            raise ValueError(f"Run {azure_run.id} in experiment {azure_run.experiment.name} or one of its child "
                             "runs failed.")

Add generic helper functions from InnerEye

  • package_setup_and_hacks - parts of it should be called by submit_if_needed (matplotlib, MKL)
  • is_global_rank_zero()
  • set_environment_variables_for_multi_node: This should probably be renamed to include something around Pytorch Lightning, because we set environment variables in a way that the PL trainer needs.

Add helper functions for use in AML jobs

Helper functions to use in code for downloading a file from a run (working seamlessly both in local runs and in AML), e.g. used for downloading a checkpoint

  • In AzureML, they should take authentication/workspace from the current run context
  • Outside AzureML, they need a config.json file (see the sketch below)
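
A minimal sketch of the local-vs-AML switch; Run.get_context, Run.get, Workspace.from_config and Run.download_file are standard AzureML SDK calls, while the helper name and the offline-run check are assumptions:

    # Sketch only: the helper name and the offline-run detection are assumptions.
    from pathlib import Path

    from azureml.core import Run, Workspace


    def download_file_from_run(run_id: str, filename: str, output_path: Path) -> Path:
        context = Run.get_context()
        if hasattr(context, "experiment"):
            # Inside AzureML: authentication and workspace come from the current run context.
            workspace = context.experiment.workspace
        else:
            # Outside AzureML: fall back to the config.json file next to the code.
            workspace = Workspace.from_config()
        run = Run.get(workspace, run_id)
        run.download_file(name=filename, output_file_path=str(output_path))
        return output_path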

Code coverage has 0% coverage

Unfortunately the build code coverage is still pointing at "src" when evaluating the package. Change it to point at "health"

Add a script to download files from runs

  • download via wildcard from the most recent run, or a given run
  • downloaded folder structure should include the run, to have multiple results side by side easily
  • Install as executable when installing the package, see #59

Use mypy_runner, not mypy

The github action, build-test-pr.yml, and makefile, both call mypy directly. Change them both to call mypy_runner instead.

Also change the build dependencies so that publishing to test.pypi depends only on pytest.

Improve dev onboarding documentation

  • Ensure that the Readme file on pypi is correct and meaningful
  • Describe how to set up a dev environment (create conda env/pip)
  • How to set up and use a Service Principal
  • How does device login work (and when does it not work)
  • Describe how to do a release (pep guidelines for versions)
