
MLOS is a project to enable autotuning for systems.

Home Page: https://microsoft.github.io/MLOS

License: MIT License

PowerShell 0.88% Batchfile 0.01% Python 83.70% Jupyter Notebook 9.71% Dockerfile 0.52% Shell 2.02% Makefile 3.15%

performance-engineering optimize-systems infrastructure data-science autotuning benchmarking benchmarking-framework


MLOS


Overview

MLOS currently focuses on an offline tuning approach, though we intend to add online tuning in the future.

To accomplish this, the general flow involves:

Running a workload (i.e., a benchmark) against a system (e.g., a database, web server, or key-value store).
Retrieving the results of that benchmark, and perhaps some other metrics from the system.
Feeding that data to an optimizer (e.g., using Bayesian Optimization or other techniques).
Obtaining a new suggested configuration to try from the optimizer.
Applying that configuration to the target system.
Repeating until either the exploration budget is consumed or the configurations' performance appears to have converged.
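
The flow above can be sketched as a small loop. This is only an illustration and does not use the MLOS APIs: the search space, the synthetic `run_benchmark` function, and the random-search `suggest` are all hypothetical stand-ins for a real system and optimizer.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer range.
SEARCH_SPACE = {"cache_mb": (16, 1024), "threads": (1, 64)}


def run_benchmark(config):
    """Stand-in for applying `config` to a real system and running a workload.

    Returns a synthetic score (lower is better) that is minimal at
    cache_mb=512, threads=16.
    """
    return abs(config["cache_mb"] - 512) / 512 + abs(config["threads"] - 16) / 16


def suggest(rng):
    """Stand-in optimizer: uniform random search over SEARCH_SPACE."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def tune(budget=200, patience=50, seed=0):
    """Repeat suggest -> apply/run -> record until the budget is consumed
    or no improvement has been seen for `patience` consecutive trials."""
    rng = random.Random(seed)
    history = []  # (config, score) pairs; a real optimizer would learn from these
    best_config, best_score = None, float("inf")
    stale = 0
    for _ in range(budget):
        config = suggest(rng)          # obtain a new suggested config
        score = run_benchmark(config)  # apply it and run the workload
        history.append((config, score))
        if score < best_score:
            best_config, best_score, stale = config, score, 0
        else:
            stale += 1
        if stale >= patience:          # crude convergence check
            break
    return best_config, best_score, history
```

Calling `tune()` returns the best configuration found, its score, and the full trial history. The `patience` counter is just one simple way to approximate "appears to have converged".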

Source: LlamaTune: VLDB 2022

For a brief overview of some of the features and capabilities of MLOS, please see the following video:

Organization

To do this, this repo provides three Python modules, which can be used independently or in combination:

mlos-bench provides a framework to help automate running benchmarks as described above.
mlos-viz provides some simple APIs to help automate visualizing the results of benchmark experiments and their trials.

It provides a simple plot(experiment_data) API, where experiment_data is obtained from the mlos_bench.storage module.
mlos-core provides an abstraction around existing optimization frameworks (e.g., FLAML, SMAC, etc.)

It is intended to provide a simple, easy-to-consume (e.g., via pip), low-dependency abstraction to
- describe a space of context, parameters, their ranges, constraints, etc. and result objectives
- provide an "optimizer" service abstraction (e.g., register() and suggest()) so we can easily swap out different implementations of search methods (e.g., random, BO, LLM, etc.)
- provide some helpers for automating optimization experiment runner loops and data collection

For these design requirements we intend to reuse as much from existing OSS libraries as possible and layer policies and optimizations specifically geared towards autotuning systems over top.

By providing wrappers we aim to also allow more easily experimenting with replacing underlying optimizer components as new techniques become available or seem to be a better match for certain systems.
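
To illustrate the idea of a swappable optimizer behind `register()` and `suggest()`, here is a minimal sketch. It is not the mlos_core API: the `Optimizer` protocol, the `RandomSearch` and `LocalSearch` classes, and the `minimize` helper are hypothetical names chosen for this example.

```python
import random
from typing import Callable, Dict, List, Protocol, Tuple

Config = Dict[str, float]
Space = Dict[str, Tuple[float, float]]  # parameter name -> (low, high)


class Optimizer(Protocol):
    """Hypothetical interface; any object with these two methods qualifies."""

    def suggest(self) -> Config: ...
    def register(self, config: Config, score: float) -> None: ...


class RandomSearch:
    """Samples every parameter uniformly within its range."""

    def __init__(self, space: Space, seed: int = 0) -> None:
        self._space = space
        self._rng = random.Random(seed)
        self.observations: List[Tuple[Config, float]] = []

    def suggest(self) -> Config:
        return {k: self._rng.uniform(lo, hi) for k, (lo, hi) in self._space.items()}

    def register(self, config: Config, score: float) -> None:
        self.observations.append((config, score))


class LocalSearch(RandomSearch):
    """Perturbs the best config seen so far; random until data is registered."""

    def suggest(self) -> Config:
        if not self.observations:
            return super().suggest()
        best, _ = min(self.observations, key=lambda obs: obs[1])
        out = {}
        for k, (lo, hi) in self._space.items():
            step = 0.1 * (hi - lo)
            # Clamp the perturbed value back into the allowed range.
            out[k] = min(hi, max(lo, best[k] + self._rng.uniform(-step, step)))
        return out


def minimize(optimizer: Optimizer, objective: Callable[[Config], float],
             n_trials: int) -> Tuple[Config, float]:
    """Runner loop that depends only on the interface (n_trials >= 1)."""
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = optimizer.suggest()
        score = objective(config)
        optimizer.register(config, score)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Because `minimize` only calls `suggest()` and `register()`, either class (or a wrapper around a real optimization library) can be passed in without changing the loop.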

Contributing

See CONTRIBUTING.md for details on development environment and contributing.

Getting Started

The development environment for MLOS uses conda and devcontainers to ease dependency management, but not all these libraries are required for deployment.

For instructions on setting up the development environment please try one of the following options:

see CONTRIBUTING.md for details on setting up a local development environment
launch this repository (or your fork) in a codespace, or
have a look at one of the autotuning example repositories like sqlite-autotuning to kick the tires in a codespace in your browser immediately :)

`conda` activation

Create the mlos Conda environment.
```
conda env create -f conda-envs/mlos.yml
```
See the conda-envs/ directory for additional conda environment files, including those used for Windows (e.g. mlos-windows.yml).

or
```
# This will also ensure the environment is up to date using "conda env update -f conda-envs/mlos.yml"
make conda-env
```
Note: the latter expects a *nix environment.

Initialize the shell environment.
```
conda activate mlos
```

Usage Examples

`mlos-core`

For an example of using the mlos_core optimizer APIs run the BayesianOptimization.ipynb notebook.

`mlos-bench`

For an example of using the mlos_bench tool to run an experiment, see the mlos_bench Quickstart README.

Here's a quick summary:

```
./scripts/generate-azure-credentials-config > global_config_azure.jsonc

# run a simple experiment
mlos_bench --config ./mlos_bench/mlos_bench/config/cli/azure-redis-1shot.jsonc
```

`mlos-viz`

For a simple example of using the mlos_viz module to visualize the results of an experiment, see the sqlite-autotuning repository, especially the mlos_demo_sqlite_teachers.ipynb notebook.

Installation

The MLOS modules (mlos-core, mlos-bench, and mlos-viz) are published to PyPI when new releases are tagged.

To install the latest release, simply run:

```
# this will install just the optimizer component with SMAC support:
pip install -U "mlos-core[smac]"

# this will install just the optimizer component with FLAML support:
pip install -U "mlos-core[flaml]"

# this will install just the optimizer component with SMAC and FLAML support:
pip install -U "mlos-core[smac,flaml]"

# this will install both the FLAML optimizer and the experiment runner with Azure support:
pip install -U "mlos-bench[flaml,azure]"

# this will install both the SMAC optimizer and the experiment runner with SSH support:
pip install -U "mlos-bench[smac,ssh]"

# this will install the postgres storage backend for mlos-bench
# and mlos-viz for visualizing results:
pip install -U "mlos-bench[postgres]" mlos-viz
```

Details on using a local version from git are available in CONTRIBUTING.md.


mlos's Issues

Improve mlos library to support Python 3.8

Several of the modern tool installers default to Python 3.8. We should see what can be done to make the mlos python library work for that as well rather than pinning on 3.7 which complicates the install/setup process.

Reevaluate use of `getcwd` in python code

There are several places where getcwd is used to compose a path assuming that the script is executed from the source/Mlos.Python directory or else used to create a temporary file.

For the first, we should change it to be relative to the file referencing it so that the script can be executed from a different directory.

For the second, we should use the system-provided temporary file facilities (e.g., Python's tempfile module) to avoid security issues.
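
A minimal sketch of both changes, using only the standard library. The helper names `resource_path` and `write_scratch_file` are hypothetical and are not taken from the MLOS code.

```python
import tempfile
from pathlib import Path


def resource_path(anchor_file, *parts):
    """Build a path relative to the directory containing `anchor_file`.

    Callers pass their own `__file__`, so the result does not depend on
    the current working directory the script was launched from.
    """
    return Path(anchor_file).resolve().parent.joinpath(*parts)


def write_scratch_file(text, suffix=".json"):
    """Write `text` to a securely created temporary file and return its path.

    NamedTemporaryFile picks an unpredictable name in the system temp
    directory; delete=False leaves cleanup to the caller.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as fh:
        fh.write(text)
        return Path(fh.name)
```

For example, a script could call `resource_path(__file__, "config", "settings.json")` instead of joining onto `os.getcwd()`.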

Windows docker instructions miss building the container

https://microsoft.github.io/MLOS/documentation/01-Prerequisites/#step-3-install-windows-build-tools

Add timeout to CI checks

A Python unit test failed in the GitHub Actions pipeline, but only after running for a long time:
https://github.com/microsoft/MLOS/pull/29/checks?check_run_id=1061557146#step:9:41362

We should see about controlling that with some timeouts.
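
One way to bound a step's wall-clock time from Python is the `timeout` argument of `subprocess.run`. The sketch below is only an illustration and is not how the MLOS pipelines are configured: the `run_with_timeout` helper is a hypothetical name, and a CI system's own timeout settings may be a better fit.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its exit code, or None if it exceeded `timeout_s` seconds.

    subprocess.run kills the child process before raising TimeoutExpired.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s, check=False).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Hypothetical usage: give a trivial command a 60 second limit.
    code = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=60)
    print("timed out" if code is None else f"exit code {code}")
```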

Using vscode from WSL for dotnet editing requires additional setup work

Right now, attempting to edit dotnet code inside vscode launched from a WSL instance throws an error about a missing .NET SDK.

In theory, we should be able to do the following:

```
# setup the environment to find the locally installed dotnet
. ./scripts/init.linux.sh
# start vscode and inherit those environment variables (especially PATH)
code .
```

Unfortunately, it seems that in WSL, vscode is launched really as a remote Windows process with a proxy server to the WSL environment, so those variables are not passed through.

microsoft/vscode-remote-release#1700

Either we should apply some of the suggested fixes in that issue to automatically set them up for the user, or just document how to install a dotnet sdk in the environment.

A third option I'd like to find time to do is to publish our docker images and provide a .devcontainer/ json for automatically letting vscode set itself up in a reasonable way to edit in the container with all the right bits already prepared.

Use logger instead of print in optimizer startup scripts

per @byte-sculptor
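
As a rough sketch of what this change could look like with the standard `logging` module: the logger name, the `MLOS_LOG_LEVEL` environment variable, and the `start_optimizer_service` function are hypothetical and are not taken from the MLOS startup scripts.

```python
import logging
import os

# Hypothetical logger name for the startup script.
logger = logging.getLogger("mlos.optimizer_startup")

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(level_name=None):
    """Set the level from the argument or the (hypothetical) MLOS_LOG_LEVEL variable.

    Unknown names fall back to WARNING. Returns the numeric level applied.
    """
    level_name = level_name or os.environ.get("MLOS_LOG_LEVEL", "WARNING")
    level = _LEVELS.get(level_name.upper(), logging.WARNING)
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logger.setLevel(level)
    return level


def start_optimizer_service(port):
    # Previously: print(f"Starting optimizer service on port {port}")
    logger.info("Starting optimizer service on port %d", port)
    # ... actual startup work would go here ...
    logger.debug("Startup complete")
```

With this in place, the messages can be silenced or made more verbose without editing the script.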

add conda installation instructions for notebooks

Add OSS Examples

We need some OSS examples to use both for initial experience and documentation purposes as well as CI/CD test integrations.

Add support for alternative backend storage

MLOS should be a bit more generic in its support of backend storage for the models, optimizers, experiments, etc. (e.g. not just SqlServer).

Some other potential targets include: mlflow with files, sqlite, mysql, postgres, etc.
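
A minimal sketch of what a more generic storage layer could look like, assuming a small interface with interchangeable backends. The `TrialStore` protocol and both implementations are hypothetical and do not reflect the actual MLOS storage code; sqlite is used here only because it ships with Python.

```python
import json
import sqlite3
from typing import Any, Dict, List, Protocol


class TrialStore(Protocol):
    """Hypothetical backend-agnostic interface for experiment results."""

    def save(self, experiment: str, config: Dict[str, Any], score: float) -> None: ...
    def load(self, experiment: str) -> List[Dict[str, Any]]: ...


class SqliteTrialStore:
    """Stores trials in a sqlite database (in memory by default)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS trials (experiment TEXT, config TEXT, score REAL)"
        )

    def save(self, experiment, config, score):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO trials VALUES (?, ?, ?)",
                (experiment, json.dumps(config), score),
            )

    def load(self, experiment):
        rows = self._db.execute(
            "SELECT config, score FROM trials WHERE experiment = ? ORDER BY rowid",
            (experiment,),
        )
        return [{"config": json.loads(c), "score": s} for c, s in rows]


class InMemoryTrialStore:
    """Keeps trials in a dict; handy for tests."""

    def __init__(self) -> None:
        self._trials: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, experiment, config, score):
        self._trials.setdefault(experiment, []).append(
            {"config": dict(config), "score": score}
        )

    def load(self, experiment):
        return list(self._trials.get(experiment, []))
```

Code written against `TrialStore` would not need to know which backend is in use, which is the property this issue asks for.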

CI pipelines for regenerating and deploying documentation website via hugo.

We should revisit hooking up documentation generation via hugo to the CI pipelines at some point.

Also documenting how to do it even if it's still manual.

Originally posted by @bpkroth in #14 (comment)

Create mailmap

We should create a mailmap for microsoft alias -> github handles so that the commit logs map both identities to the same person.

Add logging infrastructure

We've noticed that troubleshooting, particularly where there is cross-process/language waiting or lookups involved, could be aided by some simple print statements to track the status of various operations (e.g., assembly lookup, synchronization points, etc.).

However, these statements are undesirable in a production environment so have been eschewed thus far.

We should implement a logging mechanism to make that configurable on a case by case basis.

Import initial MLOS snapshot code

Need to extract out the private portions of the initial implementation for open-source publication.

Enable gcc support

Currently, to build the C++ code generation code, we rely on clang's support for MSVC attributes (e.g., to ignore duplicate definitions at link time).

There are multiple ways around this that would help enable gcc support, including reorganizing the code generation output or using some macros to add #ifdef wrappers around the attribute usages.

Consider this as icing/wishlist for now.

Add code coverage checks

Breaking out from #6:

It would be good to add code coverage checks and badges for that to the repo landing pages.

Add tests for IPython Notebooks

At the very least we should make sure that these notebooks continue to execute without throwing exceptions. At best we validate that their output is what we expected.

We should also assert that specific notebooks are checked in with outputs.
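
The "checked in with outputs" assertion can be approximated with the standard library alone, since a notebook file is JSON. The sketch below treats a non-empty code cell with a null `execution_count` as not executed; the function name is hypothetical, and actually executing notebooks would need a tool such as nbconvert instead.

```python
import json


def unexecuted_code_cells(notebook_json):
    """Return indices of non-empty code cells that were saved without being run.

    Expects the text of an nbformat 4 notebook (.ipynb) file.
    """
    notebook = json.loads(notebook_json)
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # `source` may be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing
```

A test could read each notebook that is expected to carry outputs and assert that this function returns an empty list.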

Improve VSCode based building/debugging.

We can consider adding some .vscode configurations to aid the initial build/debug experience inside the VSCode IDE.

Move to pytest

We should use pytest as the test runner.

Fix broken links in git pages

There are lots of broken links when browsing https://microsoft.github.io/MLOS/ due to relative links in the markdown.

I may take on trying to fix them up using the anchor/sed hack discussed in #14

Add .editorconfigs

We can consider adding some .editorconfig style entries to aid editors to follow style guidelines in addition to build-time lint checking.

Ensure microservices can be started from VSCode

There might be an issue with starting the GRPC services as processes from within VSCode. We need to double check.

Python Unit Test Timeouts

Hmm, Python unit tests are still timing out. Something else might be going on beyond the Python unit tests just randomly taking a long time due to high-degree polynomials being chosen (which I think we already reduced).

Originally posted by @bpkroth in #66 (comment)

Add Linux build/test pipelines

Should be able to do this with a combination of Docker and make fairly shortly.

Fixup python dependency setup instructions for consistency

I think during #12 (https://github.com/microsoft/MLOS/pull/12/files#diff-758669e740e7424c31dfb6cf2edbd749) @amueller and I discussed making the instructions here https://github.com/microsoft/MLOS/blob/main/documentation/01-Prerequisites.md#install-python-dependencies consistent with the instructions at https://github.com/microsoft/MLOS#python-only-installation

Make sure pip is installed in the conda environment

build instructions: docker is not an alternative path

Looking at the build docs it looks like there are three paths, docker, linux and windows:
https://microsoft.github.io/MLOS/documentation/02-Build/#docker

But docker is actually "just" how to install docker, so you still need to run the linux installation afterwards.
I think we should restructure the docs to make this more clear, or just say "now do the linux build" after the docker install.

Improve docker development integration with vscode

We should add a .devcontainer/ json config to allow easier development integration between vscode and docker:
https://code.visualstudio.com/docs/remote/containers

Probably best to do this after #36

Fix documentation generation issues with recent nbconvert

We currently get an error in the website/build_site.sh script if pip selects a new version (e.g. 6.0.1):

jinja2.exceptions.TemplateNotFound: index.md.j2

Originally posted by @bpkroth in #61 (comment)

test_lasso_hierarchical_categorical_predictions seems flaky

@edcthayer adding an issue to track this here

Seen this a couple of times recently. Sometimes with slightly different KeyErrors (e.g. medium_quadratic_params instead).
Rerunning it seems to make it go away.

2020-09-22T15:46:14.7210731Z [10 rows x 5 columns]
2020-09-22T15:46:14.7211123Z     raise_missing = True
2020-09-22T15:46:14.7211913Z     self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7212932Z   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\pandas\core\indexing.py", line 1646, in _validate_read_indexer
2020-09-22T15:46:14.7213796Z     raise KeyError(f"{not_found} not in index")
2020-09-22T15:46:14.7214363Z     ax = Index(['vertex_height', 'medium_quadratic_params.x_1',
2020-09-22T15:46:14.7214990Z        'medium_quadratic_params.x_2', 'low_quadratic_params.x_1',
2020-09-22T15:46:14.7215540Z        'low_quadratic_params.x_2'],
2020-09-22T15:46:14.7216039Z       dtype='object')
2020-09-22T15:46:14.7216430Z     axis = 1
2020-09-22T15:46:14.7216840Z     indexer = array([ 0,  3,  4,  1,  2, -1, -1], dtype=int64)
2020-09-22T15:46:14.7217460Z     key = Index(['vertex_height', 'low_quadratic_params.x_1', 'low_quadratic_params.x_2',
2020-09-22T15:46:14.7218181Z        'medium_quadratic_params.x_1', 'medium_quadratic_params.x_2',
2020-09-22T15:46:14.7218832Z        'high_quadratic_params.x_1', 'high_quadratic_params.x_2'],
2020-09-22T15:46:14.7219465Z       dtype='object')
2020-09-22T15:46:14.7219832Z     missing = 2
2020-09-22T15:46:14.7220333Z     not_found = ['high_quadratic_params.x_1', 'high_quadratic_params.x_2']
2020-09-22T15:46:14.7220882Z     raise_missing = True
2020-09-22T15:46:14.7221500Z     self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7222280Z KeyError: "['high_quadratic_params.x_1', 'high_quadratic_params.x_2'] not in index"

From:
https://pipelines.actions.githubusercontent.com/fOuLpdRLhJegHdOkuie9qUN2ZnNM5WKTsTmQkIMYbUKwUvkp3o/_apis/pipelines/1/runs/246/signedlogcontent/14?urlExpires=2020-09-22T16%3A10%3A29.3461235Z&urlSigningMethod=HMACV1&urlSignature=EIMbhzZxH5LmY8MLhX91%2FWJ9h02Ut6Typi2flZjOW5Q%3D

remove SmartCache.ipynb

Publish C# API documentation from comments

We already do this for Python using sphinx.
It's already possible to output xml from the msbuild .csproj files. Should be able to output either HTML directly or use another tool to help with that.

Enable -Wall for C++ builds on Linux

Current Linux builds don't enable -Wall (i.e., most compiler warnings), nor do they fail on warnings (which would additionally require -Werror).

This is generally good practice and would help us enable integration with more projects with fewer build issues.

Getting there may take some reorgs of the code generation output (e.g. to avoid duplicate definitions that are currently just tagged to ignore in the linker).

Publish docker build images

We should publish the base build image portion used in the current base Dockerfile to avoid needing to execute all of the apt-get commands each time.

This would also be useful for CI pipelines wanting to make use of those images to build/test.

Enable CI/CD integrations on GitHub for gated checkins

We need to re-implement gated checkin pipelines on GitHub to ensure good quality.

Add C support

This will need some C++ wrapper functions and some build tweaks to consuming projects.

Current thought is to integrate with sqlite as an example.

Reliance on docker is problematic on windows machines

Installing Docker requires Windows 10 version 2004, which is blocked on some machines, and it can be tricky to install. I didn't manage to install it on my workstation. This might need a workaround for use in a teaching environment.

Add Java support

Mostly a placeholder for future work: we should add java support for code generation to talk to the external agent for tracking experiments and telemetry so that we can tune another common class of systems: distributed java applications.

One nice possibility here could be to integrate existing java language attributes so that the code generation process is a bit more native feeling (rather than the extra C# annotated structs that we currently have to support cross compiler C++).

Internal references broken on hugo website

Clicking a link in the Contents section of Prerequisites sends you to a different page:
https://microsoft.github.io/MLOS/documentation/01-Prerequisites/

Ensure license headers are present in all shell scripts

Non-option arguments to MlosAgentServer are silently ignored

Calling

tools/bin/dotnet out/dotnet/source/Mlos.Agent.Server/obj/AnyCPU/Mlos.Agent.Server.dll anything_at_all

doesn't yield an error and anything_at_all is silently ignored. I think we should error if there's anything we can't parse.
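
The agent server is a .NET program, so the following is only a Python analogue of the requested behavior, not a patch for it. The `agent-server` program name, the `--settings` option, and the `parse_strict` helper are hypothetical.

```python
import argparse


def parse_strict(argv):
    """Parse known options and reject anything left over instead of ignoring it."""
    parser = argparse.ArgumentParser(prog="agent-server")  # hypothetical program name
    parser.add_argument("--settings", default=None)        # hypothetical option
    args, leftovers = parser.parse_known_args(argv)
    if leftovers:
        raise ValueError("unrecognized arguments: " + " ".join(leftovers))
    return args
```

With this approach, `parse_strict(["anything_at_all"])` raises an error, while `parse_strict(["--settings", "x.json"])` succeeds.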

Add badges for build status

Breaking out open items from #6:

It would be nice to add build status badges to the main github landing page.

run notebooks when building website

Right now the checked-in version is just converted with nbconvert; we should add the --execute flag.

Can't call optimum from BayesianOptimizerProxy

Currently, the optimum implementation of BayesianOptimizerProxy is empty.

Move the CI to conda

Do we have an issue to move the CI to conda?

Originally posted by @amueller in #79 (comment)

Investigate Code Owners to help auto-populate PR reviewers list

https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners

Call uncrustify during Linux build process as well

Currently uncrustify is only run during the msbuild tasks/pipelines. We should make it work for CMake and Linux as well.

Execute Python long haul tests on a schedule

@byte-sculptor observed that we are currently missing the Python long haul tests.

We don't necessarily want these in the CI pipelines (they take too long), but we do want to execute them periodically.

To do that, we'll need a separate .github/workflows/scheduled.yml sort of file for the scheduled tasks.

Another task that we'll want to put in there is periodic docker image rebuilds (e.g., to catch security patches). See also: #36
