microsoft / mlos
MLOS is a project to enable autotuning for systems.
Home Page: https://microsoft.github.io/MLOS
License: MIT License
We can consider adding some .vscode configurations to aid the initial build/debug experience inside the VSCode IDE.
Installing Docker requires Windows 10 version 2004, which is blocked on some machines, and it can be tricky to install. I didn't manage it on my workstation. We may need a workaround for doing this in a teaching environment.
There are lots of broken links when browsing https://microsoft.github.io/MLOS/ due to relative links in the markdown.
I may take on trying to fix them up using the anchor/sed hack discussed in #14
Several of the modern tool installers default to Python 3.8. We should see what can be done to make the mlos python library work for that as well rather than pinning on 3.7 which complicates the install/setup process.
There might be an issue with starting the GRPC services as processes from within VSCode. We need to double check.
Current Linux builds don't enable -Wall (together with -Werror, i.e. fail on all warnings).
This is generally good practice and would help us enable integration with more projects with fewer build issues.
Getting there may take some reorgs of the code generation output (e.g. to avoid duplicate definitions that are currently just tagged to ignore in the linker).
@byte-sculptor observed that we are currently missing the Python long haul tests.
We don't necessarily want these in the CI pipelines (they take too long), but we do want to execute them periodically.
To do that, we'll need a separate .github/workflows/scheduled.yml sort of file for the scheduled tasks.
Another task that we'll want to put in there is periodic docker image rebuilds (e.g. to catch security patches). See Also: #36
Right now the checked-in version is just converted with nbconvert; we should add the --execute flag.
Breaking out open items from #6:
It would be nice to add build status badges to the main github landing page.
Need to extract out the private portions of the initial implementation for open-source publication.
We currently get an error in the website/build_site.sh script if pip selects a new version (e.g. 6.0.1):
jinja2.exceptions.TemplateNotFound: index.md.j2
Originally posted by @bpkroth in #61 (comment)
Clicking a link in the Contents section of Prerequisites sends you to a different page:
https://microsoft.github.io/MLOS/documentation/01-Prerequisites/
Hmm, Python unit tests are still timing out. Something other than the tests randomly taking a long time due to high-degree polynomials being chosen (which I think we already reduced) might be going on.
Originally posted by @bpkroth in #66 (comment)
We should add a .devcontainer/ JSON config to allow easier development integration between VSCode and Docker:
https://code.visualstudio.com/docs/remote/containers
Probably best to do this after #36
Should be able to do this with a combination of Docker and make fairly shortly.
We should create a mailmap for microsoft alias -> github handles so that the commit logs map both identities to the same person.
We should revisit hooking up documentation generation via hugo to the CI pipelines at some point.
Also documenting how to do it even if it's still manual.
Originally posted by @bpkroth in #14 (comment)
There are several places where getcwd is used either to compose a path, assuming that the script is executed from the source/Mlos.Python directory, or to create a temporary file.
For the first, we should make the path relative to the file referencing it so that the script can be executed from a different directory.
For the second, we should use the system-provided temporary file facilities to avoid security issues.
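A minimal sketch of both changes, assuming Python 3 and only the standard library. The helper names (path_relative_to, write_scratch_file) and the example file names are hypothetical and are not taken from the MLOS code:

```python
import os
import tempfile
from pathlib import Path


def path_relative_to(module_file, *parts):
    """Build a path anchored at the directory of module_file, not at os.getcwd()."""
    return Path(module_file).resolve().parent.joinpath(*parts)


def write_scratch_file(contents):
    """Write contents to a file created by the system temp facilities.

    tempfile.mkstemp creates the file with a unique, unpredictable name,
    readable and writable only by the creating user.
    """
    fd, name = tempfile.mkstemp(prefix="mlos_", suffix=".tmp")  # hypothetical prefix
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
    return Path(name)


# Typical call site inside a module (file names are illustrative only):
#   config_path = path_relative_to(__file__, "data", "config.json")
```

The caller is responsible for deleting the file returned by write_scratch_file; tempfile.TemporaryDirectory is an alternative when automatic cleanup is preferred.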
Right now, attempting to edit dotnet code inside vscode launched from a WSL instance throws an error about a missing .NET SDK.
In theory, we should be able to do the following:
# setup the environment to find the locally installed dotnet
. ./scripts/init.linux.sh
# start vscode and inherit those environment variables (especially PATH)
code .
Unfortunately, it seems that in WSL, vscode is launched really as a remote Windows process with a proxy server to the WSL environment, so those variables are not passed through.
microsoft/vscode-remote-release#1700
Either we should apply some of the suggested fixes in that issue to automatically set them up for the user, or just document how to install a dotnet sdk in the environment.
A third option I'd like to find time to do is to publish our docker images and provide a .devcontainer/ JSON config for automatically letting vscode set itself up in a reasonable way to edit in the container with all the right bits already prepared.
Once bootstrapping support in the random forest is ported from ADO, we can change some of the defaults of the RF and simplify the BO notebook.
Mostly a placeholder for future work: we should add java support for code generation to talk to the external agent for tracking experiments and telemetry so that we can tune another common class of systems: distributed java applications.
One nice possibility here could be to integrate existing java language attributes so that the code generation process is a bit more native feeling (rather than the extra C# annotated structs that we currently have to support cross compiler C++).
We should publish the base build image portion used in the current base Dockerfile to avoid needing to execute all of the apt-get commands each time.
This would also be useful for CI pipelines wanting to make use of those images to build/test.
per @byte-sculptor
A Python unit test failed in the GitHub Actions pipeline, but only after running for a long time:
https://github.com/microsoft/MLOS/pull/29/checks?check_run_id=1061557146#step:9:41362
We should see about controlling that with some timeouts.
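One way to bound the run time is to launch the test command in a child process with a hard limit. A minimal sketch, assuming the standard library only; run_with_timeout and the example command are hypothetical, and a limit configured in the workflow or in the test runner itself may turn out to be the better fit:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s seconds.

    subprocess.run kills the child process before raising TimeoutExpired.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Illustrative command only: run unittest discovery with a 30 minute cap.
    code = run_with_timeout([sys.executable, "-m", "unittest", "discover"], 30 * 60)
    print("timed out" if code is None else f"exit code {code}")
```

Returning None on timeout lets the caller report a timeout separately from an ordinary test failure.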
Do we have an issue to move the CI to conda?
Originally posted by @amueller in #79 (comment)
Calling
tools/bin/dotnet out/dotnet/source/Mlos.Agent.Server/obj/AnyCPU/Mlos.Agent.Server.dll anything_at_all
doesn't yield an error, and anything_at_all is silently ignored. I think we should error if there's anything we can't parse.
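The difference between silently ignoring and rejecting unknown arguments can be illustrated with Python's argparse. This is only an analogy: Mlos.Agent.Server is a .NET program, and the parser name and the --settings option below are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="agent-server-sketch")
    # Hypothetical option, present only so the parser has something to accept.
    parser.add_argument("--settings", default=None)
    return parser


def parse_lenient(argv):
    """Accept anything: unrecognized arguments are returned separately and easy to ignore."""
    namespace, unknown = build_parser().parse_known_args(argv)
    return namespace, unknown


def parse_strict(argv):
    """Reject unrecognized arguments: argparse prints usage and exits with status 2."""
    return build_parser().parse_args(argv)
```

With the strict variant, a stray anything_at_all stops the program with a usage message instead of being dropped.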
We should have a place to document the high level features and items we'd like to support.
There's a spot to link to this on the top level README.md right now, but it's currently a dead link.
Breaking out from #6:
It would be good to add code coverage checks and badges for that to the repo landing pages.
@edcthayer adding an issue to track this here
Seen this a couple of times recently, sometimes with slightly different KeyErrors (e.g. medium_quadratic_params instead).
Rerunning it seems to make it go away.
2020-09-22T15:46:14.7210731Z [10 rows x 5 columns]
2020-09-22T15:46:14.7211123Z raise_missing = True
2020-09-22T15:46:14.7211913Z self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7212932Z File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\pandas\core\indexing.py", line 1646, in _validate_read_indexer
2020-09-22T15:46:14.7213796Z raise KeyError(f"{not_found} not in index")
2020-09-22T15:46:14.7214363Z ax = Index(['vertex_height', 'medium_quadratic_params.x_1',
2020-09-22T15:46:14.7214990Z 'medium_quadratic_params.x_2', 'low_quadratic_params.x_1',
2020-09-22T15:46:14.7215540Z 'low_quadratic_params.x_2'],
2020-09-22T15:46:14.7216039Z dtype='object')
2020-09-22T15:46:14.7216430Z axis = 1
2020-09-22T15:46:14.7216840Z indexer = array([ 0, 3, 4, 1, 2, -1, -1], dtype=int64)
2020-09-22T15:46:14.7217460Z key = Index(['vertex_height', 'low_quadratic_params.x_1', 'low_quadratic_params.x_2',
2020-09-22T15:46:14.7218181Z 'medium_quadratic_params.x_1', 'medium_quadratic_params.x_2',
2020-09-22T15:46:14.7218832Z 'high_quadratic_params.x_1', 'high_quadratic_params.x_2'],
2020-09-22T15:46:14.7219465Z dtype='object')
2020-09-22T15:46:14.7219832Z missing = 2
2020-09-22T15:46:14.7220333Z not_found = ['high_quadratic_params.x_1', 'high_quadratic_params.x_2']
2020-09-22T15:46:14.7220882Z raise_missing = True
2020-09-22T15:46:14.7221500Z self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7222280Z KeyError: "['high_quadratic_params.x_1', 'high_quadratic_params.x_2'] not in index"
This will need some C++ wrapper functions and some build tweaks to consuming projects.
Current thought is to integrate with sqlite as an example.
Currently the optimum implementation of BayesianOptimizerProxy is empty.
We should use pytest as the test runner.
MLOS should be a bit more generic in its support of backend storage for the models, optimizers, experiments, etc. (e.g. not just SqlServer).
Some other potential targets include: mlflow with files, sqlite, mysql, postgres, etc.
Currently, to build C++ code generation code we rely on clang's support for MSVC attributes (e.g. to ignore duplicate definitions at link time).
There are multiple ways around this that would help enable gcc support, including reorganizing the code generation output or using some macros to add #ifdef wrappers around the attribute usages.
Consider this as icing/wishlist for now.
We've noticed that troubleshooting, particularly where there is cross process/language waiting or lookups involved, could be aided by some simple print statements to track the status of various operations (e.g. assembly lookup, synchronization points, etc.).
However, these statements are undesirable in a production environment, so they have been eschewed thus far.
We should implement a logging mechanism to make that configurable on a case by case basis.
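For the Python side, a minimal sketch of per-component configuration using the standard logging module. The logger names (mlos.assembly_lookup, mlos.sync) and the configure_logging helper are hypothetical; the C++ and C# components would need their own equivalent.

```python
import logging


def configure_logging(default_level=logging.WARNING, overrides=None):
    """Set a quiet default level and raise verbosity only for selected components."""
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logging.getLogger().setLevel(default_level)
    for name, level in (overrides or {}).items():
        logging.getLogger(name).setLevel(level)


if __name__ == "__main__":
    # Debug output for assembly lookup only; everything else stays at WARNING.
    configure_logging(overrides={"mlos.assembly_lookup": logging.DEBUG})
    logging.getLogger("mlos.assembly_lookup").debug("searching for settings assembly")
    logging.getLogger("mlos.sync").debug("not shown at the default level")
```

Because the levels are set at run time, the same statements can stay in production code and be enabled case by case.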
Make sure the project builds and works easily in a Linux environment.
At the very least we should make sure that these notebooks continue to execute without throwing exceptions. At best we validate that their output is what we expected.
We should also assert that specific notebooks are checked in with outputs.
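The second check could be sketched as follows, assuming notebooks in the standard nbformat 4 JSON layout and using a missing execution_count as a proxy for "checked in without outputs". The function name unexecuted_code_cells is hypothetical:

```python
import json


def unexecuted_code_cells(notebook_json):
    """Return the indexes of non-empty code cells that were never executed."""
    notebook = json.loads(notebook_json)
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as a string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing
```

A CI step could read each notebook that is expected to carry outputs and fail when the returned list is non-empty. Checking execution_count rather than the outputs list avoids flagging cells that legitimately print nothing, such as imports.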
We can consider adding some .editorconfig style entries to aid editors to follow style guidelines in addition to build-time lint checking.
Looking at the build docs, it looks like there are three paths (Docker, Linux, and Windows):
https://microsoft.github.io/MLOS/documentation/02-Build/#docker
But docker is actually "just" how to install docker, so you still need to run the linux installation afterwards.
I think we should restructure the docs to make this more clear, or just say "now do the linux build" after the docker install.
We need to re-implement gated checkin pipelines on GitHub to ensure good quality.
We already do this for Python using sphinx.
It's already possible to output XML from the msbuild .csproj files. We should be able to output either HTML directly or use another tool to help with that.
I think during #12 (https://github.com/microsoft/MLOS/pull/12/files#diff-758669e740e7424c31dfb6cf2edbd749) @amueller and I discussed making the instructions here https://github.com/microsoft/MLOS/blob/main/documentation/01-Prerequisites.md#install-python-dependencies consistent with the instructions at https://github.com/microsoft/MLOS#python-only-installation
Currently uncrustify is only run during the msbuild tasks/pipelines. We should make it work for CMake and Linux as well.
We need some OSS examples to use both for initial experience and documentation purposes as well as CI/CD test integrations.