mlos's Issues

Reliance on Docker is problematic on Windows machines

Installing Docker requires Windows 10 version 2004, which is blocked on some machines and can be tricky to install. I didn't manage it on my workstation. A teaching environment may need a workaround for this.

Improve mlos library to support Python 3.8

Several of the modern tool installers default to Python 3.8. We should see what can be done to make the mlos Python library work with it as well, rather than pinning to 3.7, which complicates the install/setup process.

Enable -Wall for C++ builds on Linux

Current Linux builds don't enable -Wall (which turns on most common compiler warnings; failing the build on them additionally requires -Werror).

This is generally good practice and would help us enable integration with more projects with fewer build issues.

Getting there may take some reorgs of the code generation output (e.g. to avoid duplicate definitions that are currently just tagged to ignore in the linker).

Execute Python long haul tests on a schedule

@byte-sculptor observed that we are currently missing the Python long haul tests.

We don't necessarily want these in the CI pipelines (they take too long), but we do want to execute them periodically.

To do that, we'll need a separate .github/workflows/scheduled.yml sort of file for the scheduled tasks.

Another task we'll want to put in there is periodic Docker image rebuilds (e.g. to catch security patches). See Also: #36

See Also: https://docs.github.com/en/free-pro-team@latest/actions/reference/events-that-trigger-workflows#scheduled-events

Python Unit Test Timeouts

Hmm, Python unit tests are still timing out. Something else might be going on than the Python unit tests just randomly taking a long time due to high-degree polynomials being chosen (which I think we already reduced).

Originally posted by @bpkroth in #66 (comment)

Create mailmap

We should create a mailmap that maps Microsoft aliases to GitHub handles so that the commit logs attribute both identities to the same person.

Reevaluate use of `getcwd` in python code

There are several places where getcwd is used either to compose a path, on the assumption that the script is executed from the source/Mlos.Python directory, or to create a temporary file.

For the first, we should change it to be relative to the file referencing it so that the script can be executed from a different directory.

For the second, we should use the system-provided temporary file facilities (e.g. Python's tempfile module) to avoid security issues.
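A minimal sketch of both changes, using only the standard library. The helper names `resource_path` and `write_scratch_file` and the `.json` suffix are hypothetical and not taken from the mlos code base.

```python
import tempfile
from pathlib import Path

# Directory containing this file, independent of the caller's working directory.
# Falls back to the current directory when __file__ is undefined (e.g. in a REPL).
try:
    MODULE_DIR = Path(__file__).resolve().parent
except NameError:
    MODULE_DIR = Path.cwd()


def resource_path(relative_name: str) -> Path:
    """Resolve a path relative to this file rather than to os.getcwd()."""
    return MODULE_DIR / relative_name


def write_scratch_file(contents: str) -> Path:
    """Write contents to a securely created temporary file and return its path."""
    # NamedTemporaryFile picks an unpredictable name in the system temp directory;
    # delete=False leaves cleanup to the caller (hypothetical choice for this sketch).
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        handle.write(contents)
        return Path(handle.name)
```

With this shape, a script can be launched from any directory and still find its neighbouring files, and scratch files no longer land in whatever directory the user happened to be in.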

Using vscode from WSL for dotnet editing requires additional setup work

Right now, attempting to edit dotnet code inside vscode launched from a WSL instance throws an error about a missing .NET SDK.

In theory, we should be able to do the following:

# setup the environment to find the locally installed dotnet
. ./scripts/init.linux.sh
# start vscode and inherit those environment variables (especially PATH)
code .

Unfortunately, it seems that in WSL, vscode is really launched as a remote Windows process with a proxy server to the WSL environment, so those variables are not passed through.

microsoft/vscode-remote-release#1700

Either we should apply some of the suggested fixes in that issue to automatically set them up for the user, or just document how to install a dotnet sdk in the environment.

A third option I'd like to find time to do is to publish our docker images and provide a .devcontainer/ json for automatically letting vscode set itself up in a reasonable way to edit in the container with all the right bits already prepared.

Add Java support

Mostly a placeholder for future work: we should add Java support for code generation to talk to the external agent for tracking experiments and telemetry so that we can tune another common class of systems: distributed Java applications.

One nice possibility here could be to integrate existing Java language attributes so that the code generation process feels a bit more native (rather than the extra C# annotated structs that we currently need in order to support cross-compiling to C++).

Publish docker build images

We should publish the base build image portion used in the current base Dockerfile to avoid needing to execute all of the apt-get commands each time.

This would also be useful for CI pipelines wanting to make use of those images to build/test.

Add ROADMAP.md

We should have a place to document the high level features and items we'd like to support.

There's a spot to link to this on the top level README.md right now, but it's currently a dead link.

Add code coverage checks

Breaking out from #6:

It would be good to add code coverage checks and badges for that to the repo landing pages.

test_lasso_hierarchical_categorical_predictions seems flaky

@edcthayer adding an issue to track this here

Seen this a couple of times recently. Sometimes with slightly different KeyErrors (e.g. medium_quadratic_params instead).
Rerunning it seems to make it go away.

2020-09-22T15:46:14.7210731Z [10 rows x 5 columns]
2020-09-22T15:46:14.7211123Z     raise_missing = True
2020-09-22T15:46:14.7211913Z     self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7212932Z   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\pandas\core\indexing.py", line 1646, in _validate_read_indexer
2020-09-22T15:46:14.7213796Z     raise KeyError(f"{not_found} not in index")
2020-09-22T15:46:14.7214363Z     ax = Index(['vertex_height', 'medium_quadratic_params.x_1',
2020-09-22T15:46:14.7214990Z        'medium_quadratic_params.x_2', 'low_quadratic_params.x_1',
2020-09-22T15:46:14.7215540Z        'low_quadratic_params.x_2'],
2020-09-22T15:46:14.7216039Z       dtype='object')
2020-09-22T15:46:14.7216430Z     axis = 1
2020-09-22T15:46:14.7216840Z     indexer = array([ 0,  3,  4,  1,  2, -1, -1], dtype=int64)
2020-09-22T15:46:14.7217460Z     key = Index(['vertex_height', 'low_quadratic_params.x_1', 'low_quadratic_params.x_2',
2020-09-22T15:46:14.7218181Z        'medium_quadratic_params.x_1', 'medium_quadratic_params.x_2',
2020-09-22T15:46:14.7218832Z        'high_quadratic_params.x_1', 'high_quadratic_params.x_2'],
2020-09-22T15:46:14.7219465Z       dtype='object')
2020-09-22T15:46:14.7219832Z     missing = 2
2020-09-22T15:46:14.7220333Z     not_found = ['high_quadratic_params.x_1', 'high_quadratic_params.x_2']
2020-09-22T15:46:14.7220882Z     raise_missing = True
2020-09-22T15:46:14.7221500Z     self = <pandas.core.indexing._LocIndexer object at 0x0000021609A396D8>
2020-09-22T15:46:14.7222280Z KeyError: "['high_quadratic_params.x_1', 'high_quadratic_params.x_2'] not in index"

From:
https://pipelines.actions.githubusercontent.com/fOuLpdRLhJegHdOkuie9qUN2ZnNM5WKTsTmQkIMYbUKwUvkp3o/_apis/pipelines/1/runs/246/signedlogcontent/14?urlExpires=2020-09-22T16%3A10%3A29.3461235Z&urlSigningMethod=HMACV1&urlSignature=EIMbhzZxH5LmY8MLhX91%2FWJ9h02Ut6Typi2flZjOW5Q%3D

Add C support

This will need some C++ wrapper functions and some build tweaks to consuming projects.

Current thought is to integrate with sqlite as an example.

Add support for alternative backend storage

MLOS should be a bit more generic in its support of backend storage for the models, optimizers, experiments, etc. (e.g. not just SqlServer).

Some other potential targets include: mlflow with files, sqlite, mysql, postgres, etc.
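A rough sketch of what a backend-agnostic storage interface could look like, with sqlite as one implementation. The `ModelStore` and `SqliteModelStore` names, the table layout, and the JSON serialization are all hypothetical and not part of the existing MLOS API.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ModelStore(ABC):
    """Hypothetical storage interface that each backend would implement."""

    @abstractmethod
    def save(self, name: str, state: Dict[str, Any]) -> None:
        """Persist the state under the given name, replacing any previous value."""

    @abstractmethod
    def load(self, name: str) -> Optional[Dict[str, Any]]:
        """Return the stored state, or None if the name is unknown."""


class SqliteModelStore(ModelStore):
    """sqlite-backed implementation; defaults to an in-memory database."""

    def __init__(self, database: str = ":memory:") -> None:
        self._connection = sqlite3.connect(database)
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def save(self, name: str, state: Dict[str, Any]) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self._connection:
            self._connection.execute(
                "INSERT OR REPLACE INTO models (name, state) VALUES (?, ?)",
                (name, json.dumps(state)),
            )

    def load(self, name: str) -> Optional[Dict[str, Any]]:
        row = self._connection.execute(
            "SELECT state FROM models WHERE name = ?", (name,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Callers would depend only on `ModelStore`, so a SqlServer, mysql, or postgres backend could be swapped in without touching optimizer code.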

Enable gcc support

Currently, to build the C++ code generation code, we rely on clang's support for MSVC attributes (e.g. to ignore duplicate definitions at link time).

There are multiple ways around this that would help enable gcc support, including reorganizing the code generation output or using some macros to add #ifdef wrappers around the attribute usages.

Consider this as icing/wishlist for now.

Add logging infrastructure

We've noticed that troubleshooting, particularly where there is cross process/language waiting or lookups involved, could be aided by some simple print statements to track the status of various operations (e.g. assembly lookup, synchronization points, etc.).

However, these statements are undesirable in a production environment, so they have been avoided thus far.

We should implement a logging mechanism to make that configurable on a case by case basis.
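As one possible shape for the Python side, here is a sketch built on the standard `logging` module. The environment variable names (`MLOS_LOG_LEVEL`, `MLOS_LOG_LEVEL_<COMPONENT>`) and the `get_logger` helper are assumptions made for illustration; the C++ and C# components would need their own equivalents.

```python
import logging
import os

# Level names accepted from the environment; anything else falls back to WARNING.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def get_logger(component: str) -> logging.Logger:
    """Return a logger whose level is configurable per component."""
    logger = logging.getLogger(f"mlos.{component}")
    # Hypothetical convention: a per-component variable overrides the global one,
    # and the default of WARNING keeps production runs quiet.
    env_name = f"MLOS_LOG_LEVEL_{component.upper()}"
    level_name = os.environ.get(env_name, os.environ.get("MLOS_LOG_LEVEL", "WARNING"))
    logger.setLevel(_LEVELS.get(level_name.upper(), logging.WARNING))
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Something like `get_logger("agent").debug("waiting on synchronization point")` would then stay silent unless `MLOS_LOG_LEVEL_AGENT=debug` is set for that run.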

Add linux support

Make sure the project builds and works easily in a Linux environment.

Add tests for IPython Notebooks

At the very least, we should make sure that these notebooks continue to execute without throwing exceptions. Ideally, we would also validate that their output is what we expected.

We should also assert that specific notebooks are checked in with outputs.
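The "checked in with outputs" assertion could be approximated without executing anything, since an .ipynb file is plain JSON. The sketch below flags code cells that have source but neither saved outputs nor an execution count; the `unexecuted_code_cells` name is hypothetical, and the rule is a heuristic that assumes the nbformat v4 cell layout.

```python
import json
from pathlib import Path
from typing import List


def unexecuted_code_cells(notebook_path: Path) -> List[int]:
    """Return indices of non-empty code cells that look like they were never run."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    unexecuted = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # In nbformat v4, "source" may be a single string or a list of lines.
        has_source = "".join(cell.get("source", [])).strip() != ""
        was_run = bool(cell.get("outputs")) or cell.get("execution_count") is not None
        if has_source and not was_run:
            unexecuted.append(index)
    return unexecuted
```

A test could walk the notebooks that are expected to carry outputs and assert that this function returns an empty list for each of them.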

Add .editorconfigs

We can consider adding some .editorconfig style entries to help editors follow the style guidelines, in addition to build-time lint checking.

Publish C# API documentation from comments

We already do this for Python using Sphinx.
It's already possible to output XML from the msbuild .csproj files, so we should be able to either output HTML directly or use another tool to help with that.

Add OSS Examples

We need some OSS examples to use both for initial experience and documentation purposes as well as CI/CD test integrations.
