
MLflow: A Machine Learning Lifecycle Platform

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g., in notebooks, standalone applications, or the cloud). MLflow's current components are:

  • MLflow Tracking: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI.
  • MLflow Projects: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others.
  • MLflow Models: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker.
  • MLflow Model Registry: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.


Packages

  • PyPI: mlflow, mlflow-skinny
  • conda-forge: mlflow, mlflow-skinny
  • CRAN: mlflow
  • Maven Central: mlflow-client, mlflow-parent, mlflow-scoring, mlflow-spark


Installing

Install MLflow from PyPI via pip install mlflow

MLflow requires conda to be on the PATH for the projects feature.

Nightly snapshots of MLflow master are also available here.

Install a lower-dependency subset of MLflow from PyPI via pip install mlflow-skinny. Extra dependencies can be added per desired scenario. For example, pip install mlflow-skinny pandas numpy enables mlflow.pyfunc.log_model support.
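
For instance, once pandas and numpy are present, a custom pyfunc model can be logged; a minimal sketch (the AddN model and all names here are illustrative, not from the MLflow docs):

import mlflow
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    # A trivial custom model that adds a constant to its input.
    def predict(self, context, model_input):
        return model_input + 5

with mlflow.start_run():
    mlflow.pyfunc.log_model("model", python_model=AddN())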

Documentation

Official documentation for MLflow can be found at https://mlflow.org/docs/latest/index.html.

Roadmap

The current MLflow Roadmap is available at https://github.com/mlflow/mlflow/milestone/3. We are seeking contributions to all of our roadmap items with the help wanted label. Please see the Contributing section for more information.

Community

For help or questions about MLflow usage (e.g. "how do I do X?") see the docs or Stack Overflow.

To report a bug, file a documentation issue, or submit a feature request, please open a GitHub issue.

For release announcements and other discussions, please subscribe to our mailing list ([email protected]) or join us on Slack.

Running a Sample App With the Tracking API

The programs in the examples directory use the MLflow Tracking API. For instance, run:

python examples/quickstart/mlflow_tracking.py

This program uses the MLflow Tracking API to log tracking data in ./mlruns, which can then be viewed with the Tracking UI.
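
A sketch of what the core of such a script looks like, using the public tracking API (parameter and metric names here are arbitrary):

import mlflow

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)     # record a hyperparameter
    mlflow.log_metric("rmse", 0.82)    # metrics can be logged repeatedly to build a history
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    mlflow.log_artifact("output.txt")  # attach a file to the run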

Launching the Tracking UI

The MLflow Tracking UI will show runs logged in ./mlruns at http://localhost:5000. Start it with:

mlflow ui

Note: Running mlflow ui from within a clone of MLflow is not recommended - doing so will run the dev UI from source. We recommend running the UI from a different working directory, specifying a backend store via the --backend-store-uri option. Alternatively, see instructions for running the dev UI in the contributor guide.
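
For example, assuming an existing store of runs (the path is illustrative):

# serve runs from an existing backend store while working in a clean directory
mlflow ui --backend-store-uri /path/to/mlruns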

Running a Project from a URI

The mlflow run command lets you run a project packaged with an MLproject file from a local path or a Git URI:

mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4

mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.4

See examples/sklearn_elasticnet_wine for a sample project with an MLproject file.
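
An MLproject file is a small YAML file; an abridged sketch along the lines of that example (the entries shown are illustrative):

name: sklearn_elasticnet_wine
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"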

Saving and Serving Models

To illustrate managing models, the mlflow.sklearn package can log scikit-learn models as MLflow artifacts and then load them again for serving. There is an example training application in examples/sklearn_logistic_regression/train.py that you can run as follows:

$ python examples/sklearn_logistic_regression/train.py
Score: 0.666
Model saved in run <run-id>

$ mlflow models serve --model-uri runs:/<run-id>/model

$ curl -d '{"dataframe_split": {"columns":[0],"index":[0,1],"data":[[1],[-1]]}}' -H 'Content-Type: application/json'  localhost:5000/invocations

Note: If using MLflow skinny (pip install mlflow-skinny) for model serving, additional required dependencies (namely, flask) will need to be installed for the MLflow server to function.

Official MLflow Docker Image

The official MLflow Docker image is available on GitHub Container Registry at https://ghcr.io/mlflow/mlflow.

export CR_PAT=YOUR_TOKEN
echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
# Pull the latest version
docker pull ghcr.io/mlflow/mlflow
# Pull 2.2.1
docker pull ghcr.io/mlflow/mlflow:v2.2.1

Contributing

We happily welcome contributions to MLflow. We are also seeking contributions to items on the MLflow Roadmap. Please see our contribution guide to learn more about contributing to MLflow.

Core Members

MLflow is currently maintained by a group of core members, with significant contributions from hundreds of exceptionally talented community members.

Contributors

aarondav, andrewmchen, ankit-db, annzhang-db, apurva-koti, arpitjasa-db, b-step62, benwilson2, chenmoneygithub, daniellok-db, dbczumar, dmatrix, gabrielfu, harupy, jdlesage, jerrylian-db, juntai-zheng, liangz1, mateiz, michael-berk, mlflow-automation, mparkhe, prithvikannan, serena-ruan, shrinath-suresh, smurching, sueann, sunishsheth2009, tomasatdatabricks, weichenxu123


mlflow's Issues

Sagemaker usage

Describe the problem

FYI, I already use SageMaker to train and deploy my custom ML models.
It's not clear to me how to deploy my model to SageMaker. More specifically, how can I specify the predict function that will be called by the controller?
Additionally, it would be very useful to train my ML models on SageMaker using mlflow commands.

Proposal

I'd like to integrate Sagify (https://github.com/Kenza-AI/sagify, one of my open source projects) into mlflow in order to train and deploy ML models on SageMaker. Please check the workflow that I'd like to add to mlflow:

Proposal for Training and Deploying on SageMaker:

  1. mlflow sagemaker init -d src: This will create all boilerplate code under the directory src, where all my ML code lives. My ML code under src already uses mlflow.
  2. Then, I need to implement two functions, train(...) and predict(input_json) (see the sketch after this list). The train(...) function should call my ML training logic that already lives under src, and I need to implement my transformer from JSON to an ML-friendly data type in the predict(input_json) function.
  3. mlflow sagemaker build: It will build a Docker image that contains all the code under src.
  4. mlflow sagemaker push: It will push the built Docker image to ECR.
  5. mlflow sagemaker local-train: It will run the training logic that lives in the Docker image. This can be used for testing before trying on SageMaker.
  6. mlflow sagemaker local-deploy: It will run the Docker image in deploy mode, where I can test the REST endpoint that calls the trained model.
  7. mlflow sagemaker train: It will run the Docker image in train mode on SageMaker.
  8. mlflow sagemaker deploy: It will run the Docker image in deploy mode on SageMaker.
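
A minimal sketch of the two hooks step 2 asks the user to implement (the bodies are hypothetical placeholders; this is part of the proposal, not an existing mlflow API):

def train():
    # Call the ML training logic that already lives under src/.
    ...

def predict(input_json):
    # Transform the incoming JSON into an ML-friendly data type,
    # score it with the trained model, and return the result.
    ...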

Proposal Only for Deploying on SageMaker:

  1. mlflow sagemaker init -d src: This will create all boilerplate code under the directory src, where all my ML code lives. My ML code under src already uses mlflow.
  2. Then, I need to implement the function predict(input_json). I need to implement my transformer from JSON to an ML-friendly data type in the predict(input_json) function.
  3. mlflow sagemaker build: It will build a Docker image that contains all the code under src.
  4. mlflow sagemaker push: It will push the built Docker image to ECR.
  5. mlflow sagemaker local-deploy --model-path=<model_path> --run-id=<run_id>: It will run the Docker image in deploy mode, where I can test the REST endpoint that calls the trained model.
  6. mlflow sagemaker train: It will run the Docker image in train mode on SageMaker.
  7. mlflow sagemaker deploy --model-path=<model_path> --run-id=<run_id>: It will run the Docker image in deploy mode on SageMaker.

Please let me know your thoughts. I'm thinking of proceeding with the Proposal Only for Deploying on SageMaker.

Pyfunc and AzureML API docs missing

Currently running make inside of the docs dir does not generate API docs for the mlflow.pyfunc and mlflow.azureml modules.

Also, they are not currently hosted at mlflow.org.

I believe this is because docs/source/python_api/mlflow.pyfunc and docs/source/python_api/mlflow.azureml are both missing the ".rst" suffix.

When I renamed those files by appending ".rst", the API docs were built automatically as part of the docs website via the make command inside docs.

Note that even after this fix, as with SageMaker's API docs, the AzureML docs that get generated are just an empty page.

Default local tracking URI not working on Windows? "Tracking URI must be a local filesystem URI of the form 'file:///...' or a remote URI"

Was just testing on Windows; I had to force a tracking URI for mlflow to work.

The default in this case is:

mlflow.get_tracking_uri()
'd:\Code\mlflow\mlruns'

This throws:
Exception: Tracking URI must be a local filesystem URI of the form 'file:///...' or a remote URI of the form 'http://...'. Please update the tracking URI via mlflow.set_tracking_uri

Workaround:

mlflow.set_tracking_uri("file://d:/code/mlflow/mlruns")
mlflow.log_param("test",5)

This seemed to work; the mlruns folder now contains the expected run info.

Thanks! Solid work, congrats, very promising.
P.S. The license would benefit from clarification on API call usage, though.

Tracking datasets for each run

Not a bug, more of a feature request: it would be awesome if we could track the dataset (features) for each run/execution.

Unable to see mlflow ui at http://127.0.0.1:5000 when mlflow is running in docker container

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Docker container running in macOS Sierra 10.12.3
  • MLflow installed from (source or binary): Dockerfile
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.2.1
  • Python version: Python 3.6.5 :: Anaconda, Inc.
  • npm version (if running the dev UI): 5.3.0
  • Exact command to reproduce:

mlflow ui

Describe the problem

We did a git clone of the mlflow repo and built a Docker image from the Dockerfile. We spun up a Docker container from that image; the container was up and running, with the port mapped to the default 5000. We tried opening "http://127.0.0.1:5000/" in a browser. Unfortunately, we still could not get the dashboard.

Note: I am conversant with Docker, Kubernetes, etc. Let me know how I can help.

Could not find valid Experiment ID when executing `mlflow run`

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Sierra 10.12.6
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): mlflow, version 0.2.1
  • Python version: Python 3.6.6 :: Anaconda, Inc.
  • npm version (if running the dev UI): N.A
  • Exact command to reproduce:
(rr-sample-reg) ip-10-10-180-57:rr-sample-regression arinto$ mlflow experiments create life_sat_lr
Created experiment 'life_sat_lr' with id 1
(rr-sample-reg) ip-10-10-180-57:rr-sample-regression arinto$ export MLFLOW_EXPERIMENT_ID=1
(rr-sample-reg) ip-10-10-180-57:rr-sample-regression arinto$ mlflow run /Users/arinto/repository/github/rr-sample-regression -e main -P lr_feature="Employment rate as pct"

Describe the problem

I tried to organize runs into experiments following this documentation, but mlflow run throws an exception. I'm not using a tracking server at the moment.

Source code / logs

Exception stack from mlflow run

(rr-sample-reg) ip-10-10-180-57:rr-sample-regression arinto$ mlflow run /Users/arinto/repository/github/rr-sample-regression -e main -P lr_feature="Employment rate as pct"
=== Fetching project from /Users/arinto/repository/github/rr-sample-regression ===
=== Work directory for this run: /Users/arinto/repository/github/rr-sample-regression ===
=== Created directory /var/folders/fd/h7tg23rd2p3cx7mnp7j47_gw0000gn/T/tmpyzpx3_w9 for downloading remote URIs passed to arguments of type 'path' ===
Traceback (most recent call last):
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/cli.py", line 108, in run
    storage_dir=storage_dir)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/projects.py", line 285, in run
    storage_dir=storage_dir, git_username=git_username, git_password=git_password)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/projects.py", line 248, in _run_local
    _run_project(project, entry_point, work_dir, parameters, use_conda, storage_dir, experiment_id)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/projects.py", line 415, in _run_project
    source_type=SourceType.PROJECT)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/tracking/__init__.py", line 257, in start_run
    entry_point_name, source_type)
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/tracking/__init__.py", line 222, in _do_start_run
    source_version=(source_version or _get_source_version()), tags=[])
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/store/file_store.py", line 150, in create_run
    if self.get_experiment(experiment_id) is None:
  File "/Users/arinto/anaconda3/envs/rr-sample-reg/lib/python3.6/site-packages/mlflow/store/file_store.py", line 116, in get_experiment
    raise Exception("Could not find experiment with ID %s" % experiment_id)
Exception: Could not find experiment with ID 1

Feature request: allow user to control order of parameters in tracking UI

Describe the problem

There are often natural ways to order parameters, e.g. to group related quantities. Currently the UI orders them alphabetically. It would be nice to be able to control the ordering, e.g. to match the order in which the logging happens or by providing a list in the desired order.

Python 3.6 import mlflow not working.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • MLflow installed from (source or binary): Binary
  • MLflow version (run mlflow --version): 0.2.1
  • Python version: 3.6
  • npm version (if running the dev UI):
  • Exact command to reproduce: import mlflow

Describe the problem

Receiving the following traceback when trying to import mlflow into my project. I am using a Python 3.6 virtualenv within PyCharm. Steps to reproduce:

  1. pip install mlflow
  2. from within a module: import mlflow

protobuf version: 3.6

Results are the same if mlflow is installed from within the PyCharm interpreter manager.

Source code / logs


Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/tom/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/181.5087.37/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 19, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/tom/Documents/uri/reinforcement-learning-compilers/.venv/lib/python3.6/site-packages/mlflow/__init__.py", line 4, in <module>
    import mlflow.projects as projects  # noqa
  File "/home/tom/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/181.5087.37/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 19, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/tom/Documents/uri/reinforcement-learning-compilers/.venv/lib/python3.6/site-packages/mlflow/projects.py", line 17, in <module>
    from mlflow.entities.param import Param
  File "/home/tom/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/181.5087.37/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 19, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/tom/Documents/uri/reinforcement-learning-compilers/.venv/lib/python3.6/site-packages/mlflow/entities/param.py", line 2, in <module>
    from mlflow.protos.service_pb2 import Param as ProtoParam
  File "/home/tom/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/181.5087.37/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 19, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/tom/Documents/uri/reinforcement-learning-compilers/.venv/lib/python3.6/site-packages/mlflow/protos/service_pb2.py", line 18, in <module>
    from .scalapb import scalapb_pb2 as scalapb_dot_scalapb__pb2
  File "/home/tom/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/181.5087.37/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 19, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/tom/Documents/uri/reinforcement-learning-compilers/.venv/lib/python3.6/site-packages/mlflow/protos/scalapb/scalapb_pb2.py", line 25, in <module>
    dependencies=[google_dot_protobuf_dot_descriptor__pb2.DESCRIPTOR,])
TypeError: __new__() got an unexpected keyword argument 'serialized_options'

Several questions about Tracking UI

First of all, it seems like "Date" is local time. If that is the case, it may cause confusion for teams spread across multiple time zones.

For the user field, are there any plans to allow it to be provided as a parameter?

For storage, it seems to use local files. Is there a roadmap or plan to support a database as the storage medium?

Cannot connect to ML Flow file for logging

Hi,

I am on Windows, with a Python 3.5 Anaconda installation.

I wanted to use MLflow to log experiments.

When I try to use this script:

import os
from mlflow import log_metric, log_param, log_artifact
import mlflow as m

if __name__ == "__main__":
    # Set Tracking URI
    m.set_tracking_uri("file:///C\\Users/hp/Desktop/All.txt")

    # Log a parameter (key-value pair)
    log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", 1)
    log_metric("foo", 2)
    log_metric("foo", 3)

    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    log_artifact("output.txt")

I get FileNotFoundError: [WinError 3] The system cannot find the path specified: '/C\Users\x0hp/Desktop/All.txt'

How do I use it?

I keep getting file errors like:

Exception: Tracking URI must be a local filesystem URI of the form 'file:///...' or a remote URI of the form 'http://...'. Please update the tracking URI via mlflow.set_tracking_uri

Or

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '/C:/Users/hp/Desktop/All.txt'

mlflow ui - Compare Selected Parameters Issue

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 4.14.47-1-MANJARO
  • MLflow installed from (source or binary): binary
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: Python 3.6.5 :: Anaconda custom (64-bit)
  • npm version (if running the dev UI):
  • Exact command to reproduce:

Describe the problem

I ran example/tutorial/train.py as described in the tutorials multiple times with different combinations of parameters. Then I launched mlflow ui and was playing around with it. When I select multiple runs and click on Compare Selected, I get the same parameters across the runs, even though they are different on the previous screen.

Unable to see mlflow ui at http://127.0.0.1:5000 on macOS Sierra version 10.12.3

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Sierra version 10.12.3
  • MLflow installed from (source or binary): binary (I did "pip3 install mlflow")
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.2.1
  • Python version: Python 3.6.5 :: Anaconda, Inc.
  • npm version (if running the dev UI): 5.3.0
  • Exact command to reproduce:

"mlflow ui"

Describe the problem

Here are the terminal console logs:

$ mlflow ui
[2018-07-01 23:35:56 -0700] [3302] [INFO] Starting gunicorn 19.8.1
[2018-07-01 23:35:56 -0700] [3302] [INFO] Listening at: http://127.0.0.1:5000 (3302)
[2018-07-01 23:35:56 -0700] [3302] [INFO] Using worker: sync
[2018-07-01 23:35:56 -0700] [3305] [INFO] Booting worker with pid: 3305
[2018-07-01 23:46:16 -0700] [3302] [CRITICAL] WORKER TIMEOUT (pid:3305)
[2018-07-01 23:46:16 -0700] [3305] [INFO] Worker exiting (pid: 3305)
[2018-07-01 23:46:16 -0700] [3402] [INFO] Booting worker with pid: 3402

When I opened a browser and tried reaching "http://127.0.0.1:5000/", I didn't see the dashboard of all my runs. I only saw "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again."

It took some time for the [CRITICAL] WORKER TIMEOUT error to appear in the logs. My hunch is that the module serving the http://127.0.0.1:5000 request timed out because it was not up; hence the error.


Running in production

How does one go about running a production server? The startup message when launching the UI warns not to use mlflow ui in production:

 * Serving Flask app "mlflow.server" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Could you provide an example of best practices / what you would recommend? I'll be running an NGINX proxy in front of mlflow by default, but that's just meant to handle authentication.
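
For reference, a bare-bones sketch of such an NGINX reverse proxy, assuming MLflow is listening on 127.0.0.1:5000 (authentication directives omitted):

server {
    listen 80;
    location / {
        # forward everything to the local MLflow server
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
    }
}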

Compiling service.proto

Hi,
Currently I am trying to implement functionality for getting an experiment by name on the RestStore object. I figured that some work needed to be done in the protobuf files. However, when I try to compile service.proto I get the following error:

scalapb/scalapb.proto: File not found.
databricks.proto: File not found.
service.proto: Import "scalapb/scalapb.proto" was not found or had errors.
service.proto: Import "databricks.proto" was not found or had errors.

Is there a specific reason why those 2 files are missing?

Thanks in advance :)

databricks.proto error when importing MLflow after tensorflow

When I try to import MLflow in my project, I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-d5dc4544d486> in <module>()
      9 from keras.optimizers import SGD
     10 from keras.preprocessing.image import ImageDataGenerator
---> 11 import mlflow
     12 
     13 import numpy as np

/usr/local/lib/python3.5/dist-packages/mlflow/__init__.py in <module>()
      6 
      7 # pylint: disable=wrong-import-position
----> 8 import mlflow.projects as projects # noqa
      9 import mlflow.tracking as tracking  # noqa
     10 

/usr/local/lib/python3.5/dist-packages/mlflow/projects.py in <module>()
     16 
     17 from mlflow.entities.source_type import SourceType
---> 18 from mlflow.entities.param import Param
     19 from mlflow import data
     20 import mlflow.tracking as tracking

/usr/local/lib/python3.5/dist-packages/mlflow/entities/param.py in <module>()
      1 from mlflow.entities._mlflow_object import _MLflowObject
----> 2 from mlflow.protos.service_pb2 import Param as ProtoParam
      3 
      4 
      5 class Param(_MLflowObject):

/usr/local/lib/python3.5/dist-packages/mlflow/protos/service_pb2.py in <module>()
     18 
     19 from mlflow.protos.scalapb import scalapb_pb2 as scalapb_dot_scalapb__pb2
---> 20 import mlflow.protos.databricks_pb2 as databricks__pb2
     21 
     22 

/usr/local/lib/python3.5/dist-packages/mlflow/protos/databricks_pb2.py in <module>()
     25   serialized_pb=_b('\n\x10\x64\x61tabricks.proto\x1a google/protobuf/descriptor.proto\x1a\x15scalapb/scalapb.proto\"\x9a\x01\n\x14\x44\x61tabricksRpcOptions\x12 \n\tendpoints\x18\x01 \x03(\x0b\x32\r.HttpEndpoint\x12\x1f\n\nvisibility\x18\x02 \x01(\x0e\x32\x0b.Visibility\x12\x1f\n\x0b\x65rror_codes\x18\x03 \x03(\x0e\x32\n.ErrorCode\x12\x1e\n\nrate_limit\x18\x04 \x01(\x0b\x32\n.RateLimit\"N\n\x0cHttpEndpoint\x12\x14\n\x06method\x18\x01 \x01(\t:\x04POST\x12\x0c\n\x04path\x18\x02 \x01(\t\x12\x1a\n\x05since\x18\x03 \x01(\x0b\x32\x0b.ApiVersion\"*\n\nApiVersion\x12\r\n\x05major\x18\x01 \x01(\x05\x12\r\n\x05minor\x18\x02 \x01(\x05\"@\n\tRateLimit\x12\x11\n\tmax_burst\x18\x01 \x01(\x03\x12 \n\x18max_sustained_per_second\x18\x02 \x01(\x03\"\x8c\x01\n\x15\x44ocumentationMetadata\x12\x11\n\tdocstring\x18\x01 \x01(\t\x12\x10\n\x08lead_doc\x18\x02 \x01(\t\x12\x1f\n\nvisibility\x18\x03 \x01(\x0e\x32\x0b.Visibility\x12\x1b\n\x13original_proto_path\x18\x04 \x03(\t\x12\x10\n\x08position\x18\x05 \x01(\x05\"g\n\x1f\x44\x61tabricksServiceExceptionProto\x12\x1e\n\nerror_code\x18\x01 \x01(\x0e\x32\n.ErrorCode\x12\x0f\n\x07message\x18\x02 \x01(\t\x12\x13\n\x0bstack_trace\x18\x03 \x01(\t*?\n\nVisibility\x12\n\n\x06PUBLIC\x10\x01\x12\x0c\n\x08INTERNAL\x10\x02\x12\x17\n\x13PUBLIC_UNDOCUMENTED\x10\x03*\xf6\x04\n\tErrorCode\x12\x12\n\x0eINTERNAL_ERROR\x10\x01\x12\x1b\n\x17TEMPORARILY_UNAVAILABLE\x10\x02\x12\x0c\n\x08IO_ERROR\x10\x03\x12\x0f\n\x0b\x42\x41\x44_REQUEST\x10\x04\x12\x1c\n\x17INVALID_PARAMETER_VALUE\x10\xe8\x07\x12\x17\n\x12\x45NDPOINT_NOT_FOUND\x10\xe9\x07\x12\x16\n\x11MALFORMED_REQUEST\x10\xea\x07\x12\x12\n\rINVALID_STATE\x10\xeb\x07\x12\x16\n\x11PERMISSION_DENIED\x10\xec\x07\x12\x15\n\x10\x46\x45\x41TURE_DISABLED\x10\xed\x07\x12\x1a\n\x15\x43USTOMER_UNAUTHORIZED\x10\xee\x07\x12\x1b\n\x16REQUEST_LIMIT_EXCEEDED\x10\xef\x07\x12\x1d\n\x18INVALID_STATE_TRANSITION\x10\xd1\x0f\x12\x1b\n\x16\x43OULD_NOT_ACQUIRE_LOCK\x10\xd2\x0f\x12\x1c\n\x17RESOURCE_ALREADY_EXISTS\x10\xb9\x17\x12\x1c\n\x17RESOURCE_DOES_NOT_EXIST\x10\xba\x17\x12\x13\n\x0eQUOTA_EXCEEDED\x10\xa1\x1f\x12\x1c\n\x17MAX_BLOCK_SIZE_EXCEEDED\x10\xa2\x1f\x12\x1b\n\x16MAX_READ_SIZE_EXCEEDED\x10\xa3\x1f\x12\x13\n\x0e\x44RY_RUN_FAILED\x10\x89\'\x12\x1c\n\x17RESOURCE_LIMIT_EXCEEDED\x10\x8a\'\x12\x18\n\x13\x44IRECTORY_NOT_EMPTY\x10\xf1.\x12\x18\n\x13\x44IRECTORY_PROTECTED\x10\xf2.\x12\x1f\n\x1aMAX_NOTEBOOK_SIZE_EXCEEDED\x10\xf3.:@\n\nvisibility\x12\x1d.google.protobuf.FieldOptions\x18\xd0\x86\x03 \x01(\x0e\x32\x0b.Visibility::\n\x11validate_required\x12\x1d.google.protobuf.FieldOptions\x18\xd1\x86\x03 \x01(\x08:4\n\x0bjson_inline\x12\x1d.google.protobuf.FieldOptions\x18\xd2\x86\x03 \x01(\x08:1\n\x08json_map\x12\x1d.google.protobuf.FieldOptions\x18\xd3\x86\x03 \x01(\x08:J\n\tfield_doc\x12\x1d.google.protobuf.FieldOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadata:D\n\x03rpc\x12\x1e.google.protobuf.MethodOptions\x18\xd0\x86\x03 \x01(\x0b\x32\x15.DatabricksRpcOptions:L\n\nmethod_doc\x12\x1e.google.protobuf.MethodOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadata:N\n\x0bmessage_doc\x12\x1f.google.protobuf.MessageOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadata:N\n\x0bservice_doc\x12\x1f.google.protobuf.ServiceOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadata:H\n\x08\x65num_doc\x12\x1c.google.protobuf.EnumOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadata:O\n\x15\x65num_value_visibility\x12!.google.protobuf.EnumValueOptions\x18\xd0\x86\x03 
\x01(\x0e\x32\x0b.Visibility:S\n\x0e\x65num_value_doc\x12!.google.protobuf.EnumValueOptions\x18\xd4\x86\x03 \x03(\x0b\x32\x16.DocumentationMetadataB*\n#com.databricks.api.proto.databricks\xe2?\x02\x10\x01')
     26   ,
---> 27   dependencies=[google_dot_protobuf_dot_descriptor__pb2.DESCRIPTOR,scalapb_dot_scalapb__pb2.DESCRIPTOR,])
     28 
     29 _VISIBILITY = _descriptor.EnumDescriptor(

/usr/local/lib/python3.5/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_pb, dependencies, public_dependencies, syntax, pool)
    827         # TODO(amauryfa): use the pool passed as argument. This will work only
    828         # for C++-implemented DescriptorPools.
--> 829         return _message.default_pool.AddSerializedFile(serialized_pb)
    830       else:
    831         return super(FileDescriptor, cls).__new__(cls)

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "databricks.proto":
  databricks.proto: Import "scalapb/scalapb.proto" has not been loaded.

Any idea where the issue may come from? Thank you!

Compare to other ML e2e platforms

First of all congratulations for releasing all this hard work to the public!

I went through the examples to see if I could figure out how exactly this project differentiates itself from others, but only saw some minor technical differences.

Could you provide a summary of why you decided to create a completely new ML pipeline instead of joining some of the other ongoing efforts?

Metrics chart wrongly connects points on X axis

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • MLflow installed from (source or binary): source
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: 3.6.5
  • npm version (if running the dev UI): 6.0.1
  • Exact command to reproduce: NA

Describe the problem

In the metrics graph view of the UI, the chart connects the data points in the wrong order.

The data points are logged at the end of an epoch using this Keras callback:

from keras.callbacks import Callback
from mlflow import log_metric

class LossHistory(Callback):
    def on_epoch_end(self, epoch, logs=None):
        log_metric("loss", logs.get('loss'))
        # in reality more metrics here

logs.get('loss') just returns a number. The lines in the chart should connect X=0 to X=1 to X=2 and so on; why they are currently connected in this strange way, I have no clue.

The mlflow command can't set the host

If I run mlflow ui on my server, it binds to 127.0.0.1, so I can't access it from my computer. Should mlflow add a --host setting, like Flask and others have?
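
(For reference, later MLflow releases added exactly this; a sketch of the usage as it exists today, worth verifying against your installed version:)

# bind to all interfaces so the server is reachable from other machines
mlflow server --host 0.0.0.0 --port 5000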

Exceptions when using "mlflow run"

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.12.1
  • MLflow installed from (source or binary): pip install
  • MLflow version (run mlflow --version): 0.1.0
  • Python version: 2.7.13

Describe the problem

Got exceptions when I ran "mlflow run working_directory_name".

Source code / logs

The command I used was "mlflow run ." to run the whole project as an MLflow Project; I am not sure if this is a bug. The log is as follows:

=== Fetching project from . ===
=== Work directory for this run: . ===
=== Created directory /var/folders/p_/tdkm1f5159vdght67bhbc8rh0000gn/T/tmpHes0sB for downloading remote URIs passed to arguments of type 'path' ===
Traceback (most recent call last):
  File "/usr/local/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/mlflow/cli.py", line 117, in run
    use_temp_cwd=new_dir, storage_dir=_encode(storage_dir))
  File "/usr/local/lib/python2.7/site-packages/mlflow/projects.py", line 284, in run
    storage_dir=storage_dir)
  File "/usr/local/lib/python2.7/site-packages/mlflow/projects.py", line 247, in _run_local
    _run_project(project, entry_point, work_dir, parameters, use_conda, storage_dir, experiment_id)
  File "/usr/local/lib/python2.7/site-packages/mlflow/projects.py", line 367, in _run_project
    (exit_code, _, stderr) = process.exec_cmd(["conda", "--help"], throw_on_error=False)
  File "/usr/local/lib/python2.7/site-packages/mlflow/utils/process.py", line 38, in exec_cmd
    cmd, env=cmd_env, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=cwd, **kwargs)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

When comparing results, parameters are incorrectly displayed

I executed the example code example/tutorial/train.py three times, with three different inputs for alpha and l1_ratio.
When I select all three runs and compare them, the "Parameters" section shows the values used for the first run, repeated across all three columns.

Input:

Run  alpha  l1_ratio
1    2      2
2    5      1
3    3      3

Windows displays the error: Oops! Something went wrong. If this error persists, please report an issue to our Github.

System information

  • OS Platform and Distribution :Windows 10
  • MLflow installed from (source or binary): pip installed in terminal (pip install mlflow)
  • MLflow version: alpha
  • Python version: 3.6.5

Describe the problem

Whenever I try to view the MLmodel and model.pkl artifacts in the UI, "Oops! Something went wrong. If this error persists, please report an issue to our Github" shows up on the screen, along with the Niagara picture.

Source code / logs

location where the resources reside:
C:\Users\myname\mlruns\0\3419ce0a40154cd0a378fd1b5d69d241\artifacts\model

full path of model as visible in UI -
/Users/myname/mlruns\0\3419ce0a40154cd0a378fd1b5d69d241\artifacts/model

How to start a mlflow remote server?

System information

  • python 2.7.5
  • MLflow installed from pip
  • MLflow version 0.1.0

Describe the problem

How do you start a remote mlflow server? By default, "mlflow ui" starts an mlflow server on 127.0.0.1:5000, which is hard-coded in the mlflow server code. I cannot access the server from outside. The examples and documentation only show the server running on localhost. How can I configure the mlflow server so that I can access it remotely?

Python version should be enforced in python MLprojects

When using the mlflow tracking API, I may use a conda environment to enforce reproducibility. One potentially non-obvious dependency is the Python version I use (in particular, 2 vs 3), especially pertinent since the default pickling version changed such that pickled objects in python3 are not readable by python2.

We should strive to help users remember to specify their python version, potentially by printing a warning if they provide a conda env that specifies no python version. The only problem with this particular proposal is if we later leverage the conda environment for other languages (e.g., R), although if we use reticulate, then we may anyway need Python.

As an example, not remembering to include the Python version in our tutorial example made it difficult for @tomasatdatabricks and me to collaborate, as he had miniconda3 installed and I had miniconda2 installed, which only showed up at model deploy time.
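
One mitigation available today is simply pinning the interpreter in the project's conda environment file; a sketch (the package list is illustrative):

name: example-env
dependencies:
  - python=3.6        # pin the interpreter to avoid py2/py3 pickle mismatches
  - scikit-learn
  - pip:
    - mlflow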

Run quickstart example failed

When running python example\quickstart\test.py, it reports this error:

Exception: Tracking URI must be a local filesystem URI of the form 'file:///...' or a remote URI of the form 'http://...'. Please update the tracking URI via mlflow.set_tracking_uri

I haven't found documentation about updating the URI setting.

Feature request: Tracking server uri prefix for proxy

I want a feature similar to sacredboard's "sub-url":
I want to use the mlflow tracking server with JupyterHub. JupyterHub has a "Services" option, which handles proxying services, but it adds a prefix to the URL of the requests ("/services/<service_name>").
It could be something like:
mlflow server ... --sub-url "/services/mlflow"
That would serve everything at http://localhost:5000/services/mlflow/...

To do this, sacredboard uses ideas from here: http://blog.macuyiko.com/post/2016/fixing-flask-url_for-when-behind-mod_proxy.html
We could use the same approach.
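
The blog post's approach boils down to a small WSGI wrapper; a hypothetical sketch of the idea (not an existing mlflow option):

class ReverseProxied(object):
    """Tell the wrapped WSGI app that it is mounted under a URL prefix."""
    def __init__(self, app, prefix="/services/mlflow"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        # rewrite the request so url_for() generates prefixed links
        environ["SCRIPT_NAME"] = self.prefix
        return self.app(environ, start_response)

# e.g., app.wsgi_app = ReverseProxied(app.wsgi_app)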

Logging metrics per epoch

System information

N/A

Describe the problem

As it stands, mlflow doesn't seem to be capable of logging metrics per epoch during training (e.g. validation metrics during training of big DL models). Are there plans to add support for this, and for graphing those metrics (to produce learning curves)?

Many thanks for your excellent project!

EDIT: Looking at https://mlflow.org/docs/latest/tracking.html, the document states:

Key-value metrics where the value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow will record and let you visualize the metric’s full history.

Looking through the Python API, I don't see how this is possible, as there is no method to update the current epoch.
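
For what it's worth, logging the same key once per epoch does record a full history, and later MLflow releases added an explicit step argument; a sketch (train_one_epoch is a hypothetical stand-in for the real training loop):

import mlflow

with mlflow.start_run():
    for epoch in range(10):
        val_loss = train_one_epoch()  # hypothetical: returns this epoch's validation loss
        mlflow.log_metric("val_loss", val_loss, step=epoch)  # step= requires a newer MLflow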

Error while using UI for tracking results

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.13.3
  • MLflow installed from (source or binary): using pip
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: 3.6
  • npm version (if running the dev UI):
  • Exact command to reproduce:

Describe the problem

Hi, I receive the following error when opening the tracking UI:

Source code / logs

The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

Python 3.5 example/tutorial JSON encoding error

When running mlflow run example/tutorial -P alpha=0.4 with python=3.5.3, there is a JSON encoding issue:

manuel@manuel mlflow (master) $ mlflow run example/tutorial -P alpha=0.4
=== Fetching project from example/tutorial ===
=== Work directory for this run: example/tutorial ===
=== Created directory /tmp/tmp8xm4tafv for downloading remote URIs passed to arguments of type 'path' ===
Traceback (most recent call last):
  File "/home/manuel/anaconda3/envs/test/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/mlflow/cli.py", line 117, in run
    use_temp_cwd=new_dir, storage_dir=_encode(storage_dir))
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/mlflow/projects.py", line 284, in run
    storage_dir=storage_dir)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/mlflow/projects.py", line 247, in _run_local
    _run_project(project, entry_point, work_dir, parameters, use_conda, storage_dir, experiment_id)
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/site-packages/mlflow/projects.py", line 374, in _run_project
    env_names = [os.path.basename(env) for env in json.loads(stdout)['envs']]
  File "/home/manuel/anaconda3/envs/test/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

Bug: display of artifacts in UI doesn't refresh properly

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes, but I don't think that's relevant
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.5
  • MLflow installed from (source or binary): binary
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: 3.6.4
  • npm version (if running the dev UI):
  • Exact command to reproduce:

Describe the problem

Bug: the first artifact I click on for a specific run in the UI displays properly, but then that artifact continues to be displayed when I click on other artifacts, until I collapse the "Artifacts" section and re-expand it.

bug in docs about Local deploy mode

Under the local deploy mode instructions of https://mlflow.org/docs/latest/models.html it says:

The Python function flavor can be deployed locally via the mlflow.azureml module as

  • serve deploys the model as a local REST API server
  • predict uses the model to generate predictions for a local CSV file.

But I think it should say mlflow.pyfunc, right? I'm happy to make the PR if I'm right.

mlflow run local project will fetch project from git

Following the example to run a project with
mlflow run example/tutorial -P alpha=0.4

Got the following error:

cmdline: git remote add origin d:/Workspace/mlflow/example/tutorial
stderr: 'fatal: remote origin already exists.'

How do I execute the local project?

mlflow run should ensure that mlflow is available inside the resultant conda environment

Any project which I run using mlflow run should presumably have mlflow installed. Currently, we might recommend that people establish a pip dependency in their conda environment to include mlflow. This comes with a few pitfalls:

  • There is no connection between the mlflow run version and the version of mlflow in the conda environment. This discrepancy might cause certain API incompatibility (for example if the wrapper script starts or does not start a run, as discussed in #82). Additionally, I may update mlflow but forget to update the conda environment files of all projects.
  • Our mlflow caching mechanism does not work if I have a pip dependency without a version. If the version is updated on pypi, our cache is invalid, but I would have to manually delete the mlflow-$sha conda environment in order to invalidate it.

We want users to be able to easily upgrade mlflow and not run into compatibility problems, so I would recommend that we automatically inject the mlflow version of the outer script into the conda environment, if it's not present (or if it's present without a version).

Clean API for getting experiment name

I was wondering, wouldn't it be nice to have a neat API for getting an experiment by its name?
One way to do it currently is to create an experiment and then look up the assigned ID in the mlflow ui for a subsequent run.
Another way is to use the suggestion in #68.
On the other hand, Experiment has two unique identifiers, and it feels intuitive to have a clean API for obtaining it both by ID and by name.
Obtaining by ID is already there in the AbstractStore class.
What do you think? I have started initial work towards this in issue #122. Just wanted to hear your thoughts on this matter. Maybe I am missing something and it does not make sense to do it at all?

Feature request: Scala or R tracking API

Describe the problem

Is it planned to add a tracking API for other languages like R or Scala?
Will the MLflow project be able to handle projects in other languages too?

Regards

TypeError: __init__() got an unexpected keyword argument 'file'

Hello!
So I just decided to try mlflow! However, after installing conda with installer.sh and pip install mlflow, it seems that it doesn't really work as intended.
I am on Arch Manjaro.

~ ❯ mlflow
Traceback (most recent call last):
  File "/usr/bin/mlflow", line 7, in <module>
    from mlflow.cli import cli
  File "/usr/lib/python3.6/site-packages/mlflow/__init__.py", line 8, in <module>
    import mlflow.projects as projects # noqa
  File "/usr/lib/python3.6/site-packages/mlflow/projects.py", line 18, in <module>
    from mlflow.entities.param import Param
  File "/usr/lib/python3.6/site-packages/mlflow/entities/param.py", line 2, in <module>
    from mlflow.protos.service_pb2 import Param as ProtoParam
  File "/usr/lib/python3.6/site-packages/mlflow/protos/service_pb2.py", line 127, in <module>
    options=None, file=DESCRIPTOR),
TypeError: __init__() got an unexpected keyword argument 'file'

How do I solve this ?

Retrieve experiment ID by name

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
    Slightly altered version of the tutorial snippet. See below.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    This is a feature request, so it should affect all installations of mlflow 0.1.0.
  • MLflow installed from (source or binary):
    pip3 install mlflow (version 0.1.0)
  • MLflow version (run python -c "from mlflow import version; print(version.version)"):
    0.1.0
  • Python version:
    3.6.2
  • npm version (if running the dev UI):
    3.10.8
  • Exact command to reproduce:

import mlflow

if __name__ == "__main__":
    # Create Experiment
    exp_id = mlflow.create_experiment('mlucas Test')

    # Start run as a child of that experiment
    mlflow.start_run(experiment_id=exp_id)

    # Log a parameter (key-value pair)
    mlflow.log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run
    mlflow.log_metric("f1", 0.5)

Note that the function I'm requesting doesn't exist. The above code will run once (and create the experiment), but subsequent calls will raise an error because the experiment already exists. There does not appear to be a way to retrieve the experiment ID by name using the existing mlflow functions.

Describe the problem

Assuming that I programmatically create experiments, I'd like to be able to programmatically retrieve them as well. Currently, it's very easy to create experiments with mlflow.create_experiment, but there does not appear to be a pythonic means of retrieving existing experiment IDs via the module using the experiment name.

To programmatically use an existing experiment by name, I would expect either create_experiment to return the ID of the existing experiment (less ideal), OR a call like get_experiment_by_name that retrieves experiment metadata, OR a list_experiments call whose response I can loop through to find the relevant experiment metadata.

Possible function call: mlflow.list_experiments()
Desired response:

[
  {
    "id": 0,
    "name": "Default",
    "artifact_location": "/Users/michael.lucas/mlflow/mlruns/0"
  },
  ...
]

Possible function call: mlflow.get_experiment_by_name(name="mlucas Test")
Desired response:

[
  {
    "id": 1,
    "name": "mlucas Test",
    "artifact_location": "/Users/michael.lucas/mlflow/mlruns/1"
  }
]
If no experiment exists, it could raise a relevant exception.

Any thoughts on these two options? And if the first is an interesting endpoint in general, does the proposed response (JSON-ic list of dicts) make sense within the mlflow paradigm?

Thanks!
Michael
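
For reference, later MLflow releases added this lookup; a get-or-create sketch using it (verify availability against your installed version):

import mlflow

name = "mlucas Test"
exp = mlflow.get_experiment_by_name(name)  # returns None if no such experiment exists
exp_id = exp.experiment_id if exp else mlflow.create_experiment(name)

mlflow.start_run(experiment_id=exp_id)
mlflow.log_param("param1", 5)
mlflow.log_metric("f1", 0.5)
mlflow.end_run()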

Docs versioning wrong (when building locally and at mlflow.org)

  • Docs in master should be generated with a version number > 0.1.0 since the API is expanding post 0.1.0 as development happens.
  • We probably want to be updating the docs published on mlflow.org with non-API changing fixes (e.g. fixing typos) after we cut a release, but we should discuss the exact workflow.
  • Right now the API docs on mlflow.org for 0.1.0 are not accurate. E.g. https://mlflow.org/docs/0.1.0/python_api/mlflow.tracking.html shows list_experiments(), but should not, since this function was added after the 0.1.0 release. We need to regenerate the docs for 0.1.0 and push those to mlflow.org.

No module named pandas when running "mlflow run"

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • MLflow installed from (source or binary): binary
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: 3.5.1
  • npm version (if running the dev UI):
  • Exact command to reproduce: mlflow run example/tutorial -P alpha=0.5 --no-conda

Source code / logs

pandas 0.22.0 is installed on Ubuntu 16.04, and python example/tutorial/train.py runs successfully:
python example/tutorial/train.py
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
RMSE: 0.82224284976
MAE: 0.627876141016
R2: 0.126787219728

but it failed as below:
mlflow run example/tutorial -P alpha=0.5 --no-conda
=== Fetching project from example/tutorial ===
=== Work directory for this run: example/tutorial ===
=== Created directory /tmp/tmpigdg385u for downloading remote URIs passed to arguments of type 'path' ===
=== Running command: python train.py 0.5 0.1 ===
Traceback (most recent call last):
File "train.py", line 9, in
import pandas as pd
ImportError: No module named pandas
=== Run failed ===

I double-checked that pandas is installed:
$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.

import pandas as pd

Artifacts location

Hi,
I can save my logs to DBFS via mlflow.set_tracking_uri("FileStore/foo/") inside Databricks; afterwards I use the CLI to download everything, including artifacts, to my local machine where mlflow ui is running.

It displays all the experiment metrics fine, but the artifact location is still set to DBFS and not to my local folder. Is there a way to change the location?

Pyspark.ml with mlflow

Hi:
How can I use mlflow with pyspark.ml? Is there an example program? Thank you!

Compare Runs: the parameter appear to be same

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Win10
  • MLflow installed from (source or binary): pip
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: python 3.6.5
  • npm version (if running the dev UI):
  • Exact command to reproduce:

Describe the problem

After selecting multiple results to compare, the parameters all appear to be the same.

[UI] Comparing Runs Parameters are wrong in comparison screen

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 (64 bit)
  • MLflow installed from (source or binary): source
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.1.0
  • Python version: 3.6
  • npm version (if running the dev UI): NA
  • Exact command to reproduce:
    Go to path mlflow\example\tutorial
  1. python train.py

  2. python train.py 0.4 0.3

  3. Select both run summary checkboxes

  4. Click Compare Selected runs

  5. Parameters (alpha, l1_ratio) show the same values (0.5, 0.5) for both runs. It should have been (0.4, 0.3) for the second run.

Describe the problem

The compare results window shows the last checked run's parameters for both runs. (Refer to the images attached.)

Source code / logs

NA

To reproduce, follow steps 1 to 5 in the "Exact command to reproduce" section above.

Not able to access tracking UI when installing from source

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • MLflow installed from (source or binary): Source
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): commit 80db6e5
  • Python version: 3.6.5
  • npm version (if running the dev UI):
  • Exact command to reproduce: mlflow ui

Describe the problem

I just ran python setup.py install under the source directory and then ran mlflow ui. However, when I try to access http://localhost:5000, I always get a 404 error.

Source code / logs

cd mlflow
python setup.py install
cd ../..
mlflow ui

mlflow ui exception could be more explanatory

If you already have an existing Flask server running or bound to port 5000 and you forgot about it, a subsequent invocation of 'mlflow ui' fails with this exception:

Traceback (most recent call last):
  File "/Users/jules/pyenv/vpyenv/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/mlflow/cli.py", line 131, in ui
    mlflow.server._run_server(file_store, file_store, host, port, 1)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/mlflow/server/__init__.py", line 48, in _run_server
    env=env_map, stream_output=True)
  File "/Users/jules/pyenv/vpyenv/lib/python2.7/site-packages/mlflow/utils/process.py", line 38, in exec_cmd
    raise ShellCommandException("Non-zero exitcode: %s" % (exit_code))
mlflow.utils.process.ShellCommandException: Non-zero exitcode: 1

It would be better to surface the exception's root cause: for example, "port 5000 already bound" or "failed to bind to port 5000: address already in use".
Cheers

404 when executing `mlflow ui` from mlflow repos root folder

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS X 10.12.6 (Sierra)
  • MLflow installed from (source or binary): binary
  • MLflow version (run python -c "from mlflow import version; print(version.VERSION)"): 0.2.1
  • Python version: Python 3.6.6 :: Anaconda, Inc.
  • npm version (if running the dev UI): N.A
  • Exact command to reproduce:
    1. git clone https://github.com/databricks/mlflow
    2. cd mlflow
    3. python example/tutorial/train.py to execute a training
    4. mlflow ui
    5. Open localhost:5000 in browser

Describe the problem

localhost:5000 returns 404.

In the above case, the root folder of the code is /Volume/tvlk-repo/trial/mlflow, and the mlruns folder generated by the training execution is in /Volume/tvlk-repo/trial/mlflow/mlruns.

Interestingly, when I change to another directory and execute mlflow ui, I can access the dashboard.

Source code / logs

Output from mlflow ui in command line:

(mlflow-exp) ip-10-10-177-11:mlflow arinto$ mlflow ui
[2018-06-29 10:18:46 +0800] [15782] [INFO] Starting gunicorn 19.8.1
[2018-06-29 10:18:46 +0800] [15782] [INFO] Listening at: http://127.0.0.1:5000 (15782)
[2018-06-29 10:18:46 +0800] [15782] [INFO] Using worker: sync
[2018-06-29 10:18:46 +0800] [15785] [INFO] Booting worker with pid: 15785
[2018-06-29 10:19:27 +0800] [15782] [CRITICAL] WORKER TIMEOUT (pid:15785)
[2018-06-29 10:19:27 +0800] [15785] [INFO] Worker exiting (pid: 15785)
[2018-06-29 10:19:28 +0800] [15794] [INFO] Booting worker with pid: 15794
[2018-06-29 10:27:25 +0800] [15782] [CRITICAL] WORKER TIMEOUT (pid:15794)
[2018-06-29 10:27:25 +0800] [15794] [INFO] Worker exiting (pid: 15794)
[2018-06-29 10:27:25 +0800] [15848] [INFO] Booting worker with pid: 15848
