
mlflow-export-import's Introduction

MLflow: A Machine Learning Lifecycle Platform

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are:

  • MLflow Tracking: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI.
  • MLflow Projects: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others.
  • MLflow Models: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker.
  • MLflow Model Registry: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.


Packages

PyPI: mlflow, mlflow-skinny
conda-forge: mlflow, mlflow-skinny
CRAN: mlflow
Maven Central: mlflow-client, mlflow-parent, mlflow-scoring, mlflow-spark


Installing

Install MLflow from PyPI via pip install mlflow

MLflow requires conda to be on the PATH for the projects feature.

Nightly snapshots of MLflow master are also available here.

Install a lower-dependency subset of MLflow from PyPI via pip install mlflow-skinny. Extra dependencies can be added per desired scenario. For example, pip install mlflow-skinny pandas numpy allows for mlflow.pyfunc.log_model support.
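
As a hedged illustration of why pandas and numpy matter here, the sketch below logs a trivial custom pyfunc model; with mlflow-skinny alone the log_model call fails until those extras are installed. The AddN model is purely illustrative.

import mlflow
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    """Trivial pyfunc model that adds `n` to every value of the input frame."""
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        # model_input arrives as a pandas DataFrame, hence the pandas dependency
        return model_input.apply(lambda col: col + self.n)

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="add_n_model", python_model=AddN(n=5))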

Documentation

Official documentation for MLflow can be found at https://mlflow.org/docs/latest/index.html.

Roadmap

The current MLflow Roadmap is available at https://github.com/mlflow/mlflow/milestone/3. We are seeking contributions to all of our roadmap items with the help wanted label. Please see the Contributing section for more information.

Community

For help or questions about MLflow usage (e.g. "how do I do X?") see the docs or Stack Overflow.

To report a bug, file a documentation issue, or submit a feature request, please open a GitHub issue.

For release announcements and other discussions, please subscribe to our mailing list ([email protected]) or join us on Slack.

Running a Sample App With the Tracking API

The programs in examples use the MLflow Tracking API. For instance, run:

python examples/quickstart/mlflow_tracking.py

This program uses the MLflow Tracking API, which logs tracking data in ./mlruns. The logged data can then be viewed with the Tracking UI.
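
For reference, a minimal sketch of the kind of script this refers to (the actual examples/quickstart/mlflow_tracking.py may differ in details):

import os
import mlflow

if __name__ == "__main__":
    # Log a parameter and a few metric values; everything lands in ./mlruns
    mlflow.log_param("param1", 5)
    for value in (1, 2, 3):
        mlflow.log_metric("foo", value)

    # Log a small artifact directory as well
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    mlflow.log_artifacts("outputs")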

Launching the Tracking UI

The MLflow Tracking UI will show runs logged in ./mlruns at http://localhost:5000. Start it with:

mlflow ui

Note: Running mlflow ui from within a clone of MLflow is not recommended - doing so will run the dev UI from source. We recommend running the UI from a different working directory, specifying a backend store via the --backend-store-uri option. Alternatively, see instructions for running the dev UI in the contributor guide.

Running a Project from a URI

The mlflow run command lets you run a project packaged with an MLproject file from a local path or a Git URI:

mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4

mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.4

See examples/sklearn_elasticnet_wine for a sample project with an MLproject file.

Saving and Serving Models

To illustrate managing models, the mlflow.sklearn package can log scikit-learn models as MLflow artifacts and then load them again for serving. There is an example training application in examples/sklearn_logistic_regression/train.py that you can run as follows:

$ python examples/sklearn_logistic_regression/train.py
Score: 0.666
Model saved in run <run-id>

$ mlflow models serve --model-uri runs:/<run-id>/model

$ curl -d '{"dataframe_split": {"columns":[0],"index":[0,1],"data":[[1],[-1]]}}' -H 'Content-Type: application/json'  localhost:5000/invocations

Note: If using MLflow skinny (pip install mlflow-skinny) for model serving, additional required dependencies (namely, flask) will need to be installed for the MLflow server to function.
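
The curl call above can also be issued from Python; here is a small sketch using the requests library, assuming the model server from the previous step is listening on port 5000:

import requests

payload = {"dataframe_split": {"columns": [0], "index": [0, 1], "data": [[1], [-1]]}}
resp = requests.post("http://localhost:5000/invocations", json=payload)
resp.raise_for_status()
print(resp.json())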

Official MLflow Docker Image

The official MLflow Docker image is available on GitHub Container Registry at https://ghcr.io/mlflow/mlflow.

export CR_PAT=YOUR_TOKEN
echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
# Pull the latest version
docker pull ghcr.io/mlflow/mlflow
# Pull 2.2.1
docker pull ghcr.io/mlflow/mlflow:v2.2.1

Contributing

We happily welcome contributions to MLflow. We are also seeking contributions to items on the MLflow Roadmap. Please see our contribution guide to learn more about contributing to MLflow.

Core Members

MLflow is currently maintained by a group of core members, with significant contributions from hundreds of exceptionally talented community members.

mlflow-export-import's People

Contributors

amesar, dbczumar, kriscon-db, mingyu89, smurching


mlflow-export-import's Issues

MLFlow Exception - Databricks Personal Access Token timeout

When importing a model using the Linux host, I am experiencing what appears to be a personal access token timeout.

When performing an import from my laptop I get an error message (below) that seems to indicate the PAT has expired after approximately 16 minutes. Have you seen this before? Are you aware of any limitations on the length of time you have to do an import with a PAT?

I have no issue with smaller models that take less than 15 mins.

_rfc/experiments/07674a88fd9c4982a563b6c14999e104/8f2df6fcfcc84f868dc7949f93b89958/artifacts/rfc_oversample/model.pkl': 'MlflowException('API request failed with exception 403 Client Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. for url: https://xxxxxxxxxxxxxxxxxxxxxxx.blob.core.windows.net/jobs/xxxxxxxxxx/mlflow-tracking/xxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/artifacts/rfc_oversample/model.pkl?sig=ukWLiFQE%2By630OJeCkfVVyy2nqW9w3j8p8%2Fz76GGfZQ%3D&se=2022-11-23T15%3A24%3A12Z&sv=2019-02-02&spr=https&sp=w&sr=b&comp=block&blockid=MTQ1MDc4ZTBmZDc4NGMwODlhZDA3M2IxMjZmNzBmYjU%3D. Response text: AuthenticationFailedServer failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:701f6451-501e-007b-0c4f-ffc0b1000000\nTime:2022-11-23T15:24:23.0258209ZSigned expiry time [Wed, 23 Nov 2022 15:24:12 GMT] must be after signed start time [Wed, 23 Nov 2022 15:24:23 GMT]')'

Exporting artifacts of a specific version of a model

Hi,

We would like to export the artifacts of a specific version of a model.

Unfortunately, it seems that this is not possible: the function export_model from the ModelExporter class first fetches all versions, and then iteratively exports the artifacts of each model version. However, we often only need to export a specific model version. Moreover, since our artifacts can become quite big, and each model can have a lot of versions, it can take a lot of time (and storage) to first export all models, only to migrate a specific version.

Is this functionality currently on the roadmap? Otherwise, I'm happy to contribute.
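
As an interim workaround sketch (using only standard MLflow client APIs on a reasonably recent MLflow, not mlflow-export-import itself; the model name, version, and destination directory are placeholders), downloading a single version's artifacts might look roughly like this:

import mlflow
from mlflow import MlflowClient

client = MlflowClient()
mv = client.get_model_version(name="my-model", version="3")  # one specific version
local_path = mlflow.artifacts.download_artifacts(
    artifact_uri=mv.source,          # URI of the artifacts backing this version
    dst_path="exported_model_v3",    # illustrative destination directory
)
print(f"Version {mv.version} artifacts downloaded to {local_path}")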

Getting wrong value of databricks rest client

@amesar Great stuff this tool provides. I am in the process of setting up a script which exports an MLflow run from a Databricks workspace to a location. I am stuck with this issue where it says Run "some value" not found. Here is the full screenshot of the issue and the relevant code:
[screenshot of the error and the relevant code]

I believe the error is related to the Databricks REST client. The value that I am getting for the REST client is not correct: my Databricks workspace has a different address, but here I am seeing the default value. I am not sure how to set up the correct host value.

The run id exists and we can be certain that there are no issues on that end.
Looking forward to hearing from you on this issue. Thanks.

Refactor and improve mlflow_export_import.metadata tags into 3 sets of mlflow_export_import tags for ML governance

Currently the mlflow_export_import.metadata tags are a bit of a grab bag. These tags are useful for governance, provenance, and auditing purposes in regulated industries such as finance and HLS (health care and life sciences). See MLflow Export Import Source Run Tags - mlflow_export_import for full details.

  • Rename the top-level prefix from mlflow_export_import.metadata to mlflow_export_import, with 3 sub-prefix groups.
  • Rationalize these tags (and add more source tags) into 3 groups (illustrated in the sketch after this list):
    • MLflow system tags. All source MLflow system tags starting with mlflow. will be saved under the mlflow_export_import.mlflow. prefix.
    • RunInfo field tags. Source RunInfo fields are captured in tags starting with mlflow_export_import.run_info..
    • Metadata tag. Tags indicating source export metadata information such as mlflow_export_import.metadata.tracking_uri.
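
Purely as an illustration of the proposal (the keys and values below are hypothetical, not the current tag names), the three groups on an imported run might look like:

proposed_tags = {
    # 1. Source MLflow system tags, re-prefixed
    "mlflow_export_import.mlflow.source_type": "NOTEBOOK",
    # 2. Source RunInfo fields
    "mlflow_export_import.run_info.run_id": "abc123",
    # 3. Export metadata, e.g. where the run was exported from
    "mlflow_export_import.metadata.tracking_uri": "databricks",
}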

Model export/import between two local MLflow instances

Is there a way to export/import runs or models using these scripts without Databricks involved? I am trying to copy a model between two local instances of MLflow, and it seems that I can't do that without triggering the DatabricksHttpClient, which requires proper host configuration.
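
For what it's worth, copying a run's parameters, metrics, and tags between two local tracking servers can be sketched with the plain MLflow client alone. Artifact copying is omitted; the URIs, run id, and experiment id are placeholders, and only the latest value of each metric is carried over.

from mlflow import MlflowClient
from mlflow.entities import Metric, Param, RunTag

src = MlflowClient(tracking_uri="http://localhost:5000")
dst = MlflowClient(tracking_uri="http://localhost:5001")

src_run = src.get_run("<source-run-id>")
dst_run = dst.create_run(experiment_id="<target-experiment-id>")

dst.log_batch(
    dst_run.info.run_id,
    metrics=[Metric(k, v, timestamp=0, step=0) for k, v in src_run.data.metrics.items()],
    params=[Param(k, v) for k, v in src_run.data.params.items()],
    tags=[RunTag(k, v) for k, v in src_run.data.tags.items()],
)
dst.set_terminated(dst_run.info.run_id)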

Enhance mlflow-export-import tests to use two tracking servers

Currently the tests use one tracking server. When importing, we add a special prefix to the imported object (run, experiment or model) and compare it with the original source object. This is both clunky and not a true emulation of a real export import.

The goal is to launch two tracking servers - one for the source and one for the imported target objects.

The test suite will do the following:

  1. Launch two tracking servers
  2. Run tests against these servers
  3. Tear down the two servers

Related to: Issue 5 - Add pytest.fixture(scope="session") to tests
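
A hedged sketch of what such a session-scoped fixture could look like (ports, paths, and the crude startup wait are illustrative):

import subprocess
import time

import pytest

def _launch_tracking_server(port, root):
    # Start a local MLflow tracking server backed by SQLite under `root`
    return subprocess.Popen([
        "mlflow", "server",
        "--host", "127.0.0.1",
        "--port", str(port),
        "--backend-store-uri", f"sqlite:///{root}/mlflow.db",
        "--default-artifact-root", f"{root}/artifacts",
    ])

@pytest.fixture(scope="session")
def tracking_servers(tmp_path_factory):
    src_root = tmp_path_factory.mktemp("source")
    dst_root = tmp_path_factory.mktemp("target")
    procs = [_launch_tracking_server(5005, src_root), _launch_tracking_server(5006, dst_root)]
    time.sleep(5)  # naive wait for both servers to come up
    yield "http://127.0.0.1:5005", "http://127.0.0.1:5006"
    for proc in procs:
        proc.terminate()
        proc.wait()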

Version Sequence: Sort version sequence in the export log

The latest_version field of the export log has model versions in reverse order, which causes the version numbers to fall out of sequence on import, since the import into the new workspace assigns new version ids. The solution is to sort latest_version by version_id in ascending order before saving the export log.
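
Assuming the field is serialized as a list of dicts with a version entry (an assumption about the export format), the sort could be as simple as:

# Sort ascending by version number before writing the export log
latest_versions = sorted(latest_versions, key=lambda v: int(v["version"]))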

import-run fails when there are duplicate metric values with SQL backend store

Running import-run with the attached directory succeeds with MLFLOW_TRACKING_URI set to some local directory, but fails when it's set to a tracking server that is backed by SQL.

I get this error:

Options:
  input_dir: /tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a
  experiment_name: instinct
  mlmodel_fix: True
  use_src_user_id: False
  dst_notebook_dir: None
  dst_notebook_dir_add_run_id: None
in_databricks: False
importing_into_databricks: False
Importing run from '/tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a'
Traceback (most recent call last):
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 70, in _import_run
    self._import_run_data(src_run_dct, run_id, src_run_dct["info"]["user_id"])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 105, in _import_run_data
    run_data_importer.log_metrics(self.mlflow_client, run_dct, run_id, MAX_METRICS_PER_BATCH)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 38, in log_metrics
    _log_data(run_dct, run_id, batch_size, get_data, log_data)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 19, in _log_data
    log_data(run_id, batch)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 37, in log_data
    client.log_batch(run_id, metrics=metrics)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/client.py", line 1099, in log_batch
    self._tracking_client.log_batch(run_id, metrics, params, tags)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/_tracking_service/client.py", line 415, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 341, in log_batch
    self._call_endpoint(LogBatch, req_body)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 57, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 280, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 206, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(pymysql.err.IntegrityError) (1062, "Duplicate entry 'inside_3-16686312949540-0-557f04b0a45c4486b067a7e4205bb1b9-0-1' for key 'metrics.PRIMARY'")
[SQL: INSERT INTO metrics (`key`, value, timestamp, step, is_nan, run_uuid) VALUES (%(key)s, %(value)s, %(timestamp)s, %(step)s, %(is_nan)s, %(run_uuid)s)]
[parameters: ({'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994150, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994160, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994170, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994180, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994190, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0166667, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.016, 'timestamp': 16686312994210, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}  ... displaying 10 of 891 total bound parameter sets ...  {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949680, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949690, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a.tar.gz
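
A hedged workaround sketch for the importer: drop exact duplicates before calling log_batch, since the SQL backend enforces a uniqueness constraint over the metric's key, value, timestamp, step, and run id. The metrics variable below is assumed to be a list of mlflow.entities.Metric objects built from the exported run.

def dedupe_metrics(metrics):
    # Keep only the first occurrence of each (key, value, timestamp, step) tuple
    seen = set()
    unique = []
    for m in metrics:
        fingerprint = (m.key, m.value, m.timestamp, m.step)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(m)
    return unique

# client.log_batch(run_id, metrics=dedupe_metrics(metrics))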

HTTP status code: 400. Reason: Bad Request when importing experiments from Legacy workspace into new

Receiving errors importing experiments on a new, blank E2 workspace. The old, legacy workspace was exported successfully, but errors such as the following appear while importing:

Creating Databricks workspace directory '/Users/[email protected]/example'
Traceback (most recent call last):
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/bulk/import_experiments.py", line 15, in _import_experiment
    importer.import_experiment(exp_name, exp_input_dir)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/experiment/import_experiment.py", line 35, in import_experiment
    mlflow_utils.set_experiment(self.mlflow_client, self.dbx_client, exp_name)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/mlflow_utils.py", line 56, in set_experiment
    create_workspace_dir(dbx_client, os.path.dirname(exp_name))
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/mlflow_utils.py", line 97, in create_workspace_dir
    dbx_client.post("workspace/mkdirs", { "path": workspace_dir })
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 50, in post
    return json.loads(self._post(resource, data).text)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 46, in _post
    self._check_response(rsp,uri)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 63, in _check_response
    raise MlflowExportImportException(f"HTTP status code: {rsp.status_code}. Reason: {rsp.reason}. URI: {uri}. Params: {params}.")
mlflow_export_import.common.MlflowExportImportException: HTTP status code: 400. Reason: Bad Request. URI: https://my-workspace.cloud.databricks.com/api/2.0/workspace/mkdirs. Params: None.

Any way to see what the request is to determine why it is getting a 400? Could this be because the workspace is blank and the users do not exist yet?

Use pre-commit to standardize code formatting

Request Summary

I would like to implement a tool like pre-commit to handle auto-code formatting and quality checks. This would be very helpful for onboarding new contributors.

Let me know if this is of interest; I'm happy to help implement it.

As a contributor I would like:

  • A consistent code format across the repo
  • A way to enforce this code formatting on my own code without having to think too hard (see black)
  • A one time formatting of all existing code to match this style

Implementation Details

Step 1)

Add a new .pre-commit-config.yaml file at the root of the repo (I'll explain below what this does)

exclude: docs|.git|.tox
default_stages: [commit]
fail_fast: false

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-yaml
    -   id: check-ast
    -   id: check-docstring-first
    -   id: check-merge-conflict
    -   id: mixed-line-ending

-   repo: https://github.com/timothycrosley/isort
    rev: 5.10.1
    hooks:
    -   id: isort
        args: [--profile, black]

-   repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
    -   id: black-jupyter

-   repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
    rev: v2.4.0
    hooks:
    -   id: pretty-format-yaml
        args: [--autofix, --indent, '4']
    -   id: pretty-format-ini
        args: [--autofix]
    -   id: pretty-format-toml
        args: [--autofix]

# sets up .pre-commit-ci.yaml to ensure pre-commit dependencies stay up to date
ci:
    autoupdate_schedule: weekly
    skip: []
    submodules: false

The above config file sets up a number of tools to run automatically on edited files when the git commit action is performed:

  • black python code autoformatting
  • isort python import sorting
  • More!
    • Trailing whitespace cleanup
    • Newline Adders
    • Merge Conflict Checkers
    • toml/yaml/ini autoformatting

Step 2)

Commit the above file and install pre-commit:

pip install pre-commit
pre-commit install
pre-commit autoupdate

Run a one-time code cleanup of everything:

pre-commit run --all-files

Step 3)

Push all of these changes up into GitHub. This can be a painful part of implementing a tool like pre-commit since there will be a massive diff - I recommend the original maintainer be the one to push those changes to retain git blame history.

Step 4)

Add some details for new contributors. I have an example here I try to re-use across GitHub: https://juftin.com/camply/contributing.html

Step 5)

Nothing; new contributors' code will auto-format during commit, and they'll learn an awesome tool while they're at it.

Make export_all work correctly

Make export.sh work properly.

Notes:

  1. Make the output directory the same as that of export_models.sh
  2. Figure out how to correctly export all experiments/runs and have them correctly map to a registered model version's run_id

Mlflow host or token is not configured correctly (Open-source)

Hi!
I'm wondering if it's possible to export/import MLflow objects (experiment, run, model, etc.) against an MLflow deployment with a remote tracking server, backend store, and artifact store, i.e. without using Databricks.
When I try, I get this error: "Mlflow host or token is not configured correctly (Open-source)".
Can someone help me? Thank you
Victor

get_mlflow_host_token() does not load token in environment variable.

We want to use mlflow-export-import to migrate models between OSS tracking servers in an enterprise setting (at a bank). However, since our tracking servers are both behind OAuth2 proxies, support for bearer tokens is essential for us to make it work.

I inspected the code, and the reason for this is that the function get_mlflow_host_token() does not actually load the token from the environment variable, and hence returns None.

Is this on purpose? Otherwise, I have created a PR that fixes the issue.
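
For illustration only (this is not the actual mlflow-export-import code), the desired behavior might look something like reading the standard MLflow environment variables:

import os

def get_mlflow_host_token():
    # MLFLOW_TRACKING_URI / MLFLOW_TRACKING_TOKEN are standard MLflow env vars;
    # the token is what an OAuth2 proxy would accept as a bearer credential.
    host = os.environ.get("MLFLOW_TRACKING_URI")
    token = os.environ.get("MLFLOW_TRACKING_TOKEN")
    return host, token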

importing models with mlflow-artifacts: source

Importing models whose source starts with "mlflow-artifacts:" fails. This is because there is a file existence check override for "dbfs:" but not for "mlflow-artifacts:".
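
A sketch of the described fix, with a hypothetical helper name (the real code's structure may differ): extend the existing "dbfs:" override to also cover "mlflow-artifacts:" sources.

def _skip_local_file_existence_check(source: str) -> bool:
    # Artifact sources served by the tracking server or DBFS cannot be
    # checked as local files, so skip the existence check for both schemes.
    return source.startswith(("dbfs:", "mlflow-artifacts:"))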

Add all entrypoints to a single Click Group

Currently this package uses a number of entrypoints to access different functionality. Moving all of these Click commands under a single Click group would be much easier.

I've already performed this work on an old fork of https://github.com/amesar/mlflow-export-import/. I'll open a PR on this repo instead.

Here's what it would look like:

mlflow-export-import --help
Usage: mlflow-export-import [OPTIONS] COMMAND [ARGS]...

  MLflow Export / Import CLI: Command Line Interface

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  export-all          Export the entire tracking server All registered...
  export-experiment   Exports an experiment to a directory.
  export-experiments  Exports experiments to a directory.
  export-model        Export a registered model and all the experiment runs...
  export-models       Exports models and their versions' backing Run along...
  export-run          Exports a run to a directory.
  find-artifacts      Find artifacts that match a filename
  http-client         Interact with the MLflow Export/Import HTTP Client
  import-experiment   Import an experiment from a directory.
  import-experiments  Import a list of experiment from a directory.
  import-model        Import a registered model and all the experiment runs...
  import-models       Imports models and their experiments and runs.
  import-run          Imports a run from a directory.
  list-models         Lists all registered models.

Runs imported but models not after using import-models

Hi, thank you so much for developing this package. I am trying to migrate experiments, runs, and models from our old MLflow server to our new one. I successfully exported the models using export-models --output-dir mlflow_model_output_dir2 --models shortage_lgb_resource_count_3dp_all_21mth,shortage_lgb_open_duty_ct_3dp_all_21mth --export-source-tags True --export-all-runs True. I then changed the MLflow URI to our new location and used import-models. The experiments and runs were all imported, and in the model registry both model names are there. However, under each registered model name there are no versions. In the experiment UI there are no links to models, but I can find the artifacts along with the "register model" button if I click through into a single run's details. I have attached screenshots of the original model registry for the model I am moving, the new model registry, and the output of the import-models run.
[screenshots: original_mlflow_server_model_registry, new_mlflow_server_model_registry, terminal_import_model_finished_]

Prevent duplicate active stages (Production, Staging) on multiple imports into same registered model

If you import versions into a model with multiple import-model calls, MLflow UI semantics of having just one active stage were not being honored. Duplicate active stages (Production, Staging) were observed, i.e. two or more Production stages which cannot occur in the UI.

Problem: In import_model.py, the archive_existing_versions argument in the MlflowClient.transition_model_version_stage() call was not set (the default is False).

Fix: Set the archive_existing_versions argument to True to avoid multiple active stages.
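
Sketched with the standard MlflowClient API (model name, version, and stage are placeholders), the fix amounts to:

from mlflow import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version="5",
    stage="Production",
    archive_existing_versions=True,  # archive any other version currently in this stage
)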

See:

Object of type MlflowExportImportException is not JSON serializable

When running import-all --input-dir <my_directory>

Traceback (most recent call last):
  File "/databricks/python3/bin/import-all", line 8, in <module>
    sys.exit(main())
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/bulk/import_models.py", line 117, in main
    import_all(
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/bulk/import_models.py", line 76, in import_all
    utils.write_json_file(fs, "import_report.json", dct)
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/utils.py", line 78, in write_json_file
    fs.write(path, json.dumps(dct,indent=2)+"\n")
  File "/usr/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type MlflowExportImportException is not JSON serializable
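
A hedged sketch of one possible fix: let json.dumps fall back to str() for objects it cannot serialize, such as the MlflowExportImportException instances collected in the report (this mirrors the write_json_file frame in the traceback above):

import json

def write_json_file(fs, path, dct):
    # default=str turns non-serializable objects (e.g. exceptions) into strings
    fs.write(path, json.dumps(dct, indent=2, default=str) + "\n")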
