
mlflow-export-import's Introduction

MLflow: A Machine Learning Lifecycle Platform

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are:

  • MLflow Tracking: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI.
  • MLflow Projects: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others.
  • MLflow Models: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker.
  • MLflow Model Registry: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.


Packages

PyPI: mlflow, mlflow-skinny
conda-forge: mlflow, mlflow-skinny
CRAN: mlflow
Maven Central: mlflow-client, mlflow-parent, mlflow-scoring, mlflow-spark


Installing

Install MLflow from PyPI via pip install mlflow

MLflow requires conda to be on the PATH for the projects feature.

Nightly snapshots of MLflow master are also available here.

Install a lower-dependency subset of MLflow from PyPI via pip install mlflow-skinny. Extra dependencies can be added per desired scenario. For example, pip install mlflow-skinny pandas numpy allows for mlflow.pyfunc.log_model support.
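
As a hedged illustration of why pandas and numpy matter here, the sketch below logs a trivial custom pyfunc model; with mlflow-skinny alone the log_model call fails until those extras are installed. The AddN model is purely illustrative.

import mlflow
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    """Trivial pyfunc model that adds `n` to every value of the input frame."""
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        # model_input arrives as a pandas DataFrame, hence the pandas dependency
        return model_input.apply(lambda col: col + self.n)

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="add_n_model", python_model=AddN(n=5))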

Documentation

Official documentation for MLflow can be found at https://mlflow.org/docs/latest/index.html.

Roadmap

The current MLflow Roadmap is available at https://github.com/mlflow/mlflow/milestone/3. We are seeking contributions to all of our roadmap items with the help wanted label. Please see the Contributing section for more information.

Community

For help or questions about MLflow usage (e.g. "how do I do X?") see the docs or Stack Overflow.

To report a bug, file a documentation issue, or submit a feature request, please open a GitHub issue.

For release announcements and other discussions, please subscribe to our mailing list ([email protected]) or join us on Slack.

Running a Sample App With the Tracking API

The programs in examples use the MLflow Tracking API. For instance, run:

python examples/quickstart/mlflow_tracking.py

This program uses the MLflow Tracking API, which logs tracking data in ./mlruns. The logged data can then be viewed with the Tracking UI.
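
For reference, a minimal sketch of the kind of script this refers to (the actual examples/quickstart/mlflow_tracking.py may differ in details):

import os
import mlflow

if __name__ == "__main__":
    # Log a parameter and a few metric values; everything lands in ./mlruns
    mlflow.log_param("param1", 5)
    for value in (1, 2, 3):
        mlflow.log_metric("foo", value)

    # Log a small artifact directory as well
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    mlflow.log_artifacts("outputs")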

Launching the Tracking UI

The MLflow Tracking UI will show runs logged in ./mlruns at http://localhost:5000. Start it with:

mlflow ui

Note: Running mlflow ui from within a clone of MLflow is not recommended - doing so will run the dev UI from source. We recommend running the UI from a different working directory, specifying a backend store via the --backend-store-uri option. Alternatively, see instructions for running the dev UI in the contributor guide.

Running a Project from a URI

The mlflow run command lets you run a project packaged with an MLproject file from a local path or a Git URI:

mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4

mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.4

See examples/sklearn_elasticnet_wine for a sample project with an MLproject file.

Saving and Serving Models

To illustrate managing models, the mlflow.sklearn package can log scikit-learn models as MLflow artifacts and then load them again for serving. There is an example training application in examples/sklearn_logistic_regression/train.py that you can run as follows:

$ python examples/sklearn_logistic_regression/train.py
Score: 0.666
Model saved in run <run-id>

$ mlflow models serve --model-uri runs:/<run-id>/model

$ curl -d '{"dataframe_split": {"columns":[0],"index":[0,1],"data":[[1],[-1]]}}' -H 'Content-Type: application/json'  localhost:5000/invocations

Note: If using MLflow skinny (pip install mlflow-skinny) for model serving, additional required dependencies (namely, flask) will need to be installed for the MLflow server to function.
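
The curl call above can also be issued from Python; here is a small sketch using the requests library, assuming the model server from the previous step is listening on port 5000:

import requests

payload = {"dataframe_split": {"columns": [0], "index": [0, 1], "data": [[1], [-1]]}}
resp = requests.post("http://localhost:5000/invocations", json=payload)
resp.raise_for_status()
print(resp.json())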

Official MLflow Docker Image

The official MLflow Docker image is available on GitHub Container Registry at https://ghcr.io/mlflow/mlflow.

export CR_PAT=YOUR_TOKEN
echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
# Pull the latest version
docker pull ghcr.io/mlflow/mlflow
# Pull 2.2.1
docker pull ghcr.io/mlflow/mlflow:v2.2.1

Contributing

We happily welcome contributions to MLflow. We are also seeking contributions to items on the MLflow Roadmap. Please see our contribution guide to learn more about contributing to MLflow.

Core Members

MLflow is currently maintained by a group of core members, with significant contributions from hundreds of exceptionally talented community members.

mlflow-export-import's People

Contributors

amesar, dbczumar, kriscon-db, mingyu89, smurching


mlflow-export-import's Issues

MLFlow Exception - Databricks Personal Access Token timeout

When importing a model using the Linux host, I am experiencing what appears to be a personal access token timeout.

When performing an import from my laptop I get an error message (below) that seems to indicate the PAT has expired after approximately 16 minutes. Have you seen this before? Are you aware of any limitations on the length of time you have to do an import with a PAT?

I have no issue with smaller models that take less than 15 mins.

_rfc/experiments/07674a88fd9c4982a563b6c14999e104/8f2df6fcfcc84f868dc7949f93b89958/artifacts/rfc_oversample/model.pkl': 'MlflowException('API request failed with exception 403 Client Error: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. for url: https://xxxxxxxxxxxxxxxxxxxxxxx.blob.core.windows.net/jobs/xxxxxxxxxx/mlflow-tracking/xxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/artifacts/rfc_oversample/model.pkl?sig=ukWLiFQE%2By630OJeCkfVVyy2nqW9w3j8p8%2Fz76GGfZQ%3D&se=2022-11-23T15%3A24%3A12Z&sv=2019-02-02&spr=https&sp=w&sr=b&comp=block&blockid=MTQ1MDc4ZTBmZDc4NGMwODlhZDA3M2IxMjZmNzBmYjU%3D. Response text: AuthenticationFailedServer failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:701f6451-501e-007b-0c4f-ffc0b1000000\nTime:2022-11-23T15:24:23.0258209ZSigned expiry time [Wed, 23 Nov 2022 15:24:12 GMT] must be after signed start time [Wed, 23 Nov 2022 15:24:23 GMT]')'

Exporting artifacts of a specific version of a model

Hi,

We would like to export the artifacts of a specific version of a model.

Unfortunately, it seems that this is not possible: the function export_model from the ModelExporter class first fetches all versions, and then iteratively exports the artifacts of each model version. However, we often only need to export a specific model version. Moreover, since our artifacts can become quite big, and each model can have a lot of versions, it can take a lot of time (and storage) to first export all models, only to migrate a specific version.

Is this functionality currently on the roadmap? Otherwise, I'm happy to contribute.
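
As an interim workaround sketch (using only standard MLflow client APIs on a reasonably recent MLflow, not mlflow-export-import itself; the model name, version, and destination directory are placeholders), downloading a single version's artifacts might look roughly like this:

import mlflow
from mlflow import MlflowClient

client = MlflowClient()
mv = client.get_model_version(name="my-model", version="3")  # one specific version
local_path = mlflow.artifacts.download_artifacts(
    artifact_uri=mv.source,          # URI of the artifacts backing this version
    dst_path="exported_model_v3",    # illustrative destination directory
)
print(f"Version {mv.version} artifacts downloaded to {local_path}")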

Getting wrong value of databricks rest client

@amesar Great stuff this tool provides. I am in the process of setting up a script which exports an MLflow run from a Databricks workspace to a location. I am stuck with this issue where it says Run "some value" not found. Here is the full screenshot of the issue and the relevant code:
[screenshot of the error and the relevant code]

I believe the error is related to the Databricks REST client. The value that I am getting for the REST client is not correct: my Databricks workspace has a different address, but here I am seeing the default value. I am not sure how to set up the correct host value.

The run id exists and we can be certain that there are no issues on that end.
Looking forward to hearing from you on this issue. Thanks.

Refactor and improve mlflow_export_import.metadata tags into 3 sets of mlflow_export_import tags for ML governance

Currently the mlflow_export_import.metadata tags are a bit of a grab bag. These tags are useful for governance, provenance, and auditing purposes in regulated industries such as finance and HLS (health care and life sciences). See MLflow Export Import Source Run Tags - mlflow_export_import for full details.

  • Rename the top-level prefix from mlflow_export_import.metadata to mlflow_export_import, with 3 sub-prefix groups.
  • Rationalize these tags (and add more source tags) into 3 groups (illustrated in the sketch after this list):
    • MLflow system tags. All source MLflow system tags starting with mlflow. will be saved under the mlflow_export_import.mlflow. prefix.
    • RunInfo field tags. Source RunInfo fields are captured in tags starting with mlflow_export_import.run_info..
    • Metadata tag. Tags indicating source export metadata information such as mlflow_export_import.metadata.tracking_uri.
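
Purely as an illustration of the proposal (the keys and values below are hypothetical, not the current tag names), the three groups on an imported run might look like:

proposed_tags = {
    # 1. Source MLflow system tags, re-prefixed
    "mlflow_export_import.mlflow.source_type": "NOTEBOOK",
    # 2. Source RunInfo fields
    "mlflow_export_import.run_info.run_id": "abc123",
    # 3. Export metadata, e.g. where the run was exported from
    "mlflow_export_import.metadata.tracking_uri": "databricks",
}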

Model export/import between two local MLflow instances

Is there a way to export/import runs or models using these scripts without Databricks involved? I am trying to copy a model between two local instances of MLflow, and it seems that I can't do that without triggering the DatabricksHttpClient, which requires proper host configuration.
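
For what it's worth, copying a run's parameters, metrics, and tags between two local tracking servers can be sketched with the plain MLflow client alone. Artifact copying is omitted; the URIs, run id, and experiment id are placeholders, and only the latest value of each metric is carried over.

from mlflow import MlflowClient
from mlflow.entities import Metric, Param, RunTag

src = MlflowClient(tracking_uri="http://localhost:5000")
dst = MlflowClient(tracking_uri="http://localhost:5001")

src_run = src.get_run("<source-run-id>")
dst_run = dst.create_run(experiment_id="<target-experiment-id>")

dst.log_batch(
    dst_run.info.run_id,
    metrics=[Metric(k, v, timestamp=0, step=0) for k, v in src_run.data.metrics.items()],
    params=[Param(k, v) for k, v in src_run.data.params.items()],
    tags=[RunTag(k, v) for k, v in src_run.data.tags.items()],
)
dst.set_terminated(dst_run.info.run_id)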

Enhance mlflow-export-import tests to use two tracking servers

Currently the tests use one tracking server. When importing, we add a special prefix to the imported object (run, experiment or model) and compare it with the original source object. This is both clunky and not a true emulation of a real export import.

The goal is to launch two tracking servers - one for the source and one for the imported target objects.

The test suite will do the following:

  1. Launch two tracking servers
  2. Run tests against these servers
  3. Tear down the two servers

Related to: Issue 5 - Add pytest.fixture(scope="session") to tests
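
A hedged sketch of what such a session-scoped fixture could look like (ports, paths, and the crude startup wait are illustrative):

import subprocess
import time

import pytest

def _launch_tracking_server(port, root):
    # Start a local MLflow tracking server backed by SQLite under `root`
    return subprocess.Popen([
        "mlflow", "server",
        "--host", "127.0.0.1",
        "--port", str(port),
        "--backend-store-uri", f"sqlite:///{root}/mlflow.db",
        "--default-artifact-root", f"{root}/artifacts",
    ])

@pytest.fixture(scope="session")
def tracking_servers(tmp_path_factory):
    src_root = tmp_path_factory.mktemp("source")
    dst_root = tmp_path_factory.mktemp("target")
    procs = [_launch_tracking_server(5005, src_root), _launch_tracking_server(5006, dst_root)]
    time.sleep(5)  # naive wait for both servers to come up
    yield "http://127.0.0.1:5005", "http://127.0.0.1:5006"
    for proc in procs:
        proc.terminate()
        proc.wait()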

Version Sequence: Sort version sequence in the export log

The latest_version field of the export log has model versions in reverse order, which causes the version numbers to fall out of sequence on import, since the import into the new workspace assigns new version ids. The solution is to sort latest_version by version_id in ascending order before saving the export log.
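
Assuming the field is serialized as a list of dicts with a version entry (an assumption about the export format), the sort could be as simple as:

# Sort ascending by version number before writing the export log
latest_versions = sorted(latest_versions, key=lambda v: int(v["version"]))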

import-run fails when there are duplicate metric values with SQL backend store

Running import-run with the attached directory succeeds with MLFLOW_TRACKING_URI set to some local directory, but fails when it's set to a tracking server that is backed by SQL.

I get this error:

Options:
  input_dir: /tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a
  experiment_name: instinct
  mlmodel_fix: True
  use_src_user_id: False
  dst_notebook_dir: None
  dst_notebook_dir_add_run_id: None
in_databricks: False
importing_into_databricks: False
Importing run from '/tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a'
Traceback (most recent call last):
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 70, in _import_run
    self._import_run_data(src_run_dct, run_id, src_run_dct["info"]["user_id"])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 105, in _import_run_data
    run_data_importer.log_metrics(self.mlflow_client, run_dct, run_id, MAX_METRICS_PER_BATCH)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 38, in log_metrics
    _log_data(run_dct, run_id, batch_size, get_data, log_data)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 19, in _log_data
    log_data(run_id, batch)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 37, in log_data
    client.log_batch(run_id, metrics=metrics)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/client.py", line 1099, in log_batch
    self._tracking_client.log_batch(run_id, metrics, params, tags)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/_tracking_service/client.py", line 415, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 341, in log_batch
    self._call_endpoint(LogBatch, req_body)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 57, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 280, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 206, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(pymysql.err.IntegrityError) (1062, "Duplicate entry 'inside_3-16686312949540-0-557f04b0a45c4486b067a7e4205bb1b9-0-1' for key 'metrics.PRIMARY'")
[SQL: INSERT INTO metrics (`key`, value, timestamp, step, is_nan, run_uuid) VALUES (%(key)s, %(value)s, %(timestamp)s, %(step)s, %(is_nan)s, %(run_uuid)s)]
[parameters: ({'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994150, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994160, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994170, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994180, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994190, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0166667, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.016, 'timestamp': 16686312994210, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}  ... displaying 10 of 891 total bound parameter sets ...  {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949680, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949690, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a.tar.gz
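
A hedged workaround sketch for the importer: drop exact duplicates before calling log_batch, since the SQL backend enforces a uniqueness constraint over the metric's key, value, timestamp, step, and run id. The metrics variable below is assumed to be a list of mlflow.entities.Metric objects built from the exported run.

def dedupe_metrics(metrics):
    # Keep only the first occurrence of each (key, value, timestamp, step) tuple
    seen = set()
    unique = []
    for m in metrics:
        fingerprint = (m.key, m.value, m.timestamp, m.step)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(m)
    return unique

# client.log_batch(run_id, metrics=dedupe_metrics(metrics))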

HTTP status code: 400. Reason: Bad Request when importing experiments from Legacy workspace into new

Receiving errors importing experiments on a new, blank E2 workspace. The old, legacy workspace was exported successfully, but errors such as the following appear while importing:

Creating Databricks workspace directory '/Users/[email protected]/example'
Traceback (most recent call last):
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/bulk/import_experiments.py", line 15, in _import_experiment
    importer.import_experiment(exp_name, exp_input_dir)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/experiment/import_experiment.py", line 35, in import_experiment
    mlflow_utils.set_experiment(self.mlflow_client, self.dbx_client, exp_name)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/mlflow_utils.py", line 56, in set_experiment
    create_workspace_dir(dbx_client, os.path.dirname(exp_name))
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/mlflow_utils.py", line 97, in create_workspace_dir
    dbx_client.post("workspace/mkdirs", { "path": workspace_dir })
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 50, in post
    return json.loads(self._post(resource, data).text)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 46, in _post
    self._check_response(rsp,uri)
  File "/home/ec2-user/mlflow-export-import/lib64/python3.7/site-packages/mlflow_export_import/common/http_client.py", line 63, in _check_response
    raise MlflowExportImportException(f"HTTP status code: {rsp.status_code}. Reason: {rsp.reason}. URI: {uri}. Params: {params}.")
mlflow_export_import.common.MlflowExportImportException: HTTP status code: 400. Reason: Bad Request. URI: https://my-workspace.cloud.databricks.com/api/2.0/workspace/mkdirs. Params: None.

Any way to see what the request is to determine why it is getting a 400? Could this be because the workspace is blank and the users do not exist yet?

Use pre-commit to standardize code formatting

Request Summary

I would like to implement a tool like pre-commit to handle auto-code formatting and quality checks. This would be very helpful for onboarding new contributors.

Let me know if this is of interest; I'm happy to help implement it.

As a contributor I would like:

  • A consistent code format across the repo
  • A way to enforce this code formatting on my own code without having to think too hard (see black)
  • A one time formatting of all existing code to match this style

Implementation Details

Step 1)

Add a new .pre-commit-config.yaml file at the root of the repo (I'll explain below what this does)

exclude: docs|.git|.tox
default_stages: [commit]
fail_fast: false

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-yaml
    -   id: check-ast
    -   id: check-docstring-first
    -   id: check-merge-conflict
    -   id: mixed-line-ending

-   repo: https://github.com/timothycrosley/isort
    rev: 5.10.1
    hooks:
    -   id: isort
        args: [--profile, black]

-   repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
    -   id: black-jupyter

-   repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
    rev: v2.4.0
    hooks:
    -   id: pretty-format-yaml
        args: [--autofix, --indent, '4']
    -   id: pretty-format-ini
        args: [--autofix]
    -   id: pretty-format-toml
        args: [--autofix]

# sets up .pre-commit-ci.yaml to ensure pre-commit dependencies stay up to date
ci:
    autoupdate_schedule: weekly
    skip: []
    submodules: false

The above config file sets up a number of tools to run automatically on edited files when the git commit action is performed:

  • black python code autoformatting
  • isort python import sorting
  • More!
    • Trailing whitespace cleanup
    • Newline Adders
    • Merge Conflict Checkers
    • toml/yaml/ini autoformatting

Step 2)

Commit the above file and install pre-commit:

pip install pre-commit
pre-commit install
pre-commit autoupdate

Run a one-time code cleanup of everything:

pre-commit run --all-files

Step 3)

Push all of these changes up into GitHub. This can be a painful part of implementing a tool like pre-commit since there will be a massive diff - I recommend the original maintainer be the one to push those changes to retain git blame history.

Step 4)

Add some details for new contributors. I have an example here I try to re-use across GitHub: https://juftin.com/camply/contributing.html

Step 5)

Nothing; new contributors' code will auto-format during commit, and they'll learn an awesome tool while they're at it.

Make export_all work correctly

Make export.sh work properly.

Notes:

  1. Make the output directory the same as that of export_models.sh
  2. Figure out how to correctly export all experiments/runs and have them correctly map to a registered model version's run_id

Mlflow host or token is not configured correctly (Open-source)

Hi!
I'm wondering if it's possible to export/import MLflow objects (experiment, run, model, etc.) against an MLflow deployment with a remote tracking server, backend store, and artifact store, i.e. without using Databricks.
When I try, I get this error: "Mlflow host or token is not configured correctly (Open-source)".
Can someone help me? Thank you
Victor

get_mlflow_host_token() does not load token in environment variable.

We want to use mlflow-export-import to migrate models between OSS tracking servers in an enterprise setting (at a bank). However, since our tracking servers are both behind OAuth2 proxies, support for bearer tokens is essential for us to make it work.

I inspected the code, and the reason for this is that the function get_mlflow_host_token() does not actually load the token from the environment variable, and hence returns None.

Is this on purpose? Otherwise, I have created a PR that fixes the issue.
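
For illustration only (this is not the actual mlflow-export-import code), the desired behavior might look something like reading the standard MLflow environment variables:

import os

def get_mlflow_host_token():
    # MLFLOW_TRACKING_URI / MLFLOW_TRACKING_TOKEN are standard MLflow env vars;
    # the token is what an OAuth2 proxy would accept as a bearer credential.
    host = os.environ.get("MLFLOW_TRACKING_URI")
    token = os.environ.get("MLFLOW_TRACKING_TOKEN")
    return host, token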

importing models with mlflow-artifacts: source

Importing models whose source starts with "mlflow-artifacts:" fails. This is because there is a file existence check override for "dbfs:" but not for "mlflow-artifacts:".
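
A sketch of the described fix, with a hypothetical helper name (the real code's structure may differ): extend the existing "dbfs:" override to also cover "mlflow-artifacts:" sources.

def _skip_local_file_existence_check(source: str) -> bool:
    # Artifact sources served by the tracking server or DBFS cannot be
    # checked as local files, so skip the existence check for both schemes.
    return source.startswith(("dbfs:", "mlflow-artifacts:"))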

Add all entrypoints to a single Click Group

Currently this package uses a number of entrypoints to access different functionality. Moving all of these Click commands under a single Click group would be much easier.

I've already performed this work on an old fork of https://github.com/amesar/mlflow-export-import/. I'll open a PR on this repo instead.

Here's what it would look like:

mlflow-export-import --help
Usage: mlflow-export-import [OPTIONS] COMMAND [ARGS]...

  MLflow Export / Import CLI: Command Line Interface

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  export-all          Export the entire tracking server All registered...
  export-experiment   Exports an experiment to a directory.
  export-experiments  Exports experiments to a directory.
  export-model        Export a registered model and all the experiment runs...
  export-models       Exports models and their versions' backing Run along...
  export-run          Exports a run to a directory.
  find-artifacts      Find artifacts that match a filename
  http-client         Interact with the MLflow Export/Import HTTP Client
  import-experiment   Import an experiment from a directory.
  import-experiments  Import a list of experiment from a directory.
  import-model        Import a registered model and all the experiment runs...
  import-models       Imports models and their experiments and runs.
  import-run          Imports a run from a directory.
  list-models         Lists all registered models.

Runs imported but models not after using import-models

Hi, thank you so much for developing this package. I am trying to migrate experiments, runs, and models from our old MLflow server to our new one. I successfully exported the models using export-models --output-dir mlflow_model_output_dir2 --models shortage_lgb_resource_count_3dp_all_21mth,shortage_lgb_open_duty_ct_3dp_all_21mth --export-source-tags True --export-all-runs True. I then changed the MLflow URI to our new location and used import-models. The experiments and runs were all imported, and in the model registry both model names are there. However, under each registered model name there are no versions. In the experiment UI there are no links to models, but I can find the artifacts along with the "register model" button if I click through into a single run's details. I have attached screenshots of the original model registry for the model I am moving, the new model registry, and the output of the import-models run.
[screenshots: original_mlflow_server_model_registry, new_mlflow_server_model_registry, terminal_import_model_finished_]

Prevent duplicate active stages (Production, Staging) on multiple imports into same registered model

If you import versions into a model with multiple import-model calls, MLflow UI semantics of having just one active stage were not being honored. Duplicate active stages (Production, Staging) were observed, i.e. two or more Production stages which cannot occur in the UI.

Problem: In import_model.py, the archive_existing_versions argument in the MlflowClient.transition_model_version_stage() call was not set (the default is False).

Fix: Set the archive_existing_versions argument to True to avoid multiple active stages.
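
Sketched with the standard MlflowClient API (model name, version, and stage are placeholders), the fix amounts to:

from mlflow import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version="5",
    stage="Production",
    archive_existing_versions=True,  # archive any other version currently in this stage
)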

See:

Object of type MlflowExportImportException is not JSON serializable

When running import-all --input-dir <my_directory>

Traceback (most recent call last):
  File "/databricks/python3/bin/import-all", line 8, in <module>
    sys.exit(main())
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/bulk/import_models.py", line 117, in main
    import_all(
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/bulk/import_models.py", line 76, in import_all
    utils.write_json_file(fs, "import_report.json", dct)
  File "/databricks/python3/lib/python3.8/site-packages/mlflow_export_import/utils.py", line 78, in write_json_file
    fs.write(path, json.dumps(dct,indent=2)+"\n")
  File "/usr/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type MlflowExportImportException is not JSON serializable
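
A hedged sketch of one possible fix: let json.dumps fall back to str() for objects it cannot serialize, such as the MlflowExportImportException instances collected in the report (this mirrors the write_json_file frame in the traceback above):

import json

def write_json_file(fs, path, dct):
    # default=str turns non-serializable objects (e.g. exceptions) into strings
    fs.write(path, json.dumps(dct, indent=2, default=str) + "\n")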
