
Comments (11)

aarondav commented on May 18, 2024

Note that this is related to #99, but a different route is suggested for python version. We should consider how the mlflow version might affect the reproducibility of results. Presumably, the mlflow version should not change the actual training results (unless relying on new mlflow features, in which case it's somewhat likely that you'd need to update mlflow outside of the conda environment).

The proposal above (that we don't override the mlflow version if it's explicitly provided) is consistent with the proposal in #99; if I want to fix my mlflow version for extra reproducibility, I may.


aarondav commented on May 18, 2024

@mateiz @tomasatdatabricks


tomasatdatabricks commented on May 18, 2024

I don't think MLflow necessarily needs to be inside the user environment. Someone might use MLflow to package and run their project but not use it within the project. And if it is used inside, it can be a different version from the one that executes the run. Injecting the version of the "outside" MLflow would break reproducibility in some cases.

But I can see how that is confusing. Maybe we should just issue a warning if the versions do not match?
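A minimal sketch of what such a warning could look like, assuming the project's pip dependencies are already available as a list of requirement strings. The function names and the version numbers in the usage note are hypothetical and are not part of MLflow:

```python
import re
import warnings


def find_mlflow_spec(pip_deps):
    """Return the requirement string that names mlflow, or None if absent."""
    for dep in pip_deps:
        # Take everything before the first version operator, extra, or space.
        name = re.split(r"[=<>!~\[ ]", dep.strip(), maxsplit=1)[0]
        if name.lower() == "mlflow":
            return dep.strip()
    return None


def check_mlflow_version(pip_deps, host_version):
    """Warn when the project's mlflow requirement differs from the host's.

    Returns the warning message, or None when there is nothing to report.
    """
    spec = find_mlflow_spec(pip_deps)
    if spec is None:
        # The project does not use mlflow internally; nothing to compare.
        return None
    match = re.fullmatch(r"mlflow\s*==\s*(\S+)", spec, flags=re.IGNORECASE)
    if match is None:
        message = (
            f"mlflow is listed without an exact version ({spec!r}); "
            f"the run is launched by mlflow {host_version}"
        )
    elif match.group(1) != host_version:
        message = (
            f"project pins mlflow {match.group(1)} but the run is "
            f"launched by mlflow {host_version}"
        )
    else:
        return None
    warnings.warn(message)
    return message
```

Calling `check_mlflow_version(["numpy", "mlflow==0.4.1"], "0.4.2")` would warn about the mismatch without touching the environment, so the project still resolves exactly as plain conda would resolve it.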


aarondav commented on May 18, 2024

I think at the very least, if mlflow is listed in the conda environment without a version, it should definitely be set to the version outside the environment. In any case, if you don't fix the version, the run is not reproducible.
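To make the proposal concrete, here is a rough sketch, assuming the pip dependencies are available as a list of requirement strings. The helper name is hypothetical and this is not existing MLflow behavior:

```python
def pin_unversioned_mlflow(pip_deps, host_version):
    """Return a copy of pip_deps where a bare "mlflow" entry is pinned.

    Entries that already carry a version constraint are left untouched,
    so an explicit user choice is never overridden.
    """
    pinned = []
    for dep in pip_deps:
        if dep.strip().lower() == "mlflow":
            pinned.append(f"mlflow=={host_version}")
        else:
            pinned.append(dep)
    return pinned
```

With a hypothetical host version of 0.4.2, `["numpy", "mlflow"]` would become `["numpy", "mlflow==0.4.2"]`, while `["mlflow==0.3.0"]` would be returned unchanged.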


tomasatdatabricks commented on May 18, 2024

True. But specifying no version at least gives you an implicit >= current_version requirement. By fixing the version to that of the outside environment, you could go several versions back.


aarondav commented on May 18, 2024

I don't think this is a complete argument; any conda env that includes mlflow via pip may eventually run into version incompatibility and caching problems. I think we should either explicitly not recommend this pattern (through docs, warnings, or errors), or give it different semantics.

One reason to enable this pattern by giving it different semantics is to give an easy way for users to keep up to date with mlflow, by tying it to the host version. On the other hand, it adds complexity (the behavior is not identical to loading the environment directly).

There may be alternative solutions to consider, such as invalidating our conda cache when we identify that the pip dependencies are out of date, but that may take extra computation time, which should be evaluated.
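As a rough illustration of the caching problem, assume the cached environment is named "mlflow-" plus a SHA-1 of the environment file's text (this naming is an assumption based on the mlflow-$sha name mentioned in this thread, and the function below is hypothetical). An unpinned mlflow entry never changes the file text, so the cached environment is reused indefinitely. Folding the host version into the hash is one cheap way to break the cache; it is a different technique from actually checking whether the pip dependencies are out of date:

```python
import hashlib


def env_cache_name(env_file_text, host_mlflow_version=None):
    """Derive a cache name for a conda environment.

    With host_mlflow_version=None the name depends only on the file text.
    Passing the host version makes the name change whenever the host
    mlflow is upgraded, which forces the environment to be rebuilt.
    """
    digest = hashlib.sha1(env_file_text.encode("utf-8"))
    if host_mlflow_version is not None:
        digest.update(host_mlflow_version.encode("utf-8"))
    return "mlflow-" + digest.hexdigest()
```

The trade-off is the one raised above: the environment is rebuilt on every host upgrade, even when the project pins its own mlflow version and nothing inside the environment would change.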


tomasatdatabricks commented on May 18, 2024

I completely agree that by not specifying the mlflow version you are risking version incompatibility and loss of reproducibility, and I think we should definitely discourage the pattern in docs and also maybe detect it at runtime and issue warnings.

But (personally) I don't think it's a good idea to enforce non-standard conda behavior on users. I find it weird to inject the version of the outside environment, which may differ for different users. Besides, why should it be different for mlflow vs. other packages with no version specified?

I think caching is a separate issue. Can't we fetch the latest version from conda and tie it to that instead of the environment version?


andyk commented on May 18, 2024

but I would have to manually delete the mlflow-$sha conda environment in order to invalidate it.

On two separate occasions, I've had to do this to get things working again!


tomasatdatabricks commented on May 18, 2024

This is very similar to what we do for models. When serving models, we do need mlflow to be installed inside the model environment. We decided not to install MLflow into the model environment by default; the model environment is expected to include mlflow already. However, we did add a command-line option to force-install mlflow into the environment.

=> I think we should not change the current behavior, but it may be a good idea to add a custom flag.


stale commented on May 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


stale commented on May 18, 2024

This issue has not been interacted with for 60 days since it was marked as stale! As such, the issue is now closed. If you have this issue or more information, please re-open the issue!
