Comments (11)
Note that this is related to #99, but a different route is suggested for python version. We should consider how the mlflow version might affect the reproducibility of results. Presumably, the mlflow version should not change the actual training results (unless relying on new mlflow features, in which case it's somewhat likely that you'd need to update mlflow outside of the conda environment).
The proposal above (that we don't override the mlflow version if it's explicitly provided) is consistent with the proposal in #99; if I want to fix my mlflow version for extra reproducibility, I may.
from mlflow.
I don't think MLflow necessarily has to be inside the user environment. Someone might use MLflow to package and run their project without using it inside the project. And if it is used inside, it can be a different version than the one that executes the run. Injecting the version of the "outside" MLflow would break reproducibility in some cases.
But I can see how that is confusing. Maybe we should just issue a warning if the versions do not match?
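A minimal sketch of the warning idea, assuming the project's pip requirements have already been read out of its conda file as a list of strings. The helper names are hypothetical and not part of MLflow, and only exact `==` pins are recognized.

```python
import re
import warnings


def pinned_mlflow_version(pip_deps):
    """Return the version pinned with '==' for mlflow, or None if unpinned or absent."""
    for dep in pip_deps:
        match = re.fullmatch(
            r"\s*mlflow\s*==\s*([^\s;]+)\s*", dep, flags=re.IGNORECASE
        )
        if match:
            return match.group(1)
    return None


def warn_on_mlflow_mismatch(pip_deps, host_version):
    """Warn (rather than override) when the project pins a different mlflow version.

    Returns True if a warning was issued, False otherwise.
    """
    pinned = pinned_mlflow_version(pip_deps)
    if pinned is not None and pinned != host_version:
        warnings.warn(
            f"Project environment pins mlflow=={pinned}, "
            f"but the run is launched by mlflow {host_version}.",
            stacklevel=2,
        )
        return True
    return False
```

In practice `host_version` could be taken from `mlflow.__version__`. The project environment is left exactly as the user wrote it.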
I think at the very least, if mlflow is listed in the conda environment without a version, it should definitely be set to the version outside the environment. In any case, if you don't pin the version, the run is not reproducible.
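To make that proposal concrete, here is a rough sketch that assumes the conda file has already been parsed into a plain dict with the usual `dependencies` list and nested `pip` entry. `pin_unversioned_mlflow` is a hypothetical helper, not an existing MLflow function. It pins only a bare `mlflow` entry and leaves any entry with an explicit specifier alone.

```python
import re

# Matches a bare "mlflow" requirement with no version specifier or extras.
_BARE_MLFLOW = re.compile(r"\s*mlflow\s*", flags=re.IGNORECASE)


def pin_unversioned_mlflow(conda_env, host_version):
    """Return a copy of a parsed conda env with a bare 'mlflow' pip entry pinned.

    Entries that already carry a version specifier (e.g. 'mlflow==0.1.0' or
    'mlflow>=0.2') are left untouched. The input dict is not modified.
    """
    new_deps = []
    for dep in conda_env.get("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            pip_deps = [
                f"mlflow=={host_version}" if _BARE_MLFLOW.fullmatch(p) else p
                for p in dep["pip"]
            ]
            dep = {**dep, "pip": pip_deps}
        new_deps.append(dep)
    return {**conda_env, "dependencies": new_deps}
```

Because the rewritten environment differs from the file on disk, the result is no longer identical to loading the environment directly with conda.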
True. But specifying no version at least gives you an implicit >= current_version requirement. By fixing the version to that of the outside environment, you could go several versions back.
I don't think this is a complete argument; any conda env that includes mlflow via pip may eventually run into version incompatibility and caching problems. I think we should either explicitly not recommend this pattern (through docs, warnings, or errors), or give it different semantics.
One reason to enable this pattern by giving it different semantics is to give an easy way for users to keep up to date with mlflow, by tying it to the host version. On the other hand, it adds complexity (the behavior is not identical to loading the environment directly).
There may be alternative solutions to consider, such as invalidating our conda cache when we detect that the pip dependencies are out of date, but that may cost extra computation time, which should be evaluated.
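One cheap way to invalidate the cache could look like the sketch below. It assumes the cached environment name is derived from a hash of the conda file contents, which I have not verified against the MLflow source. The `extra_key` parameter is a hypothetical addition: folding the host mlflow version (or any other staleness signal) into it changes the name and forces a fresh environment.

```python
import hashlib


def conda_env_name(conda_yaml_text, extra_key=""):
    """Derive a cache name of the form 'mlflow-<sha1>' from the env file contents.

    `extra_key` is a hypothetical extension: any change to it yields a new
    name, so a previously cached environment is no longer reused.
    """
    digest = hashlib.sha1()
    digest.update(conda_yaml_text.encode("utf-8"))
    digest.update(b"\0")  # separator so the text and key cannot run together
    digest.update(extra_key.encode("utf-8"))
    return "mlflow-" + digest.hexdigest()
```

This avoids resolving pip dependencies on every run, at the cost of rebuilding the environment whenever the key changes, even if the resolved packages would have been the same.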
I completely agree that by not specifying the mlflow version you risk version incompatibility and loss of reproducibility, and I think we should definitely discourage the pattern in the docs and maybe also detect it at runtime and issue warnings.
But (personally) I don't think it's a good idea to force non-standard conda behavior on users. I find it odd to inject the version from the outside environment, which may differ between users. Besides, why should mlflow be treated differently from other packages with no version specified?
I think caching is a separate issue. Can't we fetch the latest version from conda and tie it to that instead of the environment version?
but I would have to manually delete the mlflow-$sha conda environment in order to invalidate it.
On two separate occasions, I've had to do this to get things working again!
This is very similar to what we do for models. When serving models we do need mlflow to be installed inside of the model environment. We have decided not to install MLflow into the model environment by default. The model environment is expected to include mlflow. However, we did add a command line option to force-install mlflow into the environment.
=> I think we should not change the current behavior but it may be a good idea to add a custom flag.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has not been interacted with for 60 days since it was marked as stale! As such, the issue is now closed. If you have this issue or more information, please re-open the issue!