This repository contains my solution to one of the projects in the Machine Learning DevOps Nanodegree.
You can find my results and code at the following links:
- wandb url: https://wandb.ai/mpoliti08/nyc_airbnb/overview?nw=nwusermpoliti08
- github url: https://github.com/March-08/build-ml-pipeline-for-short-term-rental-prices
The focus of this project is to build an end-to-end machine learning pipeline for short-term rental prices in NYC. The project investigates how to integrate several tools that enable us to run experiments in a clear and structured way. The tools we focus on are:
- Hydra: for configurations and hyperparameter tuning
- W&B: for artifact storage, data versioning, and training monitoring
- MLflow: to orchestrate the whole ML lifecycle
The open source dataset is about rental prices in New York City, provided by Airbnb.
All the steps of the pipeline can be run through MLflow, so it is the only thing you need to install. MLflow then takes care of installing everything else each component needs, creating an isolated virtual environment for each component. To install MLflow:
> pip install mlflow
To make sure MLflow was installed successfully, run the following command:
> pip show mlflow
Now you should be able to run the entire pipeline from the root directory using MLflow:
> mlflow run .
If you want to run only the download and basic_cleaning steps, you can similarly run:
> mlflow run . -P steps=download,basic_cleaning
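
To make the effect of the steps parameter concrete, here is a minimal Python sketch of how a comma-separated value could be turned into the list of steps to run. It is not taken from this repository's code: the function name `select_steps` and the `ALL_STEPS` list are hypothetical, and only the two step names mentioned above are included.

```python
from typing import List

# Hypothetical list of known steps, in pipeline order.
# Only the two step names mentioned in this README are used here.
ALL_STEPS = ["download", "basic_cleaning"]


def select_steps(steps_param: str, all_steps: List[str] = ALL_STEPS) -> List[str]:
    """Turn a comma-separated string into the list of steps to run."""
    requested = [s.strip() for s in steps_param.split(",") if s.strip()]

    # Fail early on names that are not part of the pipeline.
    unknown = [s for s in requested if s not in all_steps]
    if unknown:
        raise ValueError(f"Unknown steps: {unknown}")

    # Keep pipeline order regardless of the order given on the command line.
    return [s for s in all_steps if s in requested]


print(select_steps("download,basic_cleaning"))
```

Returning the steps in pipeline order, rather than in the order they were typed, is a design choice of this sketch: it keeps a cleaning step from running before the data it depends on has been downloaded.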
You can override any other parameter in the configuration file using the Hydra syntax, by providing it as a hydra_options parameter. For example, say that we want to set the parameter modeling -> random_forest -> n_estimators to 10 and etl -> min_price to 50:
> mlflow run . \
-P steps=download,basic_cleaning \
-P hydra_options="modeling.random_forest.n_estimators=10 etl.min_price=50"
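
To illustrate what the dotted override syntax means, the following plain-Python sketch applies the same two overrides to a nested dictionary. It is only an approximation for intuition, not Hydra's implementation: the helper `apply_overrides` is hypothetical, the starting values 100 and 10 are placeholders rather than the project's real defaults, and value parsing is simplified to integers and strings.

```python
import shlex
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: str) -> Dict[str, Any]:
    """Apply space-separated `a.b.c=value` overrides to a nested dict in place."""
    for item in shlex.split(overrides):
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")

        # Walk down the nested dicts, creating levels that do not exist yet.
        node = config
        for name in parents:
            node = node.setdefault(name, {})

        # Simplification: treat the value as an int when possible, else a string.
        try:
            value: Any = int(raw_value)
        except ValueError:
            value = raw_value
        node[leaf] = value
    return config


# Placeholder starting values, not the project's actual defaults.
config = {
    "modeling": {"random_forest": {"n_estimators": 100}},
    "etl": {"min_price": 10},
}
apply_overrides(config, "modeling.random_forest.n_estimators=10 etl.min_price=50")
print(config)
# {'modeling': {'random_forest': {'n_estimators': 10}}, 'etl': {'min_price': 50}}
```

Each dot in the key descends one level in the configuration, which is why modeling -> random_forest -> n_estimators is written as `modeling.random_forest.n_estimators`.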
All the steps in the pipeline produce results (in terms of performance and artifacts) that are saved to W&B.
Make sure you have a wandb account and that you are logged in:
> wandb login