Giter Club home page Giter Club logo

mlflow-demo's Introduction

mlflow-demo

Simple Demo of MLflow Project with LightGBM


Data

Adult Data Set from UCI Machine Learning Repository

Variable Explanation
age continuous
workclass Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
fnlwgt continuous
education Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
education-num continuous
marital-status Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
occupation Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
relationship Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
race White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
sex Female, Male
capital-gain continuous
capital-loss continuous
hours-per-week continuous
native-country United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands
target >50K, <=50K

Feature Engineering

Since this demo project doesn't focus on machine learning algorithms, so only simple feature engineering steps are implemented. More specifically,

  • target: transform '>50K' as 1 and '<=50K' as 0
  • fill missing categorical value with 'Missing' and apply label-encoder
  • fill missing numerical value with median value from training set
  • since the data set is imbalanced, AUC is used as the metric

Modeling

Gradient Boosting model from LigthGBM are implemented. Check the documentation for details.


How to run the code

  • Method 1:
$ git clone https://github.com/JifuZhao/mlflow-demo.git
$ mlflow run mlflow-demo \
  -P train_path=absolute path/mlflow-demo/data/train.csv \
  -P test_path=absolute path/mlflow-demo/data/test.csv \
  -P num_boost_round=1000 -P learning_rate=0.1 -P num_leaves=31 \
  -P max_depth=-1 -P min_data_in_leaf=20

Note: You need to change the file path to be your local absolute path.

After the code successfully runs, you can get the following results:

  • Method 2:
$ git clone https://github.com/JifuZhao/mlflow-demo.git
$ cd mlflow-demo/
$ python train.py ./data/train.csv ./data/test.csv 1000 0.1 31 -1 20

After running, you can view the result through MLflow UI:

$ mlflow ui

In the browser, go to http://127.0.0.1:5000, you will get the following results.


Example MLflow projects:

Copyright @ Jifu Zhao 2018

mlflow-demo's People

Contributors

jifuzhao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

mlflow-demo's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.