
Environmental Code Challenge

Jul 18th, 2018

Classification and prediction project


Notebook • Main Features • Usage • Dependencies • Folder Structure


Overview

Task 1 - Classify wind turbine failure


Classify whether the turbine will break down within the next 40 days.


predictive_maintenance_dataset.csv contains operational settings and sensor measurements for many wind turbines:

  • operational_setting_1
  • operational_setting_2
  • sensor_measurement_1
  • sensor_measurement_2 ...

There is a column called unit_number which specifies which turbine it is, and one called status, in which a value of 1 means the turbine broke down that day, and 0 means it didn't.

The task is to create a model that, when fed with operational settings and sensor measurements (unit_number and time_stamp will not be fed in), outputs 1 if the turbine will break down within the next 40 days, and 0 if not.
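Since status only marks the day a turbine actually broke down, a 40-day lookahead target has to be derived before training. The snippet below is a minimal sketch of one way to do this, assuming time_stamp holds daily dates; the derived column name target is illustrative and not part of the dataset.

import pandas as pd

# Hedged sketch: derive a label that is 1 when the unit breaks down within the
# next 40 days, using the status column as the breakdown marker.
df = pd.read_csv('data/predictive_maintenance_dataset.csv', parse_dates=['time_stamp'])

def label_within_40_days(group):
    failure_dates = group.loc[group['status'] == 1, 'time_stamp']
    if failure_dates.empty:
        # this unit never broke down in the data: all labels are 0
        return pd.Series(0, index=group.index)
    days_to_failure = (failure_dates.min() - group['time_stamp']).dt.days
    return ((days_to_failure >= 0) & (days_to_failure <= 40)).astype(int)

df['target'] = df.groupby('unit_number', group_keys=False).apply(label_within_40_days)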

For a closer look at the process, please review the Jupyter Notebook

Task 2 - Predict city pollution


Predict the pollution value six hours ahead.

forecasting_dataset.csv is a file that contains pollution data for a city. The task is to create a model that, when fed with columns co_gt, nhmc, c6h6, s2, nox, s3, no2, s4, s5, t, rh, ah, and level, predicts the value of y six hours later.
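A straightforward way to frame this is to shift the target column by six rows, assuming the data is hourly and sorted by time. The sketch below is illustrative only; the name y_future is not part of the dataset.

import pandas as pd

# Hedged sketch: align each row's features with the pollution value observed
# six rows (hours) later, then drop rows without a future value.
df = pd.read_csv('data/forecasting_dataset.csv')

features = ['co_gt', 'nhmc', 'c6h6', 's2', 'nox', 's3',
            'no2', 's4', 's5', 't', 'rh', 'ah', 'level']

df['y_future'] = df['y'].shift(-6)
df = df.dropna(subset=['y_future'])

X, y = df[features], df['y_future']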

For a closer look at the process, please review the Jupyter Notebook

Notebook

A write-up explaining the design decisions, potential future work, and the reasoning behind the current choices: Notebook

To inspect the log files, open them with TensorBoard by running the command below in a terminal from the directory that contains the logs folder:

tensorboard --logdir=logs

Main Features

For both tasks, the model is packaged as a pipeline.

Task 1 pipeline contains (a sketch of the encoding and feature-selection steps follows the list):

  • Get dummies from the categorical variable and drop one level
  • Select only the features that appear in the training set
  • Impute missing values with the mean
  • Feed-forward neural network with Keras
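The first two steps can be implemented as a single custom transformer that remembers which dummy columns were produced at fit time and reindexes new data to match. The class below is a minimal sketch under that assumption, not the repository's actual implementation.

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# Hedged sketch: one-hot encode the categorical column(s) with one level
# dropped, and keep only the dummy columns that were seen during training.
class DummyEncoder(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.columns_ = pd.get_dummies(X, drop_first=True).columns
        return self

    def transform(self, X):
        dummies = pd.get_dummies(X, drop_first=True)
        # reindex to the training columns: unseen levels are dropped,
        # missing levels are filled with 0
        return dummies.reindex(columns=self.columns_, fill_value=0)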

Task 2 pipeline contains (a sketch of how such a pipeline can be assembled follows the list):

  • Select only the features that appear in the training set
  • Get dummies from the categorical variable
  • Impute missing values with the mean
  • Feed-forward neural network with Keras
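Both pipelines follow the same pattern, so a single sketch covers the idea. It reuses the DummyEncoder from the sketch above; the layer sizes, build_network, and every step name except 'model' are illustrative, with 'model' taken from the named_steps['model'] access in the Usage section below.

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer   # sklearn.preprocessing.Imputer in older scikit-learn
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense

def build_network(input_dim=24):
    # small feed-forward network; the architecture here is purely illustrative
    net = Sequential()
    net.add(Dense(64, activation='relu', input_dim=input_dim))
    net.add(Dense(1, activation='sigmoid'))
    net.compile(loss='binary_crossentropy', optimizer='adam')
    return net

pipeline = Pipeline([
    ('dummies', DummyEncoder()),   # custom transformer sketched above
    ('impute', SimpleImputer(strategy='mean')),
    ('model', KerasClassifier(build_fn=build_network, epochs=10, verbose=0)),
])

For task 2 (a regression problem), the final step would instead use KerasRegressor with a linear output layer and a mean squared error loss.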

Usage

Download the models, saved as pickle files, from the results folder.

For task 1:

  • Load the pipeline first
  • Then load the Keras model into the pipeline (use Keras 1.2 to load the model)
# Imports (Xval / yval below are the validation features and labels prepared beforehand)
import numpy as np
import pandas as pd
from sklearn.externals import joblib          # or simply `import joblib` with newer scikit-learn
from sklearn.metrics import mean_squared_error
from keras.models import load_model
from src.task1.reform_results import hard_label

# Load the pipeline first:
pl_load_in = joblib.load('../../results/task1_pipeline.pkl')

# Then, load the Keras model into the pipeline:
pl_load_in.named_steps['model'].model = load_model('../../results/task1_keras_model.h5')

# Test the model: compute and print the MSE on the validation set
ypred = pl_load_in.predict(Xval)
mse = mean_squared_error(yval, ypred)
print("Mean squared error: %f" % mse)

# Reset the index for comparison (can be omitted if yval already has a clean index)
yval2 = yval.reset_index(drop=True)

# Assign hard labels (hard_label() is defined in src.task1.reform_results)
new_ypred = pd.DataFrame(ypred)[0].apply(hard_label)

# Compute and print the accuracy on the validation set
accuracy = float(np.sum(new_ypred == yval2)) / yval2.shape[0]
print("accuracy: {}%".format(round(accuracy * 100, 3)))

For task 2:

Load the pipeline; the model is already included in it.

# Imports (Xtest / ytest are the test features and labels prepared beforehand)
import pickle
from sklearn.metrics import r2_score

# Load the model from disk
filename = 'results/task2_model.pkl'   # path to the pickled model
loaded_model = pickle.load(open(filename, 'rb'))

# Test the model
ypred = loaded_model.predict(Xtest)
print("R squared score is:", r2_score(ytest, ypred).round(3))

Dependencies

  • numpy
  • pandas
  • missingno
  • imbalanced-learn
  • sklearn
  • statsmodels
  • keras (2.0 for modelling; 1.2 if you only need to load and use the saved models)
  • matplotlib
  • seaborn
  • scikitplot

Folder Structure

The hierarchy of this repository is as follows:

     .
     |-- README 
     |-- LICENSE
     |-- .gitignore.py        
     |-- data
     |   -- predictive_maintenance_dataset.csv
     |   -- forecasting_dataset.csv
     |-- doc 
     |   -- notebook.md         # electronic lab notebook
     |   -- manuscript.md       
     |-- results               # stores all the resulting models
     |-- src                   # source code used for both tasks
     |   -- task1              # code specific to task 1
     |   -- task2              # code specific to task 2
     |-- test                  # tests for functions
     |-- assets                # stores images
     |-- bin                   # files I want to delete but am not sure whether I will need them later
