pet_adoption

Experimenting with machine learning models that attempt to predict pet adoption speeds

Introduction

This repository contains the scripts, data sets and report required for the final module of the Professional Certificate in Data Science learning programme by HarvardX.

The final Course Capstone (PH125.9x) requires the student to apply machine learning techniques that go beyond standard linear regression. The student uses a publicly available dataset to solve a problem of their choice.

The ability to clearly communicate the process and insights gained from an analysis is an important skill for data scientists.

The data set can be selected from Kaggle or the UCI Machine Learning Repository.

Summary of file contents

The R script file contains the code used for data exploration, visualisation and building the prediction models.
The Rmd file contains the code that generates the PDF report, which describes the process and outcome of developing models that predict pet adoption speeds.
The R object files (.rds) contain the models that the Rmd script loads to generate the PDF report.

Detailed description of files in this repository

File sets (required)

Report in PDF: cyo_project.PDF
Report in Rmd: cyo_project.Rmd
Script in R that generates my prediction models and result findings: cyo_project.R
Data files in CSV format, 4 files in total (pets.csv, state_labels.csv, color_labels.csv, breed_labels.csv)
Prediction models stored as R objects in RDS format, which the Rmd file loads to produce the PDF report. Without these files, running all the experimental models can take hours (in my case, days on a standard MacBook Air with 8GB RAM). The Rmd script uses 4 files (model_1.rds, model_2.rds, model_3_ntrees.rds, model_3.rds).
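
As a rough sketch of how the saved models might be loaded, base R's readRDS can restore each file. The helper name load_models and its dir and files arguments are hypothetical and are not taken from cyo_project.Rmd; only the four file names come from the list above.

```r
# File names as listed above; everything else here is an illustrative assumption.
model_files <- c("model_1.rds", "model_2.rds", "model_3_ntrees.rds", "model_3.rds")

# Hypothetical helper: read every saved model from `dir` into a named list.
load_models <- function(dir = ".", files = model_files) {
  paths <- file.path(dir, files)
  absent <- files[!file.exists(paths)]
  if (length(absent) > 0) {
    # Fail early rather than retraining models that take hours to build.
    stop("Missing model files: ", paste(absent, collapse = ", "))
  }
  models <- lapply(paths, readRDS)
  # Name each element after its file, without the .rds extension.
  names(models) <- tools::file_path_sans_ext(files)
  models
}

# Usage (assumes the .rds files sit in the current working directory):
# models <- load_models()
# names(models)
```

Stopping on a missing file is a design choice for this sketch: it makes an incomplete download obvious before the report starts to knit.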

File sets (optional)

The results.rds file contains the final results (it is not loaded by the Rmd script and is included for reference only).
The model_3_modellist_ntrees.rds file, which contains the models for all ntree values of model 3, is 280MB and could not be uploaded to GitHub or the edX site. You may run the code that generates model_3 in the cyo_project.R file, or contact me if you would like a copy. The file is optional and is not required by the Rmd script to generate the PDF file.

Note

All the R, Rmd, CSV data files and RDS object files must be saved in the same directory for the Rmd and R scripts to run correctly.
The R script uses an rstudioapi call to set the working directory so that it picks up the CSV data files correctly.
The Rmd script uses a locally defined variable 'working_dir' to locate the CSV data files and the R object files that contain the prediction models. Change 'working_dir' in the Rmd file to the local directory where the files are stored on your machine.
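
The directory setup described in this note could look roughly like the following. This is a minimal sketch, not the code in cyo_project.R or cyo_project.Rmd: the helper resolve_project_dir and its fallback argument are hypothetical, and only 'working_dir' and the CSV file names come from this README.

```r
# Hypothetical helper: use the folder of the active RStudio document when
# RStudio is running, otherwise fall back to a directory supplied by hand.
resolve_project_dir <- function(fallback = getwd()) {
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    path <- rstudioapi::getActiveDocumentContext()$path
    if (nzchar(path)) {
      return(dirname(path))
    }
  }
  fallback
}

# In the Rmd file this would be edited to point at your local copy.
working_dir <- resolve_project_dir()

# Check that the four CSV data files are where the scripts expect them.
csv_files <- c("pets.csv", "state_labels.csv", "color_labels.csv", "breed_labels.csv")
absent <- csv_files[!file.exists(file.path(working_dir, csv_files))]

if (length(absent) > 0) {
  message("Not found in ", working_dir, ": ", paste(absent, collapse = ", "))
} else {
  pets <- read.csv(file.path(working_dir, "pets.csv"))
}
```

Outside RStudio (for example when running with Rscript) the helper simply returns the fallback, so the same check works in both settings.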

