
group01-project03's Introduction

# README.md

## Methodology

## Continuous Feature Construction

### Price Ratios from Retail / Auction

Ratios between different auction and retail price values were found to be significant, so they were incorporated into the features. Each price column is referred to by a "key value" below, and the significant ratios are listed in terms of those keys:

#### Key Values

| Feature | Numerical Representation |
|---|---|
| Acquisition Auction Clean Price | 1 |
| Acquisition Retail Avg Price | 2 |
| Acquisition Retail Clean Price | 3 |
| Current Retail Avg Price | 6 |
| Current Retail Clean Price | 7 |

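#### Feature Ratios

| Ratio | Numerator | Denominator |
|---|---|---|
| 1 / 2 | Acquisition Auction Clean Price | Acquisition Retail Avg Price |
| 1 / 3 | Acquisition Auction Clean Price | Acquisition Retail Clean Price |
| 2 / 3 | Acquisition Retail Avg Price | Acquisition Retail Clean Price |
| 2 / 6 | Acquisition Retail Avg Price | Current Retail Avg Price |
| 2 / 7 | Acquisition Retail Avg Price | Current Retail Clean Price |
| 3 / 6 | Acquisition Retail Clean Price | Current Retail Avg Price |
| 3 / 7 | Acquisition Retail Clean Price | Current Retail Clean Price |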
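A minimal sketch of how the seven ratios could be computed for a single car record in plain Python. The snake_case column names, the `ratio_<num>_<den>` feature names, and returning `None` for a missing value or a zero denominator are all assumptions for illustration, not taken from the project code.

```python
# Hypothetical column names for the key values in the table above.
# The key numbers come from the README; the snake_case names are assumptions.
PRICE_KEYS = {
    1: "acq_auction_clean_price",
    2: "acq_retail_avg_price",
    3: "acq_retail_clean_price",
    6: "cur_retail_avg_price",
    7: "cur_retail_clean_price",
}

# The significant ratios, as (numerator key, denominator key) pairs.
RATIO_PAIRS = [(1, 2), (1, 3), (2, 3), (2, 6), (2, 7), (3, 6), (3, 7)]


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if a value is missing or the denominator is zero."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def add_price_ratios(row):
    """Return a copy of `row` (a dict of column -> value) with one feature per ratio pair."""
    out = dict(row)
    for num_key, den_key in RATIO_PAIRS:
        name = f"ratio_{num_key}_{den_key}"  # hypothetical feature naming scheme
        out[name] = safe_ratio(row.get(PRICE_KEYS[num_key]), row.get(PRICE_KEYS[den_key]))
    return out
```

For example, a car with an acquisition auction clean price of 6000 and an acquisition retail average price of 8000 would get `ratio_1_2 = 0.75`.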

### Miles Per Year

The number of miles a car was driven per year was found to have significant explanatory power, which makes sense because it often reflects how hard the car was driven. Therefore, `miles_per_year` is also included in the final feature selection.
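The README does not say how `miles_per_year` was computed. A plausible sketch, assuming it is the odometer reading divided by the vehicle's age in whole years; treating a car younger than one year as one year old is an added assumption to avoid dividing by zero.

```python
def miles_per_year(odometer_miles, vehicle_age_years):
    """Average miles driven per year of vehicle age (hypothetical helper).

    Age 0 is treated as 1 year so that brand-new cars do not cause a
    division by zero. Returns None when either input is missing.
    """
    if odometer_miles is None or vehicle_age_years is None:
        return None
    return odometer_miles / max(vehicle_age_years, 1)
```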

### Purchase Price as a Ratio of Auction / Retail

The purchase price of the car, expressed as a proportion of each of the following prices, was also found to be informative:

  1. Acquisition Retail Clean Price
  2. Current Auction Avg Price
  3. Current Retail Clean Price

These three ratios were therefore also included in the data set.
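A sketch of the three purchase-price ratios for one record. The column names and the `purchase_over_<reference>` feature names are hypothetical; a missing or zero reference price yields `None`.

```python
# Hypothetical column names for the purchase price and the three reference prices above.
PURCHASE_PRICE = "purchase_price"
REFERENCE_PRICES = [
    "acq_retail_clean_price",
    "cur_auction_avg_price",
    "cur_retail_clean_price",
]


def add_purchase_price_ratios(row):
    """Return a copy of `row` with purchase price / reference price for each reference column."""
    out = dict(row)
    price = row.get(PURCHASE_PRICE)
    for ref in REFERENCE_PRICES:
        ref_value = row.get(ref)
        name = f"purchase_over_{ref}"
        # A value above 1 means the car was bought for more than that reference price.
        if price is None or not ref_value:
            out[name] = None
        else:
            out[name] = price / ref_value
    return out
```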

### Purchase Price over Warranty Cost

The purchase price as a proportion of the warranty cost was a good predictor of whether a car was a lemon, and was therefore included.
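The feature itself is a single division; a sketch with a guard for a missing or zero warranty cost (the function name is hypothetical). Low values mean the warranty is expensive relative to what was paid for the car.

```python
def price_over_warranty(purchase_price, warranty_cost):
    """Purchase price divided by warranty cost; None if either value is missing or the cost is zero."""
    if purchase_price is None or not warranty_cost:
        return None
    return purchase_price / warranty_cost
```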

## Binary Feature Construction

### At-Risk Models

Certain model and year combinations exhibited a higher propensity to be bad purchases, so an additional value named `model_year` was created. The 15 most "at risk" `model_year` values were included in the analysis as binary values: 1 if a car's `model_year` was one of them, 0 if it was not.
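A sketch of building `model_year` and the at-risk flag from labeled rows. The `model_year` string format, the 0/1 label name `is_bad_buy`, ranking by raw bad-buy rate, and the optional `min_count` filter are assumptions; the README does not say how "at risk" was measured, or whether the 15 values became one flag or 15 separate dummy columns (this sketch uses one flag). The ranking would presumably be computed on training rows only, so that test labels do not leak into the feature.

```python
from collections import defaultdict


def model_year_key(model, year):
    """Combine model and year into one categorical value, e.g. "CIVIC_2005" (format is an assumption)."""
    return f"{model}_{year}"


def top_at_risk(rows, label="is_bad_buy", top_n=15, min_count=1):
    """Return the `top_n` model_year values with the highest bad-buy rate in `rows`.

    Each row is a dict with "model", "year" and a 0/1 `label`. `min_count`
    (not from the README) can drop rare model_years whose rate is noisy.
    """
    counts = defaultdict(int)
    bad = defaultdict(int)
    for r in rows:
        key = model_year_key(r["model"], r["year"])
        counts[key] += 1
        bad[key] += r[label]
    eligible = [k for k in counts if counts[k] >= min_count]
    # Highest bad-buy rate first; ties broken by key so the result is deterministic.
    eligible.sort(key=lambda k: (-bad[k] / counts[k], k))
    return set(eligible[:top_n])


def at_risk_flag(row, at_risk):
    """1 if the row's model_year is in the at-risk set, else 0."""
    return 1 if model_year_key(row["model"], row["year"]) in at_risk else 0
```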

## Discrete Feature Construction

The zip code where the car was purchased was less significant than its first three digits, which identify the general region of the auction. Therefore, `trunctated_zipcode` appears in the final training data and contains only the first three digits of the zip code.
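A sketch of the truncation (the function name is hypothetical). It assumes zip codes may arrive as integers, which drop leading zeros, or as ZIP+4 strings, so it zero-pads to five digits before slicing. The first three digits of a US ZIP code identify a sectional center facility, which is why they work as a coarse regional code.

```python
def truncated_zipcode(zipcode):
    """Return the first three digits of a US ZIP code as a string.

    Accepts an int (e.g. 2134, which should be "02134") or a string,
    including ZIP+4 form such as "85226-1234".
    """
    five_digit = str(zipcode).strip().split("-")[0].zfill(5)
    return five_digit[:3]
```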

## Chosen Algorithms

Two classifiers, Gradient Boost and a BaggingClassifier, were compared as candidate prediction algorithms.

In general, the Bagging Classifier showed a larger gap between the training and testing `f1_score`, and as the `max_sample` parameter was increased, the gap between f1-train and f1-test widened. This was not the case for the Gradient Boost algorithm: similar f1-train and f1-test values were achieved for different values of `max_sample`, which makes its out-of-sample performance more predictable. Therefore, Gradient Boost was chosen as our classification algorithm.
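The selection criterion above (prefer the algorithm whose training and test F1 stay close across a parameter sweep) can be sketched as follows. If the project used scikit-learn (an assumption), the scores would come from `BaggingClassifier`, whose sampling parameter is `max_samples`, and `GradientBoostingClassifier`, which has no `max_samples` (its row-sampling parameter is `subsample`), scored with `sklearn.metrics.f1_score`. To keep the sketch dependency-free, binary F1 is re-implemented here; the `results` structure and helper names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_gap(results):
    """Average |f1_train - f1_test| over a sweep.

    `results` maps a parameter value (e.g. max_sample) to an (f1_train, f1_test) pair.
    """
    gaps = [abs(train - test) for train, test in results.values()]
    return sum(gaps) / len(gaps)


def most_consistent(sweeps):
    """Return the algorithm name whose train and test F1 stay closest across its sweep."""
    return min(sweeps, key=lambda name: mean_f1_gap(sweeps[name]))
```

A smaller average gap is used as a proxy for how reliably training performance carries over to unseen cars; it does not by itself say which model has the higher test F1.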

## Results

[Figures: bag_img (Bagging Classifier), boost_img (Gradient Boost)]

Overall, the Boosting algorithm provided greater consistency between in-sample and out-of-sample f1 scores and was therefore our algorithm of choice.

## Contributors

- benjaminmgross