predicting-car-prices

We need to build a model that predicts the price of a used car. We evaluate each model on the following criteria:

  • the quality of the prediction
  • the speed of the prediction
  • the time required for training

The dataset can be found here: www.kaggle.com/dataset/f8a4f6645bbf81179d976bfd42fa80ee167f68e4408da355810110f0dc6ac001

Step 1: Preprocess the data
We remove anomalies, fill missing values where possible, reduce the number of categories in high-cardinality features (the Model column alone has around 250 categories), and remove uninformative features.
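
A minimal sketch of these preprocessing steps with pandas is given here. Only the Model column comes from the description above; the `Price` and `Source` columns, the "price must be positive" anomaly rule, the `min_count` threshold, and the helper names are assumptions made for illustration, not the project's actual rules.

```python
import pandas as pd


def group_rare_categories(series: pd.Series, min_count: int, other: str = "other") -> pd.Series:
    """Replace categories seen fewer than min_count times with one shared label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Remove anomalies (assumed rule: a price must be positive).
    out = df[df["Price"] > 0].copy()
    # Fill missing categorical values with an explicit placeholder.
    out["Model"] = out["Model"].fillna("unknown")
    # Reduce the number of categories by merging rare ones.
    out["Model"] = group_rare_categories(out["Model"], min_count=2)
    # Remove uninformative features: columns holding a single value.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    return out.drop(columns=constant_cols).reset_index(drop=True)


# Toy data standing in for the real dataset.
raw = pd.DataFrame({
    "Price": [5000, 0, 7200, 3100, 9900],
    "Model": ["golf", "golf", "golf", None, "rare_model"],
    "Source": ["web"] * 5,  # hypothetical constant column
})
clean = preprocess(raw)
print(clean)
```

Merging rare categories into one label keeps the later one-hot encoding from producing hundreds of sparse columns.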

Step 2: Prepare data for model training
Since we build Random Forest and Linear Regression models, among others, we need to encode categorical features. We use one-hot encoding for this purpose. For models that accept only numeric input, one-hot (dummy) encoding is preferable to label encoding: label encoding maps each string to an integer, which imposes an arbitrary order on the categories that the model can mistake for a real numeric relationship with the target variable and other features. We also scale numeric variables using StandardScaler. LightGBM and CatBoost handle categorical features natively, so they do not need encoded features for training.
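
The encoding and scaling described above might look like the following sketch. The `Mileage` column and the toy values are assumptions for illustration; `pd.get_dummies` is used here as one common way to one-hot encode, and the project may have used a different encoder.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data; "Mileage" is a hypothetical numeric feature.
df = pd.DataFrame({
    "Model": ["golf", "corsa", "golf", "other"],
    "Mileage": [150000, 90000, 60000, 120000],
})

# One-hot encode the categorical column: one 0/1 column per category,
# so no artificial ordering between categories is introduced.
encoded = pd.get_dummies(df, columns=["Model"], dtype=int)

# Standardize the numeric column to zero mean and unit variance.
# In a real pipeline, fit the scaler on the training split only.
scaler = StandardScaler()
encoded["Mileage"] = scaler.fit_transform(encoded[["Mileage"]]).ravel()

# For LightGBM, a pandas "category" column can be passed without one-hot
# encoding; CatBoost takes the column names through its cat_features argument.
native = df.assign(Model=df["Model"].astype("category"))

print(encoded)
```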

Step 3: Model building
We build a Linear Regression model as a sanity check, then Random Forest, stochastic gradient boosting trees, XGBoost, LightGBM, and CatBoost.

Model performance is measured by RMSE (root mean squared error). The test-set RMSE of each model, along with its training time and prediction time, is stored in a dataframe for comparison.
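
One way to collect these three measurements is sketched below. It uses synthetic data from `make_regression` and only two scikit-learn models so that it runs without extra dependencies; the column names of the results dataframe and all hyperparameters are assumptions, and the boosting models would be added to the `models` dictionary in the same way.

```python
import time

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared car dataset.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

rows = []
for name, model in models.items():
    # Time required for training.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    # Speed of prediction on the test set.
    start = time.perf_counter()
    pred = model.predict(X_test)
    predict_time = time.perf_counter() - start

    # Quality of prediction: RMSE is the square root of the mean squared error.
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    rows.append({
        "model": name,
        "rmse": rmse,
        "train_time_s": train_time,
        "predict_time_s": predict_time,
    })

results = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
print(results)
```

Timing `fit` and `predict` separately matters because a model that trains slowly can still predict quickly, and the criteria listed at the top weigh the two independently.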

Contributors

shradha289
