E-commerceML

Machine learning model development for a transport company. The objective is to predict whether an order will arrive on time.

Problem Description

We are part of a logistics company that works for a major e-commerce portal. Our team leader has asked us to implement a model that predicts whether a shipment will arrive on time, based on the information contained in the dataset.

About the dataset

The main dataset is a version of the Kaggle E-Commerce Shipping Data dataset. It contains the following fields:

  • ID: Customer ID number.
  • Warehouse block: The company has a large warehouse divided into blocks A, B, C, D, and E.
  • Mode of shipment: The company ships products by Ship, Flight, or Road.
  • Customer care calls: The number of calls made to enquire about the shipment.
  • Customer rating: The rating the company received from each customer, where 1 is the lowest (worst) and 5 is the highest (best).
  • Cost of the product: Cost of the product in US dollars.
  • Prior purchases: The number of prior purchases.
  • Product importance: The company categorizes each product's importance as low, medium, or high.
  • Gender: Male or Female.
  • Discount offered: Discount offered on that specific product.
  • Weight in gms: Weight of the product in grams.
  • Reached on time: The target variable, where 1 indicates that the product did NOT arrive on time and 0 indicates that it arrived on time.

Metrics to be evaluated

Recall, computed from the confusion matrix, is the main metric for evaluating model performance. Our main interest is to find the shipments that will not arrive on time, so recall answers the question: what percentage of the shipments that do not arrive on time are we able to identify?

$$ Recall=\frac{TP}{TP+FN}$$

where $TP$ is the number of true positives and $FN$ is the number of false negatives.
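As a minimal sketch of the recall formula above, the snippet below counts $TP$ and $FN$ by hand for class 1 (late shipments). The function name `recall` and the sample labels are illustrative assumptions, not taken from the notebook.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: late shipments that the model flagged as late.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: late shipments that the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 1 = did NOT arrive on time, 0 = arrived on time.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(recall(y_true, y_pred))  # 3 of the 4 late shipments are found: 0.75
```

In practice, `sklearn.metrics.recall_score` computes the same quantity; the hand-rolled version is only meant to mirror the formula term by term.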

Accuracy is also based on the confusion matrix. We use it to evaluate classification performance across both classes of the target variable, class 1 and class 0. Note that in this exercise the primary class is class 1, i.e. the shipments that do not arrive on time.

$$ Accuracy=\frac{TP + TN}{TP+ TN + FN + FP}$$

where $TP$ is the number of true positives, $TN$ true negatives, $FN$ false negatives, and $FP$ false positives.
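The accuracy formula can be checked the same way. The sketch below tallies all four confusion-matrix cells, with class 1 as the positive class; the helper names `confusion_counts` and `accuracy` and the sample labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / (TP + TN + FN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Made-up labels: 1 = did NOT arrive on time, 0 = arrived on time.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 1)
print(accuracy(y_true, y_pred))          # 4 correct out of 6
```

Because accuracy weights both classes by their frequency, it can look good even when many late shipments are missed, which is why recall on class 1 remains the primary metric here.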

General Steps

  1. Exploratory Data Analysis (EDA)
  2. Data Preprocessing
  3. First Modeling Batch (Working with raw data)
  4. Second Modeling Batch (Applying One-Hot Encoding)
  5. Third Modeling Batch (Evaluating StandardScaler)
  6. Fourth Modeling Batch (Evaluating Dimension Reduction using PCA)
  7. Final model selection and searching for best hyperparameters with GridSearchCV
  8. Conclusions
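Steps 4 to 7 can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the notebook's actual code: the column names, the synthetic data, the toy target, the `LogisticRegression` placeholder model, and the parameter grid are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real dataset; all values are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Warehouse_block": rng.choice(list("ABCDE"), n),         # assumed column name
    "Mode_of_Shipment": rng.choice(["Ship", "Flight", "Road"], n),
    "Cost_of_the_Product": rng.integers(90, 310, n),
    "Discount_offered": rng.integers(1, 65, n),
    "Weight_in_gms": rng.integers(1000, 8000, n),
})
# Toy target only: 1 = did NOT arrive on time, 0 = arrived on time.
y = (df["Discount_offered"] > 10).astype(int)

categorical = ["Warehouse_block", "Mode_of_Shipment"]
numeric = ["Cost_of_the_Product", "Discount_offered", "Weight_in_gms"]

# Step 4 (one-hot encoding) and step 5 (StandardScaler) per column type.
preprocess = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("scale", StandardScaler(), numeric),
    ],
    sparse_threshold=0.0,  # always return a dense array so PCA can consume it
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA()),                                # step 6: dimension reduction
    ("model", LogisticRegression(max_iter=1000)),  # placeholder classifier
])

# Step 7: grid search, scored on recall for class 1.
param_grid = {
    "pca__n_components": [3, 5],
    "model__C": [0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=3)
search.fit(df, y)

print(search.best_params_)
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits the encoder, scaler, and PCA on its own training split, which avoids leaking information from the validation split.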

For more detail, see main.ipynb.

Documentation to highlight

Contact

Greetings, Jean Paul Fabra Ruiz: [email protected]

LinkedIn: https://www.linkedin.com/in/jeanfabra/
