Analyzing data patterns and different classification methods

Table of Contents

  • Task Description
  • Feature Extraction and Selection
  • Classification Algorithm
    • Code to save NB in pipeline [Figure 2]
    • Code to save Logistic Regression in pipeline [Figure 3]
    • Code to save Decision tree in pipeline [Figure 4]
    • Code to save LinearSVC in pipeline [Figure 5]
    • Pairplot [Figure 1]
  • Output
    • Confusion Matrix of LinearSVC [Figure 6]
    • Confusion Matrix of Decision Tree [Figure 7]
    • Confusion Matrix of Naive Bayes [Figure 8]
    • Confusion Matrix of Logistic Regression [Figure 9]
  • Accuracy Comparison
    • Accuracy of all models Train V Test [Figure 10]
  • Reference

Task Description

The objective of this assignment was to install the scikit-learn library for Python, retrieve a language dataset from DSL, and perform language classification using four different machine learning algorithms. After training the models, we ran an accuracy test for every model in the pipeline to show which model had the highest accuracy.

Feature Extraction and Selection

Before extracting the features from the training dataset, we explored which features are best suited for classification, using the SelectKBest method with the chi-square score. The vectorized data was sparse, which PCA does not handle well, so we used TruncatedSVD instead to plot the relation shown in Figure 1. The plot helps us understand the labels by showing the relation between each feature and the others, which saved us time when training on the data. It also lets us visualize the number of labels present in the data, namely 14.
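The selection and projection steps described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the toy corpus, the character n-gram settings, and the value k=20 are all assumptions chosen to keep the example small.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy corpus standing in for the DSL data (3 labels instead of 14).
texts = [
    "the cat sits on the mat", "the dog runs in the park", "the sun is over the hill",
    "el gato duerme en la casa", "el perro corre en el parque", "el sol sale en la playa",
    "le chat dort dans la maison", "le chien court dans le parc", "le soleil brille sur la plage",
]
labels = ["en"] * 3 + ["es"] * 3 + ["fr"] * 3

# Character n-gram counts; the result is a SciPy sparse matrix.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X_counts = vectorizer.fit_transform(texts)

# Keep the 20 features with the highest chi-square score against the labels.
selector = SelectKBest(chi2, k=20)
X_selected = selector.fit_transform(X_counts, labels)

# TruncatedSVD accepts sparse input directly, so no densifying step is needed.
svd = TruncatedSVD(n_components=2, random_state=0)
X_2d = svd.fit_transform(X_selected)

print(X_counts.shape, X_selected.shape, X_2d.shape)
```

The two columns of `X_2d` can then be passed to any plotting library to draw a scatter plot colored by label.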

Classification Algorithm

We were instructed to use four different classification algorithms, namely Naive Bayes, LinearSVC, Decision Tree, and Logistic Regression. We wrapped each model in a pipeline and stored the pipelines in a Python dictionary so that we could loop over them one after the other.
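A dictionary of pipelines of this kind might look like the sketch below. The helper `build_pipeline`, the step names, the toy corpus, and the hyperparameters are assumptions made for illustration; the original code (shown only as figures) may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus; the real project used the DSL dataset.
texts = [
    "the cat sits on the mat", "the dog runs in the park", "the sun is over the hill",
    "el gato duerme en la casa", "el perro corre en el parque", "el sol sale en la playa",
    "le chat dort dans la maison", "le chien court dans le parc", "le soleil brille sur la plage",
]
labels = ["en"] * 3 + ["es"] * 3 + ["fr"] * 3


def build_pipeline(model):
    """Hypothetical helper: vectorizer, feature selection, then the classifier."""
    return Pipeline([
        ("vectorizer", CountVectorizer(analyzer="char", ngram_range=(1, 2))),
        ("selector", SelectKBest(chi2, k=20)),
        ("classifier", model),
    ])


pipelines = {
    "Naive Bayes": build_pipeline(MultinomialNB()),
    "LinearSVC": build_pipeline(LinearSVC(random_state=0)),
    "Decision Tree": build_pipeline(DecisionTreeClassifier(random_state=0)),
    "Logistic Regression": build_pipeline(LogisticRegression(solver="newton-cg")),
}

# Fit every pipeline in turn and collect one prediction from each.
predictions = {}
for name, pipeline in pipelines.items():
    pipeline.fit(texts, labels)
    predictions[name] = pipeline.predict(["the cat runs in the park"])[0]
    print(name, "->", predictions[name])
```

Keeping the vectorizer and selector inside each pipeline means they are fitted only on the data passed to `fit`, which avoids leaking information from held-out data.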

Naive Bayes

We used Multinomial NB (Naive Bayes)[3] on our data. The pipeline applies a CountVectorizer and a feature selection step to transform the data, and then passes the result to the model itself.

Code to save NB in pipeline [Figure 2]

Naive Bayes uses prior knowledge learned from the training data to classify the data into different labels. Since our features are counts and the data has more than two labels, we went with the Multinomial NB algorithm to perform classification.
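To make the role of the prior concrete, here is a from-scratch sketch of multinomial Naive Bayes with Laplace smoothing. It is a simplified illustration, not scikit-learn's implementation; the function names, the whitespace tokenization, and the four toy documents are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class token log likelihoods."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        token_counts[label].update(tokens)
        vocabulary.update(tokens)

    # Prior: how often each class occurs in the training data.
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    # Likelihood: smoothed relative frequency of each token within a class.
    log_likelihood = {}
    for c in class_counts:
        total = sum(token_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / total) for t in vocabulary
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log prior plus summed log likelihood."""
    scores = {}
    for c in log_prior:
        known = [t for t in doc.split() if t in log_likelihood[c]]
        scores[c] = log_prior[c] + sum(log_likelihood[c][t] for t in known)
    return max(scores, key=scores.get)


docs = ["the cat the dog", "the house", "el gato el perro", "la casa"]
labels = ["en", "en", "es", "es"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)

print(predict("the dog", log_prior, log_likelihood))
print(predict("el gato", log_prior, log_likelihood))
```

Tokens that never appeared in training are skipped at prediction time, which is one common way to handle unseen vocabulary.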

Logistic Regression

Logistic regression[4] is a well-known classification algorithm. We experimented with it and found that ‘newton-cg’ was the most suitable solver for this particular dataset. Since the sigmoid and tanh functions were not giving us the accuracy we were expecting, we used ‘newton-cg’.

Code to save Logistic Regression in pipeline [Figure 3]
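One way to compare solvers is to fit the same pipeline once per solver and record the held-out accuracy, as in the sketch below. The toy train and test sets, the list of solvers, and `max_iter=1000` are assumptions for illustration; on data this small the scores say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy split; the real project used the DSL train and test sets.
train_texts = [
    "the cat sits on the mat", "the dog runs in the park",
    "el gato duerme en la casa", "el perro corre en el parque",
    "le chat dort dans la maison", "le chien court dans le parc",
]
train_labels = ["en", "en", "es", "es", "fr", "fr"]
test_texts = ["the sun is over the hill", "el sol sale en la playa", "le soleil brille sur la plage"]
test_labels = ["en", "es", "fr"]

solver_scores = {}
for solver in ["newton-cg", "lbfgs", "saga"]:
    model = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 2)),
        LogisticRegression(solver=solver, max_iter=1000),
    )
    model.fit(train_texts, train_labels)
    # Pipeline.score returns the mean accuracy of the final classifier.
    solver_scores[solver] = model.score(test_texts, test_labels)

for solver, score in solver_scores.items():
    print(f"{solver}: {score:.2f}")
```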

Decision Tree Classifier

A Decision Tree[1] is usually used for predictive analysis, but it can also perform classification. It is one of the most widely used algorithms in machine learning, and it is the foundation of the random forest classifier.

Code to save Decision tree in pipeline [Figure 4]

LinearSVC

LinearSVC[2] uses the One-vs-All (also known as One-vs-Rest) multiclass reduction, so it fits one model per class. By contrast, for a multi-class classification problem SVC uses a one-vs-one scheme and fits K * (K - 1) / 2 models, where K is the number of classes.

Code to save LinearSVC in pipeline [Figure 5]
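The difference in the number of fitted models can be checked directly on synthetic data. In this sketch the random features and the choice of K = 4 classes are assumptions made to keep the example small; the last line applies the same formulas to the 14 labels mentioned earlier.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

# Synthetic data: 40 samples, 5 features, K = 4 classes (10 samples each).
rng = np.random.RandomState(0)
n_classes = 4
X = rng.rand(40, 5)
y = np.repeat(np.arange(n_classes), 10)

linear_svc = LinearSVC(random_state=0).fit(X, y)
svc = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)

# One-vs-rest: one weight vector per class, so K rows.
n_ovr_models = linear_svc.coef_.shape[0]
# One-vs-one: one decision value per pair of classes, so K * (K - 1) / 2 columns.
n_ovo_models = svc.decision_function(X).shape[1]

print(n_ovr_models, n_ovo_models)

# The same counts for the 14 labels in this dataset.
k = 14
print(k, k * (k - 1) // 2)
```

For 14 classes the one-vs-one scheme needs 91 binary models against 14 for one-vs-rest, which is part of why LinearSVC scales better as the number of classes grows.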

Pairplot [Figure 1]

Output

The Confusion Matrix of LinearSVC

Confusion Matrix of LinearSVC [Figure 6]

The Confusion Matrix of Decision Tree

Confusion Matrix of Decision Tree [Figure 7]

The Confusion Matrix of Naive Bayes

Confusion Matrix of Naive Bayes [Figure 8]

The Confusion Matrix of Logistic Regression

Confusion Matrix of Logistic Regression [Figure 9]
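A confusion matrix like the ones in the figures above can be computed with scikit-learn's `confusion_matrix`. The true and predicted labels below are made up for illustration and do not come from the project's results.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for six test sentences.
y_true = ["en", "en", "es", "es", "fr", "fr"]
y_pred = ["en", "es", "es", "es", "fr", "en"]
class_names = ["en", "es", "fr"]

# Rows are true labels, columns are predicted labels, in class_names order.
matrix = confusion_matrix(y_true, y_pred, labels=class_names)
print(matrix)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = matrix.trace() / matrix.sum()
print(f"accuracy = {accuracy:.2f}")
```

Off-diagonal cells show which pairs of languages a model confuses, which is often more informative than the accuracy figure alone.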

Accuracy Comparison

Accuracy of all models Train V Test [Figure 10]

We can clearly see that the accuracy of Naive Bayes was higher during training; however, the model did not perform as well on unseen data. LinearSVC, on the other hand, performed consistently across train and test data.
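A train-versus-test comparison of this kind can be produced by scoring each fitted pipeline on both splits. The sketch below uses a hypothetical toy split and only two of the four models; its numbers are not the project's results and will not reproduce the gap described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy split; the real project used the DSL train and test sets.
train_texts = [
    "the cat sits on the mat", "the dog runs in the park",
    "el gato duerme en la casa", "el perro corre en el parque",
    "le chat dort dans la maison", "le chien court dans le parc",
]
train_labels = ["en", "en", "es", "es", "fr", "fr"]
test_texts = ["the sun is over the hill", "el sol sale en la playa", "le soleil brille sur la plage"]
test_labels = ["en", "es", "fr"]

models = {"Naive Bayes": MultinomialNB(), "LinearSVC": LinearSVC(random_state=0)}

results = {}
for name, model in models.items():
    pipeline = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 2)), model)
    pipeline.fit(train_texts, train_labels)
    train_acc = pipeline.score(train_texts, train_labels)
    test_acc = pipeline.score(test_texts, test_labels)
    results[name] = (train_acc, test_acc)
    # A large positive gap suggests the model has overfit the training data.
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```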

Reference

[1] scikit-learn, "scikit-learn," [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier. [Accessed 29 07 2018].

[2] "svc," [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html. [Accessed 29 07 2018].

[3] "naivebayes," [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html. [Accessed 29 07 2018].

[4] "logreg," [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html. [Accessed 29 07 2018].
