Message-spam-classifier-UCI-ML

This project involves building a spam classifier using the Naive Bayes algorithm. The dataset used is a set of SMS messages labeled as spam or ham.

Table of Contents

  • Technologies Used
  • Dataset Description
  • Project Description
  • Data Cleaning and EDA
  • Data Preprocessing
  • Model Building
  • Output
  • Conclusion

Technologies Used

  • Python 3.9.7
  • pandas 1.3.3
  • scikit-learn 0.24.2
  • matplotlib 3.4.3
  • seaborn 0.11.2
  • nltk 3.6.3

Dataset Description

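The dataset is a collection of SMS messages, each labeled as either spam or ham (a legitimate, non-spam message). As the project name indicates, it comes from the UCI Machine Learning Repository.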

Project Description

The main goal of this project is to build a Naive Bayes classifier that can accurately predict whether an SMS message is spam or ham.

Data Cleaning and EDA

The first step was cleaning the dataset: I dropped unnecessary columns, renamed the remaining columns for readability, checked for missing and duplicate values, and removed the duplicates I found. I then ran exploratory data analysis to understand how spam and ham messages are distributed in the dataset. I used a pie chart and histograms to visualize the class balance and the distribution of the number of characters, words, and sentences in spam and ham messages.
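The cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's actual notebook: the raw column names (`v1`, `v2`, `extra`), the renamed columns (`label`, `text`), and the feature column names are all assumptions, and the sentence count is a rough approximation based on terminal punctuation.

```python
import pandas as pd

# Toy stand-in for the raw data. The column names are hypothetical.
raw = pd.DataFrame({
    "v1": ["ham", "spam", "ham", "spam", "ham"],
    "v2": [
        "Are we still meeting for lunch today?",
        "WINNER! Claim your free prize now. Call 12345.",
        "Are we still meeting for lunch today?",  # exact duplicate of row 0
        "Free entry in a weekly draw. Text WIN to 80000!",
        "Ok. See you soon.",
    ],
    "extra": [None] * 5,  # an unused column to drop
})

# Drop the unused column and give the remaining ones readable names.
df = raw.drop(columns=["extra"]).rename(columns={"v1": "label", "v2": "text"})

# Check for missing values, then remove duplicate rows.
missing = df.isnull().sum()
df = df.drop_duplicates(keep="first").reset_index(drop=True)

# Class balance (the numbers a pie chart would show).
counts = df["label"].value_counts()

# Simple length features for the histograms.
df["num_characters"] = df["text"].str.len()
df["num_words"] = df["text"].str.split().str.len()
# Approximation: count runs of '.', '!' or '?' as sentence endings.
df["num_sentences"] = df["text"].str.count(r"[.!?]+")

print(missing)
print(counts)
print(df.groupby("label")[["num_characters", "num_words", "num_sentences"]].mean())
```

The `counts` series and the three length columns are the kind of inputs that could then be passed to matplotlib or seaborn for the pie chart and histograms.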

Data Preprocessing

The next step was preprocessing the text into a form that can be fed to a machine learning model. I normalized the text by converting it to lowercase, removing stopwords and punctuation, and stemming each word. I also created a word cloud to visualize the most common words in spam and ham messages.
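A minimal sketch of this kind of normalization is shown below. The function name `transform_text` is hypothetical, tokenization is a plain whitespace split, and the tiny hand-written `STOPWORDS` set is only a stand-in for a full stopword list such as the one NLTK provides (which requires a separate download). Stemming uses NLTK's `PorterStemmer`, which is one possible stemmer; the source does not say which one the project used.

```python
import string

from nltk.stem.porter import PorterStemmer

# Assumption: a small illustrative stopword set, not a complete list.
STOPWORDS = {"a", "an", "the", "you", "are", "is", "to", "for", "of", "and", "in", "your"}

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, strip punctuation, drop stopwords, and stem each token."""
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization keeps the sketch free of extra NLTK data.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(stemmer.stem(tok) for tok in tokens)


print(transform_text("Congratulations! You are WINNING a free prize."))
```

The transformed strings, rather than the raw messages, would then be passed to the vectorizer and to a word cloud generator.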

Model Building

Finally, I built Naive Bayes classifiers, using CountVectorizer and TfidfVectorizer to convert the text into numerical features. I split the dataset into training and testing sets and evaluated each model using accuracy, precision, and the confusion matrix. I compared three Naive Bayes variants: GaussianNB, MultinomialNB, and BernoulliNB. MultinomialNB gave the highest accuracy and precision.
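The comparison described above might look roughly like the sketch below. The toy messages, the split ratio, and the `random_state` are assumptions made only so the example runs on its own; scores on such a tiny corpus say nothing about the real results. `TfidfVectorizer` is used here, and `CountVectorizer` could be swapped in the same way. The vectorizer is fitted on the training split only, and its output is converted to a dense array because `GaussianNB` does not accept sparse input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy corpus (assumption): 1 = spam, 0 = ham.
texts = [
    "free prize call now", "win cash text now", "claim free ticket today",
    "urgent free cash prize", "win free entry now", "call now claim prize",
    "see you at lunch", "coming home for dinner", "meeting moved to tomorrow",
    "will call you later", "home late tonight", "lunch at noon tomorrow",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Split first, then fit the vectorizer on the training texts only.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=2, stratify=labels
)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts).toarray()  # dense for GaussianNB
X_test = vectorizer.transform(test_texts).toarray()

models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning if a model predicts no spam at all.
        "precision": precision_score(y_test, pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
    }

for name, scores in results.items():
    print(name, scores["accuracy"], scores["precision"])
    print(scores["confusion_matrix"])
```

Precision is worth tracking alongside accuracy for spam filtering, since a false positive means a legitimate message is flagged as spam.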

Output

  • SPAM

[Image: Screenshot 2023-04-28 at 3:13:48 PM]

[Image: Screenshot 2023-04-28 at 3:14:11 PM]

  • NOT A SPAM

[Image: Screenshot 2023-04-28 at 3:14:38 PM]

[Image: Screenshot 2023-04-28 at 3:15:04 PM]
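For illustration, classifying a single message and producing one of the two outputs listed above could be sketched as follows. This is not the project's actual interface: the `classify` helper, the toy training messages, and the use of a scikit-learn pipeline are assumptions made to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (assumption): 1 = spam, 0 = ham.
train_texts = [
    "Free prize waiting, call now to claim",
    "Win cash now, text WIN to 80000",
    "Urgent! You won a free ticket, call now",
    "Claim your free cash prize today",
    "Are you coming home for dinner",
    "See you at the meeting tomorrow",
    "Can we have lunch at noon",
    "I will be home late tonight",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier chained so raw text can be passed straight in.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train_texts, train_labels)


def classify(message):
    """Return the label string for one message (hypothetical helper)."""
    prediction = pipeline.predict([message])[0]
    return "SPAM" if prediction == 1 else "NOT A SPAM"


print(classify("Call now to win a free prize"))
print(classify("See you at lunch tomorrow"))
```

In the real project, the same preprocessing applied during training would need to be applied to each incoming message before prediction.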

Conclusion

I was able to build a Naive Bayes classifier that predicts whether an SMS message is spam or ham with an accuracy of approximately 98%. The model could be improved further by using more advanced natural language processing techniques and by trying other classification algorithms.
