
sms-spam-classification

Classification of SMS messages from the Kaggle SMS Spam Collection Dataset (https://www.kaggle.com/uciml/sms-spam-collection-dataset) as spam or ham using a naive Bayes model.

Description

This algorithm performs the following steps:

Load the .csv file containing the Kaggle SMS Spam Collection Dataset from disk.
Parse the file, taking the v1 value as the label (spam or ham) and the v2 value as the message.
Split each message into tokens, reduce the tokens to lemmas, and replace every number with the constant token __NUMBER__.
Shuffle the messages.
Split the messages into training and test sets.
Fit the naive Bayes model on the training set.
Predict labels for the test set.
Calculate the following metrics:

accuracy,
precision,
recall,
F1-score,
Matthews correlation coefficient.
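The tokenizing, fitting, and prediction steps above can be sketched without any dependencies. This is only an illustration under stated assumptions, not the project's code: tokenize, fit, and predict are hypothetical names, a plain regular expression stands in for the natural tokenizer, lemmatization is skipped, and the model is assumed to be a multinomial naive Bayes with add-one (Laplace) smoothing, which the source does not confirm.

```javascript
// Hypothetical helper: lowercase, split on non-alphanumeric characters,
// and replace digit-only tokens with the constant __NUMBER__.
// Lemmatization is intentionally left out of this sketch.
function tokenize(message) {
  return message
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
    .map((token) => (/^\d+$/.test(token) ? '__NUMBER__' : token));
}

// Fit: count messages per label and token occurrences per label.
// Maps are used so that tokens such as "constructor" cannot collide
// with built-in object properties.
function fit(samples) {
  const model = { size: samples.length, vocab: new Set(), classes: new Map() };
  for (const { label, message } of samples) {
    if (!model.classes.has(label)) {
      model.classes.set(label, { docs: 0, total: 0, counts: new Map() });
    }
    const cls = model.classes.get(label);
    cls.docs += 1;
    for (const token of tokenize(message)) {
      cls.counts.set(token, (cls.counts.get(token) || 0) + 1);
      cls.total += 1;
      model.vocab.add(token);
    }
  }
  return model;
}

// Predict: pick the label with the highest log-posterior,
// using add-one smoothing for the token likelihoods (an assumption).
function predict(model, message) {
  const tokens = tokenize(message);
  let best = null;
  let bestScore = -Infinity;
  for (const [label, cls] of model.classes) {
    let score = Math.log(cls.docs / model.size);
    const denominator = cls.total + model.vocab.size;
    for (const token of tokens) {
      score += Math.log(((cls.counts.get(token) || 0) + 1) / denominator);
    }
    if (score > bestScore) {
      best = label;
      bestScore = score;
    }
  }
  return best;
}

// Toy training data, invented for illustration only.
const toyModel = fit([
  { label: 'spam', message: 'Win 1000 cash now' },
  { label: 'spam', message: 'Free prize call 0800 now' },
  { label: 'ham', message: 'See you at lunch' },
  { label: 'ham', message: 'Call me when you are home' },
]);
console.log(tokenize('Win 1000 cash now!'));
console.log(predict(toyModel, 'Free cash prize 500'));
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps a token that never appeared under one label from forcing that label's probability to zero.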

Requirements

Node.js and the npm package manager.
The libraries listed in the package.json file.

Install and configure

Go to the project root directory.
Run npm i (or npm install) to install the required libraries.
Open the .env file and configure the following parameters:

SMS_COLLECTION_PATH: string value that specifies the path (absolute or relative) to the .csv file with the SMS Spam Collection data from Kaggle.
TRAIN_SIZE: float value that specifies the share of messages used for the training set (for example, 0.8).
COUNT_EXPERIMENTS: integer value that specifies the number of experiments to run.
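As a rough sketch of how these three parameters might be consumed, the snippet below validates them and uses TRAIN_SIZE to split a shuffled list. Everything here is an assumption for illustration: readConfig, shuffle, and trainTestSplit are hypothetical helpers, the values are assumed to be already available in an object such as process.env, TRAIN_SIZE is treated as a fraction strictly between 0 and 1, and the path ./spam.csv is a placeholder.

```javascript
// Hypothetical helper: read and validate the three parameters
// from an environment-like object (for example process.env).
function readConfig(env) {
  const config = {
    smsCollectionPath: env.SMS_COLLECTION_PATH,
    trainSize: Number.parseFloat(env.TRAIN_SIZE),
    countExperiments: Number.parseInt(env.COUNT_EXPERIMENTS, 10),
  };
  if (!config.smsCollectionPath) {
    throw new Error('SMS_COLLECTION_PATH is required');
  }
  if (!(config.trainSize > 0 && config.trainSize < 1)) {
    throw new Error('TRAIN_SIZE must be a fraction between 0 and 1');
  }
  if (!(config.countExperiments >= 1)) {
    throw new Error('COUNT_EXPERIMENTS must be a positive integer');
  }
  return config;
}

// Fisher-Yates shuffle that returns a shuffled copy and leaves the input untouched.
function shuffle(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Put the first trainSize share of the items into the training set, the rest into the test set.
function trainTestSplit(items, trainSize) {
  const cut = Math.round(items.length * trainSize);
  return { train: items.slice(0, cut), test: items.slice(cut) };
}

// Placeholder values, not taken from the project.
const config = readConfig({
  SMS_COLLECTION_PATH: './spam.csv',
  TRAIN_SIZE: '0.8',
  COUNT_EXPERIMENTS: '100',
});
const messages = Array.from({ length: 10 }, (_, i) => `message ${i}`);
const { train, test } = trainTestSplit(shuffle(messages), config.trainSize);
console.log(train.length, test.length);
```

Reshuffling before each split is what makes repeated experiments differ from one another, so averaging over COUNT_EXPERIMENTS runs smooths out the effect of any single lucky or unlucky split.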

Running command

In the project root directory, run the npm start command.

Output example

RESULTS:

Count experiments: 100
Train set size: 0.8
Avg accuracy: 0.9768671454219029
Avg precision (spam): 0.8840480938416516
Avg recall (spam): 0.9494494826142801
Avg F1-score (spam): 0.9153065782697162
Matthews correlation: 0.9028739302029823
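For a single experiment, the metrics reported above can be derived from the four confusion-matrix counts with spam as the positive class. The sketch below uses the standard definitions; confusionCounts, ratio, and computeMetrics are hypothetical names, and returning 0 when a denominator is zero is a convention chosen here, not something the source specifies.

```javascript
// Count true/false positives and negatives, treating `positive` as the target class.
function confusionCounts(actual, predicted, positive = 'spam') {
  const counts = { tp: 0, fp: 0, tn: 0, fn: 0 };
  actual.forEach((label, i) => {
    const isPositive = label === positive;
    const saidPositive = predicted[i] === positive;
    if (isPositive && saidPositive) counts.tp += 1;
    else if (!isPositive && saidPositive) counts.fp += 1;
    else if (isPositive && !saidPositive) counts.fn += 1;
    else counts.tn += 1;
  });
  return counts;
}

// Return 0 instead of NaN when the denominator is zero (sketch convention).
const ratio = (numerator, denominator) => (denominator === 0 ? 0 : numerator / denominator);

// Standard definitions of accuracy, precision, recall, F1-score,
// and the Matthews correlation coefficient.
function computeMetrics({ tp, fp, tn, fn }) {
  const precision = ratio(tp, tp + fp);
  const recall = ratio(tp, tp + fn);
  return {
    accuracy: ratio(tp + tn, tp + fp + tn + fn),
    precision,
    recall,
    f1: ratio(2 * precision * recall, precision + recall),
    matthews: ratio(
      tp * tn - fp * fn,
      Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    ),
  };
}

// Invented labels for illustration only.
const actualLabels = ['spam', 'spam', 'ham', 'ham', 'ham'];
const predictedLabels = ['spam', 'ham', 'ham', 'ham', 'spam'];
console.log(computeMetrics(confusionCounts(actualLabels, predictedLabels)));
```

Because ham messages greatly outnumber spam in this kind of dataset, accuracy alone can look high even for a weak classifier, which is why precision, recall, F1-score, and the Matthews correlation are worth reporting alongside it.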

`Node.js` libraries used

csv-parser (version 2.3.2) parses .csv files.
natural (version 0.6.3) splits the texts of the corpus into word tokens.
lemmatizer (version 0.0.1) reduces words to their lemmas.
