TWEEZY


Classify Twitter users based on different parameters

Basic Idea

A Twitter user is classified as Anomalous, Non Anomalous, or Intermediate using 5 parameters, each of which is given a rank:

  • Time Difference (denoted by a)
  • Similarity of Tweets (b)
  • URL Ranking (c)
  • Malware URL (d)
  • Adult Content (e)

Each of these parameters is assigned a value from 1 to 10 for each user. Each parameter also has a weight, and the weighted values together decide whether a user is anomalous.

The weight of each parameter is:

  • Time Difference: 0.15
  • Similarity of Tweets: 0.25
  • URL Ranking: 0.30
  • Malware URL: 0.30
  • Adult Content: 1

An FAL value that combines all of these parameters is assigned to each user.

Depending on the FAL value, a user is classified as Anomalous, Non Anomalous, or Intermediate.
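
As a rough illustration of how weighted ranks could turn into a class, the sketch below assumes the FAL is a plain weighted sum of the five ranks. That is an assumption, since the formula itself is not shown in this text. The function names, the thresholds 7.0 and 13.0, and the reading that a higher FAL means more anomalous are all hypothetical.

```python
# Weights as listed above for parameters a, b, c, d, e.
WEIGHTS = {"a": 0.15, "b": 0.25, "c": 0.30, "d": 0.30, "e": 1.0}


def fal_score(ranks):
    """Combine the five ranks (each 1 to 10) into one value.

    ASSUMPTION: a plain weighted sum; the project's real formula may differ.
    """
    for name in WEIGHTS:
        if not 1 <= ranks[name] <= 10:
            raise ValueError(f"rank {name!r} must be between 1 and 10")
    return sum(WEIGHTS[name] * ranks[name] for name in WEIGHTS)


def classify(fal, low=7.0, high=13.0):
    """Map an FAL value to a class using HYPOTHETICAL thresholds."""
    if fal < low:
        return "Non Anomalous"
    if fal < high:
        return "Intermediate"
    return "Anomalous"


user_ranks = {"a": 8, "b": 9, "c": 7, "d": 9, "e": 10}
print(classify(fal_score(user_ranks)))
```

Under this assumed formula the score ranges from 2.0 (all ranks 1) to 20.0 (all ranks 10), which is why the hypothetical thresholds sit inside that range.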

Classification

  • This algorithm is applied to a dataset of Twitter users, producing a dataset of a, b, c, d, e and FAL values.
  • Different classification methods are then applied to this dataset.

Classification Methods Used

  • K-nearest neighbors (KNN)
  • Support Vector Machine (SVM)
  • Naive Bayes classifiers
  • Random Forest
  • Decision Tree
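
To make the first of these methods concrete, here is a minimal k-nearest-neighbors sketch that uses only the Python standard library. It is not taken from Classifier.py, which may well rely on a library implementation. The function name `knn_predict` and the toy rows of (a, b, c, d, e, FAL) values are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Illustrative rows of (a, b, c, d, e, FAL); the numbers are invented.
train_rows = [
    (1, 2, 1, 1, 1, 2.0),
    (2, 1, 2, 1, 1, 2.4),
    (5, 5, 5, 5, 5, 10.0),
    (6, 5, 4, 5, 5, 9.9),
    (9, 9, 8, 9, 9, 17.8),
    (8, 9, 9, 10, 10, 18.9),
]
train_labels = [
    "Non Anomalous",
    "Non Anomalous",
    "Intermediate",
    "Intermediate",
    "Anomalous",
    "Anomalous",
]

print(knn_predict(train_rows, train_labels, (9, 8, 9, 9, 10, 18.7)))
```

The other four methods follow the same pattern of fitting on labeled rows and predicting the type of an unseen row; only the decision rule changes.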

Structure

  • Files related to the algorithm are in the Twitter folder
  • Main.py is the root file to run; it calls the other functions
  • dataset_generator.py generates dummy data for the values a, b, c, d, e, FAL and type, and writes it to dataset_gen.csv
  • Classifier.py reads the data in dataset_gen.csv and classifies the users using the different classification algorithms
  • wot.py is used to calculate the Web Of Trust rank
  • similarity.py is used to calculate the similarity of tweets
  • url.py is used to calculate the Alexa rank of URLs present in the tweets
  • checkTime.py is used to calculate the time difference between tweets
  • checkContent.py is used to check for adult content in tweets
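
Two of the measurements listed above, tweet similarity and time difference, can be approximated with the standard library alone. The sketch below is not the code in similarity.py or checkTime.py. The function names are hypothetical, and using `difflib.SequenceMatcher` as the similarity measure is an assumption. How the raw numbers are mapped onto the 1 to 10 ranks is not specified here, so that step is left out.

```python
from datetime import datetime
from difflib import SequenceMatcher
from itertools import combinations


def average_similarity(tweets):
    """Mean pairwise similarity ratio (0.0 to 1.0) between tweet texts."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, x, y).ratio() for x, y in pairs) / len(pairs)


def average_gap_seconds(timestamps):
    """Mean gap in seconds between consecutive tweet timestamps."""
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0


tweets = ["Win a free phone now", "Win a free phone today", "Lovely weather"]
times = [
    datetime(2020, 1, 1, 0, 0, 0),
    datetime(2020, 1, 1, 0, 1, 0),
    datetime(2020, 1, 1, 0, 3, 0),
]
print(average_similarity(tweets), average_gap_seconds(times))
```

Accounts that post near-identical text at very short intervals would score high on similarity and low on average gap, which is the kind of signal parameters a and b are meant to capture.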

How to run?

To check whether a particular user is anomalous:

  • clone this repo
  • run the following commands in the terminal from the cloned folder
  • virtualenv venv
  • source venv/bin/activate
  • pip install -r requirements.txt
  • python manage.py migrate
  • python manage.py runserver
  • open localhost:8000/main in your browser

To run the classification, follow these steps:

  • open the Twitter folder in a terminal
  • store the dataset of usernames that need to be classified in dataset_gen.csv
  • run python main.py
  • the results from the 5 classification algorithms are displayed as the output

Contributors

aswanthkoleri, aswinzz, druvalcr28
