rutvikbhavsar / svm_classification_model_for_civil_unrest_relevant_tweets

Classification of civil unrest relevant tweets in the Mexico region using SVM (Support Vector Machine) classifier.

License: MIT License

Python 100.00%
svm linearsvc support-vector-machine pickle sklearn numpy json cross-validation feature-extraction grid-search object-serialization object-deserialization machine-learning data-mining

Classification of civil unrest relevant tweets in the Mexico region

Background

Event analysis of the form considered here is an established concept in social science research. Civil unrest is a large concept intended to capture the myriad ways in which people express their protest against things that affect their lives and for which they assume that the government (local, regional or national) has a responsibility (e.g., cost of urban transportation, poor infrastructure, etc.). If the action is directed against private actors, there is normally a connection to government policy or behavior, e.g., a labor strike against a private company can disrupt the rhythm of everyday life for the rest of society, turn violent or lead to a series of disruptive strikes which require government involvement, and thus responsibility in the eyes of citizens. Civil unrest does not include acts by criminals for purely private gain. While authoritarian governments may outlaw civil protest and thus ‘criminalize’ the participants, social scientists would distinguish illegal political protests from illegal criminal activities. Gang members stopping public buses to extort payoffs from bus owners would not be a civil unrest event, though people protesting afterward against the government’s inability to control such gangs would be considered civil unrest.

This expansive definition of civil unrest means that one can find it everywhere, including European protests against austerity or marches against an oil pipeline from Canada across the US to the Gulf of Mexico. Latin America, nevertheless, offers some special characteristics that make it an excellent region for study in our project. The region experiences a plethora of civil unrest events every day (providing a sufficient number of GSR events to train machine learning models), is well covered by international and national news media (facilitating the task of generating ground truth), is the object of detailed empirical research and polling (permitting the description of the social, political and economic context within which civil unrest occurs) and has a significant and growing number of social network users (thus supporting the use of modern data mining algorithms).

For more details, read the article "Beating the news with EMBERS- Forecasting Civil Unrest using Open Source Indicator.pdf".

Installation

Python v2.7.14

Use the package manager pip to install packages.

pip install scikit-learn  #v0.21.2
pip install numpy         #v1.14.2

File Description

  1. train_posi_tweets_2007.txt (4000 positive tweets)

  2. train_nega_tweets_2007.txt (8000 negative tweets)

  3. unlabeled_tweetss_2007.txt (51647 unlabeled tweets)

  • Note: These unlabeled tweets are provided in case you want to try some unsupervised techniques (e.g., clustering) to generate your classifier.

Tweet Format

tweet = {"lang": "es", "city": "Puebla", "embersId": "90ec51abad3f32705b511f5d0cd58b49d11ac40c", "uid": 152810236, "Parent_embersId": "7f208b7f82b8f29ad2349779e774f419f2915ea4", "derivedFrom": "1e1d43c10095a980e0746ace31e4f946", "instantloc": {"city": [["heroica puebla de zaragoza", [-98.22, 19.05], 1467419]], "state": []}, "key_count": 2, "score": 0.0013994149053899736, "text": "\u201c@mlucascir: RT \u201c@mauolmedo: La marcha anti pe\u00f1a en #Puebla M\u00e9xico http://t.co/CXRjG59d\u201d...buena marcha gente de Puebla.", "date": "2012-07-22 20:30:23", "country": "Mexico", "text_items": [["rt", "march", "gent", "pen", "anti", "puebl", "mexic"], ["#puebla"], ["@mlucascir", "@mauolmedo"], ["http://t.co/cxrjg59d\u201d...buena"]], "screen_name": "DUEMIKE"}

The above is an example tweet. Its basic items are:

tweet['text'] == tweet text.
tweet['text_items'][0] == list of stemmed terms of the tweet text. 
tweet['text_items'][1] == list of hashtags in the text. 
tweet['text_items'][2] == list of user mentions in the text. 
tweet['text_items'][3] == list of links in the text. 
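
The field layout above can be illustrated with a short sketch. It assumes that each record is available as a JSON string (the exact on-disk layout of the data files is not stated here), and the helper name `split_text_items` is hypothetical, introduced only for this example. The record is a trimmed copy of the example tweet.

```python
import json

# Trimmed copy of the example tweet, kept as a raw string so the JSON
# escape sequence (\u201d) is decoded by json.loads, not by Python.
raw_line = r'''{"lang": "es", "city": "Puebla", "country": "Mexico",
"text_items": [["rt", "march", "gent", "pen", "anti", "puebl", "mexic"],
               ["#puebla"],
               ["@mlucascir", "@mauolmedo"],
               ["http://t.co/cxrjg59d\u201d...buena"]]}'''


def split_text_items(tweet):
    """Hypothetical helper: name the four lists stored in tweet['text_items']."""
    stems, hashtags, mentions, links = tweet["text_items"]
    return {
        "stems": stems,        # stemmed terms of the tweet text
        "hashtags": hashtags,  # hashtags in the text
        "mentions": mentions,  # user mentions in the text
        "links": links,        # links in the text
    }


tweet = json.loads(raw_line)
items = split_text_items(tweet)
print(items["stems"])
print(items["hashtags"], items["mentions"])
```

The stemmed terms in `items["stems"]` are the most natural input for a bag-of-words classifier, since they are already tokenized and normalized.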

Output

Output result

Overview

  • The data provided in 'train_nega_tweets_2017.txt' and 'train_posi_tweets_2017.txt' is converted to a simpler format and saved in a 'training_tweets_2017.txt' file, which is then used to create the feature matrix X and the label vector y for training the model.

  • The trained classifier is saved so that it can be reused to classify similar relevant tweets without retraining the model each time.

  • The model is saved by pickling it with the pickle module.

  • Pickling is the name of the serialization process in Python. Pickling converts an object hierarchy to a binary format (usually not human readable) that can be stored. To pickle an object, import the pickle module and pass the object to dumps(), which returns the pickled bytes, or to dump(), which writes them to an open binary file.

  • The saved model is then loaded by unpickling it.

  • Unpickling is the reverse process: it takes a byte stream and converts it back into an object hierarchy. It is done with the load() function of the pickle module, which reads from an open binary file (loads() does the same for a bytes object) and returns the complete object hierarchy.

  • The 'saved_model.py' file reads the model and saves it as the 'saved_model.pkl' file.

  • The 'predictions.txt' file contains the class labels for the given 'test_tweet.txt' file.

  • All the '.py' files, '.txt' files, and the '.pkl' file must be kept in the same folder when executing 'svm_model_classifier.py'.
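
The train, pickle, and reload steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy token lists stand in for tweet['text_items'][0] and are not taken from the data files, and the TF-IDF features, the grid of C values, and cv=2 are all assumptions chosen to keep the example small.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the stemmed term lists (assumption: NOT real data).
positive = [["march", "protest", "gobiern"], ["huelg", "march", "plaz"],
            ["protest", "estudi", "march"], ["manifest", "protest", "call"]]
negative = [["futbol", "gol", "partid"], ["music", "conciert", "noch"],
            ["comid", "tac", "ric"], ["pelicul", "cine", "noch"]]

# X: one space-joined string per tweet; y: 1 = relevant, 0 = not relevant.
X = [" ".join(tokens) for tokens in positive + negative]
y = [1] * len(positive) + [0] * len(negative)

# Feature extraction plus a linear SVM (TF-IDF is an assumption here).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Cross-validated grid search over the regularization strength C.
search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0, 10.0]}, cv=2)
search.fit(X, y)

# Pickle the refitted best model to bytes, then unpickle it again.
# pickle.dump(obj, file) and pickle.load(file) are the file-based counterparts.
blob = pickle.dumps(search.best_estimator_)
restored = pickle.loads(blob)

# The restored model predicts without any retraining.
print(restored.predict(["march protest plaz"]))
```

Pickling the whole pipeline, rather than the classifier alone, keeps the fitted vectorizer together with the SVM, so new tweets are transformed with the same vocabulary at prediction time. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.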
