
CS6242 - Group 48

Members

  • Carlos Aguilar
  • Harshal Gajjar
  • Akshay Jadiya
  • Arushi Agrawal
  • Ramin Dabirian

Usage

1. Install dependencies

  1. [Recommended] Create a new virtual environment
  2. Install all packages listed in requirements.txt using the command python -m pip install -r requirements.txt

2. Processing Tweet stances

Don't want to run this step? Feel free to use the output files we have already generated and skip to step 4.

2.1. Download the partial dataset

  1. Download the csv containing 20M tweet IDs (and related metadata) for tweets about the 2020 US election from:
  2. Place the downloaded csv file, named uselection_tweets_1jul_11nov.csv, in the same folder as get_tweet_strings.py

2.2. Fetch tweet strings using Twitter API

Don't want to run? Look at the sample output of this step: ./data/processed_data/sample_processed_data_1_50000.csv or the complete output on OneDrive here.

  1. Run the following command to process the first 50k tweet IDs in uselection_tweets_1jul_11nov.csv and generate a new csv ./data/processed_data/processed_data_1_50000.csv containing the tweet strings (and additional metadata).
python get_tweet_strings.py 1 50000 "<consumer_key>" "<consumer_secret>" "<access_token>" "<access_token_secret>"
  2. Repeat the above process for the remaining batches to get all the tweet strings; the next command in the sequence is:
python get_tweet_strings.py 50001 100000 "<consumer_key>" "<consumer_secret>" "<access_token>" "<access_token_secret>"

Get your Twitter API keys here

Note: All our generated data sheets are batched in groups of 50k, so every start/end index pair has the form i*50000+1, (i+1)*50000, where i belongs to the set {0, 1, 2, ..., 399}.
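The index arithmetic in the note above can be sketched as a small helper (hypothetical, not part of the repo):

```python
def batch_indices(num_batches=400, batch_size=50000):
    """Yield the 1-based inclusive (start, end) pair for each 50k batch:
    (1, 50000), (50001, 100000), ..., (19950001, 20000000)."""
    for i in range(num_batches):
        yield i * batch_size + 1, (i + 1) * batch_size

pairs = list(batch_indices())
print(pairs[0])   # (1, 50000)
print(pairs[-1])  # (19950001, 20000000)
```

400 batches of 50k cover exactly the 20M tweet IDs in the dataset.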

2.3. Compute tweet string stance

Don't want to run? Look at the sample output of this step: ./data/processed_data_stance/sample_processed_data_stance_1_50000.csv or the complete output on OneDrive here.

  1. Once a csv file containing tweet strings has been placed in ./data/processed_data/, run the following command to compute stance. It processes the appropriate csv from ./data/processed_data/ and generates a similarly named csv, with stance values, in ./data/processed_data_stance/:
python get_tweet_stance.py 1 50000
# reads: ./data/processed_data/processed_data_1_50000.csv
# generates: ./data/processed_data_stance/processed_data_stance_1_50000.csv
  2. Repeat the above step to compute stance for each of the 20 million tweets.
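The per-batch repetition above can be scripted with a small driver loop; a sketch using subprocess, assuming get_tweet_stance.py accepts the start and end indices exactly as in the command above:

```python
import subprocess

BATCH = 50000

def stance_commands(num_batches=400):
    """Build the get_tweet_stance.py invocation for every 50k batch."""
    return [
        ["python", "get_tweet_stance.py", str(i * BATCH + 1), str((i + 1) * BATCH)]
        for i in range(num_batches)
    ]

# To actually run the batches sequentially (not executed here):
# for cmd in stance_commands():
#     subprocess.run(cmd, check=True)

print(stance_commands(2))
```

Only batches whose input csv already exists in ./data/processed_data/ will succeed, so run this after the corresponding step 2.2 batches have finished.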

2.4. Aggregate the tweet stance data

Don't want to run? Look at the sample output of this step: ./data/processed_data_stance_aggregated/sample_stance_democrat_50000_1.csv or the complete output on OneDrive here.

  1. Once csv files are populated in ./data/processed_data_stance/, the following command can be executed to generate a new csv in ./data/processed_data_stance_aggregated/ with day-wise aggregated stance values (total stance, total count):
python aggregate_stance_values.py

Note: aggregate_stance_values.py processes all sheets present in ./data/processed_data_stance in a single go. Hence, looping is not required in this step.
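The day-wise aggregation can be illustrated with a minimal stdlib sketch; the column names `date` and `stance` are assumptions about the intermediate csv schema, and the real aggregate_stance_values.py may differ:

```python
import csv
from collections import defaultdict

def aggregate_stances(rows):
    """Sum stance values and count tweets per day.

    `rows` is an iterable of dicts with 'date' and 'stance' keys
    (assumed schema; the real sheets may use different column names).
    """
    totals = defaultdict(lambda: [0.0, 0])  # date -> [total_stance, total_count]
    for row in rows:
        acc = totals[row["date"]]
        acc[0] += float(row["stance"])
        acc[1] += 1
    return {day: (total, count) for day, (total, count) in totals.items()}

# Usage with a real sheet:
# with open("./data/processed_data_stance/processed_data_stance_1_50000.csv") as f:
#     print(aggregate_stances(csv.DictReader(f)))
```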

2.5. Cleaning of aggregate data

  1. All the csv files generated in ./data/processed_data_stance_aggregated need to be unified into a single sheet.
  2. Next, days with multiple rows (i.e. spread across multiple 50k tweet sheets) need to be grouped together.
  3. [Optional] The data can be normalised. We subtracted the standard deviation and divided by the mean per day.
  4. Finally, save the result as ./data/final_chart_data.csv. Note that our processed output already exists at the same location.
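The cleaning steps above can be sketched with the stdlib; the (date, total_stance, total_count) tuple layout is an assumption about the aggregated sheets, and the normalisation mirrors the step described above (subtract the standard deviation, divide by the mean):

```python
from collections import defaultdict
from statistics import mean, pstdev

def unify(day_rows):
    """Merge rows for the same day that came from different 50k batch sheets.

    `day_rows` is an iterable of (date, total_stance, total_count) tuples
    (assumed schema for the aggregated sheets).
    """
    merged = defaultdict(lambda: [0.0, 0])
    for day, total, count in day_rows:
        merged[day][0] += total
        merged[day][1] += count
    return dict(sorted(merged.items()))

def normalise(values):
    """Optional normalisation as in step 3: subtract the std, divide by the mean.

    Assumes the mean is nonzero; a real pipeline should guard against that.
    """
    m, s = mean(values), pstdev(values)
    return [(v - s) / m for v in values]
```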

3. Processing Trending Hashtags

Requires Google Chrome and access to this website containing historical trends.
Don't want to run this step? Feel free to use the output files we have already generated and skip to step 4.

  1. Download ChromeDriver for your installed version of Chrome from here. After downloading, change line 8 in ./hashtags/scrapper.py to point to the downloaded driver's location. Line 20 can also be changed from twitter to google to fetch trends from Google instead of Twitter.
  2. Run the following command to get trending hashtags for the dataset's range of dates. This will create a new file ./data/hashtags.csv with the top trends for each row/date. Note that our output already exists at the same location.
cd ./hashtags
python scrapper.py
cd ..
  3. Run the following command to generate a word cloud for each day. This will populate the folder ./assets/twitter with png files. Note that our output already exists at the same location.
cd ./hashtags
python wordcloud_generator.py
cd ..

4. Run Visualization Server

After generating the stance data (step 2) and trending topics/hashtags (step 3), run the following command to launch an interactive webpage served on port 8080.

python launch_visualization_server.py
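For illustration, a minimal sketch of what such a launcher could look like, assuming the visualization is static HTML/JS plus the generated ./data and ./assets files; the real launch_visualization_server.py may be implemented differently:

```python
# Hypothetical static-file launcher on port 8080; the actual
# launch_visualization_server.py may differ.
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

PORT = 8080

def make_server(port=PORT):
    """Serve the current directory (index page, ./assets, ./data) over HTTP."""
    return ThreadingHTTPServer(("", port), SimpleHTTPRequestHandler)

if __name__ == "__main__":
    with make_server() as httpd:
        print(f"Serving visualization at http://localhost:{PORT}")
        httpd.serve_forever()
```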
