twitter-info-ops-pt2

Under the Radar: Analyzing Recent Twitter Information Operations to Improve Detection and Removal of Malicious Actors, Part 2

R code, markdown document, and Gephi files used to create a social network analysis project examining three information operations removed from Twitter in 2021.

The full written report of my findings can be found on my website, https://wonksecurity.com. Here is the direct link to the PDF of the report: https://wonksecurity.com/wp-content/uploads/2022/12/Twitter_info_op_report_v2.pdf. Part 1 of this project is also available here.

Abstract

This report builds upon the work done in part one of this series by examining the network structure of three information operations (IOs) that were removed from Twitter in 2021. The analysis that follows uses social network analysis (SNA) to explore the structure, key network statistics, and measures of centrality for network graphs created from Twitter mentions. Five data sets feature in this analysis: three IO networks and two control networks. Data for the three IO networks came from Twitter’s Transparency Center and contained tweets from a Russian, Chinese, and Iranian IO, respectively. In addition, two COVID-19 tweet data sets from Kaggle served as the controls. This project seeks to determine whether it is possible to make cross-network comparisons that could enhance the early detection of IOs on social media platforms like Twitter. The analysis found that while each network was structurally unique, the key SNA statistics failed statistical significance testing when checking for differences between the IO group and control group. This may be the result of a small network sample size (n=5). However, this study also found that measures of centrality had statistically significant differences between the IO group and the control group. This suggests that measures of centrality, particularly eigenvector centrality and PageRank, could be useful metrics for differentiating IOs from legitimate Twitter conversations.
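
For readers unfamiliar with the centrality measures named above, here is a minimal sketch of PageRank on a mention network, where a directed edge runs from the account that tweeted to the account it mentioned. It is written in Python with only the standard library and is not the code from this project's R scripts. The account names, the toy edge list, the damping factor of 0.85, and the fixed iteration count are all illustrative assumptions.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed (source, target) edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Accounts that mention nobody ("dangling" nodes) spread their
        # rank evenly across the whole network.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical mention edges: (author, mentioned account).
toy_edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("dave", "bob"),
]

ranks = pagerank(toy_edges)
for account, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{account}: {score:.3f}")
```

In this toy graph the account mentioned most often by others ends up with the highest score, which is the intuition behind using PageRank to find the accounts a network revolves around.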

How to use this project:

  • Go to Twitter's Transparency Center and acquire the original datasets used in this project. Given how rapidly Twitter's policies are currently changing, I will not make the raw data available here while it remains easily accessible through Twitter's own service.

  • Acquire the "People’s Republic of China - Xinjiang (December 2021) - 2048 Accounts" dataset released in December 2021, particularly the Tweet Information file.

  • Acquire the Tweet Information file from the "Iran (February 2021) - 238 Accounts" dataset released in February 2021.

  • Acquire the Tweet Information file from the "Russia IRA (February 2021) - 31 Accounts" dataset, also released in February 2021.

  • Then, go to Kaggle user Arunava Kumar Chakraborty's COVID-19 dataset page and download the 61MB file that contains both COVID-19 datasets that were used as controls.

  • Download the R scripts, Quarto markdown file, and Gephi files provided here to replicate, build upon, or fork the cleaning and analysis process that I used.
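
As a rough idea of what happens between downloading a Tweet Information file and opening a graph in Gephi, the sketch below turns tweet rows into a weighted mention edge list. It is a Python illustration using only the standard library, not the project's R cleaning script. The column names `user_screen_name` and `tweet_text`, the sample rows, and the simple `@handle` regular expression are assumptions, so check them against the files you actually download. The `Source`, `Target`, `Weight` header follows the column naming Gephi recognizes when importing an edge table from CSV.

```python
import csv
import io
import re
from collections import Counter

# Simplistic pattern: any "@" followed by word characters counts as a mention.
MENTION_RE = re.compile(r"@(\w+)")


def weighted_mention_edges(rows, author_col="user_screen_name",
                           text_col="tweet_text"):
    """Count how often each author mentions each account."""
    counts = Counter()
    for row in rows:
        author = row[author_col].lower()
        for handle in MENTION_RE.findall(row[text_col]):
            counts[(author, handle.lower())] += 1
    return counts


def write_gephi_edges(counts, handle):
    """Write the edge counts as a Source/Target/Weight CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(counts.items()):
        writer.writerow([src, dst, weight])


# Hypothetical stand-in for a downloaded Tweet Information file.
sample = io.StringIO(
    "user_screen_name,tweet_text\n"
    "acct_a,Hello @acct_b\n"
    "acct_a,Again @acct_b and @acct_c\n"
    "acct_b,Reply to @acct_a\n"
)

edge_counts = weighted_mention_edges(csv.DictReader(sample))
out = io.StringIO()
write_gephi_edges(edge_counts, out)
print(out.getvalue())
```

To run this against a real file, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")` and write the output to a file on disk instead of an in-memory buffer.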

These datasets offer a wealth of information, and my goal is to conduct additional analysis in future projects. I list several ideas for how the data could be analyzed further in part one of this project, located here. If you try one of these or have additional ideas, feel free to drop me a line at cody [@] wonksecurity [.] com. I'd love to hear what you think.

About the files

  • Cleaning_Of_2021_Twitter_Info_Ops.R is the cleaning script used to prepare the raw Twitter data for analysis. It is the same file as the one used in part 1.

  • SNA_Of_2021_Twitter_Mentions.R contains all of the code to reproduce the results discussed in the report.

  • Twitter_info_op_report_v2.qmd is the Quarto markdown file that was used to produce the initial cut of the report before it was exported to Microsoft Word for finalization.

  • g_covid_2022.gephi contains the Gephi network graphs used to produce the Control 1 and Control 2 network visualizations.

  • mentions_ch_ru_ir_2021.gephi contains the Gephi network graphs used to produce the three IO network visualizations.

  • mentions_network_results.csv is the output of all the results used in the table shown in the report.

  • mentions_network_results2.csv contains these same results but organized differently for t-testing.

  • mentions_network_results3.csv similarly contains results organized differently for ANOVA testing. Each network was assigned to either an "IO" group or a "control" group for easier testing.
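
To show why a grouped layout makes testing easier, the sketch below runs a two-sample comparison on rows tagged "IO" or "control". It is a Python illustration using only the standard library, not the project's R code. The row layout and every number in it are made up, and Welch's unequal-variance t-test is an assumption, since this README does not say which t-test variant the report uses. Only the t statistic and the Welch-Satterthwaite degrees of freedom are computed, because the standard library has no t-distribution to turn them into a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error contributed by each group.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical long-format rows: (network, group, metric value).
rows = [
    ("network_1", "IO", 0.2),
    ("network_2", "IO", 0.4),
    ("network_3", "IO", 0.6),
    ("network_4", "control", 1.0),
    ("network_5", "control", 2.0),
]

io_values = [value for _, group, value in rows if group == "IO"]
control_values = [value for _, group, value in rows if group == "control"]

t_stat, dof = welch_t(io_values, control_values)
print(f"t = {t_stat:.3f}, df = {dof:.3f}")
```

With one row per network and a group label on each, splitting the data for the test is a single filter per group. If SciPy is available, `scipy.stats.ttest_ind(io_values, control_values, equal_var=False)` performs the same test and also returns the p-value.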

License and Attribution

The code, markdown, and Gephi files in this project are released under a GPL-3.0 license. The data from Twitter's Transparency Center is bound by its terms of use, found here. The COVID-19 datasets made available by A. K. Chakraborty are available on Kaggle under a CC BY-NC-SA 4.0 license. Full credit for this dataset goes to: Chakraborty A.K., Das S., Kolya A.K. (2021) Sentiment Analysis of Covid-19 Tweets Using Evolutionary Classification-Based LSTM Model. In: Pan I., Mukherjee A., Piuri V. (eds) Proceedings of Research and Applications in Artificial Intelligence. Advances in Intelligent Systems and Computing, vol 1355. Springer, Singapore. https://doi.org/10.1007/978-981-16-1543-6_7.
