political-compass's Introduction

Navigating Multidimensional Ideologies with Reddit’s Political Compass: Economic Conflict and Social Affinity

This repository contains the code and data to reproduce the results of "Navigating Multidimensional Ideologies with Reddit’s Political Compass: Economic Conflict and Social Affinity" by Ernesto Colacrai, Federico Cinus, Gianmarco De Francisci Morales and Michele Starnini, published at the ACM Web Conference 2024 (WWW'24). If you use the provided data or code, we would appreciate a citation to the paper:

@inproceedings{10.1145/3589334.3645606,
author = {Colacrai, Ernesto and Cinus, Federico and De Francisci Morales, Gianmarco and Starnini, Michele},
title = {Navigating Multidimensional Ideologies with Reddit's Political Compass: Economic Conflict and Social Affinity},
year = {2024},
isbn = {9798400701719},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3589334.3645606},
doi = {10.1145/3589334.3645606},
booktitle = {Proceedings of the ACM on Web Conference 2024},
pages = {2582–2593},
numpages = {12},
keywords = {homophily, polarization, reddit, socio-demographic},
location = {Singapore, Singapore},
series = {WWW '24}
}

Here you will find (i) the (anonymized) Reddit dataset we presented in the paper and (ii) the code to reproduce our experiments.

Reddit Data Set

You can download our anonymized Reddit data sets for /r/PoliticalCompass and /r/PoliticalCompassMemes from here.

Each username is consistently replaced with an anonymized number.

  • blacklist_anonymized.joblib: list of users (anonymized) classified as bots.

For each of the analyzed subreddits (/r/PoliticalCompass, abbreviated as PC, and /r/PoliticalCompassMemes, abbreviated as PCM), the data set contains the following CSV files:

  • submissions_anonymized_SUBREDDIT.csv: each line corresponds to a submission on the SUBREDDIT, including the anonymized username of its author, the flair associated with the author (their ideology on the Political Compass), and the creation time of the submission (UTC). The data span 2012 to 2022; 0.1-Data-Pre-Processing-PC-PCM.ipynb selects only the period 2020-2022 (inclusive).
  • comments_anonymized_SUBREDDIT.csv: each line corresponds to a comment on the SUBREDDIT, including the anonymized username of its author, the flair associated with the author (their ideology on the Political Compass), and the creation time of the comment (UTC). The data span 2012 to 2022; 0.1-Data-Pre-Processing-PC-PCM.ipynb selects only the period 2020-2022 (inclusive).
  • edges_anonymized_SUBREDDIT.csv: each line corresponds to a comment posted on the SUBREDDIT during 2020-2022. The file lists the author of the comment, the author of the parent comment to which this comment replies, and the sentiment of the text of the interaction. Together, these lines can be seen as a weighted graph among users.
  • popularity_anonymized_SUBREDDIT.csv: each line corresponds to an author and the list of scores associated with each of their comments in the SUBREDDIT. These data are used to analyze possible confounding effects of Reddit.
  • socio_demographics_anonymized_SUBREDDIT.csv: for each Reddit user of the SUBREDDIT included in the analysis, this file reports their anonymized username and their scores on the age, gender, affluence, and partisan axes (ideology flairs are also included for analysis). Scores are quantile-normalized, so that, e.g., a score of 0.25 indicates the 25th percentile. The axes correspond, respectively, to the probability of being young (low) or old (high), male or female, poor or rich, and left-leaning or right-leaning.
  • edges_anonymized_with_toxicity_SUBREDDIT.csv: each line corresponds to an edge of the interaction network of the SUBREDDIT, with the author of the comment, the author of the parent comment to which this comment replies, the body (left as an empty string for anonymization reasons), the social and economic ideologies of both the author and the parent author, and the toxicity value computed from the original body of the comment.
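
To make the meaning of the quantile-normalized scores concrete, here is a minimal sketch of one common rank-based normalization. It is an illustration only: the function name and the sample values are hypothetical, and the exact procedure used to produce the released scores is described in the paper, not here.

```python
from bisect import bisect_right


def quantile_normalize(raw_scores):
    """Map each raw score to the fraction of scores that are <= it."""
    ordered = sorted(raw_scores)
    n = len(ordered)
    return [bisect_right(ordered, s) / n for s in raw_scores]


# Hypothetical raw axis scores for four users.
raw = [3.0, -1.0, 7.5, 0.2]
normalized = quantile_normalize(raw)
print(normalized)
```

Under this definition, the user with the lowest raw score of the four receives 0.25, that is, the 25th percentile.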

See the paper for more details about how we extracted this information. The total number of nodes (users) and edges (interactions) for the interaction networks are:

SUBREDDIT    /r/PC     /r/PCM
N. nodes     18135     173672
N. edges     215111    6197901
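
As a rough sketch of how an edge file can be turned into an interaction network and how node and edge counts like those above can be computed, consider the following. The column names (author, parent_author, sentiment), the inline sample, and the blacklist contents are all assumptions made for illustration; check the header of the real CSV files before adapting it. Counting one edge per comment line is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for edges_anonymized_SUBREDDIT.csv.
# The column names below are assumptions, not the documented schema.
SAMPLE = """author,parent_author,sentiment
1,2,0.5
2,1,-0.3
3,1,0.1
1,2,0.2
9,1,0.0
"""

# Hypothetical set of bot IDs. The real list is stored in
# blacklist_anonymized.joblib and could be read with joblib.load.
blacklist = {"9"}


def build_graph(rows, blacklist):
    """Return {(author, parent_author): [sentiments]}, dropping blacklisted users."""
    edges = defaultdict(list)
    for row in rows:
        src, dst = row["author"], row["parent_author"]
        if src in blacklist or dst in blacklist:
            continue
        edges[(src, dst)].append(float(row["sentiment"]))
    return edges


edges = build_graph(csv.DictReader(io.StringIO(SAMPLE)), blacklist)
nodes = {user for pair in edges for user in pair}
n_interactions = sum(len(weights) for weights in edges.values())
print(len(nodes), "nodes,", n_interactions, "interactions")
```

To run this on the real data, replace the io.StringIO sample with an open file handle on the downloaded CSV.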

Reproducibility

To reproduce our experiments, we provide all the notebooks needed to generate the analyses and the plots from the data set. In particular, you have to:

For further information or needed data, please contact [email protected], [email protected].
