
top-fibers's Introduction

top-fibers

Code to find and rank the top superspreaders of misinformation on Twitter using the FIB-index.

Creators

Top FIBers is a project of the Observatory on Social Media (OSoMe, pronounced "awesome") at Indiana University. The following individuals have contributed to this project: Matthew R. DeVerna, Pasan Kamburugamuwa, Nick Liu, Kaicheng Yang, Ben Serrette, and Filippo Menczer.

The best way to contact the team is by using the contact information found at the OSoMe website.

top-fibers's People

Contributors

mr-devs, nick-the-freak, pasan04

top-fibers's Issues

Finalize documentation

I am adding all of us but likely I will handle most of this myself.

ToDo List

The list below applies once the repo is ready and all data have been updated.

  • Make sure that the README docs are correct in all directories
  • Update the web-based documentation (see this) which is quite outdated at this point.
    • Nice to have: Can we create a diagram of the infrastructure and the data processing pipeline? (use https://app.diagrams.net/)

Add tests

  • data model
  • FIB-index function
  • etc.
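
As a starting point for the FIB-index test, here is a minimal sketch. It assumes the FIB-index is an h-index-style score (the largest f such that f of an account's posts were each reshared at least f times). The function name `fib_index` and its input format are hypothetical and may not match the function in this repo.

```python
def fib_index(reshare_counts):
    """Return the largest f such that f posts have at least f reshares each.

    ASSUMPTION: h-index-style definition; the repo's real function may differ.
    """
    ranked = sorted(reshare_counts, reverse=True)
    f = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            f = rank
        else:
            break
    return f


def test_fib_index():
    # No posts, or posts that were never reshared, give an index of zero.
    assert fib_index([]) == 0
    assert fib_index([0, 0]) == 0
    # Four posts have at least four reshares each; the fifth has only three.
    assert fib_index([10, 8, 5, 4, 3]) == 4
    # Input order should not matter.
    assert fib_index([3, 4, 5, 8, 10]) == 4


test_fib_index()
```

If the project uses pytest, the `test_fib_index` function could be moved into a test module as-is and the final call dropped.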

Merge the two fib scripts into one

Currently, there are two scripts that are used to calculate the fib information.

They are:

They can be merged into one file if we do the following:

  • Update the argparse inputs to include:
    • Number of spreaders (currently hardcoded)
    • Type of spreader (currently hardcoded)
    • Platform
  • Move both data extraction functions to package/top_fibers_pkg/fib_helpers.py
    • Another option would be to combine them into one function that behaves differently depending on a platform flag (which could be taken from the argparse inputs above). This would create a very large function, though.
  • Other small things like: importing both data models, setting output files based on flags, etc.
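
The argparse changes listed above could look roughly like the sketch below. The flag names, the default of 50 spreaders, and the platform choices are assumptions for illustration, not what the existing scripts use.

```python
import argparse


def build_parser():
    """Build a parser for a single, merged FIB script (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Calculate FIB information for one platform."
    )
    parser.add_argument(
        "--platform",
        choices=["twitter", "facebook"],  # assumed platform names
        required=True,
        help="Platform whose data should be processed.",
    )
    parser.add_argument(
        "--num-spreaders",
        type=int,
        default=50,  # placeholder; the currently hardcoded value may differ
        help="Number of spreaders to include in the output.",
    )
    parser.add_argument(
        "--spreader-type",
        default="top",  # placeholder; valid values depend on the current code
        help="Type of spreader to select.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.platform, args.num_spreaders, args.spreader_type)
```

The platform value could then decide which data model to import and which output file names to use.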

Remove very old tweets

Right now, the code ingests all tweet objects from base tweets, retweets, and quote tweets. This means that, if something is retweeted from a very old post, that post could have had a year or more to gain retweets. This is not necessarily a fair comparison, so we may want to filter out tweets that did not originate during the observed time frame.
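
One possible shape for that filter is sketched below. It assumes each post is a dict with a timezone-aware `created_at` datetime for the original post; the real tweet objects and the helper name `filter_to_window` are hypothetical.

```python
from datetime import datetime, timezone


def filter_to_window(posts, start, end):
    """Keep only posts whose original creation time falls in [start, end)."""
    return [post for post in posts if start <= post["created_at"] < end]


# Example: keep only posts that originated in March 2023.
start = datetime(2023, 3, 1, tzinfo=timezone.utc)
end = datetime(2023, 4, 1, tzinfo=timezone.utc)
posts = [
    {"id": "a", "created_at": datetime(2023, 3, 15, tzinfo=timezone.utc)},
    {"id": "b", "created_at": datetime(2021, 12, 1, tzinfo=timezone.utc)},
]
recent = filter_to_window(posts, start, end)
```

For retweets and quote tweets, the timestamp to check would be that of the original post rather than that of the retweet itself.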

Download FB posts for the past year Oct '21-Dec '22

We want to be able to calculate the FIB indices for all months in 2022.

In order to do this, we will need one file for each month, going back three months from Jan '22 (that is, starting with Oct '21). Thus, the script needs to be called 15 times, once for each month of data:

  • December (pull in January; offset -1)
  • November (offset -1)
  • October (offset -2)
  • September (offset -3)
  • August (offset -4)
  • July (offset -5)
  • June (offset -6)
  • May (offset -7)
  • April (offset -8)
  • March (offset -9)
  • February (offset -10)
  • January (offset -11)
  • December (offset -12)
  • November (offset -13)
  • October (offset -14)
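
The offsets in this list can be double-checked with a small helper like the one below. It assumes the offset is simply the number of months between the month in which the script runs and the month being downloaded; `month_offset` is a hypothetical name, not a function in this repo.

```python
from datetime import date


def month_offset(run_month, target_month):
    """Number of months from run_month to target_month (negative = past)."""
    return (target_month.year - run_month.year) * 12 + (
        target_month.month - run_month.month
    )


# Running in December '22: November '22 is -1 and October '21 is -14.
nov_offset = month_offset(date(2022, 12, 1), date(2022, 11, 1))
oct_offset = month_offset(date(2022, 12, 1), date(2021, 10, 1))

# December '22 itself is pulled in January '23 with offset -1.
dec_offset = month_offset(date(2023, 1, 1), date(2022, 12, 1))
```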

Please address the cosmetic changes in #35

I merged #35 despite requesting some small cosmetic changes. I did this to get the pipeline ready for the cronjob that is going to occur over the weekend.

When you have a chance, please create a new PR to address what I left in the comments of that PR.

Thanks!

Re-pull all Twitter data from moe using the new list of domains

You can find the list of domains to use in data/iffy_files. Will probably be easier to do this after you merge #36 first (as the file is on that branch).

The data should be re-pulled for all months that we had previously. That is all of last year, plus every month of this year up to and including the month before the current one. As today is 4/28, that would be up to and including 2023-03. However, in two days, we are going to need the April data as well.

Please let me know if anything isn't clear. Thanks!

Crontab `MAILTO` variable

Once the project is finalized, do you think we need to add others to this variable? I know that many of the other projects send emails to the larger group email list and Fil gets updates.

I will let you decide how you'd like to manage this as it will ultimately become your project to manage after I am gone. That said, I would recommend adding at least one other person's email from the developers' team.

Also, I will add myself to keep an eye on this while I am still around.

Create a clean version of the iffy list

  • Create a clean version of the iffy list with https and wildcards removed
  • Update the code so it loads the clean version and doesn't need to clean the domains on the fly
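
A minimal sketch of the cleaning step is below. It assumes raw entries look like `https://example.com/*` or `*.example.com`; the actual format of the files in data/iffy_files may differ, and `clean_domain` is a hypothetical helper name.

```python
import re


def clean_domain(raw):
    """Strip the scheme, a leading wildcard, and any path from one entry."""
    domain = raw.strip().lower()
    domain = re.sub(r"^https?://", "", domain)  # drop "http://" or "https://"
    domain = re.sub(r"^\*\.", "", domain)       # drop a leading "*." wildcard
    domain = domain.split("/", 1)[0]            # drop any path, e.g. "/*"
    return domain


def clean_domains(raw_domains):
    """Return the sorted, de-duplicated list of cleaned domains."""
    return sorted({clean_domain(raw) for raw in raw_domains if raw.strip()})


cleaned = clean_domains(
    ["https://example.com/*", "*.example.org", "http://Example.com"]
)
```

Writing `cleaned` to a file once would let the other scripts load it directly instead of cleaning the domains on every run.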

Incorporate logging

Create a logging directory that has different subdirectories for each type of script running.

  • data collection
  • analysis
  • etc
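
One way to set this up with the standard library is sketched below. The directory layout (`logs/<script type>/<script type>.log`), the logger names, and the `get_logger` helper are assumptions, not existing project code.

```python
import logging
from pathlib import Path


def get_logger(script_type, log_root="logs"):
    """Return a logger that writes to <log_root>/<script_type>/<script_type>.log."""
    log_dir = Path(log_root) / script_type
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"top_fibers.{script_type}")
    logger.setLevel(logging.INFO)

    # Avoid attaching duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / f"{script_type}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A data collection script could then call `get_logger("data_collection")` and an analysis script `get_logger("analysis")`, so each type of script writes to its own subdirectory.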

Clickable account icons

On the "Accounts" page, it looks like we can no longer click the user icon to visit an account's Twitter/Facebook page. I think that Fil had asked you to make the text no longer look like a hyperlink when we added the "Unfollow" button, but I would still like to be able to click the icon to visit. That way we could visit the account both ways: via the "Unfollow" button and via the icon. Can you please add that functionality back in when you get a chance?

Thanks!

Update FAQ

Hi Pasan, can you please update the FAQ sections outlined below with the following text? These have changed since we've changed the list of domains slightly.

Thanks!


How do you define misinformation?

We adopt a common definition of misinformation utilized in academic research, which focuses on a source of information “that mimics news media content in form but not in organizational process or intent” (Lazer et al., Science, 2018). With this definition, we search for posts that contain at least one link to sources within a list that is curated by an independent third party, Iffy.news. Specifically, we include sources that have been marked by Media Bias Fact Check (MBFC) as having a "low" or "very-low" "MBFC Factual" score.

According to MBFC methodology, a source in these categories "rarely uses credible sources and is not trustworthy for reliable information" and "need[s] to be fact-checked for intentional fake news, conspiracy, and propaganda." Since a source's MBFC Factual score can change, we update our list of sources each month prior to releasing a new Top FIBers monthly report. These updates do not affect prior reports.

How do you collect your data?

Facebook

Facebook data are gathered using the CrowdTangle API. Specifically, we utilize the /posts/search/ endpoint. As a result of utilizing CrowdTangle, we are limited to collecting public posts (see the CrowdTangle documentation for more details).

Data for all months in 2022 as well as January through March of 2023 were gathered during the week of April 24, 2023. After that point, Facebook data are collected within the first week of the following month (depending on how long it takes for all data to download). For example, April 2023 data were collected during the first week of May 2023. We collect public posts linking to at least one of the low-credibility sources (see How do you define misinformation? for more details).

Please link the bold portion to the section above.

Twitter

Twitter data are collected with the enterprise-level Decahose endpoint. The Decahose delivers a 10% random sample of all tweets in real time. From this sample, we collect all tweets that link to at least one of the low-credibility sources.

As Twitter's recent API changes have made continuing this data collection virtually impossible, we will no longer be able to continue analyzing Twitter's biggest superspreaders of misinformation.

Visualization ideas

Some ideas for account visualization:

  • Top ten domains shared
  • Top hashtags utilized
  • Misinfo links shared per day
  • Mean low-credibility tweets per day/week/etc
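
As a rough sketch of the first idea, the counts behind a "top ten domains shared" chart could be computed as below. It assumes we already have a flat list of the URLs an account shared; `top_domains` is a hypothetical helper, not existing project code.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=10):
    """Return the n most frequently shared domains as (domain, count) pairs."""
    domains = (urlparse(url).netloc.lower() for url in urls)
    # Skip entries with no domain, e.g. malformed or relative URLs.
    return Counter(domain for domain in domains if domain).most_common(n)


shared = [
    "https://a.example/story-1",
    "https://a.example/story-2",
    "http://b.example/post",
]
counts = top_domains(shared)
```

The resulting pairs could be passed directly to whatever charting library the site uses for a bar chart.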

Make the cronjob send email reminders

We need to update the master bash script so that it sends an email to Nick when something fails.

We have already added Nick's email to the crontab file using the MAILTO variable but need to make all of the exit lines return 1.

I will take care of this.
