Wrangle Twitter Archive

WeRateDogs:

WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about each dog. The ratings almost always have a denominator of 10, but most numerators are greater than 10. Why? WeRateDogs believes every dog is beautiful, so almost all dogs deserve a 10 and sometimes more. WeRateDogs has over 8 million followers and has received international media coverage. The goal of this project is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations.

Dataset:

The dataset consists of three parts.

  • Enhanced Twitter archive: The WeRateDogs Twitter archive contains basic tweet data for all 5000+ of their tweets, but not everything. One column the archive does contain, though, is each tweet's text, which I used to extract the rating, dog name, and dog "stage" (i.e. doggo, floofer, pupper, or puppo) to make this Twitter archive "enhanced". Of the 5000+ tweets, I kept only the tweets with ratings (there are 2356). This data is stored in the twitter_archive_enhanced.csv file.

  • Additional Data via the Twitter API: Twitter archives are basic, and retweet count and favorite count are two of the notable column omissions. Fortunately, this additional data can be gathered by anyone from Twitter's API. Well, "anyone" who has access to data for the 3000 most recent tweets, at least. Because we have the WeRateDogs Twitter archive, and specifically the tweet IDs within it, we can gather this data for all 5000+ tweets. We query Twitter's API to gather this data and store the results in the tweet_json.txt file.

  • Image Predictions File: I ran every image in the WeRateDogs Twitter archive through a neural network that can classify breeds of dogs. The result is a table of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction (numbered 1 to 4, since tweets can have up to four images). This prediction data is stored in the image_predictions.tsv file.
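As an illustration of how the API data could be read back, here is a minimal sketch that pulls the tweet ID, retweet count, and favorite count out of JSON text. It assumes tweet_json.txt holds one tweet JSON object per line; the README does not state the file layout, so that is an assumption, as are the function name `parse_tweet_lines` and the sample data. The keys `id`, `retweet_count`, and `favorite_count` are the field names used in Twitter API tweet objects.

```python
import json

def parse_tweet_lines(lines):
    """Collect tweet ID, retweet count, and favorite count from JSON lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows

# Hypothetical sample standing in for two lines of tweet_json.txt.
sample = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 25}',
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}',
]
rows = parse_tweet_lines(sample)
print(rows)
```

With the real file, the same function could be called on an open file handle, for example `with open("tweet_json.txt") as f: rows = parse_tweet_lines(f)`.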

What Software Do I Need?

This project can be done in a Jupyter notebook using Python 3.x. The following Python packages are needed to wrangle the dataset and query the Twitter API.

  • pandas
  • NumPy
  • requests
  • tweepy
  • json (part of the Python standard library, so no installation is needed)
  • sqlalchemy
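A quick way to check whether these packages are available in the current environment is sketched below. The helper name `missing_packages` is hypothetical; the check itself relies only on `importlib.util.find_spec` from the standard library, which returns `None` when a top-level module cannot be found.

```python
import importlib.util

# Import names of the packages listed above ("numpy" is the import name of NumPy).
REQUIRED = ["pandas", "numpy", "requests", "tweepy", "json", "sqlalchemy"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

Any name reported as missing would then need to be installed before running the notebook.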

Project Steps:

The data wrangling process consists of three steps:

  • Gather Data: Gather dataset for wrangling.
  • Assess Data: Note the issues regarding quality and tidiness of the dataset.
  • Clean Data: Fix the issues documented during the assessment step to make the dataset ready for analysis.
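To make the assess-then-clean idea concrete, here is a minimal sketch of one cleaning task: re-extracting a rating from tweet text. It is a hypothetical example of an issue that assessment might surface, not the project's actual cleaning code. The function name `extract_rating`, the regular expression, and the sample texts are all assumptions. The pattern takes the first "number/number" fraction it finds, so a text containing another fraction before the rating would still need manual review.

```python
import re

# Matches a numerator (optionally with decimals), a slash, and an integer denominator.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")

def extract_rating(text):
    """Return (numerator, denominator) from the first fraction in text, or None."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))

# Hypothetical tweet texts.
print(extract_rating("This is Bella. 13/10 would pet"))
print(extract_rating("Meet Sam. 13.5/10 for effort"))
print(extract_rating("No rating in this one"))
```

Supporting decimal numerators in the pattern avoids truncating a rating such as 13.5 to 5, which is the kind of quality issue the assessment step is meant to catch.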

It is recommended that, after the data wrangling process, the clean data be stored for future analysis. Here we store the clean data in a flat file, twitter_archive_master.csv, and in a SQLite database, twitter.db.
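The storage step can be sketched as follows. This version uses only the standard library (`csv` and `sqlite3`) so that it runs without extra packages. Given the package list above, the project itself more likely uses pandas together with SQLAlchemy, so treat this as a stand-in rather than the project's code. The function name `store_rows`, the table name `tweets`, and the sample rows are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

def store_rows(rows, csv_path, db_path, table="tweets"):
    """Write rows (a list of dicts with identical keys) to a CSV file and a SQLite table."""
    columns = list(rows[0].keys())

    # Flat file: a header row followed by one line per record.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # SQLite database: recreate the table, then insert every record.
    conn = sqlite3.connect(db_path)
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({col_list})')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
        conn.commit()
    finally:
        conn.close()

# Hypothetical cleaned rows; a temporary directory keeps the demo away from real files.
clean_rows = [
    {"tweet_id": 1, "rating_numerator": 13, "rating_denominator": 10},
    {"tweet_id": 2, "rating_numerator": 12, "rating_denominator": 10},
]
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "twitter_archive_master.csv")
db_path = os.path.join(out_dir, "twitter.db")
store_rows(clean_rows, csv_path, db_path)
```

Keeping both outputs means the flat file can be opened anywhere, while the database allows SQL queries over the same cleaned data.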

Contributors

codeslash21
