twitter-scraping-with-python

This guide teaches you how to get started making requests to the Twitter API from Python using Tweepy, and how to store the information returned by those requests in a csv file. If you need additional information, Twitter also provides a course on how to use v2 of its API for academic research, as well as other resources.

GETTING STARTED WITH TWITTER DEVELOPER

Collecting Twitter data is done through the Twitter API. In order to use the Twitter API, you'll need to apply for developer access:
- If you're an undergrad, you don't have access to the academic research track, so just apply for the standard developer track.
- In order to apply, you'll need to either log in to your account or create one.
  - We recommend applying with an existing Twitter account. In our experience, existing accounts are more likely to get approved, and get approved faster.

USING TWITTER DEVELOPER TO GAIN API ACCESS

Start at the Twitter developer portal. This is where you will create a project and an app in order to get the access and bearer tokens you need to write code that accesses information through the Twitter API.
Gathering Twitter data to investigate trends related to toxicity, community building, platform governance and other related issues
Create a project, then an app, and you'll be presented with your keys. Here's the video walkthrough
After initial setup, your keys can be accessed by hitting the key icon next to the app name in your project dashboard.
- Your keys will be hidden after they're first displayed to you, so you should either take a screenshot or save them as variables in a file so you don't have to continue to regenerate them.
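
One way to save the keys in a file is sketched below. It assumes you have stored them in a small JSON file; the file name twitter_keys.json and the four field names are hypothetical choices for this example, not something Twitter or Tweepy requires.

```python
import json
from pathlib import Path

# Hypothetical field names: use whatever labels you like, as long as the
# file and the code agree.
REQUIRED_FIELDS = ("api_key", "api_secret", "access_token", "access_secret")


def load_keys(path):
    """Read Twitter credentials from a JSON file and check that none are missing."""
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_FIELDS if not keys.get(name)]
    if missing:
        raise ValueError(f"Missing credentials in {path}: {', '.join(missing)}")
    return keys
```

You would then call keys = load_keys("twitter_keys.json") at the top of a program. If you share your code, keep the keys file out of anything you upload.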

GETTING STARTED WITH PYTHON

Thus far, we've been using Python to work with the Twitter API. In order to use Python, you'll first have to install it onto your computer. I recommend two possible tracks for getting started:
- 1- If space on your computer isn't a concern, or you anticipate working with data science in the future, it's probably best to download Anaconda, which will install Python for you and provide a dashboard that gives you access to Jupyter Notebooks, PyCharm, and other resources.
  - Download Anaconda:
  - Anaconda also makes things easier by pre-installing commonly used libraries like pandas, so you can skip that step.
  - Once you download Anaconda, open Anaconda-Navigator and download the community version of PyCharm (an IDE made specifically for Python), unless you already have a preferred IDE.
    - IDE = Integrated Development Environment. IDEs are to writing code what word-processing software (Word, Google Docs) is to writing documents. Different IDEs are made for different purposes: some work with a broad range of languages (like Visual Studio Code or Atom), others are optimized for a specific language (like PyCharm for Python), and others are designed for a more niche task (like Spyder for data science).
- 2- If you prefer to save space, you can download Python by itself
  - You'd then need to download an IDE to use; we recommend either PyCharm or Visual Studio Code. If you're on Windows, check whether Visual Studio Code is already installed before downloading it.
Once you have Python and an IDE installed, you'll want to install the packages needed to collect data through the Twitter API. You can do this through either the terminal or PyCharm.
- 1- Using the terminal, enter the command pip3 install [insert package name]
  - (ex. pip3 install tweepy)
  - video example using pip3 install in terminal (I already had Tweepy installed, so the output might look different)
- 2- Go through PyCharm: in the bottom menu bar, select Python Packages, search for the desired package, and click install
  - video example installing in PyCharm
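Packages to install (some might already be installed if you install through Anaconda, but entering the command to install them again won't hurt anything):
- tweepy
- schedule
- pandas
Modules used in this guide that ship with Python itself, so they don't need to be installed:
- csv
- json
- time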
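
If you want to confirm that everything is available before moving on, a short check like the one below can help. This is only a sketch: find_missing is a helper name made up for this example, and the list uses the lowercase names you would type in an import statement.

```python
import importlib.util

# Import names for the packages and modules used in this guide.
PACKAGES = ["tweepy", "csv", "json", "schedule", "time", "pandas"]


def find_missing(names):
    """Return the names that Python cannot currently import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All packages are available.")
```

Anything it reports as missing can be installed with pip3 install followed by the package name.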

YOUR FIRST PYTHON PROGRAM

This example will use PyCharm, but the steps should be essentially the same no matter what IDE you choose. See the hello world video walkthrough.
In my example, I wrote code to print hello world: print('hello world'). To run the program, open a terminal (either in your IDE or a normal terminal window) and enter the command: python3 filename.py
- Common pitfall: if you run the program outside of your IDE, you may get an error saying something like there is no file with that name. To fix this, make sure your terminal is pointed at the right folder. For example, if you had a file called main.py stored in a folder called PythonProjects on your desktop, you would first have to enter the following command into the terminal before running your code: cd Desktop/PythonProjects
If you're completely new to coding, it will probably be best to use a course to learn the basic structures you'll be using (e.g. loops, conditional statements).
- Codecademy is a great resource that offers both free and paid courses
- All SCU students have free access to LinkedIn Learning, which offers some really cool classes-- you'll have to fill out this form in order to get access
- If you prefer to read info rather than watch it, this course from Real Python will be very helpful
- While YouTube is also a great free resource, taking a course with structure is usually a better option to avoid having a scattered experience that repeats information and leaves out other bits
If you have experience with coding but not in Python, W3Schools is a great resource for understanding syntax

MAKING YOUR FIRST REQUEST USING Tweepy

Now that you have Python and your developer credentials set up, you can start writing programs to make requests from the Twitter API!
Start by writing a statement to import Tweepy: import tweepy as tw
- The as tw part saves time: wherever you would otherwise type out tweepy, you can type tw instead
Now, authorize your request and connect to the Twitter API's endpoint:
- Save your API key, API secret, access token, and access secret as variables.
- Use the Tweepy package's OAuthHandler class to input your API key and secret, and its set_access_token method to input your access token and secret.
- Using the handler you've stored your keys and tokens in, use Tweepy's API class to connect to the Twitter API's endpoint.

Importing Tweepy, storing keys and tokens, securing access and connecting to the Twitter API's endpoint
Now, you're ready to make a request! Tweepy allows you to use different functions to search in different ways, but for now, we'll just work with the search method:
- To use a different method, check out the list of available ones in Tweepy's documentation
- You can filter your search results using the parameters below, as well as the tweet_mode parameter, and limit the number of tweets by appending .items()

From Tweepy's documentation
- Example search:
- The tweets variable will now store an iterator - in order to access the information itself, you'll want to loop through it
- To see what information is contained in each object, simply print the item:
- This is all the information for one tweet: clearly this isn't the most readable way to store data, so we'll want to store it in a better way:
  - If you want only particular pieces of information, you can use a dataframe, then store it into a csv file.
  - If you want to store the entire JSON object, you can store it directly into a csv file.

USING DATAFRAMES

Dataframes are a feature of the pandas package and an essential part of working with data in Python. A dataframe is essentially a table that organizes your information.
When using dataframes, you'll need to include the statement: import pandas as pd
In this example, let's say we're trying to collect the time a tweet is created, its author's Twitter handle, and the full text.
We'll want to create three lists that will store those pieces of information.
Then, we can loop through the iterator item returned by our search and add the information we want to those lists.
- In order to find how to reference and grab the information you want, print out the entire Tweepy object and check how the information you want is labelled.

Once the lists are populated with the search information, we can turn them into a dataframe.
As you can see below, when I print the data frame, the information is neatly organized.
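
Here is a minimal sketch of the list and dataframe steps. So that it runs without API access, it uses two made-up stand-in tweets. With Tweepy you would loop over the iterator returned by your search instead. The attribute names created_at, user.screen_name, and full_text are the ones we'd expect on a tweet fetched with tweet_mode="extended", but print a tweet to confirm them for your own results.

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

# Made-up stand-ins for the tweets returned by a search.
tweets = [
    SimpleNamespace(
        created_at=datetime(2022, 1, 1, 12, 0),
        user=SimpleNamespace(screen_name="example_user"),
        full_text="First example tweet",
    ),
    SimpleNamespace(
        created_at=datetime(2022, 1, 2, 9, 30),
        user=SimpleNamespace(screen_name="another_user"),
        full_text="Second example tweet",
    ),
]

# One list per piece of information we want to keep.
times, handles, texts = [], [], []
for tweet in tweets:
    times.append(tweet.created_at)
    handles.append(tweet.user.screen_name)
    texts.append(tweet.full_text)

# Each list becomes one column of the dataframe.
df = pd.DataFrame({"created_at": times, "handle": handles, "text": texts})
print(df)
```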

However, we ideally want to be storing this information into a csv file, rather than only being able to view it in the terminal.
You can do this easily by using the dataframe's to_csv() method.

Now, when you run this program, either a csv file will be created, or an existing one will be overwritten with the dataframe we created.
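
As a small sketch of that step, the example below writes a one-row dataframe to a file and reads it back. The file name tweets.csv and the row of data are made up for illustration.

```python
import pandas as pd

# A made-up one-row dataframe standing in for your search results.
df = pd.DataFrame(
    {
        "created_at": ["2022-01-01 12:00:00"],
        "handle": ["example_user"],
        "text": ["First example tweet"],
    }
)

# index=False leaves pandas's row numbers out of the file.
df.to_csv("tweets.csv", index=False)

# Read the file back to confirm what was written.
print(pd.read_csv("tweets.csv"))
```

If you would rather add to an existing file than overwrite it, to_csv also accepts mode="a" together with header=False.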

CONVERTING A JSON OBJECT INTO A csv FILE

In order to use this method, you'll have to import the csv package: import csv
First, you'll need to create a list to store the JSON objects and iterate through the search object to populate it.
Then, open a csv file to write into.
Next, you'll loop through the list of JSON objects and write the data from each JSON object into the csv file.
- On the first iteration through the loop, you'll want to access the titles of the data contained in the JSON object and write them into the csv file, so that the first row labels what each column contains.
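
These steps could be sketched as follows. The stand-in tweets and the file name tweets_json.csv are made up so the example runs without API access. It relies on Tweepy keeping each tweet's raw JSON as a dictionary in the _json attribute, and it assumes every tweet has the same fields in the same order.

```python
import csv
from types import SimpleNamespace

# Made-up stand-ins for the tweets returned by a search.
tweets = [
    SimpleNamespace(_json={"id": 1, "lang": "en", "full_text": "First example tweet"}),
    SimpleNamespace(_json={"id": 2, "lang": "en", "full_text": "Second example tweet"}),
]

# Step 1: collect the JSON objects into a list.
json_objects = [tweet._json for tweet in tweets]

# Step 2: open a csv file to write into (newline="" avoids blank rows on Windows).
with open("tweets_json.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    # Step 3: write one row per JSON object.
    for count, obj in enumerate(json_objects):
        if count == 0:
            # First iteration: write the field names as the header row.
            writer.writerow(obj.keys())
        writer.writerow(obj.values())
```

Real tweets contain nested objects (the author's details, for example). With this approach each nested object is written into a single cell as text, so you may still want a dataframe when you need specific nested fields.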
