This script scrapes tweet data from Twitter and saves the results to a CSV file for further analysis.
Tools used:
- Jupyter Notebook
- Python 3.7
- tweepy
- pandas
- pymongo
The script provides:
- a function to scrape tweets from Twitter
- a function to save the scraped data to a CSV file
- Apply for a Twitter developer account
- Clone this repo
- Create an environment using:
conda create -n "env name" python=3.7
- Activate the environment using:
conda activate "env name"
- Install packages using:
pip install -r requirements.txt
To store your API credentials:
- Duplicate the .env.example file and save the copy as .env
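Once the .env file holds your credentials, the script can read them from the environment. The sketch below is a minimal, assumed example; the variable names (CONSUMER_KEY, etc.) follow the usual Tweepy OAuth flow and should be matched to whatever .env.example actually defines.

```python
import os

def load_twitter_credentials():
    """Read Twitter API credentials from environment variables.

    The variable names below are assumptions based on the standard
    Tweepy OAuth 1.0a flow; adjust them to match .env.example.
    """
    creds = {
        "consumer_key": os.getenv("CONSUMER_KEY"),
        "consumer_secret": os.getenv("CONSUMER_SECRET"),
        "access_token": os.getenv("ACCESS_TOKEN"),
        "access_token_secret": os.getenv("ACCESS_TOKEN_SECRET"),
    }
    # Fail early with a clear message if any credential is missing
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Failing fast here avoids confusing authentication errors later inside Tweepy.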
Useful resources:
- Twitter Search API
- Tweepy Documentation
- Pymongo Documentation
- Scraping Tweets with Tweepy Python - Python in Plain English
- How to Scrape More Information From Tweets on Twitter - Towards Data Science
The two functions:
- scrape_tweets() - This function returns a dataframe containing the extracted tweets and has the following parameters:
- Search topic
- The number of tweets to download per request
- The number of requests
- save_results_as_csv() - This function writes the tweets to a CSV file and has the following parameter:
- the dataframe from the scrape_tweets function
The csv file returned has the following naming format:
- tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
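The timestamped-CSV behaviour described above can be sketched as follows. This is an assumed implementation, not the repo's exact code; only the filename format is taken from the description.

```python
from datetime import datetime

import pandas as pd

def save_results_as_csv(df: pd.DataFrame) -> str:
    """Write the scraped tweets to a timestamped CSV and return the filename.

    The name follows the tweets_downloaded_yymmdd_hhmmss.csv format
    described above; everything else is a minimal sketch.
    """
    filename = datetime.now().strftime("tweets_downloaded_%y%m%d_%H%M%S.csv")
    df.to_csv(filename, index=False)
    return filename
```

Returning the filename makes it easy to pass the result on to a later loading step.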
The following attributes are extracted from each tweet:
- Tweet text
- Tweet id
- Source
- Coordinates
- Retweet count
- Likes count
- User info
- Username
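Flattening one tweet into those attributes might look like the sketch below. The field names assume the Twitter v1.1 payload shape that Tweepy returns (e.g. full_text, favorite_count); the actual repo code may map them differently.

```python
def tweet_to_row(status: dict) -> dict:
    """Flatten one tweet payload into the attributes listed above.

    `status` is assumed to be a Twitter v1.1-style JSON payload
    (what Tweepy exposes via Status._json); field names are assumptions.
    """
    return {
        "tweet_text": status["full_text"],
        "tweet_id": status["id"],
        "source": status["source"],
        "coordinates": status["coordinates"],
        "retweet_count": status["retweet_count"],
        "likes_count": status["favorite_count"],  # "likes" are favorites in v1.1
        "user_info": status["user"],
        "username": status["user"]["screen_name"],
    }
```

A list of such rows converts directly into the dataframe that scrape_tweets() returns.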
Part 2 - A MongoDB database named Tweets_db is created, and the extracted tweets are stored in a collection named raw_tweets.
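The loading step can be sketched as below. Passing the collection in (e.g. MongoClient()["Tweets_db"]["raw_tweets"] from pymongo) rather than connecting inside the function is an assumed design choice that keeps the sketch testable; the repo's actual code may connect directly.

```python
import pandas as pd

def store_tweets(df: pd.DataFrame, collection) -> int:
    """Insert the scraped tweets into a MongoDB collection.

    `collection` is expected to behave like a pymongo Collection
    (e.g. the raw_tweets collection of Tweets_db). Returns the number
    of documents inserted. A sketch, not the repo's exact code.
    """
    records = df.to_dict("records")  # one dict per tweet row
    if not records:
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With a live server this would be called as store_tweets(df, MongoClient()["Tweets_db"]["raw_tweets"]).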