This script scrapes tweet data from Twitter and saves the results to a CSV file for further analysis.
Tools used:
- Jupyter Notebook
- Python 3.7
- tweepy
- pandas
- pymongo
The script provides:
- a function to scrape tweets from Twitter
- a function to save the scraped data to a CSV file
- Apply for a Twitter developer account
- Clone this repo
- Create an environment using:
conda create -n "env name" python=3.7
- Activate the environment using:
conda activate "env name"
- Install packages using:
pip install -r requirements.txt
To store your API credentials:
- Duplicate the .env.example file and save the copy as .env
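Once the .env file holds your credentials, the script can read them from the environment. The sketch below is a minimal, assumed example; the variable names (CONSUMER_KEY, etc.) follow the usual Tweepy OAuth flow and should be matched to whatever .env.example actually defines.

```python
import os

def load_twitter_credentials():
    """Read Twitter API credentials from environment variables.

    The variable names below are assumptions based on the standard
    Tweepy OAuth 1.0a flow; adjust them to match .env.example.
    """
    creds = {
        "consumer_key": os.getenv("CONSUMER_KEY"),
        "consumer_secret": os.getenv("CONSUMER_SECRET"),
        "access_token": os.getenv("ACCESS_TOKEN"),
        "access_token_secret": os.getenv("ACCESS_TOKEN_SECRET"),
    }
    # Fail early with a clear message if any credential is missing
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Failing fast here avoids confusing authentication errors later inside Tweepy.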
Useful resources:
- Twitter Search API
- Tweepy Documentation
- Pymongo Documentation
- Scraping Tweets with Tweepy Python - Python in Plain English
- How to Scrape More Information From Tweets on Twitter - Towards Data Science
The two functions:
- scrape_tweets() - This function returns a dataframe containing the extracted tweets and has the following parameters:
- Search topic
- The number of tweets to download per request
- The number of requests
- save_results_as_csv() - This function writes the tweets to a CSV file and has the following parameter:
- the dataframe from the scrape_tweets function
The csv file returned has the following naming format:
- tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
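The timestamped-CSV behaviour described above can be sketched as follows. This is an assumed implementation, not the repo's exact code; only the filename format is taken from the description.

```python
from datetime import datetime

import pandas as pd

def save_results_as_csv(df: pd.DataFrame) -> str:
    """Write the scraped tweets to a timestamped CSV and return the filename.

    The name follows the tweets_downloaded_yymmdd_hhmmss.csv format
    described above; everything else is a minimal sketch.
    """
    filename = datetime.now().strftime("tweets_downloaded_%y%m%d_%H%M%S.csv")
    df.to_csv(filename, index=False)
    return filename
```

Returning the filename makes it easy to pass the result on to a later loading step.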
The following attributes are extracted from each tweet:
- Tweet text
- Tweet id
- Source
- Coordinates
- Retweet count
- Likes count
- User info
- Username
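Flattening one tweet into those attributes might look like the sketch below. The field names assume the Twitter v1.1 payload shape that Tweepy returns (e.g. full_text, favorite_count); the actual repo code may map them differently.

```python
def tweet_to_row(status: dict) -> dict:
    """Flatten one tweet payload into the attributes listed above.

    `status` is assumed to be a Twitter v1.1-style JSON payload
    (what Tweepy exposes via Status._json); field names are assumptions.
    """
    return {
        "tweet_text": status["full_text"],
        "tweet_id": status["id"],
        "source": status["source"],
        "coordinates": status["coordinates"],
        "retweet_count": status["retweet_count"],
        "likes_count": status["favorite_count"],  # "likes" are favorites in v1.1
        "user_info": status["user"],
        "username": status["user"]["screen_name"],
    }
```

A list of such rows converts directly into the dataframe that scrape_tweets() returns.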
Part 2 - A MongoDB database named Tweets_db is created, and the extracted tweets are stored in a collection named raw_tweets.
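The loading step can be sketched as below. Passing the collection in (e.g. MongoClient()["Tweets_db"]["raw_tweets"] from pymongo) rather than connecting inside the function is an assumed design choice that keeps the sketch testable; the repo's actual code may connect directly.

```python
import pandas as pd

def store_tweets(df: pd.DataFrame, collection) -> int:
    """Insert the scraped tweets into a MongoDB collection.

    `collection` is expected to behave like a pymongo Collection
    (e.g. the raw_tweets collection of Tweets_db). Returns the number
    of documents inserted. A sketch, not the repo's exact code.
    """
    records = df.to_dict("records")  # one dict per tweet row
    if not records:
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With a live server this would be called as store_tweets(df, MongoClient()["Tweets_db"]["raw_tweets"]).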