
etl_twitter's Introduction

ETL With Twitter Data

Project Overview

This script scrapes data from Twitter and saves the results to a CSV file for further analysis.

Tools used:

  • Jupyter Notebook
  • Python 3.7
  • tweepy
  • pandas
  • pymongo

The script contains:

  • a function that scrapes data from Twitter
  • a function that saves the scraped data to a CSV file

Setup

  • Apply for a Twitter developer account

  • Clone this repo

  • Create an environment using:

    conda create -n "env name" python=3.7
    
    
  • Activate the environment using:

    conda activate "env name"
    
  • Install packages using:

    pip install -r requirements.txt 
    
    

Store env variables

To store your API credentials:

  • Duplicate the .env.example file, name the copy .env, and fill in your API credentials
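
Once the .env file exists, the script has to read the credentials from it. The following is a minimal sketch using only the standard library, not the project's actual loader. The helper names and the variable names (TWITTER_API_KEY, TWITTER_API_SECRET) are hypothetical; use the keys that .env.example actually lists.

```python
import os

# Hypothetical variable names; replace with the keys listed in .env.example.
REQUIRED_KEYS = ("TWITTER_API_KEY", "TWITTER_API_SECRET")


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Values already set in the real environment take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_credentials(keys=REQUIRED_KEYS):
    """Return the credentials as a dict, failing early if any are missing."""
    missing = [key for key in keys if not os.environ.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: os.environ[key] for key in keys}
```

Calling load_env_file() once at startup and then get_credentials() keeps the secrets out of the notebook itself and reports a missing key before any request is sent.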

Resources used

Activities done

The two functions:

  • scrape_tweets() - This function returns a dataframe containing the extracted tweets and takes the following parameters:
    • Search topic
    • The number of tweets to download per request
    • The number of requests
  • Save_results_as_csv() - This function writes a csv file and takes the following parameter:
    • The dataframe returned by the scrape_tweets function
    The csv file has the following naming format:
    • tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
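
The timestamped file name can be sketched as follows. This is an illustration under assumptions, not the repository's code: the lowercase name save_results_as_csv, the helper build_csv_filename, and the optional now parameter are all hypothetical, and df is assumed to be a pandas DataFrame.

```python
from datetime import datetime


def build_csv_filename(now=None):
    """Return a name of the form tweets_downloaded_yymmdd_hhmmss.csv."""
    now = now or datetime.now()
    # %y%m%d_%H%M%S produces the two-digit yymmdd_hhmmss stamp.
    return "tweets_downloaded_" + now.strftime("%y%m%d_%H%M%S") + ".csv"


def save_results_as_csv(df, now=None):
    """Write the DataFrame to a timestamped CSV file and return the file name."""
    filename = build_csv_filename(now)
    # DataFrame.to_csv is the pandas CSV writer; index=False omits the row index.
    df.to_csv(filename, index=False)
    return filename
```

Because the stamp has one-second resolution, two saves within the same second would write to the same file name.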

The following attributes of each tweet are extracted:

  • Tweet text
  • Tweet id
  • Source
  • Coordinates
  • Retweet count
  • Likes count
  • User info
  • Username

Part 2 - A MongoDB database named Tweets_db is created, and the extracted tweets are stored in a collection named raw_tweets.
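
A minimal sketch of the storage step with pymongo, assuming a MongoDB server reachable at the default local address. The function names and the uri default are hypothetical; only the database name Tweets_db and the collection name raw_tweets come from the description above.

```python
def get_raw_tweets_collection(uri="mongodb://localhost:27017"):
    """Return the raw_tweets collection of the Tweets_db database."""
    # Imported here so the rest of the module loads without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    # MongoDB creates the database and collection on the first insert.
    return client["Tweets_db"]["raw_tweets"]


def store_tweets(collection, records):
    """Insert a list of tweet dicts and return how many were stored."""
    if not records:
        # insert_many rejects an empty list, so skip the call.
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

With a dataframe of tweets, df.to_dict("records") yields the list of dicts that store_tweets expects.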

etl_twitter's People

Contributors

clifflolo
