pysparktwitter's Introduction

Getting started with PySpark and Twitter data analysis

Installation

Prerequisites for this notebook:

  • Python 3
  • pip3
  • Java 8
  • Scala (I used 2.11 for this example)

Next, install Spark with the following steps:

  • Go to spark.apache.org
  • Download Spark built for Hadoop 2.7 and extract it into /home/ with a command such as:
sudo tar -zxvf spark-x.x.x-bin-hadoopy.y.tgz -C /home/

Replace x.x.x and y.y with the respective versions of Spark and Hadoop.

  • Set environment variables to link Python with Spark and PySpark with Jupyter notebooks:
export SPARK_HOME='/home/spark-x.x.x-bin-hadoopy.y/'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
  • To prevent permission issues later, run the following command:
sudo chmod 777 /home/spark-x.x.x-bin-hadoopy.y

To verify that the setup works, start python3 from /home/spark-x.x.x-bin-hadoopy.y and run import pyspark. Then grant the same permissions on the Python directories:

sudo chmod 777 /home/spark-x.x.x-bin-hadoopy.y/python
sudo chmod 777 /home/spark-x.x.x-bin-hadoopy.y/python/pyspark
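
The import check described above can also be scripted. The following is a minimal sketch and not part of the original notebook: the helper names missing_spark_vars and pyspark_importable are hypothetical, and the list of variables simply mirrors the exports from the setup step.

```python
import importlib.util
import os

# Variables exported in the setup step (assumed list, mirrors the exports above).
REQUIRED_VARS = [
    "SPARK_HOME",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
    "PYSPARK_PYTHON",
]


def missing_spark_vars(env=None):
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def pyspark_importable():
    """Return True if Python can locate the pyspark package."""
    return importlib.util.find_spec("pyspark") is not None


if __name__ == "__main__":
    print("Missing variables:", missing_spark_vars() or "none")
    print("pyspark importable:", pyspark_importable())
```

If the script lists missing variables, the exports were probably made in a different shell session; adding them to a shell startup file such as ~/.bashrc is one common way to make them persistent.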

Caution: To avoid errors caused by version conflicts, make sure Spark and PySpark are the same version. You can run pip freeze to check the installed PySpark version. I have pyspark 2.4.4, so I downloaded spark-2.4.4-bin-hadoop2.7.
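
The version comparison can be automated as well. This is a rough sketch that assumes the Spark directory keeps its default spark-x.x.x-bin-hadoopy.y name; the function names spark_version_from_home and versions_match are hypothetical helpers introduced for illustration.

```python
import os
import re


def spark_version_from_home(spark_home):
    """Extract 'x.x.x' from a path such as /home/spark-2.4.4-bin-hadoop2.7/."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)-bin-hadoop", spark_home)
    return match.group(1) if match else None


def versions_match(spark_home, pyspark_version):
    """Return True when the Spark directory and PySpark share one version."""
    return spark_version_from_home(spark_home) == pyspark_version


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not installed")
    else:
        # Assumes SPARK_HOME was exported as in the setup step.
        home = os.environ.get("SPARK_HOME", "")
        print("Versions match:", versions_match(home, pyspark.__version__))
```

If the directory was renamed after extraction, the pattern will not match and the check reports a mismatch, so treat a negative result as a prompt to compare the versions by hand.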

Start the tweet reader

Install the tweepy package, which is required to connect to Twitter:

pip3 install tweepy

Then run TweetRead.py from the terminal.

Run the notebook to process the tweets

See TwitterApplication.ipynb for more details.

pysparktwitter's People

Contributors

godsonk
