Twitter Scraper

A scraper that retrieves the tweet conversations of a particular Twitter user.

Introduction

“Twitter Scraper” (or simply “scraper”) is available under the MIT License. It works in two steps, Step 1 and Step 2, which must be run one after the other, in order. Its behaviour is controlled by a properties file named application.properties.

application.properties

This is the configuration file; it specifies the properties that control the scraper. For all available options, see the application.properties file itself.

| Property | Type | Details |
|---|---|---|
| target.username | String, required | The Twitter handle to scrape conversations for. |
| target.step | int, required | The step to run. Possible values are 1 or 2. |
| concurrent-threads | int, required | Number of threads used to fetch conversations. Must be greater than 0. Takes effect only while step 2 is running. |
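
The constraints listed above can be checked before a run with a small shell function. This is only a sketch and not part of the scraper: the names get_prop and check_properties are hypothetical, the path to application.properties is passed as an argument because its location in the repository is not stated here, and the file is assumed to contain plain key=value lines.

```shell
# Hypothetical helpers, not part of the scraper.

# Print the value of key $2 from properties file $1 (last match wins).
get_prop() {
    awk -F= -v key="$2" '
        index($0, "=") {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim the value
                val = v
            }
        }
        END { print val }
    ' "$1"
}

# Return 0 if the three documented properties are valid, 1 otherwise.
check_properties() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }

    user=$(get_prop "$file" target.username)
    step=$(get_prop "$file" target.step)
    threads=$(get_prop "$file" concurrent-threads)

    status=0
    [ -n "$user" ] || { echo "target.username is required" >&2; status=1; }
    case $step in
        1|2) ;;
        *) echo "target.step must be 1 or 2" >&2; status=1 ;;
    esac
    case $threads in
        ''|*[!0-9]*) echo "concurrent-threads must be a positive integer" >&2; status=1 ;;
        *) [ "$threads" -gt 0 ] || { echo "concurrent-threads must be greater than 0" >&2; status=1; } ;;
    esac
    return $status
}
```

Call it as check_properties path/to/application.properties, where the path is whatever location the file has in your checkout. A non-zero exit status means at least one property needs fixing.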

Step 1

This step fetches tweetIds for the specified Twitter handle. More details will be added later.

Step 2

This step fetches the conversations for the tweetIds fetched in the first step. More details will be added later.

Prerequisites

The following software is needed to run the built JAR.

  1. Oracle JRE 1.8

To build from source, the following software must be installed.

  1. git client
  2. maven 3
  3. Oracle JDK 1.8

To install these on an EC2 instance, run the following commands in order.

sudo apt-get update
sudo apt-get install git
sudo apt-get install maven
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get install oracle-java8-installer
sudo nano /etc/environment

Then append this line at the bottom of the opened file and save it.

JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Then run the following command. Note that source is a shell builtin, so it is run without sudo.

source /etc/environment

The following command should now print the JAVA_HOME path.

echo $JAVA_HOME

To check the installed JDK, run the following command.

java -version
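
To confirm in one go that the build prerequisites listed above are installed, a check along these lines may help. It is a sketch: the function name missing_tools is hypothetical, and it only tests whether each command is on the PATH, not which version is installed.

```shell
# Hypothetical helper: prints the names of the given tools that are not on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space; the output is empty when nothing is missing.
    echo "${missing# }"
}
```

For this project the call would be missing_tools git mvn java. Empty output means all three commands were found.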

Getting the scraper

To get the scraper onto an EC2 instance (or any other Ubuntu machine), follow the steps below. These steps need to be run only once.

  1. Choose a directory to hold the “twitter-scraper” repository. Let’s call it base_directory. Run the following command to change to it, replacing “base_directory” with the actual path of the chosen directory.
cd base_directory
  2. The following steps assume that commands are run from base_directory. Run the following command.
git clone https://github.com/clayfish/twitter-scraper
  3. You now have a clone (the source code) of the scraper on your machine. Note that this git command does not need a username or password.

How to run

Running the scraper consists of three high-level steps.

  1. Update the code
  2. Compile the code
  3. Run the built JAR file

Update the code

To update the code with the latest changes, run the following commands.

cd base_directory/twitter-scraper
git pull

Compile the code

Before compiling the code, check that application.properties is configured as needed. See the application.properties section above for more information about the configuration. To compile the code, run the following commands.

cd base_directory/twitter-scraper
mvn package

These commands create the file base_directory/twitter-scraper/bin/twitter-scraper-0.1.0.one-jar.jar.
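
The update and compile steps above can be wrapped in one function that also confirms the JAR was produced. This is a sketch under assumptions: the function name build_scraper is hypothetical, git and mvn are assumed to be on the PATH, and the JAR path is the one named in this README.

```shell
# Hypothetical helper: pulls the latest code, builds it, and prints the path
# of the built JAR. $1 is the path you chose as base_directory.
build_scraper() {
    repo="$1/twitter-scraper"
    jar="$repo/bin/twitter-scraper-0.1.0.one-jar.jar"

    # Run in a subshell so the caller's working directory is not changed.
    # Tool output goes to stderr so that stdout carries only the JAR path.
    (
        cd "$repo" || exit 1
        git pull >&2 || exit 1
        mvn package >&2 || exit 1
    ) || return 1

    if [ -f "$jar" ]; then
        echo "$jar"
    else
        echo "build finished but $jar was not found" >&2
        return 1
    fi
}
```

With this in place, jar=$(build_scraper base_directory) stores the JAR path for the run step, and a non-zero exit status signals that the pull or the build failed.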

Run the built JAR file

To run the built JAR file, run the following commands.

cd base_directory
java -jar twitter-scraper/bin/twitter-scraper-0.1.0.one-jar.jar

This starts the scraper. To stop it, press control + c (Mac) or ctrl + c (Windows/Linux). If you want to exit the terminal while the scraper keeps running, use the following commands instead of the ones above.

cd base_directory
java -jar twitter-scraper/bin/twitter-scraper-0.1.0.one-jar.jar &

All logs currently go to the terminal, so running the scraper in the background does not spare you the constant log output, but you can close the terminal without stopping the scraper.
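
Depending on the shell, a job started with & can still receive a hangup signal when the terminal closes. One common way to guard against that, and to keep the logs out of the terminal, is to start the scraper through nohup with its output redirected to a file. The sketch below is not part of the scraper: the function name run_detached and the log file name used in the usage note are hypothetical.

```shell
# Hypothetical helper: runs a command detached from the terminal, appends its
# output to a log file, and prints the process ID so it can be stopped later.
# $1 is the log file; the remaining arguments are the command to run.
run_detached() {
    log=$1
    shift
    nohup "$@" >>"$log" 2>&1 </dev/null &
    echo $!
}
```

From base_directory, run_detached scraper.log java -jar twitter-scraper/bin/twitter-scraper-0.1.0.one-jar.jar starts the scraper and prints its process ID. Follow the output with tail -f scraper.log and stop the scraper with kill followed by that process ID.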

