reddit-scraper

A tool for scraping and visualizing search results from Reddit.

Setting Up

Installing Node.js

Json-Server serves as the back end for visualizing the scraped data, and it requires Node.js. The following bash commands install Node.js on Debian-based distributions. For other systems, please refer to the official Node.js installation guide.

curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
sudo apt-get install -y nodejs

Setting up Json-Server

Inside the jsonserver directory, run the following bash command without modifying the existing files:

npm install --save json-server

Installing Beautiful Soup 4, Requests and LXML

Beautiful Soup 4 must be installed along with the LXML parser. The requests library is also required to fetch the HTML content of Reddit.

pip3 install beautifulsoup4
pip3 install requests
pip3 install lxml

Usage

This tool is made up of two parts: a web scraper and a dynamic web page for visualizing the results. The scraper stores its data in a file called product.json, which Json-Server serves to the front end for visualization.

Scraping

You can restrict the scraper's search to a specific subreddit, or have it search all subreddits.

Searching for a Keyword in All Subreddits

For example, in order to search for the keyword uzay in all subreddits, run the command below inside the root project folder:

python3 scraper.py --keyword="uzay"

Searching for a Keyword in a Specific Subreddit

In order to search for the keyword ayn rand in the subreddit r/objectivism, run the command below inside the root project folder:

python3 scraper.py --keyword="ayn rand" --subreddit="objectivism"
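The two invocations above differ only in the optional --subreddit flag. As a rough illustration of how such flags could be handled, here is a minimal sketch using Python's standard argparse module. It is not taken from scraper.py: the function names parse_args and describe_scope are hypothetical, and the real script may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the commands above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Scrape Reddit search results.")
    # The keyword is mandatory in both documented invocations.
    parser.add_argument("--keyword", required=True, help="search term")
    # Assumption: leaving out --subreddit means "search all subreddits".
    parser.add_argument("--subreddit", default=None,
                        help="subreddit name without the r/ prefix")
    return parser.parse_args(argv)


def describe_scope(args):
    """Return a human-readable summary of what would be searched."""
    if args.subreddit:
        return f"'{args.keyword}' in r/{args.subreddit}"
    return f"'{args.keyword}' in all subreddits"


# Passing an explicit list mimics the shell commands shown above.
print(describe_scope(parse_args(["--keyword=uzay"])))
print(describe_scope(parse_args(["--keyword=ayn rand", "--subreddit=objectivism"])))
```

Passing the argument list explicitly keeps the sketch runnable without a shell; calling parse_args() with no arguments would read sys.argv instead.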

Incremental Search

If there is an existing product.json file, the scraper will append the search results of a new keyword at the end of the file.

If the keyword already exists in product.json, the scraper will start searching from the date of the most recent saved post and append the new content after the posts previously saved for that keyword.
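The incremental behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the scraper's actual code: it assumes product.json holds a JSON object mapping each keyword to a list of posts, that every post carries an ISO 8601 "date" string (so plain string comparison orders posts chronologically), and the helper name merge_results is hypothetical. The real file layout may differ.

```python
import json
from pathlib import Path


def merge_results(path, keyword, scraped_posts):
    """Append posts newer than the latest saved one; return how many were added."""
    path = Path(path)
    # Start from the existing file if there is one, otherwise from an empty store.
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # A new keyword is added after the existing ones (dicts keep insertion order).
    saved = data.setdefault(keyword, [])
    # Assumption: ISO 8601 date strings, so the largest string is the most recent.
    latest = max((post["date"] for post in saved), default="")
    new_posts = [post for post in scraped_posts if post["date"] > latest]
    # New content goes after the posts already saved for this keyword.
    saved.extend(sorted(new_posts, key=lambda post: post["date"]))
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return len(new_posts)
```

Calling merge_results twice with overlapping results for the same keyword would add only the posts dated after the most recent saved one, which mirrors the "start searching from the date of the most recent post" behavior in spirit.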

Visualization

When the product.json file is ready for visualization, run the following command inside the jsonserver directory in order to start the Json-Server:

npm run json:server

Then, open the reddit.html file in a browser.
