Giter Club home page Giter Club logo

hepsiburada-review-scraper's Introduction

Made with Python

Hepsiburada Review Scraper HB Review Scraper Build Status GPLv3

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews with category and product links can be crawled and used for text classification, sentiment analysis, text mining, NLP models etc. Supported by multithreading, written in Python.

Prerequisites

$ pip3 install -r requirements.txt

Installation

$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py

Usage

Program provides an human-friendly interactive shell for users.

Features

  • Shutdown computer after finishing: Optional choice for deep and long scrapings.
  • Threads: Try to give a proper number. (Recommended value is 64.)
  • Timeout: Giving a large number could result in long waiting times, small numbers could lead connection failures. (Recommended time range is 15-30 seconds.)
  • Pagination Depth: Maximum number of paginated review pages for each product.

You can track your progress via progress bars. categories.txt, products.txt, hepsiburada.txt will be saving to your current directory.

Do not scrape aggressively! Otherwise, you will be caught by captcha challenge!

Say thanks.

For educational purposes only.

hepsiburada-review-scraper's People

Contributors

0x01h avatar atomdeniz avatar hakkiyagiz avatar noronist avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.