Amazon Scraper

amazon-scraper is a command-line application that collects reviews and questions/answers for Amazon products.

Read the documentation here: amazon-scraper on readthedocs

Installation
Usage
Output
Options
References

Installation

via pip

~ TODO

cloning this repository

$ git clone https://github.com/picorana/amazon-scraper.git
$ cd amazon-scraper
$ pip install -r requirements.txt
$ python setup.py install

Usage

Run amazon-scraper from the command line:

$ amazon-scraper [asin]

An ASIN is the unique identifier of a product on Amazon. You can find it in the product URL.
For example, to scrape https://www.amazon.com/gp/product/B01H2E0J5M, run:

$ amazon-scraper B01H2E0J5M
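
If you start from product URLs rather than ASINs, a small helper can pull the ASIN out before you call amazon-scraper. This is a minimal sketch and not part of amazon-scraper: the function name extract_asin is hypothetical, and the pattern assumes the ASIN is a 10-character code that follows /dp/ or /gp/product/ in the URL.

```python
import re

# Assumption: the ASIN is 10 uppercase letters/digits right after
# "/dp/" or "/gp/product/" in the URL path.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url):
    """Return the ASIN found in an Amazon product URL, or None if absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin("https://www.amazon.com/gp/product/B01H2E0J5M"))  # prints B01H2E0J5M
```

URLs in other layouts will return None with this pattern, so treat it as a starting point.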

You can also pass multiple ASINs:

$ amazon-scraper B01H2E0J5M B01GYLZD8C B0736R3W1F

or load them from a file:

$ amazon-scraper --file asins.txt

The file must list one ASIN per line, like this:

B01H2E0J5M
B01GYLZD8C
B0736R3W1F
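
The one-ASIN-per-line format is easy to read or validate from your own scripts. The sketch below is an illustration, not amazon-scraper's actual loader: load_asins is a hypothetical name, and skipping blank lines is an assumption about how such a file would reasonably be handled.

```python
from pathlib import Path


def load_asins(path):
    """Read one ASIN per line, ignoring blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Calling load_asins("asins.txt") on the example file above would return the three ASINs as a list of strings.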

Output

amazon-scraper downloads product pages, reviews, and questions and answers.
It saves its output in the following folders:

pages contains the main page of each product, which is useful for extracting more information about the product.
These pages are saved only when you pass the option --save-main-pages (-p).

results contains the reviews, organized in JSON files.
You can disable scraping the reviews by using the option --no-reviews

questions contains the questions and answers, organized in JSON files.
You can disable scraping the questions by using the option --no-questions
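
Once a run has finished, the JSON files can be loaded for further processing. This is a minimal sketch under stated assumptions: load_json_files is a hypothetical helper, the files are assumed to end in .json and to sit directly inside the folder, and nothing is assumed about their internal structure, which is not described here.

```python
import json
from pathlib import Path


def load_json_files(folder):
    """Load every .json file directly inside `folder`, keyed by file name."""
    loaded = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded
```

For example, load_json_files("results") would return a dictionary with one entry per review file, and load_json_files("questions") one per questions file.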

Options

positional arguments:
  asin                  Amazon asin(s) to be scraped

optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  Specify path to list of asins
  --save-main-pages, -p
                        Saves the main pages scraped
  --verbose, -v         Logging verbosity level
  --no-reviews          Do not scrape reviews
  --no-questions        Do not scrape questions
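
For readers who want to see how an interface like this fits together, the following argparse sketch defines flags matching the help text above. It is not the project's actual source: build_parser is a hypothetical name, and the argument types (a counted -v, boolean store_true flags, zero or more positional ASINs) are assumptions inferred from the help output.

```python
import argparse


def build_parser():
    """Build a parser whose flags mirror the help text shown above."""
    parser = argparse.ArgumentParser(prog="amazon-scraper")
    # Assumption: zero or more ASINs, since they may come from --file instead.
    parser.add_argument("asin", nargs="*", help="Amazon asin(s) to be scraped")
    parser.add_argument("--file", "-f", help="Specify path to list of asins")
    parser.add_argument("--save-main-pages", "-p", action="store_true",
                        help="Saves the main pages scraped")
    # Assumption: repeating -v raises the verbosity level.
    parser.add_argument("--verbose", "-v", action="count", default=0,
                        help="Logging verbosity level")
    parser.add_argument("--no-reviews", action="store_true",
                        help="Do not scrape reviews")
    parser.add_argument("--no-questions", action="store_true",
                        help="Do not scrape questions")
    return parser
```

Parsing ["B01H2E0J5M", "-p", "--no-questions"] with this parser yields one ASIN, page saving enabled, reviews scraped, and questions skipped.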

References

instagram-scraper was used as a reference for the structure of the program.
This blog post was very useful for understanding the issues involved in building a scraper.
