Giter Club home page Giter Club logo

amazon-scraper-python's Introduction

amazon-scraper-python

Travis Coveralls github PyPI Docker Build Status License

Description

This package allows you to search for products on Amazon and extract some useful information (ratings, number of comments).

I wrote a French blog post about it here

Requirements

  • Python 3
  • pip3

Installation

pip3 install -U amazonscraper

Command line tool amazon2csv.py

After the package installation, you can use the amazon2csv.py command in the terminal.

After passing a search request to the command (and an optional maximum number of products), it will return the results as csv :

amazon2csv.py --keywords="Python programming" --maxproductnb=2
Product title,Rating,Number of customer reviews,Product URL
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming",4.5,309,https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036
"A Smarter Way to Learn Python: Learn it faster. Remember it longer.",4.8,144,https://www.amazon.com/Smarter-Way-Learn-Python-Remember-ebook/dp/B077Z55G3B

You can also pass a search url (if you added complex filters for example), and save it to a file :

amazon2csv.py --url="https://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=python+scraping" > output.csv

You can then open it with your favorite spreadsheet editor (and play with the filters) :

snapshot amazon2csv

More info about the command in the help :

amazon2csv.py --help

Using the amazonscraper Python package

# -*- coding: utf-8 -*-
import amazonscraper

results = amazonscraper.search("Python programming")

for result in results:
    print("%s (%s out of 5 stars, %s customer reviews) :  %s" % (result.title, result.rating, result.review_nb, result.url))

print("Number of results : %d" % (len(results)))

Which will output :

Learning Python, 5th Edition (4.0 out of 5 stars, 293 customer reviews) : https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730
Fluent Python: Clear, Concise, and Effective Programming (4.6 out of 5 stars, 87 customer reviews) : https://www.amazon.com/Fluent-Python-Concise-Effective-Programming/dp/1491946008
[...]
Number of results : 3000

Attributes of the Product object

Attribute name Description
title Product title
rating Rating of the products (number between 0 and 5, False if missing)
review_nb Number of customer reviews (False if missing)
url Product URL

Docker

You can use the amazon2csv tool with the Docker image

You may execute :

docker run -it --rm thibdct/amazon2csv --keywords="Python programming" --maxproductnb=2

๐Ÿค˜ The easy way ๐Ÿค˜

I also built a bash wrapper to execute the Docker container easily.

Install it with :

curl -s https://raw.githubusercontent.com/tducret/amazon-scraper-python/master/amazon2csv \
> /usr/local/bin/amazon2csv && chmod +x /usr/local/bin/amazon2csv

You may replace /usr/local/bin with another folder that is in your $PATH

Check that it works :

On the first execution, the script will download the Docker image, so please be patient

amazon2csv --help
amazon2csv --keywords="Python programming" --maxproductnb=2

You can upgrade the app with :

amazon2csv --upgrade

and even uninstall with :

amazon2csv --uninstall

TODO

  • If no product was found with the CSS selectors, it may be a new Amazon page style => change user agent and get the new page. Loop on all the user agents and check all the CSS selectors again
  • Find a way to get the products without css selectors

amazon-scraper-python's People

Contributors

tducret avatar tducretcnes avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.