Generate a dataset ready for data science from AMZScout without paying a dime.
This uses undetected-chromedriver and the AMZScout browser extension (ID njopapoodmifmcogpingplfphojnfeea)
to extract information from Amazon product pages.
Make sure you install chromedriver on your system alongside a recent version of Chrome or Chromium.

On Windows, you can install Chrome with winget:

winget install --id Google.Chrome

On Debian/Ubuntu, Chromium is available through apt:

sudo apt-get install chromium-browser

On other platforms, download Chrome or Chromium from its official download page.

For chromedriver itself, macOS users can install it with Homebrew:

brew install chromedriver

Windows users can install it with scoop (note: scoop only has chromedriver for win32 platforms; if you're on a 64-bit system, your mileage may vary):

scoop install chromedriver

Otherwise, refer to https://sites.google.com/chromium.org/driver/downloads?authuser=0 for instructions on downloading chromedriver for your system.
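To confirm the install worked, a small Python check can report what is visible on your PATH. The binary names below are common defaults and are assumptions; yours may differ:

```python
import shutil

# Report which browser/driver binaries are visible on PATH.
# These names are common defaults; your platform's may differ.
for exe in ("chromedriver", "google-chrome", "chromium", "chromium-browser"):
    location = shutil.which(exe)
    print(f"{exe}: {location or 'not found'}")
```

If chromedriver shows up as "not found", revisit the steps above before running the scraper.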
With a recent version of Python installed, run the following:
pip install pipx
pipx install poetry
If pip or pipx warns that the installed scripts are not on your PATH, add the reported directory to your PATH; pipx also provides an ensurepath command that can do this for you.
git clone https://github.com/regulad/amzscout-scrape.git
cd amzscout-scrape
poetry install
Usage is about as simple as it gets.
poetry run amzscout-scrape --help
It prints progress as it runs, but expect a full scrape to take a while.
Once it's done, you'll have a CSV file in the current directory.
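The resulting file can be loaded straight into your analysis tooling. A minimal sketch using only the standard library; the filename and column names here are placeholders, so substitute whatever the tool actually wrote:

```python
import csv

def summarize_csv(path):
    """Print the column names and row count of a CSV file, and return its rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    print("columns:", reader.fieldnames)
    print("rows:", len(rows))
    return rows

# summarize_csv("output.csv")  # hypothetical filename
```

From here, the rows can be fed into pandas or any other data-science library of your choice.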
Included is a simple Tailscale configuration that serves a SOCKS5 proxy on your local machine.
Copy the contents of proxy.env-example into .env, populate the fields, and then run the following command on a machine that has Docker installed:
docker-compose up -d
Then, you can use the --proxy flag to route traffic through the proxy.
poetry run amzscout-scrape --proxy socks5://localhost:1055
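Before a long run, it can be worth checking that something is actually listening on the proxy port. A small sketch, assuming the localhost:1055 address from the example above:

```python
import socket

def proxy_listening(host="localhost", port=1055, timeout=2.0):
    """Return True if a TCP listener is reachable at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(proxy_listening())
```

This only confirms the port is open, not that the SOCKS5 handshake succeeds, but it catches the common case of the Docker container not being up.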
Licensed under the terms of the Apache License 2.0. New issues and pull requests are welcome. Please refer to the contributing guide and security policy. Generated with Tyrannosaurus.