Generate a dataset ready for data science from AMZScout without paying a dime.
This uses undetected-chromedriver and the AMZScout browser extension (ID njopapoodmifmcogpingplfphojnfeea)
to extract information from Amazon product pages.
Make sure you install chromedriver on your system alongside a recent version of Chrome or Chromium.

On Windows, you can install Chrome with winget:

winget install --id Google.Chrome

On Debian/Ubuntu, Chromium is available through apt:

sudo apt-get install chromium-browser

On other platforms, download Chrome or Chromium from its official download page.

For chromedriver itself, macOS users can install it with Homebrew:

brew install chromedriver

Windows users can install it with scoop (note: scoop only has chromedriver for win32 platforms; if you're on a 64-bit system, your mileage may vary):

scoop install chromedriver

Otherwise, refer to https://sites.google.com/chromium.org/driver/downloads?authuser=0 for instructions on downloading chromedriver for your system.
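To confirm the install worked, a small Python check can report what is visible on your PATH. The binary names below are common defaults and are assumptions; yours may differ:

```python
import shutil

# Report which browser/driver binaries are visible on PATH.
# These names are common defaults; your platform's may differ.
for exe in ("chromedriver", "google-chrome", "chromium", "chromium-browser"):
    location = shutil.which(exe)
    print(f"{exe}: {location or 'not found'}")
```

If chromedriver shows up as "not found", revisit the steps above before running the scraper.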
With a recent version of Python installed, run the following:
pip install pipx
pipx install poetry
If pip or pipx warns that the installed scripts are not on your PATH, add the reported directory to your PATH; pipx also provides an ensurepath command that can do this for you.
git clone https://github.com/regulad/amzscout-scrape.git
cd amzscout-scrape
poetry install
Usage is about as simple as it gets.
poetry run amzscout-scrape --help
It prints progress as it runs, but expect a full scrape to take a while.
Once it's done, you'll have a CSV file in the current directory.
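The resulting file can be loaded straight into your analysis tooling. A minimal sketch using only the standard library; the filename and column names here are placeholders, so substitute whatever the tool actually wrote:

```python
import csv

def summarize_csv(path):
    """Print the column names and row count of a CSV file, and return its rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    print("columns:", reader.fieldnames)
    print("rows:", len(rows))
    return rows

# summarize_csv("output.csv")  # hypothetical filename
```

From here, the rows can be fed into pandas or any other data-science library of your choice.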
Included is a simple Tailscale configuration that serves a SOCKS5 proxy on your local machine.
Copy the contents of proxy.env-example into .env, populate the fields, and then run the following command on a machine that has Docker installed:
docker-compose up -d
Then, you can use the --proxy flag to route traffic through the proxy.
poetry run amzscout-scrape --proxy socks5://localhost:1055
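Before a long run, it can be worth checking that something is actually listening on the proxy port. A small sketch, assuming the localhost:1055 address from the example above:

```python
import socket

def proxy_listening(host="localhost", port=1055, timeout=2.0):
    """Return True if a TCP listener is reachable at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(proxy_listening())
```

This only confirms the port is open, not that the SOCKS5 handshake succeeds, but it catches the common case of the Docker container not being up.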
Licensed under the terms of the Apache License 2.0. New issues and pull requests are welcome. Please refer to the contributing guide and security policy. Generated with Tyrannosaurus.