Please star the repo if you found it useful! Thank you!

Deals Scraper

Deals Scraper is a Canadian tool for finding good deals on websites like Facebook Marketplace, Kijiji, eBay, Amazon and Lespacs.

  • Zooming fast
  • Specify a price range (Min & Max) per scraper
  • Blacklist keywords from the scraping
  • Strict mode so only ads containing your keywords are picked
  • Schedule the project to run on a recurring basis
  • Easily add your own website if you know what you're doing
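The blacklist and strict-mode features above can be illustrated with a small sketch. This is not the project's actual filter: the function name `keep_ad` and the word-by-word substring matching are assumptions made for illustration only.

```python
def keep_ad(title, keywords, exclusions, strict_mode):
    """Return True if an ad title passes the (assumed) filter rules.

    keywords and exclusions are space-separated strings, as in config.ini.
    """
    lowered = title.lower()
    # Blacklist: reject the ad if any excluded word appears in the title.
    if any(word in lowered for word in exclusions.lower().split()):
        return False
    # Strict mode (assumed meaning): every keyword must appear in the title.
    if strict_mode:
        return all(word in lowered for word in keywords.lower().split())
    # Non-strict mode: trust the website's own search results.
    return True


print(keep_ad("Apple AirPods Pro 2nd gen", "airpods pro", "case", True))
print(keep_ad("AirPods Pro case only", "airpods pro", "case", True))
```

Because the sketch uses plain substring checks, an exclusion such as "case" would also reject a title containing "showcase"; a real filter might prefer whole-word matching.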

Clone the repo

  git clone https://github.com/JustSxm/Deals-Scraper.git

Install dependencies

  pip install -r requirement.txt

Open the config file (config.ini) and find the section named DEFAULT

[DEFAULT]
Keywords = airpods pro
Exclusions = case
StrictMode = False
Interval = 1
  • Keywords: The keywords that the software searches for.
  • Exclusions: Words or phrases that cause an ad to be ignored.
  • StrictMode: When True, only ads containing your keywords are picked; when False, matching is more flexible.
  • Interval: How often, in minutes, the software runs the search.
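As a rough illustration of how these four settings could be read from Python, here is a minimal sketch using the standard-library `configparser` module. Whether the project loads its configuration exactly this way is an assumption; `load_defaults` and the dictionary keys are hypothetical names.

```python
import configparser

# Sample text mirroring the [DEFAULT] section shown above. The real tool
# reads config.ini from disk; a string keeps this sketch self-contained.
SAMPLE = """
[DEFAULT]
Keywords = airpods pro
Exclusions = case
StrictMode = False
Interval = 1
"""


def load_defaults(text):
    """Parse the DEFAULT section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    return {
        "keywords": section.get("Keywords", ""),
        "exclusions": section.get("Exclusions", ""),
        "strict_mode": section.getboolean("StrictMode", fallback=False),
        "interval_minutes": section.getint("Interval", fallback=1),
    }


settings = load_defaults(SAMPLE)
print(settings)
```

Note that `configparser` treats `DEFAULT` specially: its values are inherited by every other section, and option names are matched case-insensitively.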

Facebook

Open the config file (config.ini) and find the section named FACEBOOK

[FACEBOOK]
Enabled = True
CityId = 
MinPrice = 0
MaxPrice = 1000
SortBy = distance_ascend
; best_match, price_ascend, price_descend, distance_ascend, creation_time_descend
  • Enabled: This config determines whether the Facebook module is enabled (True) or disabled (False).
  • CityId: This config sets the ID for the desired city or location for the Facebook search.
  • MinPrice: This config sets the minimum price range for the Facebook search.
  • MaxPrice: This config sets the maximum price range for the Facebook search.
  • SortBy: This config sets the sorting method for the Facebook search results. Available options include "best_match" (sorted by Facebook's relevance algorithm), "price_ascend" (sorted by price in ascending order), "price_descend" (sorted by price in descending order), "distance_ascend" (sorted by distance in ascending order), and "creation_time_descend" (sorted by time in descending order).
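Since an empty CityId or a mistyped SortBy value is easy to miss, a quick sanity check can help before running the scraper. The sketch below is an assumption-laden illustration, not part of the project: `check_facebook` is a hypothetical helper, and the only facts taken from this README are the option names and the list of SortBy values.

```python
import configparser

# Sort options listed in the comment under the [FACEBOOK] section.
SORT_OPTIONS = {
    "best_match", "price_ascend", "price_descend",
    "distance_ascend", "creation_time_descend",
}


def check_facebook(section):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if not section.get("CityId", "").strip():
        problems.append("CityId is empty")
    if section.get("SortBy", "") not in SORT_OPTIONS:
        problems.append("SortBy is not a known option")
    low = section.getint("MinPrice", fallback=0)
    high = section.getint("MaxPrice", fallback=0)
    if low > high:
        problems.append("MinPrice is greater than MaxPrice")
    return problems


parser = configparser.ConfigParser()
parser.read_string("""
[FACEBOOK]
Enabled = True
CityId =
MinPrice = 0
MaxPrice = 1000
SortBy = distance_ascend
""")
facebook_problems = check_facebook(parser["FACEBOOK"])
print(facebook_problems)
```

With the CityId left blank as in the template above, the check reports a single problem, which matches the config comment quoted in the issues section saying that an unset city id returns no ads.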

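To find your CityId, look up your city on Facebook and copy the ID from its page URL (usually facebook.com/..../place/id).

Afterwards, make sure you are logged in to Facebook: this scraper uses your browser to scrape and does not log in automatically.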

Kijiji

Open the config file (config.ini) and find the section named KIJIJI

[KIJIJI]
Enabled = True
CityUrl = 
Identifier = 
MinPrice = 20
MaxPrice = 100
Type = ownr
; ownr, delr, all
  • Enabled: This config determines whether the Kijiji module is enabled (True) or disabled (False).
  • CityUrl: This config sets the URL for the Kijiji website for the desired city or location.
  • Identifier: The identifier that accompanies the city URL on Kijiji.
  • MinPrice: This config sets the minimum price range for the Kijiji search.
  • MaxPrice: This config sets the maximum price range for the Kijiji search.
  • Type: This config sets the type of Kijiji ads to search for, which can be "ownr" (owner-sold ads), "delr" (dealer-sold ads), or "all" (both types).

To find your CityUrl and Identifier:

  • Go to Kijiji and search for anything that returns ad results
  • Copy your city's URL part and identifier from the results page URL
    Image of Kijiji City's URL

Ebay

Open the config file (config.ini) and find the section named EBAY

[EBAY]
Enabled = True
MinPrice = 20
MaxPrice = 100
  • Enabled: This config determines whether the eBay module is enabled (True) or disabled (False).
  • MinPrice: This config sets the minimum price range for the eBay search.
  • MaxPrice: This config sets the maximum price range for the eBay search.
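Because every scraper section carries its own Enabled, MinPrice and MaxPrice options, the per-scraper price range can be collected in one pass. The following is a minimal sketch under assumptions: `enabled_price_ranges` and `price_in_range` are hypothetical helpers, and the sample sets the eBay section to disabled purely to show the filtering.

```python
import configparser

# Illustrative config text; EBAY is disabled here only for demonstration.
SAMPLE = """
[KIJIJI]
Enabled = True
MinPrice = 20
MaxPrice = 100

[EBAY]
Enabled = False
MinPrice = 20
MaxPrice = 100
"""


def enabled_price_ranges(text):
    """Map each enabled scraper section to its (min, max) price range."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    ranges = {}
    for name in parser.sections():
        section = parser[name]
        if section.getboolean("Enabled", fallback=False):
            ranges[name] = (section.getint("MinPrice"), section.getint("MaxPrice"))
    return ranges


def price_in_range(price, bounds):
    """Return True if price lies within the inclusive (min, max) bounds."""
    low, high = bounds
    return low <= price <= high


ranges = enabled_price_ranges(SAMPLE)
print(ranges)
print(price_in_range(45, ranges["KIJIJI"]))
```

Treating both bounds as inclusive is a design choice of this sketch; the README does not say how the tool handles prices exactly equal to MinPrice or MaxPrice.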

Lespacs

The LesPACs configuration provided in this repository is not currently implemented due to the website either undergoing a rewrite or implementing security measures to prevent web scraping. Unfortunately, without more information from LesPACs themselves, it is not possible to provide an ETA for when the configuration will be functional again.

Amazon

If you are looking to use the Amazon configuration in this repository, please note that it is not included in the current version and is not planned to be included in the future. This is due to the implementation of anti-scraping measures on the Amazon website, such as CAPTCHAs, which make it difficult or impossible to retrieve data using a web scraper. As such, the Amazon configuration provided in previous versions of the repository may no longer be functional.

Running

Run the Python script

  python main.py
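Once started, the tool is meant to repeat its search on the configured Interval. How main.py schedules this internally is not documented here, so the loop below is only a sketch of the general idea; `run_every` and its `sleep` parameter are hypothetical, and `max_runs` exists so the example terminates.

```python
import time


def run_every(interval_minutes, job, max_runs, sleep=time.sleep):
    """Call job() max_runs times, pausing interval_minutes between runs."""
    results = []
    for run in range(max_runs):
        results.append(job())
        # No pause is needed after the final run.
        if run < max_runs - 1:
            sleep(interval_minutes * 60)
    return results


# Demonstration with a fake sleep so the example finishes instantly.
pauses = []
outcome = run_every(1, lambda: "scraped", max_runs=3, sleep=pauses.append)
print(outcome, pauses)
```

Passing the sleep function as a parameter keeps the loop easy to test; in real use the default `time.sleep` would apply.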

Facebook

Facebook searches around a city because it is not international. You can find the ID by looking up your city on Facebook and copying the ID from its page URL (usually facebook.com/..../place/id).

Facebook is a good website for scraping.

Kijiji

Kijiji is a good website for scraping.

Ebay

Ebay is an okay website for scraping.

Amazon

Since Amazon is a vast website, it is much harder to find new listings and to narrow down exactly what you want, so you will most likely get more garbage than relevant results. It could be fine if you're looking for the cheapest price on a "popular" item.

Amazon is a bad website for scraping

Lespacs

Lespacs is just like Kijiji except that it is centered on Quebec rather than all of Canada. It can therefore be a bad site for scraping if you are not from Quebec; otherwise it is a pretty good one.

Outside of Quebec: Lespacs is a bad website for scraping.

Inside of Quebec: Lespacs is a good website for scraping.

This project was made with the help of Scrapy.

Contributors

README - ChatGPT

License

MIT

deals-scraper's Issues

Feature Request: Ignore previous ads

First things first, thanks for this project, it is very good! Could you consider the following feature request? Instead of showing every ad every time, ignore ads that were found previously.
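One possible way to approach this request is to remember which ads have already been reported and skip them on later runs. The sketch below is not taken from the project: `new_ads_only` is a hypothetical helper, and it assumes each ad is a dictionary with a unique "url" key.

```python
def new_ads_only(ads, seen_urls):
    """Return ads whose URL was not seen before, updating seen_urls in place."""
    fresh = []
    for ad in ads:
        if ad["url"] not in seen_urls:
            seen_urls.add(ad["url"])
            fresh.append(ad)
    return fresh


seen = set()
first_run = new_ads_only(
    [{"url": "a", "title": "AirPods Pro"}, {"url": "b", "title": "AirPods"}],
    seen,
)
second_run = new_ads_only(
    [{"url": "b", "title": "AirPods"}, {"url": "c", "title": "AirPods Max"}],
    seen,
)
print(len(first_run), len(second_run))
```

To survive restarts, the set of seen URLs would need to be saved to disk between runs, which this sketch leaves out.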

Price drop instead of title

I would like to report an issue: when an ad has a price drop, the price drop is shown in place of the title, as in the screenshots below.

image

image

Facebook Scraping

Is this currently working with Facebook? I have tried multiple different configurations and get hits from Kijiji, but nothing from Facebook. Here's the latest config file I attempted. It is very broad (searching for all "phone" posts in Toronto in the last week) but gets no hits.

[DEFAULT]
keywords = phone
exclusions =
maxprice = 2000
minprice = 0
enablefacebook = True
enablekijiji = False
enableebay = False
enableamazon = False
enablelespacs = False
strictmode = True
facebookcityid = 110941395597405
; facebook use the id of the closest city to you for the searches, if not set it will return no ads
interval = 2
; every minutes the bot should scrape

[FACEBOOK]
date_listed = 7
; 0 = off, 1 = last 24h, 7 = days and 30 = 30 days
sortby = distance_ascend
; best_match, distance_ascend, creation_time_descend, price_ascend, price_descend
