
kijiji-scraper's Introduction

kijiji-scraper

A lightweight Node.js module for retrieving and scraping ads from Kijiji.

Features

  • Retrieve single ads as JavaScript objects given their URL
  • Retrieve the latest ads matching given search criteria

Dependencies

Installation

npm install kijiji-scraper

Documentation

Quick start: Use Ad.Get() to scrape an ad given its URL. Use search() to scrape many ads given a set of search parameters. Read on (or CTRL+F) for more detailed information. Documentation can also be found in the TSDoc comments in this module's TypeScript type definition files (.d.ts files).

Ad class

This class encapsulates a Kijiji ad and its properties. It also handles retrieving this information from Kijiji.

Properties

Property | Type | Description
--- | --- | ---
title | String | Title of the ad
description | String | Ad description
date | Date | Date the ad was posted
image | String | URL of the ad's primary image
images | String[] | Array of URLs of the ad's images
attributes | Object | Properties specific to the category of the scraped ad
url | String | The ad's URL
id | String | Unique identifier of the ad

The image URL given in image is the featured image for the ad. The image URLs given in images are all of the images associated with the ad.

Note: If the ad has not been scraped automatically, some of these properties may be null or empty. This happens when an Ad object is created manually using the constructor or by performing a search with the scrapeResultDetails option set to false. See the Ad.isScraped() and Ad.scrape() method documentation below for more information on this.

Methods

Ad.Get(url[, options, callback])

Will scrape the Kijiji ad at url and construct an Ad object containing its information.

Arguments
  • url - A Kijiji ad URL
  • options (optional) - Options to pass to the scraper. See Scraper Options for details
  • callback(err, ad) (optional) - A callback called after the ad has been scraped. If an error occurs during scraping, err will not be null. If everything is successful, ad will contain an Ad object
Return value

Returns a Promise which resolves to an Ad object containing the ad's information.

Example usage
const kijiji = require("kijiji-scraper");

// Scrape using returned promise
kijiji.Ad.Get("<Kijiji ad URL>").then(ad => {
    // Use the ad object
    console.log(ad.title);
}).catch(console.error);

// Scrape using optional callback parameter
kijiji.Ad.Get("<Kijiji ad URL>", {}, (err, ad) => {
    if (!err) {
        // Use the ad object
        console.log(ad.title);
    }
});
Ad(url[, info, scraped])

Ad constructor. Manually constructs an ad object. You should generally not need to use this save for a few special cases (e.g., storing ad URLs entered by a user for delayed scraping). Ad.isScraped() returns false for Ad objects constructed in this way unless scraped is passed as true or they are subsequently scraped by calling Ad.scrape(), which causes the scraper to replace the ad's information with what is found at its URL.

Arguments
  • url - Ad's URL
  • info (optional) - Object containing the ad's properties. Only keys in the properties table (above) may be specified. May be omitted (if not specified then images will be an empty array, attributes will be an empty object, and all other properties will be null)
  • scraped (optional) - If true, causes Ad.isScraped() to return true regardless of whether or not Ad.scrape() has been called
Example usage
const kijiji = require("kijiji-scraper");

const ad = new kijiji.Ad("<Kijiji ad URL>", { date: new Date() });
console.log(ad.isScraped()); // false
console.log(ad.date); // current date

ad.scrape().then(() => {
    // Use the ad object
    console.log(ad.date); // date ad was posted (initial value is overwritten)
}).catch(console.error);
Ad.isScraped()

Determines whether or not the ad's information has been retrieved from Kijiji.

Return value

Returns a boolean indicating whether or not an ad's information has been scraped from the page at its URL. This can be false if the Ad object was manually created using the constructor or if it was retrieved from a search with the scrapeResultDetails option set to false. Call Ad.scrape() to retrieve the information for such ads.

Example usage
const kijiji = require("kijiji-scraper");

const ad = new kijiji.Ad("<Kijiji ad URL>");  // ad does not get scraped
console.log(ad.isScraped()); // false

ad.scrape().then(() => {
    console.log(ad.isScraped()); // true
}).catch(console.error);
Ad.scrape([options, callback])

Manually retrieves an Ad's information from its URL. Useful if it was created in a way that does not do this automatically, such as using the constructor or performing a search with the scrapeResultDetails option set to false.

Arguments
  • options (optional) - Options to pass to the scraper. See Scraper Options for details
  • callback(err) (optional) - A callback called after the ad has been scraped. If an error occurs during scraping, err will not be null
Return value

Returns a Promise which resolves once the ad has been scraped and the object has been updated.

Example usage
const kijiji = require("kijiji-scraper");

const ad = new kijiji.Ad("<Kijiji ad URL>");  // ad does not get scraped
console.log(ad.isScraped()); // false

// Scrape using returned promise
ad.scrape().then(() => {
    // Use the ad object
    console.log(ad.isScraped()); // true
    console.log(ad.title);
}).catch(console.error);

// Scrape using optional callback parameter
ad.scrape({}, err => {
    if (!err) {
        // Use the ad object
        console.log(ad.isScraped()); // true
        console.log(ad.title);
    }
});
Ad.toString()

Returns a string representation of the ad. This is just meant to be a summary and may omit information for brevity or change format in the future. Access the Ad's properties directly if you need them for comparisons, etc. The current format is as follows:

[MM/dd/yyyy @ hh:mm] TITLE
URL
* property1: value1
* property2: value2
...
* propertyN: valueN

The date, title, and properties will be absent if the ad has not been scraped (isScraped() == false) unless they were manually specified when the object was constructed.
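To make the layout above concrete, here is a small formatter that produces the same shape from a plain object. It is a sketch, not the library's toString(): summarizeAd is a hypothetical name, and rendering the time in UTC on a 24-hour clock and listing ad.attributes as the properties are assumptions.

```javascript
// Hypothetical formatter mirroring the documented summary layout.
// Assumptions: dates are printed in UTC on a 24-hour clock, and the
// "* property: value" lines are taken from ad.attributes.
const pad = n => String(n).padStart(2, "0");

function formatDate(d) {
  return `${pad(d.getUTCMonth() + 1)}/${pad(d.getUTCDate())}/${d.getUTCFullYear()}` +
    ` @ ${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}`;
}

function summarizeAd(ad) {
  const header = [];
  if (ad.date instanceof Date) header.push(`[${formatDate(ad.date)}]`);
  if (ad.title) header.push(ad.title);

  // Unscraped ads may lack a date and title, so the header line is optional.
  const lines = header.length > 0 ? [header.join(" ")] : [];
  lines.push(ad.url);
  for (const [key, value] of Object.entries(ad.attributes || {})) {
    lines.push(`* ${key}: ${value}`);
  }
  return lines.join("\n");
}

console.log(summarizeAd({
  title: "Coffee table",
  url: "<Kijiji ad URL>",
  date: new Date(Date.UTC(2019, 6, 29, 13, 45)),
  attributes: { price: 40 }
}));
// [07/29/2019 @ 13:45] Coffee table
// <Kijiji ad URL>
// * price: 40
```

An unscraped ad with only a URL reduces to a single line containing that URL.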

Example usage
const kijiji = require("kijiji-scraper");

kijiji.Ad.Get("<Kijiji ad URL>").then(ad => {
    console.log(ad.toString());
}).catch(console.error);

Searching for ads

Searches are performed using the search() function:

search(params[, options, callback])

Arguments
  • params - Object containing Kijiji ad search parameters.

    • Mandatory parameters:

      Parameter | Type | Default Value | Description
      --- | --- | --- | ---
      locationId | Integer/Object | 0 (all of Canada) | Id of the geographical location to search in
      categoryId | Integer/Object | 0 (all categories) | Id of the ad category to search in

      Values for locationId and categoryId can be found by performing a search on the Kijiji website and examining the URL that Kijiji redirects to. For example, after setting the location to Ottawa and selecting the "cars & vehicles" category, Kijiji redirects to http://www.kijiji.ca/b-cars-vehicles/ottawa/c27l1700185. The last part of the URL (c27l1700185) is formatted as c[categoryId]l[locationId]. So in this case, categoryId is 27 and locationId is 1700185.
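The c[categoryId]l[locationId] pattern can also be extracted programmatically. A minimal sketch follows; idsFromSearchUrl is a hypothetical helper, and it assumes the ids are the leading c…l… digits of the URL's last path segment.

```javascript
// Hypothetical helper: read categoryId and locationId from a Kijiji search URL
// whose last path segment starts with c[categoryId]l[locationId].
function idsFromSearchUrl(searchUrl) {
  const segments = new URL(searchUrl).pathname.split("/").filter(Boolean);
  const last = segments[segments.length - 1] || "";
  const match = /^c(\d+)l(\d+)/.exec(last);
  if (!match) {
    throw new Error(`No c<categoryId>l<locationId> segment in ${searchUrl}`);
  }
  return { categoryId: Number(match[1]), locationId: Number(match[2]) };
}

const ids = idsFromSearchUrl("http://www.kijiji.ca/b-cars-vehicles/ottawa/c27l1700185");
console.log(ids); // { categoryId: 27, locationId: 1700185 }
```

The returned object can be spread into the params passed to search().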

      Location and category objects

      For convenience, objects containing all locationId and categoryId values Kijiji accepts have been defined in locations.ts and categories.ts, respectively. These objects are nested in the same way as those in the location and category selectors on the Kijiji website (e.g., the city of Montreal is located under "Quebec > Greater Montreal > City of Montreal"; coffee tables are located under "Buy and Sell > Furniture > Coffee Tables"), so their contents should be familiar.

      For example, instead of setting locationId to 1700281 (Montreal) and categoryId to 241 (coffee tables), you can set locationId to locations.QUEBEC.GREATER_MONTREAL.CITY_OF_MONTREAL and categoryId to categories.BUY_AND_SELL.FURNITURE.COFFEE_TABLES. You no longer need to know the ids, and you have a quick reference available. Any location/category object along the hierarchy will also work (e.g., locations.QUEBEC for all of Quebec, not just Montreal; categories.BUY_AND_SELL.FURNITURE for all furniture, not just coffee tables). The root objects themselves specify all locations/categories (id of 0). Location/category objects and locationIds/categoryIds are interchangeable - the search function will behave identically in either case. See locations.ts and categories.ts for all location and category objects.

    • Optional parameters: There are many different search parameters. Some of these can be used in any search (e.g., minPrice), but most are category-specific. Additionally, some parameters are specific to the scraperType being used (see Scraper Options for details on how to switch).

      • Some known parameters available when using either the "html" (default) or "api" (currently broken) scraperType:

        Parameter | Type | Description
        --- | --- | ---
        minPrice | Number | Minimum price of returned items
        maxPrice | Number | Maximum price of returned items
        adType | String | Type of ad ("OFFER", "WANTED", or undefined for both). If using the "api" scraperType, "OFFERED" must be used instead of "OFFER".
      • Some known parameters available when using the "api" (currently broken) scraperType:

        Parameter | Type | Description
        --- | --- | ---
        q | String | Search string
        sortType | String | Search results ordering (e.g., "DATE_DESCENDING", "DISTANCE_ASCENDING", "PRICE_ASCENDING", "PRICE_DESCENDING")
        distance | Number | Distance in kilometers
        priceType | String | Type of price (e.g., "SPECIFIED_AMOUNT", "PLEASE_CONTACT", "FREE", "SWAP_TRADE")
      • Some known parameters available when using the "html" (default) scraperType:

        Parameters to use with scraperType="html" can be found by using your browser's developer tools and performing a custom search on the Kijiji website. After submitting your search on Kijiji or updating the applied filter, use your browser's network monitoring tool to examine the request for https://www.kijiji.ca/b-search.html. Any parameter in that request's query string can be specified in params. A few examples include:

        Parameter | Type | Description
        --- | --- | ---
        keywords | String | Search string
        sortByName | String | Search results ordering (e.g., "dateDesc", "dateAsc", "priceDesc", "priceAsc")
  • options (optional) - Contains parameters that control the behavior of searching and scraping. Can be omitted. In addition to the options below, you can also specify everything in Scraper Options.

    Option | Type | Default Value | Description
    --- | --- | --- | ---
    pageDelayMs | Integer | 1000 | Amount of time in milliseconds to wait between scraping each result page. This is useful to avoid detection and bans from Kijiji.
    minResults | Integer | 20 | Minimum number of ads to fetch (if available). Note that Kijiji results are returned in pages of up to 20 ads, so if you set this to something like 29, up to 40 results may be retrieved. A negative value indicates no limit (retrieve as many ads as possible). If negative or not specified and maxResults > 0, minResults will take on the value of maxResults.
    maxResults | Integer | -1 | Maximum number of ads to return. This simply removes excess results from the array that is returned (e.g., if minResults is 40 and maxResults is 7, 40 results will be fetched from Kijiji and the last 33 will be discarded). A negative value indicates no limit. If greater than zero and minResults is unspecified, or if minResults is negative, this value will also be used for minResults.
    scrapeResultDetails | Boolean | true | When using the HTML scraper, the details of each query result are scraped in separate, subsequent requests by default. To suppress this behavior and return only the data retrieved by the initial query, set this option to false. Note that ads will lack some information if you do this, and Ad.isScraped() will return false until Ad.scrape() is called to retrieve the missing information. This option does nothing when using the API scraper.
    resultDetailsDelayMs | Integer | 500 | When scrapeResultDetails is true, the amount of time in milliseconds to wait between each request for result details. A value of 0 will cause all such requests to be made at the same time. This is useful to avoid detection and bans from Kijiji.
  • callback(err, results) (optional) - A callback called after the search results have been scraped. If an error occurs during scraping, err will not be null. If everything is successful, results will contain an array of Ad objects.
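The arithmetic behind minResults and maxResults can be sketched as below. This illustrates the documented rules only and is not the library's source; resolveLimits and planFetch are hypothetical names, and the counts are upper bounds that assume every result page is full.

```javascript
// Kijiji result pages hold up to 20 ads (per the minResults description).
const PAGE_SIZE = 20;

// Documented defaults: minResults defaults to 20, but takes on maxResults
// when it is negative or unspecified and maxResults > 0.
function resolveLimits({ minResults, maxResults = -1 } = {}) {
  let min = minResults;
  if ((min === undefined || min < 0) && maxResults > 0) {
    min = maxResults;
  } else if (min === undefined) {
    min = 20;
  }
  return { min, max: maxResults };
}

// Upper bounds on pages fetched, ads fetched, and ads returned.
function planFetch(options) {
  const { min, max } = resolveLimits(options);
  // A negative minimum means "keep paging until Kijiji runs out of results".
  const pages = min < 0 ? Infinity : Math.ceil(min / PAGE_SIZE);
  const fetched = pages * PAGE_SIZE;
  // A positive maxResults trims the returned array; negative keeps everything.
  const returned = max > 0 ? Math.min(fetched, max) : fetched;
  return { pages, fetched, returned };
}

console.log(planFetch({ minResults: 29 }));                // { pages: 2, fetched: 40, returned: 40 }
console.log(planFetch({ minResults: 40, maxResults: 7 })); // { pages: 2, fetched: 40, returned: 7 }
```

With only maxResults: 50 set, minResults becomes 50, so up to three pages (60 ads) are fetched and 50 are returned.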

Return value

Returns a Promise which resolves to an array of search result Ad objects.

Note: Ads may not appear in search results (or the Kijiji website, for that matter) for a short time after they are created (usually no more than 1 minute). This means that when searching, you are not guaranteed to receive extremely recent ads. Such ads will be returned in future searches but their date property will reflect the time that they were actually created.
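A polling loop that compares post dates against the previous poll time can miss such ads. One possible workaround is sketched below: look back further than the previous poll and use each ad's id (from the properties table) to avoid reporting an ad twice. createNewAdFilter and the lookback values are assumptions for illustration, not library features.

```javascript
// Hypothetical polling helper (not part of kijiji-scraper). Ads can appear in
// search results after their posted date, so each poll looks back `lookbackMs`
// before the previous poll and remembers ad ids to avoid duplicate reports.
function createNewAdFilter(lookbackMs = 5 * 60 * 1000) {
  const seenIds = new Set(); // grows with every ad seen; prune it in long-running code
  let lastPoll = null;

  return function findNewAds(ads, now = new Date()) {
    const cutoff = lastPoll === null ? -Infinity : lastPoll.getTime() - lookbackMs;
    const fresh = ads.filter(ad =>
      !seenIds.has(ad.id) && ad.date instanceof Date && ad.date.getTime() >= cutoff
    );
    ads.forEach(ad => seenIds.add(ad.id));
    lastPoll = now;
    return fresh;
  };
}

// Example: ad "b" is posted 30 s after "a" but only shows up in the second poll.
const start = Date.UTC(2020, 0, 1, 17, 0, 0);
const at = ms => new Date(start + ms);
const findNewAds = createNewAdFilter(60 * 1000);

const firstPoll = findNewAds([{ id: "a", date: at(0) }], at(60 * 1000));
const secondPoll = findNewAds(
  [{ id: "a", date: at(0) }, { id: "b", date: at(30 * 1000) }],
  at(10 * 60 * 1000)
);
console.log(firstPoll.map(ad => ad.id));  // [ 'a' ]
console.log(secondPoll.map(ad => ad.id)); // [ 'b' ]
```

Without the lookback, the cutoff for the second poll would be the first poll's time, and ad "b" would be treated as old.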

Example usage
const kijiji = require("kijiji-scraper");

const options = {
    minResults: 20
};

const params = {
    locationId: 1700185,  // Same as kijiji.locations.ONTARIO.OTTAWA_GATINEAU_AREA.OTTAWA
    categoryId: 27,  // Same as kijiji.categories.CARS_AND_VEHICLES
    sortByName: "priceAsc"  // Show the cheapest listings first
};

// Scrape using returned promise
kijiji.search(params, options).then(ads => {
    // Use the ads array
    for (let i = 0; i < ads.length; ++i) {
        console.log(ads[i].title);
    }
}).catch(console.error);

// Scrape using optional callback parameter
function callback(err, ads) {
    if (!err) {
        // Use the ads array
        for (let i = 0; i < ads.length; ++i) {
            console.log(ads[i].title);
        }
    }
}
kijiji.search(params, options, callback);

Scraper options

Functions that involve retrieving data from Kijiji (Ad.Get(), Ad.scrape(), and search()) take an optional parameter for scraper options. The options are as follows:

Option | Type | Default | Description
--- | --- | --- | ---
scraperType | String | "html" | How to scrape Kijiji: "html" to scrape the website (default) or "api" to use the mobile API (currently broken). If you have trouble with one, try the other. It seems that the mobile API doesn't have a rate limit or lockout mechanism (yet; please don't abuse this).
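The advice to try the other scraperType when one fails can be automated. The sketch below is an illustration under assumptions: searchWithFallback is a hypothetical wrapper, and the search function is passed in (for example kijiji.search) so the wrapper can be tried without network access.

```javascript
// Hypothetical wrapper (not part of kijiji-scraper): try each scraperType in
// turn and return the first successful result. `search` is injected, e.g.
// require("kijiji-scraper").search, so the wrapper itself has no dependencies.
async function searchWithFallback(search, params, options = {}, order = ["html", "api"]) {
  let lastError;
  for (const scraperType of order) {
    try {
      // Copy the caller's options and override only the scraper backend.
      return await search(params, { ...options, scraperType });
    } catch (err) {
      lastError = err; // remember the failure and try the next backend
    }
  }
  throw lastError;
}
```

Usage might look like searchWithFallback(kijiji.search, params, { minResults: 20 }). Because the "api" backend is marked as currently broken, the fallback may simply surface its error instead.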

kijiji-scraper's People

Contributors

arthurg, dependabot[bot], malavv, mikedidomizio, mrdos, mwpenny, sockcrates, technoligest, vtsatskin


kijiji-scraper's Issues

Consideration for expanding repo to Kijiji API

Let me know what you think of the idea of expanding this repo into a full Kijiji API. I understand if it might be too much work or out of the scope of this project, but let me know if you would be open to the idea of me writing some functions, such as methods for posting ads.
This also possibly has the potential to turn Kijiji into a platform full of bots. But anyone can write code, so why hasn't a private repo already been made to do that? Who knows; just leave it up for consideration and I can start writing methods for things like that. If the answer is yes, some documentation or a basic breakdown of how the codebase works would be cool, because I personally cannot understand any of it.
It might make the repo better, but if there is no need for it then there is no reason to do it. Personally, I just wanted to make a function to post ads but saw it was probably out of the scope of this project. Let me know your thoughts.

MinResults and MaxResults in Search() function implemented improperly, Doesn't allow for getting every listing available.

Currently there is no real way to get all the available listings using MinResult and MaxResult. You'd need to set MinResult to something like 60 and if there are only 59 listings, it will error with this message:
Error parsing Kijiji search results: Result ad has no URL. It is possible that Kijiji changed their markup. If you believe this to be the case, please open an issue at: https://github.com/mwpenny/kijiji-scraper/issues

With the way MinResults works, you specify the minimum and you get no more. The way it should work is that setting MinResults to 40 and MaxResults to 50 returns 50. This allows for all the functionality of selecting how many listings you get while also getting as many as there are.

To summarize, even if you set MaxResults to -1 (no limit), you only receive MinResults rounded up to the next multiple of 20. Instead, it should get as many listings as it can before reaching MaxResults. This way you can set MinResults to 20 and MaxResults to 60 and get 60 listings.
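The behaviour requested above (stop when maxResults is reached or Kijiji runs out of results) can be sketched as a paging loop. This is not how kijiji-scraper is implemented; collectAds and fetchPage(pageNumber) are hypothetical, with fetchPage assumed to resolve to the ads on one result page of up to 20.

```javascript
// Hypothetical paging loop (not kijiji-scraper's implementation).
// fetchPage(pageNumber) is assumed to resolve to an array of ads for that page.
async function collectAds(fetchPage, { maxResults = -1, pageSize = 20 } = {}) {
  const ads = [];
  for (let page = 1; ; page++) {
    const pageAds = await fetchPage(page);
    ads.push(...pageAds);

    const reachedMax = maxResults > 0 && ads.length >= maxResults;
    const lastPage = pageAds.length < pageSize; // a short page means no more results
    if (reachedMax || lastPage) break;
  }
  // A positive maxResults trims the excess from the final page.
  return maxResults > 0 ? ads.slice(0, maxResults) : ads;
}
```

With 59 listings available and no maxResults, this loop stops after the third (short) page and returns all 59 rather than failing.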

Kijiji Mingle API Issue

Is anyone else having this issue? It appears the mingle api is no longer being used by the app?

Missing very new ads when using the Mobile API

Issue:
Ads aren't retrieved while using the Mobile API scrape method for up to a minute after they are posted.

Details:
I have been playing around with this module and have noticed an odd behavior while testing. Let's say we want to write to the console every time a new ad is posted. To do this, we compare the time of our last scrape against the date stored in each ad object retrieved by the scraper. If any ads were posted after our last scrape time, we notify the user about them.

  • Last scan was at: 5:10pm
  • Ad posted at: 5:12pm
  • Ad was posted after last scan, so ad is new. Show ad to user.

The issue here is that, at least when using the Mobile API, certain ads won't be picked up for up to a minute after their post date found in the Ad object. If we change the above example around a little bit:

  • Last scan was at 5:10:00pm
  • Ad posted at 5:09:30pm

There is a good chance this ad was not actually picked up during the 5:10pm scan. At that point the "last scan" value would have been something earlier, say 5:05pm, and the ad would have been seen as new. As the ad was not actually picked up during that scrape, its corresponding Ad object is not available to be examined at 5:10pm. Our current scrape at, let's say, 5:15pm picks up the ad, but at this point it is considered to be old since it was posted prior to the last time we ran a scan. The user never finds out about this new posting.

Notes:
I have yet to test this using the http scraper, that is next on the list. If the behavior is consistent I believe there would be two possible culprits:

  1. Ads just don't show up on the website for up to a minute after a user clicks post. This is the "easy" answer as it's out of our hands and just needs to be worked around.
  2. For some reason the module is ignoring or removing posts which are "too new". This feels pretty unlikely.

Of course if using the http scraper fixes the issue, then the problem would either be found in the Mobile API or code specific to its handling within the module.

I will update if I find out more.

Getting undefined every time I run it

Hello, I am having a small issue and can't seem to figure out how to solve it. I followed the setup in the GitHub README, but when I run the script I get undefined. I'm using it in my Angular CLI project.

let kijiji = require("kijiji-scraper");

let prefs = {
    "locationId": 9004,
    "categoryId": 54,
    "scrapeInnerAd":false
};

let params = {
  keywords: "Web" + "designer" + "developer",
  adType: "OFFER"
};

kijiji.query(prefs, params, function(err, ads) {
  console.log(kijiji.parse(ads));
});
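The keywords value in the snippet above joins the words with +, which concatenates them without spaces. A minimal sketch of the same params written in the shape the README documents for search(params[, options, callback]) (rather than query()) is below; the ids are copied from the snippet, and moving them from the old prefs object into params is an assumption about how the old API maps onto the documented one.

```javascript
// "+" joins strings with no separator, so this yields "Webdesignerdeveloper".
const concatenated = "Web" + "designer" + "developer";

// Joining an array with a space keeps the search terms separate.
const keywords = ["Web", "designer", "developer"].join(" ");

// params shaped like the search() parameters documented above
// (ids taken from the snippet; these would be passed as kijiji.search(params)).
const params = {
  locationId: 9004,
  categoryId: 54,
  keywords,
  adType: "OFFER"
};

console.log(concatenated);    // Webdesignerdeveloper
console.log(params.keywords); // Web designer developer
```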

can't seem to get this working

I'm trying to add this library to a laravel project

I ran npm install kijiji-scraper

then I add the example code to my app.js:

var kijiji = require("kijiji-scraper")

var prefs = {
    "locationId": 27,
    "categoryId": 1700185
}

var params = {
    "minPrice": 0,
    "maxPrice": 100000,
    "keywords": "toyota",
    "adType": "OFFER"
}

kijiji.query(prefs, params, function(err, ads) {
    //Use the ads array
    console.log(ads);
});

then when I do npm run watch I get these errors:

 ERROR  Failed to compile with 7 errors                                                                                                                                   21:54:15

These dependencies were not found:

* fs in ./~/kijiji-scraper/~/request/lib/har.js
* net in ./~/forever-agent/index.js, ./~/tough-cookie/lib/cookie.js and 1 other
* tls in ./~/forever-agent/index.js, ./~/kijiji-scraper/~/tunnel-agent/index.js

To install them, you can run: npm install --save fs net tls


This relative module was not found:

* ./package in ./~/cheerio/index.js

I did try installing all these dependencies manually although I don't think I should need to since they are dependencies of the kijiji-scraper itself

but even after adding these I still get these errors:

 ERROR  Failed to compile with 2 errors                                                                                                                                   21:49:11

This dependency was not found:

* fs in ./~/kijiji-scraper/~/request/lib/har.js

To install it, you can run: npm install --save fs


This relative module was not found:

* ./package in ./~/kijiji-scraper/~/cheerio/index.js

any help would be appreciated

Not working

Doesn't seem to be working. Has Kijiji changed something?

es6 notation

Would you be open to updating the js syntax to es6+? I could help with this.

Exception thrown if there is partner's ad between kijiji ads

Hello,

I am having an issue scraping Kijiji job ads with kijiji-scraper. It works well with other types of ads, but if there are partner ads between the Kijiji ads, it throws an exception.

Here is code that I tried:

const kijiji = require("kijiji-scraper");

let options = {
    minResults: 40,
    maxResults: -1,
    keywords: "part time"
};

// https://www.kijiji.ca/b-part-time-student-jobs/calgary/c59l1700199
let params = {
    locationId: kijiji.locations.ALBERTA.CALGARY,
    categoryId: kijiji.categories.JOBS,
    sortByName: "dateAsc"
};

kijiji.search(params, options).then(function(ads) {
    // Use the ads array
    for (let i = 0; i < ads.length; ++i) {
        console.log('*Ad# ' + (i+1).toString());
        console.log(ads[i].toString());
        console.log('==========================================================================')
    }
}).catch(console.error);

And here is error for above code:

{ FetchError: request to https://www.kijiji.cahttps//www.ziprecruiter.com/clk/randstad-quebec-00000000-dual-ticket-millwright-electrician-7914_022695699?clk=J8r4wWhuP1tKESKoqRxo7xs8qGgqL7HPWR5y2wvYRChGoq_3bMoe83zQfitL_n7cts9PdPbKc2yKLleLEh4vRo1ggbhdJmtKbKYUKLpaKDlgUGLrCblfhk1UPvWJLKlw_h8rSEvPTTJIQ7VAMjpETPY_x0p-8H24nGUuGR7snq_GJcZ84Fhz6RiRxzbxItrlkeznWa4quL7aL9ZlB7myV2Q5DoUVtbf8JlTcTX-WWbSrfz1_d30bsPVwuPbhMY2_NS9aOq-4VodTlcWCF33VisQ1H8dRRDyryQSmZchrhxENiNpMHsmVtOrg8Ikb7-XutxioKDoyfiokIwirFnct9az8GOEAQwUCBe8X0VjXjX8GEB5RLcHVvSNJhzg7ZM4kj7FyRurgBUb6BZTCXzjmcUxU4EAw-Me_2eIpToOwO9GQRoMgZUcpFD8JQSxw2h9zKrpoT6I0r4AEA_43588plfsLwTcY453Rpyk5Is60pKkL_KKssmKDdrCYBhuzHz1naG4PfLrz-jRT-jXHBUAnrCV1czx5CsEWtoi3DuVybV4.9e5d32cbf3aaf685c73bb89c76eb0f30 failed, reason: getaddrinfo ENOTFOUND www.kijiji.cahttps www.kijiji.cahttps:443
    at ClientRequest.<anonymous> (/Users/wojung/partimer/node_modules/node-fetch/lib/index.js:1358:11)
    at emitOne (events.js:116:13)
    at ClientRequest.emit (events.js:211:7)
    at TLSSocket.socketErrorListener (_http_client.js:387:9)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)
    at emitErrorNT (internal/streams/destroy.js:64:8)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  message: 'request to https://www.kijiji.cahttps//www.ziprecruiter.com/clk/randstad-quebec-00000000-dual-ticket-millwright-electrician-7914_022695699?clk=J8r4wWhuP1tKESKoqRxo7xs8qGgqL7HPWR5y2wvYRChGoq_3bMoe83zQfitL_n7cts9PdPbKc2yKLleLEh4vRo1ggbhdJmtKbKYUKLpaKDlgUGLrCblfhk1UPvWJLKlw_h8rSEvPTTJIQ7VAMjpETPY_x0p-8H24nGUuGR7snq_GJcZ84Fhz6RiRxzbxItrlkeznWa4quL7aL9ZlB7myV2Q5DoUVtbf8JlTcTX-WWbSrfz1_d30bsPVwuPbhMY2_NS9aOq-4VodTlcWCF33VisQ1H8dRRDyryQSmZchrhxENiNpMHsmVtOrg8Ikb7-XutxioKDoyfiokIwirFnct9az8GOEAQwUCBe8X0VjXjX8GEB5RLcHVvSNJhzg7ZM4kj7FyRurgBUb6BZTCXzjmcUxU4EAw-Me_2eIpToOwO9GQRoMgZUcpFD8JQSxw2h9zKrpoT6I0r4AEA_43588plfsLwTcY453Rpyk5Is60pKkL_KKssmKDdrCYBhuzHz1naG4PfLrz-jRT-jXHBUAnrCV1czx5CsEWtoi3DuVybV4.9e5d32cbf3aaf685c73bb89c76eb0f30 failed, reason: getaddrinfo ENOTFOUND www.kijiji.cahttps www.kijiji.cahttps:443',
  type: 'system',
  errno: 'ENOTFOUND',
  code: 'ENOTFOUND' }
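The host in the error, www.kijiji.cahttps, is consistent with a partner ad's absolute link (here a ziprecruiter.com URL) being appended to the Kijiji base URL as a plain string. A minimal sketch of a safer approach is below; resolveKijijiAdUrl is a hypothetical helper, not part of kijiji-scraper, and /v-example/1 is a placeholder path.

```javascript
// Base for relative ad links found in search result markup.
const KIJIJI_ORIGIN = "https://www.kijiji.ca";

// Hypothetical helper: resolve an ad link with the WHATWG URL parser rather than
// string concatenation, and return null for links that leave kijiji.ca
// (such as partner job listings) so the caller can skip them.
function resolveKijijiAdUrl(href) {
  const url = new URL(href, KIJIJI_ORIGIN); // an absolute href ignores the base
  const host = url.hostname;
  const onKijiji = host === "kijiji.ca" || host.endsWith(".kijiji.ca");
  return onKijiji ? url.href : null;
}

console.log(resolveKijijiAdUrl("/v-example/1"));                             // https://www.kijiji.ca/v-example/1
console.log(resolveKijijiAdUrl("https://www.ziprecruiter.com/clk/example")); // null
```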

output is invalid json ?

Getting closer to getting this to work.. very easy and useful so far!

I'm parsing my ads and then returning the output to php so it can store it in a database.

But I'm running into a problem because the JSON that is returned is invalid. Below is an example of the JSON returned for one ad:

{ title: 'EVGA GTX 1070 Founder&apos;s Edition',
  link: 'https://www.kijiji.ca/v-ordinateurs-de-bureau/ville-de-montreal/evga-gtx-1070-founders-edition/1276760658',
  description: 'Like-New condition EVGA GTX 1070 Founder&apos;s Edition for sale. The price is firm. Contact me if you are interested.',
  enclosure: '',
  pubDate: 'Tue, 27 Jun 2017 01:41:02 GMT',
  guid: 'https://www.kijiji.ca/v-ordinateurs-de-bureau/ville-de-montreal/evga-gtx-1070-founders-edition/1276760658',
  'dc:date': '2017-06-27T01:41:02Z',
  'geo:lat': '45.5029532',
  'geo:long': '-73.57979089999999',
  'g-core:price': '600.0',
  innerAd:
   { title: 'EVGA GTX 1070 Founder\'s Edition',
     image: 'https://i.ebayimg.com/00/s/ODAwWDgwMA==/z/qysAAOSwbtVZUbef/$_35.JPG',
     images: [ 'https://i.ebayimg.com/00/s/ODAwWDgwMA==/z/qysAAOSwbtVZUbef/$_57.JPG' ],
     info:
      { Brand: 'Other',
        'For Sale By': 'Owner',
        'Date Listed': 2017-06-27T01:41:02.000Z,
        Price: '$600.00',
        Address: 'Montreal, QC H3A1A8',
        Type: 'OFFER',
        Visits: 144 },
     desc: 'Like-New condition EVGA GTX 1070 Founder\'s Edition for sale. The price is firm. Contact me if you are interested.' } }

There are many things wrong here, and it fails JSON validators (such as https://jsonlint.com/). The single quotes used everywhere should be double quotes, and keys (such as title:, link:, description:, etc.) should all be enclosed in double quotes.

I tried to fix this by using "JSON fixers" in both JS and PHP and came close. I could probably do it, but it would be better if the JSON output was fixed at the source.
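The output above is how Node.js prints a JavaScript object with console.log (via util.inspect), not JSON, which is why it has single quotes, bare keys and an unquoted date. A sketch of producing real JSON with JSON.stringify is below; the object is a trimmed copy of fields shown above.

```javascript
// A trimmed-down object shaped like the one above (values copied from it).
const ad = {
  title: "EVGA GTX 1070 Founder's Edition",
  innerAd: {
    info: {
      "For Sale By": "Owner",
      "Date Listed": new Date("2017-06-27T01:41:02.000Z"),
      Visits: 144
    }
  }
};

// console.log(ad) prints an inspected object literal; JSON.stringify emits JSON
// (double-quoted keys and strings, Dates converted via toISOString()).
const json = JSON.stringify(ad);
console.log(json);
// {"title":"EVGA GTX 1070 Founder's Edition","innerAd":{"info":{"For Sale By":"Owner","Date Listed":"2017-06-27T01:41:02.000Z","Visits":144}}}
```

Printing the stringified value instead of the object gives PHP a string it can read with json_decode().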

Support location radius for search

Thanks for this great tool!

The kijiji url appears to pack the kilometer radius parameter before the query string, e.g. for 3 km radius:

https://www.kijiji.ca/b-appartement-condo/ville-de-montreal/1+salle+de+bains-3+1+2/c37l1700281a120a27949001r3.0?...

Is there a way to query with a radius currently? Or would the library need to be updated?
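The URL above can be taken apart to see where the radius sits. The sketch below is based only on that one example, so the pattern (c<categoryId>l<locationId>, optional a<digits> blocks whose meaning is not documented here, then r<km>) is an assumption, and parseSearchSegment is hypothetical. Separately, the README lists a distance parameter for the "api" scraperType, which it marks as currently broken.

```javascript
// Hypothetical parser for a search path segment such as the one above.
// Assumed layout: c<categoryId>l<locationId>, zero or more a<digits> blocks,
// then an optional r<km> radius suffix.
function parseSearchSegment(segment) {
  const match = /^c(\d+)l(\d+)(?:a\d+)*(?:r(\d+(?:\.\d+)?))?$/.exec(segment);
  if (!match) return null;
  return {
    categoryId: Number(match[1]),
    locationId: Number(match[2]),
    radiusKm: match[3] === undefined ? null : Number(match[3])
  };
}

console.log(parseSearchSegment("c37l1700281a120a27949001r3.0"));
// { categoryId: 37, locationId: 1700281, radiusKm: 3 }
```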

Can't retrieve ads using webpack proxy

Hi,

I'm trying to make this work within a basic Vue project built with Vue CLI 3 and webpack. Since I can't fetch https://www.kijiji.ca directly using fetch() (it returns a CORS error), I changed the KIJIJI_BASE_URL variable in the source code to "/kijiji" and set up a proxy in my webpack config:

"devServer": {
    proxy: {
      "/kijiji":  {
        target: "https://www.kijiji.ca",
        pathRewrite: { '^/kijiji': '' }
      }
    }
  }

Using this configuration, I'm getting response.status = 200 in the fetch of function initFirstResultPagePath(), but there seems to be a problem after that when trying to parse the body.
If I set a breakpoint on let results = parseResultsHTML(body);, what I have in body is:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width,initial-scale=1.0">
    <link rel="icon" href="/favicon.ico">
    <title>ad_scraper</title>
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:100,300,400,500,700,900">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@mdi/font@latest/css/materialdesignicons.min.css">
  <link href="/js/app.js" rel="preload" as="script"><link href="/js/chunk-vendors.js" rel="preload" as="script"></head>
  <body>
    <noscript>
      <strong>We're sorry but ad_scraper doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
    </noscript>
    <div id="app"></div>
    <!-- built files will be auto injected -->
  <script type="text/javascript" src="/js/chunk-vendors.js"></script><script type="text/javascript" src="/js/app.js"></script></body>
</html>

Any idea why it's not working using a proxy for the Kijiji website?

Thanks

Ad #... does not exist

I am trying to search for ads and sometimes I get:

Error: Kijiji returned error: Ad #1553278802 does not exist
    at parseResponseXML (node_modules/kijiji-scraper/dist/lib/backends/api-scraper.js:49:15)
    at node_modules/kijiji-scraper/dist/lib/backends/api-scraper.js:124:16

How to skip it?
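If the ads are scraped one at a time rather than inside a single call, one way to skip ads that fail is Promise.allSettled, sketched below. scrapeAllSkippingFailures is a hypothetical helper, and scrapeAd is an assumed callback (for instance, one that creates new kijiji.Ad(url) and awaits ad.scrape()). The error in this report comes from the "api" backend inside search(), so this sketch does not change what search() itself does.

```javascript
// Hypothetical helper (not part of kijiji-scraper): run scrapeAd on every item
// and collect successes and failures separately instead of failing fast.
async function scrapeAllSkippingFailures(items, scrapeAd) {
  // The async wrapper turns synchronous throws into rejections too.
  const results = await Promise.allSettled(items.map(async item => scrapeAd(item)));
  const ok = [];
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      ok.push(result.value);
    } else {
      // e.g. "Kijiji returned error: Ad #... does not exist"
      failed.push({ item: items[i], reason: result.reason });
    }
  });
  return { ok, failed };
}
```

All requests in this sketch start at once. The README's resultDetailsDelayMs option exists to help avoid detection and bans, so a real version would likely space the requests out.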

Filtering results

Hey!

I noticed that some category IDs are either broken or have changed (one example is id 214 -- supposed to be two-bedroom apartments but gives out listings of cars, etc.)

In any case, I'm having trouble with the filtering params. I'd appreciate it if you could point me in the right direction with them. For instance:

searchParams["attributeMap[numberbedrooms_s]"] = "[2]"

I can see that will help me set the attribute called numberbedrooms to 2 but I have no idea how the template works here. What's the _s for, why are we passing an array-like structure in the value? Also, how would this work with more than one value, ex. if I had to filter results to 2 or 3 bedrooms, or "at least 2" bedrooms, how would I do that?

I was trying to filter real-estate ads based on "offer" or "wanted" but that doesn't seem to be working. Here's how I've been trying to do it:

searchParams["attributeMap[type_s"] = "[OFFER]"

Based on this KijijiAd object:

{
    title: 'Grande Chambre',
    description: '...',
    date: 2019-07-29T13:45:54.000Z,
    image: 'https://...$_35.JPG',
    images: ['https://...$_57.JPG',
    ],
    attributes: {
         forrentbyhousing: 'ownr',
         unittype: 'apartment',
         numberbedrooms: 1,
         numberbathrooms: 10,
         petsallowed: 0,
         dateavailable: 2019-08-05T00:00:00.000Z,
         areainfeet: 625,
         yard: 1,
         balcony: 1,
         smokingpermitted: 2,
         elevator: 0,
         hydro: 0,
         heat: 0,
         water: 0,
         cabletv: 1,
         internet: 1,
         landline: 0,
        numberparkingspots: 0,
        furnished: 1,
        price: 600,
        location: {
            latitude: ...,
            longitude: ...,
            mapAddress: '...QC, Canada',
            province: 'quebec',
            mapRadius: 0
        },
        type: 'OFFER',
        visits: 139
    },
    url: 'https://www.kijiji.ca/...',
    scrape: [Function],
    isScraped: [Function]
}
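As a client-side fallback, results can be filtered locally on the attribute names visible in the object above (`type`, `numberbedrooms`). This sketch assumes the ads were fully scraped, so `attributes` is populated, and that `numberbedrooms` is numeric as shown. `filterOffers` and the sample ads are hypothetical, and the `WANTED` value is only illustrative.

```javascript
// Keep ads that are offers with at least `minBedrooms` bedrooms.
// Assumes each ad has a populated `attributes` object, as in the example above.
function filterOffers(ads, minBedrooms) {
  return ads.filter((ad) => {
    const attrs = ad.attributes || {};
    return attrs.type === "OFFER" && Number(attrs.numberbedrooms) >= minBedrooms;
  });
}

// Sample objects shaped like scraped ads (values are illustrative only).
const sampleAds = [
  { title: "Grande Chambre", attributes: { type: "OFFER", numberbedrooms: 1 } },
  { title: "2br apartment", attributes: { type: "OFFER", numberbedrooms: 2 } },
  { title: "Looking for 3br", attributes: { type: "WANTED", numberbedrooms: 3 } },
];

console.log(filterOffers(sampleAds, 2).map((ad) => ad.title)); // [ '2br apartment' ]
```

An "at least N" condition is easy to express locally like this, although it costs the bandwidth of fetching the unwanted ads first.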

Also, sorting by relevancy doesn't seem to be an option. If I sort the results by priceAsc, a lot of strange (irrelevant) results show up.

Any help would be appreciated!

Getting more results than expected

Hi there,

I really like the module! I have been using it to collect data on motorcycles. I ran into a situation where one of my searches returned duplicate results. Looking further, the array returned from search() contains 68 ads, but there are only 34 ads in the actual results. Below are the parameters I'm using. Is this normal behavior, or am I doing something weird? Thanks!

let options = {
    minResults: 40
};
let params = {
    locationId: 9003,
    categoryId: 30,
    sortByName: "priceAsc",
    keywords: 'GSXR750'
};
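Until the cause is found, duplicates can be removed client-side. This is a minimal sketch that assumes duplicates share the same `id`, which the README documents as the ad's unique identifier. It falls back to `url` for ads whose `id` is missing. `dedupeAds` and the sample data are hypothetical.

```javascript
// Remove duplicate ads, keeping the first occurrence of each.
// Keys on `ad.id` and falls back to `ad.url` when `id` is empty.
function dedupeAds(ads) {
  const seen = new Set();
  return ads.filter((ad) => {
    const key = ad.id || ad.url;
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}

// Illustrative input: the same two ads returned twice.
const results = [
  { id: "1", url: "https://www.kijiji.ca/v-a/1" },
  { id: "2", url: "https://www.kijiji.ca/v-b/2" },
  { id: "1", url: "https://www.kijiji.ca/v-a/1" },
  { id: "2", url: "https://www.kijiji.ca/v-b/2" },
];

console.log(dedupeAds(results).length); // 2
```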

"npm install" fails

Is there an issue building this? I have not used npm before. I've tried npm rebuild followed by npm install.

$ npm install

[email protected] prepare D:\data\Dropbox\work\kijiji\kijiji-scraper
npm run build

[email protected] build D:\data\Dropbox\work\kijiji\kijiji-scraper
tsc

lib/backends/api-scraper.ts:16:24 - error TS2339: Property 'attribs' does not exist on type 'Element'.
Property 'attribs' does not exist on type 'TextElement'.

16 const type = (item.attribs.type || "").toLowerCase();
~~~~~~~

lib/backends/api-scraper.ts:17:34 - error TS2339: Property 'attribs' does not exist on type 'Element'.
Property 'attribs' does not exist on type 'TextElement'.

17 const localizedLabel = (item.attribs["localized-label"] || "").toLowerCase();
~~~~~~~

lib/backends/api-scraper.ts:60:26 - error TS2339: Property 'attribs' does not exist on type 'Element'.
Property 'attribs' does not exist on type 'TextElement'.

60 const url = item.attribs.href;
~~~~~~~

lib/backends/api-scraper.ts:69:27 - error TS2339: Property 'attribs' does not exist on type 'Element'.
Property 'attribs' does not exist on type 'TextElement'.

69 const name = item.attribs.name;
~~~~~~~

Found 4 errors.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] build: tsc
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\Peter-laptop\AppData\Roaming\npm-cache_logs\2020-12-11T02_19_18_268Z-debug.log
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] prepare: npm run build
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] prepare script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\Peter-laptop\AppData\Roaming\npm-cache_logs\2020-12-11T02_19_18_322Z-debug.log

search() not working as expected

When I call search() using the parameters you provided, I get an error saying "Kijiji failed to redirect to search results". Looking at the code, it only returns that message when the response is not a redirect (status code != 301).

However, looking at the response, the status code received was a 200.


Invalid Kijiji HTML on search results page.

I ran my script successfully for 24+ hours, then came home to this error.

Error: Invalid Kijiji HTML on search results page
    at /home/loze/node_modules/kijiji-scraper/lib/search.js:129:19
    at process._tickCallback (internal/process/next_tick.js:68:7)

I'm not sure why it ran flawlessly for so long before returning this error. Any clue?
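Intermittent failures like this are often transient. One generic mitigation is to retry the search a few times with an increasing delay, though it is an assumption, not a confirmed fix. The sketch accepts any promise-returning function, so it would wrap a call such as `() => kijiji.search(params, options)`. `withRetry`, `attempts` and `baseDelayMs` are hypothetical names.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn` up to `attempts` times, doubling the delay after each failure.
// Rethrows the last error if every attempt fails.
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await sleep(baseDelayMs * 2 ** i);
      }
    }
  }
  throw lastError;
}
```

Inside an async function, usage would look like `const ads = await withRetry(() => kijiji.search(params, options));`. If the errors come from Kijiji changing its markup rather than from transient failures, retrying will not help.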

Question!

Hey! How do you find location IDs and category IDs? Thanks!!
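The module exposes `kijiji.locations` and `kijiji.categories`, used as `kij.locations.ONTARIO.TORONTO_GTA` in the bicycle script further down this page. One way to browse the available IDs is to flatten those maps into name/ID pairs. This sketch assumes leaves are plain numbers and container objects may carry their own `id`; check the actual shape in your installed version. `listIds` and the sample map, including its names and IDs, are hypothetical.

```javascript
// Flatten a nested ID map into ["PATH.TO.NODE", numericId] pairs.
// Assumes leaves are numbers and container objects may carry their own `id`.
function listIds(node, path = [], out = []) {
  if (typeof node === "number") {
    out.push([path.join("."), node]);
    return out;
  }
  if (node && typeof node === "object") {
    if (typeof node.id === "number" && path.length > 0) {
      out.push([path.join("."), node.id]);
    }
    for (const [key, child] of Object.entries(node)) {
      if (key !== "id") {
        listIds(child, [...path, key], out);
      }
    }
  }
  return out;
}

// Hypothetical map with placeholder IDs; with the module installed you would
// pass kijiji.locations or kijiji.categories instead.
const sample = {
  id: 0,
  PROVINCE_A: { id: 10, CITY_B: 11, CITY_C: 12 },
};

for (const [name, id] of listIds(sample)) {
  console.log(`${name}: ${id}`);
}
```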

Category Attributes

Many of the category attributes (at least for real estate listings) are returned as numerical values rather than the text shown on the website. For some attributes this is simply a Yes/No answer translated to 1 or 0, but attributes with more than two options can return values of 0, 1, 2, etc. In #55 I believe you referred to these as internal values. This was never an issue with the categories I've scraped in the past, but real estate listings have a lot of attributes. Have you ever done any work to translate these internal values into their human-readable counterparts? I wanted to check before doing it by hand myself.
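A hand-maintained translation can be kept as a lookup table from internal value to label. This sketch follows the Yes/No = 1/0 convention described above. The tables themselves are hypothetical and would need to be filled in and checked against the website, especially for multi-option attributes such as `smokingpermitted`.

```javascript
// Hypothetical lookup tables: internal value -> label. Only the 1/0 = Yes/No
// convention comes from the issue; extend and verify these by hand.
const ATTRIBUTE_LABELS = {
  yard: { 0: "No", 1: "Yes" },
  balcony: { 0: "No", 1: "Yes" },
};

// Return a copy of `attributes` with known internal values replaced by labels.
// Attributes without a table, or values missing from a table, are left as-is.
function humanizeAttributes(attributes, labels = ATTRIBUTE_LABELS) {
  const result = {};
  for (const [name, value] of Object.entries(attributes)) {
    const table = labels[name];
    result[name] = table && value in table ? table[value] : value;
  }
  return result;
}

console.log(humanizeAttributes({ yard: 1, balcony: 0, smokingpermitted: 2 }));
// { yard: 'Yes', balcony: 'No', smokingpermitted: 2 }
```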

kijiji.search() Randomly Returns an Amount of Ads Under Specified Amount via minResults

I'm running into an issue where, with minResults set to a value like 40, the number of ads returned is sometimes 40 and sometimes 20. If I set minResults to 120, the number of ads returned can be anything between 20 and 120, and it appears completely random in my testing. I've included a video below with reproduction steps using version 6.1.0:

This behavior is new, so it might have been introduced a few versions ago. Also, I apologize for the bad quality, but I couldn't be bothered to find out why OBS is making my screen look blurry. The breakpoints are used to rule out my own bad programming as a cause and to verify that the values are what they should be.

If you want me to write a code sample for you, I could do that.

Issues with search for services?

I'm trying to search for services as opposed to buy and sell, but the results are empty.
I've tried different category IDs, as well as typing out the name manually.
For example, "categoryId: 84" and "categoryId: SERVICES.WEDDING" both come up empty.

Appreciate any help on this!

Vulnerability in Lodash

There is a prototype pollution vulnerability in the version of lodash that your version of cheerio uses. Please update cheerio.

Impossible to fetch ads

Hi,

I've started using the library; thanks for the work you've done.
I get an error when I try to get an ad or use the search function.
Did I misunderstand or forget something?


Have a nice day

Migration to TypeScript

Hi! Would you like me to work on migrating the code to TypeScript? I think it would improve the interface for users and make it easier to add tests and extend the project in the future. It will be substantial work, so I wanted to check with you before working on it and opening a PR.

Random Error: Invalid Kijiji HTML on search results page

Hi! I'm scraping Kijiji on a recurring basis, every 10 minutes. Sometimes it works, but sometimes it throws the error in the title.
The search params are the same on every run;
the keyword used was Harley Davidson.

const kijiji = require("kijiji-scraper");

// Wrapper name and arguments are assumed; the original excerpt starts mid-function.
async function parseKijiji(province, keyword) {
  console.log("Kijiji parse");
  // Declared outside the try block so the catch block can still return it.
  let cleanList = [];
  try {
    const options = {
      minResults: 5000,
    };
    const params = {
      locationId: province.id,
      categoryId: 30,
      keywords: keyword,
    };
    const kijijiRes = await kijiji.search(params, options);
    // Keep only the fields we need from each ad.
    const listings = kijijiRes.map((ad) => ({
      price: ad.attributes.price,
      listingName: ad.title,
      listingDescription: ad.description,
      url: ad.url,
      imgUrl: ad.image,
    }));
    // Drop listings with a price of 0.
    cleanList = listings.filter((el) => el.price != 0);
    console.log("End parse");
    return cleanList;
  } catch (err) {
    console.log(err);
    return cleanList;
  }
}

Search() with URL

A useful feature would be letting search() take a query URL instead of individual parameters, for convenience. This would also help with Kijiji's relative location filter, which takes your location and applies a search radius.
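Until something like this exists, a URL could be converted to a params object in user code. This sketch assumes the last path segment of a Kijiji browse URL starts with `c<categoryId>l<locationId>`. Keyword searches and other URL shapes are not handled. Query-string entries such as `ll` are copied through unchanged, and whether `search()` honours any particular one is not verified. `paramsFromSearchUrl` is a hypothetical helper, and the URL slug is made up; only the IDs come from this page.

```javascript
// Hypothetical helper: derive search params from a Kijiji browse URL.
// Assumes the last path segment starts with "c<categoryId>l<locationId>".
function paramsFromSearchUrl(searchUrl) {
  const url = new URL(searchUrl);
  const segments = url.pathname.split("/").filter(Boolean);
  const last = segments[segments.length - 1] || "";
  const match = /^c(\d+)l(\d+)/.exec(last);
  if (!match) {
    throw new Error(`Unrecognized Kijiji search URL: ${searchUrl}`);
  }
  const params = {
    categoryId: Number(match[1]),
    locationId: Number(match[2]),
  };
  // Copy query-string entries (e.g. ll) through unchanged.
  for (const [key, value] of url.searchParams) {
    params[key] = value;
  }
  return params;
}

console.log(paramsFromSearchUrl(
  "https://www.kijiji.ca/b-some-category/some-city/c42l1700273?ll=43.6629%2C-79.3957"
));
```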

Package broken: Kijiji-scraper returns no results

When running the kijiji-scraper library with these params and options:

{
  locationId: 1700273,
  categoryId: 42,
  sortByName: 'dateDesc',
  ll: '43.6629,-79.3957',
  distance: 3
}
{ minResults: 3 }

let listings = await kijiji.search(params, options);

No results are returned. We see the same issue when trying various values for locationId, ll and distance.

We suspect that Kijiji could be preventing these requests from going through. Can you investigate this?

Kijiji.Ad.Get() Doesn't return to callback

const kijiji = require("kijiji-scraper");

// Random ad URL I got off Kijiji
const url = "https://www.kijiji.ca/v-cell-phone/calgary/iphone-x-64gb/1553098375";

// Scrape using the optional callback parameter
kijiji.Ad.Get(url, function(err, ad) {
    if (!err) {
        // Use the ad object
        console.log(ad.title);
    }
});

This code sample was taken straight from the documentation, and the callback is never executed. You can confirm this by putting a breakpoint anywhere inside it: it never triggers. Tested with version 6.10 of the module.

As a side note, some ad URLs may end with ?undefined, which makes the module return this error:
Error: Invalid Kijiji ad URL. Ad URLs must end in /some-ad-id.

However, the promise-returning version of this code works fine, so this isn't a priority.
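For the `?undefined` case, one workaround is to strip the query string and fragment before calling `kijiji.Ad.Get()`. This assumes the module rejects such URLs only because of the trailing query string. `cleanAdUrl` is a hypothetical helper built on Node's built-in `URL` class.

```javascript
// Strip the query string and fragment (e.g. a stray "?undefined") from an
// ad URL so that its path ends in the numeric ad ID.
function cleanAdUrl(adUrl) {
  const url = new URL(adUrl);
  url.search = "";
  url.hash = "";
  return url.toString();
}

const raw = "https://www.kijiji.ca/v-cell-phone/calgary/iphone-x-64gb/1553098375?undefined";
console.log(cleanAdUrl(raw));
// https://www.kijiji.ca/v-cell-phone/calgary/iphone-x-64gb/1553098375
```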

Create helper types for enums, and expose expected response attributes

Better attribute handling would make this library easier to use and avoid the need to guess/experiment as much when using it. Additionally, the meaning of some attribute values is not obvious (see #65).

This issue tracks two related enhancements:

  1. Some ad attributes are enums. Currently, kijiji-scraper exposes the internal value. Create helper types (similar to those that exist for location and category IDs) for exposing all possible enums, their entries, and the corresponding internal values.
  2. The possible attributes an ad can have depend on its category. Provide a way to determine all possible attributes that may be returned when scraping/searching (ideally at compile time via TypeScript, with a run-time API for JavaScript users).
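Until such types exist, both items can be approximated at run time by recording every attribute name and every distinct value seen across a sample of scraped ads. This sketch assumes each ad has an `attributes` object, as in the ad properties table. The result only reflects the ads sampled, not every attribute or value Kijiji can return. `summarizeAttributes` and the sample ads are hypothetical.

```javascript
// Map each attribute name to the set of distinct values observed for it.
function summarizeAttributes(ads) {
  const summary = new Map();
  for (const ad of ads) {
    for (const [name, value] of Object.entries(ad.attributes || {})) {
      if (!summary.has(name)) {
        summary.set(name, new Set());
      }
      summary.get(name).add(value);
    }
  }
  return summary;
}

// Illustrative ads from a single category.
const ads = [
  { attributes: { unittype: "apartment", numberbedrooms: 1, furnished: 1 } },
  { attributes: { unittype: "condo", numberbedrooms: 2 } },
];

for (const [name, values] of summarizeAttributes(ads)) {
  console.log(name, [...values]);
}
```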

Recommended proxy rotation to avoid getting blacklisted?

I know this is a bit out of scope, but I thought I'd ask in case you're already aware of something that works well for this purpose.

Basically, I'm going to be doing a lot of queries from a single server with a static IP and want to avoid getting blacklisted. Can you recommend a service and a package (Node-friendly, I imagine) that would wrap your kijiji-scraper well and easily?

Thanks!!

Not Parsing Ads as of July 20, 2017

It looks like ad-scraper.js stopped working today, presumably due to a change in the ad format.

ad-scraper.js is definitely getting the correct URL for the ad, but the information it is returning (price, image, etc.) is undefined.

Scraper fails to return search results (Kijiji scraper detection + blocking)

I have a script scraping bicycle ads that I've been using for a few months. It had been working fine until today. Here's the script:

const kij = require("kijiji-scraper")

const bikes = [
   "Marin Hawk Hill",
   "Giant Stance",
   "Vitus Mythique",
   "Canyon Neuron",
   "Calibre Bossnut",
   "Giant Fathom",
   "Salsa Rangefinder",
   "Nukeproof Scout",
   "Kona Mahuna",
   "Nishiki Colorado",
   "Specialized Pitch",
]

const options = {
   minResults: 40
}

const params = {
   locationId: kij.locations.ONTARIO.TORONTO_GTA,
   categoryId: kij.categories.BUY_AND_SELL.BIKES.MOUNTAIN,
   adType: "OFFER"
}

console.log('--------------------------------------------------------------------')
bikes.forEach((bike, index) => {
   params.keywords = bike
   kij.search(params, options).then((ads) => {
      for (let i = 0; i < ads.length; i++) {
         const frame = ads[i].attributes.framesize
         if (frame)
            console.log(`Frame: ${ads[i].attributes.framesize.toUpperCase()}`)
         else
            console.log('Frame: Unknown')
         console.log(`Price: \$${ads[i].attributes.price}`)
         console.log(`Title: ${ads[i].title}`)
         console.log(`URL: ${ads[i].url}`)
         console.log('--------------------------------------------------------------------')
      }
   }).catch(console.error)
})

I get the following output now when I run the script:

Error: Kijiji failed to return search results. It is possible that Kijiji changed their results markup. If you believe this to be the case, please open a bug at: https://github.com/mwpenny/kijiji-scraper/issues
    at /home/abbas/Documents/scripts/kijiji/node_modules/kijiji-scraper/lib/search.js:105:24
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

For some odd reason, this also blocks my access to the Kijiji website. When I try navigating to kijiji.ca in my browser, it returns the following page:

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>CJDTCGAH4X5HAW9W</RequestId>
<HostId>
6QIALNdZ8MKReyXvQraQQl30II/Nm88OrSES3l2hsGgH5TQhnOv7UiNEbkjdt3AdeSa51QCa+CA=
</HostId>
</Error>

Once I clear my cache and cookies, I'm able to access the website again.
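The script above starts every search at once, because `forEach` does not wait for the promises it creates. Spacing requests out is a common mitigation, although it is only an assumption that burst traffic triggered the block. This sketch runs searches one at a time with a pause between them. `searchFn` stands in for `kij.search`, and `runSequentially` and `delayMs` are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `searchFn(keyword)` for each keyword in turn, pausing `delayMs`
// between requests. Failures are logged and recorded instead of aborting.
async function runSequentially(keywords, searchFn, delayMs = 5000) {
  const results = {};
  for (const [i, keyword] of keywords.entries()) {
    if (i > 0) {
      await sleep(delayMs);
    }
    try {
      results[keyword] = await searchFn(keyword);
    } catch (err) {
      console.error(`Search for "${keyword}" failed:`, err.message);
      results[keyword] = [];
    }
  }
  return results;
}
```

With the script above, the call would look like `runSequentially(bikes, (bike) => kij.search({ ...params, keywords: bike }, options))`. Passing a fresh object per call also avoids every search sharing one mutated `params` object.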
