Google Shopping Scraper

Google Shopping Scraper is an Apify actor for extracting data from the Google Shopping website, in any country domain. It scrapes the first results page and the details of each product and its sellers. It is built on top of the Apify SDK, and you can run it both on the Apify platform and locally.

Input

Field | Type | Description
----- | ---- | -----------
queries | Array of Strings | (Required if you don't use inputUrl) List of queries to search for.
inputUrl | Array of Strings | (Required if you don't use queries) List of search URLs to scrape.
countryCode | String | (Required) The country to search in. Choose from the country list when using the editor, or provide the country code when using JSON.
maxPostCount | Integer | Maximum number of results to scrape per page; 0 means no limit. Currently the actor scrapes only the first page (20 results).
isAdvancedResults | Boolean | Enable this to scrape more data. Your dataset items will have more fields, including merchantName and reviews.
extendOutputFunction | String | Function that takes a jQuery handle ($) as an argument and returns data that will be merged with the default output. See the Extend output function section for details.

INPUT Example:

{
  "queries": [
    "iphone 11 pro"
  ],
  "countryCode": "US",
  "maxPostCount": 10,
  "isAdvancedResults": true
}
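The rules in the Input table can be checked before a run is started. The following is a minimal sketch of such a check; `validateInput` is a hypothetical helper written for illustration and is not part of the actor, which may validate its input differently.

```javascript
// Hypothetical helper (not part of the actor): checks an input object
// against the rules described in the Input table and returns a list of problems.
function validateInput(input) {
    const errors = [];
    const isNonEmptyStringArray = (value) => Array.isArray(value)
        && value.length > 0
        && value.every((item) => typeof item === 'string' && item.trim() !== '');

    // Either "queries" or "inputUrl" has to be provided.
    if (!isNonEmptyStringArray(input.queries) && !isNonEmptyStringArray(input.inputUrl)) {
        errors.push('Provide either "queries" or "inputUrl" as a non-empty array of strings.');
    }
    // "countryCode" is always required.
    if (typeof input.countryCode !== 'string' || input.countryCode.trim() === '') {
        errors.push('"countryCode" is required and must be a string, for example "US".');
    }
    // "maxPostCount" is optional; 0 means no limit.
    if (input.maxPostCount !== undefined
        && (!Number.isInteger(input.maxPostCount) || input.maxPostCount < 0)) {
        errors.push('"maxPostCount" must be a non-negative integer (0 means no limit).');
    }
    // "isAdvancedResults" is optional.
    if (input.isAdvancedResults !== undefined && typeof input.isAdvancedResults !== 'boolean') {
        errors.push('"isAdvancedResults" must be a boolean.');
    }
    return errors;
}

const problems = validateInput({
    queries: ['iphone 11 pro'],
    countryCode: 'US',
    maxPostCount: 10,
    isAdvancedResults: true,
});
console.log(problems.length === 0 ? 'Input looks valid' : problems.join('\n'));
```

An empty array means the input satisfies the documented rules; each string in a non-empty array describes one problem.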

Output

Output is stored in a dataset. Example of one output item:

{
  "query": "iphone 11 pro",
  "productName": "Apple iPhone 11 Pro - 64 GB - Space Gray - Unlocked - CDMA/GSM",
  "productLink": "http://www.google.com/shopping/product/7412086993790421270?q=iphone+11+pro&hl=en&gl=us&uule=w+CAIQICINVW5pdGVkIFN0YXRlcw&prds=epd:12986884032099345386,prmr:1&sa=X&ved=0ahUKEwiVxNfskdTqAhVmTRUIHZZBByMQ8gII0QQ",
  "price": "$999.00",
  "description": "5,453 product reviews",
  "merchantName": "Apple",
  "merchantMetrics": "93% positive seller rating",
  "merchantLink": "http://www.google.com/aclk?sa=L&ai=DChcSEwiRydvskdTqAhWF7u0KHbFVAVsYABAEGgJkZw&sig=AOD64_3XQE0ANMdXdV-A13_3UoAq7QojIA&ctype=5&q=&ved=0ahUKEwiVxNfskdTqAhVmTRUIHZZBByMQg-UECNME&adurl=",
  "shoppingId": "7412086993790421270",
  "reviewsLink": "http://www.google.com/shopping/product/7412086993790421270?q=iphone+11+pro&hl=en&gl=us&uule=w+CAIQICINVW5pdGVkIFN0YXRlcw&prds=epd:12986884032099345386,prmr:1&sa=X&ved=0ahUKEwiVxNfskdTqAhVmTRUIHZZBByMQ9AII1gQ#reviews",
  "reviewsScore": "4.7 out of 5 stars",
  "reviewsCount": "5,453 product reviews",
  "positionOnSearchPage": 2,
  "productDetails": null
}

Note about price format

  • Different countries have different price formats. Currently the actor leaves the price in the format found on the page.
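Because the price is left as a string, any numeric handling has to happen after the run. The following is a rough sketch of one way to do that; `parsePrice` is a hypothetical helper, not something the actor provides, and its heuristic (the last `.` or `,` followed by one or two digits is the decimal separator) is an assumption that will not fit every locale.

```javascript
// Hypothetical post-processing helper (not part of the actor).
// Assumption: the last "." or "," followed by 1-2 digits is the decimal
// separator; every other "." or "," is a thousands separator.
function parsePrice(raw) {
    // Keep only digits and separators, dropping currency symbols and spaces.
    const cleaned = String(raw).replace(/[^\d.,]/g, '');
    if (!/\d/.test(cleaned)) return null;

    const lastSep = Math.max(cleaned.lastIndexOf('.'), cleaned.lastIndexOf(','));
    let intPart = cleaned;
    let fracPart = '';
    if (lastSep !== -1) {
        const tail = cleaned.slice(lastSep + 1);
        if (tail.length === 1 || tail.length === 2) {
            intPart = cleaned.slice(0, lastSep);
            fracPart = tail;
        }
    }
    const digits = intPart.replace(/[.,]/g, '');
    return Number(`${digits || '0'}.${fracPart || '0'}`);
}

console.log(parsePrice('$999.00'));
console.log(parsePrice('1.299,00 €'));
```

The function returns `null` when the string contains no digits, so items without a recognizable price can be filtered out separately.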

Note about the results

  • Google results are affected by your browsing history, so the results from the scraper might differ from the results in your browser.

Google SERP

The actor uses Google SERP Proxy to scrape localized results. For more information, check the documentation.

Extend output function

You can use this function to update the default output of this actor. The function receives a jQuery handle $ as an argument, so you can choose what data from the page you want to scrape. The output of this function is merged with the default output.

The return value of this function has to be an object!

You can return fields to achieve 3 different things:

  • Add a new field - Return object with a field that is not in the default output
  • Change a field - Return an existing field with a new value
  • Remove a field - Return an existing field with the value undefined

The following example will add a new field:

($) => {
    return {
        comment: 'This is a comment',
    }
}
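To make the three cases concrete, here is a sketch of how the merge could behave. `mergeExtendedOutput` is a hypothetical stand-in written for illustration; the actor's real merge logic is not shown in this README and may differ in detail.

```javascript
// Hypothetical illustration of the merge described above (not the actor's code).
function mergeExtendedOutput(defaultItem, extra) {
    // The return value of the extend output function has to be an object.
    if (extra === null || typeof extra !== 'object' || Array.isArray(extra)) {
        throw new Error('extendOutputFunction must return an object');
    }
    // Returned fields add to or override the default fields.
    const merged = { ...defaultItem, ...extra };
    // Fields explicitly set to undefined are removed from the result.
    for (const key of Object.keys(merged)) {
        if (merged[key] === undefined) delete merged[key];
    }
    return merged;
}

// An extend output function that uses all three options at once.
// The $ handle is unused here because the values are hard-coded.
const extendOutputFunction = ($) => {
    return {
        comment: 'This is a comment', // add a new field
        price: '$949.00',             // change an existing field
        description: undefined,       // remove an existing field
    };
};

const defaultItem = {
    productName: 'Apple iPhone 11 Pro',
    price: '$999.00',
    description: '5,453 product reviews',
};
console.log(mergeExtendedOutput(defaultItem, extendOutputFunction(null)));
```

In this sketch the result keeps `productName`, carries the new `price` and `comment`, and no longer has a `description` field.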

Expected CU consumption

Expected consumption is about 0.394 compute units per 10 products.
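For rough budgeting, the figure above can be scaled to other run sizes. The sketch below assumes consumption grows linearly with the number of products, which the README does not state; `estimateComputeUnits` is a hypothetical helper, and real usage will vary with page load times and the isAdvancedResults setting.

```javascript
// Rate taken from the README: 0.394 compute units per 10 products.
const CU_PER_10_PRODUCTS = 0.394;

// Hypothetical estimator; assumes consumption scales linearly with product count.
function estimateComputeUnits(productCount) {
    if (!Number.isFinite(productCount) || productCount < 0) {
        throw new RangeError('productCount must be a non-negative number');
    }
    return (productCount / 10) * CU_PER_10_PRODUCTS;
}

// Estimate for a run that scrapes 250 products in total.
console.log(estimateComputeUnits(250).toFixed(2));
```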

Open an issue

If you find a bug, please create an issue on the actor's GitHub page.
