pevers / images-scraper

225.0 225.0 70.0 1.05 MB

Simple and fast scraper for Google

License: ISC License

JavaScript 100.00%

google images nodejs puppeteer scraper

images-scraper's Issues

Advanced option 'color' does not work

Changing 'color' in the advanced options has no effect. With 'show' set to true, you can see that the search options do not change.

Great tool by the way!

NPM Test is failing


  4 passing (21s)
  1 failing

  1) Google Tests
       should return the correct length with pagination:

      AssertionError: expected 0 to equal 300
      + expected - actual

      -0
      +300

      at Context.<anonymous> (test/google.js:31:31)

SyntaxError: Unexpected token ILLEGAL

I grabbed the code from github and ran the example.js but got an error. I'm running on ubuntu and installed node through apt-get. This was the output I saw from node example.js


/mnt/ssd/more_scrape/node_modules/nightmare/lib/nightmare.js:94
    debug(`WARNING:  load timeout of ${options.loadTimeout} is shorter than go
          ^
SyntaxError: Unexpected token ILLEGAL
    at Module._compile (module.js:439:25)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/mnt/ssd/more_scrape/node_modules/images-scraper/lib/google-images-scraper.js:6:17)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/mnt/ssd/more_scrape/node_modules/images-scraper/index.js:1:87)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)

Do I have to do anything special to make this work?
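
A note on the trace above: the parser stops at a backtick, which is the start of a template literal, and template literals are only understood by Node.js 4 and later. The module.js frames look like a much older runtime, which distribution packages installed through apt-get often were at the time. A minimal sketch of a version guard follows; supportsTemplateLiterals and MIN_MAJOR are hypothetical names, not part of images-scraper, and the guard is written in ES5 so it still parses on an old runtime.

```javascript
// Hypothetical guard, not part of images-scraper.
var MIN_MAJOR = 4; // template literals are available from Node.js 4 onward

function supportsTemplateLiterals(version) {
  // version looks like "0.10.25" or "12.16.1"
  var major = parseInt(String(version).split('.')[0], 10);
  return major >= MIN_MAJOR;
}

if (!supportsTemplateLiterals(process.versions.node)) {
  console.error('Node.js ' + process.versions.node +
    ' is too old; upgrade before requiring images-scraper.');
}
```

If the guard prints the message, upgrading Node.js (for example through a version manager rather than the distribution package) is the likely fix.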

Larger Images

Hi.
When I try to get larger images, I only get the URL of the thumbnail.
It seems that resolution: 'l' has no effect, or am I doing something wrong?
Greetz

Google and Picsearch "result" event never gets triggered

Used the exact code on the readme. Works for bing and yahoo.

Electron failed to install correctly, please delete node_modules/electron and try installing again

When I add var Scraper = require('images-scraper') and run the server, an Electron error appears. I tried to reinstall this module, but it doesn't work. Is this module compatible with Electron?

Not Exiting

For some reason it's not calling end() on the nightmare portion of the script and hanging.

Headless Navigator

Can I use a headless browser? If yes, which one?

Possible to save/download images scraped?

Is it possible to save the image that results from the search? Alternatively to couple your output with the "request" node package, which could download the image based on the URL, how would one pass/return only the url of the output your application provides?

Apologies if the "issues" section isn't the right place for what is essentially a feature request, I realise it's not your job to teach me how to code :) but just thought you guys would be the best to ask. Thanks for your work!
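
One way to approach the question above, using only Node.js built-ins instead of the "request" package. This is a sketch under assumptions: it assumes each scraped result is an object with a url string (check the actual output of your version), and extractUrls, fileNameFor and download are hypothetical helper names.

```javascript
const fs = require('fs');
const https = require('https');
const path = require('path');

// Assumption: each result is an object with a `url` string.
function extractUrls(results) {
  return results
    .map((item) => item && item.url)
    .filter((url) => typeof url === 'string');
}

// Derive a local file name from the URL path; fall back to an index-based name.
function fileNameFor(url, index) {
  const base = path.basename(new URL(url).pathname);
  return base || `image-${index}`;
}

// Download one https:// URL to `dest`; resolves with `dest` once the file is written.
function download(url, dest) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body
        reject(new Error(`Unexpected status ${res.statusCode} for ${url}`));
        return;
      }
      const file = fs.createWriteStream(dest);
      res.pipe(file);
      file.on('finish', () => file.close(() => resolve(dest)));
      file.on('error', reject);
    }).on('error', reject);
  });
}
```

A caller would map extractUrls(results) over download(url, fileNameFor(url, i)). Redirects and plain http:// URLs are not handled in this sketch.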

bing vs yahoo

Is it just me, or are both of these returning the same links? Searching for dog or trump returns the same images.

Repl.it: failed to install images-scraper

Hi, I tried to install your package in a new project on Repl.it and it doesn't work. Can you please fix it? :)

Add TypeScript definitions

Add TypeScript definitions so that it is more usable in TypeScript.

Bing Scrape Not Working

I tried using the scraper today for bing but it could only get around 20 results. Working for yahoo.

Google and Bing scraper's "end early" for big numbers of images

I watch in Nightmare and it seems they don't even finish scrolling (when run for 1000 items) and just terminate.

I want to get 1000 pictures with them!!!

Yahoo works fine etc.

Error on heroku

When I use the images-scraper API on Heroku, I get this error:
Error: Failed to launch the browser process!

No output with Google Example.

I'm getting absolutely no output with the Google example. Bing works great.

how to avoid nsfw content

I want to turn off NSFW content so that it doesn't pop up on Discord. How can I do that?

Unable to Install

I ran this command: npm install images-scraper --save
and got this error in the console:

> [email protected] postinstall /Users/kashifeqbal/Development/accio.brands.version.1/node_modules/electron
> node install.js

Downloading electron-v1.8.4-darwin-x64.zip
[============================================>] 100.0% of 48.34 MB (917.1 kB/s)
/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/electron/install.js:47
  throw err
  ^

Error: Generated checksum for "electron-v1.8.4-darwin-x64.zip" did not match expected checksum.
    at ChecksumMismatchError.Error (native)
    at ChecksumMismatchError.ErrorWithFilename (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:41:124)
    at new ChecksumMismatchError (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:56:133)
    at Hash.<anonymous> (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:203:22)
    at emitNone (events.js:86:13)
    at Hash.emit (events.js:185:7)
    at emitReadable_ (_stream_readable.js:432:10)
    at emitReadable (_stream_readable.js:426:7)
    at readableAddChunk (_stream_readable.js:187:13)
    at Hash.Readable.push (_stream_readable.js:134:10)
⸨                ░░⸩ ⠏ postinstall: info lifecycle [email protected]~postinstall: Failed to exec postinstall script




[email protected] /Users/kashifeqbal/Development/accio.brands.version.1
└── UNMET PEER DEPENDENCY [email protected]

npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/expand-brackets/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/express/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-connector/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-phase/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-swagger/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/send/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/snapdragon/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/stream-parser/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/strong-remoting/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/x-ray-crawler/package.json'
npm WARN [email protected] requires a peer of chai@>= 2.1.2 < 4 but none was installed.
npm ERR! Darwin 17.4.0
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "images-scraper" "--save"
npm ERR! node v6.11.4
npm ERR! npm  v3.10.10
npm ERR! code ELIFECYCLE

npm ERR! [email protected] postinstall: `node install.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] postinstall script 'node install.js'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the electron package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node install.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs electron
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls electron
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /Users/kashifeqbal/Development/accio.brands.version.1/npm-debug.log
npm ERR! code 1

Scraper hangs if page takes too long to render

Running the latest version of node.js (12), images-scraper, on OS X Mojave.

When running the code, sometimes the page takes too long to render - so when it is instructed to scroll there are no scrollbars and it cannot reach the end of the page. I've "fixed" this by adding a one-second delay:

await page.setUserAgent(self.userAgent);
await page.waitFor(1000); // Add one second delay to ensure scrollbars are present

I'm not all that familiar with puppeteer, so I'm not sure if this works the way I think it does, or whether there is a better way to do this. However, I can now reliably run it without it stopping (100 test runs so far).
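
On the "better way" question: instead of a fixed delay, puppeteer can wait for a condition with page.waitForFunction. A minimal sketch, assuming that "scrollbars are present" can be approximated by the document being taller than the viewport; waitForScrollable is a hypothetical helper, not part of images-scraper.

```javascript
// Hypothetical helper: wait until the document is taller than the viewport,
// i.e. until there is something to scroll, instead of sleeping a fixed time.
async function waitForScrollable(page, timeoutMs = 5000) {
  await page.waitForFunction(
    // Runs inside the page, so `document` and `window` are the page's globals.
    () => document.documentElement.scrollHeight > window.innerHeight,
    { timeout: timeoutMs }
  );
}
```

It would be called as await waitForScrollable(page) in place of the waitFor(1000) line. The call rejects if the condition is not met within the timeout, so the caller can decide whether to retry or give up.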

update package versions

Please upgrade the following, it should remedy the following issues

Issue adding options to constructor

Trying to add tbs search options to the constructor returns ReferenceError (fc is not defined)

const google = new Scraper({
  puppeteer: {
    headless: false,
  },
  tbs: {
    sur: fc,
  },
});

ReferenceError: fc is not defined
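
The ReferenceError comes from JavaScript itself rather than from the scraper: a bare fc is read as a variable name, and no such variable exists. Quoting the value makes it a string. A small sketch of the options object only:

```javascript
// Options object only; pass it to `new Scraper(options)` as in the snippet above.
const options = {
  puppeteer: {
    headless: false,
  },
  tbs: {
    sur: 'fc', // a string literal, not the bare identifier fc
  },
};
```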

Can't install images-scraper on rep.lit

bypassable safe mode by final user

Using "/search?q=${searchQuery}${this.safe}" gives the window for the final user to add "&safe=off" to the searchQuery disabling the google safe search feature, given that the google engine looks for the first defined criteria and ignores subsequent definitions of the same criteria.

I have tried to use "safe=active" before "search?q=" without success as it return a 404 error page or to the main google search page.

If anyone finds a good and reasonable clue of how to avoid this behavior, please reply to this issue.
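
One possible direction, offered as a sketch rather than as the library's behaviour: percent-encode the user's text before it goes into the URL, so that "&" and "=" typed by the user stay inside the q value and cannot add a second parameter. buildSearchPath is a hypothetical helper, and whether it matches how images-scraper assembles its URL is an assumption.

```javascript
// Hypothetical helper: build the query string so that user input cannot add parameters.
function buildSearchPath(userQuery, safeSearch) {
  const params = new URLSearchParams();
  params.set('q', userQuery); // "&" and "=" in userQuery are percent-encoded
  if (safeSearch) {
    params.set('safe', 'active');
  }
  return `/search?${params.toString()}`;
}

console.log(buildSearchPath('cats&safe=off', true));
// prints "/search?q=cats%26safe%3Doff&safe=active"
```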

returns an empty string for description

I set 300 but scraper is only getting 200 images.

var gi = new GIScraper({ puppeteer: { headless: false } });
let results = await gi.scrape("melony pokemon", 300);
console.log(`${results.length} results`);

It's always 200.

Unable to install

Hi,
I am requiring it in my API and below is the code:
var Scraper = require('images-scraper')
  , google = new Scraper.Google();

google.list({
    keyword: 'banana',
    num: 10,
    detail: true,
    nightmare: {
        show: true
    }
})
.then(function (res) {
    console.log('first 10 results from google', res);
}).catch(function(err) {
    console.log('err', err);
});

// you can also watch on events
google.on('result', function (item) {
    console.log('out', item);
});

but it is throwing me an error when I hit "npm install images-scraper"
Error: socket hang up
at TLSSocket.onHangUp (_tls_wrap.js:1124:19)
at TLSSocket.g (events.js:292:16)
at emitNone (events.js:91:20)
at TLSSocket.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:80:11)
at process._tickCallback (internal/process/next_tick.js:104:9)
npm WARN Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules@types'
npm WARN at Error (native)
npm WARN { Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules@types'
npm WARN at Error (native)
npm WARN stack: 'Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules\@types'\n at Error (native)',
npm WARN errno: -4048,
npm WARN code: 'EPERM',
npm WARN syscall: 'rmdir',
npm WARN path: 'C:\Users\Akshay\Desktop\darwin\node_modules\@types' }
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] postinstall: node install.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] postinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\Akshay\AppData\Roaming\npm-cache_logs\2017-09-22T11_47_48_512Z-debug.log

Please help.
Thanks in advance !

Google Gives Error: spawn EPERM

Limit reached

Hi !

I just tried your code, but it seems to be limited to only 600 images per scraper. Is that normal?

var Scraper = require('images-scraper'), google = new Scraper.Google();
['banana', 'apple', 'cherry'].map(category => {
    google
        .list({
            keyword: category,
            num: 5000,
            detail: true,
            nightmare: {
                show: true
            }
        })
        .then(function (res) {
            console.log('first 10000 results from google', res.length);
        })
        .catch(function (err) {
            console.log('err', err);
        });
});

Output gets 'Truncated'

Every time I run "node bing-images-scraper-2.js" from the Windows PowerShell command line with 100+ items, the output gets 'truncated'. I tried using > to redirect the output to a file, I tried editing the PowerShell settings, etc. The output still gets 'truncated' and ends at line ~700 with:

... 224 more items ]

Just using the basic Bing scraper provided by Pevers. Help?!
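
The "... more items ]" line is produced by Node.js, not by PowerShell: console.log formats arrays with util.inspect, which by default prints only the first 100 elements. A minimal sketch of two ways around it; the results array here is a made-up stand-in for the scraper output.

```javascript
const util = require('util');

// Hypothetical stand-in for a large result list.
const results = Array.from({ length: 150 }, (_, i) => ({
  url: `https://example.com/${i}.jpg`,
}));

// maxArrayLength: null lifts the default limit of 100 printed elements.
const full = util.inspect(results, { maxArrayLength: null, depth: null });
console.log(full);

// Alternative: JSON output is never abbreviated.
// console.log(JSON.stringify(results, null, 2));
```

The JSON variant is usually the better choice when redirecting to a file, because the result can be parsed again later.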

add test cases

Need to have some test cases because headless scraping can be really really buggy!

Specific resolution of images

Are there options I can add to get images with a specific resolution?

Get thumbnail images

Hi there - great project!

Is there any way to grab those thumbnail images, since it downloads them anyway? Does electron or nightmare have some kind of cache?

Cheers,
Oliver

No result.

Not working?

var Scraper = require('google-images-scraper');
 
var scraper = new Scraper({
    keyword: 'banana',
    rlimit: 10  // 10 p second 
});
 
scraper.list(10).then(function (res) {
    console.log(res);
}).catch(function(err)
{
  console.log(err);
});

The output:

[ undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined ]

Also no errors?
What am i doing wrong?

google.on('result', cb) not working

I tried listening to the 'result' event:

var google = new Scraper.Google()
google.list({
	keyword: 'banana',
	num: 10
})

google.on('result', function(item) {
	console.log('result', item);
});

It doesn't output anything.
But when I try using Bing, it gives me the results (from Bing, of course):

var bing = new Scraper.Bing()
bing.list({
	keyword: 'banana',
	num: 10
})

bing.on('result', function(item) {
	console.log('result', item);
});

Nightmare memory leak

Google scraping fails completely. This is part of the console log:

(node) warning: possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at EventEmitter.addListener (events.js:252:17)
    at EventEmitter.once (events.js:278:8)
    at EventEmitter.<anonymous> (D:\my\images-scraper\node_modules\nightmare\lib\runner.js:249:14)
    at emitOne (events.js:90:13)
    at EventEmitter.emit (events.js:182:7)
    at process.<anonymous> (D:\my\images-scraper\node_modules\nightmare\lib\ipc.js:28:10)
    at emitTwo (events.js:105:20)
    at process.emit (events.js:185:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)

Google result count maxes out at 100, 200, or 300

hey @pevers, thoughts on this by chance?

If I try to get 1000 images, it will only return 100, 200, or 300. I think it has to do with the scrolling emulation and the timeout built into this mechanism.

About to dive back in to see how to get 1000 or more images at a time.

"Failed to launch the browser process" on Ubuntu

When trying to scrape images from my Ubuntu machine, I get the following error:

Error: Failed to launch the browser process!
[0220/172827.691150:ERROR:zygote_host_impl_linux.cc(90)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.


TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
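
The log line above names the cause: Chromium refuses to start as root unless --no-sandbox is passed. Running the process as a non-root user avoids the problem while keeping the sandbox. Where that is not possible, the launch flags below are the ones another report in this list uses. This is a sketch of the options object only, and it assumes the installed version forwards the puppeteer key to the browser launch, as the constructor examples elsewhere in this list suggest.

```javascript
// Options object only; intended for `new Scraper(options)`.
// Note: disabling the sandbox reduces browser isolation, so prefer a non-root user if you can.
const options = {
  puppeteer: {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  },
};
```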

Unable to scrape URLs more than 100

Hi,
I have attached the console output I get whenever I set the limit to more than 100.

advanced option for 'Usage rights' filter?

Is there a filter for the 'Usage rights' option of Google Image Search, similar to the options below?

advanced: {
      imgType: 'photo', // options: clipart, face, lineart, news, photo
      resolution: 'l', // options: l(arge), m(edium), i(cons), etc.
      color: undefined // options: color, gray, trans
  }

Not working on Heroku

Nothing is returned when scraping Google on the Heroku platform.
Images are scraped fine in my local environment, but when the code is deployed to Heroku nothing happens: no error and no result from the promise.

Is there a way to prevent the Google image scraper from scrolling down and loading tons of images?

It takes a long time and I only want to load the first ten or so images in any case.

Thank you.

Failed to boot on Meteor v1.3+React app

The app fails to boot. The error is as follows:

Administrador@IT65441 C:\wamp\www\mskeor-react
$ meteor
[[[[[ C:\wamp\www\mskeor-react ]]]]]

=> Started proxy.
=> Started MongoDB.
=> Meteor 1.3.2.4 is available. Update this project with 'meteor update'.
W20160425-17:45:55.558(-3)? (STDERR)
W20160425-17:45:55.559(-3)? (STDERR) packages\modules.js:74364
W20160425-17:45:55.559(-3)? (STDERR)   this.proc.stdout.pipe(split2()).on('data', (data) => {
W20160425-17:45:55.559(-3)? (STDERR)                                                      ^
W20160425-17:45:55.559(-3)? (STDERR) SyntaxError: Unexpected token >
W20160425-17:45:55.560(-3)? (STDERR)     at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:278:30
W20160425-17:45:55.560(-3)? (STDERR)     at Array.forEach (native)
W20160425-17:45:55.560(-3)? (STDERR)     at Function._.each._.forEach (C:\Users\Administrador\AppData\Local\.meteor\packages\meteor-tool\1.3.1\mt-os.windows.x86_32\dev_bundle\server-lib\node_modules\underscore\underscore.js:79:11)
W20160425-17:45:55.560(-3)? (STDERR)     at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:133:5
=> Exited with code: 8
W20160425-17:46:02.312(-3)? (STDERR)
W20160425-17:46:02.312(-3)? (STDERR) packages\modules.js:74364
W20160425-17:46:02.312(-3)? (STDERR)   this.proc.stdout.pipe(split2()).on('data', (data) => {
W20160425-17:46:02.312(-3)? (STDERR)                                                      ^
W20160425-17:46:02.312(-3)? (STDERR) SyntaxError: Unexpected token >
W20160425-17:46:02.313(-3)? (STDERR)     at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:278:30
W20160425-17:46:02.313(-3)? (STDERR)     at Array.forEach (native)
W20160425-17:46:02.313(-3)? (STDERR)     at Function._.each._.forEach (C:\Users\Administrador\AppData\Local\.meteor\packages\meteor-tool\1.3.1\mt-os.windows.x86_32\dev_bundle\server-lib\node_modules\underscore\underscore.js:79:11)
W20160425-17:46:02.313(-3)? (STDERR)     at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:133:5
=> Exited with code: 8

Stop image scraping immediately when there are no results

Hi, I just found that if Google doesn't return any image results ("Your search - (search) - did not match any image results."), the method waits until the timeout. I would like it to end immediately.

Repl.it: Failed at the [email protected] install script

I tried to install it and it just said that the install failed at the puppeteer step.

Infinite loop when setting a limit over 150 images

Hi,
I just ran into a bug: the scraper seems to go crazy when changing pages.
I added some logs and monitored the results, and the results array in the while loop seems to do something like this:

0 results
50 results
100 results
150 results
100 results
150 results
etc...
The limit is never reached and the while loop never exits... Maybe a bug or a missing check (what if the limit is higher than the number of available results?)

let SEARCH_SCRAPPER = new Scraper({ puppeteer: { headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] } });
let res = await SEARCH_SCRAPPER.scrape(patternSearch, 250);
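
The missing check suggested above could look like the following sketch: stop when the count has not grown for a few rounds, even if the limit was never reached. This is not the library's loop. collectWithGuard, fetchCount and maxStalls are hypothetical names, and fetchCount stands in for whatever reads the current number of results.

```javascript
// Hypothetical sketch of a loop guard; not the scraper's actual implementation.
async function collectWithGuard(fetchCount, limit, maxStalls = 3) {
  let best = 0;   // highest count seen so far
  let stalls = 0; // consecutive rounds without progress
  while (best < limit && stalls < maxStalls) {
    const count = await fetchCount();
    if (count > best) {
      best = count;
      stalls = 0;
    } else {
      stalls += 1; // no progress (or the count went down): a stalled round
    }
  }
  return Math.min(best, limit);
}
```

With the sequence from the report (0, 50, 100, 150, 100, 150, ...) and a limit of 250, this returns 150 after three stalled rounds instead of looping forever.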

Only Sending One Image

So I implemented this in a Discord bot, and it's only sending the same image, even after adding 200 after the search query in the code. How can I fix this?

Error: Failed to launch chrome! [7540:10164:0129/180956.284:FATAL:content_main_runner_impl.cc(526)] --single-process is not supported in chrome multiple dll browser.

I'm getting this error with the default example:
Windows 7 64bit
Node.js 12.13.0

var Scraper = require('images-scraper');

let google = new Scraper.Google({
    keyword: 'cat',
    limit: 5,
    puppeteer: {
        headless: false
    },
    tbs: {
        // every possible tbs search option, some examples and more info: http://jwebnet.net/advancedgooglesearch.html
        isz: 'm',        // options: l(arge), m(edium), i(cons), etc.
        itp: undefined,  // options: clipart, face, lineart, news, photo
        ic: undefined,   // options: color, gray, trans
        sur: undefined,  // options: fmc (commercial reuse with modification), fc (commercial reuse), fm (noncommercial reuse with modification), f (noncommercial reuse)
    }
});

(async () => {
    const results = await google.start();
    console.log('results', results);
    // sendMessage({message: 'image', body: results});
})();

(node:10852) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
[7540:10164:0129/180956.284:FATAL:content_main_runner_impl.cc(526)] --single-process is not supported in chrome multiple dll browser.
Backtrace:
	ovly_debug_event [0x000007FEC5505E12+15744418]
	ovly_debug_event [0x000007FEC55053F2+15741826]
	ovly_debug_event [0x000007FEC5518AD3+15821411]
	ovly_debug_event [0x000007FEC5481ADC+15202924]
	ovly_debug_event [0x000007FEC54AC960+15378672]
	ovly_debug_event [0x000007FEC548142A+15201210]
	ChromeMain [0x000007FEC46011BD+293]
	Ordinal0 [0x000000013F712767+10087]
	Ordinal0 [0x000000013F71182D+6189]
	GetHandleVerifier [0x000000013F819092+673122]
	BaseThreadInitThunk [0x0000000076F959CD+13]
	RtlUserThreadStart [0x00000000771CA561+33]



TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

    at onClose (***\node_modules\puppeteer\lib\Launcher.js:348:14)
    at ChildProcess.<anonymous> (***\node_modules\puppeteer\lib\Launcher.js:338:60)
    at ChildProcess.emit (events.js:215:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)

This works, on the other hand:

puppeteer: {
        headless: true,
        ignoreDefaultArgs: ['--single-process']
    }

navigation error

Hey, using images-scraper, I keep getting a navigation error every time!
Also, if I try to push the results into an array, it throws an error!

var express = require('express'),
    bodyparser = require('body-parser'),
    scraper = require('images-scraper');

var app = express();
var google = new scraper.Google();

app.set('port', process.env.PORT || 3000);
app.set('view engine', 'ejs');

app.use(bodyparser.urlencoded({ extended: false }))

app.get('/', function(req,res){
    res.send("This is the home page");
})

app.get('/:id', function(req,res){
    var id = req.params.id;
    console.log(id);
    google.list({
        keyword: id,
        num: 10,
        detail: true,
        nightmare: {
            show: true
        }
    })
    .then(function(res){
        console.log('first 10 results from google', res);
    }).catch(function(err){
        console.log(err);
    })
    // var list = [];
    google.on('result', function (item) {
        console.log('out', item);
        // list.push(item);
    });

    // res.send(list);
})

app.listen(app.get('port'), function(err){
    if(err) throw err;
    console.log('App is connected...');
})

The error is:

{ message: 'navigation error',
  code: -7,
  details: 'Navigation timed out after 30000 ms',
  url: 'https://www.google.com/search?q=hey%20there&source=lnms&tbm=isch&sa=X' }
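
This does not address the navigation timeout itself, but it may help with the second symptom. In the route above, the commented-out res.send(list) would run before any result has arrived, and the inner res in .then shadows the Express response. A sketch that responds only once the promise has settled follows. It assumes google.list(...) returns a promise as used above, and makeSearchHandler is a hypothetical helper.

```javascript
// Hypothetical handler factory; `scraper` is anything with a promise-returning list().
function makeSearchHandler(scraper) {
  return function (req, res) {
    scraper
      .list({ keyword: req.params.id, num: 10 })
      .then(function (items) {
        res.send(items); // respond only once the results exist
      })
      .catch(function (err) {
        res.status(500).send({ error: String(err && err.message ? err.message : err) });
      });
  };
}
```

It would be wired up as app.get('/:id', makeSearchHandler(google)). Registering google.on('result', ...) inside the route adds a new listener on every request, so any event listener is better attached once, outside the handler.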

Example code returns an empty array.

I went through debugging hell until I realized that even your example only returns an empty array, at least on my computer. With a fresh install of Node 12, a new project initialized with npm init, and just your dependency and your example code, I was not able to get ANY result whatsoever.

Terminal Things.

➜  test node:(v12.16.1) npm start

> [email protected] start /Users/alt/Documents/Code/_private/test
> node index.js

results []

How can I specify usage rights and picture size in Google picture search

Hi, nice work. Is there any option to specify the usage rights and the picture size when querying the Google Images API?
