pevers / images-scraper
Simple and fast scraper for Google
License: ISC License
Changing 'color' in the advanced options has no effect. With 'show' set to true, you can see that the color setting does not change in the search options.
Great tool by the way!
4 passing (21s)
1 failing
1) Google Tests
should return the correct length with pagination:
AssertionError: expected 0 to equal 300
+ expected - actual
-0
+300
at Context.<anonymous> (test/google.js:31:31)
I grabbed the code from GitHub and ran example.js but got an error. I'm running on Ubuntu and installed Node through apt-get. This was the output I saw from node example.js:
/mnt/ssd/more_scrape/node_modules/nightmare/lib/nightmare.js:94
debug(`WARNING: load timeout of ${options.loadTimeout} is shorter than go
^
SyntaxError: Unexpected token ILLEGAL
at Module._compile (module.js:439:25)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at Object.<anonymous> (/mnt/ssd/more_scrape/node_modules/images-scraper/lib/google-images-scraper.js:6:17)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.require (module.js:364:17)
at require (module.js:380:17)
at Object.<anonymous> (/mnt/ssd/more_scrape/node_modules/images-scraper/index.js:1:87)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
Do I have to do anything special to make this work?
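A `SyntaxError: Unexpected token ILLEGAL` pointing at a backtick usually suggests that the Node.js runtime predates template literals, which, as far as I know, arrived with Node 4. The sketch below is one way to check the runtime before requiring the package. `parseMajor` and `supportsTemplateLiterals` are hypothetical helper names, and the threshold of 4 is an assumption based on Node's release history, not something taken from this repository.

```javascript
// Hypothetical helper: extract the major version from "0.10.25" or "v12.16.1".
function parseMajor(version) {
  var match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error('Unrecognized version string: ' + version);
  }
  return Number(match[1]);
}

// Assumption: template literals (backtick strings) need Node 4 or newer.
function supportsTemplateLiterals(version) {
  return parseMajor(version) >= 4;
}

if (!supportsTemplateLiterals(process.versions.node)) {
  console.error('Node ' + process.versions.node + ' is too old; upgrade before running example.js.');
}
```

The check is written with `var` and string concatenation so that it can still be parsed by the old runtime it is meant to detect.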
Hi.
When I try to get larger images, I only get the URL for the thumbnail.
It seems like resolution: 'l' doesn't have an impact, or am I doing something wrong?
Greetz
Used the exact code on the readme. Works for bing and yahoo.
When I add var Scraper = require('images-scraper')
and run a server, an Electron error appears. I tried reinstalling this module, but it doesn't work. Is this module compatible with Electron?
For some reason it's not calling end() on the nightmare portion of the script and hanging.
Is it possible to save the images that result from the search? Alternatively, to couple your output with the "request" node package, which could download an image based on its URL, how would one pass or return only the URLs from the output your application provides?
Apologies if the "issues" section isn't the right place for what is essentially a feature request. I realise it's not your job to teach me how to code :) but I thought you guys would be the best to ask. Thanks for your work!
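For the "only the URL" part of the question, a minimal sketch is to map the result list down to its URL field before handing it to a downloader. It assumes each result is an object with a `url` string, which is an assumption about the output shape rather than something confirmed here. `extractUrls` is a hypothetical helper and the sample data is made up.

```javascript
// Assumption: each result is an object with a `url` string.
function extractUrls(results) {
  return results
    .filter(function (item) { return item && typeof item.url === 'string'; })
    .map(function (item) { return item.url; });
}

// Made-up sample data standing in for scraper output.
var sampleResults = [
  { url: 'https://example.com/banana-1.jpg', title: 'Banana' },
  { title: 'No URL on this one' },
  { url: 'https://example.com/banana-2.jpg', title: 'Another banana' }
];

console.log(extractUrls(sampleResults));
```

Each string in the returned array can then be passed to whatever download function you prefer.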
Is it just me, or are both of these returning the same links? Searching for dog or trump returns the same images.
Hi, I tried to install your package in a new project on repl and it doesn't work. Can you please fix it? :)
Add TypeScript definitions so that it is more usable in TypeScript.
I tried using the scraper today for bing but it could only get around 20 results. Working for yahoo.
I watched it run in Nightmare, and it seems the scrapers don't even finish scrolling (when run for 1000 items) and just terminate.
I want to get 1000 pictures with them!!!
Yahoo works fine etc.
When I use the images-scraper API on Heroku, I get this error:
Error: Failed to launch the browser process!
I'm getting absolutely no output with the Google example. Bing works great.
I want to turn off NSFW content so that it doesn't pop up on Discord. How can I do that?
I ran this command: npm install images-scraper --save
and got this error in the console:
> [email protected] postinstall /Users/kashifeqbal/Development/accio.brands.version.1/node_modules/electron
> node install.js
Downloading electron-v1.8.4-darwin-x64.zip
[============================================>] 100.0% of 48.34 MB (917.1 kB/s)
/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/electron/install.js:47
throw err
^
Error: Generated checksum for "electron-v1.8.4-darwin-x64.zip" did not match expected checksum.
at ChecksumMismatchError.Error (native)
at ChecksumMismatchError.ErrorWithFilename (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:41:124)
at new ChecksumMismatchError (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:56:133)
at Hash.<anonymous> (/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/sumchecker/build.js:203:22)
at emitNone (events.js:86:13)
at Hash.emit (events.js:185:7)
at emitReadable_ (_stream_readable.js:432:10)
at emitReadable (_stream_readable.js:426:7)
at readableAddChunk (_stream_readable.js:187:13)
at Hash.Readable.push (_stream_readable.js:134:10)
postinstall: info lifecycle [email protected]~postinstall: Failed to exec postinstall script
[email protected] /Users/kashifeqbal/Development/accio.brands.version.1
└── UNMET PEER DEPENDENCY [email protected]
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/expand-brackets/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/express/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-connector/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-phase/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/loopback-swagger/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/send/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/snapdragon/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/stream-parser/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/strong-remoting/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/Users/kashifeqbal/Development/accio.brands.version.1/node_modules/x-ray-crawler/package.json'
npm WARN [email protected] requires a peer of chai@>= 2.1.2 < 4 but none was installed.
npm ERR! Darwin 17.4.0
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "images-scraper" "--save"
npm ERR! node v6.11.4
npm ERR! npm v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] postinstall: `node install.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] postinstall script 'node install.js'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the electron package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node install.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs electron
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls electron
npm ERR! There is likely additional logging output above.
npm ERR! Please include the following file with any support request:
npm ERR! /Users/kashifeqbal/Development/accio.brands.version.1/npm-debug.log
npm ERR! code 1
Running the latest version of node.js (12), images-scraper, on OS X Mojave.
When running the code, sometimes the page takes too long to render - so when it is instructed to scroll there are no scrollbars and it cannot reach the end of the page. I've "fixed" this by adding a one-second delay:
await page.setUserAgent(self.userAgent);
await page.waitFor(1000); // Add one second delay to ensure scrollbars are present
I'm not all that familiar with puppeteer, so I'm not sure if this works the way I think it does, or if there is a better way to do this. However, I can now reliably run it without it stopping (100 test runs so far).
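One alternative to a fixed one-second delay is to poll for the condition you actually care about and give up after a timeout. The sketch below shows that pattern in plain JavaScript. `waitUntil` and `pageLooksScrollable` are hypothetical names, and the "scrollable" check is a stand-in counter rather than a real page query. Puppeteer also has `page.waitForFunction`, which may be a more direct fit, though I have not tried it against this scraper.

```javascript
// Hypothetical helper: poll `check` until it returns a truthy value or the timeout elapses.
function waitUntil(check, timeoutMs, intervalMs) {
  var deadline = Date.now() + timeoutMs;
  return new Promise(function (resolve, reject) {
    function attempt() {
      Promise.resolve()
        .then(check)
        .then(function (value) {
          if (value) {
            resolve(value);
          } else if (Date.now() >= deadline) {
            reject(new Error('Condition not met within ' + timeoutMs + ' ms'));
          } else {
            setTimeout(attempt, intervalMs);
          }
        })
        .catch(reject);
    }
    attempt();
  });
}

// Stand-in for "is the page scrollable yet?"; becomes true on the third poll.
var polls = 0;
function pageLooksScrollable() {
  polls += 1;
  return polls >= 3;
}

waitUntil(pageLooksScrollable, 2000, 50).then(function () {
  console.log('ready after ' + polls + ' polls');
});
```

Compared with a fixed delay, this returns as soon as the condition holds and fails loudly if it never does.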
Please upgrade the following, it should remedy the following issues
Trying to add tbs search options to the constructor returns ReferenceError (fc is not defined)
const google = new Scraper({
puppeteer: {
headless: false,
},
tbs: {
sur: fc,
},
});
ReferenceError: fc is not defined
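In the snippet above, `fc` is written as a bare identifier, so JavaScript looks for a variable named `fc` and throws before the scraper is even constructed. Quoting it as `'fc'` avoids that. The sketch below illustrates this with a hypothetical `buildTbs` helper that joins the options into a comma-separated `key:value` string, the form Google's `tbs` parameter is generally understood to take. The helper is illustrative only and is not claimed to be how images-scraper builds its URL.

```javascript
// Hypothetical helper: turn a tbs options object into "key:value,key:value",
// skipping options that are left undefined.
function buildTbs(options) {
  return Object.keys(options)
    .filter(function (key) { return options[key] !== undefined; })
    .map(function (key) { return key + ':' + options[key]; })
    .join(',');
}

// Note the quotes around 'fc': it is a string value, not a variable.
var tbs = { isz: 'm', itp: undefined, sur: 'fc' };
console.log(buildTbs(tbs)); // isz:m,sur:fc
```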
Can't install images-scraper on repl.it
Building the URL as "/search?q=${searchQuery}${this.safe}" lets the end user append "&safe=off" to the search query and disable Google's safe search feature, given that Google honors the first definition of a parameter and ignores later definitions of the same parameter.
I tried putting "safe=active" before "search?q=" without success: it returns a 404 error page or redirects to the main Google search page.
If anyone finds a reasonable way to avoid this behavior, please reply to this issue.
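One possible approach is to percent-encode the user's text so that `&` and `=` stay inside the `q` value, and to add the safe search parameter ahead of it as a separate query parameter rather than ahead of the path. The sketch below uses Node's built-in `URLSearchParams`. `buildSearchUrl` is a hypothetical helper, not the library's code, and I have not verified how Google treats the resulting URL.

```javascript
// Hypothetical helper, not the library's code: keep the user's text inside q
// and set safe search before it.
function buildSearchUrl(searchQuery, safe) {
  var params = new URLSearchParams();
  if (safe) {
    params.append('safe', 'active');
  }
  params.append('q', searchQuery); // "&" and "=" in the query are percent-encoded
  params.append('tbm', 'isch');
  return 'https://www.google.com/search?' + params.toString();
}

console.log(buildSearchUrl('cats&safe=off', true));
// https://www.google.com/search?safe=active&q=cats%26safe%3Doff&tbm=isch
```

Because the query is encoded, an appended "&safe=off" becomes part of the search text instead of a second parameter.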
var gi = new GIScraper({ puppeteer: { headless: false } });
let results = await gi.scrape("melony pokemon", 300);
console.log(`${results.length} results`);
It's always 200.
Hi,
I am requiring it in my API and below is the code:
var Scraper = require ('images-scraper')
, google = new Scraper.Google();
google.list({
keyword: 'banana',
num: 10,
detail: true,
nightmare: {
show: true
}
})
.then(function (res) {
console.log('first 10 results from google', res);
}).catch(function(err) {
console.log('err', err);
});
// you can also watch on events
google.on('result', function (item) {
console.log('out', item);
});
but it throws an error when I run "npm install images-scraper":
Error: socket hang up
at TLSSocket.onHangUp (_tls_wrap.js:1124:19)
at TLSSocket.g (events.js:292:16)
at emitNone (events.js:91:20)
at TLSSocket.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:80:11)
at process._tickCallback (internal/process/next_tick.js:104:9)
npm WARN Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules\@types'
npm WARN at Error (native)
npm WARN { Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules\@types'
npm WARN at Error (native)
npm WARN stack: 'Error: EPERM: operation not permitted, rmdir 'C:\Users\Akshay\Desktop\darwin\node_modules\@types'\n at Error (native)',
npm WARN errno: -4048,
npm WARN code: 'EPERM',
npm WARN syscall: 'rmdir',
npm WARN path: 'C:\Users\Akshay\Desktop\darwin\node_modules\@types' }
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] postinstall: node install.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] postinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\Akshay\AppData\Roaming\npm-cache_logs\2017-09-22T11_47_48_512Z-debug.log
Please help.
Thanks in advance !
Hi!
I just tried your code, but it seems to be limited to only 600 images per scraper. Is that normal?
var Scraper = require('images-scraper'), google = new Scraper.Google();
['banana', 'apple', 'cherry'].map(category => {
google
.list({
keyword: category,
num: 5000,
detail: true,
nightmare: {
show: true
}
}).then(function (res) {
console.log('first 10000 results from google', res.length);
})
.catch(function (err) {
console.log('err', err);
})
});
Every time I run "node bing-images-scraper-2.js" from the Windows PowerShell command line with 100+ items, the output gets truncated. I tried using > to redirect the output to a file, I tried editing the PowerShell settings, etc. The output still gets truncated and ends at around line 700 with:
... 224 more items ]
Just using the basic Bing scraper provided by Pevers. Help?!
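The "... 224 more items" marker looks like Node's own formatting rather than anything PowerShell does. As far as I know, `console.log` prints arrays through `util.inspect`, which shows only the first 100 elements by default. The sketch below uses a made-up array of 324 placeholder objects (100 shown plus 224 hidden, an assumption about the size of the list above) and shows two ways to print everything.

```javascript
var util = require('util');

// Made-up stand-in for a long result list.
var results = [];
for (var i = 0; i < 324; i++) {
  results.push({ url: 'https://example.com/image-' + i + '.jpg' });
}

// Default formatting truncates long arrays with a "... N more items" marker.
var truncated = util.inspect(results);

// maxArrayLength: null lifts the limit; JSON.stringify also prints every element.
var full = util.inspect(results, { maxArrayLength: null });
var asJson = JSON.stringify(results, null, 2);

console.log(truncated.indexOf('more items') !== -1); // true
console.log(full.indexOf('more items') !== -1);      // false
console.log(JSON.parse(asJson).length);              // 324
```

In the scraper script, replacing `console.log(res)` with `console.log(JSON.stringify(res, null, 2))` should be enough to get the complete list when redirecting to a file.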
Need to have some test cases because headless scraping can be really really buggy!
Are there options I can add to get images with a specific resolution?
Hi there - great project!
Is there any way to grab those thumbnail images, since it downloads them anyway? Does electron or nightmare have some kind of cache?
Cheers,
Oliver
Not working?
var Scraper = require('google-images-scraper');
var scraper = new Scraper({
keyword: 'banana',
rlimit: 10 // 10 p second
});
scraper.list(10).then(function (res) {
console.log(res);
}).catch(function(err)
{
console.log(err);
});
The output:
[ undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
undefined ]
Also no errors?
What am I doing wrong?
I tried listening to the 'result' event:
var google = new Scraper.Google()
google.list({
keyword: 'banana',
num: 10
})
google.on('result', function(item) {
console.log('result', item);
});
It doesn't output anything.
But when I try Bing, it gives me results (from Bing, of course):
var bing = new Scraper.Bing()
bing.list({
keyword: 'banana',
num: 10
})
bing.on('result', function(item) {
console.log('result', item);
});
Google scraping fails completely. This is part of the console log:
(node) warning: possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
at EventEmitter.addListener (events.js:252:17)
at EventEmitter.once (events.js:278:8)
at EventEmitter.<anonymous> (D:\my\images-scraper\node_modules\nightmare\lib\runner.js:249:14)
at emitOne (events.js:90:13)
at EventEmitter.emit (events.js:182:7)
at process.<anonymous> (D:\my\images-scraper\node_modules\nightmare\lib\ipc.js:28:10)
at emitTwo (events.js:105:20)
at process.emit (events.js:185:7)
at handleMessage (internal/child_process.js:718:10)
at Pipe.channel.onread (internal/child_process.js:444:11)
hey @pevers, thoughts on this by chance?
If I try to get 1000 images, it will only return 100, 200, or 300. I think it has to do with the scrolling emulation and the timeout built into this mechanism.
About to dive back in to see how to get 1000 or more images at a time.
When trying to scrape images from my Ubuntu machine, I get the following error:
Error: Failed to launch the browser process!
[0220/172827.691150:ERROR:zygote_host_impl_linux.cc(90)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
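The log line says Chromium will not start as root unless `--no-sandbox` is passed. A minimal sketch, assuming the process really does need to run as root, is to add the sandbox flags only in that case. The flags match the ones another report on this page passes through the `puppeteer` option. `puppeteerOptionsFor` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: choose Puppeteer launch options based on the user id.
// Assumption: uid 0 means root, which is when Chromium refuses to start sandboxed.
function puppeteerOptionsFor(uid) {
  var options = { headless: true };
  if (uid === 0) {
    options.args = ['--no-sandbox', '--disable-setuid-sandbox'];
  }
  return options;
}

// process.getuid is not available on Windows, so fall back to a non-root value.
var uid = typeof process.getuid === 'function' ? process.getuid() : -1;
console.log(puppeteerOptionsFor(uid));
```

The returned object would be passed as `new Scraper({ puppeteer: puppeteerOptionsFor(uid) })`. Running the scraper as a non-root user instead keeps the sandbox on, which is generally the safer choice.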
Is there a filter for the 'Usage rights' option on Google Image Search, similar to the options below?
advanced: {
imgType: 'photo', // options: clipart, face, lineart, news, photo
resolution: 'l', // options: l(arge), m(edium), i(cons), etc.
color: undefined // options: color, gray, trans
}
Not returning anything while scraping from Google on the Heroku platform.
Images are scraped fine in my local environment, but when the code is deployed on Heroku nothing happens: no error and no result from the promise.
It takes a long time and I only want to load the first ten or so images in any case.
Thank you.
App fails to boot; the error is as follows:
Administrador@IT65441 C:\wamp\www\mskeor-react
$ meteor
[[[[[ C:\wamp\www\mskeor-react ]]]]]
=> Started proxy.
=> Started MongoDB.
=> Meteor 1.3.2.4 is available. Update this project with 'meteor update'.
W20160425-17:45:55.558(-3)? (STDERR)
W20160425-17:45:55.559(-3)? (STDERR) packages\modules.js:74364
W20160425-17:45:55.559(-3)? (STDERR) this.proc.stdout.pipe(split2()).on('data', (data) => {
W20160425-17:45:55.559(-3)? (STDERR) ^
W20160425-17:45:55.559(-3)? (STDERR) SyntaxError: Unexpected token >
W20160425-17:45:55.560(-3)? (STDERR) at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:278:30
W20160425-17:45:55.560(-3)? (STDERR) at Array.forEach (native)
W20160425-17:45:55.560(-3)? (STDERR) at Function._.each._.forEach (C:\Users\Administrador\AppData\Local\.meteor\packages\meteor-tool\1.3.1\mt-os.windows.x86_32\dev_bundle\server-lib\node_modules\underscore\underscore.js:79:11)
W20160425-17:45:55.560(-3)? (STDERR) at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:133:5
=> Exited with code: 8
W20160425-17:46:02.312(-3)? (STDERR)
W20160425-17:46:02.312(-3)? (STDERR) packages\modules.js:74364
W20160425-17:46:02.312(-3)? (STDERR) this.proc.stdout.pipe(split2()).on('data', (data) => {
W20160425-17:46:02.312(-3)? (STDERR) ^
W20160425-17:46:02.312(-3)? (STDERR) SyntaxError: Unexpected token >
W20160425-17:46:02.313(-3)? (STDERR) at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:278:30
W20160425-17:46:02.313(-3)? (STDERR) at Array.forEach (native)
W20160425-17:46:02.313(-3)? (STDERR) at Function._.each._.forEach (C:\Users\Administrador\AppData\Local\.meteor\packages\meteor-tool\1.3.1\mt-os.windows.x86_32\dev_bundle\server-lib\node_modules\underscore\underscore.js:79:11)
W20160425-17:46:02.313(-3)? (STDERR) at C:\wamp\www\mskeor-react\.meteor\local\build\programs\server\boot.js:133:5
=> Exited with code: 8
I tried to install it, and it just said it failed to install at the puppeteer step.
Hi,
I just ran into a bug: the scraper seems to go crazy when changing pages.
I added some logs and monitored the results, and the results array in the while loop seems to be doing something like:
let SEARCH_SCRAPPER = new Scraper({ puppeteer: { headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] } });
let res = await SEARCH_SCRAPPER.scrape(patternSearch, 250);
I implemented this in a Discord bot, and it's only sending the same image, even after adding 200 after the search query in the code. How can I fix this?
I'm getting this error with the default example:
Windows 7 64bit
Node.js 12.13.0
var Scraper = require ('images-scraper');
let google = new Scraper.Google({
keyword: 'cat',
limit: 5,
puppeteer: {
headless: false
},
tbs: {
// every possible tbs search option, some examples and more info: http://jwebnet.net/advancedgooglesearch.html
isz: 'm', // options: l(arge), m(edium), i(cons), etc.
itp: undefined, // options: clipart, face, lineart, news, photo
ic: undefined, // options: color, gray, trans
sur: undefined, // options: fmc (commercial reuse with modification), fc (commercial reuse), fm (noncommercial reuse with modification), f (noncommercial reuse)
}
});
(async () => {
const results = await google.start();
console.log('results',results);
// sendMessage({message: 'image', body: results});
})();
(node:10852) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
[7540:10164:0129/180956.284:FATAL:content_main_runner_impl.cc(526)] --single-process is not supported in chrome multiple dll browser.
Backtrace:
ovly_debug_event [0x000007FEC5505E12+15744418]
ovly_debug_event [0x000007FEC55053F2+15741826]
ovly_debug_event [0x000007FEC5518AD3+15821411]
ovly_debug_event [0x000007FEC5481ADC+15202924]
ovly_debug_event [0x000007FEC54AC960+15378672]
ovly_debug_event [0x000007FEC548142A+15201210]
ChromeMain [0x000007FEC46011BD+293]
Ordinal0 [0x000000013F712767+10087]
Ordinal0 [0x000000013F71182D+6189]
GetHandleVerifier [0x000000013F819092+673122]
BaseThreadInitThunk [0x0000000076F959CD+13]
RtlUserThreadStart [0x00000000771CA561+33]
TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose (***\node_modules\puppeteer\lib\Launcher.js:348:14)
at ChildProcess.<anonymous> (***\node_modules\puppeteer\lib\Launcher.js:338:60)
at ChildProcess.emit (events.js:215:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
This works, on the other hand:
puppeteer: {
headless: true,
ignoreDefaultArgs: ['--single-process']
}
Hey, using images-scraper, I keep getting a navigation error every time!
Also, if I try to push the results into an array, it throws an error!
`var express = require('express'),
bodyparser = require('body-parser'),
scraper = require('images-scraper');
var app = express();
var google = new scraper.Google();
app.set('port', process.env.PORT || 3000);
app.set('view engine', 'ejs');
app.use(bodyparser.urlencoded({ extended: false }))
app.get('/', function(req,res){
res.send("This is the home page");
})
app.get('/:id', function(req,res){
var id = req.params.id;
console.log(id);
google.list({
keyword: id,
num: 10,
detail: true,
nightmare: {
show: true
}
})
.then(function(res){
console.log('first 10 results from google', res);
}).catch(function(err){
console.log(err);
})
// var list = [];
google.on('result', function (item) {
console.log('out', item);
// list.push(item);
});
// res.send(list);
})
app.listen(app.get('port'), function(err){
if(err) throw err;
console.log('App is connected...');
})
`
The error is:
`{ message: 'navigation error',
code: -7,
details: 'Navigation timed out after 30000 ms',
url: 'https://www.google.com/search?q=hey%20there&source=lnms&tbm=isch&sa=X' }`
I went through debugging hell until I realized that even your example only returns an empty array, at least on my computer. With a fresh install of Node 12, a new project initialized with npm init
and just your dependency and your example code, I was not able to get ANY result whatsoever.
Terminal Things.
➜ test node:(v12.16.1) npm start
> [email protected] start /Users/alt/Documents/Code/_private/test
> node index.js
results []
Hi, nice work. Is there any option to specify the usage rights and the picture size when querying Google Images?