Google-Images-Scraper

About

This is a Selenium-based scraper for dynamic web pages that extracts image URLs from Google Images and downloads the images to the local machine. It was originally written to collect training data for the ZotBins waste image recognition project, but it can be used for other purposes.
The script downloads up to 400 images per search term. Runtime varies with machine performance and network latency; during testing, the script completed in 3-4 minutes for a single search term.

Prerequisites

Running the script

python download_images.py <search terms>

For example, to download images for utensils and water bottles, run:

python download_images.py utensils "water bottles"

Remember to enclose multi-word arguments in quotes.
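
To illustrate why the quotes matter, here is a minimal sketch of how a POSIX shell splits the example command into arguments. It uses Python's standard `shlex` module to mimic the shell; the helper name `search_terms_from` is hypothetical and is not part of download_images.py.

```python
import shlex


def search_terms_from(command_line):
    """Return the arguments a POSIX shell would pass after the script name."""
    tokens = shlex.split(command_line)
    # tokens[0] is "python" and tokens[1] is the script name;
    # everything after that is passed to the script as arguments.
    return tokens[2:]


quoted = search_terms_from('python download_images.py utensils "water bottles"')
unquoted = search_terms_from('python download_images.py utensils water bottles')

print(quoted)    # ['utensils', 'water bottles']
print(unquoted)  # ['utensils', 'water', 'bottles']
```

Without the quotes, "water" and "bottles" arrive as two separate arguments, so they would presumably be treated as two separate search terms.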

Limitations

Web scraping inherently depends on the structure of the target site, so any change to Google Images may break the script. See the documentation in download_images.py for what can be done to keep the script up to date.

Common errors

Element could not be scrolled into view

During testing, we encountered the following error a few times:

selenium.common.exceptions.ElementNotInteractableException: Message: Element <option> could not be scrolled into view.

For reasons we have not identified, the error appeared the first one or two times we ran the script and did not recur on later runs.
If you run into this error, we suggest simply running the script again; in our experience the error goes away after a few attempts. The cause is still unknown.
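
Rerunning by hand can also be automated. The following is a minimal sketch of a generic retry wrapper, not something the script currently does; the name `run_with_retries` and its parameters are assumptions made for illustration.

```python
import time


def run_with_retries(task, retry_on, attempts=3, delay_seconds=2.0):
    """Call task() until it succeeds or the attempts run out.

    task:      a zero-argument callable (hypothetical; e.g. the scraping step)
    retry_on:  an exception class, or tuple of classes, that triggers a retry
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on as error:
            if attempt == attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            print(f"Attempt {attempt} failed ({error!r}); retrying")
            time.sleep(delay_seconds)
```

To apply this to the error above, one could pass `selenium.common.exceptions.ElementNotInteractableException` as `retry_on` and wrap whichever function drives the browser. Since the cause of the error is unknown, retrying is only a workaround and may not help in every case.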
