
shutterscrape's Introduction

ShutterScrape

ShutterScrape is a web scraper for quickly bulk-downloading images and videos from Shutterstock. ⚡
It uses Selenium for browser automation and Beautiful Soup for parsing.


Setting up

  1. Configure shutterscrape.py to your Python version.

  2. Install requirements from Terminal:

pip install beautifulsoup4
pip install selenium
pip install lxml
  3. Install ChromeDriver.

  4. (Optional) Add the locations of python.exe and chromedriver.exe to your PATH environment variable.
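
As a quick way to check the optional PATH step, the sketch below reports whether the two executables can be found. It relies only on the standard library's shutil.which; the helper name locate_tools is hypothetical and is not part of shutterscrape.py.

```python
import shutil


def locate_tools(names=("python", "chromedriver")):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing PATH entry is easy to spot.
    for name, path in locate_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If chromedriver is reported as not found, Selenium may still work when it is given the driver's location explicitly.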


Running

Open terminal in the directory of shutterscrape.py and enter:

python shutterscrape.py

Go grab a cup of coffee while waiting... oh wait, it's already done!


Definitions

  • Search mode: Enter i to scrape images or v to scrape videos.
  • Number of search terms: For example, if you want to search for drone single person, enter 3.
  • Search term: Keyword(s) for searching on Shutterstock.
  • Number of pages to scrape: A higher number of pages yields more content but lower keyword precision.

Updates

10/1/2020
Updated for the new Shutterstock page layout as of 10/1/2020.

4/26/2019
Updated for the new Shutterstock page layout as of 4/26/2019.

10/1/2018
Added GUI for save directory selection.

07/31/2018
More stability fixes.

07/25/2018
Added gettyscrape.py for scraping videos from Getty Images.

07/23/2018
Stability fixes.

shutterscrape's People

Contributors

bhaskar-29, chuanenlin, mouthoftiger, oninsomnus, rai220, umairahmadh

shutterscrape's Issues

ERROR:data_channel.cc(44)]

I have this error and nothing gets downloaded:

DevTools listening on ws://127.0.0.1:54653/devtools/browser/95f0ac6b-a67a-4d57-baa6-b29cc3412005
Page 1
[20232:25456:0621/042844.217:ERROR:data_channel.cc(44)] Accepting maxRetransmits = -1 for backwards compatibility
[20232:25456:0621/042844.217:ERROR:data_channel.cc(49)] Accepting maxRetransmitTime = -1 for backwards compatibility
[20232:25456:0621/042845.736:ERROR:data_channel.cc(44)] Accepting maxRetransmits = -1 for backwards compatibility
[20232:25456:0621/042845.736:ERROR:data_channel.cc(49)] Accepting maxRetransmitTime = -1 for backwards compatibility
Page 2

Chromedriver error

I am attempting to scrape and keep getting this error. Any ideas?

Message: session not created: Chrome version must be between 70 and 73
(Driver info: chromedriver=73.0.3683.86,platform=Linux 4.18.0-18-generic x86_64)
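
A message of this form generally indicates that the installed Chrome falls outside the version range this ChromeDriver build supports, so the usual remedy is to install a ChromeDriver release that matches the browser. The sketch below only parses the range out of the message quoted above and compares it with a Chrome version string. The function names are hypothetical, the regular expression assumes exactly this wording (other ChromeDriver releases phrase the error differently), and the Chrome version in the usage line is made up for illustration.

```python
import re


def supported_range(message):
    """Extract the (low, high) major versions from the error text, or None."""
    match = re.search(r"Chrome version must be between (\d+) and (\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def chrome_is_supported(chrome_version, message):
    """Return True/False if the range is known, or None if it cannot be parsed."""
    bounds = supported_range(message)
    if bounds is None:
        return None
    major = int(chrome_version.split(".")[0])  # "74.0.1.2" -> 74
    low, high = bounds
    return low <= major <= high


if __name__ == "__main__":
    error = "Message: session not created: Chrome version must be between 70 and 73"
    # Hypothetical installed Chrome version, for illustration only.
    print(chrome_is_supported("74.0.0.0", error))
```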

Script runs through pages but does not scrape any images

I am having a weird issue running this script. It worked fine before, but now the script visits however many pages I tell it to without scraping any images from them (refer to the screenshot below). The only thing I have modified is under def imageScrape: I commented out the line driver.maximize_window(), since ChromeDriver has trouble maximizing the window and that line seems to crash the script. Otherwise the script is exactly the same. I have already tried copying the original script from here and commenting out only that line, to make sure it was the only change. I have no idea why it started doing this. What could be the problem?

Terminal Screen Shot

[Screenshot: screen shot 2019-03-01 at 9 14 59 am]

Length of image container

img_container = scraper.find_all("div", {"class":"z_c_b"})

The length of img_container comes out as 1, so I am not able to retrieve all the images on the page.

How do I solve this?

Video Scraping not working

Image scraping works, but with video scraping the videos are not downloaded and the script keeps looping on the first page. Any fix? Thanks.

crawl data with full resolution

I can use your code to crawl data from Shutterstock, but I only get image thumbnails at low resolution (300x300 pixels). How can I crawl the data at full resolution?

Use requests, not selenium

You do not need Selenium at all. Using requests alone will make your scripts run much faster.
Here is an example I created to show off how for gettyimages: https://gist.github.com/xtream1101/090aab1e00e245284a15af3f7cfaab05

For Shutterstock, you can also hit this URL, where I searched for house:
https://www.shutterstock.com/sstk/api/footage/search?language=en&q=house&page%5Bsize%5D=50&page%5Boffset%5D=0&recordActivity=true&fields%5Bvideos%5D=description%2Cpreview_video_urls%2Cpreview_image_url%2Cduration%2Csizes%2Cuploaded_date

This yields nice JSON data for all the results.

You can also thread the downloads to make things even faster.
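
To make the suggestion concrete, here is a minimal sketch that rebuilds the search URL quoted above from its query parameters and runs downloads in a thread pool. The function names build_search_url and download_all are hypothetical, the fetch callable is left to the caller (for example, a function that calls requests.get and saves the body), and whether this endpoint still responds the same way is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Endpoint and field list copied from the URL in the post above.
API_URL = "https://www.shutterstock.com/sstk/api/footage/search"
VIDEO_FIELDS = ("description", "preview_video_urls", "preview_image_url",
                "duration", "sizes", "uploaded_date")


def build_search_url(query, page_size=50, offset=0):
    """Assemble the footage search URL for one page of results."""
    params = [
        ("language", "en"),
        ("q", query),
        ("page[size]", page_size),
        ("page[offset]", offset),
        ("recordActivity", "true"),
        ("fields[videos]", ",".join(VIDEO_FIELDS)),
    ]
    # urlencode percent-encodes the brackets and commas.
    return API_URL + "?" + urlencode(params)


def download_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL in parallel threads, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Raising offset by page_size on each call would step through successive result pages, assuming the API paginates the way the parameter names suggest.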

import error urllib

Hi Chuan,

When I try to run shutterscrape.py, it raises an import error.

Alexanders-MacBook-Pro:shutterscrape alexandersantiago$ python shutterscrape.py
Traceback (most recent call last):
  File "shutterscrape.py", line 6, in <module>
    from urllib import urlopen
ImportError: cannot import name 'urlopen'

Can you help me with that, please?
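
The traceback is consistent with a Python 2 import being run under Python 3, where urlopen lives in urllib.request rather than at the top level of urllib. One possible workaround, sketched here rather than taken from the project, is an import that works on either version:

```python
try:
    # Python 3: urlopen moved into the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is available directly from urllib.
    from urllib import urlopen
```

Other Python 2 idioms in the script, if any, would still need the same kind of adjustment.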
