shutterscrape's Issues

Chromedriver error

I am attempting to scrape and continue to get this error. Any ideas?

Message: session not created: Chrome version must be between 70 and 73
(Driver info: chromedriver=73.0.3683.86,platform=Linux 4.18.0-18-generic x86_64)
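
The message itself names the Chrome versions this chromedriver build accepts, so one way to narrow the problem down is to compare the installed Chrome version against that range. The following is only a rough sketch: the helper names `supported_range` and `is_supported` are made up for illustration, the Chrome version strings are hypothetical, and it assumes the range in the message is inclusive.

```python
import re


def supported_range(message):
    """Return the (low, high) Chrome major versions named in the error message."""
    match = re.search(r"Chrome version must be between (\d+) and (\d+)", message)
    if match is None:
        raise ValueError("message does not state a supported Chrome range")
    return int(match.group(1)), int(match.group(2))


def is_supported(chrome_version, message):
    """Check whether a Chrome version string falls inside the reported range.

    Assumption: the range in the message is inclusive on both ends.
    """
    low, high = supported_range(message)
    major = int(chrome_version.split(".")[0])
    return low <= major <= high


message = "session not created: Chrome version must be between 70 and 73"

# Hypothetical installed Chrome versions, used for illustration only.
print(is_supported("74.0.3729.108", message))
print(is_supported("72.0.3626.121", message))
```

If the installed Chrome falls outside the range, installing a chromedriver build that matches the browser (or the other way around) is the usual way out.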

crawl data with full resolution

I can use your code to crawl data from Shutterstock, but I only get image thumbnails at low resolution (300x300 pixels). How can I crawl the data at full resolution?

import error urllib

Hi Chuan,

When I try to run shutterscrape.py, I get an import error.

Alexanders-MacBook-Pro:shutterscrape alexandersantiago$ python shutterscrape.py
Traceback (most recent call last):
  File "shutterscrape.py", line 6, in <module>
    from urllib import urlopen
ImportError: cannot import name 'urlopen'

Can you help me with this, please?
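
The traceback is consistent with running the script under Python 3, where `urlopen` lives in `urllib.request` and no longer in the top-level `urllib` module. A minimal sketch of an import that works on both versions, assuming line 6 is the only import affected (the rest of shutterscrape.py is not shown here and may need other changes):

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is importable from the top-level urllib module.
    from urllib import urlopen
```

The other option is to run the script with a Python 2 interpreter, which the original import appears to target.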

ERROR:data_channel.cc(44)]

I have this error and nothing gets downloaded:

DevTools listening on ws://127.0.0.1:54653/devtools/browser/95f0ac6b-a67a-4d57-baa6-b29cc3412005
Page 1
[20232:25456:0621/042844.217:ERROR:data_channel.cc(44)] Accepting maxRetransmits = -1 for backwards compatibility
[20232:25456:0621/042844.217:ERROR:data_channel.cc(49)] Accepting maxRetransmitTime = -1 for backwards compatibility
[20232:25456:0621/042845.736:ERROR:data_channel.cc(44)] Accepting maxRetransmits = -1 for backwards compatibility
[20232:25456:0621/042845.736:ERROR:data_channel.cc(49)] Accepting maxRetransmitTime = -1 for backwards compatibility
Page 2

Use requests, not selenium

You do not need Selenium at all. Using requests on its own will make your scripts run much faster.
Here is an example I created to show how to do this for Getty Images: https://gist.github.com/xtream1101/090aab1e00e245284a15af3f7cfaab05

For Shutterstock, you can also hit this URL, where I searched for "house":
https://www.shutterstock.com/sstk/api/footage/search?language=en&q=house&page%5Bsize%5D=50&page%5Boffset%5D=0&recordActivity=true&fields%5Bvideos%5D=description%2Cpreview_video_urls%2Cpreview_image_url%2Cduration%2Csizes%2Cuploaded_date

This returns JSON data for all the results.

You can also run the downloads in threads to make them even faster.
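
To make the idea concrete, here is a rough sketch of building that search URL for any query and fanning downloads out over a thread pool. It uses only the standard library so it runs without extra installs; the helper names `build_search_url` and `download_all` are hypothetical, the query parameters are copied from the URL above, and nothing here is confirmed against Shutterstock's current API.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Endpoint and field list are taken from the example URL in this issue.
SEARCH_ENDPOINT = "https://www.shutterstock.com/sstk/api/footage/search"
VIDEO_FIELDS = [
    "description",
    "preview_video_urls",
    "preview_image_url",
    "duration",
    "sizes",
    "uploaded_date",
]


def build_search_url(query, page_size=50, offset=0):
    """Build the search URL for one page of results."""
    params = [
        ("language", "en"),
        ("q", query),
        ("page[size]", page_size),
        ("page[offset]", offset),
        ("recordActivity", "true"),
        ("fields[videos]", ",".join(VIDEO_FIELDS)),
    ]
    # urlencode percent-encodes the brackets and commas.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def download_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a thread pool, keeping input order.

    fetch is supplied by the caller, for example
    lambda url: requests.get(url).content if requests is installed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


print(build_search_url("house"))
```

Raising `offset` by `page_size` on each call would walk through further pages, assuming the endpoint pages the way the parameter names suggest.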

Script runs through pages but does not scrape any images

I am having a weird issue running this script. It worked fine before, but now the script visits however many pages I tell it to and does not scrape any images from them (see the screenshot below).

The only thing I have modified is in def imageScrape, where I commented out the line driver.maximize_window(), because chromedriver has trouble maximizing the window and that line seems to crash the script. Otherwise the script is exactly the same. I have already tried copying the original script from here and commenting out just that line, to make sure it was the only change. I have no idea why it started doing this. What could be the problem?

[Screenshot: terminal output, 2019-03-01 at 9:14:59 AM]

Video Scraping not working

Image scraping works, but when scraping videos nothing downloads and the script keeps looping on the first page. Is there a fix? Thanks.

Length of image container

img_container = scraper.find_all("div", {"class":"z_c_b"})

img_container ends up with a length of 1, so I am not able to retrieve all the images on the page.

How do I solve this?
