Giter Club home page Giter Club logo

moaminsharifi / image_scraper Goto Github PK

View Code? Open in Web Editor NEW
8.0 1.0 0.0 2.97 MB

You do want to find high quality image with custom setting? here is what you need with Search Engines Image Scraper You can easily find and download image with custom size , color and ratio. Let's start and Start project if you want help to improve It's My honor to check Your pull request.

License: MIT License

Python 100.00%
duckduckgo scrap aspect-ratios image image-processing image-scraper image-scrapping

image_scraper's Introduction

Search Engines Image Scraper

You do want to find high quality image with custom setting? here is what you need with Search Engines Image Scraper You can easily find and download image with custom size , color and ratio. Let's start and Start project if you want help to improve It's My honor to check Your pull request.

wellcome.gif

Table of Content

  1. requirement and installation
  2. Usage
    • Cli
    • Pythn module
  3. Defindes
    • image ratio
    • image size
    • image color
    • table of paramters
    • Why DuckDuckGo?
  4. helpful link

requirement:

  1. python 3+ (it can use also for python 2 but need to change)
  2. python packges: selenium , numpy , cv2

install requirement (freeze version):

  • install with pip install -r requirement.txt' or pip3 install -r requirement.txt

Usage:

1. CLI

open Your shell in repo dir and enter (python3 also for some cases)

CLI version just accept two paramanter

python image_scraper.py --qeury string --amount int

for sample here I am download amazon logo

python image_scraper.py --qeury "amazon logo" --amount 10

for get help wirte:

python image_scraper.py -h

2. python module

  1. Import it as normal module
from image_scraper import Scraper_Config , Image_Scraper
  1. Create config (also you can use default config)
config = Scraper_Config(SEARCH_QUERY = 'duckduckgo',
                 NUMBER_OF_PICTURES = 5,
                 IMAGE_RATIO = (16, 9),
                 IMAGE_SIZE = (200 , 100),
                 CHECK_RATIO_AND_RESIZE = False,
                 EPSILON_RATIO_ERROR = 0.5 ,
                 MAKE_GRAY = False,
                 SEARCH_ENGINE_TYPE='duckduckgo',
                 MINIMUM_ELEMENT_ERROR = 0.2,
                 SLEEP_TIME = 0,
                 SCROLL_PAUSE_TIME = 2)
  1. Create Image_Scraper isntace and call scrap function
scraper = Image_Scraper(config)
scraper.scrap(verbose=True)

Defindes

aspect ratios

what is aspect ratio and why is important?

aspect ratios Maybe you known aspect ratio in images but why is important to us?

Let's check an example, our task is find great mohammad ali picture from google or duckduckgo. ratio.png in this system you can easily define image ratio and which size if ratio with some gap is neer to ratio accept it else next one. look at paramter table for more details.

parameter table for better config

Parameter Type Usage Example
SEARCH_ENGINE_TYPE one of 'google', 'duckduckgo' Chose use which search engine duckduckgo
NUMBER_OF_PICTURES Int > 5 how much will be amount of images 5
CHECK_RATIO_AND_RESIZE Boolean If True then check ratio and resize each image True
IMAGE_RATIO tuple each number If CHECK_RATIO_AND_RESIZE true then need to spesify it what is ratio (16, 9)
EPSILON_RATIO_ERROR float greater than 0.1 If CHECK_RATIO_AND_RESIZE true then need to check until how much bigger ratio can resize
GOOGLE_URL String Url for open in browser as is in default
DUCKDUCKGO_URL String Url for open in browser as is in default
SCROLL_PAUSE_TIME Int -> second Wait between every scroll for load in browser 2 (seconds)
PATH_OF_THIS_FILE String -> path image_scraper.py file path as is in default
IMAGE_BASE_DIR String -> path where image save as is in default
GOOGLE_DRIVER_PATH Int -> second If need to stop for download every image then put greater than 1 10 (seconds)
GOOGLE_CHROME_DRIVER_PATH String -> path Path of google chrome driver as is in default
IMAGE_RATIO_PERSENT Float convert ratio to float number 1.7 as mohammd ali picture
MAKE_GRAY Boolean If CHECK_RATIO_AND_RESIZE true then make images gray False
MINIMUM_ELEMENT_ERROR Float if you need 100 image then at least check exist of 120 image this var is gap between 100 to 120. also when check CHECK_RATIO_AND_RESIZE need up to 2 time more elemnt to check and download needed image with custom ratio 0.2
MINIMUM_EMELMET Int calculate number of minimum elemnt must be exist as is in default
HTML_EMELMT_CLASS_NAME String -> html tag Html tag in page for check and donwload image 10 (seconds)
USE_MULTI_THREADING Boolean Use multi threading or not True
RANDOM_SEED Int What is Your random seed for stop between threads or after done processes 420

Why DuckDuckGo?

duckduckgo say "Privacy, simplified ..." why here I am not talking about Privacy here we need Scrap web. Google Check ip, Behivaer in Browser and almost all thing with Privacy option you can even scrap it with TOR network

helpful link

some issue I got maybe helpful for you:

todo list

  • add more example
  • add google search engine
  • add more browser driver
  • add batch image downloader
  • add image instant augment
  • add multi threading support
  • add just resize without checking ratio
  • add colab support and colab file exmaple

image_scraper's People

Contributors

moaminsharifi avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.