vis2's Introduction

Vis2 - OCR(), ImageIdentify()

Automation using Computer Vision. Convert images on screen, image files, or an image URL to text.

Super Quick Start

Run demo.ahk.

Quick Start

  1. Download Vis2.
  2. Create a new AHK script in the same folder as Vis2.ahk and copy the code below into it.
    #include <Vis2>
    MsgBox % OCR("https://i.stack.imgur.com/sFPWe.png")
  3. Run the new AHK script. You should see a MsgBox containing the OCR text. Press Enter to exit. Visit the image link to confirm that the OCR result is correct.

How to use

When you see the popup "Optical Character Recognition Tool", click and drag to select the area to read. Pressing the right mouse button while holding down LButton lets you reposition the rectangle. I suggest binding OCR() to a mouse button instead of the #c (Win + C) hotkey.
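As a minimal sketch of the mouse-button suggestion, the script below binds the interactive OCR() to XButton1, the first side button on mice that have one. The choice of XButton1 is only an assumption for illustration; any AHK mouse hotkey name can be used instead.

```autohotkey
#include <Vis2>

; XButton1 is AHK's name for the first side ("back") mouse button.
; Pressing it launches the interactive OCR selection, like #c does.
XButton1:: OCR()
```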

Using Additional Languages

Download the .traineddata files for your desired languages from https://github.com/tesseract-ocr/tessdata_best and place them in bin/tessdata_best. Do the same with https://github.com/tesseract-ocr/tessdata_fast and bin/tessdata_fast.

The fast models are used by the interactive GUI, while the best models are used in all other cases, as the examples below show.

#c:: OCR(, "fra")      ; French (requires fast fra.traineddata)
#x:: OCR(, "eng+fra")  ; English and French


MsgBox % OCR("https://i.imgur.com/T7WMxMs.png", "rus+eng")  ; Requires best eng.traineddata and rus.traineddata. 

Advanced Interactive Mode

While using #c:: OCR(), you can press Ctrl, Alt, or Shift to enter Advanced Mode. (You should see a pink popup.) While in this mode, press Ctrl + Space to see a preview of the preprocessed image, or Alt + Space to get the coordinates of the grey rectangle. Holding Ctrl and LButton lets you resize the box by its corners, Shift and LButton resizes it by its edges, and Alt and LButton draws a new rectangle.

Documentation

Input Data Types

The same rules apply to ImageIdentify().

OCR() - Launches an interactive GUI.

Example: Pressing Ctrl + Win + c will allow the user to manually select an area on screen to OCR.

#^c:: OCR()

OCR([x, y, w, h]) - Screen Coordinates as an Array

To OCR a known region of the screen, pass an array of four values: [x, y, w, h].

text := OCR([0, 0, 430, 150])

This reads the rectangle that starts at point (0, 0) and is 430 pixels wide and 150 pixels high.
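If you know two opposite corners of a region rather than its size, the width and height can be computed before calling OCR(). The sketch below assumes hypothetical corner coordinates (for example, values read off a window inspection tool) and converts them to the [x, y, w, h] form.

```autohotkey
#include <Vis2>

; Hypothetical corner coordinates, chosen only for illustration.
x1 := 100, y1 := 200   ; top-left corner
x2 := 530, y2 := 350   ; bottom-right corner

; Width is x2 - x1 and height is y2 - y1.
text := OCR([x1, y1, x2 - x1, y2 - y1])
MsgBox % text
```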

OCR( file ) - Path to File

The file name can be an absolute or relative path.

text := OCR("myImage.jpg")
text := OCR("C:\image.png")

OCR( url ) - Website

The image will be downloaded and OCRed. You may experience a delay depending on the image size.

text := OCR("https://www.blog.google/static/blog/images/google-200x200.7714256da16f.png")

OCR( WinTitle ) - Window Title

You may enter a native AHK window criterion such as "ahk_class Notepad", "ahk_exe", "ahk_id", "ahk_pid", or the exact name of the window.

Example: Open a new Notepad window, type some text, then run the following code.

MsgBox % OCR("Untitled - Notepad")

Note that only the client area is extracted, so the window border of Notepad is ignored.
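Window titles vary with the system language and the open file, so matching by executable can be more robust. The following is a sketch only: it assumes Notepad runs as notepad.exe and uses AHK's built-in WinExist() to check for the window before calling OCR().

```autohotkey
#include <Vis2>

; WinExist() is a built-in AHK function; it returns 0 when no window matches.
if (WinExist("ahk_exe notepad.exe"))
    MsgBox % OCR("ahk_exe notepad.exe")
else
    MsgBox Notepad is not running.
```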

OCR( hWnd ) - Unique Window ID

If you know the window ID, or hwnd, you may use it as well. Note that this is equivalent to OCR("ahk_id" hWnd).
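One way to obtain an hWnd is AHK's built-in WinExist("A"), which returns the unique ID of the active window. The hotkey below is a sketch under that assumption; Ctrl + Alt + w is an arbitrary choice of key.

```autohotkey
#include <Vis2>

; Ctrl + Alt + w: OCR whichever window is currently active.
^!w::
    hWnd := WinExist("A")   ; "A" selects the active window
    MsgBox % OCR(hWnd)
return
```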

OCR( base64 ) - Base64 encoded image string

OCR( GDI Bitmap ) - Pointer to a memory bitmap

OCR( HBITMAP ) - Handle to a memory bitmap

The sample script below asks you to find and select the text 'Vis2' on screen.

if ((text := OCR()) = "Vis2")
    MsgBox You have successfully used OCR!
else
    MsgBox You have found [ %text% ] `, try finding 'Vis2' instead. 
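The exact comparison above fails if the selection includes surrounding words or extra whitespace. A more tolerant variant, sketched here, uses AHK's built-in InStr() to test whether the recognized text contains 'Vis2' anywhere. Whether OCR() output actually carries extra whitespace is an assumption and not something this README states.

```autohotkey
#include <Vis2>

text := OCR()

; InStr() returns the position of the match, or 0 if there is none.
; It is case-insensitive by default in AHK v1.
if InStr(text, "Vis2")
    MsgBox The selection contains 'Vis2'.
else
    MsgBox 'Vis2' was not found in: %text%
```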

Need more help?

Be sure to visit https://autohotkey.com/boards/viewtopic.php?f=6&t=36047 for help and support.

vis2's People

Contributors

iseahound, livog

vis2's Issues

About the management of the "Gdip_All" library.

I added a translation feature to the Vis2 OCR system, but I cannot use the "Btt/Gdip_All" library the way you do. I tried to add my own function to the "class Vis2" section of Vis2.ahk, but I only have basic knowledge of AutoHotkey and could not understand how you use "tooltip", so I could not add it.

Could you explain, at your convenience, how to display the translation results in the regular way? Thanks.

License

Thanks for your work on this library! It makes using tesseract from AHK super quick and easy.

I was wondering if you could add a license to the repository so that a copy can be redistributed in other open source projects?

Compile error

Hi, I am facing an issue when compiling the demo.ahk file.
image
AHK v2.0.2

Support for multi monitor

I have a two-monitor setup where my external monitor is my primary display and my laptop screen (1920x1080) is my secondary display. I used WindowSpy.ahk to identify the coordinates on my laptop screen, but the OCR does not appear to work with negative coordinates, e.g.: OCR([-1849, 362, 64, 17]).

Is there a way to OCR a region from a secondary display?

Reading some numbers outputs symbols

Hello.
The values I am trying to capture with your tool contain only numbers, never letters or symbols. However, the current version sometimes reads some numbers as symbols. I suppose that if it looked only for numbers and not for symbols, it would have a higher success rate in my case.

Is there a way to specify to only look for numbers (or maybe numbers and letters) and nothing else?

Super great tool!
Thank you.

How does leptonica_util.exe work?

Can you talk a bit about how leptonica_util.exe works? What algorithms does it use? It looks like it's called like this:
leptonica_util.exe input_path output_path 2 0.5 1 3.5 1 5 2.5 1 2000 2000 0 0 0.0
What do the numeric parameters mean?

Vis2 error

Hello,

Let me start by saying honestly that I am not using Vis2 through AHK, but through Pullover Macro Creator. I know how annoying it can be to receive a help request about something completely different from what you developed it for, so I would like to apologize in advance. Unfortunately, I have run into a complete dead end, and reaching out to you is my last resort to figure out what the issue is and (if possible) to work around it. Neither the developer of PMC nor our own IT department/company security has any clue how to solve this.

The function used was OCR / 'read text from screen and convert to variable'

Have you seen the error below before? Could you enlighten me on what it means and whether there is a way to solve it? It seems like PMC/Vis2 is trying to save a temporary file in a restricted location. The computer I am trying to get this to work on is heavily secured. It seems to work 'okay' in admin mode, but this is not a long-term solution, since the user cannot have admin access and would need to call IT support every time for a temporary fix. If you could enlighten me on the issue, that would already help a lot, so that we might be able to program around the security issue.

image

Question: will this be affected by monitor power saving mode?

Hi, I'm testing this on a timer right now so that it scrapes a tiny part of the screen every second. My question is, what happens if my monitor goes into sleep mode -- will this continue to function, or does the monitor have to be on in order for it to do its OCR thing?

Cannot be opened

Hi, I have several problems: demo.ahk doesn't work, and the libs won't open.

Get found text position

Hi,

Is there any way of getting the found text's position?

For example, if I search in a certain area:
text := OCR([0, 0, 430, 150])

How can I get the text X and Y ?
Thanks!

Tilted/skewed text

This might be a feature request. But is it possible to capture the image pre-skewed / tilt-corrected so that tesseract receives it horizontal?

image

Vis2 error

Hello,

I am using Vis2 through Macro Creator. The developer says he has no idea why this is not working for me. I've used this successfully on other computers, so I am trying to narrow down why this error is occurring and what I can do to solve it. Is this error familiar to you?

image

If needed, I can test with AHK too.

Call to nonexistent function

Call to nonexistent function.
Wis2-meta\Vis2-meta\Lib\Graphics.ahk*:
Specifically: GetMonitorInfo(image)

Line#
2328: M := GetMonitorInfo(image)

Getting this when starting demo.
