captr's Introduction

captr: R Client for the Captricity API

OCR text and handwritten forms using Captricity. Captricity's big advantage over ABBYY Cloud OCR is that it lets the user easily specify the position of the text blocks they want to OCR through a simple web-based UI. The quality of the OCR can be checked using compare_txt from the recognize package.

Installation

To install the latest version from CRAN:

install.packages("captr")

To get the current development version from GitHub:

install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)

Using captr

Read the vignette:

vignette("using_captr", package = "captr")

or follow the overview below.

Start by getting an application token and setting it using:

set_token("token")

Then, create a batch using:

create_batch("batch_name")

Once you have created a batch, you need a template ID. Captricity requires a template, which tells it what data to pull from where; templates can be created using the web UI. Set the template ID using:

set_template_id("id")

Next, assign the template ID to a batch:

set_batch_template("batch_id", "template_id")

Next, upload image(s) to the batch:

upload_image(batch_id="batch_id", path_to_image="image_path")

Next, check whether the batch is ready to be processed:

test_readiness(batch_id="batch_id")

You may also want to find out how much processing the batch will cost:

batch_price(batch_id="batch_id")

Once you are ready, submit the batch:

submit_batch(batch_id="batch_id")

Captricity excels in nomenclature confusion: once a batch is submitted, it is called a job. The ID of the job can be obtained from the list returned by submit_batch, in the field related_job_id.
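
As a small illustration of pulling the job ID out of that list, the sketch below uses a hand-built stand-in for the value returned by submit_batch. Only the field name related_job_id comes from the text above; the other fields and all values are made up for the example.

```r
# Hypothetical stand-in for the list returned by submit_batch().
# Only the field name `related_job_id` is taken from the README;
# the other fields and every value here are invented.
submitted <- list(
  id = 101,               # made-up batch id
  name = "batch_name",    # made-up batch name
  related_job_id = 2024   # made-up job id
)

# Pull out the job id for use with track_progress(),
# list_instance_sets(), and get_all()
job_id <- submitted$related_job_id
print(job_id)
```

With a real batch, `submitted` would instead be the result of submit_batch(batch_id="batch_id").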

To track progress of a job, use:

track_progress(job_id="job_id")

List all forms (instance sets) associated with a job:

list_instance_sets(job_id="job_id")

If you want to download data from a particular form, use list_instance_sets to get the form (instance set) ID, then run:

get_instance_set(instance_set_id="instance_set_id")

Get a CSV of all the results from a job:

get_all(job_id="job_id")
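
The steps above can be strung together roughly as follows. This is a sketch, not the package's recommended workflow: the helper name prepare_batch and its arguments are hypothetical, it assumes the list returned by create_batch carries the batch ID in a field called id, and it only submits (and so incurs cost) when submit = TRUE. Defining the function runs nothing; calling it requires captr, a valid token, and network access.

```r
# Hypothetical helper that mirrors the README steps in order.
# Assumption: create_batch() returns a list whose `id` field is the batch id.
prepare_batch <- function(token, batch_name, template_id, image_paths,
                          submit = FALSE) {
  captr::set_token(token)
  batch <- captr::create_batch(batch_name)

  # Register the template id and attach it to the new batch
  captr::set_template_id(template_id)
  captr::set_batch_template(batch$id, template_id)

  # Upload every image, keeping the responses in a list
  uploads <- lapply(image_paths, function(path) {
    captr::upload_image(batch_id = batch$id, path_to_image = path)
  })

  # Check readiness and price before deciding whether to submit
  readiness <- captr::test_readiness(batch_id = batch$id)
  price <- captr::batch_price(batch_id = batch$id)

  job_id <- NULL
  if (submit) {
    submitted <- captr::submit_batch(batch_id = batch$id)
    job_id <- submitted$related_job_id
  }

  list(batch_id = batch$id, uploads = uploads,
       readiness = readiness, price = price, job_id = job_id)
}
```

Once the job has finished processing, the returned job_id can be passed to track_progress, list_instance_sets, or get_all as shown above.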

License

Scripts are released under the MIT License.

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on them. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

captr's People

Contributors

soodoku

captr's Issues

Vignette example grows list

Growing a list inside a loop has O(n^2) behaviour:

img_dir_path <- paste0(path.package("captr"), "/inst/extdata/wisc_ads/")
upimage <- list()
j <- 1
for(i in dir(img_dir_path)){
    upimage[[j]] <- upload_image(batch_id= batch$id, path_to_image= paste0(img_dir_path, i))
    j <- j + 1
}

names(upimage[[5]])

Either preallocate the list, or use lapply():

path <- system.file("extdata/wisc_ads", package = "captr")
files <- dir(path, full.names = TRUE)

upimage <- lapply(files, upload_image, batch_id = batch$id)

(also note the use of system.file())
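
For completeness, the preallocation alternative might look like the sketch below. So that it runs offline, it uses a hypothetical stand-in, fake_upload, in place of upload_image, and the file paths and batch ID are made up.

```r
# Hypothetical stand-in for upload_image() so the sketch runs without
# network access; it just returns a small list describing the "upload".
fake_upload <- function(batch_id, path_to_image) {
  list(batch_id = batch_id, file = basename(path_to_image))
}

# Made-up image paths, standing in for dir(path, full.names = TRUE)
files <- c("ads/a.png", "ads/b.png", "ads/c.png")

# Preallocate a list of the final length, then fill it by index
upimage <- vector("list", length(files))
for (j in seq_along(files)) {
  upimage[[j]] <- fake_upload(batch_id = 1, path_to_image = files[j])
}

length(upimage)
```

Using seq_along() also avoids the separate counter variable needed in the original loop.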
