lukevanin / ocrai

Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree

License: MIT License

Swift 91.04% Objective-C 8.96%

optical-character-recognition natural-language-processing contacts ios swift mapkit monkeylearn google-vision udacity-nanodegree monkeylearn-api

ocrai's Issues

Show type selector when adding a new field

The data type must be selected before a new field entity is created.
The UI should reflect the type of data being created.
The data should be checked for conformance when the data is entered.
UI features should match the data being entered.

Types of data:

Person (Faces, Names, Roles, Departments)
Organisation (Name, Logo)
Phone number
Email
URL
Network account (service, account name, public URL)
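One way to make sure the type is chosen before a field exists is to require it in the initialiser. The sketch below is only an illustration: `FieldType`, `Field`, and the `title` property are hypothetical names, not taken from the app's code.

```swift
/// Hypothetical field types, mirroring the list above.
enum FieldType: CaseIterable {
    case person, organisation, phoneNumber, email, url, networkAccount

    /// Title that a type selector could show for each case.
    var title: String {
        switch self {
        case .person: return "Person"
        case .organisation: return "Organisation"
        case .phoneNumber: return "Phone number"
        case .email: return "Email"
        case .url: return "URL"
        case .networkAccount: return "Network account"
        }
    }
}

/// A field cannot be created without a type, so the selector
/// has to run before any entity is inserted.
struct Field {
    let type: FieldType
    var value: String

    init(type: FieldType, value: String = "") {
        self.type = type
        self.value = value
    }
}
```

A type selector could then be populated from `FieldType.allCases`, and the editing UI could switch on `field.type` to pick a suitable keyboard or layout.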

App crashes when taking photo

Reproduce:

Launch app
Tap on camera (accept permissions if prompted).
Tap on photo icon.
Tap use photo
App crashes

Parse machine readable codes

Identify and parse machine readable codes using CoreImage.

Identify and extract logos from images with Google vision API

Identify dates in images

Add capability using existing data detector.

Documents should be coloured according to the primary intent (organisation, person, event)

Use structured data for fields

After scanning, data is stored as key-value pairs. It would be beneficial to store certain kinds of data, such as addresses, in specialised data structures.

Structured data
Addresses consist of multiple components, and can be used to derive additional data, such as geographical coordinates. The current key-value storage schema prevents this.

Unstructured data
Unstructured data, such as names and untagged text, may be stored as key-value pairs. The data may be tagged to indicate its intent, e.g. name (first and last if possible), organisation, department, salutation.

Semi-structured data
Semi-structured data, such as phone numbers, URLs, email addresses, and social media names, may also be stored as plain text. These values may be labelled (e.g. home, work, fax) to indicate their role. It would be beneficial to provide UI functions specific to the type of data, e.g. call a phone number, send a message to a phone number or email address, or open a web page. All of these can be shared. This kind of data should be validated for conformance to accepted protocols whenever the user edits it. If the data does not conform, it should still be saved, and a warning shown.

Tags for phone numbers: home, work, fax
Tags for email: home, work
URLs are not tagged, although they can be labelled: blog, web site, home page, news, Twitter, Facebook.
Social media names should be associated with recognised social media providers (Twitter, Facebook). It should be possible to derive a profile URL from the name. The user should be allowed to convert an unrecognised social media name into a URL. Social media accounts may be a specialised form of URL (i.e. the account name is converted to a URL, which is labelled automatically to indicate a social media account).
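The "save, but warn" behaviour for semi-structured data could look roughly like the sketch below. The conformance checks are deliberately simple placeholders rather than real protocol validation, and every name here (`SemiStructuredType`, `SaveResult`, `save(_:as:)`) is an assumption made for illustration.

```swift
enum SemiStructuredType { case phoneNumber, email }

/// The value is always kept; the warning is set only when the check fails.
struct SaveResult {
    let savedValue: String
    let warning: String?
}

/// Placeholder check: only digits and common separators, at least three digits.
func isPlausiblePhoneNumber(_ text: String) -> Bool {
    let allowed: Set<Character> = Set("0123456789+-() ")
    let digits: Set<Character> = Set("0123456789")
    let digitCount = text.filter { digits.contains($0) }.count
    return text.allSatisfy { allowed.contains($0) } && digitCount >= 3
}

/// Placeholder check: one "@", a non-empty local part, a dotted domain, no spaces.
func isPlausibleEmail(_ text: String) -> Bool {
    let parts = text.split(separator: "@", omittingEmptySubsequences: false)
    guard parts.count == 2, !parts[0].isEmpty, !text.contains(" ") else { return false }
    let domain = parts[1]
    return domain.contains(".") && !domain.hasPrefix(".") && !domain.hasSuffix(".")
}

/// Saves the text unchanged and attaches a warning if it does not conform.
func save(_ text: String, as type: SemiStructuredType) -> SaveResult {
    let conforms: Bool
    switch type {
    case .phoneNumber: conforms = isPlausiblePhoneNumber(text)
    case .email: conforms = isPlausibleEmail(text)
    }
    let warning = conforms ? nil : "Value does not look valid for this field type."
    return SaveResult(savedValue: text, warning: warning)
}
```

Keeping the check separate from the save means the user never loses what they typed; the UI only needs to show `warning` when it is non-nil.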

Toolbar overlaps last field on document.

See attached screenshot. The app is in edit mode. Note the toolbar occludes the last "address" field which is an empty placeholder.

Blank field added to document

Reproduce:

Select document from list (empty or pre-populated).
Tap edit.
Tap on empty field.
Do not enter any text.
Tap on another field.
Note the first field is saved and a new empty field appears.

Expected:
Empty field should not be saved.
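A possible guard for this, assuming the editor hands the raw text to a helper before saving (`valueToSave(from:)` is a hypothetical name):

```swift
import Foundation

/// Returns the trimmed text to save, or nil when the field
/// contains nothing but whitespace and should be discarded.
func valueToSave(from input: String) -> String? {
    let trimmed = input.trimmingCharacters(in: .whitespacesAndNewlines)
    return trimmed.isEmpty ? nil : trimmed
}
```

The caller would skip creating (or would delete) the field whenever the result is nil.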

Improve scanner user feedback

Current: Scanner process works atomically. Document is scanned in full, then imported into the data store.

Problem: User must wait for the entire scanning process to complete before seeing results.

Goal: Scanner should update data store incrementally as soon as data becomes available.

Implementation: Create a builder interface for composing document. Scanners send detected data to the builder. The builder updates the data store. View controller observes the data store and updates the view when the data store changes.
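The proposed flow (scanner to builder to data store to view) might be sketched as follows. This uses a plain in-memory store with a change callback as a stand-in for whatever persistence and observation mechanism the app actually uses; all type names are hypothetical.

```swift
/// One piece of data reported by a scanner.
struct DetectedField: Equatable {
    let kind: String
    let text: String
}

/// Stand-in data store that notifies an observer on every change.
final class DocumentStore {
    private(set) var fields: [DetectedField] = []
    var onChange: (([DetectedField]) -> Void)?

    func append(_ field: DetectedField) {
        fields.append(field)
        onChange?(fields)
    }
}

/// Interface the scanners talk to; they never touch the store directly.
protocol DocumentBuilder {
    func add(kind: String, text: String)
}

/// Builder that writes each detected value to the store as soon as it arrives.
final class StoreDocumentBuilder: DocumentBuilder {
    private let store: DocumentStore

    init(store: DocumentStore) {
        self.store = store
    }

    func add(kind: String, text: String) {
        store.append(DetectedField(kind: kind, text: text))
    }
}
```

A view controller would set `onChange` (or an equivalent observer) and reload its view each time it fires, so results appear while scanning is still running.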

Scan button becomes unresponsive after scrolling

Tap to view a document (with or without fields).
Tap scan button. Note scanning functions normally (progress indicator appears).
Scroll document.
Tap scan button again. Nothing happens.

Wrong colour shown when moving item between sections

Reproduce:

Tap on an item to go to the detail view.
Scan the item or add an entity.
Tap to edit the item.
Drag an item to a different section.

The item keeps the colour of the section it was in.

Image should be resized to maximum dimensions before uploading to API
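The target size can be computed independently of the drawing code. The sketch below assumes a single limit on the longest side; the issue does not state the actual maximum, so `maxDimension` is left as a parameter, and `PixelSize` and `fittedSize(for:maxDimension:)` are hypothetical names.

```swift
struct PixelSize: Equatable {
    var width: Double
    var height: Double
}

/// Scales the size down so its longest side is at most `maxDimension`,
/// preserving the aspect ratio. Smaller images are returned unchanged.
func fittedSize(for original: PixelSize, maxDimension: Double) -> PixelSize {
    let longest = max(original.width, original.height)
    guard longest > maxDimension, longest > 0 else { return original }
    let scale = maxDimension / longest
    return PixelSize(
        width: (original.width * scale).rounded(),
        height: (original.height * scale).rounded()
    )
}
```

The resulting size would then be passed to whatever renderer produces the image data that is uploaded.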

Prevent inline editing on document screen

Existing information is overwritten when scanning

Create a document.
Add fields to the document by scanning, or manual entry.
Tap scan button.
Existing fields are removed and replaced with scanned information.

Expected:

App should prompt user before overwriting information.

iPad layout

Show document as slide out detail view.

Add keyword detection

Add support in scanner for keyword detection api.
Show keywords in scanned document.
Allow keywords to be edited, added, and removed.

Context aware actions for phone numbers

Call
Send message

Shared contact does not include a postal address

Add a document.
Add a name and address to the document.
Tap share.
Save to contacts.
View contact.
Contact has name, but no postal address.

App crashes when camera button is tapped on a device without a camera

Launch app on device without a camera (iPod or simulator).
Tap camera button.
App crashes.

List screen should show indicator when scanning is in progress

Add a document from the camera or photo library, or tap on an existing document and tap on the scan button.
Scanning begins.
Tap the back button (while scanning is underway).
List appears.
Wait for scanning to complete.
List is updated.

Expected:
Message or activity indicator should appear to show that scanning is in progress.

Google Vision Api Key

Hello, I'm having an issue with this code.
The Google Vision API key has expired. What can I do?

Update images for toolbar buttons

Library button
Camera button
Scan button

Show prompt when document is empty.

Add "role" to fragment types.

Role describes the position a person fills at an organisation.

Add "Add new field" button on document screen

UI: Improve field type indicators

Coloured dots are shown next to each field. The dots are intended to indicate the field type. The colour is ambiguous without context.

Goal: Add a legend to indicate the field type, or add icons instead of dots, or remove indicators entirely and rely on section headers.

Context aware actions for email address field

Send email

Context aware actions

Actions which can be performed on any field:

Copy
Share
Delete

Define abstract interface to be implemented by model objects. Interface should define the actions which the object can perform.

Define abstract interface for actions. Actions do not have state. An action is simply an interface to a task which can be executed. Actions may need to be aware of the view hierarchy (i.e. view controller) to present UI. Do actions need to notify the application on completion? An action may be shown as a table view action (delete), or as an activity. Actions may need to define a presentation intent.
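As a starting point for discussion, the two interfaces could be shaped like this. It is a sketch under assumptions: the names (`FieldAction`, `Actionable`, `PresentationIntent`, `CopyAction`, `DocumentField`) are hypothetical, view-controller awareness is left out, and the copy target is injected as a closure instead of using the system pasteboard.

```swift
/// How an action wants to be presented.
enum PresentationIntent {
    case tableRowAction   // e.g. swipe-to-delete
    case activity         // e.g. entry in a share sheet
}

/// A task that can be executed on a field value.
/// The completion handler reports whether the task succeeded.
protocol FieldAction {
    var title: String { get }
    var intent: PresentationIntent { get }
    func perform(on value: String, completion: (Bool) -> Void)
}

/// Implemented by model objects to advertise what they can do.
protocol Actionable {
    var actions: [FieldAction] { get }
}

/// Example action: copies the value using an injected writer.
struct CopyAction: FieldAction {
    let title = "Copy"
    let intent = PresentationIntent.activity
    let write: (String) -> Void   // hypothetical pasteboard hook

    func perform(on value: String, completion: (Bool) -> Void) {
        write(value)
        completion(true)
    }
}

/// Minimal model object exposing its actions.
struct DocumentField: Actionable {
    let value: String
    let actions: [FieldAction]
}
```

The completion handler is one possible answer to the open question about notifying the application; a delegate or notification would work equally well.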

Support raw image format to capture lossless images

App extension for scanning images from imaging apps (i.e. Photos app and camera roll).

Address sometimes parsed as two separate parts

Occurs when the scanned text data contains recognisable address data interleaved with other data. The app does not recognise that the two parts of data are related.

Related address parts should be merged into a single entity. Unrelated addresses should remain separate.

Possible solutions:

Use coordinate proximity to determine relationship.
Merge by matching data with corresponding missing fields. E.g. If A has a street but no country, and B has a country but no street, then the addresses can be merged.
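The second option (merging by complementary fields) could be sketched as below. This version takes a strict reading: two partial addresses merge only if no component is filled in on both sides. The `Address` shape and `mergeIfComplementary` are hypothetical, not the app's model.

```swift
struct Address: Equatable {
    var street: String?
    var city: String?
    var postalCode: String?
    var country: String?
}

/// Returns the merged address when the two parts do not overlap,
/// or nil when any component is present in both (treated as a conflict).
func mergeIfComplementary(_ a: Address, _ b: Address) -> Address? {
    let pairs: [(String?, String?)] = [
        (a.street, b.street),
        (a.city, b.city),
        (a.postalCode, b.postalCode),
        (a.country, b.country),
    ]
    if pairs.contains(where: { $0.0 != nil && $0.1 != nil }) {
        return nil
    }
    return Address(
        street: a.street ?? b.street,
        city: a.city ?? b.city,
        postalCode: a.postalCode ?? b.postalCode,
        country: a.country ?? b.country
    )
}
```

For example, a part holding only a street and city merges with a part holding only a country, while two parts that each have a street are left separate.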

This may be resolved using Microsoft Vision API which groups information differently.

Alternatively, allow user to select addresses to merge. Use case:

Tap on address.
Tap merge button on context menu.
List of all other addresses appears.
Tap address to merge into.
Show preview of merged address. Corresponding fields which both contain content are concatenated. Alternatively user can control the field merging by selecting the fields to be included.
A new object is created with the merged data. The merged objects are deleted.

Library images do not appear after granting app access permission

Reproduce:

Add an image to the photo library or camera roll on the device.
Install app
Launch app
Tap library button
Permissions alert appears
Grant permissions
List is empty
Dismiss library
Tap library button again
List contains items

Identify faces in images

Support face identification and cropping (Google, Microsoft, etc)

Search

Search documents from listing screen

In-app text recognition using Tesseract

Improve organisation name detection

Possible solutions:

Use lexical analysis: the presence of common and possessive nouns may indicate that a name is an organisation. This can produce false positives (e.g. "Bill Hammer") and will not work for acronyms (e.g. "NASA").
Use context: relative text coverage (big names are probably organisations), names near logos are probably organisations.

Swipe to delete field on document screen.

Add support for Haven On-demand for entity extraction

Add support for Microsoft entity linking API

Support Microsoft entity linking API for entity extraction.

https://www.microsoft.com/cognitive-services/en-us/entity-linking-intelligence-service

Show grid layout on iPad (also landscape view on iPhone)

Currently this uses a table view which spans the width of the screen.

Normalize image orientation

Image orientation metadata is not used when rendering annotation overlays. The image should be rendered to remove the orientation, or the annotations should be rendered using orientation.

Integrate MonkeyLearn data extraction for phone numbers, email addresses, postal addresses, dates.

https://app.monkeylearn.com/main/extractors/ex_dqRio5sG/

Refactor data model

Current: Raw data from data detector is stored as fragments with annotations demarcating the detected data. The user views and edits fragments directly.

Problem: Fragment data does not correspond directly to the user's needs. E.g. changing a field to a different type, inserting a new field, or removing a field, causes the data to no longer correspond to the scanned data.

Goal: Decouple scanned data from user data. Data should be modelled to better fit the intent of user modification. Original data should be retained if needed separately from user modification.

Context aware actions for addresses

Open in maps

Add support for Microsoft Computer Vision API for text recognition

Support Microsoft computer vision API for extracting text from images.

https://www.microsoft.com/cognitive-services/en-us/computer-vision-api

Improve editing

Current: Fields are grouped by type. Fields are edited inline. Field type is changed by dragging to a different section.

Problem: Editing controls (edit, add, move) make the view feel busy and crowded, which impedes usability. Dragging fields is problematic: sections may be off screen, requiring scrolling while dragging, which is hard to do reliably, and the user may not know which direction to drag a field.

Goal: Tap on a field to show an edit screen for that field. Show a picker with field types. Customise the view to accommodate the data being edited (allow multiple lines for addresses, a single line for phone numbers and email).
