
#############################################################################
# CAMSCAN --
#   An application for producing useful documents from images.
#
# Contents:
#   1. Project set-up
#   2. Design overview
#   3. Testing
#   4. Additional resources
#
# Authors:
#  1. Michelle Micallef
#  2. David Storch
#  3. Sam Birch
#  4. Stylianos Anagnostopoulos
#
# BROWN UNIVERSITY
# May 2011
#
#############################################################################



#############################################################################
# 1. Project set-up
#############################################################################

a) core
===
This is the source code folder containing the core of the system. The core
consists of a main delegator class (CoreManager.java) and classes that
implement our document model (e.g. Document.java and Page.java).
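
The sketch below shows the general shape a document model like this might take: a document owns an ordered list of pages, and each page points at its files in the workspace. It is written in Python only for illustration, because the real classes are Java. Every field and method name here (raw_path, processed_path, text, add_page, move_page) is hypothetical and not taken from Document.java or Page.java.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """One imported image plus its derived data (hypothetical fields)."""
    raw_path: str                          # copy of the original image in workspace/raw
    processed_path: Optional[str] = None   # output of the vision step, if it has run
    text: str = ""                         # OCR result, empty until OCR has run


@dataclass
class Document:
    """A named, ordered collection of pages."""
    name: str
    pages: List[Page] = field(default_factory=list)

    def add_page(self, page: Page) -> None:
        """Append a page to the end of the document."""
        self.pages.append(page)

    def move_page(self, old_index: int, new_index: int) -> None:
        """Reorder pages, e.g. when scans were imported out of order."""
        page = self.pages.pop(old_index)
        self.pages.insert(new_index, page)
```

Keeping page order inside the document object means that reordering never touches the image files on disk. Only the model, and later its serialized form, changes.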

b) gui
===
The gui/ directory is a second source code location which contains the code
implementing CamScan's graphical user interface.

c) managers
===
This third and final source code directory contains all of the sub-modules
involved in CamScan. Each module is implemented inside its own package.
As of this writing there are four modules:
	-export
	-ocr
	-search
	-vision
	
d) libraries
===
The libraries/ directory contains most of CamScan's dependencies.
For example, libraries/jar contains several jar files which must be included
in the Java build path in order for CamScan to run. The libraries/icons
folder contains the images that appear on the GUI as icons.

e) workspace
===
This is the directory where CamScan automatically keeps user data.
Specifically, raw image files imported by the user are always copied
to workspace/raw. Processed versions of the raw files are temporarily
written to workspace/processed. Temporary files produced by Tesseract
are written to workspace/temp. The document metadata itself is stored
in workspace/docs as a series of XML files.
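
As a minimal sketch of this layout, the snippet below creates the four workspace subdirectories and copies an imported image into workspace/raw without modifying the user's original file. The directory names come from this README. The function names init_workspace and import_image are hypothetical, and CamScan's own Java code may organise this differently.

```python
import shutil
from pathlib import Path
from typing import Dict

# Subdirectory names as described in this README.
WORKSPACE_SUBDIRS = ("raw", "processed", "temp", "docs")


def init_workspace(root: Path) -> Dict[str, Path]:
    """Create workspace/{raw,processed,temp,docs} under root and return their paths."""
    dirs = {name: root / "workspace" / name for name in WORKSPACE_SUBDIRS}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)  # safe to call on every start-up
    return dirs


def import_image(src: Path, dirs: Dict[str, Path]) -> Path:
    """Copy a user-selected image into workspace/raw; the source file is left untouched."""
    dest = dirs["raw"] / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest
```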

f) tests
===
Contains test data used in development of CamScan. See Section 3 of this
readme for more on testing.


#############################################################################
# 2. Design overview
#############################################################################

a) Introduction
===
Our main design principle is that the GUI calls methods from the core of
the system, and the core delegates to independent submodules. These modules
can be executed independently of the rest of the system; they do not rely on
one another or call each other's functions.
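
The delegation pattern described above can be sketched as follows. This is an illustrative Python stand-in, not CamScan's Java code. Only the name CoreManager comes from this README. The module classes, the method names, and their placeholder results are assumptions. The key point is that the modules never reference each other, so the core is the only place where steps are chained.

```python
class VisionModule:
    """Stand-in for the vision package; it knows nothing about OCR or the GUI."""

    def run(self, image_path: str) -> str:
        # Placeholder: pretend the processed image is written under workspace/processed.
        return image_path.replace("raw", "processed")


class OcrModule:
    """Stand-in for the ocr package; it knows nothing about vision or the GUI."""

    def run(self, image_path: str) -> str:
        return f"text from {image_path}"  # placeholder OCR result


class CoreManager:
    """Single entry point for the GUI; it forwards each request to the submodules."""

    def __init__(self, vision: VisionModule, ocr: OcrModule) -> None:
        self._vision = vision
        self._ocr = ocr

    def process_page(self, raw_path: str) -> str:
        # The core chains the steps: vision output is handed to OCR here,
        # so neither module needs to call the other.
        processed = self._vision.run(raw_path)
        return self._ocr.run(processed)
```

Because the modules receive plain inputs from the core, each one can be unit tested or run on its own. This independence is what the design principle above relies on.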

Another important design principle is that the core represents the state
of the system at any given time. The core also provides serialization
functionality, which writes the state of the system to disk according to an
XML-based specification. Serialization functions are called at appropriate
times, so the in-memory state of the system stays synchronized with the data
written to disk.
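
A minimal sketch of this kind of XML round trip is shown below, using Python's standard xml.etree.ElementTree module. The schema (a document element with a name attribute and indexed page children) is hypothetical. CamScan's actual XML specification is not reproduced in this README.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple


def save_document(name: str, page_paths: List[str], out_path: Path) -> None:
    """Write a document's state to an XML file (hypothetical schema)."""
    root = ET.Element("document", attrib={"name": name})
    for i, path in enumerate(page_paths):
        # Store the page order explicitly instead of relying on element order.
        ET.SubElement(root, "page", attrib={"index": str(i), "raw": path})
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)


def load_document(in_path: Path) -> Tuple[str, List[str]]:
    """Rebuild (name, page_paths) from a file written by save_document."""
    root = ET.parse(in_path).getroot()
    pages = sorted(root.findall("page"), key=lambda e: int(e.get("index")))
    return root.get("name"), [e.get("raw") for e in pages]
```

Calling save_document after every change that matters, and load_document at start-up, is one simple way to keep the in-memory state and the files in workspace/docs in sync.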

b) Languages
===
The core part of the system is written in Java. The computer vision module
makes use of a Java wrapper for OpenCV, so the image processing itself is
outsourced to native C/C++ code.

The OCR module makes use of a Python script in order to extract useful
data from the output of Tesseract. This was useful because a lightweight
Python script based on regular expressions could do the same work as a much
more IO-intensive Java program.
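
To give a feel for the regex-based approach, here is a small sketch that pulls words and their bounding boxes out of Tesseract output. It assumes the input is in hOCR format, where each word is a span whose title attribute begins with "bbox x0 y0 x1 y1". This README does not say which Tesseract output format CamScan's script actually parsed, so treat the pattern and the function name extract_words as illustrative assumptions.

```python
import re
from typing import List, Tuple

# A word span (class ocr_word or ocrx_word) whose title starts with "bbox x0 y0 x1 y1".
# Assumes the class attribute appears before title, as in typical hOCR output.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx?_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>"
    r"(.*?)</span>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_words(hocr: str) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Return (word, (x0, y0, x1, y1)) pairs in document order."""
    words = []
    for m in WORD_RE.finditer(hocr):
        x0, y0, x1, y1 = (int(g) for g in m.group(1, 2, 3, 4))
        text = TAG_RE.sub("", m.group(5)).strip()  # drop inline tags such as <strong>
        if text:
            words.append((text, (x0, y0, x1, y1)))
    return words
```

Word positions of this kind are what a search feature or a text layer in an exported PDF would typically need. A few regular expressions cover the extraction without building a full HTML parser.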

The PDF export functionality is written almost entirely in Python. We are
using ReportLab, a convenient PDF drawing library for Python.

#############################################################################
# 3. Testing
#############################################################################

a) Unit tests
===
The modules have been unit tested separately. All of the tests can be run
at once by invoking the "testall" shell script, found in the tests/
subdirectory of the project folder. This script simply calls a series of
unit-testing scripts in turn; see the comments inside testall.sh for
details.
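
The "run a series of test scripts" pattern is sketched below in Python, purely for illustration. The real runner is the shell script testall.sh, and its contents are not reproduced here. The function name run_all and any script paths you pass to it are hypothetical. This version keeps going after a failure and reports how many commands failed.

```python
import subprocess
from typing import List, Sequence


def run_all(commands: Sequence[List[str]]) -> int:
    """Run each test command in turn and return how many of them failed.

    Every command is run even if an earlier one fails, so the summary
    covers the whole suite.
    """
    failures = 0
    for cmd in commands:
        result = subprocess.run(cmd)  # output goes straight to the terminal
        ok = result.returncode == 0
        print(f"{' '.join(cmd)}: {'ok' if ok else f'FAILED ({result.returncode})'}")
        if not ok:
            failures += 1
    return failures
```

A caller would pass one command per module, for example `run_all([["sh", "tests/test_ocr.sh"], ["sh", "tests/test_export.sh"]])` (hypothetical paths), and exit non-zero if the result is greater than zero.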

b) System tests
===
The user interface and system integration were tested interactively on
a set of standard test data. Our test data includes both images and test
XML documents, located in tests/images/ and tests/xml/ respectively.

For more information on system tests, see tests/TESTING.txt.

#############################################################################
# 4. Additional resources
#############################################################################

For instructions on how to use CamScan, see our user guide!

