kalp695 / camscan Goto Github PK
View Code? Open in Web Editor NEWThis project forked from dstorch/camscan
This project forked from dstorch/camscan
############################################################################# # CAMSCAN -- # An application for producing useful documents from images. # # Contents: # 1. Project set-up # 2. Design overview # 3. Testing # 4. Additional resources # # Authors and Contact: # 1. Michelle Micallef # [email protected] # [email protected] # 2. David Storch # [email protected] # [email protected] # 3. Sam Birch # [email protected] # [email protected] # 4. Stylianos Anagnostopoulos # [email protected] # [email protected] # # BROWN UNIVERSITY # May 2011 # ############################################################################# ############################################################################# # 1. Project set-up ############################################################################# a) core === This is the source code folder containing the core of the system. The Core consists of a main delegator class (CoreManager.java), and classes which implement our document model (e.g. Document.java and Page.java). b) gui === The gui/ directory is a second source code location which contains the code implementing CamScan's graphical user interface. c) managers === This third and final source code directory contains all of the sub-modules involved in CamScan. Each module is implemented inside its own package. As of this writing there are four modules: -export -ocr -search -vision d) libraries === The libraries/ directory contains most of CamScan's dependencies. For example, libraries/jar contains sever jar files which must be included in the java build path in order for CamScan to run. The libraries/icons folder contains the images that are appear on the GUI as icons. e) workspace === This is the directory where CamScan automatically keeps user data. Specifically, raw image files imported by the user are always copied to workspace/raw. Processed versions of the raw file are temporarily written to workspace/processed. Temporary products produced by Tesseract are written to workspace/temp. The document metadata itself is stored in workspace/docs in a series of XML files. f) tests === Contains test data used in development of CamScan. See Section 3 of this readme for more on testing. ############################################################################# # 2. Design overview ############################################################################# a) Introduction === Our main design principle is that we have a GUI which calls methods from the core of the system, and that the core delegates to independent submodules. These modules can be executed independently from the rest of the system, and do not rely on one another or call each other's functions. Another important design principle is that the core represents the state of the system at any given time. However, the core also provides serialization functionality which writes the state of the system to disk according to an XML-based specification. By calling serialization functions at appropriate times, the state of the system is kept synchronized with data written on the disk. b) Languages === The core part of the system is written in Java. The computer vision module makes use of a Java wrapper for openCV, so the image processing is outsourced to C/C++. The OCR module makes use of a Python script in order to extract useful data from the output of Tesseract. This was useful because a lightweight Python script based on regular expressions could do the same work as a much more IO-intensive Java program. The PDF export functionality is written almost entirely in Python. We are using ReportLab, a convenient PDF drawing library for Python. ############################################################################# # 3. Testing ############################################################################# a) Unit tests === The modules have been separately unit tested. The tests can be executed simultaneously by invoking the "testall" shell script. This script can be found in the tests/ subdirectory of the project folder. This script just calls a series of unit testing scripts. See the commenting inside testall.sh for details. b) System tests === The user interface and system integration were tested interactively on a set of standard test data. Our test data includes both images and test XML documents, located respectively in tests/xml/ and tests/images/. For more information on system tests, see tests/TESTING.txt. ############################################################################# # 4. Additional resources ############################################################################# For instructions on how to use CamScan, see our user guide!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.