As an aside, coding for this project, initially a toy, began during a course I took in my senior year at Transylvania University.
- OCR - edit, mine and store text from scanned images of handwritten, typewritten or printed text.
- File Manager - including support for `*.pdf` and `*.docx`
- Text Analytics - bag-of-words and beyond, supporting per-page term frequency counts. Lets users structure the semantic content of even their longest, densest documents.
- Radial Document Visualization - visualize document-term matrices, including per-page term frequency counts, in an easy-to-read circular format courtesy of the Circos Graph API. Imagine having 1000 unread documents in your workspace, each with 1000 pages of unread text. How quickly could you determine which page is most relevant to your investigation?
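
To make the per-page term frequency idea concrete, here is a minimal Python sketch of bag-of-words counting and a naive "most relevant page" lookup. It is an illustration only, not the project's actual implementation: the function names (`tokenize`, `page_term_counts`, `most_relevant_page`), the simple regex tokenizer, and the sum-of-counts scoring are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def page_term_counts(pages):
    """Return one bag-of-words Counter per page: a tiny page-term matrix."""
    return [Counter(tokenize(page)) for page in pages]


def most_relevant_page(pages, query):
    """Score each page by summed counts of the query terms; return (index, score).

    Hypothetical scoring for illustration; ties resolve to the earliest page.
    """
    if not pages:
        raise ValueError("pages must not be empty")
    counts = page_term_counts(pages)
    terms = tokenize(query)
    scores = [sum(c[t] for t in terms) for c in counts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


pages = [
    "The contract was signed in March.",
    "Payment terms: payment is due within thirty days of invoice.",
    "Appendix with contact details.",
]
best_page, score = most_relevant_page(pages, "payment invoice")
print(best_page, score)
```

Raw counts favor long pages, so a real ranking would likely normalize by page length or use a weighting such as TF-IDF. The per-page counters are the kind of data a radial plot of a document-term matrix would consume.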