
SAMBox PDF processor

An Apache PDFBox fork intended to be used as the PDF processor for Sejda and PDFsam related projects.

What's different from PDFBox?

  • Requires JDK8
  • Lazy loading/parsing of PDF objects. Only the document xref table(s)/stream(s) are parsed up front, to collect the information needed to look up each object. When a PDF object is later requested, it is located and parsed using that lookup information. This keeps the memory footprint minimal when you need only part of the document (e.g. just the information dictionary or the number of pages).
  • Multiple I/O implementations to read from. SAMBox uses Sejda-io, which lets you choose one of the provided implementations based on java.nio.channels.FileChannel, java.io.InputStream or java.nio.MappedByteBuffer (buffered or not).
  • Reduced garbage collection pressure through the use of a pool of reusable java.lang.StringBuilder instances.
  • PDF streams are read directly from the underlying source through the concept of bounded views.
  • Use and discard of lazy PDF objects. PDF objects can be written (synchronously or asynchronously) and discarded as soon as they have been written. This is particularly useful with existing documents, where objects are lazily loaded, written and then discarded, keeping the memory footprint low.
  • All the I/O and parsing logic has been refactored to smaller classes which are nearly 100% unit tested.
  • Some PDFBox features are currently of no use to Sejda or PDFsam and have been removed from SAMBox (preflight validator, FDF, digital signatures...).
  • Documents can be saved using object streams to store PDF objects.
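The lazy loading described in the list above can be illustrated with a small, self-contained Java sketch. This is a toy model under stated assumptions, not SAMBox's implementation or API: the names LazyXrefDemo, LazyObjectStore, get and demoStore are hypothetical, the "document" is a tiny ASCII string, and the xref is modelled as a plain map from object number to byte offset. Nothing is parsed when the store is created; an object is parsed only the first time it is requested, and the result is cached.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of xref-driven lazy object loading (hypothetical names, NOT the SAMBox API).
public class LazyXrefDemo {

    // Hypothetical store: knows where each object starts, parses it only on first request.
    static final class LazyObjectStore {
        private final byte[] source;
        private final Map<Integer, Integer> xref;          // object number -> byte offset
        private final Map<Integer, String> cache = new HashMap<>();
        int parseCount = 0;                                 // objects actually parsed so far

        LazyObjectStore(byte[] source, Map<Integer, Integer> xref) {
            this.source = source;
            this.xref = xref;
        }

        String get(int objectNumber) {
            // Parse on first access, reuse the cached value afterwards.
            return cache.computeIfAbsent(objectNumber, this::parseAt);
        }

        private String parseAt(int objectNumber) {
            Integer offset = xref.get(objectNumber);
            if (offset == null) {
                throw new IllegalArgumentException("No xref entry for object " + objectNumber);
            }
            parseCount++;
            // Read from the object's offset; the body sits between "obj" and the next "endobj".
            String rest = new String(source, offset, source.length - offset, StandardCharsets.US_ASCII);
            int start = rest.indexOf("obj") + 3;
            int end = rest.indexOf("endobj");
            return rest.substring(start, end).trim();
        }
    }

    // Builds a tiny ASCII "document" and its xref map; nothing is parsed here.
    static LazyObjectStore demoStore() {
        String[] bodies = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Count 3 >>",
            "<< /Producer (demo) >>"
        };
        StringBuilder doc = new StringBuilder("%PDF-1.7\n");
        Map<Integer, Integer> xref = new HashMap<>();
        for (int i = 0; i < bodies.length; i++) {
            int objectNumber = i + 1;
            xref.put(objectNumber, doc.length()); // ASCII only, so char index == byte offset
            doc.append(objectNumber).append(" 0 obj\n").append(bodies[i]).append("\nendobj\n");
        }
        return new LazyObjectStore(doc.toString().getBytes(StandardCharsets.US_ASCII), xref);
    }

    public static void main(String[] args) {
        LazyObjectStore store = demoStore();
        System.out.println(store.get(3));
        System.out.println(store.get(3));
        System.out.println("parsed objects: " + store.parseCount);
    }
}
```

Running main prints the producer dictionary twice and then "parsed objects: 1": objects 1 and 2 are never parsed. A real PDF parser also has to handle compressed objects stored inside object streams, incremental updates and damaged xref data, all of which this sketch ignores.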

Are PDFBox commits merged to SAMBox?

SAMBox is a fork of a SNAPSHOT of PDFBox 2.0.0, and we try to keep it aligned with PDFBox. Since the fork, both we and the PDFBox developers have made massive changes to the codebase, so merging changes from the PDFBox trunk is sometimes challenging, but we do our best.

How do I load a document?

Here is a snippet that loads a document from file and writes it to newFile.

    try (PDDocument document = PDFParser.parse(SeekableSources.seekableSourceFrom(file))) {
        // save to newFile, writing the cross-reference data as an xref stream
        document.writeTo(newFile, WriteOption.XREF_STREAM);
    }

Tuning

Some system properties are available to modify SAMBox's default behaviour. Look at the org.sejda.io.SeekableSources and org.sejda.sambox.SAMBox classes to see which ones are currently available.
