

galkahana commented on July 20, 2024

Hi @mscdex,

Thanks for your question.
I am not in possession of benchmark results for either PDFWriter or HummusJS.

Back when I first started PDFWriter, I used it in one of my projects to replace a certain variant of PDFLib.
I tested it then and saw that it performed much better. This was about 4 years ago. The company I was working for at the time was XMPie, which is a branch of Xerox. They have since used PDFWriter as the PDF writing component across their projects. They may have numbers; maybe you can contact them to get them.
Both libs performed better than Adobe PDFLib at the time, and both were certainly cheaper.

I did recently compare HummusJS to PhantomJS-based PDF creation, and it clearly performed better, since there is no heavyweight, full-fledged browser behind it.
But that comparison may be a little unfair.

In any case, beyond my own experience, I'm using "high performance" to describe both because of some very strict performance-oriented design principles that the library follows.
I'll list some:

  1. PDFWriter writes the PDF in one go, and on the go. It writes the content as soon as the user calls the command to draw it. This is as opposed to other PDF writing models that prefer
    to build a certain structure in memory and flush it to a PDF at the end. This helps maintain a fairly constant, low memory footprint, which is very good in server environments where one should be able
    to start many coexisting processes and keep them running. In addition, the library makes sure to write all content in one pass, never going back to rewrite elements, which allows for a non-seekable output option. No seeks, less work, better performance.
  2. Same goes for parsing - in both PDF embedding scenarios and plain parsing, the parser element of PDFWriter reads exactly what is required and no more. Other models read the whole PDF and only then can you start asking them for favors. Not Hummus. Hummus reads the minimal catalog of the PDF, and then it reads only what's absolutely necessary per the request. If a page is to be parsed, only the top level page element is read.
    This makes parsing and embedding (which requires parsing) perform significantly better than other solutions. Oh, and let's not forget PDF modification, which is also an option in PDFWriter/HummusJS, and obviously requires some parsing.
  3. JPEG, TIFF decoding is minimal - apart from PDF, Hummus also supports JPEG and TIFF images. For both formats the images are written pretty much as is, relying mostly on PDF image capabilities. This is instead of parsing and decoding the images and writing them in some intermediate format, something which may enable some manipulation of the image, but costs in performance.
  4. Fonts are written in their closest matching format - when embedding fonts PDFWriter picks the closest possible match among PDF font formats (Type 42, CFF). This preserves quality and keeps computation to a minimum. For instance, I have seen rendering engines that convert Type 42 to Type 1, and that's less favorable.
  5. No mediators - not using PhantomJS or any other middle-man engine, just direct PDF writing. This means that I have to work hard for every feature, but anything that comes out is direct, and hopefully fast. If not... the code is there to improve.
  6. Custom stream writing - other solutions may require that you write to a file, only to then later write to your real target. Hummus lets you provide a custom stream object, allowing you to write directly to, say, a response stream, instead of having the unpleasant IO exchange. Most of the reading that Hummus does (images, fonts) allows for something similar.
  7. Optimizations welcome - last but not least, if someone reports a performance issue I make sure to look into it. As said, "high performance" is something that we look at very carefully. I prefer it over features. As an example, I didn't go into linearized PDFs because they hinder writing performance by design, requiring at least an additional writing phase.
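
To make points 1 and 6 a bit more concrete, here is a minimal Python sketch of the "write once, never seek back" idea. It is not PDFWriter's or HummusJS's actual code or API; the names `ForwardOnlyPDFWriter` and `NonSeekableSink` are hypothetical and exist only for this illustration. The assumption is that the writer counts the bytes it emits, so it can record each object's offset as it goes and append the cross-reference table at the very end, without ever calling `seek()` or `tell()` on the target.

```python
class NonSeekableSink:
    """Hypothetical stand-in for a socket or HTTP response stream."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

    def seek(self, *args):
        raise OSError("sink is not seekable")


class ForwardOnlyPDFWriter:
    """Toy writer: emits every object immediately and never rewrites anything."""

    def __init__(self, stream):
        # The target only needs write(bytes); seek()/tell() are never used.
        self._stream = stream
        self._position = 0   # bytes emitted so far, tracked by the writer itself
        self.offsets = {}    # object number -> byte offset in the output
        self._emit(b"%PDF-1.4\n")

    def _emit(self, data):
        self._stream.write(data)
        self._position += len(data)

    def write_object(self, number, body):
        # Record where the object starts, then flush it right away.
        self.offsets[number] = self._position
        self._emit(b"%d 0 obj\n" % number + body + b"\nendobj\n")

    def end(self, root_number):
        # The xref table goes last, so it can be built from the recorded
        # offsets. Assumes objects are numbered 1..n with no gaps.
        xref_start = self._position
        count = max(self.offsets) + 1
        self._emit(b"xref\n0 %d\n" % count)
        self._emit(b"0000000000 65535 f \n")
        for number in range(1, count):
            self._emit(b"%010d 00000 n \n" % self.offsets[number])
        self._emit(b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (count, root_number))
        self._emit(b"startxref\n%d\n%%%%EOF\n" % xref_start)


sink = NonSeekableSink()
writer = ForwardOnlyPDFWriter(sink)

# Page content is written first; the objects that refer to it follow later.
content = b"0 0 100 100 re f"  # a filled rectangle
writer.write_object(
    4, b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"
)
writer.write_object(
    3,
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
    b"/Resources << >> /Contents 4 0 R >>",
)
writer.write_object(2, b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>")
writer.write_object(1, b"<< /Type /Catalog /Pages 2 0 R >>")
writer.end(root_number=1)

pdf_bytes = b"".join(sink.chunks)
```

The sink here collects chunks only so the result can be inspected; in a server it could just as well be a response stream, and the memory held by the writer would stay limited to the offsets table.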

Hope this helps,
Best regards,
Gal.

from pdf-writer.
