Basic PDF parser whose functionality is limited to parsing out text and image bounding rectangles.
Generate demo.pdf by running:
chrome --headless --print-to-pdf="demo.pdf" demo.html
Serve the directory locally:
npx serve .
https://skia.org/dev/design/pdftheory
https://en.wikipedia.org/wiki/PDF#File_structure
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf (the last freely available version before ISO started selling the standard)
https://www.pdfexaminer.com can be used to visualize a PDF's internal structure and cross-reference it with my code.
pdf.js src/display/canvas.js: CanvasGraphics
For now, the browser is used to provide ArrayBuffer and DataView.
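As a rough illustration of what ArrayBuffer and DataView are used for here, the sketch below reads two landmarks from raw PDF bytes: the version in the "%PDF-x.y" header and the byte offset written after the last "startxref" keyword. The helper names are hypothetical, and the 1024-byte search window at the end of the file is an assumption, not something this project specifies.

```javascript
// Sketch only: read basic PDF landmarks from an ArrayBuffer via a DataView.
// bytesToString, readHeaderVersion and findStartXref are hypothetical names.

// Decode bytes [start, end) one character per byte (Latin-1). This is
// enough for the ASCII keywords searched for here.
function bytesToString(view, start, end) {
  let s = "";
  for (let i = start; i < end; i++) s += String.fromCharCode(view.getUint8(i));
  return s;
}

// Return the version from the "%PDF-x.y" header, or null if there is none.
function readHeaderVersion(buffer) {
  const view = new DataView(buffer);
  const head = bytesToString(view, 0, Math.min(16, view.byteLength));
  const match = /^%PDF-(\d\.\d)/.exec(head);
  return match ? match[1] : null;
}

// Find the last "startxref" keyword near the end of the file and return the
// number after it (the byte offset of the last cross-reference section).
// The default 1024-byte tail window is an assumption.
function findStartXref(buffer, tailSize = 1024) {
  const view = new DataView(buffer);
  const start = Math.max(0, view.byteLength - tailSize);
  const tail = bytesToString(view, start, view.byteLength);
  const idx = tail.lastIndexOf("startxref");
  if (idx === -1) return null;
  const match = /^startxref\s+(\d+)/.exec(tail.slice(idx));
  return match ? Number(match[1]) : null;
}
```

In the browser, the buffer could come from fetch("demo.pdf") followed by response.arrayBuffer().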
https://wiki.tcl-lang.org/page/Parsing+PDF might have info on this.
Be stricter about allowed value formats.
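Ensure that the number of cross-reference entries read equals the count declared in the subsection header, by reading the entries in a for loop bounded by that count and then expecting the trailer keyword.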
For this, proper parsing of the individual value formats needs to be implemented first, so that values such as names and strings are not cut short by otherwise significant characters, and the slash can be used as a stop character when parsing a value.
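To illustrate the kind of value-aware parsing meant here, a minimal sketch of scanning two PDF object types following the rules in the PDF 1.7 reference: a name starts with "/" and ends at whitespace or a delimiter (with #xx hex escapes), and a literal string ends at the ")" that balances its opening "(", with backslash escapes. readName and readLiteralString are hypothetical helper names; hex strings, numbers and other object types are left out, and end-of-line normalization inside strings is not handled.

```javascript
// Sketch only: value-aware scanning for PDF names and literal strings.
// readName and readLiteralString are hypothetical helper names.
const WHITESPACE = "\0\t\n\f\r ";
const DELIMITERS = "()<>[]{}/%";
const ESCAPES = new Map([
  ["n", "\n"], ["r", "\r"], ["t", "\t"], ["b", "\b"], ["f", "\f"],
  ["(", "("], [")", ")"], ["\\", "\\"],
]);

// A name starts with "/" and runs until whitespace or a delimiter, so a
// directly following "/" stops it and begins the next name.
function readName(text, pos) {
  if (text[pos] !== "/") throw new Error(`Expected '/' at ${pos}`);
  let end = pos + 1;
  while (end < text.length && !WHITESPACE.includes(text[end]) && !DELIMITERS.includes(text[end])) {
    end++;
  }
  // "#xx" encodes a byte in hex, e.g. "/A#20B" is the name "A B".
  const value = text
    .slice(pos + 1, end)
    .replace(/#([0-9A-Fa-f]{2})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)));
  return { value, end };
}

// A literal string starts with "(" and ends at the matching ")". Balanced
// parentheses nest and "\" escapes the next character, so "/", ")" and
// other delimiters inside the string do not end it early.
function readLiteralString(text, pos) {
  if (text[pos] !== "(") throw new Error(`Expected '(' at ${pos}`);
  let depth = 1;
  let value = "";
  let i = pos + 1;
  while (i < text.length) {
    const ch = text[i++];
    if (ch === "\\") {
      if (i >= text.length) break;
      const next = text[i++];
      if (ESCAPES.has(next)) {
        value += ESCAPES.get(next);
      } else if (next >= "0" && next <= "7") {
        // One to three octal digits, e.g. "\101" is "A".
        let digits = next;
        while (digits.length < 3 && /[0-7]/.test(text[i] ?? "")) digits += text[i++];
        value += String.fromCharCode(parseInt(digits, 8) & 0xff);
      } else if (next === "\r" || next === "\n") {
        // Backslash + end-of-line is a line continuation and adds nothing.
        if (next === "\r" && text[i] === "\n") i++;
      } else {
        value += next; // Unknown escape: the backslash is ignored.
      }
    } else if (ch === "(") {
      depth++;
      value += ch;
    } else if (ch === ")") {
      if (--depth === 0) return { value, end: i };
      value += ch;
    } else {
      value += ch;
    }
  }
  throw new Error(`Unterminated string starting at ${pos}`);
}
```

Scanning each value according to its type, rather than splitting on "/", is what keeps a slash inside a string from being mistaken for the start of the next name.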
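Assuming the entries and count meant above are those of a classic cross-reference table (the "xref" keyword, subsection headers of the form "first count", entries of the form "nnnnnnnnnn ggggg n" or "... f", then "trailer"), a minimal sketch of the for-loop check might look like this. parseXrefTable and isTrailer are hypothetical names, and cross-reference streams (PDF 1.5 and later) are not covered.

```javascript
// Sketch only: parse a classic cross-reference table whose text starts at
// the "xref" keyword. parseXrefTable and isTrailer are hypothetical names.
function isTrailer(line) {
  return line !== undefined && /^trailer\b/.test(line.trim());
}

function parseXrefTable(text) {
  const lines = text.split(/\r\n|\r|\n/);
  let i = 0;
  if (lines[i++].trim() !== "xref") throw new Error("Expected 'xref'");
  const entries = [];
  // One or more subsections, each introduced by "<first object> <count>".
  while (i < lines.length && !isTrailer(lines[i])) {
    const header = /^(\d+) (\d+)$/.exec(lines[i++].trim());
    if (!header) throw new Error(`Bad subsection header on line ${i}`);
    const first = Number(header[1]);
    const count = Number(header[2]);
    // The for loop reads exactly `count` entries for this subsection.
    for (let n = 0; n < count; n++) {
      const entry = /^(\d{10}) (\d{5}) ([nf])$/.exec((lines[i++] ?? "").trim());
      if (!entry) throw new Error(`Expected ${count} entries for subsection ${first}, got ${n}`);
      entries.push({
        objectNumber: first + n,
        // For free ("f") entries this field is the next free object number.
        offset: Number(entry[1]),
        generation: Number(entry[2]),
        inUse: entry[3] === "n",
      });
    }
  }
  // After the last subsection, the trailer keyword is required.
  if (!isTrailer(lines[i])) throw new Error("Expected 'trailer'");
  return entries;
}
```

Because the loop reads exactly count entries, a missing entry fails on the entry pattern, and a surplus entry fails when it is read as the next subsection header.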