
pdf-writer's Introduction

Welcome to PDF-Writer, also known as PDFHummus.
PDFHummus is a Fast and Free C++ Library for Creating, Parsing and Manipulating PDF Files and Streams.

Documentation is available here. Project site is here.

There is also a NodeJS module named MuhammaraJS that wraps the PDFHummus PDF library and makes it available in that language. It is the current, still-supported successor of my now-deprecated HummusJS project, and julianhille maintains it.

First time around

This is a C++ project that uses CMake as its build system. To build or develop it you will need:

  1. A compiler. On Windows you can use Visual Studio (the Community edition will do) - https://visualstudio.microsoft.com/
  2. CMake - download it from https://cmake.org/

Given that this is a library and not an executable, you can ignore the CMake setup and use the code as is by copying the folders into your own project. There are, however, better ways to include the code in your own project. The project's CMake setup defines PDFHummus as a package, which lets you import the project directly from this repo (remotely) or from a pre-installed copy of the package. The instructions below cover building the project locally, testing and installing it, and using CMake's FetchContent functionality to import the project automatically from this repo.

For documentation about how to use the library API, head to the Wiki pages here.

Short tour of the project

This project has 8 folders:

  • FreeType, LibAesgm, LibJpeg, LibPng, LibTiff, Zlib: 6 libraries that PDFWriter depends on. They are bundled here for convenience. You don't have to use them to compile PDFWriter; you can use the versions already installed on your system instead.
  • PDFWriter: main folder, includes the library implementation
  • PDFWriterTesting: test folder, includes the test source code that is run with CMake's testing tool, ctest.

Building, Installing and testing the project with CMake

Once you have installed the prerequisites, you can build the project.

Create the project files

To build your project, start by generating the project files from the CMake configuration into a "build" folder, like this:

mkdir build
cd build
cmake ..

Options for generating CMake project files

The project defines some optional flags to allow you to control some aspects of building PDFHummus.

  • PDFHUMMUS_NO_DCT - defines whether to exclude DCT functionality (essentially, not include LibJpeg) from the library. Defaults to FALSE. When set to TRUE, the library will not require LibJpeg, but it will not be able to decode DCT streams from PDF files. (Note that this has no bearing on the ability to include JPEG images. That ability does not require LibJpeg, given the innate ability of PDF files to include DCT-encoded streams.)
  • PDFHUMMUS_NO_TIFF - defines whether to exclude TIFF functionality (essentially, not include LibTiff) from the library. Defaults to FALSE. When set to TRUE, the library will not require LibTiff, but it will not be able to embed TIFF images.
  • PDFHUMMUS_NO_PNG - defines whether to exclude PNG functionality (essentially, not include LibPng) from the library. Defaults to FALSE. When set to TRUE, the library will not require LibPng, but it will not be able to embed PNG images.
  • USE_BUNDLED - defines whether to build with the bundled dependencies or with system libraries. Defaults to TRUE. When set to FALSE, the project configuration will look for installed versions of LibJpeg, Zlib, LibTiff, FreeType, LibAesgm and LibPng and use them instead of the bundled ones (i.e. those contained in the project). For the optional dependencies (LibJpeg, LibTiff, LibPng), if one is not installed the configuration will still succeed, but it will automatically set the optional building flags (the 3 just described) according to the libraries' availability. For the required dependencies (FreeType, LibAesgm, Zlib), the configuration will fail if they are not found. See USE_UNBUNDLED_FALLBACK_BUNDLED for an alternative way to deal with dependencies that are not found.
  • USE_UNBUNDLED_FALLBACK_BUNDLED - defines an alternative behavior for when USE_BUNDLED=OFF and a certain dependency is not installed on the system. If set to TRUE, any dependency that is not found falls back to its bundled version. In other words, the configuration first attempts to find an installed library and, if none is available, uses the bundled one, so that the build will succeed.
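To make the interplay of USE_BUNDLED and USE_UNBUNDLED_FALLBACK_BUNDLED easier to follow, here is a small C++ model of the per-dependency decision as the options above describe it. It is only an illustration, not the project's actual CMake logic: Resolution, DependencyCheck and resolveDependency are made-up names for this sketch, and it does not model setting the PDFHUMMUS_NO_* flags yourself.

```cpp
#include <cassert>

// Illustrative model of the documented rules, NOT the project's CMake code.
enum class Resolution { Bundled, System, FeatureDisabled, ConfigureError };

struct DependencyCheck {
    bool optional;           // LibJpeg, LibTiff, LibPng: true; FreeType, LibAesgm, Zlib: false
    bool installedOnSystem;  // would the configure step find a system copy?
};

Resolution resolveDependency(const DependencyCheck& dep,
                             bool useBundled,           // USE_BUNDLED
                             bool fallbackToBundled) {  // USE_UNBUNDLED_FALLBACK_BUNDLED
    if (useBundled)
        return Resolution::Bundled;  // default: always use the copy in this repo
    if (dep.installedOnSystem)
        return Resolution::System;   // prefer the installed library
    if (fallbackToBundled)
        return Resolution::Bundled;  // not found, fall back to the bundled copy
    // Not found and no fallback: an optional dependency turns its feature off
    // (the matching PDFHUMMUS_NO_* flag), a required one fails configuration.
    return dep.optional ? Resolution::FeatureDisabled : Resolution::ConfigureError;
}
```

For instance, with USE_BUNDLED=FALSE and no system LibTiff, the model yields FeatureDisabled, which corresponds to PDFHUMMUS_NO_TIFF being set automatically.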

You can set any of these options when calling the cmake command. For example, to use system libraries, replace the earlier sequence with:

cd build
cmake .. -DUSE_BUNDLED=FALSE

Build

Once you have the project files, you can build the project. If you generated IDE project files, you can build the project from the IDE. Alternatively, you can build from the command line, again using cmake.

The following builds the project from its root folder:

cmake --build build [--config release]

This builds the project inside the build folder. The part in brackets is optional and specifies a release configuration build. You can find the resulting library files wherever your build environment normally puts them. For example, on Windows, the build/PDFWriter/Release folder will contain the resulting PDFWriter library file.

Testing

This project uses ctest for running tests. ctest is part of CMake and should be installed along with it. The tests run various checks on PDFHummus... and I admit quite a lot of them are not great as unit tests, as they may just create PDF files without verifying that they are good. One uses one's eyes to inspect the test files for that... or settles for no exceptions being thrown, which is also good. They are decent as sample code for learning how to do things, though 😬.

To run the project tests (after having created the project files in ./build), run:

ctest --test-dir build [-C release]

This should scan the folders for tests and run them. Consider appending -j22 to the command to run tests in parallel and speed things up.

You should be able to see result output files from the tests under ./build/Testing/Output.

Note that ctest does NOT build the project. It may fail if there's no previous build, or it will not pick up your changes if you made them since the last build. For this purpose, the project defines an extra target that makes sure the project and test code are built (and also recreates the output folder to clear the output of previous runs):

cmake --build build --target pdfWriterCheck [--config release]

Installing

If you want, you can use the install verb of cmake to install a built product (the library files and includes). Use the --prefix parameter to specify where you want the result to be installed:

cmake --install build --prefix ./etc/install [--config release]

This will install all the library files in ./etc/install. You should see an "include" folder and a "lib" folder with include files and library files respectively.

Using PDFHummus in your own project

If you want to use PDFHummus, there are several methods:

  • copying the sources to your project
  • installing the project and including the result in your project
  • using PDFHummus package in your cmake project

Not much to say about the first option. The 2nd option just means following the installation instructions and then pointing your build at the resulting lib and include folders.

The 3rd option is probably the best, especially if your project already uses cmake. This project has a package definition for the PDFHummus package, which means you can have cmake look for this package and include it in your project with find_package. Then link to the PDFHummus::PDFWriter target and you are done. Another option is to do this and also allow fetching the project content from the repo with FetchContent. Here's an example from my PDF TextExtraction project:

include(FetchContent)

FetchContent_Declare(
  PDFHummus
  GIT_REPOSITORY https://github.com/galkahana/PDF-Writer.git
  GIT_TAG        v4.6.2
  FIND_PACKAGE_ARGS
)
FetchContent_MakeAvailable(PDFHummus)

target_link_libraries (TextExtraction PDFHummus::PDFWriter)

This will either download the project and build it, or use an installed version (provided that one exists and has a matching version). Change the GIT_TAG value to the version you'd like to install. You can use tags, branches, commit hashes, anything goes. Includes are included haha. Note that FIND_PACKAGE_ARGS requires CMake 3.24 or later.

You may consider an alternative form that uses URL instead of GIT_REPOSITORY, like this:

include(FetchContent)

FetchContent_Declare(
  PDFHummus
  URL https://github.com/galkahana/PDF-Writer/archive/refs/tags/v4.6.2.tar.gz
  URL_HASH SHA256=0a36815ccc9d207028567f90039785c824b211169ba5da68de84d0c15455ab62
  DOWNLOAD_EXTRACT_TIMESTAMP FALSE
  FIND_PACKAGE_ARGS
)

FetchContent_MakeAvailable(PDFHummus)

This has the benefit of downloading the archive URL rather than having cmake run git clone on the specified target. PDFWriter archives since version v4.6.2 do not include the PDFWriterTesting folder and its materials, which makes them a significantly smaller download. You can find the archive URLs in the Releases area of this repository.

Note that when installing PDFHummus with the bundled libraries built (this is the default behavior, which can be changed by setting the USE_BUNDLED variable to FALSE), PDFHummus includes additional targets:

  • PDFHummus::FreeType - bundled freetype library
  • PDFHummus::LibAesgm - bundled aesgm library
  • PDFHummus::LibJpeg - bundled libjpeg library
  • PDFHummus::LibPng - bundled libpng library
  • PDFHummus::LibTiff - bundled libtiff library
  • PDFHummus::Zlib - bundled zlib library

You can use these targets in addition to, or instead of, PDFWriter if this makes sense for your project (for example, if you are extracting images, having LibJpeg or LibPng around can be useful).

Packaging PDFHummus for installing someplace else

The project contains definitions for cpack, CMake's packaging mechanism. This can be useful when you want to build PDFHummus and then install it someplace else.

The following will create a zip file with all libs and includes:

cd build
cpack .

VSCode usage

If you are developing this project using VSCode, here are some suggestions to help you:

  • install the VSCode C++ extensions:
    • C/C++
    • C/C++ Extension Pack
    • C/C++ Themes
  • install the VSCode CMake extensions:
    • CMake
    • CMake Tools
    • CMake Test Explorer

This should help you enable testing and debugging the tests in vscode.

More building instructions for when you can't use cmake

iOS

I wrote a post about how to compile and use the library for the iPhone and iPad environments. You can read it here.

Build instructions for other scenarios

It should be quite simple to construct project files for the various build environments (say VS and Xcode) if you want them. Here are some pointers:

  • All the PDFWriter sources are in PDFWriter folder (you can get it by downloading the git project or from the Downloads section).
  • The library depends on the DLLs/shared libraries of Zlib, LibAesgm, LibTiff, LibJpeg, LibPng and FreeType. When linking, make sure they are available.
  • The library should support both 32-bit and 64-bit environments well. It uses the standard C++ libraries.

pdf-writer's People

Contributors

amrnablus, clairou, da-liii, fajan, filodej, galkahana, gerronimo, ggiorkhelidze, jjelosua, junrrein, kaarrot, katalytikos, manisandro, mathiasborn, mgubi, michut, murakamishinyu, owlycode, stijnherfst, t-gergely, thmclellan, tiliasagen, timgates42, toge, zzzoom

pdf-writer's Issues

Resolution for Copying Context

Hi galkahana,

I'm very impressed by your PDF-Writer project! Love it! 👍

I started to work with it, and one question arose concerning embedding PDF pages into a new PDF file.

Now the problem is: I have a source PDF file which has a specific format and was made with a specific resolution (300 dpi, for example). This resolution should be kept.

In Acrobat Reader there's an option to define the resolution for snapshot tool images (Page Display Preferences -> General). The default is 72 dpi, I guess.

To copy and paste the whole first page of my source PDF, the relevant code looks like this:

firstContext = pdfWriter.CreatePDFCopyingContext(frame_path);

EStatusCodeAndObjectIDType resultFirst = firstContext->CreateFormXObjectFromPDFPage(0, ePDFPageBoxMediaBox);

// placing the pages in a result page
contentContext->q();
contentContext->cm(1,0,0,1,0,0);
contentContext->Do(page->GetResourcesDictionary().AddFormXObjectMapping(resultFirst.second));
contentContext->Q();

So my question: Is it possible to cut out a specific area (or a whole page) from the source PDF file at a defined resolution?

Thanks a lot!
Best regards
Maurice
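For reference when reading the snippet above: the six arguments to cm are the a b c d e f entries of the PDF transformation matrix, which maps (x, y) to (a*x + c*y + e, b*x + d*y + f), and default PDF user space has 72 units per inch. The standalone sketch below is not PDFHummus code (mapRect, Rect and pixelsAt are hypothetical helpers); it computes cm operands that scale and translate a source rectangle onto a target rectangle, plus how many pixels a length in points spans at a given dpi. Actually cutting out an area would presumably also require clipping to it (for example with the PDF re, W and n operators) before drawing the form.

```cpp
#include <cassert>

// Operands of the PDF "cm" operator: x' = a*x + c*y + e, y' = b*x + d*y + f.
struct CmOperands { double a, b, c, d, e, f; };

// Axis-aligned rectangle in PDF user-space units (1/72 inch by default).
struct Rect { double left, bottom, width, height; };

// Scale-and-translate matrix that maps src onto dst (no rotation or skew).
CmOperands mapRect(const Rect& src, const Rect& dst) {
    assert(src.width > 0 && src.height > 0);
    const double sx = dst.width / src.width;
    const double sy = dst.height / src.height;
    return {sx, 0.0, 0.0, sy, dst.left - src.left * sx, dst.bottom - src.bottom * sy};
}

// Number of pixels a length in points spans when rendered at the given dpi.
double pixelsAt(double points, double dpi) {
    return points / 72.0 * dpi;
}
```

Passing the resulting six values to cm in place of the identity (1,0,0,1,0,0) would place the chosen source region over the target rectangle.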

Image Extraction giving invalid files

Hi, I tried to use the following code to extract all the streams in a PDF file. Some of them were supposed to be images, since the source PDF is a scanned document of 3 pages, each ~30KB in size (all of them different).
However, the only 3 extracted streams that weren't ASCII text with PDF commands each had exactly 460290 bytes, and most of their content was just a lot of "ÿ" characters (viewed through vim). These 3 streams are different, though.
Am I doing something wrong, or did I stumble upon a bug?

More importantly, how can I achieve what I want, i.e. extract images from PDFs, preferably detecting whether there's an image per page and its corresponding layout?

Code follows:

    PDFWriter pdfWriter;
    EStatusCode status;
    PDFRectangle a4size(0,0,595,842);

    status = pdfWriter.StartPDF("mark_test_out.pdf", ePDFVersion13);
    if(status != eSuccess)
        return status;

    PDFDocumentCopyingContext *cp_ctx = pdfWriter.CreatePDFCopyingContext("mark_test.pdf");
    PDFParser *parser = cp_ctx->GetSourceDocumentParser();

    for(unsigned long i = 0; i < parser->GetObjectsCount(); ++i)
    {
        PDFObject *obj = parser->ParseNewObject(i);
        if(!obj) continue;
        PDFObject::EPDFObjectType type = obj->GetType();
        fprintf(stderr, "Object %lu is of type %d\n", i, type);
        if(type == PDFObject::ePDFObjectIndirectObjectReference)
        {
            // ParseNewObject returns a new reference owned by the caller,
            // so release the reference object before replacing it
            PDFObject *target = parser->ParseNewObject(static_cast<PDFIndirectObjectReference *>(obj)->mObjectID);
            obj->Release();
            obj = target;
            if(!obj) continue; // the referenced object may be missing
            type = obj->GetType();
            fprintf(stderr, "\tIndirect object is of type %d\n", type);
        }
        if(type == PDFObject::ePDFObjectStream)
        {
            fprintf(stderr, "\tFound a stream\n");
            std::string fname(std::to_string(i));
            std::ofstream outfile(fname.c_str(), std::ofstream::binary);

            IByteReader *stream = parser->StartReadingFromStream(static_cast<PDFStreamInput*>(obj));
            while(stream && stream->NotEnded()) // no reader is returned on failure
            {
                IOBasicTypes::Byte buf[4096];
                outfile.write((const char *)buf, stream->Read(buf, sizeof(buf)));
            }

            delete stream;
        }
        obj->Release(); // done with this object
        fprintf(stderr, "\n");
    }
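One assumption worth checking with code like this is whether the reader returned the stream's decoded samples rather than the original encoded image. A quick way to tell is to look at the leading bytes of each dump: a JPEG file starts with FF D8 FF, while decoded image samples carry no signature at all, and their total size follows from the image dimensions. The sketch below is a standalone helper, not part of PDFHummus; classifyImageBytes and rawSampleSize are hypothetical names, and the row padding follows the PDF rule that each image row starts on a byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

enum class ImageBytesKind { Jpeg, Png, Tiff, Unknown };

// Identifies an encoded image file by its leading "magic" bytes.
ImageBytesKind classifyImageBytes(const std::vector<std::uint8_t>& bytes) {
    auto startsWith = [&bytes](std::initializer_list<std::uint8_t> sig) {
        if (bytes.size() < sig.size()) return false;
        std::size_t i = 0;
        for (std::uint8_t b : sig)
            if (bytes[i++] != b) return false;
        return true;
    };
    if (startsWith({0xFF, 0xD8, 0xFF})) return ImageBytesKind::Jpeg;
    if (startsWith({0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A})) return ImageBytesKind::Png;
    if (startsWith({'I', 'I', 0x2A, 0x00}) || startsWith({'M', 'M', 0x00, 0x2A}))
        return ImageBytesKind::Tiff;
    return ImageBytesKind::Unknown;  // e.g. decoded samples, which have no header
}

// Size in bytes of uncompressed image samples; each row is padded to a whole byte.
std::size_t rawSampleSize(std::size_t width, std::size_t height,
                          std::size_t components, std::size_t bitsPerComponent) {
    assert(bitsPerComponent == 1 || bitsPerComponent == 2 || bitsPerComponent == 4 ||
           bitsPerComponent == 8 || bitsPerComponent == 16);
    const std::size_t bytesPerRow = (width * components * bitsPerComponent + 7) / 8;
    return bytesPerRow * height;
}
```

If a dump classifies as Unknown and its size equals rawSampleSize for the image's /Width, /Height, color components and /BitsPerComponent, it most likely holds decoded samples rather than an image file.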

Failure in JPGParser while parsing specific JPG

Hi Gal,

Running PDFWriter::CreateImageXObjectFromJPGFile with a specific JPG returns a failure.
While debugging, I realized that the problem is in JPEGImageParser::ReadPhotoshopData: it seems that when resolutionBim is not found, the function reads past the end of the Photoshop data block.
I added some validation checks that seem to solve the problem. Below is the function after my changes:
EStatusCode JPEGImageParser::ReadPhotoshopData(JPEGImageInformation& outImageInformation,bool& outPhotoshopDataOK)
{
    // outPhotoshopDataOK is taken by reference (the header declaration must match);
    // passed by value, the assignment at the end would never reach the caller
    EStatusCode status;
    unsigned int intSkip;
    unsigned long toSkip;
    unsigned int nameSkip;
    unsigned long dataLength;
    bool resolutionBimNotFound = true;

    do {
        status = ReadIntValue(intSkip);
        if(status != PDFHummus::eSuccess)
            break;
        toSkip = intSkip-2;
        status = SkipTillChar(scEOS,toSkip);
        if(status != PDFHummus::eSuccess)
            break;
        while(toSkip > 0 && resolutionBimNotFound)
        {
            status = ReadStreamToBuffer(4);
            if(status !=PDFHummus::eSuccess)
                break;
            toSkip-=4;
            if(0 != memcmp(mReadBuffer,sc8Bim,4))
                break; // k. corrupt header. stop here and just skip the next
            status = ReadStreamToBuffer(3);
            if(status !=PDFHummus::eSuccess)
                break;
            toSkip-=3;
            nameSkip = (int)mReadBuffer[2];
            if(nameSkip % 2 == 0)
                ++nameSkip;
            SkipStream(nameSkip);
            toSkip-=nameSkip;
            resolutionBimNotFound = (0 != memcmp(mReadBuffer,scResolutionBIMID,2));
            status = ReadLongValue(dataLength);
            if(status != PDFHummus::eSuccess)
                break;
            toSkip-=4;
            if(resolutionBimNotFound)
            {
                if(dataLength % 2 == 1)
                    ++dataLength;
                toSkip-=dataLength;
                SkipStream(dataLength);
            }
            else
            {
                status = ReadStreamToBuffer(16);
                if(status !=PDFHummus::eSuccess)
                    break;
                toSkip-=16;
                outImageInformation.PhotoshopInformationExists = true;
                outImageInformation.PhotoshopXDensity = GetIntValue(mReadBuffer) + GetFractValue(mReadBuffer + 2);
                outImageInformation.PhotoshopYDensity = GetIntValue(mReadBuffer + 8) + GetFractValue(mReadBuffer + 10);
            }
        }
        if(PDFHummus::eSuccess == status)
            SkipStream(toSkip);
    }while(false);
    outPhotoshopDataOK = !resolutionBimNotFound;
    return status;
}

What do you think?

Attached is the problematic JPG: 32degrees_2

Thanks,
Hadas
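For comparison, here is a standalone, bounds-checked sketch of walking Photoshop "8BIM" image resource blocks held in a buffer. It assumes the buffer contains the resource blocks that follow the APP13 "Photoshop 3.0" identifier, and it uses the layout the patch above relies on: a 4-byte signature, a 2-byte resource ID, a Pascal-string name padded to an even size, a 4-byte data length, and data padded to an even size, with ResolutionInfo (ID 0x03ED, presumably what scResolutionBIMID holds) storing horizontal and vertical density as 16.16 fixed-point values at offsets 0 and 8. findPhotoshopResolution is a hypothetical name, not a PDFHummus function. The design choice is to check the remaining length before every read instead of subtracting from an unsigned counter, which can wrap around on a corrupt or unexpected block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks a sequence of Photoshop "8BIM" image resource blocks and extracts the
// ResolutionInfo densities. Every read is preceded by a length check, so a
// corrupt or truncated block makes the function return false instead of
// reading past the end of the buffer.
bool findPhotoshopResolution(const std::vector<std::uint8_t>& data,
                             double& outXDensity, double& outYDensity) {
    auto be16 = [&data](std::size_t p) -> std::uint32_t {
        return (std::uint32_t(data[p]) << 8) | data[p + 1];
    };
    auto be32 = [&be16](std::size_t p) -> std::uint32_t {
        return (be16(p) << 16) | be16(p + 2);
    };
    const std::size_t size = data.size();
    std::size_t pos = 0;
    while (size - pos >= 7) {  // signature (4) + resource ID (2) + name length byte (1)
        if (data[pos] != '8' || data[pos + 1] != 'B' || data[pos + 2] != 'I' || data[pos + 3] != 'M')
            return false;  // corrupt header: stop rather than guess
        const std::uint32_t id = be16(pos + 4);
        const std::size_t nameLength = data[pos + 6];
        // Length byte plus name characters, padded to an even total size.
        const std::size_t nameField = (nameLength + 2) & ~std::size_t(1);
        pos += 6 + nameField;
        if (pos > size || size - pos < 4) return false;
        const std::size_t dataLength = be32(pos);
        pos += 4;
        if (size - pos < dataLength) return false;
        if (id == 0x03ED && dataLength >= 16) {  // ResolutionInfo
            // hRes and vRes are 16.16 fixed-point numbers at offsets 0 and 8.
            outXDensity = be16(pos) + be16(pos + 2) / 65536.0;
            outYDensity = be16(pos + 8) + be16(pos + 10) / 65536.0;
            return true;
        }
        pos += dataLength + (dataLength & 1);  // skip the data plus its padding byte
        if (pos > size) return false;          // padding missing at the end of the buffer
    }
    return false;  // no ResolutionInfo block found
}
```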

Embedded LinuxLibertine OpenType fonts come out totally messed up

I was trying to embed some OTF fonts, i.e. the Linux Libertine OTF fonts (I got them from http://www.linuxlibertine.org/).
Since the font looks OK elsewhere, I'm assuming something goes wrong with font embedding.
Here is the actual Hello World output I get with the LinLibertine_R.otf font:

(Screenshot: linlibertine_r otf)

It looks like curve control points got treated as actual points, or cubic vs. quadratic curves got mixed up while converting paths, or something like that.
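Background for the reporter's guess: TrueType (glyf) outlines use quadratic Bézier segments, while CFF-based OpenType fonts use cubic ones, and mixing the two up can produce distorted glyphs like these. A quadratic segment with endpoints P0, P2 and control point Q can be written exactly as a cubic with control points P0 + 2/3(Q - P0) and P2 + 2/3(Q - P2). The standalone sketch below (not PDFHummus code; all names are hypothetical) performs that conversion and evaluates both forms so they can be compared.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

struct CubicSegment { Point p0, c1, c2, p3; };

// Exact degree elevation of the quadratic Bezier (p0, q, p2) to a cubic.
CubicSegment quadraticToCubic(Point p0, Point q, Point p2) {
    const double t = 2.0 / 3.0;
    const Point c1{p0.x + t * (q.x - p0.x), p0.y + t * (q.y - p0.y)};
    const Point c2{p2.x + t * (q.x - p2.x), p2.y + t * (q.y - p2.y)};
    return {p0, c1, c2, p2};
}

// Quadratic Bezier at parameter u in [0, 1].
Point evalQuadratic(Point p0, Point q, Point p2, double u) {
    const double v = 1.0 - u;
    return {v * v * p0.x + 2 * v * u * q.x + u * u * p2.x,
            v * v * p0.y + 2 * v * u * q.y + u * u * p2.y};
}

// Cubic Bezier at parameter u in [0, 1].
Point evalCubic(const CubicSegment& s, double u) {
    const double v = 1.0 - u;
    const double b0 = v * v * v, b1 = 3 * v * v * u, b2 = 3 * v * u * u, b3 = u * u * u;
    return {b0 * s.p0.x + b1 * s.c1.x + b2 * s.c2.x + b3 * s.p3.x,
            b0 * s.p0.y + b1 * s.c1.y + b2 * s.c2.y + b3 * s.p3.y};
}

// Compares two points within a small floating-point tolerance.
bool samePoint(Point a, Point b, double eps = 1e-9) {
    return std::fabs(a.x - b.x) < eps && std::fabs(a.y - b.y) < eps;
}
```

Treating the quadratic control point Q directly as both cubic control points (or as an on-curve point) gives a visibly different curve, which is one way such a mix-up could show up.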

Linux build Error Fix patch

Hi galkahana, first of all, congratulations on the superb job you have all been doing creating this tool.

I tried to build PDF-Writer on Linux (Fedora 18 x86_64), but the build failed.

The patch below fixes typos and file-name case mismatches (Linux file systems are case-sensitive), and adds the header includes needed after the GCC 4.3 header dependency cleanup: http://gcc.gnu.org/gcc-4.3/porting_to.html

There is no file named xobjectContentContext.h, so I copied XObjectContentContext.h into the test playground:

cp ../PDFWriter/XObjectContentContext.h ../PDFWriterTestPlayground/

Please apply the patch:

diff -ur PDF-Writer/CMakeLists.txt PDF-Writer_new/CMakeLists.txt
--- PDF-Writer/CMakeLists.txt   2013-04-09 01:33:23.976952997 +0900
+++ PDF-Writer_new/CMakeLists.txt   2013-04-08 23:54:35.643684419 +0900
@@ -4,7 +4,7 @@
 if(NOT PDFHUMMUS_NO_DCT)
    ADD_SUBDIRECTORY(LibJpeg)
 endif(NOT PDFHUMMUS_NO_DCT)
-ADD_SUBDIRECTORY(Zlib)
+   ADD_SUBDIRECTORY(ZLib)
 if(NOT PDFHUMMUS_NO_TIFF)
    ADD_SUBDIRECTORY(LibTiff)
 endif(NOT PDFHUMMUS_NO_TIFF)
diff -ur PDF-Writer/FreeType/CMakeLists.txt PDF-Writer_new/FreeType/CMakeLists.txt
--- PDF-Writer/FreeType/CMakeLists.txt  2013-04-09 01:33:23.977952981 +0900
+++ PDF-Writer_new/FreeType/CMakeLists.txt  2013-04-08 23:54:35.618684812 +0900
@@ -49,7 +49,7 @@
 src/base/ftglyph.c
 src/gzip/ftgzip.c
 src/base/ftinit.c
-src/lzW/ftlzw.c
+src/lzw/ftlzw.c
 src/base/ftstroke.c
 src/base/ftsystem.c
 src/smooth/smooth.c
@@ -61,4 +61,4 @@
 include/freetype/config/ftoption.h
 include/freetype/config/ftstdlib.h
 include/ft2build.h
-)
\ No newline at end of file
+)
diff -ur PDF-Writer/PDFWriter/AbstractWrittenFont.cpp PDF-Writer_new/PDFWriter/AbstractWrittenFont.cpp
--- PDF-Writer/PDFWriter/AbstractWrittenFont.cpp    2013-04-09 01:33:24.053951779 +0900
+++ PDF-Writer_new/PDFWriter/AbstractWrittenFont.cpp    2013-04-08 23:54:35.610684938 +0900
@@ -20,7 +20,7 @@
 */
 #include "AbstractWrittenFont.h"
 #include "ObjectsContext.h"
-#include "InDirectObjectsReferenceRegistry.h"
+#include "IndirectObjectsReferenceRegistry.h"
 #include "Trace.h"
 #include "DictionaryContext.h"
 #include "PDFParser.h"
@@ -485,4 +485,4 @@
        item = it.GetItem();
        inGlyphEncodingInfo.mUnicodeCharacters.push_back((unsigned long)item->GetValue());
    }
-}
\ No newline at end of file
+}
diff -ur PDF-Writer/PDFWriter/CFFFileInput.h PDF-Writer_new/PDFWriter/CFFFileInput.h
--- PDF-Writer/PDFWriter/CFFFileInput.h 2013-04-09 01:33:24.055951748 +0900
+++ PDF-Writer_new/PDFWriter/CFFFileInput.h 2013-04-08 23:54:35.608684971 +0900
@@ -25,6 +25,8 @@
 #include "CFFPrimitiveReader.h"
 #include "IType2InterpreterImplementation.h"

+#include <string.h>
+
 #include <string>
 #include <list>
 #include <map>
diff -ur PDF-Writer/PDFWriter/InputAscii85DecodeStream.cpp PDF-Writer_new/PDFWriter/InputAscii85DecodeStream.cpp
--- PDF-Writer/PDFWriter/InputAscii85DecodeStream.cpp   2013-04-09 01:33:24.060951668 +0900
+++ PDF-Writer_new/PDFWriter/InputAscii85DecodeStream.cpp   2013-04-08 23:54:35.577685458 +0900
@@ -20,6 +20,8 @@
 */
 #include "InputAscii85DecodeStream.h"

+#include <string.h>
+
 #include <algorithm>

 using namespace IOBasicTypes;
@@ -145,4 +147,4 @@
        }

    }
-}
\ No newline at end of file
+}
diff -ur PDF-Writer/PDFWriter/InputDCTDecodeStream.cpp PDF-Writer_new/PDFWriter/InputDCTDecodeStream.cpp
--- PDF-Writer/PDFWriter/InputDCTDecodeStream.cpp   2013-04-09 01:33:24.060951668 +0900
+++ PDF-Writer_new/PDFWriter/InputDCTDecodeStream.cpp   2013-04-08 23:54:35.604685032 +0900
@@ -21,6 +21,8 @@
 #include "InputDCTDecodeStream.h"
 #include "Trace.h"

+#include <string.h>
+
 #ifndef PDFHUMMUS_NO_DCT

 using namespace IOBasicTypes;
diff -ur PDF-Writer/PDFWriter/MD5Generator.cpp PDF-Writer_new/PDFWriter/MD5Generator.cpp
--- PDF-Writer/PDFWriter/MD5Generator.cpp   2013-04-09 01:33:24.062951637 +0900
+++ PDF-Writer_new/PDFWriter/MD5Generator.cpp   2013-04-08 23:54:35.607684987 +0900
@@ -67,6 +67,8 @@
 #include "OutputStringBufferStream.h"
 #include "SafeBufferMacrosDefs.h"

+#include <string.h>
+
 using namespace IOBasicTypes;
 using namespace PDFHummus;

diff -ur PDF-Writer/PDFWriter/PDFWriter.h PDF-Writer_new/PDFWriter/PDFWriter.h
--- PDF-Writer/PDFWriter/PDFWriter.h    2013-04-09 01:33:24.068951542 +0900
+++ PDF-Writer_new/PDFWriter/PDFWriter.h    2013-04-08 23:54:35.612684906 +0900
@@ -30,7 +30,7 @@
 #include "DocumentContext.h"
 #include "ObjectsContext.h"
 #include "PDFRectangle.h"
-#include "TIFFUsageParameters.h"
+#include "TiffUsageParameters.h"
 #include "PDFEmbedParameterTypes.h"

 #include <string>
diff -ur PDF-Writer/PDFWriter/PrimitiveObjectsWriter.h PDF-Writer_new/PDFWriter/PrimitiveObjectsWriter.h
--- PDF-Writer/PDFWriter/PrimitiveObjectsWriter.h   2013-04-09 01:33:24.069951526 +0900
+++ PDF-Writer_new/PDFWriter/PrimitiveObjectsWriter.h   2013-04-08 23:54:35.606685003 +0900
@@ -21,6 +21,7 @@
 #pragma once

 #include "ETokenSeparator.h"
+#include <string.h>
 #include <string>


diff -ur PDF-Writer/PDFWriter/Trace.h PDF-Writer_new/PDFWriter/Trace.h
--- PDF-Writer/PDFWriter/Trace.h    2013-04-09 01:33:24.070951510 +0900
+++ PDF-Writer_new/PDFWriter/Trace.h    2013-04-08 23:54:35.605685017 +0900
@@ -21,6 +21,10 @@
 #pragma once
 #include "Singleton.h"

+#include <stdarg.h>
+
+#include <string.h>
+
 #include <string>


diff -ur PDF-Writer/PDFWriterTestPlayground/AppendingAndReading.h PDF-Writer_new/PDFWriterTestPlayground/AppendingAndReading.h
--- PDF-Writer/PDFWriterTestPlayground/AppendingAndReading.h    2013-04-09 01:33:24.074951447 +0900
+++ PDF-Writer_new/PDFWriterTestPlayground/AppendingAndReading.h    2013-04-08 23:54:35.640684466 +0900
@@ -22,6 +22,8 @@
 #pragma once
 #include "ITestUnit.h"

+#include <string.h>
+
 class AppendingAndReading : public ITestUnit
 {
 public:
diff -ur PDF-Writer/PDFWriterTestPlayground/FlateEncryptionTest.h PDF-Writer_new/PDFWriterTestPlayground/FlateEncryptionTest.h
--- PDF-Writer/PDFWriterTestPlayground/FlateEncryptionTest.h    2013-04-09 01:33:24.075951431 +0900
+++ PDF-Writer_new/PDFWriterTestPlayground/FlateEncryptionTest.h    2013-04-08 23:54:35.641684450 +0900
@@ -20,6 +20,7 @@
 */
 #pragma once

+#include <string.h>
 #include "TestsRunner.h"

 class FlateEncryptionTest : public ITestUnit
diff -ur PDF-Writer/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.cpp PDF-Writer_new/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.cpp
--- PDF-Writer/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.cpp   2013-04-09 01:33:24.075951431 +0900
+++ PDF-Writer_new/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.cpp   2013-04-08 23:54:35.642684434 +0900
@@ -28,7 +28,7 @@
 #include "ProcsetResourcesConstants.h"
 #include "ObjectsContext.h"
 #include "IndirectObjectsReferenceRegistry.h"
-#include "xobjectContentContext.h"
+#include "XObjectContentContext.h"

 #include <iostream>

diff -ur PDF-Writer/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.h PDF-Writer_new/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.h
--- PDF-Writer/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.h 2013-04-09 01:33:24.075951431 +0900
+++ PDF-Writer_new/PDFWriterTestPlayground/ImagesAndFormsForwardReferenceTest.h 2013-04-08 23:54:35.641684450 +0900
@@ -20,6 +20,7 @@
 */
 #pragma once

+#include <string.h>
 #include "ITestUnit.h"

 class ImagesAndFormsForwardReferenceTest: public ITestUnit
diff -ur PDF-Writer/PDFWriterTestPlayground/TestsRunner.h PDF-Writer_new/PDFWriterTestPlayground/TestsRunner.h
--- PDF-Writer/PDFWriterTestPlayground/TestsRunner.h    2013-04-09 01:33:24.079951368 +0900
+++ PDF-Writer_new/PDFWriterTestPlayground/TestsRunner.h    2013-04-08 23:54:35.640684466 +0900
@@ -25,6 +25,8 @@
 #include "Singleton.h"
 #include "FileURL.h"

+#include <string.h>
+
 #include <string>
 #include <list>
 #include <utility>
Only in PDF-Writer_new/PDFWriterTestPlayground: XObjectContentContext.h

My environment's GCC version:

gcc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.7.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --disable-build-with-cxx --disable-build-poststage1-with-cxx --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)
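Most of the case fixes in this patch are include names that only match the real file when case is ignored, which case-insensitive file systems (the usual Windows and macOS defaults) tolerate and Linux does not. A small standalone helper along these lines (hypothetical, not part of the project) can flag such mismatches, given an include name and the file names on disk:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII lower-casing, enough for file names like the ones in this project.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the on-disk file name that matches includeName only when case is
// ignored; returns an empty string on an exact match or when nothing matches.
std::string caseOnlyMismatch(const std::string& includeName,
                             const std::vector<std::string>& filesOnDisk) {
    std::string candidate;
    for (const std::string& f : filesOnDisk) {
        if (f == includeName) return "";  // exact match: fine on any file system
        if (toLowerAscii(f) == toLowerAscii(includeName)) candidate = f;
    }
    return candidate;
}
```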

Problematic sanity check in OpenTypeFileInput.cpp OpenTypeFileInput::ReadOpenTypeSFNTFromDfont() leads to problems embedding .dfont fonts on El Capitan

There is a sanity check in OpenTypeFileInput::ReadOpenTypeSFNTFromDfont() that causes failures when embedding some fonts packaged as .dfont files on Mac OS X 10.11 El Capitan.

I believe the Inside Mac documentation for the resource fork format has been misread. The space reserved in the resource map header for an in-memory copy of the resource file header is assumed to actually be a bitwise copy of the resource file header on disk. The diagram of the resource map format in Inside Mac labels this field as "Reserved for a copy of the resource header", and the text actually says, "After reading the resource map into memory, the Resource Manager stores the indicated information in the reserved areas at the beginning of the map."

{
    // check that the two headers match

    int allzeros = 1, allmatch = 1;
    for (int i = 0; i < 16; ++i )
    {
        if ( head[i] != 0 ) allzeros = 0;
        if ( head2[i] != head[i] ) allmatch = 0;
    }
    if ( !allzeros && !allmatch ) return PDFHummus::eFailure;
}

For example, the data-fork-based resource file /System/Library/Fonts/Geneva.dfont happens to pass this sanity check on Yosemite, but the version that comes with El Capitan does not. This means that an attempt to embed the Geneva font fails on a stock El Capitan system: the reader bails out because the resource file header does not match whatever bytes happen to be in the space reserved for the in-memory copy.

Removing this check lets us successfully embed the Geneva font (and some other fonts housed in .dfont files) on Mac OS X 10.11 El Capitan.

There is also another sanity check which may be problematic...

        if ( rdata_pos + rdata_len != map_pos || map_pos == 0 ) {
            return PDFHummus::eFailure;
        }

The documentation in Inside Mac does not guarantee that the resource map is positioned immediately after the resource data (i.e. the documentation does not guarantee that rdata_pos + rdata_len == map_pos). Having two separate offset fields allows the data structures to appear in any order on disk, with any number of (possibly non-zero) padding bytes between them. For example, a quick way for the Resource Manager to update the resource map of a resource file on disk might be to simply append a brand-new resource map to the end of the file and update the header to point to it, leaving the old one in place.
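A sketch of a more lenient check, under the reading of Inside Mac given above: parse the 16-byte resource header (big-endian offsets and lengths of the resource data and the resource map), ignore the reserved header copy at the start of the map, and only verify that each region lies inside the file, without requiring the map to follow the data. The names here are hypothetical, not the ones in OpenTypeFileInput.cpp, and the 28-byte minimum map size assumes the fixed map fields (16 reserved header bytes, 4 + 2 reserved bytes, 2 attribute bytes and the two 2-byte list offsets).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The 16-byte header at the start of a resource fork (or of a .dfont file).
struct ResourceForkHeader {
    std::uint32_t dataOffset, mapOffset, dataLength, mapLength;
};

static std::uint32_t readBE32(const std::vector<std::uint8_t>& b, std::size_t p) {
    return (std::uint32_t(b[p]) << 24) | (std::uint32_t(b[p + 1]) << 16) |
           (std::uint32_t(b[p + 2]) << 8) | std::uint32_t(b[p + 3]);
}

// Lenient validation: both regions must lie inside the file, but the map need
// not follow the data directly, and the reserved header copy inside the map is
// not compared against the real header.
bool parseResourceForkHeader(const std::vector<std::uint8_t>& file, ResourceForkHeader& out) {
    if (file.size() < 16) return false;
    out = {readBE32(file, 0), readBE32(file, 4), readBE32(file, 8), readBE32(file, 12)};
    const std::uint64_t size = file.size();
    auto inside = [size](std::uint64_t offset, std::uint64_t length) {
        return offset <= size && length <= size - offset;
    };
    const std::uint32_t kMinMapLength = 28;  // fixed map fields: 16 + 4 + 2 + 2 + 2 + 2 bytes
    return out.mapOffset != 0 && out.mapLength >= kMinMapLength &&
           inside(out.dataOffset, out.dataLength) && inside(out.mapOffset, out.mapLength);
}
```

This keeps the map_pos == 0 rejection from the original check while dropping the adjacency and header-copy requirements.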

Build tools cleanup and python bindings

First off, I want to say that I'm really liking the library you've got here @galkahana.

I have two requirements for your library (and from an initial look, it seems your library can do both):

  • Create a new PDF, embed a page from an existing PDF, write text in arbitrary locations on the page
  • Create a new PDF, embed an image, write text in arbitrary locations on the page

The thing is that I'd like to do this in python so I'm going to write some bindings for this library in python. I'd like to confirm you'd be okay with that (and that you have no immediate plans yourself).

The second thing is after reading through some of the code and understanding how things are working here... would you be opposed to me submitting a pull request to clean up the build process for this library?

The primary improvement would be relying on development headers available in the environment rather than including libtiff, libjpeg, etc. in this repository, as I'd like to use the latest libtiff and its API has changed over the last ~2 years.

Including SVG graphics in PDF

Is there a way to incorporate vector graphics like an .svg file using the Hummus library (similar to the way of including JPEG or TIFF images)?

Adding 'print' javascript to an existing PDF file

Hi Gal,
Very nice project, thanks!

Can you please advise me on the best and easiest way to make an existing PDF file 'auto-print' when it's downloaded to a browser?

Thanks a lot!

Amit

Type1Input::ParseSubrs() makes assumptions about format of /Subrs dictionary entries that can lead to crashes with older Type 1 fonts

The code in Type1Input::ParseSubrs() assumes that each entry in the /Subrs dictionary uses the NP and ND shortcuts, and this can lead to a crash.

The assumption is that each entry will look like this in all Type 1 fonts:

dup index numBytes RD [numBytes of binary data] NP

However, NP (and ND) are procedures that are locally defined in the Type 1 font as abbreviations to save space in the font file...

/ND { noaccess def } executeonly def
/NP { noaccess put } executeonly def

Most fonts do use them, but some older fonts do not, using the full commands noaccess, put, and def in their place. For example, ...

So, instead of...

/Subrs 115 array
dup 0 15 RD 15bytes~ NP
dup 1 9 RD 9bytes~ NP
:
:
ND

an older font might have...

/Subrs 115 array
dup 0 15 RD 15bytes~ noaccess put
dup 1 9 RD 9bytes~ noaccess put
:
:
noaccess def

In the code for Type1Input::ParseSubrs(), after reading the binary bytes of an entry, it makes exactly two further calls to mPFBDecoder.GetNextToken(), expecting to consume the NP token and then either the dup token or the ND token, so that it is ready to read the key (subrIndex) of the next dictionary entry or is finished with the dictionary.

When presented with a font that does not use NP or ND, the next token read after the first entry is the dup of the second entry rather than its key (subrIndex). Further entries in the mSubrs array are then garbage, and at some point a CodeLength of 0 may be used to create an empty Byte array on the heap.

mSubrs[subrIndex].CodeLength = Int(token.second);
mSubrs[subrIndex].Code = new Byte[mSubrs[subrIndex].CodeLength];

This can lead to a crash later in Type1Input::FreeTables(), when the mSubrs array is cleaned up.

for(long i=0;i<mSubrsCount;++i)
    delete[] mSubrs[i].Code;

We don't have code for a general solution (since this is essentially a consequence of the Type 1 font actually being a PostScript program), but what we came up with replaces the two calls to mPFBDecoder.GetNextToken() in Type1Input::ParseSubrs():

So that...

// skip NP token
mPFBDecoder.GetNextToken();

// skip dup or end array definition
mPFBDecoder.GetNextToken();

... is replaced with ...

while ( token.first )
{
    token = mPFBDecoder.GetNextToken();
    if ( 0 == token.second.compare("dup") )
        break;
    if ( 0 == token.second.compare("ND") )
        break;
    if ( 0 == token.second.compare("def") )
        break;
}

That handles the cases where the font uses the NP and ND shortcuts, or where it uses noaccess, put, and def directly.

Section 2.4 of the Type 1 specification also notes that some fonts can use other names defined in userdict, or -|, |-, and | defined in the Private dictionary.
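The loop above, extended with the Private-dictionary abbreviations just mentioned, can be sketched against a pre-tokenized input. The vector of strings stands in for mPFBDecoder, and treating |- as an end marker is an assumption based on the note above (|- is conventionally the ND equivalent, as | is for NP). Unlike the original loop, the function returns the position of the stop token instead of consuming it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for mPFBDecoder: tokens[pos] is the first token after
// the binary charstring bytes of a /Subrs entry.
// Skips tokens until one that can legitimately follow an entry:
//   "dup"        -> another entry follows
//   "ND" / "|-"  -> abbreviated end of the array definition
//   "def"        -> spelled-out "noaccess def" end of the array definition
// Returns the index of that token, or tokens.size() if none is found.
std::size_t SkipToNextSubrsToken(const std::vector<std::string>& tokens, std::size_t pos)
{
    for (; pos < tokens.size(); ++pos)
    {
        const std::string& t = tokens[pos];
        if (t == "dup" || t == "ND" || t == "|-" || t == "def")
            return pos;
    }
    return tokens.size();
}
```

Returning an index rather than consuming the token makes it easy for the caller to decide whether to read another entry (dup) or stop.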

LC_NUMERIC may use a decimal comma, thousands separator, etc.

Locale settings affect sprintf, but numbers in a PDF should always be written using the C locale.

I noticed it in the MediaBox:
/MediaBox [ 0 0 595,2 841,68 ] (comma as decimal point, as in most European locale)

after writing this to a PDF, parsing it back fails.

A fix could be made in PrimitiveObjectsWriter.cpp:

#include <sstream>
#include <locale>
....
void PrimitiveObjectsWriter::WriteInteger(long long inIntegerToken,ETokenSeparator inSeparate)
{
	
	std::stringstream formatter;
	formatter.imbue(std::locale("C"));
	formatter << inIntegerToken;
	std::string formatter_buf = formatter.str();

	mStreamForWriting->Write((const IOBasicTypes::Byte *)formatter_buf.data(), formatter_buf.size());

	WriteTokenSeparator(inSeparate);
}
...

void PrimitiveObjectsWriter::WriteDouble(double inDoubleToken,ETokenSeparator inSeparate)
{
	std::stringstream formatter;
	formatter.imbue(std::locale("C"));
	formatter << inDoubleToken;
	std::string formatter_buf = formatter.str();

	mStreamForWriting->Write((const IOBasicTypes::Byte *)formatter_buf.data(),formatter_buf.size());
	WriteTokenSeparator(inSeparate);
}

I've tried this on AIX/Linux/MSVC, and it fixed the issue.
Sorry, I can't create a pull request for it now.
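The same idea in a self-contained form: a stream imbued with the classic locale always writes '.' as the decimal separator and inserts no digit grouping, whatever the global locale is. The function name is hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Formats a number for PDF output independently of the process-wide locale:
// std::locale::classic() is the "C" locale, so the decimal separator is
// always '.' and no thousands grouping is inserted.
std::string FormatPdfNumber(double value)
{
    std::ostringstream formatter;
    formatter.imbue(std::locale::classic());
    formatter << value;
    return formatter.str();
}
```

One caveat that applies to the proposed patch as well: operator<< uses 6 significant digits by default and switches to exponent notation for very large or very small values (1234567.0 becomes "1.23457e+06"), which is not valid PDF number syntax, so a production version would also need to control the precision and use fixed notation.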

How to get baseline to baseline distance (new line advance)?

Currently, I'm using freetype directly like so:

float font_size = 14.;
PDFUsedFont *font = pdfWriter.GetFontForFile("font.ttf");
FT_Face face = *font->GetFreeTypeFont();
float newline_height = font_size * face->height / face->units_per_EM;

But I'd rather use PDFUsedFont and never deal with FreeType directly (what if some other non-FreeType format comes along, or what if this approach has quirks that don't work with all fonts?). Is it possible to achieve this using only PDF-Writer objects?

Also, PDFUsedFont::CalculateTextDimensions receives a long as font size, but Tf operator receives a double. Shouldn't these be consistent?
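The FreeType-based formula above scales the face height (in font design units) by font_size / units_per_EM. One pitfall: FreeType stores height as FT_Short and units_per_EM as FT_UShort, so the arithmetic has to happen in floating point, as it does in the snippet because font_size is a float. A standalone version of the arithmetic, with plain doubles standing in for the FreeType fields (the function name is hypothetical):

```cpp
// Baseline-to-baseline distance in text space units: the face height in
// font design units, scaled by fontSize / unitsPerEm. Doubles are used so
// that integer metrics never go through integer division.
double LineAdvance(double fontSize, double faceHeight, double unitsPerEm)
{
    return fontSize * faceHeight / unitsPerEm;
}
```

For example, a face with a height of 2355 units in a 2048-unit em at 14 points gives 14 × 2355 / 2048 ≈ 16.1 points between baselines.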

[Question] How to add LZWDecode support?

An invocation of copyingContext->AppendPDFPageFromPDF(pageIndex) failed: it calls PDFParser::CreateFilterForStream to parse a PDF page, and the log showed "PDFParser::CreateFilterForStream, supporting only flate decode and ascii 85 decode, failing".
Then I found that the PDF uses LZWDecode.
It seems this SDK doesn't support LZWDecode, am I right?
If so, how can I add LZWDecode support myself?
PS: if it's not easy to add LZWDecode filter support, how can I copy pages from one PDF to another PDF without involving the unsupported filter?
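Since the log names PDFParser::CreateFilterForStream as the place that rejects the filter, adding support would presumably mean wrapping a decoder like the one below in an IByteReader and returning it from there; that wiring is not shown and is an assumption. The sketch decodes a whole buffer at once using the LZW conventions of the PDF LZWDecode filter (9- to 12-bit MSB-first codes, 256 = clear table, 257 = end of data, /EarlyChange defaulting to 1). /Predictor handling is left out, and the function name LzwDecode is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal LZWDecode sketch. earlyChange mirrors /EarlyChange (default 1).
std::vector<std::uint8_t> LzwDecode(const std::vector<std::uint8_t>& input, int earlyChange = 1)
{
    std::vector<std::string> table;
    auto resetTable = [&table]() {
        table.clear();
        for (int i = 0; i < 256; ++i)
            table.push_back(std::string(1, static_cast<char>(i)));
        table.push_back("");   // 256: clear table
        table.push_back("");   // 257: end of data
    };
    resetTable();

    std::vector<std::uint8_t> output;
    std::string previous;      // sequence emitted for the previous code
    bool havePrevious = false;
    int codeLength = 9;
    std::uint32_t bitBuffer = 0;
    int bitCount = 0;

    for (std::size_t pos = 0; ; ) {
        // Refill until a whole code is available (codes are MSB-first).
        while (bitCount < codeLength && pos < input.size()) {
            bitBuffer = (bitBuffer << 8) | input[pos++];
            bitCount += 8;
        }
        if (bitCount < codeLength)
            break;                                 // ran out of data without EOD
        int code = static_cast<int>((bitBuffer >> (bitCount - codeLength)) & ((1u << codeLength) - 1));
        bitCount -= codeLength;

        if (code == 256) {                         // clear table
            resetTable();
            codeLength = 9;
            havePrevious = false;
            continue;
        }
        if (code == 257)                           // end of data
            break;

        std::string entry;
        if (code < static_cast<int>(table.size())) {
            entry = table[code];
        } else if (havePrevious && code == static_cast<int>(table.size())) {
            entry = previous + previous[0];        // code not yet in table ("KwKwK" case)
        } else {
            break;                                 // corrupt stream
        }
        output.insert(output.end(), entry.begin(), entry.end());

        if (havePrevious && table.size() < 4096)
            table.push_back(previous + entry[0]);
        previous = entry;
        havePrevious = true;

        // Widen codes once the next free code, adjusted by earlyChange,
        // no longer fits in the current width (maximum 12 bits).
        if (table.size() + earlyChange >= (1u << codeLength) && codeLength < 12)
            ++codeLength;
    }
    return output;
}
```

The 9-byte input 80 0B 60 50 22 0C 0C 85 01 (the worked example in the PDF Reference's LZWDecode section) decodes to -----A---B with this sketch.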

Improper "using namespace std" in header files

I'm trying to evaluate using PDF-Writer as a backend for the TeXmacs (www.texmacs.org) scientific editor. However, the fact that the PDF-Writer header files contain "using namespace std" prevents me from including them in the TeXmacs sources, since TeXmacs does not rely on the C++ standard library and has its own definition of the string class.

In general, it is a good idea to avoid using that declaration in header files:

http://www.cplusplus.com/forum/beginner/25538/

Btw, nice library! I look forward to being able to use it to write PDFs inside TeXmacs; we really need a good PDF-writing library like this.

[Question] How to set the Font in a FreeText annotation?

I tried to use your library to create a FreeText annotation (Subtype = FreeText).
So far everything went well, but to set the font, I followed the PDF spec and set
the "DA" field as FontName FontSize "Tf" Red Green Blue "rg".
The spec says FontName should be the key (or name) in the Font dictionary.
Do you know how to implement this using your library?
I checked the PDFUsedFont class and I can load any font file, but how can I write the selected font to the Font dictionary for a "FreeText" annotation?
Thank you very much for creating this great library and for helping answer the question!!
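As I read the PDF specification, the font operand of a DA string is a resource name looked up in a Font resource dictionary (for form fields, the AcroForm DR dictionary), so the font object itself still has to be written and registered under that name. The string itself is simple to build; the helper below is hypothetical, and "Helv" is only an example resource name.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Builds a default appearance (DA) string of the form
//   /FontResourceName size Tf r g b rg
// fontResourceName is the name under which the font is registered in a Font
// resource dictionary, not the font's PostScript name. Colour components are
// in the 0..1 range.
std::string BuildDefaultAppearance(const std::string& fontResourceName, double fontSize,
                                   double r, double g, double b)
{
    std::ostringstream da;
    da.imbue(std::locale::classic());   // PDF numbers always use '.' as decimal point
    da << '/' << fontResourceName << ' ' << fontSize << " Tf "
       << r << ' ' << g << ' ' << b << " rg";
    return da.str();
}
```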

Typo in test in CFFFileInput::ReadEncoding()

On line 847 of CFFFileInput.cpp, in the function CFFFileInput::ReadEncoding(), there is a logic error. The code is intended to check if the high bit of an 8-bit byte is set, but the condition actually tested ((encodingFormat & 0x80) == 1) will always be false. That should probably read ((encodingFormat & 0x80) != 0), instead (or any number of ways to correctly test only that bit).
I'm sorry, but for a number of reasons, I can't fork the project and submit a pull request with a simple fix; right now, the best I can do is alert that there is an issue.
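In the CFF format, the high bit (0x80) of the encoding format byte signals that supplementary encoding data follows. Masking with 0x80 yields either 0 or 128, never 1, which is why the original comparison can never succeed. A minimal illustration (function names are hypothetical):

```cpp
#include <cstdint>

// (encodingFormat & 0x80) is either 0 or 0x80 (128), so comparing it
// with 1 is always false.
bool HighBitSetBuggy(std::uint8_t encodingFormat)
{
    return (encodingFormat & 0x80) == 1;   // always false
}

// Comparing the masked value against 0 tests the high bit correctly.
bool HighBitSet(std::uint8_t encodingFormat)
{
    return (encodingFormat & 0x80) != 0;
}
```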

Enhance: Web capture

Hey guys, I was wondering whether you are implementing this feature, because I would be really happy to help.

Benchmarks?

The description mentions "high performance" but I don't see any benchmarks or benchmark results anywhere.

Which other node pdf modules/addons did you compare this addon with and how? What were the concrete results?

Extracting Text from PDF

First of all many thanks for writing such a fantastic pdf library in C++.
I saw an old discussion about this for JS, but I am not sure whether there is any C++ API to extract basic text from a PDF, something like Apache PDFBox's PDFTextStripper.
I know this is naive, but a straightforward API would help in many scenarios where we just want to read specific elements.

Flate decode issue

Decoding the Flate encoded image stream in the attached PDF file doesn't seem to work. See sample code below. Am I doing something wrong?

#include <iostream>
#include <fstream>
#include <string>
#include "PDFHummus/PDFParser.h"
#include "PDFHummus/InputFile.h"
#include "PDFHummus/PDFStreamInput.h"
#include "PDFHummus/IByteReader.h"
#include "PDFHummus/EStatusCode.h"

using namespace std;
using namespace PDFHummus;

void decodeStream(char *path);

int main(int count, char* args[]) {
    if (count < 2) {
        cerr << "PDF file required" << endl;
        return 1;
    }

    if (count == 2) {
        decodeStream(args[1]);
    }

    return 0;
}

void decodeStream(char *path) {
    PDFParser parser;
    InputFile pdfFile;
    EStatusCode status = pdfFile.OpenFile(path);
    if(status == eSuccess) {
        status = parser.StartPDFParsing(pdfFile.GetInputStream());
        if(status == eSuccess) {
            // Parse image object
            PDFObject* streamObj = parser.ParseNewObject(7);
            if (streamObj != NULL
                && streamObj->GetType() == PDFObject::ePDFObjectStream) {
                PDFStreamInput* stream = ((PDFStreamInput*)streamObj);
                IByteReader* reader = parser.StartReadingFromStream(stream);
                if (!reader) {
                    // don't fall through and dereference a null reader
                    cout << "Couldn't create reader\n";
                } else {
                    Byte buffer[1000];
                    LongBufferSizeType total = 0;
                    while(reader->NotEnded()) {
                        LongBufferSizeType readAmount = reader->Read(buffer,1000);
                        total += readAmount;
                        cout << "Total read: " << total << "\n";
                    }
                    delete reader; // the caller owns the returned reader
                }
            }
            if (streamObj != NULL)
                streamObj->Release(); // ParseNewObject returns a new reference
        }
    }
}

test1.pdf

Annotation

Hi,

You've done amazing work. Are you planning to add support for annotations?

Thanks and regards,

Page hierarchies/ToC support

Is there any way to write the table of contents for a document? I looked all over the docs and I couldn't find any specs.

[Question] How to re-order pages in a PDF?

Thanks, Gal, for the great SDK!

Does anyone know how to use this SDK to re-order pages in a PDF?
e.g. move the 2nd page to the 6th position

I already checked the PDF modification sample codes, but they are all about modifying (or adding / deleting) the contents.
There were no sample codes to re-order pages.
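One possible approach, assuming it is acceptable to build a new file, is to append the source pages to a new document in the desired order, for example by calling PDFDocumentCopyingContext::AppendPDFPageFromPDF once per page index. The helper below (hypothetical) only computes that order for moving a single page; indices are zero-based, so moving the 2nd page to the 6th position means from = 1, to = 5.

```cpp
#include <cstddef>
#include <vector>

// Returns the order in which source pages should be appended so that the
// page at zero-based index `from` ends up at index `to`, with the other
// pages keeping their relative order.
std::vector<std::size_t> MovePageOrder(std::size_t pageCount, std::size_t from, std::size_t to)
{
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < pageCount; ++i)
        order.push_back(i);
    if (from >= pageCount || to >= pageCount)
        return order;                      // out of range: keep original order
    std::size_t moved = order[from];
    order.erase(order.begin() + static_cast<std::ptrdiff_t>(from));
    order.insert(order.begin() + static_cast<std::ptrdiff_t>(to), moved);
    return order;
}
```

For an 8-page document, MovePageOrder(8, 1, 5) yields 0, 2, 3, 4, 5, 1, 6, 7.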

Arabic text is not mapped to correct glyphs

I am trying to write simple Arabic text consisting of 3 consecutive characters with the same Unicode code point (the letter Ain, U+0639, which can be represented by 4 glyphs depending on its position in the word).

pageContentContext->WriteText(50, 200, u8"ععع", textOptions);

I also tried hard-coding the text as UTF-8 (U+0639 -> \xD8\xB9):

pageContentContext->WriteText(50, 100, "\xD8\xB9\xD8\xB9\xD8\xB9", textOptions);

But, the output in PDF is shown as: ﻉﻉﻉ
The correct output should be: ععع

Is the Unicode-to-glyph mapping not working correctly, or am I missing something here?
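The hand-written escape can be double-checked with a small UTF-8 encoder (a hypothetical helper, not part of PDFHummus). U+0639 does encode to D8 B9, so the input bytes look right. Output in isolated forms is consistent with each code point being mapped to a single default glyph without contextual (initial/medial/final) shaping, though the report does not establish where the mapping goes wrong.

```cpp
#include <string>

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8.
// Rejecting surrogates and out-of-range values is left to the caller.
std::string EncodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```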

Type1Input::ParseEncoding() in Type1Input.cpp interprets /NUL as a character name rather than ".notdef"

We came across an old Type 1 font with an /Encoding array populated explicitly (no /StandardEncoding, etc.), and the first entry was /NUL (for code 0). The code in Type1Input::ParseEncoding() simply adds "NUL" as the character name, and that leads to issues if code 0 is actually used: Type1Input::CalculateDependenciesForCharIndex() and Type1Input::GetGlyphCharString() will probably not be able to find an entry for it in mCharStrings, leading to an error.
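One way to address this is to map the name while the /Encoding array is parsed. The sketch below is hypothetical (the function name and the set of charstring names are assumptions), and the extra fallback for names without a charstring goes beyond what the report asks for.

```cpp
#include <set>
#include <string>

// Normalises a glyph name read from an explicit /Encoding array:
// "NUL" is treated as ".notdef", and, as an additional safety net, any name
// with no charstring in the font also falls back to ".notdef" so that later
// lookups in the charstrings table cannot fail.
std::string NormalizeEncodingName(const std::string& name,
                                  const std::set<std::string>& charStringNames)
{
    if (name == "NUL")
        return ".notdef";
    if (charStringNames.find(name) == charStringNames.end())
        return ".notdef";
    return name;
}
```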

FreeTypeType1Wrapper::GetGlyphForUnicodeChar() does not actually get a glyph for a Unicode character

We convert to strings of UCS2 codepoints and call FreeTypeFaceWrapper::GetGlyphsForUnicodeText() regardless of the font, and the implementation of FreeTypeType1Wrapper::GetGlyphForUnicodeChar() wasn't actually getting the desired glyph for Type 1 fonts, nor was it reporting if the font did not have a glyph for the given character.
Our solution was to convert the UCS2 codepoint to a PostScript glyph name, check that the Type 1 font provided a charstring for that glyph name, and then look up the glyph number for the glyph name in the font's private encoding. This let us use the same input to display text as we would for any TrueType or OpenType font, and also let us determine if we needed to switch to a different font (if the original font didn't provide the needed glyph).
Sorry, I don't have code to share at the moment. We created our map of UCS2 to PostScript glyph names using data from here: https://github.com/adobe-type-tools/agl-aglfn/
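The three steps described above (Unicode to glyph name, check for a charstring, glyph name to code in the font's encoding) can be sketched as follows. The containers stand in for the real AGLFN data and the font's CharStrings and /Encoding, and the uniXXXX fallback for unlisted characters follows the Adobe Glyph List naming convention; this is an illustration under those assumptions, not the reporter's code.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <optional>
#include <set>
#include <string>

// Returns the font's code for a UCS2 character, or std::nullopt if the font
// has no glyph for it (the caller can then switch to a fallback font).
std::optional<std::uint8_t> CodeForUnicode(
    std::uint16_t ucs2,
    const std::map<std::uint16_t, std::string>& aglfn,   // Unicode -> glyph name
    const std::set<std::string>& charStrings,            // names with charstrings
    const std::map<std::string, std::uint8_t>& encoding) // glyph name -> code
{
    std::string glyphName;
    auto named = aglfn.find(ucs2);
    if (named != aglfn.end()) {
        glyphName = named->second;
    } else {
        // AGL convention for characters without a listed name: "uniXXXX".
        char buf[8];
        std::snprintf(buf, sizeof buf, "uni%04X", static_cast<unsigned>(ucs2));
        glyphName = buf;
    }
    if (charStrings.find(glyphName) == charStrings.end())
        return std::nullopt;                 // font has no charstring for it
    auto code = encoding.find(glyphName);
    if (code == encoding.end())
        return std::nullopt;                 // glyph exists but is not encoded
    return code->second;
}
```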

Making use of Base 14 fonts

How can I create a PDFUsedFont that references one of the Base 14 fonts in pdf?
The idea is to not embed the fonts, but still have a "deterministic" choice of fonts (quite obviously? :P)

Thanks!

PS: I tried using TfLow("Helvetica", 14); but then mupdf complains that it couldn't find the font dictionary. I'm not knowledgeable about the PDF specification at all, apart from what I read here.
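In the PDF specification, the first operand of Tf is the name of an entry in the Font subdictionary of the current resource dictionary, not a font name, which would explain mupdf's complaint if nothing named Helvetica is registered there. A Base 14 font can be referenced without embedding through a Type1 font dictionary whose /BaseFont is one of the standard names. The sketch below only builds the text of those two pieces; the resource name F1 and the helper names are hypothetical, and how PDFHummus registers such a dictionary in the page resources is not covered here.

```cpp
#include <string>

// Non-embedded Base 14 font dictionary. Written as an indirect object and
// registered in the page's /Resources /Font dictionary under a resource name.
std::string Base14FontDictionary(const std::string& baseFont)
{
    return "<< /Type /Font /Subtype /Type1 /BaseFont /" + baseFont + " >>";
}

// The content stream then selects the font by that resource name.
std::string SelectFontOperator(const std::string& resourceName, int size)
{
    return "/" + resourceName + " " + std::to_string(size) + " Tf";
}
```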

Linking error

Ideas?

It seems to find some symbols in libPDFWriter.a, but unable to find others. Stumped.

Ld build/Debug/PDFWriterTestPlayground normal x86_64
cd /Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground
setenv MACOSX_DEPLOYMENT_TARGET 10.8
/Applications/Xcode5-DP5.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ -arch x86_64 -isysroot /Applications/Xcode5-DP5.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -L/Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground/build/Debug -L/Users/jbierling/Downloads/Code/PDF-Writer-master/XCode/build/Debug -F/Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground/build/Debug -filelist /Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground/build/PDFWriterTestPlayground.build/Debug/PDFWriterTestPlayground.build/Objects-normal/x86_64/PDFWriterTestPlayground.LinkFileList -mmacosx-version-min=10.8 -stdlib=libc++ -lLibJpeg -lPDFWriter -lLibTiff -lz.1.2.5 -lstdc++.6.0.9 -lFreetype -Xlinker -dependency_info -Xlinker /Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground/build/PDFWriterTestPlayground.build/Debug/PDFWriterTestPlayground.build/Objects-normal/x86_64/PDFWriterTestPlayground_dependency_info.dat -o /Users/jbierling/Downloads/Code/PDF-Writer-master/PDFWriterTestPlayground/PDFWriterTestPlayground/build/Debug/PDFWriterTestPlayground

Undefined symbols for architecture x86_64:
"OutputFile::OpenFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool)", referenced from:
OpenTypeTest::SaveCharstringCode(TestConfiguration const&, unsigned short, unsigned short, CFFFileInput*) in OpenTypeTest.o

Digital Signatures

Hi,

I'm looking at using digital signatures in an application, specifically Node, so I'd be using HummusJS.

Is this a feature you have considered adding?

Is it possible to do this with the existing API? Maybe by adding it to the stream manually?

Thank you!
Andrew

MinGW32 compilation

I am trying to build Hummus PDF Writer on MinGW32/Windows 7 with gcc version 4.8.1.

The compiler complained about an undefined sprintf_s, so I changed

#ifdef  WIN32

to

#if defined(WIN32) && defined(_MSC_VER)

in SafeBufferMacrosDefs.h.

Also, as fseeko and ftello are not available on my MinGW32 system, I inserted

    #ifdef __MINGW32__
    #define fseeko fseeko64
    #define ftello ftello64
    #endif

in SafeBufferMacrosDefs.h.

With the above changes, I could successfully build Hummus PDF Writer, and some sample programs I made are working, but I haven't tested them extensively yet.

I am not sure if I am doing the right thing, so I'd appreciate your comments on this.
Thanks.

Type1ToCFFEmbeddedFontWriter::AddComponentGlyphs() uses the private encoding for dependent glyphs even for glyphs defined using the 'seac' operator

The 'seac' operator ("Standard Encoding Accented Character") is defined to specify dependent glyphs in the Adobe Standard Encoding rather than the Type 1 font's private encoding, but the implementation of Type1ToCFFEmbeddedFontWriter::AddComponentGlyphs() was only getting the glyph names according to the font's private encoding. In our case, we were getting glyph names defined in the font's private encoding dictionary, but the font did not actually have any charstrings for those glyph names. This caused the recursive call to AddComponentGlyphs() to fail on calling Type1Input::CalculateDependenciesForCharIndex(), and that led to the failure to embed the font, and ultimately the failure to complete the PDF. Or, no glyph would be shown at all in the output PDF.
For example, the charstring for "Aring" might use 'seac' with "A" and "ring" as dependent glyphs, specified in the Adobe Standard Encoding (where "ring" has the code point 0xCA). The font's private encoding might have a different name for the code point 0xCA, "eth"... if the font did not have a charstring for "eth", that would lead to the failure described above; and if the font did have a charstring for that character, the wrong glyph would be drawn.
Our solution was to get the encoded glyph name explicitly from the StandardEncoding object instead.
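The fix described above amounts to resolving seac's base and accent codes through StandardEncoding instead of the font's /Encoding. A minimal sketch of that lookup, where each encoding is a code-to-name map and only the entries from the example are filled in (the function name is hypothetical; a real implementation would use the full 256-entry StandardEncoding):

```cpp
#include <map>
#include <string>

// seac's bchar/achar operands are codes in Adobe StandardEncoding, so the
// component glyph names must be looked up there, not in the font's encoding.
std::string SeacComponentName(int code, const std::map<int, std::string>& standardEncoding)
{
    auto it = standardEncoding.find(code);
    return it != standardEncoding.end() ? it->second : ".notdef";
}
```

Passing the font's private encoding instead would resolve code 0xCA to "eth" in the example above, reproducing the failure.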

[Question] How to correctly skip a failed object copying?

I need to copy objects from one PDF to another PDF, but sometimes some objects are not "valid" (a problematic PDF file, maybe generated by a buggy PDF app), e.g. a missing parent node or a missing indirect object. In such cases I want to skip copying the object, but when PDFDocumentCopyingContext::CopyObject returns eFailure, it has already allocated some object IDs, so the later pdfWriter.EndPDF() always returns eFailure due to unwritten objects in the xref table (it fails in ObjectsContext::WriteXrefTable, line 204).
Is there any method in the library that i can call to roll-back the states before the failed CopyObject call?

About your library

Hi, I would like to know whether it can run on the iOS platform. Are there any compatibility issues? Can it combine content, for example putting pictures together into a PDF file to form a new PDF file? Please reply, thank you!

Stack buffer overflow in PDFParser::ParseXrefFromXrefTable()

In PDFParser.cpp, in PDFParser::ParseXrefFromXrefTable(), there is a possibility of an attempt to read past the bounds of the 20-byte array for holding the xref entry.

There are four lines where a pointer to a part of this array (on the stack) is cast to (const char*) and implicitly converted into the std::string passed to the BoxingBaseWithRW<> constructor. The implicit construction of the std::string uses the constructor that only takes a single const char* parameter, and is intended to convert from NULL-terminated character strings; a different std::string constructor for constructing from byte buffers is probably more appropriate here.

I tried some changes to explicitly choose the byte buffer version of the std::string constructor here...

            if(currentObject < inXrefSize)
            {
                inXrefTable[currentObject].mObjectPosition = LongFilePositionTypeBox( std::string( (const char*)entry, 10 ) );
                inXrefTable[currentObject].mRivision = ULong( std::string( (const char*)(entry+11), 5 ) );
                inXrefTable[currentObject].mType = entry[17] == 'n' ? eXrefEntryExisting:eXrefEntryDelete;
            }
            ++currentObject;



            // now parse the section. 
            while(currentObject < firstNonSectionObject)
            {
                if(mStream->Read(entry,20) != 20)
                {
                    TRACE_LOG("PDFParser::ParseXref, failed to read xref entry");
                    status = PDFHummus::eFailure;
                    break;
                }
                if(currentObject < inXrefSize)
                {
                    inXrefTable[currentObject].mObjectPosition = LongFilePositionTypeBox( std::string( (const char*)entry, 10 ) );
                    inXrefTable[currentObject].mRivision = ULong( std::string( (const char*)(entry+11), 5 ) );
                    inXrefTable[currentObject].mType = entry[17] == 'n' ? eXrefEntryExisting:eXrefEntryDelete;
                }
                ++currentObject;
            }
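The fix above relies on the (pointer, length) std::string constructor. A standalone sketch of the whole entry parse, using std::strtoll/std::strtoul in place of PDFHummus's LongFilePositionTypeBox and ULong boxing types (the struct and function names are hypothetical):

```cpp
#include <cstdlib>
#include <string>

// A classic xref entry is exactly 20 bytes: a 10-digit byte offset, a space,
// a 5-digit generation number, a space, 'n' or 'f', and a 2-byte end-of-line.
// The buffer is not NUL-terminated, so each std::string is built with an
// explicit length instead of the const char* constructor.
struct XrefEntry
{
    long long offset;
    unsigned long generation;
    bool inUse;
};

XrefEntry ParseXrefEntry(const unsigned char (&entry)[20])
{
    const char* text = reinterpret_cast<const char*>(entry);
    XrefEntry result;
    result.offset = std::strtoll(std::string(text, 10).c_str(), nullptr, 10);
    result.generation = std::strtoul(std::string(text + 11, 5).c_str(), nullptr, 10);
    result.inUse = entry[17] == 'n';
    return result;
}
```

Taking the array by reference to exactly 20 bytes also lets the compiler reject a wrongly sized buffer.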

JPEGImageHandler::GetImageDimensions() and pixel density

It's not all that clear what units are used for HummusImageInformation::imageWidth and HummusImageInformation::imageHeight... I assume pixels in the case of a bitmap?
I'm calling PDFWriter::GetImageDimensions("path.to.file") to get the image dimensions on a JPEG file that is 300ppi in the JFIF segment.
It looks like JPEGImageHandler::GetImageDimensions() imposes 72ppi on the returned measurement, and this doesn't jibe with what's actually in the placed ImageXObject itself (SamplesWidth and SamplesHeight) when written by JPEGImageHandler::CreateAndWriteImageXObjectFromJPGInformation().
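PDF user space units are 1/72 inch, so the size an image occupies at its native density is its pixel count times 72 divided by its pixels per inch. A small helper (hypothetical name) that makes the difference visible:

```cpp
// Converts a pixel count to PDF points (1/72 inch) for a given density.
// At 72 ppi one pixel maps to one point, which appears to be what
// GetImageDimensions() assumes according to the report above.
double PixelsToPoints(double pixels, double pixelsPerInch)
{
    return pixels * 72.0 / pixelsPerInch;
}
```

A 3000-pixel-wide JPEG tagged at 300 ppi is therefore 720 points (10 inches) wide, whereas treating it as 72 ppi gives 3000 points.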

U3D support?

Hi Gal,
your work is very good. I looked for a C++ library to create and manage PDF files and found several, but in my opinion only yours and libharu are good to use. I prefer your library, but libharu supports the U3D standard for showing a CAD-like canvas with JavaScript support.
Have you planned to support this feature?

Thank you

MergePDFPagesToPage the other way around?

L.S.,

I see that pdfPage is written out first and then test.pdf on top of it.
This results in a PDF file where the content from the file (a Word export) hides my page.

MergePDFPagesToPage(pdfPage, "c:\\devel\\test.pdf", singePageRange);

Could it also work the other way around?

ANSIFontWriter::WriteWidths crashes

ANSIFontWriter::WriteWidths crashes when called from CFFANSIFontWriter and no glyphs in the font have actually been used. Specifically, the result from mCharactersVector.begin() can't be dereferenced because it's empty in this case, so an exception is thrown.

I'm not sure if this issue is confined to ANSIFontWriter or if other types of fonts have an analogous issue.

Ideally, the font wouldn't be emitted at all since it's not actually used.
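A guard along these lines would avoid the crash by checking for emptiness before touching begin(). The sketch uses a std::map from character code to width in place of mCharactersVector, and the function name and output format are hypothetical; skipping the font entirely, as suggested above, would be the cleaner fix.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical /Widths writer for a simple font (single-byte codes 0..255).
// When no characters were used, nothing is written and begin() is never
// dereferenced.
std::string WriteWidthsSketch(const std::map<unsigned, int>& usedWidths)
{
    if (usedWidths.empty())
        return "";                        // nothing used: skip instead of crashing
    unsigned first = usedWidths.begin()->first;
    unsigned last = usedWidths.rbegin()->first;
    std::ostringstream out;
    out << "/FirstChar " << first << " /LastChar " << last << " /Widths [";
    for (unsigned code = first; code <= last; ++code) {
        auto it = usedWidths.find(code);
        out << ' ' << (it != usedWidths.end() ? it->second : 0);  // 0 for unused codes
    }
    out << " ]";
    return out.str();
}
```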
