[Question] How to re-order pages in a PDF? (pdf-writer, closed)

galkahana commented on July 20, 2024

[Question] How to re-order pages in a PDF?

from pdf-writer.

Comments (18)

magicboker commented on July 20, 2024

Is it correct to only modify the page order in the "Kids" array in the root /Pages node?

galkahana commented on July 20, 2024

Yes. In essence this should do it. The pages array is exactly what determines the order of pages.

magicboker commented on July 20, 2024

How do I write a "PageTree" back to the file after changing its mKidsNodes and mKidsIDs?
In the SDK, can I modify the original page tree object in the file directly, rather than going through an incremental update?

PS: once this page-move function is done, I would like to send back the modified files as a contribution to this SDK.

magicboker commented on July 20, 2024

After reading the page tree section of the PDF 1.7 spec, I realized that the page tree can be a balanced tree, so it may not be appropriate to modify the node directly in the file without using an incremental update.
Could you please give me some guidance on how to modify the SDK to implement the page move function?
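
To make the concern concrete: when the tree has intermediate nodes, the document order is the depth-first order of the leaf pages, so reordering the root's Kids moves whole subtrees rather than single pages. A minimal sketch over a toy model (plain standard containers, not PDF-Writer's PageTree class; KidsMap, collectPages and pageOrder are hypothetical names used only for illustration):

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model of a PDF page tree (an assumption for illustration only).
// An ID that appears as a key in the map is an intermediate /Pages node;
// any other ID is treated as a leaf /Page object.
using ObjectID = unsigned long;
using KidsMap = std::map<ObjectID, std::vector<ObjectID>>;

// Appends the leaf pages found under `node` to `out`, in document order.
void collectPages(const KidsMap& kids, ObjectID node, std::vector<ObjectID>& out)
{
    KidsMap::const_iterator it = kids.find(node);
    if (it == kids.end())
    {
        out.push_back(node); // leaf page
        return;
    }
    for (ObjectID kid : it->second)
        collectPages(kids, kid, out); // descend in Kids order
}

// Returns all page IDs reachable from `root`, in document order.
std::vector<ObjectID> pageOrder(const KidsMap& kids, ObjectID root)
{
    std::vector<ObjectID> out;
    collectPages(kids, root, out);
    return out;
}
```

With a root whose Kids are two intermediate nodes, swapping those two entries moves every page of one subtree ahead of every page of the other, which is why editing only the root array is not enough for a single-page move.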

galkahana commented on July 20, 2024

Hi @magicboker,
this note is to tell you that actually there's quite a simple solution to this...working on an example and will get back to you.

nice riddle :)

galkahana commented on July 20, 2024

Hi,
To change the order of pages you can use the PageTree implementation of hummus. Here is an example.

First, get the modified file parser so you have access to page object IDs:

PDFParser& modifiedFileParser = pdfWriter.GetModifiedFileParser();

The next step is to add all the pages in the order that we want them. You can get all page object IDs from the parser, and add them in order using the CatalogInformation object directly:

CatalogInformation& catalogInformation = pdfWriter.GetDocumentContext().GetCatalogInformation();
IndirectObjectsReferenceRegistry& objectsRegistry = pdfWriter.GetObjectsContext().GetInDirectObjectsRegistry();

// add pages in any order that you want
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(0), objectsRegistry);

This code simply swaps the first and second pages. The AddPageToPageTree method is the one the library itself uses when writing pages, to add them to the page tree (you can check the code of DocumentContext::WritePage to see its usage). The method makes sure to keep the page tree balanced.
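
To picture what a balanced page tree means here, the following is a minimal standalone sketch that groups an ordered list of page IDs into nodes with a bounded number of kids. It is not the library's algorithm (AddPageToPageTree adds pages one at a time); buildPageTree, BuiltTree, the fanout parameter and nextFreeID are all hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using ObjectID = unsigned long;

struct BuiltTree
{
    ObjectID root = 0;
    std::map<ObjectID, std::vector<ObjectID>> kids; // node -> ordered kids
    std::map<ObjectID, ObjectID> parent;            // kid -> parent node
};

// Groups `pages` (already in the desired order) into nodes of at most
// `fanout` kids, level by level, until a single root remains.
// `nextFreeID` stands in for allocating new object IDs; the caller must
// make sure the IDs from there on do not collide with page IDs.
BuiltTree buildPageTree(const std::vector<ObjectID>& pages,
                        std::size_t fanout, ObjectID nextFreeID)
{
    assert(fanout >= 2 && !pages.empty());
    BuiltTree tree;
    std::vector<ObjectID> level = pages;
    do
    {
        std::vector<ObjectID> parents;
        for (std::size_t i = 0; i < level.size(); i += fanout)
        {
            ObjectID node = nextFreeID++;
            std::size_t end = std::min(i + fanout, level.size());
            for (std::size_t k = i; k < end; ++k)
            {
                tree.kids[node].push_back(level[k]);
                tree.parent[level[k]] = node; // remember each kid's new parent
            }
            parents.push_back(node);
        }
        level = parents; // the new nodes become the kids of the next level up
    } while (level.size() > 1);
    tree.root = level.front();
    return tree;
}
```

Note that every page ends up with a parent that is a newly created node, never the original root, which matters later in this thread.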

Now, we could finish here and go directly to pdfWriter.EndPDF();, however this would mean that we still have the old page tree. The result would have the pages in the old order, followed by the pages in the new order. Not good. We need to get rid of the old page tree.
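
The duplication can be modelled in a few lines. This is only a sketch of the observable effect described above, with hypothetical names (finalPageSequence, originalRootDeleted); it does not reproduce the library's internals.

```cpp
#include <cassert>
#include <vector>

using ObjectID = unsigned long;

// Models the final page sequence of a modified file: if the original page
// tree root is still alive, its pages come first, followed by the pages
// added during modification; if it was deleted, only the added pages remain.
std::vector<ObjectID> finalPageSequence(const std::vector<ObjectID>& originalPages,
                                        const std::vector<ObjectID>& addedPages,
                                        bool originalRootDeleted)
{
    std::vector<ObjectID> result;
    if (!originalRootDeleted)
        result = originalPages;
    result.insert(result.end(), addedPages.begin(), addedPages.end());
    return result;
}
```

For a two-page file whose pages are re-added in swapped order, the sequence has four entries while the old root is alive and two once it is deleted.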

The thing is that there isn't really a method to do that.

So I modified the library code a bit, in the part that checks whether there's an old page tree when writing the end of the PDF for a modified file. I made it check whether the old page tree is a deleted object, which is a fair thing to check for. [You can check the latest commit for that.]

Then I simply added to the same code a delete for this old page tree root, like this:

PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);

Done. We have a solution.

So, to do your thing, get the latest code from git, so that you have the new modification, and follow the example.

Hope it helps.
Gal.

magicboker commented on July 20, 2024

Thank you very much for the solution!!
Another question:
Suppose a page inherits some properties from its parent page node (e.g. /Rotate 90), and I move it to a different page node that has no such properties, or different ones (e.g. /Rotate 180). Will the solution modify the page object by adding the properties it loses from the previous parent page node?
For example, moving page 2 to page 7 in the following example:

PS: refer to the figure 3.6, Inheritance of attributes in pdf_reference_1-7.pdf

galkahana commented on July 20, 2024

This solution does not carry over inherited attributes as is. You can modify the page objects where appropriate.
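
One way to picture "modify the page objects where appropriate" is to resolve an inheritable attribute through the old ancestors and write it onto the page before re-parenting it. A minimal sketch for /Rotate over a toy node table (TreeNode, effectiveRotate and pinRotate are hypothetical names, not PDF-Writer API; the same idea would apply to the other inheritable page attributes such as /Resources, /MediaBox and /CropBox):

```cpp
#include <cassert>
#include <map>

using ObjectID = unsigned long;

struct TreeNode
{
    ObjectID parent = 0;    // 0 means no parent (the root node)
    bool hasRotate = false; // true when the node carries its own /Rotate
    int rotate = 0;         // meaningful only when hasRotate is true
};

// Walks from `id` up through its ancestors and returns the first explicit
// /Rotate value found; 0 is the default when no node on the path has one.
int effectiveRotate(const std::map<ObjectID, TreeNode>& nodes, ObjectID id)
{
    while (id != 0)
    {
        const TreeNode& node = nodes.at(id);
        if (node.hasRotate)
            return node.rotate;
        id = node.parent;
    }
    return 0;
}

// Copies the currently effective value onto the page itself, so that
// re-parenting the page later cannot change how it is displayed.
void pinRotate(std::map<ObjectID, TreeNode>& nodes, ObjectID page)
{
    int value = effectiveRotate(nodes, page);
    TreeNode& node = nodes.at(page);
    node.hasRotate = true;
    node.rotate = value;
}
```

Pinning has to happen while the page still points at its old parent; after the move the old ancestors are no longer on its path.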

magicboker commented on July 20, 2024

If the original PDF has some page nodes with /Rotate properties, will I lose them after writing back the new page tree?

galkahana commented on July 20, 2024

It does not affect the page object itself, only the root and intermediate page tree nodes. Essentially it creates new ones.

magicboker commented on July 20, 2024

If I delete many pages in a PDF by calling DeleteObject, they will be marked as free in the xref table and the file size will not be reduced. Is it possible to truncate the deleted page objects together with their contents, so that the PDF has a reasonable size after page deletion? For example, take a 20-page PDF of 10 MB. After deleting 19 pages (only 1 page left), users would find it unreasonable that it is still 10 MB.

galkahana commented on July 20, 2024

No. You can only do this by recreating a new document and importing the pages to the new document.
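
The reason a rebuilt document is smaller can be sketched as a reachability count: only objects reachable from the pages that are kept get copied. The model below is an illustration under simplifying assumptions (StoredObject and rebuiltSize are hypothetical names, sizes are made up by the caller, and /Parent links are deliberately left out of the reference lists so that one page does not pull in the whole tree).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using ObjectID = unsigned long;

struct StoredObject
{
    std::size_t bytes = 0;            // size of the object's serialized form
    std::vector<ObjectID> references; // objects it refers to (contents, fonts, images)
};

// Sums the sizes of all objects reachable from `keptPages`. Objects that are
// reachable only from dropped pages are never visited, so they do not count.
std::size_t rebuiltSize(const std::map<ObjectID, StoredObject>& objects,
                        const std::vector<ObjectID>& keptPages)
{
    std::set<ObjectID> visited;
    std::vector<ObjectID> pending(keptPages.begin(), keptPages.end());
    std::size_t total = 0;
    while (!pending.empty())
    {
        ObjectID id = pending.back();
        pending.pop_back();
        if (!visited.insert(id).second)
            continue; // shared object, already counted once
        std::map<ObjectID, StoredObject>::const_iterator it = objects.find(id);
        if (it == objects.end())
            continue; // dangling reference, nothing to copy
        total += it->second.bytes;
        pending.insert(pending.end(),
                       it->second.references.begin(), it->second.references.end());
    }
    return total;
}
```

In an in-place modification the bytes of freed objects stay in the file, whereas a copy driven from the kept pages never touches them.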

magicboker commented on July 20, 2024

To achieve that, am I right to call "AppendPDFPagesFromPDF" to import pages from one PDF into the other?

galkahana commented on July 20, 2024

Yes, this should be good. If you want more, there are quite a few options here. You can read more at https://github.com/galkahana/PDF-Writer/wiki/PDF-Embedding.

magicboker commented on July 20, 2024

I found a bug in your solution:

PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);

The "Parent" entry of each old page still records the old Pages node. It needs to be updated in the EndPDF method!
In the DocumentContext::FinalizeModifiedPDF function:

if(originalDocumentPageTreeRoot.ObjectID != 0)
{
    finalPageRoot.ObjectID = WriteCombinedPageTree(inModifiedFileParser);
    finalPageRoot.GenerationNumber = 0;

    // check for error - may fail to write combined page tree if document is protected!
    if(finalPageRoot.ObjectID == 0)
    {
        status = eFailure;
        break;
    }
}
else
{
    // Need to modify the "Parent" attribute of the first-level pages' dictionaries here
    WritePagesTree();
    PageTree* pageTreeRoot = mCatalogInformation.GetPageTreeRoot(mObjectsContext->GetInDirectObjectsRegistry());
    finalPageRoot.ObjectID = pageTreeRoot->GetID();
    finalPageRoot.GenerationNumber = 0;

}
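
The mismatch described above can be stated as a simple invariant: every object listed in a node's Kids array should name that node as its Parent. A minimal sketch of such a check over a toy model (plain maps rather than PDF-Writer types; findStaleParents is a hypothetical helper):

```cpp
#include <cassert>
#include <map>
#include <vector>

using ObjectID = unsigned long;

// Returns the kids whose recorded /Parent is missing or differs from the
// node that actually lists them in its /Kids array.
std::vector<ObjectID> findStaleParents(
    const std::map<ObjectID, std::vector<ObjectID>>& kids,
    const std::map<ObjectID, ObjectID>& parentOf)
{
    std::vector<ObjectID> stale;
    for (const auto& entry : kids)
    {
        for (ObjectID kid : entry.second)
        {
            std::map<ObjectID, ObjectID>::const_iterator it = parentOf.find(kid);
            if (it == parentOf.end() || it->second != entry.first)
                stale.push_back(kid); // still points at some other node
        }
    }
    return stale;
}
```

In the situation reported here, the new tree lists the pages but the pages still name the deleted root, so every re-added page would be flagged.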

galkahana commented on July 20, 2024

Right, right. Strange that when I ran this, the PDF looked fine.

Let's fix this.

Calling:
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);

returns a number: the object ID of the page's new parent.
The solution needs to be complemented by rewriting the page object with the new parent.
Something like this:

// Assumes copyingContext is a PDFDocumentCopyingContext* created for the modified file,
// and inPDFWriter points to the same writer as pdfWriter in the earlier snippets.
ObjectIDType pageID = modifiedFileParser.GetPageObjectID(1);
ObjectIDType newParent = catalogInformation.AddPageToPageTree(pageID, objectsRegistry);

// parse the existing page dictionary
PDFObjectCastPtr<PDFDictionary> pageObject(copyingContext->GetSourceDocumentParser()->ParseNewObject(pageID));

MapIterator<PDFNameToPDFObjectMap> pageObjectIt = pageObject->GetIterator();

// start a new version of the page object under the same object ID
inPDFWriter->GetObjectsContext().StartModifiedIndirectObject(pageID);
DictionaryContext* modifiedPageObject = inPDFWriter->GetObjectsContext().StartDictionary();

// copy every entry except the old Parent
while(pageObjectIt.MoveNext())
{
    if(pageObjectIt.GetKey()->GetValue() != "Parent")
    {
        modifiedPageObject->WriteKey(pageObjectIt.GetKey()->GetValue());
        copyingContext->CopyDirectObjectAsIs(pageObjectIt.GetValue());
    }
}

// write new parent
modifiedPageObject->WriteKey("Parent");
modifiedPageObject->WriteNewObjectReferenceValue(newParent);

inPDFWriter->GetObjectsContext().EndDictionary(modifiedPageObject);
inPDFWriter->GetObjectsContext().EndIndirectObject();

Putting it in a method that takes the page index would be nice.
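
Stripped of the library types, the rewrite loop above is "copy every entry except Parent, then write one new Parent". A minimal standalone sketch of that pattern, assuming a dictionary is modelled as an ordered list of key and serialized-value strings (Dict, DictEntry and withNewParent are hypothetical names, not PDF-Writer API):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using DictEntry = std::pair<std::string, std::string>; // key, serialized value
using Dict = std::vector<DictEntry>;

// Returns a copy of `page` in which every entry except "Parent" is kept
// unchanged and a single "Parent" entry holding `newParentRef` is appended.
Dict withNewParent(const Dict& page, const std::string& newParentRef)
{
    Dict result;
    for (const DictEntry& entry : page)
    {
        if (entry.first != "Parent")
            result.push_back(entry); // copied as is
    }
    result.push_back(DictEntry("Parent", newParentRef));
    return result;
}
```

Filtering before appending guarantees the result has exactly one Parent entry, even if the input had none.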

magicboker commented on July 20, 2024

It works! thank you very much!

galkahana commented on July 20, 2024

Wonderful :)
