Comments (18)

magicboker commented on July 20, 2024

Is it correct to only modify the page order in the "Kids" array in the root /Pages node?

from pdf-writer.

galkahana commented on July 20, 2024

Yes. In essence this should do it. The pages array is exactly what determines the order of pages.

magicboker commented on July 20, 2024

How do I write a "PageTree" back to the file after changing its mKidsNodes and mKidsIDs?
In the SDK, can I modify the PageTree without an incremental update, and instead modify the original page tree object in the file directly?

PS: once this page-move function is done, I would like to send back the modified files as a contribution to this SDK.

magicboker commented on July 20, 2024

After reading the page tree part of the PDF 1.7 spec, I realized that the page tree could be a balanced tree, so it may not be appropriate to modify the node directly in the file without using an incremental update.
Could you please give me some guidance on how to modify the SDK to implement the page move function?

galkahana commented on July 20, 2024

Hi @magicboker,
This note is to tell you that there's actually quite a simple solution to this. I'm working on an example and will get back to you.

Nice riddle :)

galkahana commented on July 20, 2024

Hi,
To change the order of pages, you can use the PageTree implementation in Hummus. Here is an example.

First, get the modified file parser so you have access to the page object IDs:

PDFParser& modifiedFileParser = pdfWriter.GetModifiedFileParser();

The next step is to add all the pages in the order that we want them. You can get all page object IDs from the parser, and add them in order using the CatalogInformation object directly:

CatalogInformation& catalogInformation = pdfWriter.GetDocumentContext().GetCatalogInformation();
IndirectObjectsReferenceRegistry& objectsRegistry = pdfWriter.GetObjectsContext().GetInDirectObjectsRegistry();

// add pages in any order that you want
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(0), objectsRegistry);

This code simply swaps the first and second pages. The AddPageToPageTree method is the same one the library uses when writing pages, to add them to the page tree (you can check the code of DocumentContext::WritePage to see its usage). The method makes sure to keep the page tree balanced.
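To make "balanced" a bit more concrete, here is a minimal self-contained sketch of a page tree with a bounded number of kids per node. It is a toy model only, not the Hummus implementation: Node, buildPageTree, collectPages, and the maxKids limit are all hypothetical names chosen for illustration. It shows that the page order is just the left-to-right order of the leaves, however many intermediate nodes sit in between.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy page tree node (hypothetical, not a Hummus type).
struct Node {
    int pageId = 0;                           // meaningful only for leaves (pages)
    std::vector<std::unique_ptr<Node>> kids;  // empty for leaves
    bool isPage() const { return kids.empty(); }
};

// Builds a tree bottom-up so that no node has more than maxKids children.
// The input order of pageIds becomes the page order of the document.
std::unique_ptr<Node> buildPageTree(const std::vector<int>& pageIds, std::size_t maxKids) {
    assert(maxKids >= 2 && !pageIds.empty());
    std::vector<std::unique_ptr<Node>> level;
    for (int id : pageIds) {
        auto leaf = std::make_unique<Node>();
        leaf->pageId = id;
        level.push_back(std::move(leaf));
    }
    // Group the current level into parents until a single root remains.
    do {
        std::vector<std::unique_ptr<Node>> parents;
        for (std::size_t i = 0; i < level.size(); i += maxKids) {
            auto parent = std::make_unique<Node>();
            for (std::size_t j = i; j < level.size() && j < i + maxKids; ++j)
                parent->kids.push_back(std::move(level[j]));
            parents.push_back(std::move(parent));
        }
        level = std::move(parents);
    } while (level.size() > 1);
    return std::move(level.front());
}

// Depth-first, left-to-right walk: yields the pages in document order.
void collectPages(const Node& node, std::vector<int>& out) {
    if (node.isPage()) {
        out.push_back(node.pageId);
        return;
    }
    for (const auto& kid : node.kids)
        collectPages(*kid, out);
}
```

Building the tree from object IDs in a new order and walking it with collectPages gives back exactly that order, which is why re-adding the pages in the wanted sequence is enough to reorder them.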

Now, we could finish here and go directly to pdfWriter.EndPDF();, however this would mean that we still have the old page tree. That would result in having the pages in the old order followed by the pages in the new order. Not good. We need to get rid of the old page tree.

Thing is...that there isn't really a method to do that.

So I modified the library code a bit, in the part that checks whether there's an old page tree when writing the end of the PDF for a modified file. I made it check whether the old page tree is a deleted object, which is a fair thing to check for. [You can check the latest commit for that.]

Then I simply added to the same code a delete for this old page tree root, like this:

PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);

Done. we got a solution.

So, to do your thing, get the code from git, so you have the new modification, and follow the example.

Hope it helps.
Gal.

magicboker commented on July 20, 2024

Thank you very much for the solution!!
Another question:
Suppose a page inherits some properties from its parent page node, e.g. /Rotate 90, and is then moved to a different page node that has no such properties or has different ones, e.g. /Rotate 180. Will the solution modify the page object by adding the properties lost from the previous parent page node?
For example, moving page 2 to page 7 in the following example:
[image: example]
PS: refer to Figure 3.6, "Inheritance of attributes", in pdf_reference_1-7.pdf

galkahana commented on July 20, 2024

This solution does not preserve inherited attributes as is. You can modify the page objects where appropriate.
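As a rough illustration of "modify the page objects where appropriate", the sketch below resolves an inherited attribute by walking up the Parent chain and copies the resolved value onto the page itself before re-parenting it. This is a toy model under stated assumptions: TreeNode, resolveInherited, and moveKeepingAttribute are hypothetical names, attribute values are simplified to integers (as for /Rotate), and none of it is Hummus API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy node (hypothetical): a page or an intermediate Pages node.
struct TreeNode {
    const TreeNode* parent = nullptr;
    std::map<std::string, int> attrs;  // e.g. {"Rotate", 90}
};

// Looks the key up on the node itself, then on each ancestor in turn.
// Returns true and fills 'value' if found anywhere along the chain.
bool resolveInherited(const TreeNode& node, const std::string& key, int& value) {
    for (const TreeNode* n = &node; n != nullptr; n = n->parent) {
        std::map<std::string, int>::const_iterator it = n->attrs.find(key);
        if (it != n->attrs.end()) {
            value = it->second;
            return true;
        }
    }
    return false;
}

// Copies the currently effective value onto the page, then re-parents it,
// so the new parent's own value for the same key no longer applies.
void moveKeepingAttribute(TreeNode& page, const TreeNode& newParent, const std::string& key) {
    assert(&newParent != &page);  // a node cannot be its own parent
    int value = 0;
    if (resolveInherited(page, key, value))
        page.attrs[key] = value;
    page.parent = &newParent;
}
```

In the question's scenario, a page that inherited /Rotate 90 and is moved under a node with /Rotate 180 would still resolve to 90, because the value now lives on the page itself. In a real PDF the same idea would apply to the other inheritable page attributes (Resources, MediaBox, CropBox).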

magicboker commented on July 20, 2024

If the original PDF has some page nodes with /Rotate properties, will I lose them after writing back the new page tree?

galkahana commented on July 20, 2024

It does not affect the page object itself, only the root and intermediate page tree nodes. Essentially, it creates new ones.

magicboker commented on July 20, 2024

If I delete many pages in a PDF by calling DeleteObject, they will be marked as free in the xref table and the file size will not be reduced. Is it possible to truncate the deleted page objects along with their contents, so that the PDF has a reasonable size after page deletion? For example, take a 20-page PDF of 10 MB: after deleting 19 pages (only 1 page left), users would find it unreasonable that it is still 10 MB.

galkahana commented on July 20, 2024

No. You can only do this by recreating a new document and importing the pages to the new document.
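For a feel of the numbers, here is a back-of-the-envelope sketch of the two approaches. It is purely illustrative: StoredObject and both functions are hypothetical, the sizes are made up to match the 20-page, 10 MB example from the question, and real files also carry header, xref, and trailer overhead that this ignores.

```cpp
#include <cstddef>
#include <vector>

// Toy record for one stored object (hypothetical).
struct StoredObject {
    std::size_t bytes;  // size of the object as written in the file
    bool deleted;       // true if marked free in the xref table
};

// Incremental update: the original bytes stay in place and the update is
// appended, so deleted objects still take up their space.
std::size_t sizeAfterIncrementalDelete(const std::vector<StoredObject>& objects,
                                       std::size_t appendedBytes) {
    std::size_t total = appendedBytes;
    for (const StoredObject& object : objects)
        total += object.bytes;
    return total;
}

// Rebuild: only the objects that are still live are copied to the new file.
std::size_t sizeAfterRebuild(const std::vector<StoredObject>& objects) {
    std::size_t total = 0;
    for (const StoredObject& object : objects)
        if (!object.deleted)
            total += object.bytes;
    return total;
}
```

With 20 objects of 500,000 bytes each and 19 of them deleted, the first function still reports at least 10,000,000 bytes, while the second reports 500,000.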

magicboker commented on July 20, 2024

To achieve that, am I right to call "AppendPDFPagesFromPDF" to import pages from one PDF into the other?

galkahana commented on July 20, 2024

Yes, this should be good. If you want more, there are quite a few options here. You can read more at https://github.com/galkahana/PDF-Writer/wiki/PDF-Embedding.

magicboker commented on July 20, 2024

I found a bug in your solution:

PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);

Each old page's "Parent" entry still points to the old Pages node. It needs to be updated in the EndPDF method!
In the DocumentContext::FinalizeModifiedPDF function:

if(originalDocumentPageTreeRoot.ObjectID != 0)
{
    finalPageRoot.ObjectID = WriteCombinedPageTree(inModifiedFileParser);
    finalPageRoot.GenerationNumber = 0;

    // check for error - may fail to write combined page tree if document is protected!
    if(finalPageRoot.ObjectID == 0)
    {
        status = eFailure;
        break;
    }
}
else
{
    WritePagesTree(); // <---- need to modify the "Parent" attribute of the first-level pages' dictionaries
    PageTree* pageTreeRoot = mCatalogInformation.GetPageTreeRoot(mObjectsContext->GetInDirectObjectsRegistry());
    finalPageRoot.ObjectID = pageTreeRoot->GetID();
    finalPageRoot.GenerationNumber = 0;

}

galkahana commented on July 20, 2024

Right, right. Strange how, when I ran this, the PDF looked fine.

Let's fix this.

Calling:
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);

returns a number. It's the object ID of the parent of the page.
The solution needs to be complemented by rewriting the page object with the new parent.
Something like this:

// copyingContext and inPDFWriter come from the surrounding code; copyingContext is
// assumed here to be a copying context whose source parser is the modified file's parser
ObjectIDType pageID = modifiedFileParser.GetPageObjectID(1);
ObjectIDType newParent = catalogInformation.AddPageToPageTree(pageID, objectsRegistry);

// parse the existing page dictionary
PDFObjectCastPtr<PDFDictionary> pageObject(copyingContext->GetSourceDocumentParser()->ParseNewObject(pageID));

MapIterator<PDFNameToPDFObjectMap> pageObjectIt = pageObject->GetIterator();

// rewrite the page object under the same object ID
inPDFWriter->GetObjectsContext().StartModifiedIndirectObject(pageID);
DictionaryContext* modifiedPageObject = inPDFWriter->GetObjectsContext().StartDictionary();

// copy every entry except the old parent
while(pageObjectIt.MoveNext())
{
    if(pageObjectIt.GetKey()->GetValue() != "Parent")
    {
        modifiedPageObject->WriteKey(pageObjectIt.GetKey()->GetValue());
        copyingContext->CopyDirectObjectAsIs(pageObjectIt.GetValue());
    }
}

// write new parent
modifiedPageObject->WriteKey("Parent");
modifiedPageObject->WriteNewObjectReferenceValue(newParent);

inPDFWriter->GetObjectsContext().EndDictionary(modifiedPageObject);
inPDFWriter->GetObjectsContext().EndIndirectObject();

Wrapping this in a method that takes the page index would be nice.
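The core of the rewrite above is "copy every entry except Parent, then write the new Parent". A library-free sketch of just that logic, which can be compiled and checked on its own, might look like the following. Dict and withNewParent are hypothetical names, and the dictionary is simplified to string keys and already-serialized string values, so this is a model of the idea and not a replacement for the DictionaryContext code.

```cpp
#include <map>
#include <string>

// Toy stand-in for a PDF dictionary: key name -> serialized value.
using Dict = std::map<std::string, std::string>;

// Returns a copy of the page dictionary whose Parent entry points at
// newParentRef; every other entry is carried over unchanged.
Dict withNewParent(const Dict& page, const std::string& newParentRef) {
    Dict rewritten;
    for (const auto& entry : page) {
        if (entry.first != "Parent")
            rewritten[entry.first] = entry.second;
    }
    rewritten["Parent"] = newParentRef;
    return rewritten;
}
```

For instance, a page holding {Type: /Page, Parent: 3 0 R, Rotate: 90} passed in with "12 0 R" comes back with the same Type and Rotate entries and Parent set to 12 0 R. The input dictionary is left untouched.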

magicboker commented on July 20, 2024

It works! Thank you very much!

galkahana commented on July 20, 2024

Wonderful :)
