Comments (18)
Is it correct to modify only the page order in the "Kids" array of the root /Pages node?
from pdf-writer.
Yes. In essence this should do it. The pages array is exactly what determines the order of pages.
from pdf-writer.
How do I write a "PageTree" back to the file after changing its mKidsNodes and mKidsIDs?
In the SDK, can I modify the original page tree object in the file directly, instead of going through an incremental update?
PS: once this page-move function is done, I would like to send back the modified files as a contribution to this SDK.
from pdf-writer.
After reading the page tree section of the PDF 1.7 spec, I realized that the page tree can be a balanced tree, so it may not be appropriate to modify the node directly in the file without using an incremental update.
Could you please give me some guidance on how to modify the SDK to implement the page-move function?
from pdf-writer.
Hi @magicboker,
this note is to tell you that actually there's quite a simple solution to this...working on an example and will get back to you.
nice riddle :)
from pdf-writer.
Hi,
To change the order of pages, you can use the PageTree implementation in Hummus. Here is an example.
First, get the modified file parser so that you have access to the page object IDs:
PDFParser& modifiedFileParser = pdfWriter.GetModifiedFileParser();
The next step is to add all the pages in the order that we want them. You can get all the page object IDs from the parser and add them in order using the CatalogInformation object directly:
CatalogInformation& catalogInformation = pdfWriter.GetDocumentContext().GetCatalogInformation();
IndirectObjectsReferenceRegistry& objectsRegistry = pdfWriter.GetObjectsContext().GetInDirectObjectsRegistry();
// add pages in any order that you want
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(0), objectsRegistry);
This code simply swaps the first and second pages. The AddPageToPageTree method is normally used when writing pages, to add them to the page tree (you can check the code of DocumentContext::WritePage to see its usage). The method makes sure to keep the page tree balanced.
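To illustrate what keeping the page tree balanced can mean, here is a self-contained toy model. It is not PDF-Writer's actual PageTree code: Node, ToyPageTree, depthOf and the fan-out limit kMaxKids are hypothetical names chosen for this sketch, and the real limit and algorithm in the library may differ. Like AddPageToPageTree, addPage reports the node that became the page's parent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t kMaxKids = 3;  // hypothetical fan-out limit per node

struct Node {
    Node* parent = nullptr;
    bool holdsPages = true;                   // true: this node's kids are page IDs
    std::vector<int> pages;                   // used when holdsPages is true
    std::vector<std::unique_ptr<Node>> kids;  // used when holdsPages is false
};

// Number of Parent links between a node and the root.
inline int depthOf(const Node* node) {
    int depth = 0;
    for (; node->parent != nullptr; node = node->parent) ++depth;
    return depth;
}

class ToyPageTree {
public:
    ToyPageTree() : root_(new Node()), current_(root_.get()) {}

    // Appends a page at the end and returns the node that became its parent.
    const Node* addPage(int pageId) {
        if (current_->pages.size() == kMaxKids) current_ = newSiblingOf(current_);
        assert(current_->holdsPages);
        current_->pages.push_back(pageId);
        return current_;
    }

    // Page IDs in document order (left-to-right walk of the tree).
    std::vector<int> pageOrder() const {
        std::vector<int> order;
        collect(root_.get(), order);
        return order;
    }

private:
    // Creates an empty node at the same depth as `full`, growing a new root
    // or new intermediate nodes when the ancestors are full as well.
    Node* newSiblingOf(Node* full) {
        if (full->parent == nullptr) {  // `full` is the root: grow upward
            std::unique_ptr<Node> newRoot(new Node());
            newRoot->holdsPages = false;
            full->parent = newRoot.get();
            newRoot->kids.push_back(std::move(root_));
            root_ = std::move(newRoot);
        }
        Node* parent = full->parent;
        if (parent->kids.size() == kMaxKids) parent = newSiblingOf(parent);
        std::unique_ptr<Node> sibling(new Node());
        sibling->holdsPages = full->holdsPages;
        sibling->parent = parent;
        parent->kids.push_back(std::move(sibling));
        return parent->kids.back().get();
    }

    static void collect(const Node* node, std::vector<int>& order) {
        if (node->holdsPages) {
            order.insert(order.end(), node->pages.begin(), node->pages.end());
            return;
        }
        for (const auto& kid : node->kids) collect(kid.get(), order);
    }

    std::unique_ptr<Node> root_;
    Node* current_;  // node that receives the next page
};
```

In this model, every node that holds pages ends up at the same depth and no node has more than kMaxKids children, so the page order is simply the order of the addPage calls.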
Now, we could finish here and go directly to pdfWriter.EndPDF(); however, this would mean that we still have the old page tree. The result would be the pages in the old order followed by the pages in the new order. Not good. We need to get rid of the old page tree.
The thing is that there isn't really a method to do that.
So I modified the library code a bit, in the part that checks whether there is an old page tree when writing the end of the PDF for a modified file. I made it check whether the old page tree root is a deleted object, which is a fair thing to check for. [You can check the latest commit for that.]
Then I simply added to the same code a delete for this old page tree root, like this:
PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);
Done. we got a solution.
So, to do your thing, get the code from git, so you have the new modification, and follow the example.
Hope it helps.
Gal.
from pdf-writer.
Thank you very much for the solution!!
Another question:
Suppose a page inherits some properties from its parent page tree node, e.g. /Rotate 90, and is then moved under a different node that has no such property, or a different value such as /Rotate 180. Will the solution modify the page object by adding the properties it lost from its previous parent node?
For example, moving page 2 to page 7 in the following example:
PS: refer to the figure 3.6, Inheritance of attributes in pdf_reference_1-7.pdf
from pdf-writer.
This solution does not handle inheritance as is. You can modify the page objects where appropriate.
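As an illustration of modifying the page objects where appropriate, the sketch below resolves an inherited /Rotate value and writes it onto the page itself before the page is re-parented. It is a toy model with hypothetical names (ToyNode, effectiveRotate, movePreservingRotate), not PDF-Writer API. It relies on the PDF 1.7 rules that /Rotate is inheritable and defaults to 0; the other inheritable page attributes (/Resources, /MediaBox, /CropBox) would need the same treatment.

```cpp
#include <cassert>

// Toy stand-in for either a page or a page tree node.
struct ToyNode {
    const ToyNode* parent = nullptr;
    bool hasRotate = false;  // whether this node carries its own /Rotate entry
    int rotate = 0;          // value of that entry, meaningful if hasRotate
};

// Walks up the Parent chain until some node defines /Rotate.
// Falls back to 0, the default when no ancestor defines it.
int effectiveRotate(const ToyNode& node) {
    for (const ToyNode* n = &node; n != nullptr; n = n->parent) {
        if (n->hasRotate) return n->rotate;
    }
    return 0;
}

// Re-parents a page without changing how it renders: the value it used to
// inherit is first written onto the page itself.
void movePreservingRotate(ToyNode& page, const ToyNode& newParent) {
    page.rotate = effectiveRotate(page);
    page.hasRotate = true;
    page.parent = &newParent;
}
```

Without the materializing step, a page moved from a /Rotate 90 parent to a /Rotate 180 parent would silently start rendering with the new parent's rotation.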
from pdf-writer.
If the original PDF has some page tree nodes with /Rotate properties, will I lose them after writing back the new page tree?
from pdf-writer.
It does not affect the page object itself, only the root and intermediate page tree nodes. Essentially, it creates new ones.
from pdf-writer.
If I delete many pages in a PDF by calling DeleteObject, they will be marked as free in the xref table and the file size will not be reduced. Is it possible to truncate the deleted page objects together with their contents, so that the PDF has a reasonable size after page deletion? For example, take a 20-page PDF of 10 MB. After deleting 19 pages (only 1 page left), users would find it unreasonable that it is still 10 MB.
from pdf-writer.
No. You can only do this by recreating a new document and importing the pages to the new document.
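The difference between the two approaches can be sketched with a toy model, using hypothetical names (ToyObject, ToyFile, markFree, importObject) rather than PDF-Writer API. It assumes that deleting an object only flags its xref entry as free and leaves the bytes in the file, while importing a page into a new document copies only the objects reachable from that page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct ToyObject {
    std::size_t bytes = 0;   // size of the object's data in the file
    std::vector<int> refs;   // IDs of the objects it references
    bool free = false;       // xref entry marked as free
};
using ToyFile = std::map<int, ToyObject>;

// The bytes stay in the file whether or not the xref entry is free.
std::size_t fileSize(const ToyFile& file) {
    std::size_t total = 0;
    for (const auto& entry : file) total += entry.second.bytes;
    return total;
}

// Models deletion in an incremental update: only the flag changes.
void markFree(ToyFile& file, int id) { file.at(id).free = true; }

// Models importing into a new document: copies the object and, recursively,
// everything it references. Unreferenced objects are left behind.
void importObject(const ToyFile& source, int id, ToyFile& target) {
    if (target.count(id) != 0) return;  // already copied
    const ToyObject& object = source.at(id);
    target[id] = object;
    for (int ref : object.refs) importObject(source, ref, target);
}
```

In this model, marking pages as free leaves fileSize unchanged, whereas a new file built with importObject contains only the kept page and the objects it references.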
from pdf-writer.
To achieve that, am i right to call "AppendPDFPagesFromPDF" to import pages from one pdf to the other?
from pdf-writer.
yes, this should be good. if you want more, then there are quite a few options here. you can read more in https://github.com/galkahana/PDF-Writer/wiki/PDF-Embedding.
from pdf-writer.
I found a bug in your solution:
PDFObjectCastPtr<PDFIndirectObjectReference> catalogReference(modifiedFileParser.GetTrailer()->QueryDirectObject("Root"));
PDFObjectCastPtr<PDFDictionary> catalog(modifiedFileParser.ParseNewObject(catalogReference->mObjectID));
PDFObjectCastPtr<PDFIndirectObjectReference> pagesReference(catalog->QueryDirectObject("Pages"));
objectsRegistry.DeleteObject(pagesReference->mObjectID);
The "Parent" entry of each old page still points to the old Pages node. It needs to be updated in the EndPDF method!
In the DocumentContext::FinalizeModifiedPDF function:
if(originalDocumentPageTreeRoot.ObjectID != 0)
{
    finalPageRoot.ObjectID = WriteCombinedPageTree(inModifiedFileParser);
    finalPageRoot.GenerationNumber = 0;
    // check for error - may fail to write combined page tree if document is protected!
    if(finalPageRoot.ObjectID == 0)
    {
        status = eFailure;
        break;
    }
}
else
{
    // <---- Need to modify the "Parent" attribute of the first-level pages' dictionaries here
    WritePagesTree();
    PageTree* pageTreeRoot = mCatalogInformation.GetPageTreeRoot(mObjectsContext->GetInDirectObjectsRegistry());
    finalPageRoot.ObjectID = pageTreeRoot->GetID();
    finalPageRoot.GenerationNumber = 0;
}
from pdf-writer.
Right, right. Strange that when I ran this, the PDF looked fine.
Let's fix this.
Calling:
catalogInformation.AddPageToPageTree(modifiedFileParser.GetPageObjectID(1), objectsRegistry);
returns a number: the object ID of the page's new parent.
The solution needs to be complemented by rewriting the page object with the new parent.
Something like this:
// Assumes copyingContext is a PDFDocumentCopyingContext* for the modified file
// and inPDFWriter points to the PDFWriter used above.
ObjectIDType pageID = modifiedFileParser.GetPageObjectID(1);
ObjectIDType newParent = catalogInformation.AddPageToPageTree(pageID, objectsRegistry);
PDFObjectCastPtr<PDFDictionary> pageObject = copyingContext->GetSourceDocumentParser()->ParseNewObject(pageID);
MapIterator<PDFNameToPDFObjectMap> pageObjectIt = pageObject->GetIterator();
// rewrite the page object under its existing object ID
inPDFWriter->GetObjectsContext().StartModifiedIndirectObject(pageID);
DictionaryContext* modifiedPageObject = inPDFWriter->GetObjectsContext().StartDictionary();
// copy every entry except the old parent
while(pageObjectIt.MoveNext())
{
    if(pageObjectIt.GetKey()->GetValue() != "Parent")
    {
        modifiedPageObject->WriteKey(pageObjectIt.GetKey()->GetValue());
        copyingContext->CopyDirectObjectAsIs(pageObjectIt.GetValue());
    }
}
// write new parent
modifiedPageObject->WriteKey("Parent");
modifiedPageObject->WriteNewObjectReferenceValue(newParent);
inPDFWriter->GetObjectsContext().EndDictionary(modifiedPageObject);
inPDFWriter->GetObjectsContext().EndIndirectObject();
putting it in a nice method that gets the page index would be nice.
from pdf-writer.
It works! thank you very much!
from pdf-writer.
Wonderful :)
from pdf-writer.