Comments (21)
I shared a file at this URL: testpublic
It contains a PDF file with text set in Segoe UI, part of it in Arabic. The Segoe UI font is written with a CIDSet (using your last fix commit), and the file still raises the CIDSet requirement error in VeraPDF. There is no issue with Hebrew or Latin text.
I simply used this text to generate the PDF, with the Segoe UI font on Windows 11:
"Segoe UI: للمصممين نص"
If I use my workaround and remove the CIDSet, the PDF file passes VeraPDF validation.
from pdf-writer.
I reproduced this CIDSet PDF/A issue with another font on Windows, "Yu Gothic" for instance.
The text still renders well if I remove the CIDSet, and without the CIDSet the file passes VeraPDF validation.
from pdf-writer.
How about this - please prep a pr and ill look into trying to understand where it might cause trouble.
from pdf-writer.
Also if you have an example that can recreate the issue maybe i can recreate it and figure out if theres something to do about the cid set to correct it
from pdf-writer.
Hi, you can reproduce it by generating a PDF file on Windows with PDF-Writer, using text in the "Yu Gothic" font. If you then verify it with VeraPDF online (selecting PDF/A-2b conformance), it will show conformance errors, including the same CIDSet error as described above.
Only this CIDSet error remains for me: I fixed the other conformance issues in my client application on top of PDF-Writer.
But note that removing the CIDSet key and object in the PDF-Writer code fixes this last PDF/A-2 conformance error, and the text still renders correctly. According to the PDF specification CIDSet is optional, so it seems safe to just remove it. I am not 100% sure though, which is why I am asking: is it really safe to remove CIDSet from the font descriptor?
(cf jacques-quidu@6cf1030 in my fork of your repo)
Tomorrow I will try to send you 2 sample PDFs generated by my customer application, one with the CIDSet and one without, so you can compare them if you need to. In my tests the rendering is exactly the same with or without CIDSet.
from pdf-writer.
id rather correct it
from pdf-writer.
Or, as CIDSet is optional according to the PDF specification, maybe just providing a PDF creation option to not add /CIDSet would be enough.
The requirement that raises the conformance error is not present in PDF/A-1; it only appears starting with PDF/A-2 (Specification: ISO 19005-2:2011, Clause: 6.2.11.4), so the current CIDSet implementation is still legitimate for PDF 1.3-1.7 or PDF/A-1.
from pdf-writer.
the code intends to do what PDF/A-2 states. at least this -
"Specification: ISO 19005-2:2011, Clause: 6.2.11.4, Test number: 4
If the FontDescriptor dictionary of an embedded CID font contains a CIDSet stream, then it shall identify all CIDs which are present in the font program, regardless of whether a CID in the font is referenced or used by the PDF or not."
seems to be what the code does. whatever it is it seems like a bug then and i'd like to fix it.
you are free to do what you will in your fork.
from pdf-writer.
to answer your question RE whether it's safe to remove it, i can't speak for all usages, but im guessing if it's optional and renders well that it's fine to remove it. im not sure what possible side effects might happen, but on its face it seems rather safe. here's the note in the PDF specs:
CIDSet
stream
(Optional) A stream identifying which CIDs are present in the CIDFont file. If this entry is present, the CIDFont contains only a subset of the glyphs in the character collection defined by the CIDSystemInfo dictionary. If it is absent, the only indication of a CIDFont subset is the subset tag in the FontName entry (see Section 5.5.3, “Font Subsets”).
The stream’s data is organized as a table of bits indexed by CID. The bits should be stored in bytes with the high-order bit first. Each bit corresponds to a CID. The most significant bit of the first byte corresponds to CID 0, the next bit to CID 1, and so on.
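To make the bit layout in the spec note above concrete, here is a minimal sketch of packing a list of CIDs into such a table. The function name `BuildCIDSetBits` is hypothetical and is not part of PDF-Writer; it only illustrates the "high-order bit first" indexing, not how the library actually writes the stream.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not a PDF-Writer API): pack CIDs into a CIDSet-style
// bit table. The bit for CID n lives in byte n / 8, and within each byte the
// most significant bit is used first, so CID 0 maps to mask 0x80 of byte 0.
std::vector<std::uint8_t> BuildCIDSetBits(const std::vector<std::uint32_t>& cids)
{
    std::uint32_t maxCid = 0;
    for (std::uint32_t cid : cids)
        if (cid > maxCid)
            maxCid = cid;

    // One byte per 8 CIDs, large enough to hold the highest CID; empty input
    // yields an empty table.
    std::vector<std::uint8_t> bits(cids.empty() ? 0 : maxCid / 8 + 1, 0);
    for (std::uint32_t cid : cids)
        bits[cid / 8] |= static_cast<std::uint8_t>(0x80u >> (cid % 8));
    return bits;
}
```

For example, CIDs 0, 1 and 9 produce two bytes: `0xC0` (bits for CID 0 and CID 1) followed by `0x40` (bit for CID 9).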
from pdf-writer.
Yes, I read this in the PDF spec too, which is why I thought it would be safe just to remove it ;)
I also checked with different fonts and got the same rendering with or without CIDSet, so it seems to be really safe.
Thanks again for this great C++ lib: I like its easy extensibility too.
from pdf-writer.
ok. i am able to recreate the problem and then also understand the problem and generate a working prototype.
i'll have to figure out how to combine it with the rest of the code, but i expect to be able to deliver a working solution with the CID set no later than the weekend.
from pdf-writer.
ok..figured i'll just go ahead and fix it.
Turns out my CIDSet implementation was nowhere near how it should be. omg. not for true type fonts (which Yu Gothic is) nor for otf fonts. figured out what's actually intended to be the implementation. corrected both cases, and it seems to make https://demo.verapdf.org/ complaints about CIDSet go away.
if you wanna test this, grab the code from master branch (or just change per what's in here - #217) and i think it'll fix the problem on your end too.
from pdf-writer.
Hi thanks for this quick fix:
i will integrate it in my fork asap and try it with my test documents.
Best regards,
Jacques.
from pdf-writer.
Unfortunately the PDF/A-2b conformance error for CIDSet encoding is still present with Arabic text.
According to my tests it is not reproducible with Latin or Hebrew text.
I decided to keep the current workaround (removing CIDSet) in my fork, as it is safe to remove it (I did not find any issue with my test documents without the CIDSet). The main need for my customer is to generate Factur-X or ZUGFeRD invoices, which are based on PDF/A-3.
And since removing CIDSet also reduces file size (even if very slightly), it is good for electronic invoices too (or for archiving with PDF/A).
from pdf-writer.
This is weird. I'd love to be able to reproduce it. I tried Arabic text and couldn't trigger the error. And I'm fairly sure the solution is good. Ok. I'll try a bit more, or wait for a recreation method from someone for whom the workaround isn't good enough. Thanks.
i suspect that hebrew and arabic in your example just dont create cidset (you can open the pdf file and look for the string CIDSet) because they don't generate a CID font. when introducing something like Japanese (or maybe Arabic in the font you are using) the CID font is created and with it a CID set. anyways. good that the workaround works for you.
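The "look for the string CIDSet" check can be sketched as below. This assumes the font descriptor dictionary is stored as plain text in the file (not inside a compressed object stream), so a raw byte search can see the key. Both function names are hypothetical helpers, not PDF-Writer calls.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: count occurrences of the /CIDSet key in raw PDF bytes.
// A count of 0 suggests no CID font descriptor with a CIDSet was written,
// under the assumption that dictionaries are not in compressed object streams.
std::size_t CountCIDSetKeys(const std::string& pdfBytes)
{
    const std::string key = "/CIDSet";
    std::size_t count = 0;
    for (std::size_t pos = pdfBytes.find(key); pos != std::string::npos;
         pos = pdfBytes.find(key, pos + key.size()))
        ++count;
    return count;
}

// Hypothetical helper: read a whole file into a string in binary mode.
// Returns an empty string if the file cannot be opened.
std::string ReadFileBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling `CountCIDSetKeys(ReadFileBytes("some.pdf"))` on each generated file (the file name here is only a placeholder) would show which test documents actually carry a CIDSet.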
from pdf-writer.
Thanks man!
and thanks for the workaround
from pdf-writer.
I added the version of the PDF file without CIDSet to testpublic. It renders the same as the other file, but this file, testArabicNoCIDSet.pdf, passes VeraPDF validation for PDF/A-2b.
from pdf-writer.
oh my. i understand the problem. my solution does not account for something called dependent glyphs, which are used here. ok. i can add something for that. (at this point it's fine if you don't want to test the result haha. i understand that you got a good solution).
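As a rough illustration of what accounting for dependent glyphs involves: in TrueType, a composite glyph references other glyphs as components, so the set of glyphs embedded in a subset is the used glyphs plus everything they reference, transitively. The sketch below is not the code from the actual fix; the `ComponentMap` type and `CollectWithDependents` name are hypothetical, and the component map is assumed to have been extracted from the font beforehand.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using GlyphID = std::uint16_t;
// Hypothetical input: for each composite glyph, the glyphs it is built from.
using ComponentMap = std::map<GlyphID, std::vector<GlyphID>>;

// Return the used glyphs plus all glyphs they depend on, transitively.
// Glyphs already visited are skipped, so reference cycles cannot loop forever.
std::set<GlyphID> CollectWithDependents(const std::vector<GlyphID>& used,
                                        const ComponentMap& components)
{
    std::set<GlyphID> result;
    std::vector<GlyphID> pending(used.begin(), used.end());
    while (!pending.empty()) {
        GlyphID glyph = pending.back();
        pending.pop_back();
        if (!result.insert(glyph).second)
            continue; // already handled

        auto it = components.find(glyph);
        if (it != components.end())
            pending.insert(pending.end(), it->second.begin(), it->second.end());
    }
    return result;
}
```

A CIDSet built only from the glyphs referenced by the text would miss the extra entries this closure adds, which is consistent with the validator complaint described in this thread.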
from pdf-writer.
This MR - #218 - should take care of this problem.
again, up to you if you want to verify it.
from pdf-writer.
Thanks for fixing this issue again.
But yes, I will stick with my workaround for now, which is also safe.
By the way, I found another issue related to the copying context: when you append pages from another PDF using a PDF copying context, annotations are lost (such as URL links, or the bookmark links I implemented in the client application with code similar to the URL links, also based on PDF annotations). If you use ModifyPDF instead (an incremental update), annotations are not lost. So to convert PDF to PDF/A, for instance, I use ModifyPDF when the 4-byte signature is correct in the source file or stream (which keeps annotations); when the signature is not correct, I create a copying context from the source file or stream and append its pages (annotations are lost in this case). For now this is fine, as I only need PDF to PDF/A conversion for PDF files generated by the client application (printed or exported directly), so the 4-byte signature in the original PDF is always correct and I can use ModifyPDF, which preserves annotations.
Just tell me if you intend to fix this issue too. You can take your time, as using ModifyPDF is fine for me for now ;)
I opened a separate issue for it: #219
from pdf-writer.
I guess the copying context would need to copy the annotations from the source page into DocumentContext::mAnnotations in the append-pages code before the page is written, because the page-writing code assumes the annotations to write for a page are stored in DocumentContext::mAnnotations?
from pdf-writer.