Comments (9)
The error occurs because the for loop generates figures with duplicated ids. I will see if I can come up with a fix.
from sablon.
@fernandobrito thanks for looking into this! You are definitely on the right track. The way sablon loops work is that they copy the contents for every iteration. If there are elements inside the loop that have to have unique properties, Word will most likely complain.
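To make the failure mode concrete, here is a minimal sketch of what copying a loop body once per iteration does to id attributes. It is not sablon's actual code: expand_loop, ids_of and the tiny LOOP_BODY fragment are made up for illustration, and REXML stands in for whatever XML library is really used.

```ruby
require "rexml/document"

WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"

# Hypothetical loop body containing one drawing with a fixed id.
LOOP_BODY = <<~XML
  <body xmlns:wp="#{WP_NS}">
    <wp:docPr id="1" name="Rectangle 1"/>
  </body>
XML

# Naive expansion (assumed behaviour): deep-copy every child of the
# body once per item, without touching any attribute.
def expand_loop(body_xml, items)
  body = REXML::Document.new(body_xml).root
  items.flat_map { body.elements.to_a.map(&:deep_clone) }
end

# Collect the id attribute of each top-level copy.
def ids_of(elements)
  elements.map { |el| el.attributes["id"] }
end

ids = ids_of(expand_loop(LOOP_BODY, %w[a b c]))
duplicates = ids.tally.select { |_id, count| count > 1 }.keys
puts "ids: #{ids.inspect}, duplicated: #{duplicates.inspect}"
```

Three iterations give three drawings that all carry id "1", which is the kind of duplication Word rejects.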
from sablon.
Any suggestions on how to fix it? I needed something quick for a client, so I'm just appending a unique number to each node inside the loop body that has an "id" attribute. It solved my issue with figures.
However, I'm not sure whether it is always safe to update every node with an "id" attribute, or whether other types of nodes store their unique ids under a different attribute name.
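A rough sketch of that quick workaround, assuming REXML and made-up helper names (plain_id, suffix_ids!, render_copies); it is not the code that was actually shipped to the client. It only touches the un-prefixed id attribute and leaves prefixed ones such as r:id alone.

```ruby
require "rexml/document"

# Return the un-prefixed id attribute of an element, or nil.
# Prefixed attributes such as r:id or w:id are deliberately not matched.
def plain_id(element)
  element.attributes.each_attribute do |attr|
    return attr if attr.expanded_name == "id"
  end
  nil
end

# Append a per-iteration suffix to every plain id in the copied subtree.
def suffix_ids!(root, suffix)
  elements = [root]
  root.each_recursive { |el| elements << el }
  elements.each do |el|
    attr = plain_id(el)
    el.add_attribute("id", "#{attr.value}#{suffix}") if attr
  end
  root
end

# Hypothetical one-element loop body.
BODY_XML = '<wp:docPr xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" id="1" name="Rectangle 1"/>'

# Copy the body once per iteration and suffix the ids of each copy.
def render_copies(count)
  template = REXML::Document.new(BODY_XML).root
  (1..count).map { |i| suffix_ids!(template.deep_clone, i) }
end

puts render_copies(3).map { |el| plain_id(el).value }.inspect
```

Plain concatenation can still collide (id "1" with suffix 11 and id "11" with suffix 1 both become "111"), so this is only a stopgap.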
from sablon.
@fernandobrito I would assume ids are unique across the entire document, but to really be sure you'd need to check the XML spec (5500 pages of light reading if you need something to do on a weekend). While gigantic, it is very well organized and bookmarked, so it's not as bad as it seems. Another good reference for WordML and DrawingML is http://officeopenxml.com; it references the 3rd edition of the XML spec, but I think that is simply due to the website's age.
Could you post a small excerpt of the corrupted version, i.e. the duplicated ids, and an excerpt of your fixed version? Seeing what is actually going on will help me out a bit.
I think the best way to fix it is to add functionality that finds the max id value in use and increments it from that point forward. You'll need to check every element that has an id attribute, not just the drawing-related ones. I'd track the value on the environment instance. You may need to check other files such as footers, headers, etc. as well. Depending on the implementation, the 'next' id values could just be stored in a hash with a key matching the unique attribute name.
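Something along these lines, as a sketch only: IdCounter, scan and take are hypothetical names, the two XML strings are invented stand-ins for document.xml and a header part, and REXML is used just to keep the example self-contained. The counter scans every part, remembers the highest integer seen per attribute name, and hands out values above it.

```ruby
require "rexml/document"

# Hypothetical helper; the class name and API are assumptions, not sablon code.
class IdCounter
  def initialize(tracked = ["id"])
    @tracked = tracked
    @next = Hash.new(1) # attribute name => next free integer
  end

  # Scan one XML part and raise the counters above any id already in use.
  def scan(xml)
    root = REXML::Document.new(xml).root
    elements = [root]
    root.each_recursive { |el| elements << el }
    elements.each do |el|
      el.attributes.each_attribute do |attr|
        name = attr.expanded_name
        next unless @tracked.include?(name) && attr.value.match?(/\A\d+\z/)
        @next[name] = [@next[name], attr.value.to_i + 1].max
      end
    end
    self
  end

  # Hand out the next unused value for the given attribute name.
  def take(name)
    id = @next[name]
    @next[name] = id + 1
    id
  end
end

NAMESPACES = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" ' \
             'xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"'
MAIN_PART = %(<w:document #{NAMESPACES}><wp:docPr id="7" name="a"/><wp:docPr id="3" name="b"/></w:document>)
HEADER_PART = %(<w:hdr #{NAMESPACES}><wp:docPr id="12" name="logo"/></w:hdr>)

counter = IdCounter.new.scan(MAIN_PART).scan(HEADER_PART)
puts counter.take("id") # 13, one above the highest id found in any part
puts counter.take("id") # 14
```

Keying the hash by attribute name means another unique attribute could be tracked later by passing it in the tracked list, without changing the scanning code.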
from sablon.
@stadelmanma thanks for the references.
I did a quick search of the XML spec, and it seems most figures/drawings have their unique ids described as an attribute on either the wp:docPr or the wp:cNvPr tag. I couldn't work out when each one is used.
The description for the id attribute (which should be a unique integer) is:
id (Unique Identifier)
Specifies a unique identifier for the current DrawingML object within the current document. This ID can be used to assist in uniquely identifying this object so that it can be referred to by other parts of the document.
If multiple objects within the same document share the same id attribute value, then the document shall be considered non-conformant.
https://msdn.microsoft.com/en-us/library/documentformat.openxml.drawing.wordprocessing.docproperties(v=office.15).aspx
Here you can find a small example of a rectangle inside a loop: https://github.com/fernandobrito/sablon/blob/eebd895954b157e659fbf180ea8312b027a05a69/test/fixtures/xml/figure_loop.xml#L30. Line 30 has a docPr element with an id attribute, which gets duplicated when sablon makes copies of the loop body, corrupting the output file.
I've started playing around with your idea of finding the highest id and then assigning new ids for each loop iteration (on this fork), but I just realized I made a big mistake: I started doing my work on top of the images support PR :(. When I have more time I will cherry-pick my changes on top of senny/sablon master so you can take a look and maybe provide some feedback.
By the way, I've been making heavy use of a tool from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=30425. It lets me generate diffs between OOXML files and also validate them. Is there anywhere I should add it for future contributors? Perhaps on a wiki page in this repo?
from sablon.
@fernandobrito I completely missed the last bit of your comment about the MS tool. Sadly, I don't have access to a Windows computer to try it out.
Does it generate a diff like git showing you the line by line changes to each XML file in the document?
The validation part sounds extremely useful since MS Word tends to be pretty ambiguous when a docx gets corrupted. What kind of information does it generate when validating a document?
from sablon.
@stadelmanma I had to install Windows on a virtual machine in order to use it.
Yes, the tool provides diffs, such as:
[screenshot]
but I think you can also use normal diff tools or web tools such as: https://www.corefiling.com/opensource/xmldiff/.
Maybe the biggest win is the validation feature. It shows which lines are causing problems:
[screenshot]
That's how I found out about the duplicated id issue on figures.
from sablon.
I think this is more easily fixable with the new DOM logic; I have a branch on my fork to work on it. All r:id attributes should be left unmodified, because changing them would break relationships defined in a *.rels file.
Additional notes on any elements that use the id attribute:
- Table cell identifier (17.4.66), w:id, only unique within the table itself, optional, any string value. Probably can be ignored.
- Annotation identifier (17.13.4.2), w:id, applies to comments; it says that if more than one comment has the same value for this attribute then only one of them is ignored.
  - This appears to be the same for all of the change tracking elements from 17.13.4 - 17.13.5. I think I should leave these id attributes alone.
  - Change tracking in a template doesn't make sense as a use case, aside from providing the consumer with notes on what they may need to update manually, so I'm less worried about supporting this corner case.
- The w:bookmarkStart and w:bookmarkEnd elements share a unique ID value.
  - If I copy and change one, I need to ensure the corresponding end element also gets changed.
  - If both elements aren't present in the repeated content, then I should just drop the clones.
- The w:permStart and w:permEnd elements follow the same convention as bookmarks.
- The last relevant id is in cNvPr (19.3.1.12), which we already know needs to be unique.
Side note: there is a <w:id> element, but we shouldn't need to worry about this one (17.5.2.18).
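The bookmark and perm rules in these notes could be sketched roughly as follows. This is an assumption-laden illustration, not the branch's code: find_attr, renumber_pairs!, demo and the sample paragraph are invented, the w: and r: prefixes are assumed to be the usual ones, and REXML is used so the snippet runs on its own. Start/end elements that share a w:id get the same new value, a clone without its partner is dropped, and r:id is never touched.

```ruby
require "rexml/document"

# Find an attribute by its exact prefixed name (e.g. "w:id"), or nil.
def find_attr(element, expanded_name)
  element.attributes.each_attribute do |attr|
    return attr if attr.expanded_name == expanded_name
  end
  nil
end

# Renumber start/end pairs inside one copied fragment; returns the next free id.
def renumber_pairs!(fragment, next_id, start_tag: "bookmarkStart", end_tag: "bookmarkEnd")
  pairs = Hash.new { |hash, old_id| hash[old_id] = {} }
  fragment.each_recursive do |el|
    next unless [start_tag, end_tag].include?(el.name)
    attr = find_attr(el, "w:id")
    pairs[attr.value][el.name] = el if attr
  end

  pairs.each_value do |pair|
    if pair.size == 2
      # Both halves are inside the fragment: give them the same new id.
      pair.each_value { |el| el.add_attribute("w:id", next_id.to_s) }
      next_id += 1
    else
      # Only one half was copied: drop the orphaned clone.
      pair.each_value(&:remove)
    end
  end
  next_id
end

# Hypothetical copied fragment with one complete pair and one orphaned end tag.
FRAGMENT_XML = <<~XML
  <w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
       xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <w:bookmarkStart w:id="0" w:name="row"/>
    <w:hyperlink r:id="rId5"/>
    <w:bookmarkEnd w:id="0"/>
    <w:bookmarkEnd w:id="9"/>
  </w:p>
XML

def demo
  fragment = REXML::Document.new(FRAGMENT_XML).root
  next_id = renumber_pairs!(fragment, 40)
  summary = fragment.elements.to_a.map do |el|
    attr = find_attr(el, "w:id") || find_attr(el, "r:id")
    [el.name, attr.value]
  end
  [summary, next_id]
end

p demo
```

Passing start_tag: "permStart", end_tag: "permEnd" would reuse the same pairing logic for permission ranges.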
from sablon.
Implementation stages:
- Auto-increment the id attribute in the wp:docPr and wp:cNvPr tags. I think I should find these tags for any namespace (https://stackoverflow.com/questions/4440451/how-to-ignore-namespaces-with-xpath).
  - I'll need to test what happens when shapes are in footnotes, endnotes, headers, etc.
- Handle the bookmark tags and migrate this logic to perm tags.
  - Several bookmarks with overlapping ids do not appear to corrupt the document. I will leave this to the end user to handle, as bookmarks spanning a loop in an odd fashion shouldn't be the original intent.
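A minimal sketch of the first stage, under stated assumptions: renumber_drawings!, drawing_ids, render_drawings and the simplified DRAWING_XML fragment are invented for illustration, and REXML is used instead of the project's real DOM code. Matching on the element's local name (el.name in REXML) plays the role of the namespace-ignoring XPath from the linked answer.

```ruby
require "rexml/document"

DRAWING_TAGS = %w[docPr cNvPr].freeze

# Return the un-prefixed id attribute of an element, or nil.
def plain_id(element)
  element.attributes.each_attribute do |attr|
    return attr if attr.expanded_name == "id"
  end
  nil
end

# Give every docPr/cNvPr in the subtree a fresh id, whatever its namespace
# prefix. Returns the next unused id so the caller can keep counting.
def renumber_drawings!(root, next_id)
  elements = [root]
  root.each_recursive { |el| elements << el }
  elements.each do |el|
    next unless DRAWING_TAGS.include?(el.name) && plain_id(el)
    el.add_attribute("id", next_id.to_s)
    next_id += 1
  end
  next_id
end

# Simplified, hypothetical drawing; real documents nest these tags deeper.
DRAWING_XML = <<~XML
  <w:drawing xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
             xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
             xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
    <wp:docPr id="1" name="Picture 1"/>
    <pic:cNvPr id="1" name="image.png"/>
  </w:drawing>
XML

# List the plain ids found below root, in document order.
def drawing_ids(root)
  ids = []
  root.each_recursive { |el| ids << plain_id(el).value if plain_id(el) }
  ids
end

# Copy the drawing once per iteration, renumbering each copy as it is made.
def render_drawings(copies, first_free_id)
  template = REXML::Document.new(DRAWING_XML).root
  next_id = first_free_id
  copies.times.flat_map do
    copy = template.deep_clone
    next_id = renumber_drawings!(copy, next_id)
    drawing_ids(copy)
  end
end

puts render_drawings(2, 2).inspect
```

With two copies and a first free id of 2, the four drawing ids come out as 2, 3, 4 and 5, so no two objects share a value.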
from sablon.