Giter Club home page Giter Club logo

Comments (9)

fernandobrito avatar fernandobrito commented on July 20, 2024

The error occurs because the for loop generates figures with duplicated ids. I will see if I can come up with a fix.

from sablon.

senny avatar senny commented on July 20, 2024

@fernandobrito thanks for looking into this! You are definitely on the right track. The way sablon loops work is that they copy the contents for every iteration. If there are elements inside the loop that have to have unique properties, Word will most likely complain.

from sablon.

fernandobrito avatar fernandobrito commented on July 20, 2024

Any suggestions on how to fix it? I needed something quick for a client, so I'm just appending a unique number to each node inside the loop body that has an "id" attribute. It solved my issue with figures.

However, I'm not sure if it is always safe to update all nodes with an "id" attribute or if other types of nodes may have different unique ids but use some other name on the node attribute.

fernandobrito@eebd895

from sablon.

stadelmanma avatar stadelmanma commented on July 20, 2024

@fernandobrito I would assume id's are unique across the entire document but to really be sure you'd need to check the XML spec (5500 pages of light reading if you need something to do on a weekend). While gigantic it is very well organized and bookmarked so its not as bad as it seems. Another good reference for WordML and DrawingML: http://officeopenxml.com, they reference the 3rd edition XML spec but I think that is simply due to websites age.

Could you throw out a small excerpt of the corrupted version i.e. duplicated id's and a excerpt of your fixed version? Seeing what is actually going on will help me out a bit.

I think the best way to fix it is to add functionality that finds the max id value in use and increments it from that point forward. You'll need to check every element that has an id attribute not just the drawing related ones. I'd track the value on the environment instance. You may need to check other files such as footers, headers, etc. as well. Depending on the implementation the 'next' id values could just be stored in a hash with a key matching the unique attribute name.

from sablon.

fernandobrito avatar fernandobrito commented on July 20, 2024

@stadelmanma thanks for the references.

I've done a quick search on the XML spec and it seems most figures/drawings have their unique ids described as an attribute either on the wp:docPr or on the wp:cNvPr tag. I couldn't really understand when which one is used.

The description for the id attribute (which should be a unique integer) is:

id (Unique Identifier)
Specifies a unique identifier for the current DrawingML object within the current
document. This ID can be used to assist in uniquely identifying this object so that it can
be referred to by other parts of the document

If multiple objects within the same document share the same id attribute value, then the
document shall be considered non-conformant.
https://msdn.microsoft.com/en-us/library/documentformat.openxml.drawing.wordprocessing.docproperties(v=office.15).aspx

Here you can find a small example of a rectangle inside a loop: https://github.com/fernandobrito/sablon/blob/eebd895954b157e659fbf180ea8312b027a05a69/test/fixtures/xml/figure_loop.xml#L30. Line 30 has a docPr element with an id attribute which will get duplicated when sablon make copies of the loop body, corrupting the output file.

I've started playing around with your idea of finding the highest id and then assigning new ids for each loop iteration (on this fork), but I just realized I did a big mistake. I started doing my work on top of the images support PR :(. When I have more time I will cherry-pick my changes on top of senny/sablon master so you can take a look and maybe provide some feedback.

By the way, I've been using a lot a tool from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=30425. It lets me generate diffs between OOXML files and also validate them. Is there anywhere where I should add it for future contributors? Perhaps on a wiki page on this repo?

from sablon.

stadelmanma avatar stadelmanma commented on July 20, 2024

@fernandobrito I completely missed the last bit of your comment about the MS tool. Sadly, I don't have access to a Windows computer to try it out.

Does it generate a diff like git showing you the line by line changes to each XML file in the document?

The validation part sounds extremely useful since MS Word tends to be pretty ambiguous when a docx gets corrupted. What kind of information does it generate when validating a document?

from sablon.

fernandobrito avatar fernandobrito commented on July 20, 2024

@stadelmanma I had to install Windows on a virtual machine in order to use it.

Yes, the tool provides diffs, such as:
image, but I think you can use normal diff tools or web tools such as: https://www.corefiling.com/opensource/xmldiff/.

Maybe the biggest win is the validation feature. It shows which lines are causing problems:
image

That's how I found about the duplicated id issue on figures.

from sablon.

stadelmanma avatar stadelmanma commented on July 20, 2024

I think this is more easily fixable with the new DOM logic, I have a branch on my fork to work on it. All r:id attributes should be unmodified because that will break relationships defined in a *.rels file.

Additional notes on any elements that use the id attribute:

  • Table cell identifier (17.4.66), w:id, only unique within the table itself, optional, any string value. Probably can be ignored.
  • Annotation identifier (17.13.4.2) w:id, applies to comments it says that if more than one comment has the same value for this attribute then only one of them is ignored.
    • This appears to be the same for all of the change tracking elements from 17.13.4 - 17.13.5. I think I should leave these id attributes alone.
    • Changes tracking in a template doesn't make sense from a use case aside from providing the consumer with notes on what they may need to manually up date so I'm less worried about support this corner case.
  • The w:bookmarkStart and w:bookmarkEnd elements share a unique ID value.
    • If I copy and change one I need to ensure the corresponding end element also gets changed.
    • If both elements aren't present in the repeated content then I should just drop the clones
  • The w:permStart and w:permEnd elements follow the same convention as bookmarks.
  • The last relevant id is in cNcPR (19.3.1.12) which we already know needs to be unique.

Side note there is a <w:id> element but we shouldn't need to worry about this one (17.5.2.18).

from sablon.

stadelmanma avatar stadelmanma commented on July 20, 2024

Implementation stages:

  • Auto-increment the id attribute in the wp:docPr and wp:cNvPr tags. I think I should find these tags for any namespace (https://stackoverflow.com/questions/4440451/how-to-ignore-namespaces-with-xpath)
    • I'll need to test what happens when shapes are in footnotes, endnotes, headers, etc.
  • Handle the bookmark tags and migrate this logic to perm tags
    • Several bookmarks with overlapping id's does not appear to corrupt the document. I will leave this to the end user to handle as bookmarks spanning a loop in a odd fashion shouldn't be the original intent

from sablon.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.