Entities in Document header (asciidoctor-rfc, closed)

metanorma commented on July 30, 2024

Entities in Document header

from asciidoctor-rfc.

Comments (10)

opoudjis commented on July 30, 2024

OK, substitutions in Asciidoc headers are different.

Applied: Special characters, Attributes.
Not applied: Quotes, Replacements, Macros, Post Replacement.

That means that & converts to &amp; (special characters), but otherwise HTML and XML entities are NOT recognised (replacements). So it is a characteristic of Asciidoc that &amp; and &nbsp; cannot appear in the header.

The misrendering as &amp;amp; is fixed by replacing xml.area text with xml.area { |a| a << text }
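
The misrendering comes from escaping a value that has already been escaped. A minimal sketch in plain Ruby, assuming two hypothetical helpers that are not part of Asciidoctor or Nokogiri: `sub_specialchars` stands in for the special-characters substitution applied to header values, and `escape_text` stands in for a builder that escapes text content on insertion.

```ruby
# Characters rewritten by both hypothetical helpers below.
SPECIAL_CHARS = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Hypothetical stand-in for the header's special-characters substitution.
def sub_specialchars(text)
  text.gsub(/[&<>]/, SPECIAL_CHARS)
end

# Hypothetical stand-in for a builder that escapes text when inserting it.
def escape_text(text)
  text.gsub(/[&<>]/, SPECIAL_CHARS)
end

header_value = sub_specialchars("AT&T")   # value as the converter receives it
puts header_value                         # already escaped once
puts escape_text(header_value)            # inserted as text: escaped a second time
```

Inserting the already-escaped value as raw markup, rather than as text, avoids the second pass, which is what the block form with `<<` does in the fix above.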

ronaldtse commented on July 30, 2024

Oooh AsciiDoc characteristics. I wonder if a better defined format helps 😉

opoudjis commented on July 30, 2024

😄 One product at a time, Ronald!

Yeah, I understand the header was one of your major concerns; and the Asciidoc substitutions are idiosyncratic. (My solution to entities in attributes, btw, was to expand out all entities using HTMLEntities, and let Nokogiri reencode them on output.)

I would say in retrospect, btw, that given how nasty Nokogiri XML is about entities, it was more trouble than it was worth to encode the XML using Nokogiri (as opposed to validating it after the event).

One of the major pushes behind RFC2XML, I'm seeing from the RFC Format FAQ, is to permit non-ASCII characters in RFCs. Dealing with HTML entities has resulted in me dealing with those too. The XML is now not in UTF-8 but ASCII, because who knows what you're going to find downstream; non-ASCII characters are encoded as entities, so we are now addressing that non-ASCII requirement safely.

Decimal not Hex entities, because that's what Nokogiri does out of the box. I am less of a Nokogiri fan now than I was six months ago...
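
As an illustration of the approach described above (expand named entities, then emit ASCII with decimal references), here is a minimal sketch in plain Ruby. The `NAMED` table and both helper names are hypothetical and deliberately tiny; the thread's actual code relies on the HTMLEntities gem and on Nokogiri's own output encoding, neither of which is reproduced here.

```ruby
# Hypothetical, incomplete entity table, for illustration only.
NAMED = { "nbsp" => 0x00A0, "eacute" => 0x00E9, "mdash" => 0x2014 }.freeze

# Replace known named entities with the characters they stand for.
# Unknown names (including XML's own &amp; and &lt;) are left untouched.
def expand_named_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do
    codepoint = NAMED[$1]
    codepoint ? [codepoint].pack("U") : $&
  end
end

# Rewrite every non-ASCII character as a decimal character reference.
def ascii_with_decimal_refs(text)
  text.gsub(/[^[:ascii:]]/) { |ch| format("&#%d;", ch.ord) }
end

puts ascii_with_decimal_refs(expand_named_entities("caf&eacute;&nbsp;&amp; bar"))
```

Leaving `&amp;` and `&lt;` unexpanded matters: expanding them would turn escaped markup characters back into live markup.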

ronaldtse commented on July 30, 2024

I went through the code we have now and it's quite confusing how we switch back and forth between just "nokogiri" and "nokogiri-generated text to be inserted back to nokogiri".

Don't you think everything will be cleaner if we just stick to the plain "nokogiri"? 😉 That will help us take care of the UTF-8 issues too.

ronaldtse commented on July 30, 2024

On the other hand doesn't the entity issue stem from RFC XML's usage of it? XML isn't supposed to work with HTML Character Entities.

opoudjis commented on July 30, 2024

> On the other hand doesn't the entity issue stem from RFC XML's usage of it? XML isn't supposed to work with HTML Character Entities.

And yet, the v1 RFC XML documents had &nbsp; all over them. And people will use HTML entities whether we want them to or not; now, at least, we can deal with them.

Paolo was migrating the code from text templating to nokogiri; the migration is probably not complete, and I can look at it. Again, I now think migrating to nokogiri was in fact a mistake, because of the hassles around entities.

I'm going to give priority still to the issues you found in #59.

ronaldtse commented on July 30, 2024

Yes, the Character Entity problem is an RFC XML problem. They should not have allowed HTML Entities inside XML. But in any case, we can still deal with them using Nokogiri.

I still think using Nokogiri was the correct way to go, since we're just writing Entities, not reading Entities. We just need to make sure that when we write, we generate Entities the RFC XML way; that will only involve handling text nodes -- but we might not even need to do this?

In fact, I don't think XML2RFC relies on Character Entities -- in the #59 document I have gotten rid of all character entities, and the characters generated are identical to the original ones.

opoudjis commented on July 30, 2024

Oh, the output will be the same. My concern is that, if we are making the tool widely available, we cannot guarantee that people won't use &nbsp;, and I'd rather we not constrain it if we don't have to. In fact, the RFC XML spec doesn't say anything about HTML entities, and certainly doesn't rely on them; but if only because the v1 templates did use them, better safe than sorry.

The noko() routine is consistently treating the document fragments it builds as XHTML, not XML. That is what takes care of reading entities. Outputting entities is taken care of by encoding the XML as ASCII; we could leave it as UTF-8, but even in 2017, I don't think it's safe to.

ronaldtse commented on July 30, 2024

@opoudjis but people most likely won't use HTML Character Entities in the AsciiDoc format as input, right?

I don't think we should use the noko() routine; instead we should pass the XML document model around directly to add nodes/attributes. The noko() routine treats fragments as XHTML because that's what we specified in our code.

We should also use UTF-8 for v3 output but only "US-ASCII" for v2 output. We should call to_xml only once, at the end.
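
A rough sketch of that version switch in plain Ruby, using `String#encode` rather than Nokogiri's `to_xml` so it runs without any gems. The helper name `serialize_for` and the integer version argument are assumptions made for illustration, and the sketch ignores the encoding declaration in the XML prolog, which would also need to match.

```ruby
# Hypothetical helper: pick the output encoding from the RFC XML version.
# v3 keeps UTF-8; v2 is forced to US-ASCII, with characters that cannot be
# represented rewritten as decimal character references.
def serialize_for(version, xml_text)
  case version
  when 3
    xml_text.encode("UTF-8")
  when 2
    xml_text.encode("US-ASCII", fallback: ->(ch) { format("&#%d;", ch.ord) })
  else
    raise ArgumentError, "unsupported RFC XML version: #{version.inspect}"
  end
end

puts serialize_for(2, "<t>caf\u00E9</t>")
puts serialize_for(3, "<t>caf\u00E9</t>")
```

Calling such a function once, on the finished document, keeps the encoding decision in a single place instead of spreading it across every fragment.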

opoudjis commented on July 30, 2024

Well, up to you. I made the noko() routine XHTML to deal with &nbsp; in the samples; it was XML before. I can pass the XML document model around, but the XHTML/XML choice for dealing with HTML entities in samples would still need to be made. So what you want is XML, not XHTML; do not accept any HTML entities; and pass the XML document model instead of using an external builder. Right?

This was @paolobrasolin's framework, so I'd like to hear from him too.
