opoudjis commented on July 30, 2024

OK, substitutions in Asciidoc headers are different. They apply:

  • Special characters
  • Attributes

But not:

  • Quotes
  • Replacements
  • Macros
  • Post replacements

That means that & converts to &amp; (special characters), but otherwise HTML and XML entities are NOT recognised (that would happen in replacements). So it is a characteristic of Asciidoc that &amp; and &nbsp; cannot appear in the header.
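A rough model of the difference, as a sketch: the body runs special characters and then replacements (which, as we understand Asciidoctor, is the step that turns an escaped entity reference such as &amp;nbsp; back into &nbsp;), while the header stops after special characters. The function names and the entity regex are assumptions for illustration, not Asciidoctor's own code.

```ruby
# Simplified model of AsciiDoc header vs body substitutions.
# ASSUMPTION: the regex in restore_entity_references approximates, and is
# not copied from, Asciidoctor's entity-restoring replacement rule.
SPECIAL_CHARS = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# "Special characters" substitution: escape &, < and >.
def special_characters(text)
  text.gsub(/[&<>]/, SPECIAL_CHARS)
end

# Part of the "replacements" substitution: turn an escaped entity reference
# such as "&amp;nbsp;" back into "&nbsp;".
def restore_entity_references(text)
  text.gsub(/&amp;([a-zA-Z][a-zA-Z0-9]+|#\d+|#x\h+);/, '&\1;')
end

# Body text gets both steps.
def body_subs(text)
  restore_entity_references(special_characters(text))
end

# Header text skips replacements, so entity references stay escaped.
def header_subs(text)
  special_characters(text)
end

title = "Fish &amp; Chips&nbsp;Ltd"
puts body_subs(title)   # Fish &amp; Chips&nbsp;Ltd
puts header_subs(title) # Fish &amp;amp; Chips&amp;nbsp;Ltd
```

The header result is exactly the doubled &amp;amp; that then shows up in the generated XML.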

The misrendering as &amp;amp; is fixed by replacing xml.area text with xml.area { |a| a << text }.
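The doubled &amp;amp; is what you get when text that has already been escaped is escaped again, which is what happens when it is handed to a builder as an ordinary text node. A minimal sketch, using Ruby's standard CGI.escapeHTML and CGI.unescapeHTML as stand-ins for the builder's text escaping and markup parsing (an assumption for illustration; Nokogiri itself is not called here):

```ruby
require "cgi"

# Text coming out of the AsciiDoc header has already been through the
# "special characters" substitution, so "&" is already "&amp;".
already_escaped = "Fish &amp; Chips"

# Emitting it as an ordinary text node escapes it a second time
# (CGI.escapeHTML stands in for the builder's text-node escaping).
as_text_node = CGI.escapeHTML(already_escaped)
puts as_text_node # Fish &amp;amp; Chips

# Reading the string as markup first turns "&amp;" into a single "&",
# so it is escaped only once on output.
parsed_then_escaped = CGI.escapeHTML(CGI.unescapeHTML(already_escaped))
puts parsed_then_escaped # Fish &amp; Chips
```

Parsing before escaping is roughly what xml.area { |a| a << text } achieves, since Nokogiri's Builder#<< appends its argument as raw markup rather than as text.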

from asciidoctor-rfc.

ronaldtse commented on July 30, 2024

Oooh AsciiDoc characteristics. I wonder if a better defined format helps 😉

opoudjis commented on July 30, 2024

😄 One product at a time, Ronald!

Yeah, I understand the header was one of your major concerns; and the Asciidoc substitutions are idiosyncratic. (My solution to entities in attributes, btw, was to expand out all entities using HTMLEntities, and let Nokogiri reencode them on output.)
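A sketch of the expand-first approach, assuming a tiny hypothetical NAMED_ENTITIES table in place of the htmlentities gem (which covers the full HTML entity set). References are expanded in a single pass, so an escaped reference such as &amp;lt; comes out as the literal text &lt; rather than being expanded twice:

```ruby
# Sketch of "expand all entities first, let the serializer re-encode later".
# ASSUMPTION: NAMED_ENTITIES is a small hypothetical stand-in for the
# htmlentities gem; expand_entities is a hypothetical helper name.
NAMED_ENTITIES = {
  "amp" => "&", "lt" => "<", "gt" => ">", "quot" => '"',
  "nbsp" => "\u00A0", "copy" => "\u00A9", "mdash" => "\u2014"
}.freeze

# Replace named, decimal and hex references with the characters they stand
# for; unknown named references are left untouched.
def expand_entities(text)
  text.gsub(/&(#\d+|#x\h+|[a-zA-Z][a-zA-Z0-9]*);/) do |ref|
    name = Regexp.last_match(1)
    if name.start_with?("#x")
      name[2..].to_i(16).chr(Encoding::UTF_8)
    elsif name.start_with?("#")
      name[1..].to_i.chr(Encoding::UTF_8)
    else
      NAMED_ENTITIES.fetch(name, ref)
    end
  end
end

value = expand_entities("Smith&nbsp;&amp;&nbsp;Jones &#169; 2017")
p value
```

The expanded string can then go to the builder as plain text: the serializer escapes & and < again on output, and with an ASCII output encoding it writes characters such as U+00A0 as character references.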

I would say in retrospect, btw, that given how nasty Nokogiri XML is about entities, encoding the XML using Nokogiri (as opposed to validating it after the event) was more trouble than it was worth.

One of the major pushes behind RFC2XML, I'm seeing from the RFC Format FAQ, is to permit non-ASCII characters in RFCs. Dealing with HTML entities has meant dealing with those too: the XML is now output not in UTF-8 but in ASCII, because who knows what you're going to find downstream, and non-ASCII characters are encoded as entities, so we are now addressing that non-ASCII requirement safely.

Decimal not Hex entities, because that's what Nokogiri does out of the box. I am less of a Nokogiri fan now than I was six months ago...
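What ASCII output amounts to for text content, as a sketch: escape the XML special characters, then write every non-ASCII character as a decimal character reference. ascii_xml_text is a hypothetical helper, not part of Nokogiri:

```ruby
# Sketch of ASCII-safe XML text output with decimal character references.
# ASSUMPTION: ascii_xml_text is a hypothetical helper for illustration.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def ascii_xml_text(text)
  # Escape the characters XML itself treats as markup.
  escaped = text.gsub(/[&<>]/, XML_ESCAPES)
  # Replace each non-ASCII character with &#NNN; (decimal code point).
  escaped.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

puts ascii_xml_text("Dürst & Suignard\u00A0et al.")
# D&#252;rst &amp; Suignard&#160;et al.
```

Ruby's own String#encode("US-ASCII", xml: :text) does the same job but writes hexadecimal references instead.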

ronaldtse commented on July 30, 2024

I went through the code we have now and it's quite confusing how we switch back and forth between just "nokogiri" and "nokogiri-generated text to be inserted back into nokogiri".

Don't you think everything will be cleaner if we just stick to plain "nokogiri"? 😉 That will help us take care of the UTF-8 issues too.

ronaldtse commented on July 30, 2024

On the other hand doesn't the entity issue stem from RFC XML's usage of it? XML isn't supposed to work with HTML Character Entities.

opoudjis commented on July 30, 2024

> On the other hand doesn't the entity issue stem from RFC XML's usage of it? XML isn't supposed to work with HTML Character Entities.

And yet, the v1 RFC XML documents had &nbsp; all over them. And people will use HTML entities whether we want them to or not; now, at least, we can deal with them.

Paolo was migrating the code from text templating to nokogiri; the migration is probably not complete, and I can look at it. Again, I now think migrating to nokogiri was in fact a mistake, because of the hassles around entities.

I'm still going to give priority to the issues you found in #59.

ronaldtse commented on July 30, 2024

Yes, the Character Entity problem is an RFC XML problem. They should not have allowed HTML Entities inside XML. But in any case, we can still deal with them using Nokogiri.

I still think using Nokogiri was the correct way to go, since we're only writing Entities, not reading them. We just need to make sure that when we write, we generate Entities the RFC XML way, which will only involve handling text nodes; but do we even need to do this?

In fact, I don't think XML2RFC relies on Character Entities -- in the #59 document I have gotten rid of all character entities, and the characters generated are identical to the original ones.

opoudjis commented on July 30, 2024

Oh, the output will be the same. My concern is that, if we are making the tool widely available, we cannot guarantee that people won't use &nbsp;, and I'd rather we not constrain it if we don't have to. In fact, the RFC XML spec doesn't say anything about HTML entities, and certainly doesn't rely on them; but if only because the v1 templates did use them, better safe than sorry.

The noko() routine consistently treats the document fragments it builds as XHTML, not XML. That is what takes care of reading entities. Outputting entities is taken care of by encoding the XML as ASCII; we could leave it as UTF-8, but even in 2017, I don't think it's safe to.

ronaldtse commented on July 30, 2024

@opoudjis but people most likely won't use HTML Character Entities in the AsciiDoc format as input, right?

I don't think we should use the noko() routine; we should pass the XML document model around directly to add nodes and attributes. The noko() routine treats fragments as XHTML because that's how it was specified in our code.

We should also use UTF-8 for v3 output but only "US-ASCII" for v2 output, and call to_xml only once, at the end.
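A sketch of that structure: one document model passed to each helper, and a single serialization step at the end that picks the encoding by RFC XML version. REXML (bundled with Ruby) stands in for Nokogiri so the sketch runs without extra gems, and add_front, add_section and serialize are hypothetical names:

```ruby
require "rexml/document"

# Sketch: helpers add nodes to a shared document model; nothing becomes a
# string until serialize runs once at the end.
# ASSUMPTION: REXML stands in for Nokogiri; all helper names are hypothetical.
def add_front(rfc, title)
  front = rfc.add_element("front")
  front.add_element("title").add_text(title)
  front
end

def add_section(rfc, title, text)
  section = rfc.add_element("section", "title" => title)
  section.add_element("t").add_text(text)
  section
end

# UTF-8 for v3; for v2, rewrite non-ASCII characters as decimal references.
def serialize(doc, version)
  out = +""
  doc.write(out)
  return out if version == 3
  out.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

doc = REXML::Document.new
rfc = doc.add_element("rfc")
add_front(rfc, "Café & Co")
add_section(rfc, "Introduction", "Non-ASCII: é")

puts serialize(doc, 2)
puts serialize(doc, 3)
```

With Nokogiri the last step would be a single to_xml call with the output encoding chosen per version; the point here is only that helpers add nodes and the conversion to text happens once.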

opoudjis commented on July 30, 2024

Well, up to you. I made the noko() routine XHTML to deal with &nbsp; in the samples; it was XML before. I can pass the XML document model around, but the XHTML/XML choice for dealing with HTML samples would still need to be made. So what you want is: XML, not XHTML; do not accept any HTML entities; and pass the XML document model around instead of using an external builder. Right?

This was @paolobrasolin's framework, so I'd like to hear from him too.
