r888888888 / dtext_rb
compiled dtext parser extension
Brought up by Type-kun on Danbooru 2 Issues thread:
Hah, found another thing with new dtext parser. h1 tag is parsed even if it's inline now: see forum 119616.
Forum post with issue: http://danbooru.donmai.us/forum_posts/119616
Replicated on Testbooru: http://testbooru.donmai.us/posts/11#comment-13
The @ symbol breaks the state machine when DText markers appear at the end of the text. Because the parser is looking for a word break to end the mention, it skips over any closing markers, such as style markers.
This issue is demonstrated at http://testbooru.donmai.us/comments/16
Not sure what should or shouldn't be allowed when @ appears at the beginning of a word, but if @ appears in any other location then it should probably not enter the mention state.
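As a rough illustration of the "only at the beginning of a word" idea, here is a minimal sketch in plain C. The function name at_sign_starts_word is hypothetical and is not taken from dtext.rl, and treating only ASCII whitespace as a word boundary is an assumption; the real parser may want a wider set of boundary characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of dtext.rl.
 * Returns true when the '@' at text[pos] sits at the start of a word:
 * either at the very start of the input or right after ASCII whitespace.
 * Assumption: only whitespace counts as a word boundary here. */
bool at_sign_starts_word(const char *text, size_t pos) {
  assert(text != NULL && text[pos] == '@');
  if (pos == 0) {
    return true;
  }
  return isspace((unsigned char)text[pos - 1]) != 0;
}
```

With a check like this, an @ in the middle of a word (foo@bar) would never enter the mention state, so it could not swallow closing markers.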
With the inclusion of headers with IDs, things like creating a table of contents are now possible.
Example:
http://danbooru.donmai.us/wiki_pages/37251
However, these hash links only work on the wiki pages themselves. If used from the Wiki section of the post search page instead, they take the user away from the post search page and to the wiki page itself, which is undesirable.
A solution would be to add in support for hash only links.
Example:
"Example text":[#dtext-link_to_below]
h1#link_to_below.Title Heading
...which would create...
<a href="#dtext-link_to_below">Example text</a>
<h1 id="dtext-link_to_below">Title Heading</h1>
"[b]foo[/b]":http://www.example.com
"[b]foo[/b]":[http://www.example.com]
The first form bolds the link text as expected. The second form does not.
The input "post #1234":http://example.com
produces an invalid nested link:
<p>
<a class="dtext-link dtext-external-link" href="http://example.com">
<a class="dtext-link dtext-id-link dtext-post-id-link" href="/posts/1234">post #1234</a>
</a>
</p>
The only kind of markup that should be allowed inside named links are the basic formatting tags: [b], [i], [s], [u].
This will help with being able to visually separate different elements, for example artist commentaries from different sources.
Not sure if this used to be a thing with the old DText parser, since the Help:Blacklists wiki had [code] blocks inline. For now, I've replaced those with quotation marks, but it would be nice to be able to do code styling inline.
Danbooru is still missing the recent updates to DText. dtext.c needs to be regenerated to include the latest changes to dtext.rl.
Does this file need to be in version control? It's a generated file after all. As long as one has ragel installed, it will be regenerated automatically as part of the build if it doesn't already exist.
The parser has several memory leaks:
The first leak is in the basic_wiki_link / aliased_wiki_link parsers. When parsing e.g. [[Hatsune Miku]], we call g_utf8_strdown to lowercase the tag, but g_utf8_strdown allocates a string that is never freed.
The second leak is in the header_with_id parser. When parsing e.g. h1#id. title, a string is allocated for id which is freed with g_string_free(id_name, false). Passing false here is wrong; it causes g_string_free to free only the GString object, not the underlying char * holding the actual string.
The third leak is in free_machine. The stack variable is freed with g_array_free(stack, FALSE), but again passing false here causes it to free only the GArray struct, not the underlying array. Also, g_array_free is not thread safe; the docs suggest using g_array_unref instead.
The fourth leak is in parse_file. A GOptionContext is allocated but never freed.
The first two leaks are high severity. An attacker can consume all available memory by exploiting these leaks on a high-traffic post or wiki page.
The third leak causes memory to be leaked every time the parser is invoked, but only by a small amount. This is mitigated by the use of the Unicorn worker killer gem on Danbooru, which restarts worker processes every 5000-10000 requests.
The fourth leak only occurs when using cdtext from the command line, so it doesn't really matter.
The input [nodtext]foo produces the output <p>foo</p></p>.
This only applies to block [nodtext] tags. Unclosed inline [nodtext] tags work correctly: foo [nodtext][b]bar produces <p>foo [b]bar</p>.
I posted an example of this on Danbooru.
http://danbooru.donmai.us/forum_topics/9127?page=152#forum_post_125569
Basically, if the opening [nodtext] tag starts at the beginning of a line without any preceding characters, then the closing [/nodtext] tag will not re-enable DText parsing.
Edit:
I'm guessing this happens because an extra BLOCK_P is being pushed onto the stack from the main function (though not the inline function), so when the nodtext function call goes to check the dstack, it sees a BLOCK_P instead of the BLOCK_NODTEXT.
Can be seen on the Danbooru wiki...
【http://www.example.com】
「http://www.example.com」
Links like the above are common in Pixiv commentaries. The closing brackets should not be included as part of the link.
More generally, most (if not all) closing punctuation characters should be treated as boundary characters. Certain other punctuation like the ideographic full stop (。) should be too.
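To make the boundary idea concrete, here is a small sketch in plain C of a URL scanner that stops at whitespace or at one of the characters mentioned above. The name url_span and the BOUNDARIES table are hypothetical and are not from dtext.rl; the table holds only the three characters named in this report, so a real fix would need a much longer list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative, non-exhaustive boundary characters as UTF-8 byte strings. */
static const char *const BOUNDARIES[] = {
  "\xE3\x80\x91", /* U+3011 right black lenticular bracket */
  "\xE3\x80\x8D", /* U+300D right corner bracket */
  "\xE3\x80\x82", /* U+3002 ideographic full stop */
};

/* Hypothetical helper: returns how many bytes of s belong to the URL that
 * starts at s[0]. Scanning stops at the end of the string, at ASCII
 * whitespace, or at the first boundary sequence from the table above. */
size_t url_span(const char *s) {
  assert(s != NULL);
  size_t i = 0;
  while (s[i] != '\0' && !isspace((unsigned char)s[i])) {
    for (size_t b = 0; b < sizeof BOUNDARIES / sizeof BOUNDARIES[0]; b++) {
      if (strncmp(s + i, BOUNDARIES[b], strlen(BOUNDARIES[b])) == 0) {
        return i;
      }
    }
    i++;
  }
  return i;
}
```

For the two bracketed examples, a scan starting at the h of http would cover only http://www.example.com and leave the closing bracket outside the link.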
This fails:
[b][[[/b]kantai collection[b]]][/b]
Named links ("foo":http://bar.com) get the dtext-link dtext-external-link CSS classes. Bare links (http://bar.com) don't.
In Markdown you can surround a link with angle brackets to delimit where it ends. This is useful for cases where trailing punctuation gets included in a link:
Lorem ipsum (dolor sit amet https://en.wikipedia.org/wiki/Orange_(fruit)). Consectetur elit.
DText wrongly generates https://en.wikipedia.org/wiki/Orange_(fruit)) here. Being able to say <https://en.wikipedia.org/wiki/Orange_(fruit)> would prevent this.
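A possible shape for the angle-bracket form, sketched in plain C. The function angle_autolink is hypothetical and does not exist in dtext.rl; restricting it to http:// and https:// and rejecting whitespace inside the brackets are assumptions made for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if text starts with "<http://" or "<https://" and a
 * closing '>' follows before any whitespace, stores a pointer to the first
 * byte of the URL in *url and returns the URL length in bytes.
 * Returns 0 (and leaves *url untouched) otherwise. */
size_t angle_autolink(const char *text, const char **url) {
  assert(text != NULL && url != NULL);
  if (text[0] != '<') {
    return 0;
  }
  const char *start = text + 1;
  if (strncmp(start, "http://", 7) != 0 && strncmp(start, "https://", 8) != 0) {
    return 0;
  }
  size_t n = 0;
  while (start[n] != '\0' && start[n] != '>' && !isspace((unsigned char)start[n])) {
    n++;
  }
  if (start[n] != '>') {
    return 0; /* unterminated, or whitespace before the closing bracket */
  }
  *url = start;
  return n;
}
```

Because the closing > marks the end explicitly, the parenthesis inside Orange_(fruit) stays in the link and the one after the bracket does not.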
The input [tn]blah[/tn] produces the output <p class="tn">blah. Omitting the </p> tag is allowed in HTML5 under certain contexts (see "Tag omission" at MDN), so technically this works, but only by accident.
While attempting to run bundle install on the most recent Danbooru on Debian Wheezy, it fails with the following in the output:
make "DESTDIR="
compiling rb_dtext.c
compiling dtext.c
ext/dtext/dtext.rl: In function ‘main’:
ext/dtext/dtext.rl:1418:27: error: ‘G_OPTION_FLAG_NONE’ undeclared (first use in this function)
ext/dtext/dtext.rl:1418:27: note: each undeclared identifier is reported only once for each function it appears in
make: *** [dtext.o] Error 1
make failed, exit code 2
Curiously, running gem install dtext_rb -v '1.4.4', as bundler suggests, executes without any trouble.
Demonstrated on Danbooru...
http://danbooru.donmai.us/forum_topics/9127?page=152#forum_post_25578
Not sure what the proper behavior should be. I'm not in favor of it, but if this is the way it's supposed to work, then I'll just document it in the Wiki.
This fails:
[B]Bold Text[/B]
[b]Bold Text[/b]
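Assuming this report is about uppercase tag names such as [B] not being recognized (the issue title is missing here, so that is a guess), a case-insensitive comparison of tag names could look roughly like the sketch below. The function tag_name_equals is hypothetical and is not from dtext.rl, and it only handles ASCII tag names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares the len bytes at name against the
 * NUL-terminated expected tag name, ignoring ASCII case. */
bool tag_name_equals(const char *name, size_t len, const char *expected) {
  assert(name != NULL && expected != NULL);
  if (strlen(expected) != len) {
    return false;
  }
  for (size_t i = 0; i < len; i++) {
    if (tolower((unsigned char)name[i]) != tolower((unsigned char)expected[i])) {
      return false;
    }
  }
  return true;
}
```

Under that assumption, [B] and [b] would both match the bold tag, so the two lines above would render the same way.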