exist-db / exist-markdown
Markdown Parser in XQuery
License: Other
Markdown interleaved in HTML blocks was expected to work by the author of test.md.
Markdown interleaved in HTML blocks is mangled
See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L309-L340.
This test takes this markdown:
<div class="row">
<div class="col-md-6">
First column in **two column layout**.
Second paragraph.
</div>
<div class="col-md-6">
Second column in two column layout.
</div>
</div>
With this input, the markdown:parse() function should return:
<body>
<div class="row">
<div class="col-md-6">
<p>First column in <strong>two column layout</strong>.</p>
<p>Second paragraph.</p>
</div>
<div class="col-md-6">
<p>Second column in two column layout.</p>
</div>
</div>
</body>
But it actually returns:
<body>
<div class="row">
<body/>
<div class="col-md-6">
<body>
<p>First column in two column layout.</p>
</body>
</div>
</div>
<p>Second paragraph. <div class="col-md-6"> Second column in two column layout. </div> </div></p>
</body>
Note that (1) an empty <body/> element is inserted into the outer div, (2) the "Second paragraph" is ejected from the first inner div, and (3) the second inner div is inserted into the "Second paragraph" <p> element.
Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):
<testcase name="HTML block containing markdown" class="tests:html-block-containing-markdown">
<failure message="assertTrue failed." type="failure-error-code-1"/>
<output>false</output>
</testcase>
Note that the CommonMark dingus at https://spec.commonmark.org/dingus/ also produces mangled output:
<div class="row">
<div class="col-md-6">
First column in **two column layout**.
<pre><code> Second paragraph.
</div>
<div class="col-md-6">
Second column in two column layout.
</div>
</code></pre>
</div>
This suggests that a CommonMark-compliant processor may not be expected to handle Markdown interleaved with HTML blocks.
This is a follow-up to #15.
A few enhancements we should consider now or later:
- drop .existdb.json
  Put all necessary package metadata in a property app, exist or xar in package.json.
  npm packages are implicitly allowed to add their own custom properties to package.json, but they have to take care not to clash with names used by npm itself. I have used app for other projects in the past.
- add an npm script to install the library (without the test application)
  I usually use npm start for that.
- optimise GitHub Actions
- adopt readOptionsFromEnv
  This allows all npm and gulp scripts to target different eXist-db instances with ease.
A fairly complete setup with all of the above can be found in eeditiones/roaster#30, which is not yet merged.
Describe the bug
When loading the landing page (/main.md), the generated HTML has a strange <h4039> element inside the body/section:
<body class="container">
<body>
<section>
<h4039># Supported Markdown syntax
Markdown within this element is not further processed or transformed into HTML.
Expected behavior
The page should contain valid HTML.
To Reproduce
Install app, load http://localhost:8080/exist/apps/markdown.
Context (please always complete the following information):
Additional context
conf.xml? none
Like eXide and monex, this app is included in all default installations of eXist. To facilitate reporting issues related to it, it would be best, if possible, if the repository belonged to the eXist-db organization.
Related: wolfgangmm/eXide#144 and https://github.com/wolfgangmm/monex/issues/39.
Curly braces inside fenced code blocks should be left as literal curly braces.
Curly braces are replaced with a <span itemprop=""> element.
See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L223-L244.
This test takes this markdown:
```xquery
for $i in 1 to 10
return
<li>{$i * 2}</li>
```
With this input, the markdown:parse() function should return:
<body>
<pre data-language="xquery">for $i in 1 to 10
return
<li>{$i * 2}</li>
</pre>
</body>
The CommonMark dingus at https://spec.commonmark.org/dingus/ returns something quite similar, so our expectations are in line with CommonMark:
<pre>
<code class="language-xquery">for $i in 1 to 10
return
<li>{$i * 2}</li>
</code>
</pre>
But markdown:parse() actually returns:
<body>
<pre data-language="xquery">for $i in 1 to 10
return
<li><span itemprop="$i * 2">$i * 2</span></li>
</pre>
</body>
Note that the curly braces are transformed into a <span itemprop=""> structure, which is associated with the library's handling of "label" at https://github.com/eXist-db/exist-markdown/blob/master/content/markdown.xqm#L119-L128.
Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):
<testcase name="Code Blocks" class="tests:code-blocks">
<failure message="assertTrue failed." type="failure-error-code-1"/>
<output>false</output>
</testcase>
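One common way to avoid this class of bug is to set fenced code blocks aside before any inline rule runs and to put them back afterwards. The following is a minimal Python sketch of that idea, not the library's implementation and not a proposed patch to markdown.xqm. The names FENCE, render_labels and render_protecting_fences are hypothetical, and render_labels only imitates the <span itemprop=""> output observed above.

```python
import re

# A fenced block: an opening ``` line, any content, and a closing ``` line.
FENCE = re.compile(r"^```[^\n]*\n.*?^```[ \t]*$", re.MULTILINE | re.DOTALL)
# Hypothetical inline rule that imitates the observed {label} handling.
LABEL = re.compile(r"\{([^{}]*)\}")


def render_labels(text):
    """Turn every {x} into <span itemprop="x">x</span> (illustrative only)."""
    return LABEL.sub(
        lambda m: '<span itemprop="%s">%s</span>' % (m.group(1), m.group(1)),
        text,
    )


def render_protecting_fences(markdown):
    """Apply the inline rule everywhere except inside fenced code blocks."""
    blocks = []

    def stash(match):
        # Remember the block and leave a NUL-delimited placeholder behind.
        blocks.append(match.group(0))
        return "\x00%d\x00" % (len(blocks) - 1)

    protected = FENCE.sub(stash, markdown)
    rendered = render_labels(protected)
    # Restore the untouched blocks in place of their placeholders.
    return re.sub(r"\x00(\d+)\x00", lambda m: blocks[int(m.group(1))], rendered)


if __name__ == "__main__":
    source = "See {label}.\n```xquery\n<li>{$i * 2}</li>\n```\n"
    print(render_protecting_fences(source))
```

With this ordering the braces inside the fence never reach the inline rule, so the code block keeps its literal curly braces while the brace expression outside the fence is still transformed.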
Without a test suite, fixing bugs in this library's Markdown parser risks introducing new ones.
The CommonMark tests from https://github.com/commonmark/commonmark-spec would be a natural starting point, as CommonMark is:
a standard, unambiguous syntax specification for Markdown, along with a suite of comprehensive tests to validate Markdown implementations against this specification.
To get started, I cloned the commonmark-spec repository and extracted the tests as described in its README:
gh repo clone commonmark/commonmark-spec
cd commonmark-spec
python3 test/spec_tests.py --dump-tests > commonmark-tests.json
... and I uploaded these to /db/commonmark-tests.json.
Then I developed the following query. Initially I got all errors or failures, but when I stripped out the trailing \n newline from the test's source Markdown, I got 68 passes, 570 failures, and 14 errors.
Certainly, some of the failures are caused by whitespace differences, but without a function for parsing HTML in eXist-db (!), normalizing expected and actual outputs is not possible, and thus the test suite can't tell us whether a failure is a real problem or just a meaningless whitespace issue.
xquery version "3.1";

import module namespace markdown="http://exist-db.org/xquery/markdown";

let $tests := json-doc("/db/commonmark-tests.json")
let $results :=
    for $test in $tests?*
    let $markdown := $test?markdown
    (: disregard trailing newline from the source test's expected output :)
    let $expected-result := $test?html => replace("\n$", "")
    let $actual-result :=
        try {
            (
                (: the parse function wraps results in a <body> element :)
                markdown:parse($markdown)/node()
                ! serialize(., map { "method": "html", "indent": true(), "html-version": 4.0 } )
            )
            => string-join()
        }
        catch * {
            map {
                "error": "markdown parsing error raised at " || $err:line-number || ":" || $err:column-number
                    || ": " || $err:description
            }
        }
    return
        map {
            "expected-result": $expected-result,
            "actual-result": $actual-result,
            "status":
                (
                    if ($actual-result instance of map(*)) then
                        "error"
                    else if (deep-equal($expected-result, $actual-result)) then
                        "pass"
                    else
                        "fail"
                ),
            "source": $test
        }
for $result in $results
group by $status := $result?status
order by index-of(("pass", "fail", "error"), $status)
return
    map {
        "status-group": $status,
        "number-of-results": count($result),
        "results": array { $result }
    }
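To illustrate the kind of normalization the query above lacks, here is a rough Python sketch that compares two HTML strings as parser events rather than as raw text, so that whitespace between tags does not count as a difference. It runs outside eXist-db and uses only the standard library. The names html_events and same_html are hypothetical, and the comparison is deliberately crude: it also discards leading and trailing spaces inside text runs, so it can hide genuine inline-spacing differences.

```python
import re
from html.parser import HTMLParser


class _EventCollector(HTMLParser):
    """Collect start tags, end tags and text as a flat list of events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.pre_depth = 0  # whitespace is significant inside <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        # Sort attributes so that their order does not affect the comparison.
        self.events.append(("start", tag, tuple(sorted(attrs))))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1
        self.events.append(("end", tag))

    def handle_data(self, data):
        if not self.pre_depth:
            # Collapse whitespace runs and trim the ends of the text run.
            data = re.sub(r"\s+", " ", data).strip()
        if data:
            self.events.append(("text", data))


def html_events(html):
    """Return the event list for an HTML fragment."""
    parser = _EventCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.events


def same_html(expected, actual):
    """True if both fragments yield the same events, ignoring layout whitespace."""
    return html_events(expected) == html_events(actual)


if __name__ == "__main__":
    print(same_html("<ul>\n<li>a</li>\n</ul>\n", "<ul><li>a</li></ul>"))
    print(same_html("<p>a</p>", "<p>b</p>"))
```

A comparison along these lines would let a test run separate indentation-only mismatches from structural ones, which the string-based deep-equal check in the query cannot do.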
Researching and fixing the failing tests would be an extensive project. It would require developing an XQuery function for parsing HTML—or shifting development to BaseX, which has an HTML parsing module.
Alternatively, an XQuery wrapper around the https://github.com/commonmark/commonmark-java or https://github.com/vsch/flexmark-java project might be a better investment.
The markdown:parse() function mangles XQuery source code contained in fenced code blocks.
For example, the following code...
xquery version "3.1";
import module namespace markdown="http://exist-db.org/xquery/markdown";
markdown:parse('# Code sample
This is a map containing two entries, one whose value is an array and another whose value is a string.
```xquery
xquery version "3.1";
map { "k1": array { "v1", "v2" }, "k2": "v3" }
```
This code should correctly render.
')
... returns the following HTML:
<body>
<section>
<h1>Code sample</h1>
<p>This is a map containing two entries, one whose value is an array and another whose value is a string.</p>
<pre data-language="xquery">xquery version "3.1";
map <span itemprop=" "k1"">array { "v1"</span>, <span itemprop=" "k1"">"v2" </span>, "k2": "v3" }
</pre>
<p>This code should correctly render.</p>
</section>
</body>
Effectively, it turns:
map { "k1": array { "v1", "v2" }, "k2": "v3" }
into:
map array { "v1", "v2" , "k2": "v3" }
This can be seen in https://exist-db.org/exist/apps/wiki/blogs/eXist/XQuery31 in the section titled "Serialization".
After starting up current eXist develop, loading the markdown app at http://localhost:8080/exist/apps/markdown/ redirects to http://localhost:8080/exist/apps/markdown/test.md, which yields the following error:
<exception>
<path>/db/apps/markdown/parse.xql</path>
<message>
err:XQST0033 error found while loading module md: Error while loading module content/markdown.xql: Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'
</message>
</exception>
The key bit:
Cannot bind prefix 'md' to 'http://exist-db.org/xquery/markdown' it is already bound to 'http://exist-db.org/metadata'
The registration of this prefix appears to stretch back to 2012 - according to eXist-db/exist@c33a2fa - so it's very odd that we haven't seen this before!
I have tried this in Xidel, but the first paragraph is always missing.
E.g. markdown:parse("xx") becomes <body></body>
* a
* b
* c
becomes <body></body>, too
But
a
b
c
becomes <body><p>b</p><p>c</p></body>
And
x
* a
* b
* c
becomes
<body><ul><li>
a
</li><li>
b
</li><li>
c
</li></ul></body>
Is this an issue with Xidel or the module? I had to replace xquery version "3.0"; with xquery version "3.1"; and util:parse-html with x:parse-html.
In inline HTML, inline elements like <mark> should be preserved.
The elements are dropped from output.
See the pending test at https://github.com/eXist-db/exist-markdown/blob/master/test/xqs/test-suite.xqm#L346-L361.
This test takes this markdown:
A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.
With this input, the markdown:parse() function should return:
<body>
<p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>
</body>
The CommonMark dingus at https://spec.commonmark.org/dingus/ returns exactly this (sans the <body> wrapper, which exist-markdown uses to ensure its results are well-formed, and which users of the library would normally omit from output):
<p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <mark>inline</mark> <code>HTML</code>.</p>
But markdown:parse() actually returns:
<body>
<p>A <span style="color: red;">paragraph <span style="color: green;">containing</span></span> some <code>HTML</code>.</p>
</body>
Note that the <mark>inline</mark> element was dropped from the output and replaced with an extra space character between some and <code>HTML</code>.
Since the parsed markdown doesn't equal the expected output, the test fails (and is marked as pending in the source until a fix is in place):
<testcase name="Inline HTML" class="tests:inline-html">
<failure message="assertTrue failed." type="failure-error-code-1"/>
<output>false</output>
</testcase>