3b / 3bmd
markdown processor in CL using esrap parser
License: MIT License
Something like * #not a title
will be rendered as
<ul>
<li><h1>not a title</h1></li>
</ul>
when it should just be
<ul>
<li>#not a title</li>
</ul>
It seems that regular, indented code blocks are treated differently than unindented (marked with ```) code blocks in terms of highlighting? IMO, even though there's currently no way to set a language, the indented code blocks should also undergo highlighting, at least that's how other formatters, e.g. Stackoverflow render it too.
There are two problems:
CODE-BLOCK.
40ANTS-DOC-TEST/UTILS-TEST> (let ((3bmd-code-blocks:*code-blocks* t))
(3bmd-grammar:parse-doc "
* Added a warning mechanism, which will issue such warnings on words which looks
like a symbol, but when real symbol or reference is absent:
```
WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
```
"))
((:BULLET-LIST
(:LIST-ITEM
(:PLAIN "Added" " " "a" " " "warning" " " "mechanism," " " "which" " "
"will" " " "issue" " " "such" " " "warnings" " " "on" " " "words" " "
"which" " " "looks" "
"
" " "like" " " "a" " " "symbol," " " "but" " " "when" " " "real" " "
"symbol" " " "or" " " "reference" " " "is" " " "absent:")))
(:PLAIN " "
(:CODE "
WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
")))
NIL
T
When there is no indentation, the code block is parsed correctly:
40ANTS-DOC-TEST/UTILS-TEST> (let ((3bmd-code-blocks:*code-blocks* t))
(3bmd-grammar:parse-doc "
```
WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)
```
"))
((3BMD-CODE-BLOCKS::CODE-BLOCK :LANG "" :PARAMS NIL :CONTENT
"WARNING: Unable to find symbol \"API\" mentioned in (CL-INFO:@INDEX SECTION)"))
NIL
T
Is there any intent to meet the new spec at http://spec.commonmark.org/?
When a mailto element is printed to html there is randomness injected into the encoding supposedly to make life more difficult for spammers:
(defun encode-email (text)
(with-output-to-string (s)
(loop for i across text
for r = (random 1.0)
do (cond
((< r 0.1) (write-char i s))
;; fixme: make this portable to non-unicode/ascii lisps?
((< r 0.6) (format s "&#x~x;" (char-code i)))
(t (format s "&#~d;" (char-code i)))))))
Unfortunately, this has the side effect of introducing spurious diffs when the generated html is version controlled. Would a deterministic solution be acceptable?
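One possible deterministic variant, as a minimal sketch: the choice of encoding is derived from each character and its position instead of from RANDOM, so the same address always produces the same HTML. The name encode-email-deterministic and the mixing formula are assumptions for illustration, not part of 3bmd.

```lisp
(defun encode-email-deterministic (text)
  "Hypothetical replacement for ENCODE-EMAIL: same three encodings,
but selected by a fixed function of position and character code."
  (with-output-to-string (s)
    (loop for c across text
          for i from 0
          ;; R is an integer in 0..9 that depends only on I and C,
          ;; so repeated runs give identical output.
          for r = (mod (+ (* 7 i) (char-code c)) 10)
          do (cond
               ((< r 1) (write-char c s))                      ; literal
               ((< r 6) (format s "&#x~x;" (char-code c)))     ; hex entity
               (t (format s "&#~d;" (char-code c)))))))        ; decimal entity
```

The output still mixes literal, hex, and decimal forms, so it stays about as awkward for naive scrapers as the random version while producing stable diffs.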
Currently the Pygments mode of ext-code-blocks passes user input to pygmentize for the language and options. The code tries to do so safely by trying to avoid going through a shell and rejecting the cssfile option, but it would probably be better to whitelist the allowed options and either whitelist the languages (possibly querying from pygmentize on first use?) or at least restrict the characters allowed.
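A rough sketch of what such a check could look like, assuming a hand-maintained list of languages. The names *allowed-languages*, safe-token-p and allowed-language-p are hypothetical and do not exist in 3bmd; the list contents are placeholders.

```lisp
(defparameter *allowed-languages* '("common-lisp" "python" "c" "text")
  "Hypothetical whitelist; a real one might be filled by querying pygmentize.")

(defun safe-token-p (string)
  "True if STRING is 1 to 32 ASCII letters, digits, or + - _ . # characters
and does not start with a dash (so it cannot be mistaken for an option)."
  (and (stringp string)
       (<= 1 (length string) 32)
       (not (char= (char string 0) #\-))
       (every (lambda (c)
                (or (char<= #\a c #\z)
                    (char<= #\A c #\Z)
                    (char<= #\0 c #\9)
                    (find c "+-_.#")))
              string)))

(defun allowed-language-p (lang)
  "True if LANG passes the character check and is on the whitelist."
  (and (safe-token-p lang)
       (member lang *allowed-languages* :test #'string-equal)
       t))
```

The character restriction alone already rejects whitespace, quotes, and leading dashes; the whitelist on top of it narrows the input to known lexers.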
Running this:
(let ((3bmd-grammar:*smart-quotes* t))
(3bmd:parse-string-and-print-to-stream "\\'" *standard-output*))
gives the error:
Cannot FUNCALL the SYMBOL-FUNCTION of special operator QUOTE.
Would you please consider adding a :description option to your system definition of 3bmd, 3bmd-ext-code-blocks and 3bmd-ext-wiki-links?
Currently HTML is generated with some newlines and no indentation. Working indentation would be nice to have, as would an option to add no extra whitespace including newlines.
*PADDING* and PADDED sound like they should modify indentation, but apparently just do something with newlines?
... with error "Incomplete parse, stopped at 6.". If I add a newline and some more text, it works.
We use Markdown because it's able to output meaningful HTML no matter how bad the input is; so more generally, it would be nice to have an option to just accept any input and never throw a parse error.
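Until such an option exists, a caller could approximate it with a wrapper along these lines. This is only a sketch: parse-leniently is a hypothetical helper, and the (:plain ...) fallback shape is an assumption modeled on the parse trees shown elsewhere in these issues.

```lisp
(defun parse-leniently (parser input)
  "Call PARSER (a function of one string) on INPUT. If it signals an error,
warn and return INPUT wrapped as a single plain block instead."
  (handler-case (funcall parser input)
    (error (condition)
      (warn "Parse failed, falling back to plain text: ~a" condition)
      (list (list :plain input)))))
```

It might be used as (parse-leniently #'3bmd-grammar:parse-doc text). Note that on the fallback path only one value is returned, and all markup in the input is lost rather than partially parsed.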
The current parse tree is mostly derived from the grammar rather than having any thought put into it.
Would be nicer to have a more logical parse tree as an officially supported part of the API, for people who want to modify it or add other output formats.
This should produce a warning when printed:
[something][non-existent]
The grammar should match all input, but in case of bugs it would be nice to (optionally?) catch parse errors and return something useful anyway.
(* character) to the end of the doc rule, and add it to the blocks (maybe as an extra plain block?)
doc grammar if this needs to be here anyway?)
GitHub and CommonMark don't require blank lines before or after fenced code blocks, so 3bmd probably shouldn't require them either.
In commit 18a59d3, I changed print-md-escaped to escape the [] and {} characters. The former was necessary for print/parse consistency, while the latter wasn't because {} are not parsed specially (except for allowing them to be backslash escaped). However, in melisgl/mgl-pax#28, we find that escaping curly brackets makes outputting latex-in-markdown for pandoc a pain.
Do you think not escaping them would be correct?
In lists with some items separated by blank lines, we currently treat all elements as paragraphs.
markdown.pl only treats entries before/after blank lines as paragraphs (2,3,4 in example).
Github treats everything starting before the first line as a paragraph (2,3,4,5,6 in example below).
see http://babelmark.bobtfish.net/ for a comparison of various other implementations, all 3 behaviours seem reasonably common
test case:
* l0
* l1
* l2
* l3
* l4
* l5
* l6
test case as displayed by github:
l0
l1
l2
l3
l4
l5
l6
Hello!
Thank you for your amazing project. I am using it to write a static site generator and am running into an issue where it outputs a single paragraph tag for an entire string of text with newlines instead of separating into new paragraphs on the newlines.
The below code:
This is a first post. I am excited to have this post in place. I am using a new blogging engine I wrote myself in Common Lisp.
This is me hoping the paragraph gets formatted properly.
Gives me the following output:
<p>This is a first post. I am excited to have this post in place. I am using a new blogging engine I wrote myself in Common Lisp.This is me hoping the paragraph gets formatted properly.</p>
Any assistance with this issue would be much appreciated. I am running with the latest build from quicklisp on SBCL for macOS.
It fails as:
:info:build Caught UNBOUND-VARIABLE while processing --eval option "(asdf:operate (quote asdf:build-op) (quote 3bmd-tests))":
:info:build The variable DEF-GRAMMAR-TEST is unbound.
:info:build Command failed: env XDG_CACHE_HOME=$HOME/.cache /opt/local/bin/abcl --noinit --batch --eval '(require "asdf")' --eval '(setf asdf:*central-registry* (list* (quote *default-pathname-defaults*) #p"/opt/local/var/macports/build/_Users_catap_src_macports-ports_lisp_cl-3bmd/cl-3bmd/work/build/system/" #p"/opt/local/share/common-lisp/system/" asdf:*central-registry*))' --eval '(asdf:operate (quote asdf:build-op) (quote 3bmd-tests))' 2>&1
SBCL, ECL, CLisp and CCL work.
Code blocks lose the indent when printed:
CL-USER> (let ((3bmd-code-blocks:*code-blocks* t))
(3bmd-grammar:parse-doc "
- xxx
```
0123456789
89
```
"))
((:BULLET-LIST
(:LIST-ITEM (:PARAGRAPH "xxx")
(3BMD-CODE-BLOCKS::CODE-BLOCK :LANG "" :PARAMS NIL :CONTENT "0123456789
89"))))
NIL
T
CL-USER> (let ((3bmd-code-blocks:*code-blocks* t))
(3bmd:print-doc-to-stream * *standard-output* :format :markdown))
- xxx
```
0123456789
89
```
Hello,
This is surprising, but why not:
(3bmd:parse-string "rst")
; in: 3BMD:PARSE-STRING "rst"
; (3BMD:PARSE-STRING "rst")
;
; caught STYLE-WARNING:
; undefined function: 3BMD:PARSE-STRING
Slime offers this completion, and I find parse-string as an exported symbol, but neither grep nor my eyes could find a function definition.
This is with the January Quicklisp dist.
regards
Please consider adding :description, :author and :license information to your ASDF system(s). This will greatly help Quicklisp users and make it easier for them to report bugs.
More information:
http://blog.quicklisp.org/2015/05/looking-for-more-metadata.html
https://www.quicklisp.org
The docs say 3bmd:*colorize-name-map* but it should be 3bmd-code-blocks::*colorize-name-map* (also it's unexported)
cl-mongo README.md embeds documentation created with documentation-template, which doesn't close <p> tags, and the embedded chunks are <p> followed immediately by a <blockquote> in 1 html block rather than separated into 2 as I understand the markdown docs to require.
Github parses these chunks as HTML blocks when rendering documentation, but doesn't seem to in issues.
Probably can get reasonable parsing by optionally allowing multiple html-block-in-tags in one html-block, and adding a variant of <p> that is closed by a html-block-in-tags (or possibly only a subset of block tags?) rather than </p>
Is this not valid input?
(3bmd-grammar:parse-doc "[l][*x*]")
.. debugger invoked on SB-KERNEL:CASE-FAILURE:
.. :EMPH fell through ETYPECASE expression.
.. Wanted one of (STRING CHARACTER LIST).
[a][a[a]] confuses the reference-link-double grammar, shouldn't match it at all
I'm using the per-block implementation in parse-doc, but it's still fairly easy to run out of memory with large %blocks with something like this:
CL-USER> (time
(let ((input (with-output-to-string (out)
(loop repeat 100000
do (format out "- ~A ~A ~A ~A~%"
(random 1000000) (random 1000000)
(random 1000000) (random 1000000))))))
(3bmd-grammar::parse-doc input)
(length input)))
Evaluation took:
12.364 seconds of real time
12.371129 seconds of total run time (11.750773 user, 0.620356 system)
[ Run times consist of 5.771 seconds GC time, and 6.601 seconds non-GC time. ]
100.06% CPU
37,030,481,562 processor cycles
15,570,202,368 bytes consed
2955662
CL-USER> (/ 15570202368 2955662.0)
5267.924
This example uses a bulleted list because it is probably the worst offender, but a large paragraph behaves similarly.
According to time, consing scales linearly with the number of repeats, which is good. Perhaps 5267 bytes per character is too high, but I suspect that the main problem is that maximum size of the working set also scales linearly.
Per https://github.com/nikodemus/esrap, the new location of esrap is https://github.com/scymtym/esrap if you want to update the README link.
3bmd is older than CommonMark, so it tries to implement the original markdown syntax with reference to behavior of other markdown processors where that was ambiguous. That strategy has all the problems that motivated CommonMark, and CommonMark seems popular enough now that not matching it is annoying and/or confusing to users (ex #45).
Unfortunately, it looks like it would be difficult or impossible to write a proper PEG/TDPL grammar for the entire CommonMark spec at once, so it would probably be hard to maintain compatibility with existing 3bmd extensions.
It probably wouldn't be too hard to write a new parser using something like the multiple pass lines -> blocks -> inlines strategy suggested by the spec. The inlines pass might be able to reuse a lot of the 3bmd inline grammar, possibly with some limitations on length of code span delimiters and similar. In that case, inline extensions might be usable without too many changes (I'd probably want to clean up the AST in the process though, so they would need updates to match that). Block elements would need to be rewritten though, not sure if that pass would use esrap for parsing or if it would need something more complicated to handle the arbitrary indentation in lists/blockquotes. Possibly a hybrid with an esrap rule to detect start of a block, and then let the block parse the following lines however it wants.
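To make the lines -> blocks -> inlines idea concrete, here is a toy sketch that only knows about paragraphs separated by blank lines. All function names are hypothetical, the inline pass is a placeholder, and nothing here reflects how 3bmd or any CommonMark implementation is actually structured.

```lisp
(defun split-lines (text)
  "Pass 1: split TEXT into a list of lines."
  (with-input-from-string (in text)
    (loop for line = (read-line in nil nil)
          while line collect line)))

(defun blank-line-p (line)
  "True if LINE is empty or contains only spaces and tabs."
  (every (lambda (c) (member c '(#\Space #\Tab))) line))

(defun lines->blocks (lines)
  "Pass 2: group runs of non-blank lines into (:paragraph line...) blocks."
  (let ((blocks '())
        (current '()))
    (flet ((flush ()
             (when current
               (push (cons :paragraph (reverse current)) blocks)
               (setf current '()))))
      (dolist (line lines)
        (if (blank-line-p line)
            (flush)
            (push line current)))
      (flush))
    (nreverse blocks)))

(defun parse-inlines (line)
  "Pass 3 placeholder: a real version would run an inline grammar on LINE."
  (list line))

(defun parse-document (text)
  "Run the three passes and return a list of (:paragraph inline...) blocks."
  (loop for (type . lines) in (lines->blocks (split-lines text))
        collect (cons type (mapcan #'parse-inlines lines))))
```

In this arrangement each block type would own the decision about which following lines belong to it, which is where lists and blockquotes with arbitrary indentation would need the more complicated handling mentioned above.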
I don't have any current plans to work on such a thing though, since my current limited markdown needs are satisfied by 3bmd as it is and I have other things that are higher priority for now (unless someone has a pile of money to throw at a commonmark parser or something). It does seem interesting enough that I might try to at least do a proof-of-concept between other projects at some point, but will probably be a while if so.
some related links:
CommonDoc : probable replacement for the ad-hoc AST in 3bmd in a rewrite
commondoc-markdown : Project using 3bmd with CommonDoc, possibly supporting CommonMark in the future.
cl-cmark : CommonMark processor using FFI to libcmark
aka "standard markdown", "common markdown"
http://commonmark.org/
There is no applicable method for the generic function #<STANDARD-GENERIC-FUNCTION 3BMD-EXT:PRINT-MD-TAGGED-ELEMENT (35)> when called with arguments (3BMD-DEFINITION-LISTS::DEFINITION-LIST #<SB-IMPL::STRING-OUTPUT-STREAM {666A6F3}> ((:TERMS ((3BMD-DEFINITION-LISTS::DEFINITION-TERM "test" " " "definition")) :DEFINITIONS ((3BMD-DEFINITION-LISTS::DEFINITION-LIST-ITEM (:PLAIN "The" " " "definition" " " "test")))) (:TERMS ((3BMD-DEFINITION-LISTS::DEFINITION-TERM "second" " " "item")) :DEFINITIONS ((3BMD-DEFINITION-LISTS::DEFINITION-LIST-ITEM (:PLAIN "Nother" " " "definition" " " "test")))))).
The code expects a PRINT-MD-TAGGED-ELEMENT method, but the extension defines PRINT-TAGGED-ELEMENT.
3bmd-ext-tables ignores empty cells in a table. 3bmd doesn't render the following text correctly:
| a | |
| - | - |
| | b |
I expect it to render like the following:
a | |
---|---|
b |
I guess the cause is
Line 37 in 5b301ad
Thank you.
Out of the box 3bmd doesn't recognise that processing instructions are valid:
cl-user> (3bmd:parse-string-and-print-to-stream "<?this is a valid processing instruction?>" t)
<p><?this is a valid processing instruction?></p>
At least according to the CommonMark spec, processing instructions are allowed as blocks and inlines, and should be passed through verbatim.
When trying to print as markdown I get an error that the function 3bmd::ensure-paragraph is undefined, and inspection shows that 3bmd::end-paragraph also is.
Hypothesis: these were renamed 3bmd::ensure-block and 3bmd::end-block and the one call site didn't get edited. Can you confirm, @melisgl?
I have a bit of a headache with the parse/print consistency of headings.
First, and this may be how markdown works, if there is no newline after the "heading", then it's parsed as :PLAIN:
CL-USER> (3bmd::parse-doc "x
#y
")
((:PARAGRAPH "x") (:HEADING :LEVEL 1 :CONTENTS ("y")))
NIL
T
CL-USER> (3bmd::parse-doc "x
#y")
((:PARAGRAPH "x") (:PLAIN "#" "y"))
NIL
T
When the latter is printed, an extra newline is inserted:
CL-USER> (3bmd:print-doc-to-stream (3bmd::parse-doc "x
#y") t :format :markdown)
x
#y
NIL
When the heading is escaped, the parse is good, but printing loses the escape:
CL-USER> (3bmd::parse-doc "x
\\#y
")
((:PLAIN "x" "
"
"#" "y"))
NIL
T
CL-USER> (3bmd:print-doc-to-stream (3bmd::parse-doc "x
\\#y
") t)
x
#y
NIL
If this output is parsed again, then we get a :HEADING. Thus print/parse consistency is lost.
The quick fix would be to escape all # characters in print-md-escaped, but that produces unnecessarily cluttered output, which goes against the spirit of markdown. The right solution seems to be to escape only in column 0, but that's not easily and portably available.
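For illustration, column-0 escaping is straightforward when the whole output is available as a string, which sidesteps the stream-column problem at the cost of a post-processing step. The function escape-leading-hashes below is a hypothetical helper, not part of 3bmd, and it ignores the case of a # preceded by a few spaces.

```lisp
(defun escape-leading-hashes (text)
  "Return TEXT with a backslash inserted before any # that starts a line."
  (with-output-to-string (out)
    (loop with at-line-start = t
          for c across text
          do (when (and at-line-start (char= c #\#))
               (write-char #\\ out))
             (write-char c out)
             ;; The next character is in column 0 only right after a newline.
             (setf at-line-start (char= c #\Newline)))))
```

A # in the middle of a line is left alone, so ordinary text stays uncluttered. Applying this blindly to a whole document would also escape real headings, so it would have to run only on text that the printer knows is not a heading.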
I'm using clisp from https://gitlab.com/gnu-clisp/clisp/-/commit/66924971790e4cbee3d58f36e530caa0ad568e5f and an attempt to run the tests via MacPorts fails:
:info:test INDENT-BY-TAB-SHOULD-BE-REPLACED-WITH-SPACES........................................................................[ OK ]
:info:test BLANK-LINE-TEST2........................................................................[ OK ]
:info:test BLANK-LINE-TEST1........................................................................[ OK ]
:info:test NEWLINE........................................................................[ OK ]
:info:test MULTIPLE-SPACES-MIXED-WITH-TABS........................................................................[ OK ]
:info:test TAB-AS-SPACE........................................................................[ OK ]
:info:test SPACE-TEST........................................................................[ OK ]
:info:test EOF-TEST........................................................................[ OK ]
:info:test Test run had 1 failure:
:info:test Failure 1: FAILED-ASSERTION when running 3BMD-TESTS::PARSE-LIST-WITH-CARRIAGE-RETURN
:info:test Binary predicate (EQUALP X Y) failed.
:info:test x: 3BMD-TESTS::RESULT =>
:info:test ((:BULLET-LIST
:info:test (:LIST-ITEM
:info:test (:PLAIN "x"
:info:test "
:info:test "
:info:test "y"
:info:test "
:info:test "
:info:test "Not" " " "verbatim"))))
:info:test y: 3BMD-TESTS::EXPECTED =>
:info:test ((:BULLET-LIST
:info:test (:LIST-ITEM
:info:test (:PARAGRAPH
:info:test "x
:info:test y")
:info:test (:PARAGRAPH "Not" " " "verbatim"))))
:info:test *** - tests failed