commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript

License: Other


commonmark-spec's Introduction

CommonMark

CommonMark is a rationalized version of Markdown syntax, with a spec and BSD-licensed reference implementations in C and JavaScript.


For more details, see https://commonmark.org.

This repository contains the spec itself, along with tools for running tests against the spec, and for creating HTML and PDF versions of the spec.

The reference implementations live in separate repositories:

https://github.com/commonmark/cmark (C)
https://github.com/commonmark/commonmark.js (JavaScript)

There is a list of third-party libraries in a dozen different languages here.

Running tests against the spec

The spec contains over 500 embedded examples which serve as conformance tests. To run the tests using an executable $PROG:

python3 test/spec_tests.py --program $PROG

If you want to extract the raw test data from the spec without actually running the tests, you can do:

python3 test/spec_tests.py --dump-tests

and you'll get all the tests in JSON format.

JavaScript developers may find it more convenient to use the commonmark-spec npm package, which is published from this repository. It exports an array tests of JSON objects with the format

{
  "markdown": "Foo\nBar\n---\n",
  "html": "<h2>Foo\nBar</h2>\n",
  "section": "Setext headings",
  "number": 65
}
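
As a rough illustration of consuming this format from Python, the sketch below parses a dump shaped like the output of `--dump-tests` and counts examples per section. The field names come from the object shown above; the sample data (apart from its first entry) and the helper name `count_by_section` are made up for this example.

```python
import json
from collections import Counter

# Stand-in for the output of `python3 test/spec_tests.py --dump-tests`.
# Only the first entry is taken from the README; the second is invented.
SAMPLE_DUMP = """
[
  {"markdown": "Foo\\nBar\\n---\\n", "html": "<h2>Foo\\nBar</h2>\\n",
   "section": "Setext headings", "number": 65},
  {"markdown": "# Hi\\n", "html": "<h1>Hi</h1>\\n",
   "section": "ATX headings", "number": 1}
]
"""


def count_by_section(tests):
    """Return a Counter mapping section name to number of examples."""
    return Counter(test["section"] for test in tests)


tests = json.loads(SAMPLE_DUMP)
counts = count_by_section(tests)
for section, n in sorted(counts.items()):
    print(f"{section}: {n}")
```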

The spec

The source of the spec is spec.txt. This is basically a Markdown file, with code examples written in a shorthand form:

```````````````````````````````` example
Markdown source
.
expected HTML output
````````````````````````````````
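
To make the shorthand concrete, here is a minimal Python sketch that pulls (Markdown, HTML) pairs out of text written in this form. It is not the logic of `test/spec_tests.py`; the function name `extract_examples` and the sample text are assumptions, and it only relies on what is shown above: a 32-backtick fence followed by ` example`, a line containing a single `.` as separator, and a closing 32-backtick fence.

```python
FENCE = "`" * 32  # the example fence shown above is 32 backticks long


def extract_examples(spec_text):
    """Return a list of (markdown, html) pairs from spec-style example blocks."""
    examples = []
    state = "outside"  # one of "outside", "markdown", "html"
    markdown_lines, html_lines = [], []
    for line in spec_text.split("\n"):
        if state == "outside":
            if line == FENCE + " example":
                state = "markdown"
                markdown_lines, html_lines = [], []
        elif line == FENCE:
            # Closing fence: store the pair collected so far.
            examples.append(("\n".join(markdown_lines), "\n".join(html_lines)))
            state = "outside"
        elif state == "markdown" and line == ".":
            # The first lone "." switches from Markdown source to expected HTML.
            state = "html"
        elif state == "markdown":
            markdown_lines.append(line)
        else:
            html_lines.append(line)
    return examples


SAMPLE = "\n".join([
    "Some prose.",
    "",
    FENCE + " example",
    "Foo",
    "---",
    ".",
    "<h2>Foo</h2>",
    FENCE,
])

print(extract_examples(SAMPLE))
```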

To build an HTML version of the spec, do make spec.html. To build a PDF version, do make spec.pdf. For both versions, you must have the lua rock lcmark installed: after installing lua and lua rocks, luarocks install lcmark. For the PDF you must also have xelatex installed.

The spec is written from the point of view of the human writer, not the computer reader. It is not an algorithm---an English translation of a computer program---but a declarative description of what counts as a block quote, a code block, and each of the other structural elements that can make up a Markdown document.

Because John Gruber's canonical syntax description leaves many aspects of the syntax undetermined, writing a precise spec requires making a large number of decisions, many of them somewhat arbitrary. In making them, we have appealed to existing conventions and considerations of simplicity, readability, expressive power, and consistency. We have tried to ensure that "normal" documents in the many incompatible existing implementations of Markdown will render, as far as possible, as their authors intended. And we have tried to make the rules for different elements work together harmoniously. In places where different decisions could have been made (for example, the rules governing list indentation), we have explained the rationale for our choices. In a few cases, we have departed slightly from the canonical syntax description, in ways that we think further the goals of Markdown as stated in that description.

For the most part, we have limited ourselves to the basic elements described in Gruber's canonical syntax description, eschewing extensions like footnotes and definition lists. It is important to get the core right before considering such things. However, we have included a visible syntax for line breaks and fenced code blocks.

Differences from original Markdown

There are only a few places where this spec says things that contradict the canonical syntax description:

It allows all punctuation symbols to be backslash-escaped, not just the symbols with special meanings in Markdown. We found that it was just too hard to remember which symbols could be escaped.
It introduces an alternative syntax for hard line breaks, a backslash at the end of the line, supplementing the two-spaces-at-the-end-of-line rule. This is motivated by persistent complaints about the “invisible” nature of the two-space rule.
Link syntax has been made a bit more predictable (in a backwards-compatible way). For example, Markdown.pl allows single quotes around a title in inline links, but not in reference links. This kind of difference is really hard for users to remember, so the spec allows single quotes in both contexts.
The rule for HTML blocks differs, though in most real cases it shouldn't make a difference. (See the section on HTML Blocks for details.) The spec's proposal makes it easy to include Markdown inside HTML block-level tags, if you want to, but also allows you to exclude this. It also makes parsing much easier, avoiding expensive backtracking.

It does not collapse adjacent bird-track blocks into a single blockquote:

> these are two

> blockquotes

> this is a single
>
> blockquote with two paragraphs

Rules for content in lists differ in a few respects, though (as with HTML blocks), most lists in existing documents should render as intended. There is some discussion of the choice points and differences in the subsection of List Items entitled Motivation. We think that the spec's proposal does better than any existing implementation in rendering lists the way a human writer or reader would intuitively understand them. (We could give numerous examples of perfectly natural looking lists that nearly every existing implementation flubs up.)
Changing bullet characters, or changing from bullets to numbers or vice versa, starts a new list. We think that is almost always going to be the writer's intent.
The number that begins an ordered list item may be followed by either . or ). Changing the delimiter style starts a new list.
The start number of an ordered list is significant.
Fenced code blocks are supported, delimited by either backticks (```) or tildes (~~~).
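
The list rules above (a change of bullet character or of ordered-list delimiter starts a new list, and the start number is kept) can be sketched roughly as follows. This is a deliberately simplified illustration, not the reference algorithm: it only handles one-line items, ignores indentation and nesting, and the name `split_lists` and the dictionary layout are invented for the example.

```python
import re

# A bullet marker (-, + or *) or an ordered marker (digits followed by . or )),
# then whitespace, then the item text.
MARKER = re.compile(r"^(?:([-+*])|(\d{1,9})([.)]))\s+(.*)$")


def split_lists(lines):
    """Group one-line list items into lists, starting a new list whenever
    the bullet character or the ordered-list delimiter changes."""
    lists = []
    current_key = None
    for line in lines:
        m = MARKER.match(line)
        if m is None:
            continue  # this sketch silently skips lines that are not list items
        bullet, number, delim, text = m.groups()
        key = ("bullet", bullet) if bullet else ("ordered", delim)
        if key != current_key:
            # New marker style: open a new list and remember its start number.
            start = int(number) if number else None
            lists.append({"type": key[0], "marker": key[1],
                          "start": start, "items": []})
            current_key = key
        lists[-1]["items"].append(text)
    return lists


result = split_lists(["- a", "- b", "+ c", "1. d", "2) e", "3) f"])
for lst in result:
    print(lst)
```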

Contributing

There is a forum for discussing CommonMark; you should use it instead of github issues for questions and possibly open-ended discussions. Use the github issue tracker only for simple, clear, actionable issues.

Authors

The spec was written by John MacFarlane, drawing on

his experience writing and maintaining Markdown implementations in several languages, including the first Markdown parser not based on regular expression substitutions (pandoc) and the first markdown parsers based on PEG grammars (peg-markdown, lunamark)
a detailed examination of the differences between existing Markdown implementations using BabelMark 2, and
extensive discussions with David Greenspan, Jeff Atwood, Vicent Marti, Neil Williams, and Benjamin Dumke-von der Ehe.

Since the first announcement, many people have contributed ideas. Kārlis Gaņģis was especially helpful in refining the rules for emphasis, strong emphasis, links, and images.


commonmark-spec's Issues

Spec does not define interrupting rules for emphasis

§6.4 should give prose and examples of how emphasis and strong emphasis interact and interrupt (or not) other inline elements.

Link to http://commonmark.org/

I got linked directly to the github page, hence my reason for opening #22. As far as I can see, there's no link from this github page to http://standardmarkdown.com/, which there probably should be to prevent further useless issues like mine popping up here. I haven't made this a pull request because, with the project hosted on github, you can put the link to the website next to the short description at the top.

Underscores inside of emphasis.

Why is it impossible to emphasise strings beginning or ending with an underscore? I'm using this input for testing:

blah *_*

blah *_x*

blah *x_*

blah *__*

Here is the comparison. Most implementations handle all 4 cases correctly (as it seems to me), but pandoc (in strict mode) and stmd produce this:

<p>blah *_*</p>
<p>blah *_x*</p>
<p>blah *x_*</p>
<p>blah *__*</p>

Have I misunderstood the spec, or is it a bug?

Unit tests for JS parser

Would you consider adding some unit tests for JS parser, e.g. QUnit?
I see you've got some manual tests in the oldtests folder and a test.js file, but that seems to be dependent on node.js and can't simply be run in a browser.
I think it's important to have some automated tests as well.

If that's ok, I'd add QUnit (or similar tests) to the project. I think it's best to have a testing framework without external dependencies (such as node.js).

Escaped hash sign at end of header is ignored

This Markdown:

# Hello World in C\#
...

... produces this HTML on the try.standardmarkdown.com page:

<h1>Hello World in C\</h1>
<p>...</p>

I would have expected the output to include the hash sign at the end of the header, given that it is escaped and the spec explicitly mentions that you can escape hashes in headers.

I've actually reported the same issue to kramdown a few days ago, so maybe this is an edge case that is easily missed and should have a test case in the conformance test suite?
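
The behaviour the reporter expects can be sketched in Python as below. This is only an illustration under stated assumptions, not the reference implementation: it assumes a trailing run of `#` is treated as a closing sequence only when it is preceded by whitespace and not by a backslash, and the function name `atx_heading_text` is invented here.

```python
import re

# Up to 3 spaces of indentation, 1-6 '#', then whitespace or end of line.
ATX_START = re.compile(r"^ {0,3}#{1,6}(?:[ \t]+|$)(.*)$")


def atx_heading_text(line):
    """Return the heading text of an ATX heading line, or None if the
    line is not an ATX heading."""
    m = ATX_START.match(line)
    if m is None:
        return None
    content = m.group(1).rstrip(" \t")
    without_run = content.rstrip("#")
    has_run = len(without_run) < len(content)
    escaped = without_run.endswith("\\")
    preceded_by_space = without_run == "" or without_run[-1] in " \t"
    if has_run and not escaped and preceded_by_space:
        # Unescaped closing sequence: drop it together with the spaces before it.
        content = without_run.rstrip(" \t")
    # Resolve the backslash escape so that "\#" becomes a literal "#".
    return content.replace("\\#", "#")


print(atx_heading_text("# Hello World in C\\#"))  # Hello World in C#
print(atx_heading_text("# Title ##"))             # Title
```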

errors when generating ePub from spec.txt

Congratulations on your excellent work on pandoc and also on Standard Markdown.

I get the following errors when generating the ePub version from the spec (inside a clone of the repo):

$ pandoc -f markdown spec.txt -o ../stmd-spec.epub
pandoc: Duplicate link reference `[foo]' "source" (line 5378, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5367, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5356, column 1)
pandoc: Duplicate link reference `[*foo* bar]' "source" (line 5338, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5330, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5320, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5308, column 1)
pandoc: Duplicate link reference `[*foo* bar]' "source" (line 5298, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5290, column 1)
pandoc: Duplicate link reference `[BAR]' "source" (line 5280, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 5272, column 1)
pandoc: Duplicate link reference `[foo *bar*]' "source" (line 5230, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5200, column 1)
pandoc: Duplicate link reference `[baz]' "source" (line 5199, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 5188, column 1)
pandoc: Duplicate link reference `[baz]' "source" (line 5187, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 5165, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5164, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5144, column 1)
pandoc: Duplicate link reference `[[[foo]]]' "source" (line 5143, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5103, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5092, column 1)
pandoc: Duplicate link reference `[*foo* bar]' "source" (line 5082, column 1)
pandoc: Duplicate link reference `[*foo* bar]' "source" (line 5074, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5066, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5048, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5035, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 5017, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 4986, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 4975, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 4966, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 4934, column 1)
pandoc: Duplicate link reference `[bar]' "source" (line 4924, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 3802, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 3709, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 1854, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 1761, column 4)
pandoc: Duplicate link reference `[foo]' "source" (line 1723, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 1722, column 1)
pandoc: Duplicate link reference `[foo]' "source" (line 1711, column 1)
pandoc: Could not find image `/url . <p><img src="/url" alt="foo" /></p> .', skipping...
pandoc: Could not find image `url', skipping...
pandoc: Could not find image `/path/to/train.jpg', skipping...
pandoc: Could not find image `train.jpg', skipping...
pandoc: Could not find image `/url', skipping...

Are these error messages intended or could you provide the image files?

Many thanks for your help.

Spec doesn't define that lists can interrupt paragraphs

Other block constructs make it clear whether or not they can interrupt paragraphs, but the lists section doesn't mention anything. The dingus results imply that they can interrupt paragraphs.

You guys failed to make the list sane

This

1. asd
2. qwe
3. 345

Produces this

<ol>
<li>asd</li>
<li>qwe</li>
<li>345</li>
</ol>

And this

1. asd
2. qwe

1. 345

Produces this

<ol>
<li><p>asd</p></li>
<li><p>qwe</p></li>
<li><p>345</p></li>
</ol>

This makes absolutely no sense; it is totally unintuitive, useless, and contrary to the intention of everyone in their right mind. Who on earth needs to wrap their list items in paragraphs, and who on earth would try to achieve this by inserting a blank line at a random place in the list?
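
The two outputs quoted in this report differ because of the tight/loose distinction: a list whose items are separated by a blank line is loose, and the items of a loose list are wrapped in paragraphs. A minimal Python sketch of that rule for flat, one-line ordered items follows; the function name `render_ordered_list` is invented, the output layout mirrors the HTML quoted above, and nesting and HTML escaping are deliberately left out.

```python
import re

ITEM = re.compile(r"^\d{1,9}[.)][ \t]+(.*)$")


def render_ordered_list(markdown):
    """Render a flat ordered list, wrapping items in <p> when the list is loose."""
    lines = markdown.strip("\n").split("\n")
    items = [m.group(1) for m in (ITEM.match(line) for line in lines) if m]
    # Leading and trailing blank lines were stripped, so any remaining blank
    # line sits between two items and makes the whole list loose.
    loose = any(line.strip() == "" for line in lines)
    if loose:
        body = [f"<li><p>{item}</p></li>" for item in items]
    else:
        body = [f"<li>{item}</li>" for item in items]
    return "\n".join(["<ol>", *body, "</ol>"])


print(render_ordered_list("1. asd\n2. qwe\n3. 345"))
print(render_ordered_list("1. asd\n2. qwe\n\n1. 345"))
```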

stmd does not produce standards-compliant HTML

There is no DTD declaration etc.

Incorrect handling of nested emphasis.

This:

blah ***hello* world***

according to the rules, should be parsed as

<p>blah <strong><em>hello</em> world</strong>*</p>

but both the JavaScript and C implementations output

<p>blah <strong><em>hello</em> world</strong>**</p>

Have I misunderstood the spec, or is it a bug?

Should I use this JS implementation?

Do you intend to support your JS implementation here as production code? E.g. will it be published to NPM/Bower in the future?

It'd be great to have an "official" JS Markdown implementation.

Don't Call It Standard Markdown

Y'all didn't create Markdown and don't particularly have the right to create a "standard" based on it against the wishes of the original author.

From the license:

Neither the name “Markdown” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

It seems like you did both. Please provide proof of your written permission or consider a new name.

Possible new test case

Started testing some Markdown variants on cases that I often use, but are often b0rk'd by parsers. Pandoc and stmd are the only tools that get it right. (Yay! \o/)

https://gist.github.com/skyzyx/2b3183cd890affd877a4

Are the Yankees the best team in baseball? No.

The site has a footnote that reads "Are the Yankees the best team in baseball? Yes." I believe it should say Red Sox. :P (no need to cite current AL East standings, I know!)

XSS hazards need to be addressed

At the moment, the spec doesn't seem to mention XSS issues with a single word. The reference implementation is a huge footgun as it will happily render things like:

[Click me](javascript:alert("XSS"))
<javascript:alert("XSS")>
<div onclick="alert('XSS')">click me</div>
<script>alert("XSS")</script>

There should be at least a huge warning that the reference implementation shouldn't be used without some kind of additional XSS filter - or when the content rendered is trusted. Ideally however, a reference implementation should allow specifying a whitelist (URL schemes, tags, attributes) and even provide a whitelist that is appropriate in most cases. Otherwise the sanitization of the generated code will become yet another distinguishing point of every implementation.
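
One piece of the whitelist idea, checking link destinations against allowed URL schemes, might look like the Python sketch below. It is an illustration only and not a complete XSS filter (it does nothing about raw HTML tags or attributes such as `onclick`); the set `ALLOWED_SCHEMES` and the name `is_allowed_url` are assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; "" covers relative URLs that carry no scheme.
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}


def is_allowed_url(url):
    """Return True if the URL's scheme is on the allowlist."""
    # Drop control characters and whitespace first, so that a scheme cannot
    # be hidden as e.g. "java\tscript:".
    cleaned = re.sub(r"[\x00-\x20]", "", url)
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # unparseable input is rejected rather than trusted
    return scheme in ALLOWED_SCHEMES


for candidate in ['javascript:alert("XSS")', "http://example.com/", "/relative/path"]:
    print(candidate, is_allowed_url(candidate))
```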

ids and classes for elements

I use markdown enough that I really would like to associate a class and/or id to certain elements.

Is there already a trick of some sort that allows the generated HTML to be accessed (Javascript/CSS)? If not, does it make sense for the standard? Maybe an "escape" to HTML?

I realize I can use CSS selectors that are content specific (the second H3 after the div "mkdn") but that is just so weird!

-- Owen

Spec not clear on ---

What does


---

---

mean? Is it

 <h2>---</h2>

or

<hr />
<hr />

Spec is not clear

Feature Request: Source maps

It would be great to click in the preview pane in a Markdown editor & select the piece of source Markdown that rendered to that text.

Can you add an option to generate source maps so that editors can support this?

This is also necessary for synchronized scrolling.

/cc @am11 @madskristensen
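
The kind of information an editor would need for this can be sketched in Python: each block carries the source lines it came from, so a click on rendered output can be mapped back to the Markdown. This is a toy illustration that only splits on blank lines; the function name `blocks_with_positions` and the dictionary keys are invented, and no existing parser API is implied.

```python
def blocks_with_positions(text):
    """Split text into blank-line-separated blocks and record the 1-based
    start and end line of each block."""
    blocks = []
    current, start = [], None
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line.strip() == "":
            if current:
                # A blank line closes the block that ended on the previous line.
                blocks.append({"start": start, "end": lineno - 1,
                               "text": "\n".join(current)})
                current, start = [], None
        else:
            if not current:
                start = lineno
            current.append(line)
    if current:
        blocks.append({"start": start, "end": start + len(current) - 1,
                       "text": "\n".join(current)})
    return blocks


source = "# Title\n\nfirst\nsecond\n\nlast"
positions = blocks_with_positions(source)
for block in positions:
    print(block["start"], block["end"], repr(block["text"]))
```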

Example 137 (lazy fenced code block inside of blockquote) is unclear

I'm not sure why Example 137, which shows off the effect of lazy blockquote continuation when the last arrowed line starts a fenced code block, outputs what it does.

All the other examples of bad laziness clearly happen because the un-arrowed line is, itself, not a paragraph-starting line; they all have special syntax that starts a new block. But example 137 has an un-arrowed plain-text line.

Is the intention that it fails because laziness only covers paragraphs themselves, and any other context can't extend past the end of the arrowed section? If so, this wasn't clear from the text or the other examples.

Spec not clear on "* * *"

The spec says

When both a horizontal rule and a list item are possible interpretations of a line, the horizontal rule is preferred

This is nonsensical:

* * *

is a valid list item (a list with a single item in it.) The wording needs rewording here

Also:

* * *
* is this a two item list or a one item list?

Error handling issues

Error handling in the library has some concerning issues.

Basically, all the error handling paths (i.e. the ones through the check macro) are either superfluous or not real errors.

Here's a brief list of examples:

https://github.com/jgm/stmd/blob/master/src/main.c#L84 -- this can never happen (and in fact is optimized away by the C compiler on O3)
https://github.com/jgm/stmd/blob/master/src/html.c#L93 -- there are dozens of checks like this throughout the file. Here's the kick: they can never be triggered! There isn't a single code path in the HTML rendering routines that actually returns an error code! A static analyzer can trivially show this.
https://github.com/jgm/stmd/blob/master/src/blocks.c#L81 -- this is not an "error", this is a bug in the parser! If this happens, we don't want to return an error code, we want to assert and abort the program so we can fix the underlying issue.

On top of that, and what's more important to me, all error reporting is done through stderr, something that makes the library not an option for embedding in other systems.

My proposal is as follows:

Remove all the stderr reporting altogether because it makes the library unsuitable for most environments.
Remove most of the code paths that return -1. 90% of the functions when parsing Markdown cannot fail unless there's a bug in the parser, so assert accordingly. This will greatly simplify the code in the library.
The only real errors that can occur when parsing Markdown are invalid UTF8 codepoints. Change the error codes accordingly to make sure we return a specific error code to the caller that identifies the UTF8 error.

Am I making sense, John? Do you think this proposal is acceptable?
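
The split proposed above (report invalid UTF-8 to the caller as a specific error, treat everything else as an internal invariant, and write nothing to stderr) is sketched here in Python rather than C, purely to show the shape of the idea. The exception class `InvalidUTF8Error` and the toy `parse_document` function are invented for this example and do not correspond to anything in stmd.

```python
class InvalidUTF8Error(ValueError):
    """The one input error the caller is expected to handle."""

    def __init__(self, offset):
        super().__init__(f"invalid UTF-8 at byte offset {offset}")
        self.offset = offset


def parse_document(data: bytes):
    """Toy parser: returns one node per line of the decoded input."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Reported through the return path, never printed to stderr.
        raise InvalidUTF8Error(exc.start) from None
    lines = text.split("\n")
    # An internal invariant, not a user-facing error: if this ever fails,
    # the bug is in the parser and the program should stop.
    assert lines, "str.split always returns at least one element"
    return [{"type": "line", "text": line} for line in lines]


try:
    parse_document(b"ok\xff")
except InvalidUTF8Error as err:
    print(err)
```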

It would be very nice to dump the ast in XML

For obvious reasons - the AST is in an ad hoc text format - which means you'll need
another project to define what it means and so on ...

Nice work BTW

Typo on website

by 2014 there were dozens of implementations in many languages.

By 2014 there were dozens of implementations in many languages.

What is qualified as blank line

Specification says:

A line containing only spaces (after tab expansion) followed by a line ending is called a blank line.

I think that a line with no spaces (just a line ending) should be called the blank line as well.
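
The definition the reporter asks for, where an empty line counts as blank just like a line of spaces, is a one-liner in Python. The function name `is_blank_line` is made up for this sketch; tabs are accepted alongside spaces because the quoted definition applies after tab expansion.

```python
def is_blank_line(line):
    """Return True for a line that is empty or contains only spaces/tabs,
    ignoring the line ending."""
    return line.rstrip("\r\n").strip(" \t") == ""


for sample in ["\n", "   \n", "\t\n", " a\n"]:
    print(repr(sample), is_blank_line(sample))
```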

http://jgm.github.io/stmd/js/ does not render ` ![](<img url>)` as image.

http://jgm.github.io/stmd/js/ does not render ![](<img url>) as an image, e.g. in the following:

<br/>
![](http://commonmark.org/images/markdown-mark-small.png)

Inserting a blank line after <br/> makes it work, as follows:

<br/>

![](http://commonmark.org/images/markdown-mark-small.png)

malloc() can fail! check the damn return value.

Good on you for using a DFA regex library, though.

A-elements are not self-closing.

All throughout the spec files anchors have been sprinkled in the form of <a id="line"/> (line 194). This is counter to the HTML spec.

@twolfson first reported this (#40) and fixed it (#41) as a Firefox specific bug “with self-closing <a>’s next to <ol>’s”. But Firefox is not the only browser that gets unexpected output, here as an example from Safari’s DOM-Tree:

<h2 id="paragraphs">
  <span class="header-section-number">4.8</span>
  Paragraphs
</h2>
<p>
  A sequence of non-blank lines that cannot be interpreted as other kinds of blocks forms a
  <a href="#paragraph">paragraph</a>
  <a id="paragraph">. The contents of the paragraph are the result of parsing the paragraph’s raw content as inlines. The paragraph’s raw content is formed by concatenating the lines and removing initial and final spaces.</a>
</p>

This is because HTML does not contain self-closing tags. There are void elements that “must not” have end tags and there are normal elements where the end tag “can be omitted” following a strict set of rules. The a element is not defined by either.

Instead of fixes like #41 all anchors should be changed to their full form, i.e. <a id="line"></a>.
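
A mechanical version of that change could be sketched with a regular expression, as below. This is only an illustration of the rewrite, not a general HTML fixer; the pattern and the function name `expand_anchors` are assumptions, and regex-based HTML editing is only reasonable here because the anchors follow one simple, known shape.

```python
import re

# An <a ...> start tag written in self-closing form, e.g. <a id="line"/>.
SELF_CLOSING_ANCHOR = re.compile(r"<a(\s[^<>]*?)\s*/>")


def expand_anchors(html):
    """Rewrite <a .../> as <a ...></a>, leaving other markup untouched."""
    return SELF_CLOSING_ANCHOR.sub(r"<a\1></a>", html)


print(expand_anchors('<a id="line"/>'))
```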

Philosophy on 'other constructs' (such as tables)

For the most part, I have limited myself to the basic elements described in Gruber's canonical syntax description, eschewing extensions like footnotes and definition lists. It is important to get the core right before considering such things.

And from the introduction, 1.1:

By 2014 there were dozens of implementations in many languages. Some of them extended basic markdown syntax with conventions for footnotes, definition lists, tables, and other constructs, and some allowed output not just in HTML but in LaTeX and many other formats.

Forgive me as I have not found other specific references here to such extended markdown implementations and stmd's relationship to them (pandoc extended markdown, multimarkdown or github flavoured markdown being the most compelling examples). I understand you have chosen to limit stmd to the basic elements in Gruber's originally defined syntax, which is understandable of course as it is being called a standard. However, anticipating a large demand for standardizing extended constructs such as tables or equations (for example, from those who already have many pandoc markdown documents utilizing such elements), does stmd see itself integrating such features some day in the future, or is that against the ideals of this project?

Will it allow meta/header content at the start of the file like in pandoc's extended markdown? Will it be compatible with pandoc's extended markdown (seeing as though you created pandoc and that stmd requires pandoc to build html/pdf versions of the spec)?

Again, sorry if you've already mentioned this somewhere as I haven't found it in my searches.

Keep the captured line in the AST so that it can be rewritten

We want to use a limited subset of Standard Markdown for an application. For example, we don't want to allow Horizontal Rules.

One approach is to take the parsed tree and rewrite all HorizontalRules as Paragraphs. That won't work though since the Parsed output no longer has the original line in it.

One approach is to add a capture property to the tree, as I've done in my commit. With that information available, it is possible to walk the tree and do something like this:

switch (child.t) {
    case "HorizontalRule":
        child.t = "Paragraph";
        child.inline_content = [{ t: "Str", c: child.capture }];
        child.strings = [child.capture];
        break;
}

This rewrites all HorizontalRule things as Paragraphs, containing the original match as the inline string.

The obvious issue here is that the changing of the types requires knowledge of implementation details, but that could be solved by exposing a "Create Paragraph From String" function.

Does the Standard Markdown team feel this might make sense to do in the Parser?

Does not cleanly error on lack of re2c

If you compile without re2c you get:

re2c --case-insensitive -bis src/scanners.re > src/scanners.c
/bin/sh: re2c: command not found

Which is fine and expected. But then install re2c and run make and you get:

cc -g -O3 -Wall -Werror -o stmd src/main.c src/inlines.o src/blocks.o src/detab.o src/bstrlib.o src/scanners.o src/print.o src/html.o src/utf8.o
src/inlines.o: In function `handle_entity':
stmd/src/inlines.c:500: undefined reference to `scan_entity'
src/inlines.o: In function `handle_pointy_brace':
stmd/src/inlines.c:595: undefined reference to `scan_autolink_uri'
stmd/src/inlines.c:605: undefined reference to `scan_autolink_email'
stmd/src/inlines.c:616: undefined reference to `scan_html_tag'
src/inlines.o: In function `handle_left_bracket':
stmd/src/inlines.c:746: undefined reference to `scan_spacechars'
stmd/src/inlines.c:713: undefined reference to `scan_spacechars'
stmd/src/inlines.c:714: undefined reference to `scan_link_url'
stmd/src/inlines.c:719: undefined reference to `scan_spacechars'
stmd/src/inlines.c:722: undefined reference to `scan_link_title'
stmd/src/inlines.c:723: undefined reference to `scan_spacechars'
src/inlines.o: In function `handle_entity':
stmd/src/inlines.c:500: undefined reference to `scan_entity'
src/inlines.o: In function `parse_reference':
stmd/src/inlines.c:951: undefined reference to `scan_link_url'
stmd/src/inlines.c:965: undefined reference to `scan_link_title'
src/blocks.o: In function `incorporate_line':
stmd/src/blocks.c:532: undefined reference to `scan_atx_header_start'
/stmd/src/blocks.c:545: undefined reference to `scan_open_code_fence'
stmd/src/blocks.c:556: undefined reference to `scan_html_block_tag'
stmd/src/blocks.c:572: undefined reference to `scan_hrule'
src/blocks.o: In function `parse_list_marker':
stmd/src/blocks.c:326: undefined reference to `scan_hrule'
src/blocks.o: In function `incorporate_line':
stmd/src/blocks.c:684: undefined reference to `scan_close_code_fence'
stmd/src/blocks.c:562: undefined reference to `scan_setext_header_line'
src/html.o: In function `escape_html':
stmd/src/html.c:40: undefined reference to `scan_entity'
collect2: error: ld returned 1 exit status
make: *** [stmd] Error 1

Which is not good.

Then run make clean:

rm test src/*.o src/scanners.c
rm: cannot remove `test': No such file or directory
make: [clean] Error 1 (ignored)
rm -r *.dSYM
rm: cannot remove `*.dSYM': No such file or directory
make: [clean] Error 1 (ignored)
rm README.html
rm: cannot remove `README.html': No such file or directory
make: [clean] Error 1 (ignored)
rm spec.md fuzz.txt spec.html
rm: cannot remove `spec.md': No such file or directory
rm: cannot remove `fuzz.txt': No such file or directory
rm: cannot remove `spec.html': No such file or directory
make: [clean] Error 1 (ignored)

So even more errors.

But running make after that works fine.

Setext headers’ fuzzy definition

[…] with no more than 3 spaces indentation and any number of leading or trailing spaces.

How could it have any leading space?

Reference implementation won't render an inline link if followed by an autolink

Consider the following Markdown code:

[Foo](http://example.com/) something <http://example.com/>

The JavaScript reference library currently renders this as:

<p>[Foo](http://example.com/) something <a href="http://example.com/">http://example.com/</a></p>

So if an inline link is followed by an autolink, it isn't rendered; the opposite ordering is fine, however. I would expect the rendered output to be:

<p><a href="http://example.com/">Foo</a> something <a href="http://example.com/">http://example.com/</a></p>

Publish js version to npm

Considering the JavaScript parser is currently under copyright by John MacFarlane, is there any possibility to re-license/publish the parser to npm? I'm aware that the implementation details might change as the spec approaches 1.0 but it would be great to have an easy way to play with the spec.

list-item / block quote typo

https://github.com/jgm/stmd/blob/gh-pages/spec.html#L2551:

<li><strong>That’s all.</strong> Nothing that is not counted as a list item by rules #1–4 counts as a <a href="#block-quote">list item</a>.</li>

The link text doesn't match the link href.

Incorrect example in the discussion of the four-space rule

Section 5.2.1 (Motivation) says

The four-space rule is clear but unnatural. It is quite unintuitive that
- foo

 bar

 - baz
should be parsed as two lists with an intervening paragraph, [...] as the four-space rule demands, rather than a single list
<ul>
<li>foo
bar</li>
<li>baz<li>
</ul>

-- but this is not what this markdown source is parsed as; instead, "baz" should be the first (and only item) of a sublist inside the outer list item. (I tested this in http://jgm.github.io/stmd/js/ and that's indeed what happened).

Not making this a pull request since I'm not sure of what this section should actually say (I did however open a pull request that fixes an extraneous angle bracket in this spot).

How are tabs expanded to spaces? I'm confused.

The spec states that "Tabs in lines are expanded to spaces, with a tab stop of 4 characters". But when I look at the HTML result, the first tab only got 1 space?

foo→baz→→bim    |  <p>foo baz     bim</p>
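
The result is what a tab stop of 4 gives: a tab advances to the next column that is a multiple of 4, rather than always adding 4 spaces. The Python sketch below spells that out (the function name `expand_tabs` is made up; Python's built-in `str.expandtabs` does the same thing). "foo" ends at column 3, so the first tab adds one space; "baz" ends at column 7, so the second tab adds one space and the third adds four.

```python
def expand_tabs(line, tab_stop=4):
    """Replace each tab with enough spaces to reach the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - (column % tab_stop)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)


expanded = expand_tabs("foo\tbaz\t\tbim")
print(repr(expanded))  # 'foo baz     bim'
```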

Feature Request: Tables

Tables are used very often to present information that would otherwise require multiple paragraphs in a clear, concise 2D layout. The issue with making this feature optional is that it won't be fully widespread. And things like tables and anchors are common enough to belong in a core standard.

A common complaint, as I understand it, about tables in the Markdown variants that attempt to implement them is that they are hard to maintain. So here are some ways I think the "Markdown Extra" syntax can be simplified for this effort.

This is Markdown Extra Syntax for tables:

 | Item      | Value |
 | --------- | -----:|
 | Computer  | $1600 |
 | Phone     |   $12 |

First example (Compress the pipe headers):

To indicate a field is a header you use |-, -| .
For header alignment: |:- left aligned -|, |- right aligned -:| |:- Centre aligned -:| .

|:- Header -:|:- Header -:|
|   Row      |   Row      |
|   Row      |   Row      |

Second Example ( CSV Input):

The second issue, is that people find it hard to have to deal with formatting the pipes. If alignment of cell data is of no concern to the user, then we should use CSV data as the inspiration.

I'm a big fan of CSV data, due to how easy it is to type. The ease of use comes from sticking to csv which most people use already, and combining it with a simplified table header.

If you still need alignment control for each cell, then you can just use the previous (but simplified) pipe tables shown above using |:, :|

|:- Year -|:- Make  -|:- Model                         -:| 
  1997,      Ford,      E350
  1999,      Chevy,    "Venture ""Extended Edition"""
  1999,      Chevy,    "Venture ""Extended Edition
  1996,      Jeep,      Grand Cherokee

 This is some other text, since the end of a table is implied by a new paragraph.

example data from: http://en.wikipedia.org/wiki/Comma-separated_values

Essentially, just treat pipes as 'optional' for the actual cell data (which is the part that gets modified most often anyway, compared to the header). This way, we can avoid too much formatting, and heck, if you are lazy, you could just remove the whitespace and it would still be very maintainable, like so:

|:- Year -|:- Make -|:- Model  -:| 
1997, Ford, E350
1999, Chevy, "Venture ""Extended Edition"" "
1999, Chevy, "Venture ""Extended Edition"" "
1996, Jeep, Grand Cherokee

The second approach is my preference, since I believe Markdown is about getting formatting out of the way of your writing.
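
To make the second approach concrete, here is a rough Python sketch of how a parser might read it. This syntax is only a proposal and is not part of any spec; the function names parse_header and parse_rows and the header-cell pattern are hypothetical, and the body rows are handed to Python's standard csv module.

```python
import csv
import re

# Hypothetical pattern for one header cell of the proposed syntax, e.g. ":- Model -:".
HEADER_CELL = re.compile(r"^(:?)-\s*(.*?)\s*-(:?)$")


def parse_header(line):
    """Return (name, alignment) pairs for a '|:- Name -:|' style header line."""
    columns = []
    for cell in line.strip().strip("|").split("|"):
        match = HEADER_CELL.match(cell.strip())
        if match is None:
            raise ValueError(f"not a header cell: {cell!r}")
        left, name, right = match.groups()
        if left and right:
            align = "center"
        elif left:
            align = "left"
        elif right:
            align = "right"
        else:
            align = None
        columns.append((name, align))
    return columns


def parse_rows(lines):
    """Parse CSV body rows, ignoring the spaces that follow each comma."""
    reader = csv.reader(lines, skipinitialspace=True)
    return [[cell.strip() for cell in row] for row in reader]


header = parse_header('|:- Year -|:- Make -|:- Model  -:| ')
rows = parse_rows([
    '1997, Ford, E350',
    '1999, Chevy, "Venture ""Extended Edition"" "',
])
print(header)
print(rows)
```

Reusing an existing CSV reader means quoting and doubled quotes inside cells follow the usual CSV conventions, so only the header line would need new parsing rules.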

Syntax Highlighting would be nice.

I would highly appreciate it if there was something about syntax highlighting.

newline === newline

Tested the "standard" and like it. The one thing that frustrates me (a lot) is that (markdown newline) !== (result newline).

Please. Isn't the whole point of markdown to make writing easy? This is how my thought pattern works:
press enter for a single new line: wtf I didn't get one. Maybe I need two?
press enter a second time for a single new line: wtf now I got two newlines, I just need one!!
backspace to get a single newline: no newlines again.. god damn this is annoying.
replace newlines with <br>: really?? this works?? starting to hate markdown..

In support of my argument: http://www.marco.org/2012/02/25/right-vs-pragmatic

H-rule/setext-header ambiguity

Issue requested in http://talk.standardmarkdown.com/t/h-rule-setext-header-ambiguity/95/2.

It should be made explicit that setext headers take precedence over horizontal rules.
According to section 4.1 and example 4, this gives a paragraph followed by a HR:

aaa

---

Unclosed fenced code block doesn't always run until the end of the document

Section 4.5 (Fenced code blocks) says (emph. mine)

If the end of the document is reached and no closing code fence has been found, the code block contains all of the lines after the opening code fence.

but this isn't necessarily true; it may also be the end of the containing block that ends the fenced block. Example 137 is an instance of this:

> ```
foo
```

renders as

<blockquote>
<pre><code></code></pre>
</blockquote>
<p>foo</p>
<pre><code></code></pre>

I'm not making this a pull request since I'm not 100% on the best wording here.

JavaScript unit test fails on Windows

On Windows, the regular expressions for extracting examples from spec.txt are failing. This seems to be because fs.ReadFile() is adjusting the line endings to be \r\n in memory (even if they are \n in the file).

The net result is that the tests fail. A simple normalization transform cleans this up.
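
As an illustration of the kind of normalization meant here, the following is a minimal sketch in Python rather than in the JavaScript of the actual test runner. The function name normalize_newlines and the SEPARATOR pattern are assumptions made for the example; they are not the regular expressions used by the tests.

```python
import re


def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical pattern: a line consisting of a single "." separating
# the Markdown source from the expected HTML in a spec example.
SEPARATOR = re.compile(r"^\.$", re.MULTILINE)

crlf_text = "Markdown source\r\n.\r\nexpected HTML output\r\n"

# With \r\n endings the "." is followed by "\r", so the line-anchored
# pattern finds nothing; after normalization it matches.
print(SEPARATOR.search(crlf_text))
print(SEPARATOR.search(normalize_newlines(crlf_text)))
```

Applying the transform once, right after the file is read, keeps every later line-anchored pattern independent of the platform's line endings.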

Typo in spec

"eample" instead of "example" when discussing blank lines separating block quotes.

License of the documentation and specification is unclear

My concern is that the specification is not clearly licensed (maybe I didn't do enough work to find this out). I do not know if I am free to modify it, copy it, share it, remix it, etc.

If the intent is to apply the BSD3 license to the documentation and specification, can that be clarified within the documents themselves? Even the FreeBSD project doesn't license documentation under the BSD license: https://en.wikipedia.org/wiki/FreeBSD_Documentation_License

Otherwise, may I suggest CC-0, CC-BY or CC-BY-SA as a default license for specifications?

Much like the complaints of ambiguity in the original description, the original description suffers from this spec/documentation licensing ambiguity as well.

STMD is slow because of `bstring`

The bstring library is not particularly good for this use case (or, arguably, for any use case).

Doing a stack profile of a large parsing run gives these results:

samples  %        image name               symbol name
2405     20.4820  libc-2.19.so             __memmove_ssse3_back
1237     10.5348  libc-2.19.so             vfprintf
893       7.6052  libc-2.19.so             _int_malloc
794       6.7621  no-vmlinux               /no-vmlinux
758       6.4555  libc-2.19.so             _int_free
594       5.0588  libc-2.19.so             fgetc
560       4.7692  libc-2.19.so             _IO_strn_overflow
544       4.6329  libc-2.19.so             _IO_default_xsputn
518       4.4115  libc-2.19.so             malloc
287       2.4442  stmd                     bgets
233       1.9843  stmd                     binchr
206       1.7544  stmd                     bformata
199       1.6948  stmd                     bdetab
159       1.3541  libc-2.19.so             strlen
153       1.3030  libc-2.19.so             malloc_consolidate

The top 20 is completely dominated by the bstring code: in particular, the thousands of unnecessary calls to vfprintf, the constant reallocations, and the non-stop shifting around of content inside strings.

I have a set of string libraries that we've been using in Sundown and all over GitHub for years. They are efficient and remarkably secure (thoroughly audited by external firms).

Would you take a PR that replaces the use of bstring with new string handling code, or are you set on using bstring? I understand this is a big undertaking, but I promise the result will be just as clean and extensively tested.

custom HtmlRenderer for stmd.js

This is more of a feature request than an issue. Marked has a very slick way to define a custom renderer. From quickly inspecting the code, this does not look too difficult with stmd.js, but at best there is still a lot of boilerplate code to copy into a custom HtmlRenderer, and at worst (shame on me, I haven't tried yet) the original HtmlRenderer uses unexported internals of stmd.js, which would require patching (or forking) the whole of stmd.js.

Spec for language direction

Markdown still doesn't define any spec for language direction (i.e. RTL and LTR). I know I can set it through dir="rtl" in the final generated HTML, but there has to be a better way, especially when writing in two languages with different directions (e.g. Arabic and English).

Firefox improperly rendering `Block quotes` section

There seems to be a visual issue with how Firefox handles the Block quotes section's anchor. It takes the generated HTML

<h2 id="block-quotes"><span class="header-section-number">5.1</span> Block quotes</h2>
<p>A <a href="#block-quote-marker">block quote marker</a> <a id="block-quote-marker" />consists of 0-3 spaces of initial indent, plus (a) the character <code>&gt;</code> together with a following space, or (b) a single character <code>&gt;</code> not followed
    by a space.</p>
<p>The following rules define <a href="#block-quote">block quotes</a>:<a id="block-quote" />
</p>
<ol style="list-style-type: decimal">
    <li>
        <p><strong>Basic case.</strong> If a string of lines <em>Ls</em> constitute a sequence of blocks <em>Bs</em>, then the result of appending a [block quote marker] to the beginning of each line in <em>Ls</em> is a <a href="#block-quote">block quote</a> containing <em>Bs</em>.</p>
    </li>

and converts it to this:

<h2 id="block-quotes"><span class="header-section-number">5.1</span> Block quotes</h2>
<p>A <a href="#block-quote-marker">block quote marker</a>  <a id="block-quote-marker"> consists of 0-3 spaces of initial indent, plus (a) the character <code>&gt;</code> together with a following space, or (b) a single character <code>&gt;</code> not followed by a space.</a>
</p>
<a id="block-quote-marker">
</a>
<p><a id="block-quote-marker">The following rules define </a><a href="#block-quote">block quotes</a>:
    <a id="block-quote"></a>
</p>
<a id="block-quote">
</a>
<ol style="list-style-type: decimal">
    <a id="block-quote">
    </a>
    <li>
        <a id="block-quote"></a>
        <p><a id="block-quote"><strong>Basic case.</strong> If a string of lines <em>Ls</em> constitute a sequence of blocks <em>Bs</em>, then the result of appending a [block quote marker] to the beginning of each line in <em>Ls</em> is a </a><a href="#block-quote">block quote</a> containing <em>Bs</em>.</p>
    </li>

To make it more obvious: it is taking the self-closing tag and turning it into two, one before the ol and one inside the ol.

<a id="block-quote">
</a>
<ol style="list-style-type: decimal">
    <a id="block-quote">
    </a>

I will be submitting a PR shortly to resolve the issue.

ctype function arguments char instead of int

On NetBSD, there are warnings:

src/blocks.c: In function 'parse_list_marker':
src/blocks.c:339:3: warning: array subscript has type 'char' [-Wchar-subscripts]
   } else if (isdigit(c)) {
   ^
src/inlines.c: In function 'normalize_reference':
src/inlines.c:41:5: warning: array subscript has type 'char' [-Wchar-subscripts]
     if (isspace(c)) {
     ^
src/inlines.c: In function 'scan_delims':
src/inlines.c:347:3: warning: array subscript has type 'char' [-Wchar-subscripts]
   *can_open = numdelims > 0 && numdelims <= 3 && !isspace(char_after);
   ^
src/inlines.c:348:3: warning: array subscript has type 'char' [-Wchar-subscripts]
   *can_close = numdelims > 0 && numdelims <= 3 && !isspace(char_before);
   ^
src/inlines.c:350:5: warning: array subscript has type 'char' [-Wchar-subscripts]
     *can_open = *can_open && !isalnum(char_before);
     ^
src/inlines.c:351:5: warning: array subscript has type 'char' [-Wchar-subscripts]
     *can_close = *can_close && !isalnum(char_after);
     ^

The reason is that NetBSD is more picky about the argument type of the ctype functions.
NetBSD's ctype(3) man page gives details:
http://netbsd.gw.com/cgi-bin/man-cgi?ctype++NetBSD-current
See the CAVEATS section.

Feature Request: Anchor Links

It would be great if Standard Markdown could incorporate anchor links. One of the best uses of Markdown is writing simple one-page webpages for people who do not want to manage a large website codebase and yet want fine-tuned control of their website. However, if the website is just one page, its content is divided into sections within that page. So it is important for the page to have anchor links: if I want to send someone a link to my projects, I can send http://sherjil.ozair.io/#projects, which will open my one-page webpage and scroll to the projects section.

A similar useful feature would be the ability to specify "open in new tab" links.