remarkjs / remark

markdown processor powered by plugins part of the @unifiedjs collective

Home Page: https://remark.js.org

License: MIT License

JavaScript 100.00%

markdown ast javascript unified remark commonmark

remark's People

Contributors

spmjs eush77 ulrikstrid why-jay anandthakker gitter-badger minodisk zkochan boguan deanishe sfrdmn jpeer264 sahwar mizchi catesandrew ohtake mattcreager kgryte sweptr richardlitt kyleamathews modulexcite binhndicts joseroubert08 alex-e-leon tteltrab nokome rokt33r sethvincent ondinerhi simov barkinet niilante robomatic vhf stevenxl ruifortes davidtheclark quantizor dherges pabloleon brendo djm erquhart marfuzzi thomascullen michaelisprihanto tbroadley whowhenwheredev ikatyang kwangkim mlrawlings christianmurphy lvl99 streamich jessepinho alesanchezr devongovett darklightblue hamms qfox damianofusco rubys transitive-bullshit jeffluong pauloptimizely seafoam6 noahprince22 joe223 johnking bjudson strugee jaredk3nt tada1 wconnorwalsh calebissharp rexxars zslabs imcuttle antialiasis staltz trott korolev jashmenn sbugert jjm31601394 ghsyeung swizec mike-north cherishsince aocenas mohammedgmgn jake-low hydro47 mintutu restingtsunami zwz oraykt millette alexeykuzmin

remark's Issues

Outputs wrong position when passing an empty string

var mdast = require('mdast');
var emptyString = "";
var ast = mdast.parse(emptyString);
console.log(JSON.stringify(ast));
/*
{
    "type": "root",
    "children": [],
    "position": {
        "start": {
            "line": 1,
            "column": 1
        }
    }
}
*/
// position.end is undefined

Example : http://requirebin.com/?gist=ad3c34ef897338867009

Expected:

{
    "type": "root",
    "children": [],
    "position": {
        "start": {
            "line": 1,
            "column": 1
        },
        "end": {
            "line": 1,
            "column": 1
        }
    }
}

Actual:

position.end is undefined
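
A minimal sketch of the expected behavior, not mdast's actual implementation: the hypothetical helper `rootPosition` below derives the root node's position from the source string alone, so an empty string still gets an `end` equal to its `start`.

```javascript
// Hypothetical helper (not part of mdast): compute the position a root
// node would span for a given source string. Lines and columns are 1-based.
function rootPosition(source) {
  const lines = String(source).split('\n');
  const lastLine = lines[lines.length - 1];

  return {
    start: { line: 1, column: 1 },
    // For '' this is line 1, column 1, matching the expected output above.
    end: { line: lines.length, column: lastLine.length + 1 }
  };
}

console.log(JSON.stringify(rootPosition('')));
```

The point of the sketch is only that `end` can always be computed, even when there is nothing to parse.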

1.0.0?

mdast is currently on a semver "unstable" 0.x.x version. Is that intentional?

It seems to have tests with full coverage, no open issues, no recent API breaks (though I haven't checked very carefully), and a little bunch of dependent packages.

What's blocking a stable release?

Should have a CLI

This would be extra powerful with plugins (like duo)

Could we add more code samples in manpages?

https://github.com/wooorm/mdast/blob/master/doc/mdastplugin.3.md

On this page, I am trying very hard to understand how to create and manipulate plugins. While I know the concepts have been organized in a very able way, you are introducing a lot of new terms such as attacher, transformer, and completer.

There is only one code example on creating a plugin, and it only implements the transformer. Code samples are to coders as pictures are to anyone else in a manual. They put things in perspective. I am having a very hard time figuring out how to implement the definitions above, and if more snippets were given, I am sure this could be simplified greatly.

Example:

To access all files once they are transformed, create a completer. A completer is invoked before files are compiled, written, and logged, but after reading, parsing, and transforming. Thus, a completer can still change files or add messages.

Where does one create a signature? A simple example would be worth another 5 paragraphs of description.

Do not output blank lines between definitions

Source and reprocessed versions at https://gist.github.com/anonymous/3bf6b6095f73702c187d

Should be able to expose style information

Probably not by default;
Information like which emphasis markers are used, asterisks or underscores;
This would highly benefit the creation of something mdlint-like.

Add test for multiple footnotes to the same definition

Seems to work currently, when typing the following in the demo.

Here’s a footnote[^1] and such[^1].

[^1]: This one’s also a footnote.

…but it seems to fail when inlining footnotes.

Blank cell in table can not be parsed

| | a|c|
|--|:----:|:---|
|a|b|c|
|a|b|c|

1:3: Incorrectly eaten value: please report this warning on http://git.io/vUYWz

Is there an "encode" method to insert escaped text into the AST?

I sometimes have to insert text-as-is into the AST, e.g. I need to insert (Taylor, Stouffer, & Meehl, 2011) in a way that this exact text turns up in the markdown rendered to HTML. For this I need to insert something like $Taylor, Stouffer, & Meehl, 2011$. Can mdast do this for me? Or should I use something like markdown-escape?

Should decode HTML entities

Such as &amp; in AT&amp;T.

Print warnings & errors to stderr?

One of the examples in mdast --help is not working correctly:

$ cat readme.md
- 1
- 2
$ cat readme.md | mdast -s 'setext: true, bullet: "*"' > readme-new.md
$ cat readme-new.md
*   1
*   2

<stdin>: no issues found

Add `--file-path` cli flag for stdin

Just for some nice logging, I can imagine it to be useful by third party cli-engine users (projects which require mdast/cli);

Make it easier for plugins to add tokenizers to the parser

Looking here and here, it seems like I need to have intimate knowledge of how the parser works in order to define regular expressions to tokenize. The use case is detecting and linking URLs (auto-linking) and @mentions. Some of the URLs I'd like to turn into special node types, such as "twitter", which another plugin could render as HTML for an embedded tweet.

Ideally I could write a plugin that only has to specify a regular expression, a function which returns the node, and some rules about scope (for example, I wouldn't want to create a link for a URL that is already inside a link).
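
One way to picture the API asked for here is a helper that takes a regular expression, a node factory, and a list of node types to stay out of. The sketch below works on a plain AST object and is not an mdast API; `tokenizeText`, the `mention` node type, and the `skip` parameter are all hypothetical names.

```javascript
// Hypothetical helper (not an mdast API): split text nodes on `pattern`
// and replace each match with the node returned by `createNode`.
// Text inside node types listed in `skip` is left untouched.
function tokenizeText(node, pattern, createNode, skip = ['link']) {
  if (!node.children || skip.includes(node.type)) return node;

  // Always search globally, keeping any other flags the caller passed.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const children = [];

  for (const child of node.children) {
    if (child.type !== 'text') {
      children.push(tokenizeText(child, pattern, createNode, skip));
      continue;
    }
    // A fresh regex per text node so `lastIndex` never leaks between nodes.
    const re = new RegExp(pattern.source, flags);
    let last = 0;
    let match;
    while ((match = re.exec(child.value)) !== null) {
      if (match[0] === '') { re.lastIndex++; continue; }
      if (match.index > last) {
        children.push({ type: 'text', value: child.value.slice(last, match.index) });
      }
      children.push(createNode(match));
      last = match.index + match[0].length;
    }
    if (last < child.value.length) {
      children.push({ type: 'text', value: child.value.slice(last) });
    }
  }
  node.children = children;
  return node;
}

const tree = {
  type: 'paragraph',
  children: [
    { type: 'text', value: 'ping @alice and ' },
    { type: 'link', href: 'http://example.com', children: [{ type: 'text', value: '@bob' }] }
  ]
};

tokenizeText(tree, /@(\w+)/, (m) => ({ type: 'mention', name: m[1] }));
console.log(JSON.stringify(tree.children.map((c) => c.type)));
```

The mention inside the existing link is left alone because `link` is in the skip list, which covers the scope rule described above.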

Positions of fenced vs. unfenced code

Hi. I'm in the middle of switching from marked to mdast for parsing in my mockdown library. I've run into a slight snag, however, which is that mdast gives the start position of a fenced code block as the line where the backquotes are, but gives the start position of an indented code block as the line where the actual code starts.

When I was using marked, this wasn't a problem because I could detect the absence of a lang property to know that a code block was indented rather than fenced, and the presence of the attribute (even if null) to know when I need to offset the code's line position by 1. But mdast creates the property with a null value on indented blocks as well as on fenced ones, so there is no way for me to know whether to offset the line number.

Well, technically, there is: I can count the number of lines in the code node's value, and compare this to the number of lines in the node's position range, and if it's 2 less, I know it's a fenced code block and can offset the start position of the code accordingly.

This seems a bit fragile, though, so I was wondering if there can be some official way to do this. That is, to either be able to tell the two kinds of code blocks apart (e.g. via a fenced property), or to have the position of a code block be registered as the position where the code starts, rather than the position where the code's block wrapper starts.

Heck, just allowing an empty string for lang when it's a fenced block without a language would work for me. The main point is just to have an officially supported way to be able to know what line number the actual code of a code node begins on, whether the block is indented or fenced.

Thanks!
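
The line-counting workaround described above can be sketched as follows. It assumes a `code` node with a string `value` and a `position` with 1-based `start` and `end` lines; `codeStartLine` is a hypothetical helper, not something mdast provides.

```javascript
// Sketch of the heuristic described above, assuming a `code` node with a
// string `value` and a `position` holding 1-based start/end lines.
function codeStartLine(node) {
  const valueLines = node.value.split('\n').length;
  const spanLines = node.position.end.line - node.position.start.line + 1;
  // A fenced block spans two more lines (opening and closing fence) than its value.
  const isFenced = spanLines - valueLines === 2;
  return isFenced ? node.position.start.line + 1 : node.position.start.line;
}

const fencedNode = { value: 'a\nb', position: { start: { line: 3 }, end: { line: 6 } } };
const indentedNode = { value: 'a\nb', position: { start: { line: 3 }, end: { line: 4 } } };

console.log(codeStartLine(fencedNode), codeStartLine(indentedNode));
```

This also shows why the approach is fragile: an empty fenced block or a fence left unclosed at the end of the file does not differ by exactly two lines, so the check would misclassify it.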

Stringify: Preferred link-style

Inline or reference style.

Paragraph `mdast.stringify` creates line-breaks on return

When invoking mdast.stringify on a paragraph node and all of its child nodes, it renders the original paragraph with line breaks. Example:

This is a markdown pargraph with a [link](http://this-page-intentionally-left-blank.org) to something silly.

On stringifying this, one gets:

This is a markdown pargraph with a 
[link](http://this-page-intentionally-left-blank.org)
 to something silly.

Inverse order of attachers when passed in array

Looking at index.js, it seems that

mdast.use([plugin1, plugin2, plugin3])

is equivalent to

mdast.use(plugin3).use(plugin2).use(plugin1)

which is counter-intuitive if you ask me.

Why is it so?

Add support for CLI plugins

I can imagine other tools would want to:

Add extensions;
Add settings.

Link parser lowercases identifiers

When I parse [][@TayEA11], the resulting AST is

{
  "type": "root",
  "children": [
    {
      "type": "paragraph",
      "children": [
        {
          "type": "linkReference",
          "identifier": "@tayea11",
          "referenceType": "full",
          "children": [],
          "position": {
            "start": {
              "line": 1,
              "column": 1
            },
            "end": {
              "line": 1,
              "column": 13
            },
            "indent": []
          }
        }
      ],
      "position": {
        "start": {
          "line": 1,
          "column": 1
        },
        "end": {
          "line": 1,
          "column": 13
        },
        "indent": []
      }
    }
  ],
  "position": {
    "start": {
      "line": 1,
      "column": 1
    },
    "end": {
      "line": 1,
      "column": 13
    }
  }
}

Is there a setting that keeps the casing of identifiers?
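
One possible shape for this, sketched under assumptions: keep the lowercased `identifier` for matching definitions and add the raw text next to it. The `label` field name and the `createReference` helper are hypothetical and not something the parser is known to emit here.

```javascript
// Sketch: keep the raw label alongside a normalized identifier, so that
// matching stays case-insensitive while the original casing is preserved.
// The `label` field is an assumption made for this example.
function createReference(label) {
  return {
    type: 'linkReference',
    identifier: label.toLowerCase(),
    label: label,
    referenceType: 'full',
    children: []
  };
}

const reference = createReference('@TayEA11');
console.log(reference.identifier, reference.label);
```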

Transformer should not rely on mutated object

Take the following abbreviated sample of an embedded plugin:

// This will only return the first element in the .md
const processor = mdast().use(function (mdst, opt) {
  function transformer(ast, file) {
    ast.children = ast.children.slice(0, 1);
  }
  return transformer;
});
return processor.process(data);

In this example, the transformer method is expected to mutate the incoming parameters, ast and file. This had me confused for quite a while, as it is commonly considered a best practice to keep parameters immutable. Because I expected transformer to return the transformed objects and did not see that in any of your plugins, I was thrown a bit. The transformer's return value is not actually used for anything.

A better approach would be something like this:

// This will only return the first element in the .md
const processor = mdast().use(function (mdst, opt) {
  function transformer(ast, file) {
    var mutatedAst = ast.children.slice(0, 1);
    return mutatedAst;
  }
  return transformer;
});
return processor.process(data);

While I understand two parameters are in play, they should probably be returned grouped together as an object. The point is that one should not expect the user to mutate incoming parameters without even returning a result; returning a result is a basic principle of functional programming.

I know correcting this would probably break other plugins: perhaps you could schedule it in to the next major release?
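
A backwards-compatible middle ground could accept both styles. The sketch below is a hypothetical runner, not mdast's internals: it uses the transformer's return value when there is one and falls back to the (possibly mutated) input tree otherwise. `runTransformer` and `firstChildOnly` are made-up names.

```javascript
// Hypothetical runner (not mdast's internals): call a transformer and use
// its return value as the new tree, falling back to the original tree when
// the transformer returns nothing (the mutate-in-place style).
function runTransformer(transformer, ast, file) {
  const result = transformer(ast, file);
  return result === undefined ? ast : result;
}

// Non-mutating transformer: builds a new root with only the first child.
function firstChildOnly(ast) {
  return Object.assign({}, ast, { children: ast.children.slice(0, 1) });
}

const input = { type: 'root', children: [{ type: 'heading' }, { type: 'paragraph' }] };
const output = runTransformer(firstChildOnly, input, {});

console.log(output.children.length, input.children.length);
```

With this shape, existing plugins that mutate and return nothing would keep working, while new plugins could leave their input untouched.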

Refactor breaks in CommonMark

They’re currently added as an escape node ({type: 'escape', value: '\n'}), but should be added as {type: 'break'}.

This should be accompanied by a stringify option to use either CommonMark style or trailing-space style.
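
A minimal sketch of what such an option could do when compiling a `break` node. The option name `commonmark` and the function `stringifyBreak` are assumptions for illustration only.

```javascript
// Hypothetical stringifier for a `break` node. The option name
// `commonmark` is an assumption made for this sketch.
function stringifyBreak(options) {
  // Backslash style (what the issue calls CommonMark style): `\` + newline.
  // Trailing-space style: two spaces + newline.
  return options && options.commonmark ? '\\\n' : '  \n';
}

console.log(JSON.stringify(stringifyBreak({ commonmark: true })));
console.log(JSON.stringify(stringifyBreak({})));
```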

Cannot distinguish `|---|` and `|:---|`

mdast parses an unaligned table column (|---|) as left-aligned, the same as |:---|. This makes it impossible to emulate GitHub's Markdown renderer, which leaves the text-align style of an unaligned column unspecified, so that its header cell renders centered and its body cells render left-aligned:

|un-aligned(center)|center|left|
|---|:---:|:---|
|Lorem ipsum dolor sit amet|Lorem ipsum dolor sit amet|Lorem ipsum dolor sit amet|
|un-aligned(left)|center|left|

↓

un-aligned(center)	center	left
Lorem ipsum dolor sit amet	Lorem ipsum dolor sit amet	Lorem ipsum dolor sit amet
un-aligned(left)	center	left
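
The distinction asked for could be kept by returning `null` for cells without a colon. This is a sketch of that idea, not mdast's table tokenizer; `parseAlignment` is a hypothetical helper.

```javascript
// Sketch: map one cell of a table's delimiter row to an alignment,
// returning null when no colon is present so that `|---|` and `|:---|`
// stay distinguishable.
function parseAlignment(cell) {
  const value = cell.trim();
  const left = value.startsWith(':');
  const right = value.endsWith(':');
  if (left && right) return 'center';
  if (left) return 'left';
  if (right) return 'right';
  return null;
}

const row = '|---|:---:|:---|';
const align = row.split('|').slice(1, -1).map(parseAlignment);
console.log(JSON.stringify(align));
```

A renderer can then omit the alignment style entirely for `null` columns and let the browser defaults apply.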

uglify breaks mdast

I've just created a testcase-repo to replicate this one because it's very weird:

https://github.com/tmcw/mdast-uglify-bug

The gist is that UglifyJS causes mdast to fail on processing input that it would otherwise be able to process.

Should accept empty fenced code blocks

With default options, the following…

Before

```one
```

And

```two
```

Yields:

Before

````one
```

And

```two
````

Should expose footnote definitions as a node.

An object instead of an array:

   "footnotes": {
-    "1": [
-      {
-        "type": "paragraph",
-        "children": [
-          {
-            "type": "text",
-            "value": "A footnote."
-          }
-        ]
-      }
-    ]
+    "1": {
+      "type": "footnoteDefinition",
+      "id": "1",
+      "children": [
+        {
+          "type": "paragraph",
+          "children": [
+            {
+              "type": "text",
+              "value": "A footnote"
+            }
+          ]
+        }
+      ]
+    }
   }

Add an option to output something when no messages are found

Spin off from #57

Avoid using peerDependencies

I'm trying to force mdast-react to use 0.26.2 or newer because of the recently-fixed parsing bugs. Doing so results in a

~/src/mdast-react〉npm install
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install"
npm ERR! node v0.12.6
npm ERR! npm  v2.12.1
npm ERR! code EPEERINVALID

npm ERR! peerinvalid The package mdast does not satisfy its siblings' peerDependencies requirements!
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.25.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.24.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0

npm ERR! Please include the following file with any support request:
npm ERR!     /Users/tmcw/src/mdast-react/npm-debug.log

Combined with npm moving away from peerDependencies, it would be awesome to use normal ol' dependencies rather than peerDependencies to string mdast packages together.

Fix CLI-settings

Currently, it’s impossible to pass nested objects or arrays because the parsing system is way too simple. This should be changed to accepting just JSON.

Something like mdast . -s 'foo: {bar: "baz"}'?
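
A rough sketch of how a relaxed settings string like the one above could be handed to `JSON.parse`: wrap it in braces and quote the bare keys first. `parseSettings` is a hypothetical helper, and the key-quoting regex is a simplification.

```javascript
// Sketch: turn a relaxed settings string into an object by wrapping it in
// braces, quoting bare keys, and handing the result to JSON.parse.
// Caveat: a string value that itself contains `, key:` would be mangled.
function parseSettings(input) {
  const wrapped = '{' + input + '}';
  const quoted = wrapped.replace(/([{,]\s*)([A-Za-z_$][\w$-]*)\s*:/g, '$1"$2":');
  return JSON.parse(quoted);
}

console.log(JSON.stringify(parseSettings('foo: {bar: "baz"}')));
console.log(JSON.stringify(parseSettings('setext: true, bullet: "*"')));
```

Invalid input makes `JSON.parse` throw, which a CLI could surface as a usage error.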

Bullet parser does not follow common mark

CommonMark Spec

To begin with, under the gfm rules, GitHub itself does not follow the CommonMark spec, so the default case is fine to use as is.

But with commonmark: true, I think the parser should follow the spec. What do you think?

Parse error in bullet with space before newline

I encountered this.

> mdast.parse('- \n')

TypeError: Cannot read property 'length' of null
  at Parser.tokenizeList (/Users/mizchi/sandbox/mdast/lib/parse.js:299:25)
  at Parser.tokenizeBlock (/Users/mizchi/sandbox/mdast/lib/parse.js:1572:28)
  at Parser.parse (/Users/mizchi/sandbox/mdast/lib/parse.js:1225:14)
  at Object.parse (/Users/mizchi/sandbox/mdast/lib/parse.js:1733:50)
  at repl:1:8
  at REPLServer.replDefaults.eval (/Users/mizchi/.nodebrew/node/v0.10.33/lib/node_modules/coffee-script/lib/coffee-script/repl.js:33:42)
  at repl.js:239:12
  at Interface.<anonymous> (/Users/mizchi/.nodebrew/node/v0.10.33/lib/node_modules/coffee-script/lib/coffee-script/repl.js:66:9)
  at Interface.emit (events.js:117:20)
  at Interface._onLine (readline.js:202:10)
  at Interface._line (readline.js:531:8)
  at Interface._ttyWrite (readline.js:760:14)
  at ReadStream.onkeypress (readline.js:99:10)
  at ReadStream.emit (events.js:117:20)
  at emitKey (readline.js:1095:12)
  at ReadStream.onData (readline.js:840:14)
  at ReadStream.emit (events.js:95:17)
  at ReadStream.<anonymous> (_stream_readable.js:764:14)
  at ReadStream.emit (events.js:92:17)
  at emitReadable_ (_stream_readable.js:426:10)
  at emitReadable (_stream_readable.js:422:5)
  at readableAddChunk (_stream_readable.js:165:9)
  at ReadStream.Readable.push (_stream_readable.js:127:10)
  at TTY.onread (net.js:528:21)

Add website

mdast should have a cool website!

Maybe http://mdast.md? http://mdast.js.org (free)?, or just at GitHub (free)?

Also: should be good looking and useful.

Can't parse html tag correctly

can parse <div>div</div> and <pre>pre</pre>
<a>foo</a> and <span>foo</span>

It looks like inline tags can't be parsed.

coffee> mdast.parse('<a>foo</a>').children[0]
{ type: 'paragraph',
  children: 
   [ { type: 'html',
       value: '<a>',
       position: [Object] },
     { type: 'text',
       value: 'foo',
       position: [Object] },
     { type: 'html',
       value: '</a>',
       position: [Object] } ],
  position: 
   { start: { line: 1, column: 1 },
     end: { line: 1, column: 11 } } }

Add support for tab characters

To enable CommonMark’s tab expansion by dependants.

Create mdast-html

One of the major things to do is create a plug-in which compiles an mdast AST into HTML.

This plug-in would be a great way to test how applicable the AST is for heavy duty transpiling into another language.

Nested tasklist

Here is a trivial difference.

- [x] aaa
  - [ ] bbb
  - [ ] ccc

aaa
- bbb
- ccc

It looks like mdast doesn't handle nested task lists.

Add list-item-indent stringification option

...which defaults to "tab-size", for greatest support, but also accepts "mixed" and "1".

Supersedes GH-30.

Ways to use global mdast plugins with CLI?

What is the preferred way of using globally installed plugins with CLI?

$ echo "# hello" | mdast -u mdast-html
# hello

<stdin>
        1:1  error    Error: Cannot find module 'mdast-html'

It just worked before. I found two ways of working around it.

Including $(npm root -g) in $NODE_PATH:

$ echo "# hello" | env NODE_PATH="$(npm root -g):$NODE_PATH" mdast -u mdast-html

Specifying full path to a plugin:

$ echo "# hello" | mdast -u "$(npm root -g)/mdast-html"

Both ways are somewhat clumsy. Is there a simpler way of doing that or some relevant configuration option?

Stringify: Preferred fence style

Tildes (~) or ticks (```).

Add missing/invalid footnotes/links to ast

This would enable mdlint-like tools to raise an issue when a definition is forgotten.

Why 3 spaces after list bullet?

* list item

Using stringify adds 2 more spaces:

-    list item

I just want 1 space after list bullet.

But I can't find any options from https://github.com/wooorm/mdast/blob/master/doc/options.md#list-item-bullets

and I found this in source code:

$ grep -r "'   '" node_modules/mdast
node_modules/mdast/node_modules/concat-stream/node_modules/readable-stream/node_modules/core-util-is/float.patch:-            return '   ' + line;

Fix demo

The current demo is horrible. It’s slow, not that useful, and more.

It should be good looking;
It should use a faster editor;
it should be user-friendly.

Add `style` properties on nodes

Currently, only global stringification settings, such as bullet, are supported. I’d like to extend stringification style to per-node settings. Thus, a list item could have a style.bullet = '*' property.

Something like:

heading nodes have an enum headingStyle property set to "atx", "atx-closed", or "setext";
table nodes have a boolean looseTable property;
table nodes have a boolean spacedTable property;
code nodes have a nullable enum fenceMarker property set to "`" or "~";
code nodes have a boolean fences property;
listItem nodes have an enum listItemBullet property set to *, -, +, ., or );
listItem nodes have a nullable listItemIndex property set to an integer;
horizontalRule nodes have an enum ruleMarker property set to *, -, or _;
horizontalRule nodes have a boolean ruleRepetition property;
horizontalRule nodes have a boolean ruleSpaces property;
strong and emphasis nodes have an enum emphasisMarker property set to _ or *.

These should be overwritten when a setting is given to mdast (this allows mdast to fix code-style), but overwrite the default values noted in mdast.process()

Supersedes GH-30.
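
The proposed precedence (explicit setting, then node style, then default) can be sketched as a small lookup. The `style` object on nodes and the `resolveStyle` helper are hypothetical and only illustrate the ordering.

```javascript
// Sketch of the proposed precedence, assuming a hypothetical `style`
// object on nodes: explicit setting > node style > default.
function resolveStyle(key, settings, node, defaults) {
  if (settings && settings[key] !== undefined) return settings[key];
  if (node.style && node.style[key] !== undefined) return node.style[key];
  return defaults[key];
}

const defaults = { bullet: '-' };
const styled = { type: 'listItem', style: { bullet: '*' } };
const plain = { type: 'listItem' };

console.log(
  resolveStyle('bullet', {}, styled, defaults),
  resolveStyle('bullet', { bullet: '+' }, styled, defaults),
  resolveStyle('bullet', {}, plain, defaults)
);
```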

Want a "don't merge HTML nodes" option

Sometimes merged HTML nodes get in my way when transforming the AST into virtual DOM.

We can't just split a seemingly merged HTML node by /\n\n/ because doing so breaks <div>text\n\n</div>[1] into <div>text and </div>. Though I'm fine with nodes whose value is a simple tag (<div>, </div>) or a balanced fragment (<div>text</div>), something like <div>text is not very acceptable.

[1] it can be obtained by parsing this Markdown document:

<div>text

</div>

Watching files

Hi, thanks for your work. I'm trying to use mdast-lint, and am thinking it'd be wonderful to have something like a --watch option built into mdast.

Github-flavored markdown html incompatibility

FYI, mdast does not parse HTML the way Github itself does. More specifically, it doesn't parse invalid HTML the same way Github does, or at least invalid HTML comments. If you have an HTML comment containing --, Github ignores this invalidity and still treats the overall comment as HTML and doesn't turn it into a paragraph.

I would say this is a bug rather than a feature, since no user-facing tool I've tried (e.g. Marked 2, MarkdownPad, MacDown, etc.) ever insists on HTML being valid HTML and reverting it to a paragraph otherwise. Likewise, of the parsers I've tried, mdast seems to be unique in this respect.

Store all links in central place, not just referenced links

This would make sure just one reference is created when stringifying with referenceLinks: true:

[a link][link] and [another link](http://example.com)

[link]: http://example.com

Yields:

[a link][1] and [another link][2]

[1]: http://example.com
[2]: http://example.com
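
The deduplication this asks for amounts to keying references by URL. A minimal sketch, assuming numeric identifiers are assigned in order of first appearance; `collectReferences` is a hypothetical helper, not part of mdast.

```javascript
// Sketch: assign one numeric reference identifier per distinct URL,
// in order of first appearance.
function collectReferences(urls) {
  const ids = new Map();
  for (const url of urls) {
    if (!ids.has(url)) ids.set(url, String(ids.size + 1));
  }
  return ids;
}

const ids = collectReferences(['http://example.com', 'http://example.com']);
console.log(ids.size, ids.get('http://example.com'));
```

With a shared map like this, both links in the example would point at `[1]` and only one definition would be written.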

range/location support?

Hi!

I notice that CommonMark's AST now includes location info (though it is unstable).

{ t: 'Document',
  start_line: 1,
  start_column: 1,
  end_line: 20,
  children: []
}

An example is azu/commonmark-ast-sandbox.

Do you have any plans to support range or location info on the AST (like Esprima)?

Extending grammar

How would one extend the parser's grammar? I understand that I can create a plugin and create a parser that inherits from mdast's parser, but how to write the tokenizer and whatever else is needed is unclear.

Do you mind helping me out with one example?

Let's say I have some custom markdown that looks like this:

+++small

SOME TEXT CONTENT

+++

How would one add this grammar to the parser such that content enclosed in +++ is marked as children? For example:

{
  type: MY_CUSTOM_TYPE, // captured by enclosing +++
  size: 'small',
  children: [{
    type: 'text',
    // ...
  }]
}

I'm open to ideas if you have a better idea for how the ast should look. You're certainly more expert than I am. :)

Thanks for your time.
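
Independent of how it would be registered with the parser, the recognition step for such a block might look like the sketch below. It is a standalone function, not wired into mdast; the `customBlock` type name, the `length` field, and `tokenizeCustomBlock` are all made up for illustration.

```javascript
// Hypothetical standalone tokenizer for the `+++size ... +++` block.
// It is not wired into mdast's parser; the node type name is made up.
function tokenizeCustomBlock(source) {
  const match = /^\+\+\+(\w+)\n([\s\S]*?)\n\+\+\+(?:\n|$)/.exec(source);
  if (!match) return null;

  return {
    type: 'customBlock',
    size: match[1],
    // A real tokenizer would parse the content into block nodes instead.
    children: [{ type: 'text', value: match[2].trim() }],
    // Number of characters consumed, so a caller can continue after the block.
    length: match[0].length
  };
}

const source = '+++small\n\nSOME TEXT CONTENT\n\n+++\n';
const block = tokenizeCustomBlock(source);
console.log(block.size, JSON.stringify(block.children[0].value));
```

Input that does not start with an opening `+++name` line returns `null`, so a caller can fall through to other block rules.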

Lifecycle events for plugins

Hey! Great work on mdast, it's really rad. I'm using it to set up a build system for the Node.js documentation WG. As part of that effort, I started building count-docula, which currently consumes mdast and presents its own CLI. If possible, I'd love to make count-docula just another plugin that mdast consumes.

What count-docula is currently doing:

Given a directory, it collects every markdown file within that directory.
- This duplicates work from mdast's CLI.
For each markdown file, the plugin looks for three directives (import, export, and anchor.)
- Anchors are user-defined ids that are assigned to the closest parent block element. They're there so that heading text can be changed independently of links, and so that links can be tracked and verified across documents.
- Once all anchors are found, then all exports are determined. These are links that will be made available when "importing" the current document.
- Finally, the import directives are hit.
  - Importantly, import directives are able to bring in documents from outside the original working set.
The plugin artificially blocks process from completing (using a function passed as an option) until all documents have been visited, and their anchors, exports, and imports declared.
- Warnings are added at this stage for unknown|duplicate reference link definitions, bad imports, and bad exports.
Once all documents have been visited & resolved, the plugin continues to the "render" or "test" task.
- The test task augments lint with a test checking to see that no documents in the original working set are "orphaned" — only one document in the original working set may have no incoming links.
  - Otherwise, this step replicates much of mdast's CLI machinery.
- The build task accepts a template for rendering the document into, but otherwise works the same as mdast's CLI machinery.

In order to turn count-docula into a plugin:

mdast's plugin API would need a lifecycle event for "the CLI has collected all of the docs in this dir." That event may be asynchronous, so mdast should delegate to the plugin before continuing (via a callback or other method.)
The directory set API may have to be capable of adding new source md document paths and making the resulting ASTs available to the plugin.

Something like:

module.exports = function attacher(md, opts) {
  md.onDocsCollected((workingSet, next) => {
    // workingSet is an "array-ish" set of all of the `File` objects that
    // mdast's cli found.
    workingSet.parseEach(({filename, ast}, next) => {
      // search for documents to import from the AST
      workingSet.add('some/new/path')
      next()
    }, function(err) {
      workingSet.forEach(({filename, ast}) => {
        // resolve all of the links, then let `mdast` know that
        // the workingSet's files are ready to be rendered / tested / etc.
        // if the workingSet's files were parsed, use those asts
        // instead of parsing again. Otherwise parse them.
        next()
      })
    })
  })
}

Of course, there's zero pressure to do this — or if you'd like I would be happy to take a stab at implementing it. A workingSet API seems like a natural place to add meta information for other plugins, as well — for example, providing a template/framing API for mdast-html.

Thanks again, and great work on mdast!

Should support plugins

Probably like ware, retext, or duo.