remarkjs / remark
markdown processor powered by plugins, part of the @unifiedjs collective
Home Page: https://remark.js.org
License: MIT License
var mdast = require('mdast');
var emptyString = "";
var ast = mdast.parse(emptyString);
console.log(JSON.stringify(ast));
/*
{
"type": "root",
"children": [],
"position": {
"start": {
"line": 1,
"column": 1
}
}
}
*/
// position.end is undefined
Example : http://requirebin.com/?gist=ad3c34ef897338867009
Expected:
{
"type": "root",
"children": [],
"position": {
"start": {
"line": 1,
"column": 1
},
"end": {
"line": 1,
"column": 1
}
}
}
Actual:
position.end is undefined
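The expected shape can be sketched independently of mdast. The helper below is hypothetical (`rootPosition` is not an mdast function). It assumes 1-based lines and columns, as in the output above, and always emits an `end` point, even for empty input.

```javascript
// Hypothetical helper, not part of mdast: compute the position of a root
// node from the raw source, using 1-based lines and columns.
function rootPosition(source) {
  const lines = String(source).split('\n');
  const lastLine = lines[lines.length - 1];
  return {
    start: {line: 1, column: 1},
    // For an empty string this is {line: 1, column: 1}, never undefined.
    end: {line: lines.length, column: lastLine.length + 1}
  };
}

console.log(JSON.stringify(rootPosition('')));
```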
mdast is currently on a semver "unstable" 0.x.x version. Is that intentional?
It seems to have tests with full coverage, no open issues, no recent API breaks (though I haven't checked very carefully), and a little bunch of dependent packages.
What's blocking a stable release?
This would be extra powerful with plugins (like duo)
https://github.com/wooorm/mdast/blob/master/doc/mdastplugin.3.md
On this page, I am trying very hard to understand how to create and manipulate plugins. While I know the concepts have been organized in a very able way, you are introducing a lot of new terms, such as `attacher`, `transformer`, and `completer`.
There is only one code example on creating a plugin, and it only implements the `transformer`. Code samples are to coders as pictures are to anyone else in a manual: they put things in perspective. I am having a very hard time figuring out how to implement the definitions above; if more snippets were given, I am sure this could be simplified greatly.
Example:
To access all files once they are transformed, create a completer. A completer is invoked before files are compiled, written, and logged, but after reading, parsing, and transforming. Thus, a completer can still change files or add messages.
Where does one create a signature? A simple example would be worth another 5 paragraphs of description.
Source and reprocessed versions at https://gist.github.com/anonymous/3bf6b6095f73702c187d
Seems to work currently, when typing the following in the demo.
Here’s a footnote[^1] and such[^1].
[^1]: This one’s also a footnote.
…but it seems to fail when inlining footnotes.
| | a|c|
|--|:----:|:---|
|a|b|c|
|a|b|c|
1:3: Incorrectly eaten value: please report this warning on http://git.io/vUYWz
I sometimes have to insert text as-is into the AST, e.g. I need to insert `(Taylor, Stouffer, & Meehl, 2011)` in such a way that this exact text turns up when the markdown is rendered to HTML. For this I need to insert something like `\(Taylor, Stouffer, & Meehl, 2011\)`. Can mdast do this for me? Or should I use something like markdown-escape?
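As a rough illustration of the escaping asked about here, the sketch below backslash-escapes every character that Markdown might treat as syntax. `escapeMarkdown` is a hypothetical helper, not an mdast API, and the character set is an assumption; it escapes more than strictly necessary, which is safe but noisy.

```javascript
// Hypothetical helper, not an mdast API: backslash-escape characters that
// Markdown may treat as syntax so the text renders literally.
function escapeMarkdown(text) {
  return String(text).replace(/[\\`*_{}\[\]()#+\-.!]/g, '\\$&');
}

console.log(escapeMarkdown('(Taylor, Stouffer, & Meehl, 2011)'));
```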
Such as `&` in `AT&T`.
One of the examples in `mdast --help` is not working correctly:
$ cat readme.md
- 1
- 2
$ cat readme.md | mdast -s 'setext: true, bullet: "*"' > readme-new.md
$ cat readme-new.md
* 1
* 2
<stdin>: no issues found
Just for some nice logging; I can imagine it being useful for third-party cli-engine users (projects which require `mdast/cli`).
Looking here and here, it seems like I need to have intimate knowledge of how the parser works in order to define regular expressions to tokenize. The use case is detecting and linking URLs (auto-linking) and @mentions. Some of the URLs I'd like to turn into special node types, such as "twitter", which another plugin could render as HTML for an embedded tweet.
Ideally I could write a plugin that only has to specify a regular expression, a function which returns the node, and some rules about scope (for example, I wouldn't want to create a link for a URL that is already inside a link).
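The shape of such a plugin can be sketched without touching the parser, as a pass over an already parsed tree. Everything here is an assumption for illustration: `linkify`, the `mention` node type, and the `href` field are hypothetical, and this is not the mdast plugin API. It takes a regular expression, a function that builds the node, and applies one scope rule (skip text that is already inside a link).

```javascript
// Hypothetical sketch, not the mdast plugin API: split text nodes on a
// pattern and replace each match with a custom node, skipping text that is
// already inside a link.
function linkify(node, pattern, toNode, insideLink) {
  if (node.type === 'text' && !insideLink) {
    const out = [];
    let last = 0;
    for (const match of node.value.matchAll(pattern)) {
      if (match.index > last) {
        out.push({type: 'text', value: node.value.slice(last, match.index)});
      }
      out.push(toNode(match));
      last = match.index + match[0].length;
    }
    if (last < node.value.length) {
      out.push({type: 'text', value: node.value.slice(last)});
    }
    return out;
  }
  if (node.children) {
    const nested = insideLink || node.type === 'link';
    const children = node.children.flatMap(function (child) {
      return linkify(child, pattern, toNode, nested);
    });
    return [Object.assign({}, node, {children: children})];
  }
  return [node];
}

const tree = {
  type: 'paragraph',
  children: [
    {type: 'text', value: 'Ping @alice and '},
    {type: 'link', href: 'http://example.com', children: [{type: 'text', value: '@bob'}]}
  ]
};

// The pattern must carry the global flag, because matchAll requires it.
const result = linkify(tree, /@(\w+)/g, function (match) {
  return {type: 'mention', name: match[1]};
})[0];

console.log(JSON.stringify(result, null, 2));
```

Only `@alice` becomes a `mention` node; `@bob` is left alone because it sits inside a link.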
Hi. I'm in the middle of switching from `marked` to `mdast` for parsing in my `mockdown` library. I've run into a slight snag, however: mdast gives the start position of a fenced code block as the line where the backquotes are, but gives the start position of an indented code block as the line where the actual code starts.
When I was using `marked`, this wasn't a problem because I could detect the absence of a `lang` property to know that a code block was indented rather than fenced, and the presence of the attribute (even if null) to know when I need to offset the code's line position by 1. But `mdast` creates the property with a null value on indented blocks as well as on fenced ones, so there is no way for me to know whether to offset the line number.
Well, technically, there is: I can count the number of lines in the code node's `value` and compare this to the number of lines in the node's position range, and if it's 2 less, I know it's a fenced code block and can offset the start position of the code accordingly.
This seems a bit fragile, though, so I was wondering if there could be some official way to do this. That is, to either be able to tell the two kinds of code blocks apart (e.g. via a `fenced` property), or to have the position of a code block be registered as the position where the code starts, rather than the position where the code's block wrapper starts.
Heck, just allowing an empty string for `lang` when it's a fenced block without a language would work for me. The main point is just to have an officially supported way to know what line number the actual code of a code node begins on, whether the block is indented or fenced.
Thanks!
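For reference, the line-counting workaround described above might look like the sketch below. `codeStartLine` is a hypothetical helper; it assumes a code node with a string `value` and a `position` holding `start.line` and `end.line`. It is as fragile as the description suggests: an empty fenced block, for example, spans two lines but has a one-line (empty) value, so it is misclassified.

```javascript
// Hypothetical helper: guess whether a code node came from a fenced block
// by comparing the lines its value spans with the lines its position spans.
function codeStartLine(node) {
  const valueLines = node.value.split('\n').length;
  const spannedLines = node.position.end.line - node.position.start.line + 1;
  // A fenced block spans two extra lines: the opening and closing fence.
  const fenced = spannedLines - valueLines === 2;
  return fenced ? node.position.start.line + 1 : node.position.start.line;
}

const fencedNode = {
  type: 'code',
  value: 'a\nb',
  position: {start: {line: 3}, end: {line: 6}}
};
const indentedNode = {
  type: 'code',
  value: 'a\nb',
  position: {start: {line: 3}, end: {line: 4}}
};

console.log(codeStartLine(fencedNode), codeStartLine(indentedNode));
```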
Inline- or reference-styles
When invoking `mdast.stringify` on a paragraph node and all of its child nodes, it renders the original paragraph with line breaks. Example:
This is a markdown pargraph with a [link](http://this-page-intentionally-left-blank.org) to something silly.
On stringifying this, one gets:
This is a markdown pargraph with a
[link](http://this-page-intentionally-left-blank.org)
to something silly.
Looking at index.js, it seems that
mdast.use([plugin1, plugin2, plugin3])
is equivalent to
mdast.use(plugin3).use(plugin2).use(plugin1)
which is counter-intuitive if you ask me.
Why is it so?
I can imagine other tools would want to:
When I parse `[][@TayEA11]`, the resulting AST is
{
"type": "root",
"children": [
{
"type": "paragraph",
"children": [
{
"type": "linkReference",
"identifier": "@tayea11",
"referenceType": "full",
"children": [],
"position": {
"start": {
"line": 1,
"column": 1
},
"end": {
"line": 1,
"column": 13
},
"indent": []
}
}
],
"position": {
"start": {
"line": 1,
"column": 1
},
"end": {
"line": 1,
"column": 13
},
"indent": []
}
}
],
"position": {
"start": {
"line": 1,
"column": 1
},
"end": {
"line": 1,
"column": 13
}
}
}
Is there a setting that keeps the casing of identifiers?
Take the following abbreviated sample of an embedded plugin:
// This will only return the first element in the .md
const processor = mdast().use(function (mdst, opt) {
function transformer(ast, file) {
ast.children = ast.children.slice(0, 1);
}
return transformer;
});
return processor.process(data);
In this example, the `transformer` method is expected to mutate its incoming parameters, `ast` and `file`. This had me confused for quite a while, as it is commonly considered best practice to keep parameters immutable. Because I expected `transformer` to `return` the transformed objects and did not see that in any of your plugins, I was thrown a bit. The transformer doesn't actually do anything with its returned object.
A better approach would be something like this:
// This will only return the first element in the .md
const processor = mdast().use(function (mdst, opt) {
function transformer(ast, file) {
var mutatedAst = ast.children.slice(0, 1);
return mutatedAst;
}
return transformer;
});
return processor.process(data);
While I understand two parameters are in play, they should probably be returned grouped together as an object. The point is that one should not expect the user to mutate incoming parameters and not even return a result; returning results is basic to functional programming.
I know correcting this would probably break other plugins: perhaps you could schedule it for the next major release?
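To make the proposal concrete, here is a minimal sketch of a non-mutating transformer that returns a whole new root rather than a bare array of children. `firstChildOnly` is a hypothetical name, and whether mdast would use a returned tree is an assumption of this sketch, not documented behaviour.

```javascript
// Sketch of a non-mutating transformer: it returns a new root and leaves
// the tree it was given untouched. That the caller uses the returned tree
// is an assumption here.
function firstChildOnly(ast) {
  return Object.assign({}, ast, {children: ast.children.slice(0, 1)});
}

const input = {
  type: 'root',
  children: [{type: 'heading'}, {type: 'paragraph'}]
};
const output = firstChildOnly(input);

console.log(input.children.length, output.children.length);
```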
They’re currently added as an escape node (`{type: 'escape', value: '\n'}`), but should be added as `{type: 'break'}`.
This should be accompanied by a stringify option to use either CommonMark style or trailing-space style.
mdast parses an un-aligned table column (`|---|`) as left-aligned, the same as `|:---|`. This makes it impossible to emulate GitHub's Markdown renderer, which renders the header of an un-aligned table column centered and the body left-aligned, by leaving their `text-align` style unspecified:
|un-aligned(center)|center|left|
|---|:---:|:---|
|Lorem ipsum dolor sit amet|Lorem ipsum dolor sit amet|Lorem ipsum dolor sit amet|
|un-aligned(left)|center|left|
↓
un-aligned(center) | center | left |
---|---|---|
Lorem ipsum dolor sit amet | Lorem ipsum dolor sit amet | Lorem ipsum dolor sit amet |
un-aligned(left) | center | left |
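One way to keep the distinction is to record "no alignment" separately from "left" when reading the delimiter row. The sketch below is hypothetical (`parseAlignRow` is not an mdast function) and uses `null` for an un-aligned column, which is an assumption about how the AST could represent it.

```javascript
// Hypothetical helper: read a table's alignment row and keep "no alignment"
// distinct from "left".
function parseAlignRow(row) {
  return row
    .trim()
    .replace(/^\||\|$/g, '') // drop the outer pipes, if present
    .split('|')
    .map(function (cell) {
      const value = cell.trim();
      const left = value.charAt(0) === ':';
      const right = value.charAt(value.length - 1) === ':';
      if (left && right) return 'center';
      if (left) return 'left';
      if (right) return 'right';
      return null; // un-aligned: let the renderer leave text-align unset
    });
}

console.log(parseAlignRow('|---|:---:|:---|'));
```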
I've just created a testcase-repo to replicate this one because it's very weird:
https://github.com/tmcw/mdast-uglify-bug
The gist is that `UglifyJS` causes mdast to fail on processing input that it would otherwise be able to process.
With default options, the following…
Before
```one
```
And
```two
```
Yields:
Before
````one
```
And
```two
````
An object instead of an array:
"footnotes": {
- "1": [
- {
- "type": "paragraph",
- "children": [
- {
- "type": "text",
- "value": "A footnote."
- }
- ]
- }
- ]
+ "1": {
+ "type": "footnoteDefinition",
+ "id": "1",
+ "children": [
+ {
+ "type": "paragraph",
+ "children": [
+ {
+ "type": "text",
+ "value": "A footnote"
+ }
+ ]
+ }
+ ]
+ }
}
Spin off from #57
I'm trying to force mdast-react to use 0.26.2 or newer because of the recently-fixed parsing bugs. Doing so results in a
~/src/mdast-react〉npm install
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install"
npm ERR! node v0.12.6
npm ERR! npm v2.12.1
npm ERR! code EPEERINVALID
npm ERR! peerinvalid The package mdast does not satisfy its siblings' peerDependencies requirements!
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.25.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.24.0
npm ERR! peerinvalid Peer [email protected] wants mdast@>=0.22.0
npm ERR! Please include the following file with any support request:
npm ERR! /Users/tmcw/src/mdast-react/npm-debug.log
Combined with npm moving away from peerDependencies, it would be awesome to use normal ol' dependencies rather than peerDependencies to string mdast packages together.
Currently, it’s impossible to pass nested objects or arrays because the parsing system is way too simple. This should be changed to accepting just JSON.
Something like `mdast . -s 'foo: {bar: "baz"}'`?
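A rough sketch of how such a relaxed settings string could be handed to `JSON.parse`: wrap it in braces and quote the bare keys first. `parseSettings` is hypothetical and not mdast's option parser. The key-quoting regular expression is naive, so a string value that itself contains something like `, key:` would be mangled.

```javascript
// Hypothetical sketch, not mdast's actual option parser: turn a relaxed
// settings string into an object by quoting bare keys and parsing as JSON.
function parseSettings(input) {
  const quoted = ('{' + input + '}').replace(
    /([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g,
    '$1"$2":'
  );
  return JSON.parse(quoted);
}

console.log(parseSettings('foo: {bar: "baz"}'));
console.log(parseSettings('setext: true, bullet: "*"'));
```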
To begin with, under the gfm rules GitHub does not follow the CommonMark spec. That is no problem for the default case.
But with commonmark: true, I think it should follow the spec. What do you think?
I encountered this:
> mdast.parse('- \n')
TypeError: Cannot read property 'length' of null
at Parser.tokenizeList (/Users/mizchi/sandbox/mdast/lib/parse.js:299:25)
at Parser.tokenizeBlock (/Users/mizchi/sandbox/mdast/lib/parse.js:1572:28)
at Parser.parse (/Users/mizchi/sandbox/mdast/lib/parse.js:1225:14)
at Object.parse (/Users/mizchi/sandbox/mdast/lib/parse.js:1733:50)
at repl:1:8
at REPLServer.replDefaults.eval (/Users/mizchi/.nodebrew/node/v0.10.33/lib/node_modules/coffee-script/lib/coffee-script/repl.js:33:42)
at repl.js:239:12
at Interface.<anonymous> (/Users/mizchi/.nodebrew/node/v0.10.33/lib/node_modules/coffee-script/lib/coffee-script/repl.js:66:9)
at Interface.emit (events.js:117:20)
at Interface._onLine (readline.js:202:10)
at Interface._line (readline.js:531:8)
at Interface._ttyWrite (readline.js:760:14)
at ReadStream.onkeypress (readline.js:99:10)
at ReadStream.emit (events.js:117:20)
at emitKey (readline.js:1095:12)
at ReadStream.onData (readline.js:840:14)
at ReadStream.emit (events.js:95:17)
at ReadStream.<anonymous> (_stream_readable.js:764:14)
at ReadStream.emit (events.js:92:17)
at emitReadable_ (_stream_readable.js:426:10)
at emitReadable (_stream_readable.js:422:5)
at readableAddChunk (_stream_readable.js:165:9)
at ReadStream.Readable.push (_stream_readable.js:127:10)
at TTY.onread (net.js:528:21)
mdast should have a cool website!
Maybe http://mdast.md? http://mdast.js.org (free)? Or just at GitHub (free)?
Also: should be good looking and useful.
`<div>div</div>` and `<pre>pre</pre>`
`<a>foo</a>` and `<span>foo</span>`
It looks like inline tags can't be parsed.
coffee> mdast.parse('<a>foo</a>').children[0]
{ type: 'paragraph',
children:
[ { type: 'html',
value: '<a>',
position: [Object] },
{ type: 'text',
value: 'foo',
position: [Object] },
{ type: 'html',
value: '</a>',
position: [Object] } ],
position:
{ start: { line: 1, column: 1 },
end: { line: 1, column: 11 } } }
To enable CommonMark’s tab expansion by dependants.
One of the major things to do is create a plug-in which compiles an mdast AST into HTML.
This plug-in would be a great way to test how applicable the AST is for heavy duty transpiling into another language.
Here is a trivial difference.
- [x] aaa
- [ ] bbb
- [ ] ccc
It looks like mdast doesn't handle nested task lists.
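As an illustration of what nested handling could mean, the sketch below reads checkboxes at any depth, assuming a child item is written by indenting it under its parent. `parseTaskList` and its output shape are hypothetical and unrelated to mdast's list tokenizer.

```javascript
// Hypothetical sketch: build a tree of task items from indented list lines.
function parseTaskList(source) {
  const root = {children: []};
  // The stack holds the chain of open ancestors; the root is never popped.
  const stack = [{indent: -1, node: root}];
  source.split('\n').forEach(function (line) {
    const match = /^(\s*)[-*+]\s+(?:\[([ xX])\]\s+)?(.*)$/.exec(line);
    if (!match) return;
    const indent = match[1].length;
    const item = {
      // null means the item has no checkbox at all.
      checked: match[2] === undefined ? null : match[2] !== ' ',
      text: match[3],
      children: []
    };
    while (stack[stack.length - 1].indent >= indent) stack.pop();
    stack[stack.length - 1].node.children.push(item);
    stack.push({indent: indent, node: item});
  });
  return root.children;
}

console.log(JSON.stringify(parseTaskList('- [x] aaa\n  - [ ] bbb\n- [ ] ccc'), null, 2));
```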
What is the preferred way of using globally installed plugins with CLI?
$ echo "# hello" | mdast -u mdast-html
# hello
<stdin>
1:1 error Error: Cannot find module 'mdast-html'
It just worked before. I found two ways of working around it.
1. With `$(npm root -g)` in `$NODE_PATH`:
$ echo "# hello" | env NODE_PATH="$(npm root -g):$NODE_PATH" mdast -u mdast-html
2. With the plugin's full path:
$ echo "# hello" | mdast -u "$(npm root -g)/mdast-html"
Both ways are somewhat clumsy. Is there a simpler way of doing that or some relevant configuration option?
Tildes (~) or ticks (```).
This would enable mdlint-like tools to raise an issue when a definition is forgotten.
* list item
Stringifying adds 2 spaces after the bullet:
- list item
I just want 1 space after the list bullet.
But I can't find any option for this in https://github.com/wooorm/mdast/blob/master/doc/options.md#list-item-bullets
and I found this in source code:
$ grep -r "' '" node_modules/mdast
node_modules/mdast/node_modules/concat-stream/node_modules/readable-stream/node_modules/core-util-is/float.patch:- return ' ' + line;
The current demo is horrible. It’s slow, not that useful, and more.
Currently, only global stringification settings, such as `bullet`, are supported. I’d like to extend stringification style to per-node settings. Thus, a list-item can have a `style.bullet = '*'` property.
Something like:
- `heading` nodes have an enum `headingStyle` property set to `"atx"`, `"atx-closed"`, or `"setext"`;
- `table` nodes have a boolean `looseTable` property;
- `table` nodes have a boolean `spacedTable` property;
- `code` nodes have a nullable enum `fenceMarker` property set to a backtick or `"~"`;
- `code` nodes have a boolean `fences` property;
- `listItem` nodes have an enum `listItemBullet` property set to `*`, `-`, `+`, `.`, or `)`;
- `listItem` nodes have a nullable `listItemIndex` property set to an integer;
- `horizontalRule` nodes have an enum `ruleMarker` property set to `*`, `-`, or `_`;
- `horizontalRule` nodes have a boolean `ruleRepetition` property;
- `horizontalRule` nodes have a boolean `ruleSpaces` property;
- `strong` and `emphasis` nodes have an enum `emphasisMarker` property: `_` or `*`.
These should be overwritten when a setting is given to mdast (this allows mdast to fix code-style), but overwrite the default values noted in `mdast.process()`.
Supersedes GH-30.
Sometimes merged HTML nodes get in my way when transforming the AST into a virtual DOM.
We can't just split a seemingly merged HTML node by `/\n\n/`, because doing so breaks `<div>text\n\n</div>` [1] into `<div>text` and `</div>`. Though I'm fine with nodes whose value is a simple tag (`<div>`, `</div>`) or a balanced fragment (`<div>text</div>`), something like `<div>text` is not very acceptable.
[1] it can be obtained by parsing this Markdown document:
<div>text
</div>
Hi, thanks for your work. I'm trying to use `mdast-lint`, and am thinking it'd be wonderful to have something like a `--watch` option built into `mdast`.
FYI, mdast does not parse HTML the way GitHub itself does. More specifically, it doesn't parse invalid HTML the same way GitHub does, or at least invalid HTML comments. If you have an HTML comment containing `--`, GitHub ignores this invalidity, still treats the overall comment as HTML, and doesn't turn it into a paragraph.
I would say this is a bug rather than a feature, since no user-facing tool I've tried (e.g. Marked 2, MarkdownPad, MacDown, etc.) ever insists on HTML being valid HTML and reverting it to a paragraph otherwise. Likewise, of the parsers I've tried, mdast seems to be unique in this respect.
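The lenient behaviour described here can be illustrated with a small matcher that accepts anything between `<!--` and the first `-->`. This is only a sketch of the idea. `matchComment` is hypothetical and is not how mdast or GitHub implement comment parsing.

```javascript
// Sketch of a lenient matcher: treat everything from <!-- to the first -->
// as a comment, even when the body contains "--".
function matchComment(source) {
  const match = /^<!--[\s\S]*?-->/.exec(source);
  return match ? match[0] : null;
}

console.log(matchComment('<!-- a -- b --> text'));
```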
This would make sure just one reference is created when stringifying with `referenceLinks: true`:
[a link][link] and [another link](http://example.com)
[link]: http://example.com
Yields:
[a link][1] and [another link][2]
[1]: http://example.com
[2]: http://example.com
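The deduplication being asked for amounts to numbering definitions by distinct URL rather than by link. A minimal sketch follows; `collectReferences` and the `{text, href}` link shape are assumptions for illustration and are not taken from mdast's stringifier.

```javascript
// Hypothetical sketch: give every distinct URL one reference number, so two
// links to the same URL share a definition.
function collectReferences(links) {
  const ids = new Map();
  links.forEach(function (link) {
    if (!ids.has(link.href)) ids.set(link.href, ids.size + 1);
  });
  return ids;
}

const links = [
  {text: 'a link', href: 'http://example.com'},
  {text: 'another link', href: 'http://example.com'}
];
const ids = collectReferences(links);

links.forEach(function (link) {
  console.log('[' + link.text + '][' + ids.get(link.href) + ']');
});
ids.forEach(function (id, href) {
  console.log('[' + id + ']: ' + href);
});
```

Both links end up pointing at reference 1, and only one definition is printed.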
Hi!
I noticed that CommonMark's AST now includes location info (though it is unstable).
{ t: 'Document',
start_line: 1,
start_column: 1,
end_line: 20,
children: []
}
An example is azu/commonmark-ast-sandbox.
Do you have any plans to support `range` or `location` on the AST (like Esprima)?
How would one extend the parser's grammar? I understand that I can create a plugin and create a parser that inherits from mdast's parser, but how to write the `tokenizer` and whatever else is needed is unclear.
Do you mind helping me out with one example?
Let's say I have some custom markdown that looks like this:
+++small
SOME TEXT CONTENT
+++
How would one add this grammar to the parser such that content enclosed in `+++` is marked as `children`? For example:
{
type: MY_CUSTOM_TYPE, // captured by enclosing +++
size: 'small',
children: [{
type: 'text'
....
}]
}
I'm open to ideas if you have a better idea for how the ast should look. You're certainly more expert than I am. :)
Thanks for your time.
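Setting aside how a tokenizer is registered with mdast (which this sketch does not attempt to show), the matching step for such a block could look like the following. `tokenizeCustomBlock`, the `customBlock` type name, and the `consumed` field are all hypothetical; the content is kept as a single text child rather than being parsed further.

```javascript
// Hypothetical sketch, independent of mdast's tokenizer API: recognise a
// +++name ... +++ block at the start of the input.
function tokenizeCustomBlock(source) {
  const match = /^\+\+\+(\w+)\n([\s\S]*?)\n\+\+\+(?:\n|$)/.exec(source);
  if (!match) return null;
  return {
    type: 'customBlock', // placeholder for MY_CUSTOM_TYPE
    size: match[1],
    children: [{type: 'text', value: match[2]}],
    consumed: match[0].length // characters taken from the input
  };
}

const node = tokenizeCustomBlock('+++small\nSOME TEXT CONTENT\n+++\n');
console.log(node.size, JSON.stringify(node.children));
```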
Hey! Great work on mdast, it's really rad. I'm using it to set up a build system for the Node.js documentation WG. As part of that effort, I started building count-docula, which currently consumes `mdast` and presents its own CLI. If possible, I'd love to make count-docula just another plugin that mdast consumes.
What count-docula is currently doing:
- … `import`, `export`, and `anchor`.)
- … `process` from completing (using a function passed as an option) until all documents have been visited, and their anchors, exports, and imports declared.
- `test` task augments `lint` with a test checking to see that no documents in the original working set are "orphaned": only one document in the original working set may have no incoming links.
- … `mdast`'s CLI machinery.
- `build` task accepts a template for rendering the document into, but otherwise works the same as `mdast`'s CLI machinery.
In order to turn count-docula into a plugin:
- `mdast`'s plugin API would need a lifecycle event for "the CLI has collected all of the docs in this dir." That event may be asynchronous, so `mdast` should delegate to the plugin before continuing (via a callback or other method).
- … `md` document paths and making the resulting ASTs available to the plugin.
Something like:
module.exports = function attacher(md, opts) {
md.onDocsCollected((workingSet, next) => {
// workingSet is an "array-ish" set of all of the `File` objects that
// mdast's cli found.
workingSet.parseEach(({filename, ast}, next) => {
// search for documents to import from the AST
workingSet.add('some/new/path')
next()
}, function(err) {
workingSet.forEach(({filename, ast}) => {
// resolve all of the links, then let `mdast` know that
// the workingSet's files are ready to be rendered / tested / etc.
// if the workingSet's files were parsed, use those asts
// instead of parsing again. Otherwise parse them.
next()
})
})
})
}
Of course, there's zero pressure to do this, or if you'd like I would be happy to take a stab at implementing it. A `workingSet` API seems like a natural place to add meta information for other plugins as well, for example providing a template/framing API for mdast-html.
Thanks again, and great work on mdast!
Probably ware, retext, duo, like