Blackfriday: a markdown processor for Go
License: Other
Hello,
I have some uncommitted code in my GOPATH that performs a task that I think is generally useful, and I want to discuss the most appropriate way to move it "upstream" (so I can finally start making some pull requests).
Specifically, the task is very narrow and well defined:
go get, import and start using.
Effectively, the output should be the same HTML (or equivalent HTML that produces the same visual result) as what GitHub Markdown API produces. See https://developer.github.com/v3/markdown/#render-a-markdown-document-in-raw-mode for reference.
I want to provide a Go function that is very direct and doesn't require configuration:
// Best effort at generating GitHub Flavored Markdown-like HTML output locally.
func MarkdownGfm(input []byte) []byte
func WriteMarkdownGfm(w io.Writer, input []byte)
The closest I'm able to come to solving that task with existing Go code is by using blackfriday with custom extensions and html flags for blackfriday.HtmlRenderer, see here.
However, the main missing feature is code highlighting for fenced code blocks. It is possible to rely on client-side JavaScript code to apply that in post-processing, but the GitHub Markdown API does this as part of Markdown generation and that's what I want too.
To make that possible, blackfriday would need to be modified. I see three approaches:
1. func MarkdownGfm(input []byte) []byte to blackfriday directly, and import other packages that are required for it to work.
2. func MarkdownGfm(input []byte) []byte to exist.
3. blackfriday.HtmlRenderer type and makes the required changes there.
Given the trend that I'm seeing from the discussion in #90, IMO it's best to go with option 2.
The change to blackfriday would be an addition of an exported interface and one new field added to the HtmlRendererParameters struct. Best visualized with a diff:
+type BlockCodeHighlighter interface {
+	// Highlights text using lang syntax and returns highlighted HTML output.
+	BlockCodeHighlight(text []byte, lang string) []byte
+}
+
 type HtmlRendererParameters struct {
 	// Prepend this text to each relative URL.
 	AbsolutePrefix string
 	// Add this text to each footnote anchor, to ensure uniqueness.
 	FootnoteAnchorPrefix string
 	// Show this text inside the <a> tag for a footnote return link, if the
 	// HTML_FOOTNOTE_RETURN_LINKS flag is enabled. If blank, the string
 	// <sup>[return]</sup> is used.
 	FootnoteReturnLinkContents string
+	// If not nil, this is used to highlight contents of code blocks.
+	BlockCodeHighlighter BlockCodeHighlighter
 }
Update: On second thought, I think I'd just use a BlockCodeHighlighter func(text []byte, lang string) []byte instead of an interface. There's nothing to gain from it being an interface.
That's a rough draft (typed by hand based on my hacky uncommitted code), feedback is welcome.
Then, to fully implement MarkdownGfm(), it's just a matter of creating a bunch of highlighters for various languages. I imagine a high level highlighter that switches based on the lang parameter (and maybe this part can be added to blackfriday), and then uses lower level language-specific highlighters. The syntaxhighlight package can be used (/cc @sqs), or similar highlighters that are language specific (I've made one for diff, and can use go/scanner for highlighting Go code specifically). These probably best live in their own packages.
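The high-level highlighter described above could be sketched roughly as follows. This is only an illustration of the dispatch idea using the function type from the update: NewDispatcher, highlightDiff and the CSS class names are hypothetical and are not part of blackfriday or of the syntaxhighlight package.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// BlockCodeHighlighter mirrors the function type proposed above: it takes
// the code block text and a language name and returns HTML.
type BlockCodeHighlighter func(text []byte, lang string) []byte

// highlightDiff is a hypothetical language-specific highlighter that wraps
// added and removed lines of a diff in spans. The class names are made up.
func highlightDiff(text []byte, _ string) []byte {
	var b strings.Builder
	for _, line := range strings.SplitAfter(string(text), "\n") {
		esc := html.EscapeString(line)
		switch {
		case strings.HasPrefix(line, "+"):
			b.WriteString(`<span class="gi">` + esc + `</span>`)
		case strings.HasPrefix(line, "-"):
			b.WriteString(`<span class="gd">` + esc + `</span>`)
		default:
			b.WriteString(esc)
		}
	}
	return []byte(b.String())
}

// NewDispatcher returns a high-level highlighter that picks a
// language-specific highlighter by lang (case-insensitively) and falls
// back to plain HTML escaping for unknown languages.
func NewDispatcher(byLang map[string]BlockCodeHighlighter) BlockCodeHighlighter {
	return func(text []byte, lang string) []byte {
		if h, ok := byLang[strings.ToLower(lang)]; ok {
			return h(text, lang)
		}
		return []byte(html.EscapeString(string(text)))
	}
}

func main() {
	h := NewDispatcher(map[string]BlockCodeHighlighter{"diff": highlightDiff})
	fmt.Printf("%s", h([]byte("+added\n-removed\n same\n"), "diff"))
	fmt.Printf("%s", h([]byte("a < b\n"), "unknown"))
}
```

Keeping the per-language highlighters in a map means each one can live in its own package and be registered by the caller.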
Looking forward to hearing feedback on this proposal, thanks!
Edit: I forgot to mention there will be a CSS component that contains the styles for classes. The highlighters rely on those. It's just one more detail to consider for the final design/implementation.
Headings are not correctly parsed when using the # syntax without trailing whitespace.
Example:
# correct
#incorrect
This should generate (checked against the Daring Fireball Markdown processor):
<h1>correct</h1>
<h1>incorrect</h1>
but creates:
<h1>correct</h1>
<p>#incorrect</p>
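To make the two behaviours concrete, here is a small standalone sketch of a prefix-heading check with a switch for whether a space after the # characters is required. parseATXHeading is a hypothetical helper written for this illustration; it is not blackfriday's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseATXHeading reports the heading level and text of a "#"-prefixed
// line. When requireSpace is true, the "#" run must be followed by a
// space; when false, "#incorrect" is accepted as a heading too.
func parseATXHeading(line string, requireSpace bool) (level int, text string, ok bool) {
	for level < len(line) && level < 6 && line[level] == '#' {
		level++
	}
	if level == 0 {
		return 0, "", false
	}
	rest := line[level:]
	if requireSpace && !strings.HasPrefix(rest, " ") {
		return 0, "", false
	}
	// Drop surrounding whitespace and any optional closing "#" run.
	text = strings.TrimSpace(strings.TrimRight(strings.TrimSpace(rest), "#"))
	return level, text, text != ""
}

func main() {
	for _, line := range []string{"# correct", "#incorrect"} {
		level, text, ok := parseATXHeading(line, false)
		fmt.Println(level, text, ok)
	}
}
```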
I'm putting together a change to automatically convert all relative links in a Markdown file into absolute links during the render process. Currently this involves adding a new flag HTML_ABSOLUTE_LINKS and changing the prototype of HtmlRenderer to func HtmlRenderer(flags int, title string, css string, absolutePrefix string) Renderer. My particular application is generating RSS feeds from Markdown, where relative links won't correctly reference other files on the original server.
This change is fine for my purposes, but if I submit a pull request for it, then I'm concerned about changing the API and causing compile errors, albeit easily fixable ones, for all existing users.
One possibility is to create an HtmlRendererWithAbsolutePrefix function which would have the new prototype. But I worry that it would encourage the proliferation of a bunch of HtmlRendererWithSomeExtraArgument functions, which isn't too clean.
Any thoughts on this? Of course, if you don't want this change at all, then I can just keep it in my fork and the issue is moot :)
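Independent of how the prefix ends up being passed to the renderer, the link rewriting itself might look something like the sketch below. It uses only the standard net/url package; absolutize is a hypothetical helper name and the example URLs are placeholders, not anything from blackfriday.

```go
package main

import (
	"fmt"
	"net/url"
)

// absolutize resolves link against prefix. Links that are already
// absolute come back unchanged; if either value fails to parse, the
// original link is returned as-is.
func absolutize(prefix, link string) string {
	base, err := url.Parse(prefix)
	if err != nil {
		return link
	}
	ref, err := url.Parse(link)
	if err != nil {
		return link
	}
	return base.ResolveReference(ref).String()
}

func main() {
	const prefix = "http://example.com/blog/" // placeholder prefix
	for _, link := range []string{"images/a.png", "/about", "https://golang.org/"} {
		fmt.Println(absolutize(prefix, link))
	}
}
```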
package main
import (
"os"
"github.com/russross/blackfriday"
)
func main() {
text := []byte(`Hello <span title="<">there</span> world.`)
os.Stdout.Write(blackfriday.MarkdownBasic(text))
text = []byte(`Hello <span title=">">there</span> world.`)
os.Stdout.Write(blackfriday.MarkdownBasic(text))
}
Output:
<p>Hello <span title="<">there</span> world.</p>
<p>Hello <span title=">">there</span> world.</p>
I'm having a hard time figuring out if <span title=">"> is valid HTML, or if the title attribute value should be escaped à la <span title="&gt;">. Browsers obviously accept both, and I think both are valid, but I'm not sure.
If it is valid HTML, then perhaps blackfriday output for 2nd line should be:
<p>Hello <span title=">">there</span> world.</p>
But, this may be hard to fix and not worth fixing? As far as I can tell, even GitHub's internal Markdown renderer has the same bug/behavior.
GitHub Flavored Markdown has a section on fenced code blocks, where it says "Keep in mind that both types of code blocks need to have a blank line before them".
github.com, however, ignores that statement, and renders markdown without such a blank line correctly (i.e. as if the blank line were present). blackfriday currently does not follow that behavior (tested with blackfriday.MarkdownCommon), and renders such markdown in an odd manner. Should it be changed to match that of GitHub?
The following markdown reproduces the issue:
some text without a blank line afterwards
```Go
someCode()
```
For a larger example, see this Markdown source [1], how github.com displays it [2], and how blackfriday.MarkdownCommon renders it [3].
[1] - https://raw.github.com/shurcooL/go-goon/8ddcefebec68d2dbcbac5225bf8760fbd4598c47/README.md
[2] - https://github.com/shurcooL/go-goon/blob/8ddcefebec68d2dbcbac5225bf8760fbd4598c47/README.md
[3] - http://dl.dropboxusercontent.com/u/8554242/available-for-2-weeks/fenced_code_blocks_blackfriday.html
Any thoughts on supporting PDF output as well?
It would be nice to have headings (h1, h2, ...) receive an id attribute, so that it is possible to refer to them in the generated HTML. The id could be generated from the content of the heading, or maybe with some callback...
I've recently stumbled upon https://github.com/microcosm-cc/bluemonday, which seems to be a Go library for HTML sanitizing.
Would it be a good idea or a bad idea to use it?
I haven't really looked at it closely yet, but I just wanted to start the discussion here.
It would be really nice to have the anchor URLs be generated as a lowercase, hyphenated version of the title text. Something like:
# Welcome to my new blog article
--> website.com/posts/page#welcome-to-my-new-blog-article
I'm currently using your library as a part of Hugo.
Here is my original post:
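A minimal sketch of the kind of anchor generation being asked for, assuming the rule is simply "lowercase, keep letters and digits, turn everything else into single hyphens". slugify is a hypothetical helper; it does not claim to match GitHub's exact algorithm and it does not handle duplicate headings.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify lowercases the heading text, keeps letters and digits, and
// collapses every other run of characters into a single hyphen. Leading
// and trailing separators are dropped.
func slugify(heading string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(heading) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		} else {
			pendingHyphen = true
		}
	}
	return b.String()
}

func main() {
	fmt.Println(slugify("Welcome to my new blog article"))
}
```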
With Go 1 and git installed:
go get github.com/russross/blackfriday
will download, compile, and install the package into your `$GOPATH`
directory hierarchy. Alternatively, you can import it into a
project:
import "github.com/russross/blackfriday"
and when you build that project with `go build`, blackfriday will be
downloaded and installed automatically.
The "alternatively" path just disappointed a newbie. Of course, go build doesn't download the missing remote packages.
If links are left in a list as such:
it renders as a paragraph with links and dashes (or whatever list item marker was used).
As near as I can tell, blackfriday.MarkdownCommon() renders table elements (if present) and then strips them via the HTML_SANITIZE_OUTPUT HtmlRenderer flag. Possibly related to #64.
package main
import (
"fmt"
"github.com/russross/blackfriday"
)
func main() {
input := `Name | Age
--------|------
Bob | 27
Alice | 23
`
output := blackfriday.MarkdownCommon([]byte(input))
fmt.Println(string(output))
}
<table>
<thead>
<tr>
<th>Name</th>
<th>Age</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bob</td>
<td>27</td>
</tr>
<tr>
<td>Alice</td>
<td>23</td>
</tr>
</tbody>
</table>
Name
Age
Bob
27
Alice
23
I noticed that if you comment out htmlFlags |= HTML_SANITIZE_OUTPUT on line 239 in markdown.go, you get the expected result.
It would be useful to provide a Renderer that renders markdown to raw text which can be embedded safely in the page.
The code in this gist used to render relative links and images correctly:
https://gist.github.com/ancientlore/fa1a084def32e0828a33
Now it skips them. It doesn't seem like it would be by design.
I have a struct like this:
type Page struct {
Content string
}
Then I read a markdown file and assign it to a variable:
data, err := ioutil.ReadFile("a.md")
lines = string(data)
page.Content = markdownRender([]byte(lines))
The markdown file looks like this:
##Hello World
###Holo Go
Then I pass it to a markdown render function, which returns a string value:
func markdownRender(content []byte) string {
htmlFlags := 0
htmlFlags |= blackfriday.HTML_USE_SMARTYPANTS
htmlFlags |= blackfriday.HTML_SMARTYPANTS_FRACTIONS
renderer := blackfriday.HtmlRenderer(htmlFlags, "", "")
extensions := 0
extensions |= blackfriday.EXTENSION_NO_INTRA_EMPHASIS
extensions |= blackfriday.EXTENSION_TABLES
extensions |= blackfriday.EXTENSION_FENCED_CODE
extensions |= blackfriday.EXTENSION_AUTOLINK
extensions |= blackfriday.EXTENSION_STRIKETHROUGH
extensions |= blackfriday.EXTENSION_SPACE_HEADERS
return string(blackfriday.Markdown(content, renderer, extensions))
}
Finally, I use page.Content in an HTML template to generate a static HTML page:
{{.Content}}
But in the browser (I tried Chrome and Safari) the generated HTML shows up like this (this is what is displayed on the page, not the page source):
<p>##Hello World ###Holo Go </p>
But I want it to look like this:
Hello World
Holo Go
So, how can I do this?
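One likely cause of the visible tags, assuming the page is rendered with the standard html/template package: a plain string field is HTML-escaped by the template, whereas a template.HTML value is inserted as-is. The sketch below shows the difference without involving blackfriday; renderEscaped and renderTrusted are hypothetical helper names. Separately, the headings in the sample file have no space after the # characters, which EXTENSION_SPACE_HEADERS appears to require, so that may explain why they were not turned into headings in the first place.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

var pageTmpl = template.Must(template.New("page").Parse("{{.Content}}"))

// execute runs the page template with the given data and returns the result.
func execute(data interface{}) string {
	var buf bytes.Buffer
	if err := pageTmpl.Execute(&buf, data); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

// renderEscaped passes the content as a plain string, so html/template
// escapes it and the tags show up as text in the browser.
func renderEscaped(s string) string {
	return execute(struct{ Content string }{s})
}

// renderTrusted marks the content as template.HTML, so it is inserted
// unescaped. Only do this with output you trust or have sanitized.
func renderTrusted(s string) string {
	return execute(struct{ Content template.HTML }{template.HTML(s)})
}

func main() {
	rendered := "<h2>Hello World</h2>" // stand-in for the renderer's output
	fmt.Println(renderEscaped(rendered))
	fmt.Println(renderTrusted(rendered))
}
```

In terms of the struct above, that would mean declaring Content as template.HTML rather than string.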
Just rename utf8 to unicode/utf8 in markdown.go. Both gofix and goinstall -fix=true... work nicely.
*A[B](C)* [D](E)
gives
<p>*A<a href="C">B</a>* <a href="E">D</a></p>
but I was expecting
<p><em>A<a href="C">B</a></em> <a href="E">D</a></p>
Would it be possible to get a new version tag? i.e., v1.2 or even v2.0 depending on how many changes have gone in since that 2011 v1.1 tag? 😄
As for now this can be circumvented by shadowing the header and footer methods with stubs, but it'd be more convenient if there was an option similar to the "generate complete html document" thing for html output.
I'm noticing more and more that for various input blackfriday can produce HTML that is invalidly nested and that may break the layout of the page on which it appears.
Input:
<blockquote>
A list:
1. Foo</blockquote>
1. Bar
Which is a valid HTML blockquote containing a list with Foo, and outside of the blockquote a list with Bar.
Expected (something like):
<blockquote>
A list:
<ol>
<li>Foo<br/></li>
</ol>
</blockquote>
<ol>
<li>Bar</li>
</ol>
Output:
<blockquote><br/>
A list:</p>
<ol>
<li>Foo</blockquote><br/></li>
<li>Bar<br/></li>
</ol>
Note that the blockquote is now terminated within the list item, forcing the browser to close tags wherever it feels suitable, which then introduces phantom tags which will change the page layout.
Question: Is it the job of blackfriday to produce valid HTML?
With HTML sanitize turned on, relative URLs are filtered out. I think this is because protocolAllowed is called on relative URLs, so adding !isRelativeLink(val) && to sanitize.go line 95 should fix this.
I haven't tested this, but it seems like the correct fix. I have verified that relative links are present in the output with sanitize disabled. Note that I've only tried this with image URLs, but it probably applies to others as well.
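The intended logic, as I understand it, is roughly "let relative links through, and apply the protocol allowlist only to links that actually carry a scheme". A standalone sketch of that idea follows; hasScheme, linkAllowed and the allowlist contents are illustrative assumptions and are not copied from sanitize.go.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedSchemes is an illustrative allowlist, not blackfriday's actual one.
var allowedSchemes = []string{"http://", "https://", "ftp://", "mailto:"}

// hasScheme reports whether link has a ':' before the first '/', '?' or
// '#', i.e. whether it looks like "scheme:...". Note that
// protocol-relative links such as //host/path count as relative here.
func hasScheme(link string) bool {
	for i := 0; i < len(link); i++ {
		switch link[i] {
		case ':':
			return i > 0
		case '/', '?', '#':
			return false
		}
	}
	return false
}

// linkAllowed accepts relative links and links whose scheme is on the
// allowlist; anything else (for example javascript:) is rejected.
func linkAllowed(link string) bool {
	if !hasScheme(link) {
		return true
	}
	lower := strings.ToLower(link)
	for _, s := range allowedSchemes {
		if strings.HasPrefix(lower, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, link := range []string{"/my/foo", "images/a.png", "http://example.com", "javascript:alert(1)"} {
		fmt.Println(link, linkAllowed(link))
	}
}
```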
I used Python's markdown package with the codehilite extension. Its syntax is quite similar to regular markdown, but adds one line to specify the language of the code; the documentation is here. I wonder if blackfriday could also support this syntax?
I forked the project and did some changes, the commit is wangbin@056f292
Similar to what GitHub does, e.g.
# This is a header
becomes
<h1 id="this-is-a-header">This is a header</h1>
Even better if there is an option to generate clickable anchor tags for each header that can be used to copy full URLs to specific parts of the document (again, like GitHub).
I'm happy to work on this if there are no objections.
[FUCKLINK][1]
[1]: javascript:alert(window.document.cookie);
Markdown links are not strictly checked to be hypertext links, and this bug allows a JavaScript script to be executed.
0.0
I use the Martini framework. I have some markdown files and I want to render them as HTML in a tmpl/html template.
The markdown file looks like this:
title: A Test Demo
---
##ABC
> 123
And the template file looks like this:
<head>
<title>{{name}}</title>
</head>
<body>
<h2>{{abc}}</h2>
<blockquote>
<p>{{xyz}}</p>
</blockquote>
</body>
I use blackfriday to parse the markdown, which returns a []byte. As a next step, I want to render the markdown file into this template and put each block in the right place. How can I do this the right way? Or is there a better way to do this?
<a href="..." rel="nofollow"> does not pass the anchor regex.
Locally, I've worked around it by killing half the sanitizer, making it only check tags (as opposed to tags, alignments, attributes and attribute ordering.) I'm not certain that that's the best way to fix the issue.
Blackfriday currently converts all tabs to spaces (4 or 8, depending on config) as part of the pre-processing step. This irreversibly converts tabs even inside fenced code blocks.
This is bad for 2 reasons:
Typing
What is A all about?*
should be rendered in all italics, but isn't
Github also messes it up, cool. GFM != Markdown
For instance, iterating through all the parts of the document and see what format it is, etc.
blackfriday currently won't parse an image tag inside a hyperlink tag, like so:
[![alt text](image.png)](http://hyperli.nk)
It treats the image markdown as text and displays it.
Hello, I really like this package; however, a big headache for me is the inability to add metadata to documents. Apparently conventions are emerging for metadata, and I was wondering if you could add support for one. Here's an article that discusses one simple convention: http://hiltmon.com/blog/2012/06/18/markdown-metadata/
This would simplify development considerably.
In the interest of "safety against malicious user input", shouldn't there be an option to prevent the passthrough of script tags?
Hi,
there are two out-of-the-box configurations for how markdown may be rendered: Basic and Common.
GitHub does some things differently from standard markdown; for example, a line break in Markdown is a line break in HTML.
blackfriday has everything that is needed to make a markdownGHF configuration.
As this configuration is quite popular, it would be nice to have it as a third out-of-the-box configuration.
The documentation on how to implement it on my own is not clear enough for me; clearer documentation might be another good solution.
This would be the extension set for it
EXTENSION_NO_INTRA_EMPHASIS
EXTENSION_HARD_LINE_BREAK
EXTENSION_AUTOLINK
EXTENSION_STRIKETHROUGH
EXTENSION_FENCED_CODE
see: https://help.github.com/articles/github-flavored-markdown
blackfriday.MarkdownCommon([]byte("[[t]](/t)"))
Calling the above code caused a panic.
panic: runtime error: index out of range
at go/src/github.com/russross/blackfriday/inline.go:184
Any plans to add textile support or willingness to include one if written by someone else?
In my project I need textile support so I decided first to port upskirt to Go, to learn how it works, and then implement textile in similar way.
Only after finishing the go port (https://github.com/kjk/go-markup) I've found your project, which is slightly ahead.
There's little point in having 2 almost identical codebases but I really want to complete the phase 2 i.e. textile support. I would be happy to drop my port and just contribute that (and possibly other improvements) to blackfriday.
Are you at all interested in extending blackfriday that way?
I've written a renderer for deck markup [1] [2] [3] [4] and I have a couple of questions:
- item 1
- item 2
should generate
<list xp="10" yp="90" sp="2">
<li>item 1</li>
<li>item 2</li>
</list>
but
* item 1
* item 2
generates:
<list xp="10" yp="90" sp="2" type="bullet">
<li>item 1</li>
<li>item 2</li>
</list>
![50,50,960,540](/Users/ajstarks/Images/desert960.jpg "The desert")
I'd like to render in deck without as:
<image name="/Users/ajstarks/Images/desert960.jpg" xp="50" yp="50" width="960" height="540" caption="The desert" />
Finally, I've updated blackdown-tool to use a -format flag so that you can say:
$ blackdown-tool -format html ...
$ blackdown-tool -format latex ...
$ blackdown-tool -format deck ...
[1] https://github.com/ajstarks/deck
[2] http://godoc.org/github.com/ajstarks/deck
[3] https://github.com/ajstarks/deck/blob/master/examples/deck.xml
[4] https://github.com/ajstarks/deck/blob/master/examples/deck.pdf?raw=true
If you put "#" in
, then the symbol is interpreted as header.
See this
Thanks,
Given this input:
<blockquote>Got this today... Leg hurts
![](http://farm6.static.flickr.com/5083/5258310683_f7c87edbc4_z.jpg)</blockquote>
The image inside the blockquote remains as markdown after processing with blackfriday.
Out:
<blockquote>Got this today... Leg hurts
![](http://farm6.static.flickr.com/5083/5258310683_f7c87edbc4_z.jpg)</blockquote>
Expected... the markdown image tag to become an HTML IMG element.
But, if you add any markdown outside of the blockquote, then the markdown image tag is converted to HTML though the HTML is invalid (nesting of paragraphs is wrong):
Input:
**bold**
<blockquote>Got this today... Leg hurts
![](http://farm6.static.flickr.com/5083/5258310683_f7c87edbc4_z.jpg)</blockquote>
Out:
<p><strong>bold</strong><br />
<blockquote>Got this today... Leg hurts</p>
<p><img src="http://farm6.static.flickr.com/5083/5258310683_f7c87edbc4_z.jpg" alt="" />
</blockquote></p>
Expected:
<p><strong>bold</strong></p>
<blockquote>Got this today... Leg hurts
<img src="http://farm6.static.flickr.com/5083/5258310683_f7c87edbc4_z.jpg" alt="" />
</blockquote>
Hi, there are still two issues with the new HTML-parser based sanitization (#69) remaining after #71 and #70. /cc @mprobst One major, one minor.
First, the minor. When HTML_SANITIZE_OUTPUT is on, self-closing tags like <hr /> get rewritten as <hr>. See: https://github.com/russross/blackfriday/pull/70/files#r12218729.
Next, the major. When HTML_SANITIZE_OUTPUT is off, the following Markdown,
Here are some "quotes".
is converted to this HTML,
<p>Here are some &quot;quotes&quot;.</p>
Or, with HTML_USE_SMARTYPANTS on, then,
<p>Here are some &ldquo;quotes&rdquo;.</p>
However, when HTML_SANITIZE_OUTPUT is turned on, the html escaped quotes are replaced with output that doesn't render correctly as HTML (update: unless I explicitly set the charset to "utf-8", see the edits below).
From my limited testing, it seemed that replacing this code with the following fixed the problem.
wr.Write(tokenizer.Raw())
But that should be carefully validated.
I'm quite surprised the tests didn't catch this. Are there any tests for HTML_SANITIZE_OUTPUT with these symbols?
Edit: The 2nd "major" issue may be a non-issue, the escaped html is replaced by valid unicode characters that may get displayed correctly under the right conditions. But, is this an intended part of the sanitization process, and what is the motivation for it? I'm pretty sure it wasn't the case before.
Edit 2: Yeah, it turns out it's valid utf-8. The curly quotes show up correctly as “
if I explicitly set the charset to "utf-8" in the HTML, otherwise I see something else like “
in Chrome when viewing the generated HTML.
"what what"! -> “what what”!
"what what!" -> “what what!“
It would be nice to be able to highlight code on server-side instead of doing that on client-side (right now I'm thinking about HTML and not Latex output). I see two possibilities here:
I believe that the latter is the better option both API- and performance-wise. Do you have any thoughts/opinion about this?
This is a feature request for footnotes in the style of pandoc
There is one syntax extension I had when using a PHP Markdown parser, that is the ability to write HTML description lists with the following syntax:
Cat
: Fluffy animal everyone likes
Internet
: Vector of transmission for pictures of cats
The corresponding HTML output would be the following:
<dl>
<dt>Cat</dt>
<dd>Fluffy animal everyone likes</dd>
<dt>Internet</dt>
<dd>Vector of transmission for pictures of cats</dd>
</dl>
Could you implement such a functionality? I would have done it myself, only I have no idea how your markdown parser works and how to modify it.
Keep up the good work!
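To pin down the requested behaviour, here is a tiny standalone sketch that turns the syntax above into the HTML shown. It is deliberately simplistic (every non-empty line is either a term or a ":"-prefixed definition, with no inline markdown or multi-line definitions) and renderDefinitionList is a hypothetical function, not part of blackfriday.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// renderDefinitionList converts term lines followed by ": definition"
// lines into a <dl> element. Blank lines are ignored.
func renderDefinitionList(input string) string {
	var b strings.Builder
	b.WriteString("<dl>\n")
	for _, line := range strings.Split(input, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, ":"):
			def := strings.TrimSpace(strings.TrimPrefix(line, ":"))
			b.WriteString("<dd>" + html.EscapeString(def) + "</dd>\n")
		default:
			b.WriteString("<dt>" + html.EscapeString(line) + "</dt>\n")
		}
	}
	b.WriteString("</dl>\n")
	return b.String()
}

func main() {
	input := "Cat\n: Fluffy animal everyone likes\nInternet\n: Vector of transmission for pictures of cats\n"
	fmt.Print(renderDefinitionList(input))
}
```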
Is it just me, or is the conversion of > He said, she said style block quotes not supported?
Am I just doing something wrong? I'm very new to Go, but I might attempt to implement it?
This input:
<a href="javascript:(function(h){var i=h.indexOf('&');if(i>=0)
{url=h.substring(0,i);}else{url=h;}
resp=prompt('This is the address to use (Hit Ctrl+C or Cmd+C to copy)',url)})
(window.location.href);">YouTube Link</a>
produces this output:
<p><a href="javascript:(function(h){var i=h.indexOf('&');if(i>=0)
{url=h.substring(0,i);}else{url=h;}
resp=prompt(‘This is the address to use (Hit Ctrl+C or Cmd+C to copy)’,url)})
(window.location.href);“>YouTube Link</a></p>
I expected to have it render thusly
<p><a href="javascript:(function(h){var i=h.indexOf('&');if(i>=0)
{url=h.substring(0,i);}else{url=h;}
resp=prompt('This is the address to use (Hit Ctrl+C or Cmd+C to copy)',url)})
(window.location.href);">YouTube Link</a></p>
Escaping the angle bracket with a backslash made no difference.
A workaround is to preemptively convert the angle bracket (in this case the greater-than symbol) to an HTML entity, like so:
<a href="javascript:(function(h){var i=h.indexOf('&');if(i&gt;=0)
{url=h.substring(0,i);}else{url=h;}
resp=prompt('This is the address to use (Hit Ctrl+C or Cmd+C to copy)',url)})
(window.location.href);">YouTube Link</a>
We were trying to explain how to use a backtick to create a code block.
But for the input: Just use a ` backtick
The output is: Just use a backtick
This occurs when blackfriday.HTML_USE_SMARTYPANTS is enabled.
I suspect (from glancing at the code) that it requires a pairing.
In instances where a pair does not exist, the punctuation should ideally be left as-is.
Links such as:
[foo](/my/foo)
Are of a known/trusted protocol (http).
Currently, html TOC header ids are of the form #toc_<num>, which is not great for linking. They're not illustrative or resistant to reordering. I propose instead doing what sites like github do: use the header text (replacing special characters with '-').