
A PDF renderer for the goldmark markdown parser.

License: MIT License


goldmark-pdf's Introduction

goldmark-pdf

goldmark-pdf is a renderer for goldmark that allows rendering to PDF.

Reference

See https://pkg.go.dev/github.com/stephenafamo/goldmark-pdf

Usage

Care has been taken to match the semantics of goldmark and its extensions.

The PDF renderer is created with pdf.New(). The returned value satisfies goldmark's renderer.Renderer interface, so it can be passed to goldmark.New() using the goldmark.WithRenderer() option.

markdown := goldmark.New(
    goldmark.WithRenderer(pdf.New()),
)

Options can also be passed to pdf.New(). Each option must satisfy this interface:

// An Option interface is a functional option type for the Renderer.
type Option interface {
	SetConfig(*Config)
}

Options modify the following Config struct:

type Config struct {
	Context context.Context

	PDF PDF

	// A source for images
	ImageFS fs.FS

	// All other options have sensible defaults
	Styles Styles

	// A cache for the fonts
	FontsCache fonts.Cache

	// For debugging
	TraceWriter io.Writer

	NodeRenderers util.PrioritizedSlice
}

Some helper functions for adding options are already provided. See option.go.

An example with some more options:

goldmark.New(
    goldmark.WithRenderer(
        pdf.New(
            pdf.WithTraceWriter(os.Stdout),
            pdf.WithContext(context.Background()),
            pdf.WithImageFS(os.DirFS(".")),
            pdf.WithLinkColor("cc4578"),
            pdf.WithHeadingFont(pdf.GetTextFont("IBM Plex Serif", pdf.FontLora)),
            pdf.WithBodyFont(pdf.GetTextFont("Open Sans", pdf.FontRoboto)),
            pdf.WithCodeFont(pdf.GetCodeFont("Inconsolata", pdf.FontRobotoMono)),
        ),
    ),
)

Fonts

The fonts that can be used in the PDF are based on the Font struct:

// Represents a font.
type Font struct {
	CanUseForText bool
	CanUseForCode bool

	Category string
	Family   string

	FileRegular    string
	FileItalic     string
	FileBold       string
	FileBoldItalic string

	Type fontType
}

To be used for text, a font should have regular, italic, bold and bold-italic styles. Each of these has to be loaded separately.

To ease this process, variables have been generated for all the Google fonts that have these styles. For example:

var FontRoboto = Font{
	CanUseForCode:  false,
	CanUseForText:  true,
	Category:       "sans-serif",
	Family:         "Roboto",
	FileBold:       "700",
	FileBoldItalic: "700italic",
	FileItalic:     "italic",
	FileRegular:    "regular",
	Type:           fontTypeGoogle,
}

For code blocks, any style that is missing falls back to the regular font.

var FontMajorMonoDisplay = Font{
	CanUseForCode:  true,
	CanUseForText:  false,
	Category:       "monospace",
	Family:         "Major Mono Display",
	FileBold:       "regular",
	FileBoldItalic: "regular",
	FileItalic:     "regular",
	FileRegular:    "regular",
	Type:           fontTypeGoogle,
}
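The fallback rule for code fonts can be sketched as a small helper. This is only an illustration of the rule, assuming an empty string marks a missing style; the Font type here is a local stand-in with just the file fields, and withRegularFallback is a hypothetical name, not a function of the package.

```go
package main

import "fmt"

// Font is a local stand-in that reproduces only the file fields of the
// Font struct shown above.
type Font struct {
	FileRegular, FileItalic, FileBold, FileBoldItalic string
}

// withRegularFallback returns a copy of f in which every empty style
// points at the regular file. Treating "" as "missing" is an assumption
// made for this sketch.
func withRegularFallback(f Font) Font {
	for _, style := range []*string{&f.FileItalic, &f.FileBold, &f.FileBoldItalic} {
		if *style == "" {
			*style = f.FileRegular
		}
	}
	return f
}

func main() {
	mono := withRegularFallback(Font{FileRegular: "regular"})
	fmt.Printf("%+v\n", mono)
}
```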

When loading the fonts, they are downloaded on the fly using the fonts.

If you'd like to use a font outside of these, pass your own Font struct for a font that has already been loaded into the PDF object you set in the Config. Be sure to set its Type to FontTypeCustom so that we do not attempt to download it.

Contributing

Here's a list of things that I'd love help with:

More documentation
Testing
Finish the (currently buggy) implementation based on gopdf

License

MIT

Author

Stephen Afam-Osemene


goldmark-pdf's Issues

Bullets

No bullets are rendered from the following program:

package main

import (
	"os"

	pdf "github.com/stephenafamo/goldmark-pdf"
	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/parser"
)

var markdown []byte = []byte(`
# Hello

- item 1
- item 2

# Goodbye
`)

func main() {
	file, err := os.Create("/tmp/output.pdf")
	if err != nil {
		panic(err)
	}

	md := goldmark.New(
		goldmark.WithExtensions(extension.GFM),
		goldmark.WithRenderer(
			pdf.New(),
		),
		goldmark.WithParserOptions(
			parser.WithAutoHeadingID(),
		),
	)

	if err := md.Convert(markdown, file); err != nil {
		panic(err)
	}
}

Panic in renderText

Source:

test

My favorite search engine is [Duck Duck Go](https://duckduckgo.com).

![The San Juan Mountains are beautiful!](https://mdg.imgix.net/assets/images/san-juan-mountains.jpg?auto=format&fit=clip&q=40&w=1080 "San Juan Mountains")

Here's our logo (hover to see the title text):

Inline-style: 
![alt text](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Logo Title Text 1")

Reference-style: 
![alt text][logo]

[logo]: https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Logo Title Text 2"

ok.

returns

panic: interface conversion: ast.Node is *ast.String, not *ast.Text

goroutine 1 [running]:
github.com/stephenafamo/goldmark-pdf.(*nodeRederFuncs).renderText(0xc0002cc7ec?, 0xd2f3a0?, {0xc0003c4000?, 0x0?, 0x0?}, {0xd3dde0?, 0xc0000df3b0?}, 0x0?)
        G:/Go/pkg/mod/github.com/stephenafamo/goldmark-pdf@v0.2.0/renderer_funcs.go:138 +0xd0
github.com/stephenafamo/goldmark-pdf.(*renderer).Render.func2({0xd3dde0, 0xc0000df3b0}, 0x80?)
        G:/Go/pkg/mod/github.com/stephenafamo/goldmark-pdf@v0.2.0/renderer.go:103 +0xb6
github.com/yuin/goldmark/ast.walkHelper({0xd3dde0, 0xc0000df3b0}, 0xc000515d08)
        G:/Go/pkg/mod/github.com/yuin/goldmark@v1.5.3/ast/ast.go:492 +0x34
github.com/yuin/goldmark/ast.walkHelper({0xd3dba0, 0xc000036e00}, 0xc000515d08)
        G:/Go/pkg/mod/github.com/yuin/goldmark@v1.5.3/ast/ast.go:498 +0x8e
github.com/yuin/goldmark/ast.walkHelper({0xd3d180, 0xc0000defc0}, 0xc000515d08)
        G:/Go/pkg/mod/github.com/yuin/goldmark@v1.5.3/ast/ast.go:498 +0x8e
github.com/yuin/goldmark/ast.Walk(...)
        G:/Go/pkg/mod/github.com/yuin/goldmark@v1.5.3/ast/ast.go:487
github.com/stephenafamo/goldmark-pdf.(*renderer).Render(0xc000356440, {0xd2ede0, 0xc00008a038}, {0xc0003c4000, 0x230, 0x380}, {0xd3d180, 0xc0000defc0})
        G:/Go/pkg/mod/github.com/stephenafamo/goldmark-pdf@v0.2.0/renderer.go:98 +0x632
github.com/yuin/goldmark.(*markdown).Convert(0xc000356480, {0xc0003c4000, 0x230, 0x380}, {0xd2ede0, 0xc00008a038}, {0x0, 0x0, 0x0})

How to embed meta data?

Is there a way to embed metadata into the document, like EXIF for images but for PDF documents?

How to render Unicode fonts, like CJK fonts

Chinese characters are rendered as boxes.

	notoFont := pdf.Font{
		CanUseForText:  true,
		CanUseForCode:  false,
		Category:       "sans-serif",
		Family:         "Noto Sans SC",
		FileRegular:    "regular",
		FileItalic:     "regular",
		FileBold:       "700",
		FileBoldItalic: "700",
		Type:           pdf.FontTypeGoogle,
	}
	config := pdf.DefaultConfig()
	err := pdf.AddFonts(ctx, config.PDF, []pdf.Font{notoFont}, nil)
	if err != nil {
		return nil, err
	}
	md := goldmark.New(
		goldmark.WithRenderer(pdf.New(
			pdf.WithConfig(config),
			pdf.WithBodyFont(pdf.FontNotoSans))),
	)
	buf := bytes.Buffer{}
	if err := md.Convert([]byte(markdownStr), &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil

Image caption is not centered

An image with a caption has the caption aligned to the left of the document instead of centered with the image.

For example:
![The San Juan Mountains are beautiful!](/assets/images/san-juan-mountains.jpg "San Juan Mountains")

will end up like:

Horizontal rules are not rendered

I tried to use *** and ---, but neither gets rendered.

https://www.markdownguide.org/basic-syntax#horizontal-rules

Small images are stretched for the entire width of the document

I have a test document that contains a small image of 176px by 176px. The problem is that this image is scaled to the entire width of the document. This does not look good and is most likely not desired, because it would not be rendered that way in an HTML document unless specifically configured with CSS.

The code responsible for this behavior is: https://github.com/stephenafamo/goldmark-pdf/blob/master/renderer_funcs.go#L581

I think that if the image is smaller than the width of the document, it should be kept as is and if the image is wider, only then it should be scaled down to fit the document.
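The proposed rule can be sketched as follows. This is a minimal illustration, not the renderer's code: fitImageWidth is a hypothetical helper, and it assumes the caller has already converted the image width into the same unit as the available page width.

```go
package main

import "fmt"

// fitImageWidth returns the width at which an image should be drawn:
// its natural width when it fits, otherwise the available width.
// A non-positive natural width (size unknown) also falls back to the
// available width. Both values must be in the same unit.
func fitImageWidth(naturalWidth, availableWidth float64) float64 {
	if naturalWidth <= 0 || naturalWidth > availableWidth {
		return availableWidth
	}
	return naturalWidth
}

func main() {
	fmt.Println(fitImageWidth(176, 500))  // small image keeps its width
	fmt.Println(fitImageWidth(1080, 500)) // wide image is scaled down
}
```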

Remove hard-coded logging

If there is an error during rendering, such as a missing image, I get a log line:

2023/01/11 16:45:04 IMAGE ERROR: foobar, file does not exist

This is not good because I have no control over it. I don't think this should be a log at all; it should simply be a returned error that the caller can choose to ignore. Adding something like WithLog(someInterface) may sound like a good idea at first, but in practice it is useless. Instead, I think there should be a dedicated error that can join multiple errors (for example, multiple missing images), which the caller can then process or ignore depending on severity (e.g. var ErrMissingImage = errors.New(...) checked with errors.Is()). The final document can still be used even with missing images, which is why this should not be a hard error. This also ties in with #7.

Inline code block is not working

When I switch the renderer from HTML to PDF, inline code no longer works because the backticks get escaped.

Font cache stopped working

Looks like the font cache provided via WithFontsCache is not being called in the latest version.

Lack of image mime detection

The image renderer:

func (r *nodeRederFuncs) renderImage(w *Writer, source []byte, node ast.Node, entering bool) (ast.WalkStatus, error) {
	// while this has entering and leaving states, it doesn't appear
	// to be useful except for other markup languages to close the tag
	n := node.(*ast.Image)

	if entering {
		w.LogDebug("Image (entering)", fmt.Sprintf("Destination[%v] Title[%v]", string(n.Destination), string(n.Title)))
		// following changes suggested by @sirnewton01, issue #6
		// does file exist?
		imgPath := string(n.Destination) // <-- the path is the raw link destination
		imgFile, err := w.ImageFS.Open(imgPath)
		if err == nil {
			defer imgFile.Close()

			width, _ := w.Pdf.GetPageSize()
			mleft, _, mright, _ := w.Pdf.GetMargins()
			maxw := width - (mleft * 2) - (mright * 2)

			format := strings.ToUpper(strings.Trim(filepath.Ext(imgPath), ".")) // <-- the format is derived from the file extension
			w.Pdf.RegisterImage(imgPath, format, imgFile)
			w.Pdf.UseImage(imgPath, (mleft * 2), w.Pdf.GetY(), maxw, 0)
		} else {
			log.Printf("IMAGE ERROR: %v", err)
			w.LogDebug("Image (file error)", err.Error())
		}
	} else {
		w.LogDebug("Image (leaving)", "")
	}

	return ast.WalkContinue, nil
}

relies on the path to determine the MIME type of the file. If I use an HTTP file system to embed images that are linked via HTTP rather than stored locally, this fails unless the URL ends with the matching file extension, which is rarely the case.

Hence, there should be a built-in HTTP file system, and the MIME type should be detected from the file content using github.com/gabriel-vasile/mimetype or a similar library.

My quickly built fs is:

type HttpFs struct{}

func (f *HttpFs) Open(name string) (fs.File, error) {
	res, err := http.Get(name)
	if err != nil {
		return nil, err
	}
	return &HttpFile{r: res}, nil
}

type HttpFile struct {
	r *http.Response
}

func (f *HttpFile) Stat() (fs.FileInfo, error) {
	return &HttpInfo{r: f.r}, nil
}

func (f *HttpFile) Read(p []byte) (int, error) {
	return f.r.Body.Read(p)
}

func (f *HttpFile) Close() error {
	return f.r.Body.Close()
}

type HttpInfo struct {
	r *http.Response
}

func (i *HttpInfo) Name() string {
	fn := strings.TrimPrefix(i.r.Request.URL.Path, "/")
	if fn == "" {
		if _, params, err := mime.ParseMediaType(i.r.Header.Get("Content-Disposition")); err == nil {
			fn = params["filename"]
		}
	}
	if filepath.Ext(fn) == "" {
		mt, _, _ := mime.ParseMediaType(i.r.Header.Get("Content-Type"))
		if spl := strings.Split(mt, "/"); len(spl) > 0 {
			if fn == "" {
				fn = spl[0]
			}
			fn += "." + spl[len(spl)-1]
		}
	}
	return filepath.Base(fn)
}

func (i *HttpInfo) Size() int64 {
	return i.r.ContentLength
}

func (i *HttpInfo) Mode() fs.FileMode {
	return fs.ModeIrregular
}

func (i *HttpInfo) ModTime() time.Time {
	if t, err := time.Parse(time.RFC1123, i.r.Header.Get("Last-Modified")); err == nil {
		return t
	}
	return time.Time{}
}

func (i *HttpInfo) IsDir() bool {
	return false
}

func (i *HttpInfo) Sys() any {
	return i.r
}

Skip the rest of the page?

I want to put some content on the last page of the document. Is there a way to tell the renderer to skip the rest of the current page and continue on the next page instead?

Examples are not working

The examples listed here are not working. The link color option does not accept a string, and the fonts keep throwing missing-font errors.

Add option to recode embedded images when they are too large

When I link a big image (say 4 MB), it is embedded into the PDF file as is, making the file that much larger.

I think there should be an option to convert images above a configured size to JPEG and embed the re-encoded images instead of the originals. This can save a lot of disk space when the quality of the assets is not of utmost importance (which is usually the case; very few people embed high-resolution images into PDF documents).

The FS branch, with its new file system, gives access to the file size via Stat() and also has MIME detection. Both can be used to invoke code that would look something like this:

import (
	"errors"
	"github.com/disintegration/imaging"
	"golang.org/x/image/webp"
	"image"
	"image/gif"
	"image/jpeg"
	"image/png"
	"io"
)

func Resize(srcMime string, src io.Reader, dst io.Writer, documentWidth int) error {
	var img image.Image
	var err error

	switch srcMime {
	case "image/jpeg":
		img, err = jpeg.Decode(src)
	case "image/gif":
		img, err = gif.Decode(src)
	case "image/png":
		img, err = png.Decode(src)
	case "image/webp":
		img, err = webp.Decode(src)
	default:
		return errors.New("unsupported input mime type: " + srcMime)
	}

	if err != nil {
		return err
	}

	if img.Bounds().Dx() < documentWidth {
		documentWidth = img.Bounds().Dx()
	}

	filter := imaging.Lanczos // best quality but slow
	img = imaging.Resize(img, documentWidth, 0, filter)
	return jpeg.Encode(dst, img, nil) // jpeg has the best compression with sufficient quality for this purpose
}

Local links/navigation is not working

Having local navigation, like:

# <a name="top"></a>Markdown Test Page

* [Headings](#Headings)
* [Paragraphs](#Paragraphs)
* [Blockquotes](#Blockquotes)
* [Lists](#Lists)
* [Horizontal rule](#Horizontal)
* [Table](#Table)
* [Code](#Code)
* [Inline elements](#Inline)

***

# <a name="Headings"></a>Headings

# Heading one

Sint sit cillum pariatur eiusmod nulla pariatur ipsum. Sit laborum anim qui mollit tempor pariatur nisi minim dolor. Aliquip et adipisicing sit sit fugiat commodo id sunt. Nostrud enim ad commodo incididunt cupidatat in ullamco ullamco Lorem cupidatat velit enim et Lorem. Ut laborum cillum laboris fugiat culpa sint irure do reprehenderit culpa occaecat. Exercitation esse mollit tempor magna aliqua in occaecat aliquip veniam reprehenderit nisi dolor in laboris dolore velit.

## Heading two

[[Top]](#top)

will render links but they won't work.

Tables do not render correctly

If the cell contents are longer than the heading length of the column then the next column will overwrite the last part of the previous column. Column widths are determined based on the values in the header row. Column contents do not wrap (at least not without changing the row height, which I have not tested).

This makes the table functionality not very useful except for the simplest of tables, which is a shame since otherwise the generated PDF looks very good.
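One way to avoid the overlap is to size each column by its widest cell rather than by the header alone. The sketch below illustrates the idea with character counts as a stand-in for real text measurement; an actual renderer would measure strings in the PDF's font and units, and would still need wrapping for columns wider than the page. columnWidths is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// columnWidths returns, for each column, the length in characters of
// its widest cell, looking at the header and every body row.
func columnWidths(header []string, rows [][]string) []int {
	widths := make([]int, len(header))
	measure := func(cells []string) {
		for i, cell := range cells {
			if i >= len(widths) {
				break // ignore cells beyond the header's column count
			}
			if n := utf8.RuneCountInString(cell); n > widths[i] {
				widths[i] = n
			}
		}
	}
	measure(header)
	for _, row := range rows {
		measure(row)
	}
	return widths
}

func main() {
	header := []string{"ID", "Name"}
	rows := [][]string{{"1", "goldmark-pdf"}, {"1024", "gofpdf"}}
	fmt.Println(columnWidths(header, rows)) // prints [4 12]
}
```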

Image ALT is ignored

When an image fails to load, its alt text is ignored:

![The San Juan Mountains are beautiful!](san-juan-mountains.jpggg "San Juan Mountains Alt Text")

Rendering quotes

Thanks for a great package. Is there a way to customise the renderer so that it handles quotes correctly?

Consider source:

source := `"Just a test"`
markdown := goldmark.New(
	goldmark.WithRenderer(
		pdf.New(
			pdf.WithHeadingFont(pdf.GetTextFont("Times", pdf.FontTimes)),
			pdf.WithBodyFont(pdf.GetTextFont("Times", pdf.FontTimes)),
		),
	),
)

var buf bytes.Buffer
if err := markdown.Convert([]byte(source), &buf); err != nil {
	return err
}

buf.Bytes()

Will produce:

&quot;Just a test&quot;

The desired output is "Just a test"

I presume this has to do with passing the correct options to WithNodeRenderers(), but I am struggling to see what those are from the source code.

Thanks for the help!