htdebeer / paru
Control pandoc with Ruby and write pandoc filters in Ruby
Home Page: https://heerdebeer.org/Software/markdown/paru/
License: GNU General Public License v3.0
Hello!
I wrote to you a few years ago asking about using EPUB as an input method for paru, and I've been using it ever since you implemented the convert_file method.
Inspired by the code in this related issue #9, I'm using a paru filter to fine-tune the latex output of a document and it's proving really successful.
This is the relevant snippet from my filter:
# Change the output so chapters and sections aren't numbered
with 'Header' do |header|
  header.inner_markdown = header.children.first.inner_markdown if header.has_children?
  if header.level == 1
    header.markdown = "\\chapter\*\{\\texorpdfstring\{\{#{header.inner_markdown}\}\}\{#{header.inner_markdown}\}\}"
  end
  if header.level == 2
    header.markdown = "\\section\*\{\\texorpdfstring\{\{#{header.inner_markdown}\}\}\{#{header.inner_markdown}\}\}"
  end
end
In general, this works absolutely fine. However, in some instances, the resulting output includes only the first word of the header, rather than the full header text.
For example, in the result file I would see something like this
\section*{\texorpdfstring{{London,
}}{London,
}}
or
\section*{\texorpdfstring{{The
}}{The
}}
when the expected output would have been
\section*{\texorpdfstring{{London, February 1907
}}{London, February 1907
}}
and
\section*{\texorpdfstring{{The Orient Express, August 1905
}}{The Orient Express, August 1905
}}
I can't seem to find any commonality in the instances where this occurs, but when there is a comma, that bit of punctuation is always preserved, and otherwise the string gets truncated at the first space.
Am I accessing the wrong method with .inner_markdown?
I've been looking at the input carefully and I don't think there is a child node in there: the source HTML from the EPUB input in this instance was <h2>The Orient Express, August 1905</h2>, for example.
This issue doesn't occur in every heading, but when it does occur in a specific heading it happens every time I run paru. As I said, I can't quite ascertain what is causing it.
Do you have any ideas? Am happy to provide more examples and information to help debug this issue.
Thanks!
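One possible explanation, offered as a sketch rather than a confirmed diagnosis: in pandoc's AST a header's content is a list of inline nodes, where each word is a Str (punctuation attached) and the gaps are Space nodes. In that case header.children.first would be only the first word, comma included. The plain-Ruby snippet below models this with hand-written pandoc JSON; it does not use paru, and the names HEADER, inline_text, first_child_text and full_text are made up for illustration.

```ruby
require "json"

# Pandoc JSON for <h2>The Orient Express, August 1905</h2>:
# [level, attributes, list of inlines]. Hand-written for illustration.
HEADER = JSON.parse(<<~JSON)
  {"t": "Header", "c": [2, ["", [], []], [
    {"t": "Str", "c": "The"}, {"t": "Space"},
    {"t": "Str", "c": "Orient"}, {"t": "Space"},
    {"t": "Str", "c": "Express,"}, {"t": "Space"},
    {"t": "Str", "c": "August"}, {"t": "Space"},
    {"t": "Str", "c": "1905"}
  ]]}
JSON

# Hypothetical helper: render the two inline types used above as text.
def inline_text(inline)
  case inline["t"]
  when "Str" then inline["c"]
  when "Space" then " "
  else ""
  end
end

# Text of the first inline only, mirroring children.first.
def first_child_text(header)
  inline_text(header["c"][2].first)
end

# Text of all inlines joined together.
def full_text(header)
  header["c"][2].map { |inline| inline_text(inline) }.join
end

puts first_child_text(HEADER)
puts full_text(HEADER)
```

If this is what happens, headings whose first child is a wrapper around the whole text (for instance a Span) would be unaffected, which might explain why only some headings are truncated. Dropping the first line of the filter block and using header.inner_markdown directly may be worth trying.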
I was looking at commit ec42eb7 and how it's added support for pandoc --metadata-file, but I can't seem to get this to work with paru.
I assumed the Ruby-esque way of setting this option would be with an underscore instead of a hyphen, as in metadata_file 'filename.yaml':
file = 'test.md'
Paru::Pandoc.new do
  from 'markdown-smart'
  to 'markdown-smart'
  standalone true
  output 'test-1.md'
  metadata_file 'metadata.yaml'
end.convert_file file
However, I get an error when I try this.
NoMethodError: undefined method 'metadata_file' for #<Paru::Pandoc:0x00007fc55d30ec68> did you mean? metadata-file
When I try to use metadata-file in my options instead, I get a Ruby syntax error because of the hyphen.
Is there a bug in this implementation, or am I not configuring this option correctly in my example above?
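The "did you mean? metadata-file" hint suggests the option exists as a method whose name contains a hyphen. Ruby cannot call such a method with ordinary syntax, but public_send accepts any method name as a string. The sketch below shows this on a toy class; ToyOptions and configure_toy_options are hypothetical stand-ins, and whether the same call works inside a Paru::Pandoc block in this version is an untested assumption.

```ruby
# Toy stand-in for an options object that defines a hyphenated method name.
class ToyOptions
  attr_reader :options

  def initialize
    @options = {}
  end

  # define_method accepts names that are not valid Ruby identifiers.
  define_method("metadata-file") do |path|
    @options["metadata-file"] = path
    self
  end
end

def configure_toy_options
  # Written literally, `metadata-file 'x'` parses as a subtraction,
  # so the call goes through public_send with the name as a string.
  ToyOptions.new.public_send("metadata-file", "metadata.yaml")
end

p configure_toy_options.options
```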
When replacing an element's outer_markdown:
sometimes it works (see attached sample code), e.g. when I try to replace an image object with **Bold Text
sometimes it does not completely work, e.g. when I try to replace it with # Chapter, the resulting file omits the # and just includes the word “Chapter”.
When I try to replace it with an embedded LaTeX command, it reports an error:
/var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each': undefined method `each' for nil:NilClass (NoMethodError)
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/markdown.rb:87:in `outer_markdown='
from testFilter.rb:10:in `block (2 levels) in <main>'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:289:in `with'
from testFilter.rb:7:in `block in <main>'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:268:in `instance_eval'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:268:in `block in filter'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:101:in `each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `block in each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:104:in `block in each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `block in each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:104:in `block in each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:266:in `filter'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:228:in `run'
from testFilter.rb:6:in `<main>'
pandoc: Error running filter testFilter.rb
You can run the example code with:
pandoc "test.markdown" --filter testFilter.rb -o output.markdown
Again, I’m not 100% sure how to write a replacing filter. I also tried to replace it with a Paru object (e.g. RawBlock or RawInline), but that did not work either.
Ian mentioned that pandocomatic has trouble finding files in the data directory (pandocomatic issue #64) with pandoc 2.7, because pandoc now implements the XDG Base Directory Specification and the default data directory changed from ~/.pandoc to ~/.local/share/pandoc. In the release notes I read that the new location takes precedence.
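As a rough sketch of the lookup order described in the release notes (new XDG location first, legacy ~/.pandoc as fallback), the helper below picks a data directory. It is plain Ruby; the name pandoc_data_dir and its parameters are hypothetical and not part of paru or pandocomatic.

```ruby
# Pick pandoc's user data directory: prefer the XDG location and fall back
# to the legacy ~/.pandoc when only that one exists.
def pandoc_data_dir(home, xdg_data_home = nil)
  xdg_base = xdg_data_home.to_s.empty? ? File.join(home, ".local", "share") : xdg_data_home
  xdg_dir = File.join(xdg_base, "pandoc")
  legacy_dir = File.join(home, ".pandoc")

  return xdg_dir if Dir.exist?(xdg_dir)
  return legacy_dir if Dir.exist?(legacy_dir)

  xdg_dir # neither exists yet: report the new default location
end

puts pandoc_data_dir(ENV.fetch("HOME", Dir.pwd), ENV["XDG_DATA_HOME"])
```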
Hi!
Guess it's not a bug :), simply my low ruby skills. How can I pass arguments from command line to a filter?
The following code snippet fails on the line if image.attr.has_key? "width" with the error
gems/paru-0.3.0.1/lib/paru/filter/attr.rb:72:in `has_key?': undefined method `key_exists?' for #<Array:0x00005604c3b8bc80>
require "paru/filter"
Paru::Filter.run do
  with "Image" do |image|
    if image.attr.has_key? "width"
      STDERR.puts "has width"
    end
  end
end
There are some bigger changes to pandoc we should support:
--ipynb-output=all|none|best
--metadata-file
Pandoc has just reached release 2.10. There seem to be some changes to pandoc-types, so check whether those changes need to be propagated to paru or have other implications for paru.
This is quite bizarre, but when I try to use the excellent Pry (0.10.4) to inspect a filter, I get a Paru error.
test.md
A minimal **pandoc** example.
noop filter
#!/usr/bin/env ruby
require 'pry'
require 'paru/filter'
binding.pry
Paru::Filter.run do
end
Output:
👉 pandoc -t json -F noop test.md
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.8/lib/paru/filter/document.rb:61:in `rescue in from_JSON': Unable to read document. (Paru::FilterError)
Most likely cause: Paru expects a pandoc installation that has been
compiled with pandoc-types >= 1.17.0.5. You can
check which pandoc-types have been compiled with your pandoc installation by
running `pandoc -v`.
Original error message: 765: unexpected token at ''
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.8/lib/paru/filter/document.rb:57:in `from_JSON'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.8/lib/paru/filter.rb:237:in `document'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.8/lib/paru/filter.rb:264:in `filter'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.8/lib/paru/filter.rb:228:in `run'
from /Users/ian/.pandoc/filters/noop:8:in `<main>'
pandoc: Error running filter /Users/ian/.pandoc/filters/noop
Filter returned error status 1
What is strange is that pry should stop execution at the run(); if you comment out binding.pry then there is no error. I also tried the new binding.irb and that also has problems.
Hi, the first version of the epic changelog for Pandoc 2 is now available:
and I saw this:
Set `PANDOC_READER_OPTIONS` in environment where filters are run. This contains a JSON representation of `ReaderOptions`, so filters can access it.
Worth noting, though doesn't need any explicit paru support I suppose? There is also a detailed section of API changes, which may be useful for paru...
If I try to run the "hello world" paru example with 0.2.4.7 (installed via gem) I get this error:
👉 test2
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter.rb:22:in `require_relative': /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:64: syntax error, unexpected tIDENTIFIER, expecting keyword_do or '{' or '(' (SyntaxError)
Most likely cause: Paru expects a pandoc installation that has been
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:66: syntax error, unexpected tIDENTIFIER, expecting keyword_do or '{' or '('
check which pandoc-types have been compi...
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:66: syntax error, unexpected tIDENTIFIER, expecting keyword_do or '{' or '('
check which pandoc-types have been compiled with your pand...
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:76: syntax error, unexpected tCONSTANT, expecting keyword_do or '{' or '('
pandoc-types API version used in document (ve...
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:76: syntax error, unexpected keyword_in, expecting keyword_end
...andoc-types API version used in document (version = #{versio...
... ^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:77: syntax error, unexpected tIDENTIFIER, expecting keyword_do or '{' or '('
smaller than the version of pandoc-types used by paru
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:77: syntax error, unexpected tIDENTIFIER, expecting keyword_do or '{' or '('
smaller than the version of pandoc-types used by paru
^
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:81: syntax error, unexpected keyword_end, expecting ')'
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:85: syntax error, unexpected keyword_end, expecting ')'
/Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter/document.rb:136: syntax error, unexpected keyword_end, expecting ')'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter.rb:22:in `<module:Paru>'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru/filter.rb:19:in `<top (required)>'
from /Library/Ruby/Site/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Library/Ruby/Site/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru.rb:21:in `<module:Paru>'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.7/lib/paru.rb:19:in `<top (required)>'
from /Library/Ruby/Site/2.0.0/rubygems/core_ext/kernel_require.rb:133:in `require'
from /Library/Ruby/Site/2.0.0/rubygems/core_ext/kernel_require.rb:133:in `rescue in require'
from /Library/Ruby/Site/2.0.0/rubygems/core_ext/kernel_require.rb:40:in `require'
from ../test2:2:in `<main>'
What I think is happening is that you are using the indent-aware <<~ for your HEREDOC errors (https://github.com/htdebeer/paru/blob/master/lib/paru/filter/document.rb#L61), which was only introduced in Ruby V2.3, and macOS uses V2.0 by default. I think it would be better to try to keep V2.0 compatibility, especially as you aren't really using the indentation feature of <<~ anyway.
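To illustrate the difference, here is a small sketch comparing the two heredoc forms. The message text is a placeholder, not paru's actual error message, and the method names are invented for the example.

```ruby
# Ruby >= 2.3: the squiggly heredoc strips the common leading indentation.
def squiggly_message
  <<~MESSAGE
    first line
      indented detail
  MESSAGE
end

# Ruby 2.0 compatible: <<- only allows an indented terminator, so the
# leading indentation is removed by hand.
def dash_message
  <<-MESSAGE.gsub(/^ {4}/, "")
    first line
      indented detail
  MESSAGE
end

puts squiggly_message == dash_message
```

The <<- form needs the indentation width spelled out in the gsub, which is the price for running on older Rubies.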
Hi Huub, hope you are well. This was just a small heads-up that the upcoming V2.8 of Pandoc will include a few command-line option changes:
https://github.com/jgm/pandoc/blob/master/changelog.md
There is a nice new feature to call YAML defaults, and I hope this will work alongside pandocomatic (I haven't tested anything yet though):
https://github.com/jgm/pandoc/blob/master/MANUAL.txt#L1423
I'm looking for a way to embed filters "inline", for example:
converter = Paru::Pandoc.new do
  from "textile"
  to "html"
  filter do
    ...
  end
end
Is this possible? Or is there another way?
Hi, I tried using the new example filter, which I renamed addToday in my local install, and I get the following NoMethodError:
/Users/ian/.pandoc/filters/addToday:8:in `block in <main>': undefined method `[]=' for #<Paru::PandocFilter::Meta:0x007fb7d612bc00> (NoMethodError)
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.3/lib/paru/filter.rb:132:in `instance_eval'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.3/lib/paru/filter.rb:132:in `block in filter'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.3/lib/paru/filter/ast_manipulation.rb:92:in `each_depth_first'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.3/lib/paru/filter.rb:130:in `filter'
from /Library/Ruby/Gems/2.0.0/gems/paru-0.2.4.3/lib/paru/filter.rb:108:in `run'
from /Users/ian/.pandoc/filters/addToday:7:in `<main>'
pandoc: Error running filter /Users/ian/.pandoc/filters/addToday
Filter returned error status 1
filter:
#!/usr/bin/env ruby
## Add today's date to the metadata
require "paru/filter"
require "date"
Paru::Filter.run do
  metadata["date"] = Paru::PandocFilter::MetaString.new(Date.today.to_s)
end
Related to pandocomatic issue: trying out pandocomatic 0.2 first time user, unable to execute hello world example #43 by agusmba.
When I try to run
require 'paru/pandoc'
converter = Paru::Pandoc.new do
  from "markdown"
  to "html"
end
result = converter << "hello *world*"
on Windows, I get the message:
$ ruby test_paru.rb
pandoc: \
: openFile: invalid argument (Invalid argument)
C:/Ruby24/lib/ruby/gems/2.4.0/gems/paru-0.2.5.9/lib/paru/pandoc.rb:299:in `run_converter': error while running: (Paru::Error)
pandoc --from=markdown \
--to=html
Pandoc responded with:
pandoc: \
: openFile: invalid argument (Invalid argument)
from C:/Ruby24/lib/ruby/gems/2.4.0/gems/paru-0.2.5.9/lib/paru/pandoc.rb:153:in `convert'
from test_paru.rb:8:in `<main>'
After fixing running paru on Windows, tests with paths containing spaces fail on Windows. Just run rake test on Windows.
This isn't really important, but if you look at the diff of each commit, every HTML file has a changed date, and this makes looking for the actual change really difficult in the git commit history. The easiest option is to just remove the date in the erb footer:
Make pandoc2yaml.rb and do-pandoc.rb from the examples executable, so they can easily be used by users who do not want to use more than just those scripts and/or are inexperienced with Ruby.
As we discussed by email, if I use a conditional to test a metadata key exists at the start of a filter, stop! does not actually stop, but carries on.
Paru::Filter.run do
  stop! unless metadata.key?('institute')
  # do something with the metadata key; this gets called even if the 'institute' key never existed
end
The simple fix is to just wrap the rest of the filter in an if statement, but it seems to be Ruby "style" to return early in this fashion...
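In plain Ruby, one way to leave a block early is next, which ends the current invocation of the block. The sketch below demonstrates this on a toy runner that evaluates its block with instance_eval; ToyFilter and run_toy_filter are hypothetical, and whether next is a suitable workaround inside Paru::Filter.run is an assumption that has not been tested here.

```ruby
# Toy stand-in for a filter runner that evaluates the block with instance_eval.
class ToyFilter
  attr_reader :metadata, :log

  def initialize(metadata)
    @metadata = metadata
    @log = []
  end

  def run(&block)
    instance_eval(&block)
    self
  end
end

def run_toy_filter(meta)
  ToyFilter.new(meta).run do
    # `next` ends this invocation of the block, skipping the lines below.
    next unless metadata.key?("institute")

    log << "institute is #{metadata['institute']}"
  end
end

p run_toy_filter({ "title" => "Paper" }).log
p run_toy_filter({ "institute" => "Somewhere" }).log
```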
Hi, I'm testing a metadata filter and I notice that the filter code is triggered 7 times.
---
title: 'My Title'
author: John Doe
...
Minimal **example**.
#!/usr/bin/env ruby
require 'paru/filter'
testkey = 'author'
Paru::Filter.run do
  if metadata.has_key?(testkey)
    nau = nil
    au = metadata[testkey]
    if au.is_a?(String)
      warn "It's a string"
      nau = [Hash["name" => au]]
    elsif au.is_a?(Array)
      warn "It's an array"
      if au[0].is_a?(String)
        nau = Array.new(au.length) {Hash.new}
        au.each_index {|i| nau[i] = Hash["name" => au[i]]}
      end
    elsif au.is_a?(Hash)
      warn "It's a hash"
    else
      warn "Who know's what it is?"
    end
    if not nau.nil?
      metadata[testkey] = nau
    end
  end
end
👉 pandoc -s -f markdown -t markdown -F testFilter test.md
It's a string
It's an array
It's an array
It's an array
It's an array
It's an array
It's an array
---
author:
- name: John Doe
title: My Title
---
Minimal **example**.
As you can see, the warn output is 7 lines long. Using byebug, when the metadata on line 32 is set I see:
[26, 35] in /Users/ian/.dotfiles/pandoc/filters/noop2
26: elsif au.is_a?(Hash)
27: warn "It's a hash"
28: else
29: warn "Who know's what it is?"
30: end
31: if not nau.nil?
=> 32: metadata[testkey] = nau
33: end
34: end
35: end
(byebug) n
[96, 105] in /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/paru-0.2.5f/lib/paru/filter/ast_manipulation.rb
96: # tree
97: #
98: # @yield node
99: def each_depth_first(&block)
100: yield self
=> 101: each {|child| child.each_depth_first(&block)} if has_children?
102: end
103:
104: end
105: end
And this leads to another run through the filter. The longer the document, the more superfluous loops through the filter.
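A sketch of why the count might be 7, assuming the filter block is evaluated once for every node that each_depth_first yields (which the debugger session suggests). The toy tree models "Minimal **example**." as Document > Para > [Str, Space, Strong > [Str], Str]; ToyNode is hypothetical and only copies the traversal shown above.

```ruby
# Minimal tree node with the same depth-first traversal as in the listing.
class ToyNode
  attr_reader :type, :children

  def initialize(type, children = [])
    @type = type
    @children = children
  end

  def has_children?
    !@children.empty?
  end

  def each(&block)
    @children.each(&block)
  end

  def each_depth_first(&block)
    yield self
    each { |child| child.each_depth_first(&block) } if has_children?
  end
end

# "Minimal **example**." as a toy AST.
TOY_DOCUMENT = ToyNode.new("Document", [
  ToyNode.new("Para", [
    ToyNode.new("Str"), ToyNode.new("Space"),
    ToyNode.new("Strong", [ToyNode.new("Str")]),
    ToyNode.new("Str")
  ])
])

# Collect the type of every node the traversal visits, in order.
def visited_types(root)
  types = []
  root.each_depth_first { |node| types << node.type }
  types
end

puts visited_types(TOY_DOCUMENT).length
```

The toy tree has seven nodes, which matches the seven warn lines, although that agreement alone does not prove this is the cause. Under the same assumption, code that should run once would need a guard, such as a flag set on its first run.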
The Selector class matcher uses the regular expression \.[a-zA-Z-]+, which prevents selectors with underscores. Technically, any character can be used in a CSS class name as long as it's escaped properly in the stylesheet rule; but the _ character is valid without being escaped, and indeed underscores are commonly used in modern CSS namespacing conventions such as BEM. There's an alternative regex in this StackOverflow answer.
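The effect can be reproduced in plain Ruby. In the sketch below the reported pattern is anchored with \A and \z so that whole selectors are tested (the anchors are an addition for the example), and WIDENED_CLASS is only an illustrative alternative, not the regex from the linked answer and not necessarily what paru should adopt.

```ruby
# Class matcher as quoted in the report, anchored to test whole selectors.
REPORTED_CLASS = /\A\.[a-zA-Z-]+\z/

# Illustrative widening (an assumption, not paru's actual fix): also accept
# underscores, and digits after the first character.
WIDENED_CLASS = /\A\.[a-zA-Z_-][a-zA-Z0-9_-]*\z/

def class_selector?(pattern, selector)
  pattern.match?(selector)
end

%w[.alert .classed_blockquote .block__element--modifier].each do |selector|
  puts "#{selector}: reported=#{class_selector?(REPORTED_CLASS, selector)} " \
       "widened=#{class_selector?(WIDENED_CLASS, selector)}"
end
```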
Pandoc 2 should be getting close to release (there is one outstanding issue, and documentation to finish).
pandoc-types has been upgraded to 1.17.1 — and thus paru seems currently broken with the latest Pandoc 2 nightlies:
👉 pandocomatic -b 'Lu.md'
pandoc --standalone \
--filter=/Users/ian/.pandoc/filters/removeHR \
--filter=/Users/ian/.pandoc/filters/authorRemoveHash \
--bibliography=/Users/ian/.pandoc/Core.json \
--csl=/Users/ian/.pandoc/csl/neuron.csl \
--from=markdown \
--to=docx \
--reference-doc=/Users/ian/.pandoc/templates/custom.docx \
--dpi=300 \
--output=./Lu.docx
Error running filter /Users/ian/.pandoc/filters/removeHR:
Error in $['pandoc-api-version'][3]: expected Int, encountered Null
Pandocomatic needed 1.2 seconds to convert 'Lu.md'.
I tried to find how this works, but can't see why this bug is triggering (how is pandoc-api-version generated?).
👉 pandoc --version
pandoc 2.0
Compiled with pandoc-types 1.17.1, texmath 0.9.4.1, skylighting 0.3.3
Default user data directory: /Users/ian/.pandoc
Copyright (C) 2006-2017 John MacFarlane
Web: http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
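A speculative sketch of how a null could end up at index 3: the earlier paru error message mentions pandoc-types 1.17.0.5, which has four version components, while 1.17.1 has three. If code somewhere reads a fixed four components, the missing one becomes nil and serialises to JSON null. This is plain Ruby with hypothetical helper names, not paru's actual code.

```ruby
require "json"

# pandoc-types 1.17.1 has three version components.
API_VERSION = "1.17.1".split(".").map(&:to_i)

# Speculative failure mode: reading a fixed four components pads with nil.
def padded_version(version)
  version.values_at(0, 1, 2, 3)
end

# Passing the array through unchanged keeps only the integers that exist.
def unchanged_version(version)
  version.dup
end

puts JSON.generate({ "pandoc-api-version" => padded_version(API_VERSION) })
puts JSON.generate({ "pandoc-api-version" => unchanged_version(API_VERSION) })
```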
Hi Huub, long time! Hope all is good with you.
There is a small naming bug with V2 Pandoc options:
https://pandoc.org/MANUAL.html#citation-rendering
Should be citation_abbreviations but in https://github.com/htdebeer/paru/blob/master/lib/paru/pandoc_options_version_2.yaml#L116 it is citation_abbreviation and so we get the error:
The pandoc option 'citation_abbreviations' (with value '/Users/ian/.pandoc/cite-abbr.json') is not recognized by paru. This option is skipped.
I'll create a pull request...
👉 pandoc -h
pandoc [OPTIONS] [FILES]
...
--toc, --table-of-contents
--toc-depth=NUMBER
...
https://github.com/htdebeer/paru/blob/paru-for-pandoc2/lib/paru/pandoc_options_version_2.yaml
I'm not sure what the default value is...
Erro: JSON parse error: Error in $: Incompatible API versions: encoded with [1,20] but attempted to decode with [1,21].
/var/lib/gems/2.7.0/gems/paru-0.4.0.1/lib/paru/filter/metadata.rb:49:in `initialize': undefined method `empty?' for nil:NilClass (NoMethodError)
from /var/lib/gems/2.7.0/gems/paru-0.4.0.1/lib/paru/filter.rb:272:in `new'
from /var/lib/gems/2.7.0/gems/paru-0.4.0.1/lib/paru/filter.rb:272:in `filter'
from /var/lib/gems/2.7.0/gems/paru-0.4.0.1/lib/paru/filter.rb:244:in `run'
from ./filtro.rb:6:in `<main>'
Error running filter filtro.rb:
Filter returned error status 1
Can you help me?
I don't know if this is a bug or not but testing the "hello world" test script on Windows 10 it complains with this:
pandoc:
: openFile: invalid argument (Invalid argument)
Ruby version: ruby 2.4.0p0 (2016-12-24 revision 57164) [i386-mingw32]
Tell me if you need more information.
Thanks in advance!
Upgrade to pandoc 1.18. In particular process the added command line options and the changes to the JSON format
In previous versions I could delete metadata entries by typing:
metadata.delete("date")
As of version paru-0.2.4.6 it does not work anymore and I get:
/var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/meta_map.rb:185:in `select': undefined method `has_key?' for []:Array (NoMethodError)
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/meta_map.rb:139:in `has?'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/meta_map.rb:150:in `delete'
from bin/removeMetadataFilter.rb:6:in `block in <main>'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:268:in `instance_eval'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:268:in `block in filter'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:101:in `each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `block in each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:104:in `block in each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/node.rb:103:in `each'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter/ast_manipulation.rb:102:in `each_depth_first'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:266:in `filter'
from /var/lib/gems/2.3.0/gems/paru-0.2.4.8/lib/paru/filter.rb:228:in `run'
from bin/removeMetadataFilter.rb:4:in
Of course I’m not 100% sure if this is a correct way of removing metadata (I have metadata in latex/pdf output but remove them for HTML output)
I have attached a test example - you can execute:
pandoc "test.markdown" --filter testFilter.rb -o test.html
Hi after a long time!
I'm stuck trying to capture this block of paragraphs to generate custom HTML output. For example, I want to capture all the markdown code between :::alert and :::
:::alert
A simple **alert**
with a few paragraphs and...
other stuff like lists:
* One
* Two
:::
and produce some custom HTML output like this:
<div class="alert">
<p>A simple <strong>alert</strong></p>
<p>with a few paragraphs and...</p>
<p>other stuff like lists:</p>
<ul><li>One</li><li>Two</li></ul>
</div>
Is that possible with Paru?
Thanks in advance!
I found this Paru documentation (https://heerdebeer.org/Software/markdown/paru/#frequently-asked-questions) very helpful, but I'm stuck when extracting the metadata from the markdown file (section 2.1)
$ ./pandoc2yaml.rb:32:in `values_at': no implicit conversion of String into Integer (TypeError)
I googled a bit (https://stackoverflow.com/questions/20790499/no-implicit-conversion-of-string-into-integer-typeerror) but I didn't find a way to work around that issue.
I'm having this issue with paru:
C:/abc/methods.rb:44:in `block in gen_epub': undefined method
`epub_stylesheet' for #<Paru::Pandoc:0x0000000002a08be0> (NoMethodError)
Did you mean? epub_chapter_level
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/paru-0.3.0.0/lib/paru/pandoc.rb:137:in `instance_eval'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/paru-0.3.0.0/lib/paru/pandoc.rb:137:in `configure'
from C:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/paru-0.3.0.0/lib/paru/pandoc.rb:107:in `initialize'
What is going on here?
When I try to run the filter
#!/usr/bin/env ruby
require "paru/filter"
Paru::Filter.run do
  with "Code" do |n|
    warn n.string
  end
end
on the text
This is a *line* with some `code` in it
I get the error
selector.rb:105:in `expect_pandoc_type': Expected a Pandoc type, got 'Code' instead (Paru::SelectorParseError)
Using the pattern suggested in #54, I have setup a filter to be able to inject an HTML class to a blockquote element in HTML.
filter.rb:
require 'paru/filter'
require 'paru/pandoc'
def html_convert(string)
  Paru::Pandoc.new do
    from 'markdown'
    to 'html'
  end << string
end
def classed_blockquote(blockquote_class, contents)
  Paru::PandocFilter::RawBlock.new([
    'html',
    "<blockquote class='#{blockquote_class}'>\n#{html_convert(contents)}</blockquote>"
  ])
end
Paru::Filter.run do
  with 'Div.classed_blockquote' do |div|
    content = div.inner_markdown
    blockquote_class = div.attr['blockquote_class']
    div.parent.replace(div, classed_blockquote(blockquote_class, content))
  end
end
The intention here, as an example, would be to apply a class which is related to CSS presentation of poetry. Given the following markdown:
test.md
:::{.classed_blockquote blockquote_class='hanging_indent'}
‘Let 'em talk 'bout what they think they see
Let ’em talk 'bout how they see us be
'Cause baby, we got nothin' to prove
We earned our scars and put in our time
:::
I expect this output:
<blockquote class='hanging_indent'>
<p>‘Let ’em talk ’bout what they think they see</p>
<p>Let ’em talk ’bout how they see us be</p>
<p>‘Cause baby, we got nothin’ to prove</p>
<p>We earned our scars and put in our time</p>\n</blockquote>
However, I think related to my use of inner_markdown to access the node's children, the result I'm getting turns the initial quote mark the other way around, no matter what I do, even if I escape it as "\‘":
<blockquote class='hanging_indent'>
<p>’Let ’em talk ’bout what they think they see</p>
<p>Let ’em talk ’bout how they see us be</p>
<p>‘Cause baby, we got nothin’ to prove</p>
<p>We earned our scars and put in our time</p>\n</blockquote>
I've tried a few ways of individually parsing the nodes along the lines of the below snippet, as I thought perhaps due to the nesting I was confusing pandoc, but still suffer from the same issue.
children = div.children
children.each do |node|
  html_convert(node.inner_markdown)
end
Any time I attempt to convert a node to HTML from within a filter, the quote mark ends up turned the wrong way around. This doesn't happen if I simply run the markdown above without the filter (though of course the output paragraphs are then wrapped in <div class="classed_blockquote" data-blockquote_class="hanging_indent"> instead).
I'm obviously approaching this in a way that is confusing pandoc's implementation of --smart, likely due to the nesting of elements or the use of 'inner_markdown', but I'm not sure how else to access a node's contents.
Should I be individually converting them to_ast/json and then transforming that into my result HTML?
Hi!
That's not an issue. I started to write my own filter based on this: https://github.com/jdittrich/SplitMarkdownFilter/blob/master/writeSplitPandocJSON.js The problem is that my Ruby skills are very limited. At the moment I'm stuck figuring out how to capture, with Paru, all the markdown content between level one headers.
I do appreciate any help.
Hello,
I just discovered this neat project and will look into it more closely in the future. I already have one question: is it possible to use paru filters with the original pandoc command, e.g. pandoc --filter=my-paru-filter.rb?
Hello,
you provide the following example code:
Paru::Filter.run do
  metadata.delete "pandoc"
end
I added a print statement and noted that this line is actually called many times. It would be nice if it could be executed only once, and if one could control whether it is executed before or after the other with "selector" blocks.
It is not always possible to add or change the metadata in a filter. For example, I can delete a key-value from a MetaMap, but I cannot add one. Adding a simple string involves creating a MetaString, which is more complex than just setting a string. And I am not sure how to set MetaInline or MetaBlock easily, if at all.
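To make the request concrete, here is a sketch of the kind of conversion that would make setting metadata easier: turning plain Ruby values into pandoc's JSON metadata representation (MetaString, MetaBool, MetaList, MetaMap). The helper to_meta is hypothetical and not part of paru's API, and it does not cover MetaInlines or MetaBlocks.

```ruby
require "json"

# Convert a plain Ruby value into pandoc's JSON metadata representation.
def to_meta(value)
  case value
  when true, false
    { "t" => "MetaBool", "c" => value }
  when Hash
    { "t" => "MetaMap", "c" => value.map { |k, v| [k.to_s, to_meta(v)] }.to_h }
  when Array
    { "t" => "MetaList", "c" => value.map { |v| to_meta(v) } }
  else
    # Everything else is stored as a plain string.
    { "t" => "MetaString", "c" => value.to_s }
  end
end

puts JSON.generate(to_meta({ "author" => [{ "name" => "John Doe" }] }))
```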
Hello,
I want to write a filter to replace keywords with variables. Keywords use the same syntax as Pandoc template variables, i.e. $keyword$.
My filter-replace.rb:
#!/usr/bin/env ruby
require 'paru/filter'
Paru::Filter.run do
  with "Math" do |str|
    STDERR.puts str.inner_markdown.inspect
  end
end
My markdown file for testing:
# Title
In Paragraph: $replace.in_para$.
In Table
---------
$replace.in_table$
However, the selector filters only for the first key word in the paragraph, not the second in the table. How can I get both?
I'm writing a filter where I want to change my modification based on whether the pandoc output format is docx or latex. It would be good if pandocomatic could somehow pass this information to paru for use in filters. As far as I can tell you strip the pandocomatic_ field before invoking pandoc (I can't see it in metadata anyway). The easiest way to solve this is to keep pandocomatic_: and add the to: field from the pandocomatic.yaml template to the document metadata before pandoc gets it. That way a filter would have this info available to use.
Is it possible to choose a specific output format in the context of a filter? Let me explain: for some reason, I want to change the URI of all the images in a document, but only for the HTML output, not for the PDF output (keeping the original image URIs in that case).
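Pandoc passes the name of the output format to a JSON filter as its first command-line argument, so a filter script can branch on ARGV. A minimal sketch in plain Ruby follows; the helper name image_uri_for and the /web prefix are made up for illustration, and how best to combine this with a paru selector is left open.

```ruby
# Rewrite an image URI only when the target format is an HTML variant.
def image_uri_for(uri, target_format)
  html_formats = %w[html html4 html5]
  return uri unless html_formats.include?(target_format)

  File.join("/web", uri) # hypothetical location of web-sized images
end

# In a filter script pandoc supplies the format, e.g. "html" or "latex".
target_format = ARGV.first
puts image_uri_for("images/figure.png", target_format)
```

Note that for PDF output produced via LaTeX the format seen by the filter would be latex rather than pdf.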
Hello,
I experienced problems with paru because some packaged ruby version is too old and does not support require_relative. I think the minimum ruby version can be specified in the gemspec file.
According to https://www.rubydoc.info/gems/require_relative/1.0.3 , require_relative needs 1.9.2 or later.
The gem require_relative provides a backport
Hi, using the new metadata.yaml version of the add_today.rb, my pandocomatic compiles are hanging, and I have to do a CTRL+C to force break:
👉 pandocomatic --debug "SNN-Attention-V1.4.md"
^Cpandoc: Error running filter /Users/ian/.pandoc/filters/add_today.rb
user interrupt
/Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/paru-0.2.4.8/lib/paru/pandoc.rb:161:in `read': Interrupt
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/paru-0.2.4.8/lib/paru/pandoc.rb:161:in `block in convert'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/paru-0.2.4.8/lib/paru/pandoc.rb:158:in `popen'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/paru-0.2.4.8/lib/paru/pandoc.rb:158:in `convert'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_command.rb:157:in `pandoc'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_command.rb:92:in `convert_file'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_command.rb:59:in `run'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/command.rb:87:in `execute'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_multiple_command.rb:81:in `block in execute'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_multiple_command.rb:80:in `each'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/command/convert_file_multiple_command.rb:80:in `execute'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/lib/pandocomatic/pandocomatic.rb:108:in `run'
from /Users/ian/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/pandocomatic-0.1.4.1/bin/pandocomatic:3:in `<top (required)>'
from /Users/ian/.rbenv/versions/2.4.1/bin/pandocomatic:22:in `load'
from /Users/ian/.rbenv/versions/2.4.1/bin/pandocomatic:22:in `<main>'
other paru filters are working.
EDIT: this filter also contains a <<~ HEREDOC, but after changing it to <<- it still hangs, and this was tested on Ruby 2.4.1 (installed via rbenv)
#!/usr/bin/env ruby
## Add today's date to the metadata
require "paru/filter"
require "date"
Paru::Filter.run do
  metadata.yaml <<-YAML
---
date: #{Date.today.to_s}
...
  YAML
end
Issue created in reaction to the pandocomatic issue: Broken with Pandoc 2.
Currently pandoc 2 is in the making. This new major version promises some API changes. Although it might be a while before it is released, it is good to be prepared for when it is. Particularly because it is not unlikely that some users will keep on using the 1.x range while others start using the 2.x range.
I suggest adding an environment variable, PARU_PANDOC_PATH, that can be used to choose which version of pandoc to use. If no such environment variable exists, or if the path does not resolve, try to use the pandoc executable in the system's PATH. Paru's pandoc API will change with the pandoc version used: some pandoc CLI options in 1.x are not there in 2.x and vice versa.
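The lookup order suggested above could be sketched roughly as follows. This is only an illustration of the proposal, not paru's actual implementation: the helper name `resolve_pandoc_executable` and its `env` parameter are hypothetical, and only the variable name PARU_PANDOC_PATH comes from the suggestion itself.

```ruby
# Hypothetical helper illustrating the proposed lookup order; this is an
# assumption about how it could work, not code taken from paru.
#
# env is anything that responds to [] with string keys (ENV by default),
# which makes the helper easy to try out with a plain Hash.
def resolve_pandoc_executable(env = ENV)
  candidate = env["PARU_PANDOC_PATH"]

  # Use the configured path only if it points at an executable file.
  if candidate && !candidate.empty? &&
     File.file?(candidate) && File.executable?(candidate)
    candidate
  else
    # Otherwise fall back to the bare command name, so the operating
    # system resolves it through the PATH.
    "pandoc"
  end
end

puts resolve_pandoc_executable
```

Passing the environment in as a parameter keeps the fallback behaviour testable without touching the real environment; a caller that wants to support both the 1.x and 2.x ranges would still need to ask the resolved executable for its version separately.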
Trying to compile a document using Pandoc 2.10 and the new underline type, I am getting an error when running a Paru filter:
pandocomatic.yaml template:
templates:
  test:
    pandoc:
      from: markdown
      to: html5
      filter:
        - ./noop.rb
    metadata:
      lang: 'EN-GB'
noop.rb filter:
#!/usr/bin/env ruby
require 'paru/filter'
Paru::Filter.run do
stop!
end
test.md
---
title: "Underline test"
pandocomatic_:
use-template: test
---
# Abstract #
[Lørem ipsum dolør sit amet]{.underline} , eu ipsum movet vix, veniam låoreet posidonium te eøs, eæm in veri eirmod. Sed illum minimum at, est mægna alienum mentitum ne. Amet equidem sit ex. Ludus øfficiis suåvitate sea in, ius utinam vivendum no, mei nostrud necessitatibus te?
The error I get is the following:
➜ pandocomatic -b -c pandocomatic.yaml test.md
pandoc --from=markdown \
--to=html5 \
--filter=noop.rb
Error running filter noop.rb:
Error in $.blocks[1].c[0]: mempty
Error running pandoc => error while running:
pandoc --from=markdown \
--to=html5 \
--filter=noop.rb
Pandoc responded with:
Error running filter noop.rb:
Error in $.blocks[1].c[0]: mempty
If I change the class of the [inline]{} to something other than underline, then it compiles without issue...
➜ pandoc -v
pandoc 2.10
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: /Users/ian/.local/share/pandoc or /Users/ian/.pandoc
Copyright (C) 2006-2020 John MacFarlane
Web: https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
➜ pandocomatic -v
Pandocomatic version 0.2.7.2
© 2014—2020 Huub de Beer <[email protected]>
Pandocomatic is free software; pandocomatic is released under the GPLv3.
For more information about pandocomatic run 'pandocomatic --help' or read its
documentation at https://heerdebeer.org/Software/markdown/pandocomatic/.
➜ gem list | grep paru
paru (0.4.1)
Working with the insert_code_block.rb filter doesn't allow modifying the first block element of the inserted document. I attached a zip with a simple demo (it adds a silly text in capitals at the beginning of every paragraph). Check out the PDF to see the issue.
Hello,
I wrote a filter that replaces some ::paru paragraph with a markdown table. Preparing the table as markdown does not feel like a very clean solution.
Is there a way to let paru directly convert a Ruby 2D array/Hash to a table? Is there support to set a caption?
Best,
Robert
The pandoc option --pdf-engine-opt should be allowed to be used multiple times. Issue originates from pandocomatic issue
Hi,
I just wanted to say "Thank You" for this project. I just used it to implement an include filter for pandoc, and with paru it was a dead-simple task.
Thanks again, and keep up the good work!
Steffen
I'm not sure why, but in certain circumstances, a filter will fail to match an element that it should match.
For example, given this markdown source:
# Chapter One
## A location
This is chapter one. It has a bunch of text.
# Chapter Two
## Another location
This is chapter two. It has some more text.
## A Subsequent location
The end.
The following filter will fail to match the first two H2 headings:
require 'paru/filter'
Paru::Filter.run do
  with 'Header' do |header|
    warn "Header is #{header.inner_markdown}"
    if header.level == 1
      header.markdown = "\\chapter\*\{#{header.inner_markdown.strip}\}\\label\{#{header.attr.id}\}"
    end
    if header.level == 2
      header.markdown = "\\section\*\{#{header.inner_markdown.strip}\}\\label\{#{header.attr.id}\}"
    end
    if header.level == 3
      header.markdown = "\\subsection\*\{#{header.inner_markdown.strip}\}\\label\{#{header.attr.id}\}"
    end
  end
end
I guess the manipulation of a given node's markdown is resulting in the following node being skipped over by the filter, as the nodes only seem to be passed over when they are immediately following another node that has matched.
It's definitely related to the 'markdown' method manipulation, as if I simply run a filter like this one below, every heading node is listed:
Paru::Filter.run do
  with 'Header' do |header|
    warn "Header is #{header.inner_markdown}"
  end
end
At the moment I'm resorting to a relatively hacky check of the type of the next node in the index, and if it is also a Heading I manipulate it as well. But ideally I would expect this filter to match every heading node in the document.
Should I be manipulating the output of those nodes in a different way? Or is there a bug that results in a node being skipped over in the index when a node has been altered?
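For what it's worth, the pattern guessed at above (the node right after a modified one being passed over) is what plain Ruby does when a collection is changed while it is being iterated. The sketch below uses only an ordinary Array and made-up labels standing in for the nodes of the example document; it does not use paru and is not a confirmed diagnosis of what paru does internally. Both method names are hypothetical.

```ruby
# Illustration of a general Ruby hazard, NOT paru's code: removing the
# current element during Array#each shifts the remaining elements left,
# so the element that followed it is never yielded.
def visit_while_deleting(items)
  visited = []
  items.each do |item|
    visited << item
    # Stand-in for "replace this header": drop it from the list in place.
    items.delete(item) if item.start_with?("H")
  end
  visited
end

# Iterating over a snapshot (dup) keeps the walk stable even though the
# original list is still being modified.
def visit_over_snapshot(items)
  visited = []
  items.dup.each do |item|
    visited << item
    items.delete(item) if item.start_with?("H")
  end
  visited
end

# Made-up labels mirroring the structure of the markdown example.
document = ["H1 One", "H2 A", "Para 1", "H1 Two", "H2 B", "Para 2", "H2 C", "Para 3"]

p visit_while_deleting(document.dup)
p visit_over_snapshot(document.dup)
```

In the first walk the entries directly after a deleted header ("H2 A" and "H2 B") are never visited, while "H2 C", which follows a paragraph, is. That matches the symptom described, though whether paru's traversal behaves this way would need to be checked against its source.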