dcsunset / pandoc-include

A pandoc filter to allow file and header inclusion

License: MIT License

Python 70.63% Makefile 2.50% TeX 3.04% JavaScript 2.83% C++ 0.31% XSLT 7.64% C 10.75% Nix 2.28%

pandoc pandoc-filter markdown python

pandoc-include's Issues

No longer automatically inserting pagebreaks

I recently changed laptop for writing and set up my writing environment, now running pandoc-include 1.2.0 and pandoc 3.0.1.

Previously, when I included .md files in a main one, I didn't need to add page breaks between the chapters of my book, which are in separate .md files. Now I have to do this for it to work:

\pagebreak

!include forord.md

\pagebreak

!include 01.md

\pagebreak

Anything I have missed?

Invalid API Version

I have a test file that looks like this, note that it doesn't use any include syntax.

---
title: Title
---

# Header

Text

When I compile this using the filter like so

pandoc -t latex -f markdown --filter pandoc-include -o test.pdf test.md

I get the following error

Traceback (most recent call last):
  File "/home/rutrum/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/home/rutrum/.local/lib/python3.8/site-packages/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
  File "/home/rutrum/.local/lib/python3.8/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/home/rutrum/.local/lib/python3.8/site-packages/panflute/io.py", line 200, in run_filters
    doc = load(input_stream=input_stream)
  File "/home/rutrum/.local/lib/python3.8/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/home/rutrum/.local/lib/python3.8/site-packages/panflute/elements.py", line 1362, in from_json
    return Doc(*items, api_version=api, metadata=meta)
  File "/home/rutrum/.local/lib/python3.8/site-packages/panflute/elements.py", line 66, in __init__
    raise TypeError("invalid api version", api_version)
TypeError: ('invalid api version', [1, 17, 5, 4])
Error running filter pandoc-include:
Filter returned error status 1

I get similar errors when using the include syntax. I installed this using pip, and pip show pandoc-include reveals that I am using version 1.2.
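
The list in the TypeError is the `pandoc-api-version` field of the JSON AST that pandoc pipes into the filter, so the error points at a version mismatch between pandoc and the installed panflute rather than at the document. A minimal sketch for reading that field; the `minimum` default is an assumption for illustration and is not taken from panflute's source:

```python
import json


def api_version_of(ast_json: str) -> tuple:
    """Return the pandoc-api-version field of a pandoc JSON AST as a tuple."""
    doc = json.loads(ast_json)
    return tuple(doc["pandoc-api-version"])


def is_supported(version: tuple, minimum: tuple = (1, 22)) -> bool:
    """Compare against a hypothetical minimum; adjust it to your panflute release."""
    return version[:len(minimum)] >= minimum


# A hand-written AST header carrying the version from the traceback above.
sample = '{"pandoc-api-version": [1, 17, 5, 4], "meta": {}, "blocks": []}'
```

The AST for a real document can be produced with `pandoc -t json test.md` and passed to `api_version_of`.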

Non english characters are not processed correctly

If I have these spanish words in my main *.md file:
Descripción del producto
It gets correctly converted to PDF.

However, if these same words are included and hence processed by the pandoc-include filter, I get this result in the PDF:

DescripciÃ³n del producto

As you can see, that special "o" character got messed up. Any idea how to prevent this?

If this can work, it will save me a lot of time!

Thanks!
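
The garbled text is consistent with UTF-8 bytes being decoded as Latin-1 (or a similar single-byte codec), which would happen if the included file were read with a platform default encoding. A small sketch that reproduces the symptom; `read_utf8` is a hypothetical helper shown for contrast, not pandoc-include's own code:

```python
def mojibake(text: str) -> str:
    """Encode as UTF-8, then decode the bytes with the wrong codec (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def read_utf8(path: str) -> str:
    """Read a file with an explicit encoding instead of the platform default."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `mojibake("Descripción del producto")` gives the same string that appears in the PDF.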

Too strict on "File not found"

When attempting to include a file that doesn't exist, version 1.2.0 would simply print a warning:

pandoc-include/pandoc_include/main.py

Line 194 in a51b798

eprint('[Warn] included file not found: ' + name)

But version 1.2.1 throws an IOError, essentially aborting pandoc:

pandoc-include/pandoc_include/main.py

Line 261 in 1401013

raise IOError(f"Included file not found: {name}")

That's too harsh, in my opinion. For example, I was trying to render a file which is an "instruction" on how to use pandoc-include itself. So there would be a line there, like

!include <your_file>

And this line is now failing the whole rendering.
I believe a warning was more appropriate.
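
One way to reconcile both behaviours would be an opt-in strict mode. The sketch below is only an illustration of that idea; `read_include` and its `strict` parameter are hypothetical names, not part of pandoc-include:

```python
import sys


def read_include(name: str, strict: bool = False):
    """Return the file's text, or None when it is missing and strict is off."""
    try:
        with open(name, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        if strict:
            raise  # abort the run, as version 1.2.1 does
        # Otherwise warn on stderr and carry on, as version 1.2.0 did.
        print(f"[Warn] included file not found: {name}", file=sys.stderr)
        return None
```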

Using pandoc-include results in NotImplementedError

I installed pandoc-include through pip and added the executable to the path. Upon trying to use the filter I get this error:

Traceback (most recent call last):
  File "/Users/axelkennedal/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/Users/axelkennedal/.local/lib/python3.7/site-packages/pandoc_include.py", line 166, in main
    return pf.run_filter(action, doc=doc)
  File "/Users/axelkennedal/.local/lib/python3.7/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/Users/axelkennedal/.local/lib/python3.7/site-packages/panflute/io.py", line 197, in run_filters
    doc = load(input_stream=input_stream)
  File "/Users/axelkennedal/.local/lib/python3.7/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/Users/axelkennedal/miniconda3/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/Users/axelkennedal/miniconda3/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/Users/axelkennedal/miniconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/axelkennedal/miniconda3/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/Users/axelkennedal/.local/lib/python3.7/site-packages/panflute/elements.py", line 1375, in from_json
    raise NotImplementedError(f'Unknown tag: {tag}')
NotImplementedError: Unknown tag: Caption
Error running filter pandoc-include:
Filter returned error status 1

Is this a bug or an issue with my local installation? I use the filter like this in a bash script: pandoc ${DOCPATH} -o ${OUTPUTFILE} --filter pandoc-include. Compilation with pandoc works perfectly fine if I remove the pandoc-include filter.

I'm on macOS 10.15.6 with Python 3.7.6.

invalid api version?

I am trying to run the example file provided in the project homepage and I get this error:

Traceback (most recent call last):
  File "/home/NAME/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/home/NAME/.local/lib/python3.10/site-packages/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
  File "/home/NAME/.local/lib/python3.10/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/home/NAME/.local/lib/python3.10/site-packages/panflute/io.py", line 200, in run_filters
    doc = load(input_stream=input_stream)
  File "/home/NAME/.local/lib/python3.10/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/home/NAME/.local/lib/python3.10/site-packages/panflute/elements.py", line 1413, in from_json
    return Doc(*items, api_version=api, metadata=meta)
  File "/home/NAME/.local/lib/python3.10/site-packages/panflute/elements.py", line 66, in __init__
    raise TypeError("invalid api version", api_version)
TypeError: ('invalid api version', [1, 20])
Error running filter pandoc-include:
Filter returned error status 1

The installation is local (I edited the real folder name for security). The problem seems to be an invalid api version; I am almost illiterate insofar as Python is concerned and I ask your kind help.

Pandoc version: 2.9.2.1-3ubuntu2.
I am on Linux Mint 21.
pandoc-include version: 1.2.0

This filter looks excellent and I would really like to be able to use it.

Thank you.

reST directives are not executed in the included files

Hi,

I have two reST files, let's call them A and B:

A.rst :

Heading
---------- 

.. figure:: imageA.png

$include B.rst

B.rst :

.. figure:: image.png

The .. figure line is inserted verbatim in the output document, instead of being replaced by an image.

I'm new to pandoc, and I assume this issue arises because the filters run after the reST directives have been processed. Is there any workaround for this ?

Thanks !

Could not find executable pandoc-include

Hey @DCsunset ,

Thanks for this cool Pandoc feature. While trying it on Windows 10, running pandoc paper.md --filter pandoc-include -o output.pdf gives me:

Error running filter pandoc-include:
Could not find executable pandoc-include

My paper.md file follows the same syntax as the example provided, although it seems a binary problem. If I search the other Pandoc filters I use, they're all in the Scripts/ folder inside Anaconda. However, I cannot find pandoc-include.

where pandoc-*
...\Anaconda3\Scripts\pandoc-citeproc.exe
...\Anaconda3\Scripts\pandoc-eqnos.exe
...\Anaconda3\Scripts\pandoc-fignos.exe
...\Anaconda3\Scripts\pandoc-tablenos.exe

Any ideas? Thanks in advance!

Error: invalid api version

Hi there,

I've tried to run a very simple test but I got the error below.

> python -V
Python 3.9.2
> pip -V
pip 22.3.1 from /home/souto/.local/lib/python3.9/site-packages/pip (python 3.9

> pip install --user pandoc-include
Collecting pandoc-include
  Downloading pandoc_include-1.2.0-py3-none-any.whl (10 kB)
Collecting panflute>=2.0.5
  Downloading panflute-2.2.3-py3-none-any.whl (36 kB)
Collecting natsort>=7
  Downloading natsort-8.2.0-py3-none-any.whl (37 kB)
Requirement already satisfied: pyyaml<7,>=3 in /home/souto/.local/lib/python3.9/site-packages (from panflute>=2.0.5->pandoc-include) (6.0)
Requirement already satisfied: click<9,>=6 in /home/souto/.local/lib/python3.9/site-packages (from panflute>=2.0.5->pandoc-include) (8.1.3)
Installing collected packages: panflute, natsort, pandoc-include
Successfully installed natsort-8.2.0 pandoc-include-1.2.0 panflute-2.2.3

> pip show pandoc-include
Name: pandoc-include
Version: 1.2.0
Summary: Pandoc filter to allow file and header includes
Home-page: https://github.com/DCsunset/pandoc-include
Author: DCsunset
Author-email: [email protected]
License: MIT
Location: /home/souto/.local/lib/python3.9/site-packages
Requires: natsort, panflute
Required-by: 


> cat snippet.md             
## Hello world
> cat main.md 
# Main

!include snippet.md

> pandoc main.md --filter pandoc-include -o output.pdf
Traceback (most recent call last):
  File "/home/souto/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/home/souto/.local/lib/python3.9/site-packages/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
  File "/home/souto/.local/lib/python3.9/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/home/souto/.local/lib/python3.9/site-packages/panflute/io.py", line 200, in run_filters
    doc = load(input_stream=input_stream)
  File "/home/souto/.local/lib/python3.9/site-packages/panflute/io.py", line 58, in load
    doc = json.load(input_stream, object_hook=from_json)
  File "/usr/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.9/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/home/souto/.local/lib/python3.9/site-packages/panflute/elements.py", line 1362, in from_json
    return Doc(*items, api_version=api, metadata=meta)
  File "/home/souto/.local/lib/python3.9/site-packages/panflute/elements.py", line 66, in __init__
    raise TypeError("invalid api version", api_version)
TypeError: ('invalid api version', [1, 20])
Error running filter pandoc-include:
Filter returned error status 1

I'd be grateful for any tips.
Cheers, Manuel

(Feature request) Include only parts of files

It is my understanding that pandoc-include necessarily includes the included file in its entirety.

pandoc-include-code can include only parts of the linked files.

Adding such a feature to pandoc-include would be absolutely amazing, but I don't have a good sense of how feasible that is.
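
As a rough idea of the simplest form such a feature could take, here is a sketch of a line-range selection applied to the text of an included file. The function and its `start`/`end` parameters are hypothetical and do not correspond to existing pandoc-include options:

```python
def select_lines(text: str, start: int = 1, end=None) -> str:
    """Keep lines start..end (1-based, inclusive); end=None runs to the last line."""
    lines = text.splitlines()
    return "\n".join(lines[start - 1:end])
```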

support mdbook include syntax

It would be nice to have an option to support the mdbook include syntax,
that'd help portability of markdown documents across tools
(e.g. using mdbook to generate html and pandoc to generate pdf from the same markdown files)
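
Until such an option exists, a preprocessing step could translate the syntax. The sketch below assumes the plain whole-file mdbook form `{{#include path}}` and deliberately leaves ranged forms (with `:` after the path) untouched; the function name is made up for illustration:

```python
import re

# Matches only the plain whole-file form, e.g. {{#include chapter.md}}.
MDBOOK_INCLUDE = re.compile(r"\{\{\s*#include\s+([^\s:}]+)\s*\}\}")


def mdbook_to_pandoc_include(text: str) -> str:
    """Rewrite mdbook whole-file includes as pandoc-include's !include lines."""
    return MDBOOK_INCLUDE.sub(lambda m: f"!include {m.group(1)}", text)
```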

files with _ (underscore) in their name

Includes work fine with normal filenames, but

```py
!include sound/__init__.py
```

does nothing.

add id-prefix option

Add an id-prefix option for included files, that'd help to avoid conflicting heading IDs from multiple files.
(see the --id-prefix command line option and the identifier-prefix in YAML)

This could be either set explicitly or optionally auto-generated based on the basename of the included path
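
The auto-generated variant could look roughly like this. It is a sketch only: both function names are hypothetical, and lower-casing the basename is an assumption about what a sensible prefix would be:

```python
import os


def id_prefix_for(path: str) -> str:
    """Derive a heading-ID prefix from the included file's basename."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.lower() + "-"


def prefixed(identifier: str, path: str) -> str:
    """Apply the derived prefix to one heading identifier."""
    return id_prefix_for(path) + identifier
```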

newlines in src do not work

When my included source code contained a newline I got a backtrace:

Traceback (most recent call last):
  File "/usr/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
...
  File "/usr/lib/python3.10/site-packages/pandoc_include/main.py", line 309, in action
    codes.append(read_file(fn, config))
  File "/usr/lib/python3.10/site-packages/pandoc_include/main.py", line 160, in read_file
    content = "\n".join(dedent(content, config["dedent"]))
TypeError: sequence item 4: expected str instance, NoneType found
Error running filter pandoc-include:
Filter returned error status 1

I fixed this by adding a check in removeLeadingWhitespaces():

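import re


def removeLeadingWhitespaces(s, num):
    # Blank or whitespace-only lines have nothing to dedent; return them
    # unchanged so the caller never receives None.
    if not s.strip():
        return s
    # Position of the first non-whitespace character.
    m = re.search(r"\S", s)
    if m is None:
        return s
    pos = m.start()
    # A negative num strips all leading whitespace; otherwise strip at most num.
    if num < 0:
        return s[pos:]
    return s[min(pos, num):]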

Recursive includes not working

I have a master.md file which has !include statements for dep1.md and dep2.md. dep2.md also has !include statements. When I run

$ pandoc master.md --filter pandoc-include -o processed.md -t markdown -s

the include statements in dep2.md are not processed. Documentation says that recursive includes are supported since v0.3.1. I am on v0.3.2. Is there anything else needed to make recursive imports work?
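
For reference, this is the behaviour being expected, sketched at plain-text level rather than on the pandoc AST. It is not how the filter itself works; `expand_includes` is a hypothetical helper that resolves names relative to the including file and guards against cycles:

```python
import os


def expand_includes(path: str, seen: frozenset = frozenset()) -> str:
    """Expand `!include name` lines recursively, relative to the including file."""
    real = os.path.realpath(path)
    if real in seen:
        raise ValueError(f"circular include: {path}")
    out = []
    with open(real, encoding="utf-8") as handle:
        for line in handle.read().splitlines():
            if line.startswith("!include "):
                name = line[len("!include "):].strip()
                child = os.path.join(os.path.dirname(real), name)
                # Recurse so that includes inside the child are expanded too.
                out.append(expand_includes(child, seen | {real}))
            else:
                out.append(line)
    return "\n".join(out)
```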

Parameterized include

Hi,

Is it possible to parameterize include ?

Our use case is to include a customized code block.

Maybe is there a workaround ?

Regards,
Étienne

Possible to use within code blocks?

Just wanted to ask if there's a way (or alternative syntax) to use pandoc-include to process include statements within code blocks, a la the below?

```c
!include foo.c
```

We noticed that, by default, the above simply outputs !include foo.c literally, as in:

<div class="sourceCode" id="cb3"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>!include foo.c</span></code></pre>

Many thanks!

Any way to pass options to the executed pandoc?

I execute pandoc with several options to control the resulting format, etc.

So far, however, I haven't found any way to pass the same options to the executed pandoc via pandoc-include.

This causes inconsistencies in the resulting output between the including and the included files.

It would be extremely useful if I could list options to be passed to the child process in the header of files to be included, for example.

Bug with code blocks (containing tab?)

The following markdown file:

```
test
```

when compiled with

pandoc --filter pandoc-include test.md -o test.html

returns

Traceback (most recent call last):
  File "~/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 380, in main
    return pf.run_filter(action, doc=doc)
  File "~/.local/lib/python3.7/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "~/.local/lib/python3.7/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 363, in action
    includeType, name, config = is_code_include(elem)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 98, in is_code_include
    value, name, config = is_include_line(new_elem)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 67, in is_include_line
    if (len(elem.content) not in [3, 4]) \
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 106, in content
    return self._content
AttributeError: 'CodeBlock' object has no attribute '_content'
Error running filter pandoc-include:
Filter returned error status 1

Note that the file does not perform any actual inclusion and compiles just fine without the filter.

Raw latex in yaml headers

I have an issue when using raw Latex in the document's yaml header:

---
title: Title
date: \today{}
---

Here is a file in the same directory:

!include file.md

Yields the following pdf:

This is not a huge problem on its own, but header-includes also doesn't work, which is a bit more of an issue.

Any thoughts?

Could not find executable pandoc-include

I know that Issue #1 seems very similar to this, however I have followed the solution there and it does not work for me.

Problem

When trying to use pandoc-include like pandoc .\Book.md --filter pandoc-include -o book.pdf I get the error
Error running filter pandoc-include: Could not find executable pandoc-include

Background

I have done the following:

Install Python from https://www.python.org/downloads/
Run pip install --upgrade pandoc, pip install pandoc-include
Moved pandoc-include from the Roaming folder it was installed in (C:\Users\myuser\AppData\Roaming\Python\Python311\site-packages), to the same folder as where pandoc is installed in (C:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages)
Made sure that pandoc-include is in the path variable: running echo $env:PATH returns my path, which contains C:\Users\myuser\AppData\Local\Programs\Python\Python311\Lib\site-packages

My pandoc version is 2.3 and pandoc-include is 1.2.0.
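
pandoc looks for an executable named pandoc-include on the PATH, and pip normally places that launcher in a scripts directory (Scripts on Windows) rather than in site-packages, so adding site-packages to the PATH is unlikely to help. A small sketch for checking both, using only the standard library; note that `sysconfig.get_path("scripts")` reports the default install scheme, and a `--user` install uses a different directory under the path printed by `python -m site --user-base`:

```python
import shutil
import sysconfig


def find_filter(name: str = "pandoc-include"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


def scripts_dir() -> str:
    """Directory where this Python installs console scripts by default."""
    return sysconfig.get_path("scripts")
```

If `find_filter()` returns None, the directory containing the pandoc-include launcher is the one that needs to be on the PATH.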

Comments in code blocks returns error

It seems as if the issue with code blocks runs deeper than expected. The filter returns an error when the code block contains a comment with too many hash characters.

# This is a title

This is some text.

```python
################## Comment ##################
```

Compiling this with the filter active gives the following error:

> pandoc -f markdown -t latex test.md -o test.pdf --filter pandoc-include
TypeError: Header level not between 1 and 10
Error running filter pandoc-include:
Filter returned error status 1

Without the filter it compiles just fine. It seems like the filter handles the code blocks like separate markdown files.
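
A cheap textual pre-check before any Markdown parsing would avoid treating arbitrary code as a document. The following is a sketch of that idea, not the project's actual fix; `looks_like_include` is a hypothetical helper, and requiring exactly one non-empty line is an assumption about what a code include looks like:

```python
def looks_like_include(code_text: str) -> bool:
    """True only when the block's sole non-empty line starts with an include marker."""
    lines = [line for line in code_text.splitlines() if line.strip()]
    return len(lines) == 1 and lines[0].startswith(("!include", "$include"))
```

Only blocks that pass this check would then be handed to the Markdown parser; everything else would be left as ordinary code.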

Traceback

Traceback (most recent call last):
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/bin/pandoc-include", line 10, in <module>
    sys.exit(main())
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/pandoc_include/main.py", line 382, in main
    return pf.run_filter(action, doc=doc)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/pandoc_include/main.py", line 365, in action
    includeType, name, config = is_code_include(elem)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/pandoc_include/main.py", line 99, in is_code_include
    new_elem = pf.convert_text(elem.text)[0]
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/tools.py", line 459, in convert_text
    out = json.loads(out, object_hook=from_json)
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/elements.py", line 1371, in from_json
    return _res_func[tag](c)
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/elements.py", line 1282, in <lambda>
    'Header': lambda c: Header(
  File "/mnt/c/Users/Tom Van Eyck/Documents/KULeuven/Studentenjob/documentation/venv/lib/python3.8/site-packages/panflute/elements.py", line 379, in __init__
    raise TypeError('Header level not between 1 and 10')

Adjust section level in included markdown files

Consider the following markdown

# Software components
## Platform

!include J1939Tp/J1939Tp.md

J1939Tp.md:

# J1939Tp

Implements the Transport Protocol (TP) 
.
.
.

Then I would like the heading J1939Tp to be on level 3 in the merged output. "Software components" is level 1, "Platform" is level 2. The filter should know the heading level at the point of include and adjust heading levels in the included file accordingly. This makes it possible to reuse included .md files in several documents.
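
The requested adjustment can be sketched at text level as follows. This is an illustration under simplifying assumptions (ATX headings only, ``` fences only, levels clamped at 6); `shift_headings` is a hypothetical name and the offset would be the heading level in effect at the include point:

```python
import re

ATX_HEADING = re.compile(r"^(#{1,6})(\s+\S.*)$")


def shift_headings(markdown: str, offset: int) -> str:
    """Deepen ATX headings by `offset` levels, leaving fenced code untouched."""
    out, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else ATX_HEADING.match(line)
        if match:
            level = min(len(match.group(1)) + offset, 6)  # clamp at level 6
            line = "#" * level + match.group(2)
        out.append(line)
    return "\n".join(out)
```

For the example above, `shift_headings(text, 2)` would turn `# J1939Tp` into `### J1939Tp`.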

Included figures with relative paths are not found

Hi,

I have the following folder structure and files:

.
├── A.rst
├── included 
│   ├── B.rst
│   ├── diagram.jpg

A.rst:

Heading
----------

$include included/B.rst

B.rst:

.. figure:: imageB.png

The figure in B.rst is included via a relative path (relative to B). When B is included into A, this path is included as-is and becomes invalid:

> pandoc  --filter=pandoc-include -f rst -t pdf A.rst
[WARNING] Could not fetch resource diagram.jpg: replacing image with description

For that use case to work, I guess the $include should rewrite the paths to the figures and prepend the path from A to B, so that the relative paths become relative to A. Another option would be to convert them to absolute paths during the include, although I'm not sure how that fits into the overall pandoc-filter philosophy.
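
The first option, prepending the path from A to B, amounts to the following path arithmetic. It is a sketch with a hypothetical helper name, using forward-slash paths as they appear in the documents:

```python
import posixpath


def rebase(included_file: str, resource: str) -> str:
    """Turn a path written relative to the included file into one relative to the includer."""
    if posixpath.isabs(resource) or "://" in resource:
        return resource  # absolute paths and URLs are left alone
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(included_file), resource)
    )
```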

Filename's underscore gets escaped?

This problem may be too niche to be investigated here, but I thought it could be useful to mention it in case others have the same issue.

When I compile this website online (using a GitHub Action), the "_" in file names gets escaped, which breaks the inclusion.

What I write: !include code/overflow_example.cs
Where the file is: code/overflow_example.cs

What github's action says:

 [WARNING] Included file not found: code/overflow\_example.cs

As a side note, if I get rid of the underscore but use quotes, that is, if I write !include "code/overflowExample.cs" (using quotes, as indicated at https://github.com/DCsunset/pandoc-include?tab=readme-ov-file#syntax), then the quotes themselves get modified and I get

[WARNING] Included file not found: “code/overflowExample.cs”

I discuss this issue further here: csci-1301/csci-1301.github.io#155 (comment). I say this issue may be too niche because I didn't see it when I compiled this source code locally on my machine (even though the GitHub Action uses exactly the same versions of pandoc and pandoc-include as I do, the most recent at the time of writing in both cases).
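
Both warnings look as if the file name passed through Markdown escaping and smart-quote typography before being used as a path. The cause is not confirmed here; the sketch below only shows a defensive normalisation that would undo those two transformations, with `clean_include_name` being a hypothetical helper:

```python
def clean_include_name(raw: str) -> str:
    """Undo backslash-escaped underscores and strip straight or curly quotes."""
    name = raw.replace("\\_", "_")
    return name.strip("\"'\u201c\u201d\u2018\u2019")
```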

Support for `sourcepos` extension recursively, to provide sync-preview.

I am working on a VS Code extension that provides a convenient, WYSIWYM/LaTeX-like way
to work on complex markdown documentation with inclusions
using pandoc-include (with my patch to support it).
It offers backward/forward syncing like this (small demo):

demo-screencast.mp4

The key feature from pandoc core is the sourcepos extension, which can add source-position attributes ("data-pos") to the output.
The problem is that pandoc-include does not pass the main document's filters on to included files.

So to support it, I wrote a couple of quick Lua filters:

desourcepos.lua: a Lua filter that removes sourcepos tags so that pandoc-include can read the "pure" inclusion command.
filename-sourcepos.lua: a Lua filter that adds the filename to sourcepos tags when files are included by pandoc-include.

I also added support for a PANDOC_SOURCE_FORMAT environment variable: if it exists, its value is used as the format for included files, with special processing that adds the included filename to the sourcepos tags.

First, before we get into PR, I want to discuss whether this approach looks good.

Include support for bibliography

Pandoc now allows native support for bibliography. However when a citation is made inside an included document, the citation is not rendered.

Is it possible to add support for bibliography ?

pandoc --citeproc --bibliography=doc.bib  main.md --filter pandoc-include -o main.html

with main.md

# Hello world

!include chap01.md

## References

::: {#refs}
:::

with chap01.md

# Chapter 1

[@MyBibliographicRef]
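
pandoc applies filters and citeproc in the order they appear on the command line, so in the command above the citations are resolved before the includes are expanded. Placing `--filter pandoc-include` ahead of `--citeproc` is therefore worth trying; whether it resolves this report has not been verified here. A sketch that builds the reordered command (the function name is made up):

```python
def pandoc_command(source: str, output: str, bibliography: str) -> list:
    """Build an argument list that runs pandoc-include before citeproc."""
    return [
        "pandoc", source,
        "--filter", "pandoc-include",   # expand includes first
        "--citeproc",                   # then resolve citations
        f"--bibliography={bibliography}",
        "-o", output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.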

pandoc-include crash when the first line of some code is indented

Consider the following MWE:

```
    I'm indented!
I'm not.
```

pandoc compiles it just fine to

<pre><code>    I&#39;m indented!
I&#39;m not.</code></pre>

but with the latest update, pandoc test.md --filter pandoc_include.py -o test.html crashes with

Traceback (most recent call last):
  File "pandoc_include.py", line 272, in <module>
    main()
  File "pandoc_include.py", line 268, in main
    return pf.run_filter(action, doc=doc)
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "pandoc_include.py", line 251, in action
    includeType, name, config = is_code_include(elem)
  File "pandoc_include.py", line 80, in is_code_include
    value, name, config = is_include_line(new_elem)
  File "pandoc_include.py", line 48, in is_include_line
    firstElem = elem.content[0]
  File "/home/caubert/.local/lib/python3.7/site-packages/panflute/base.py", line 106, in content
    return self._content
AttributeError: 'CodeBlock' object has no attribute '_content'
Error running filter pandoc_include.py:
Filter returned error status 1

This was not the case before this update, and can be resolved by removing the indentation from the first line.

Hard line breaks are not preserved

Pandoc offers an extension for managing hard line breaks within markdown: the hard_line_breaks extension. Unfortunately this extension is not applied to included files.

For example, issuing:

pandoc --filter pandoc-include --from=markdown+hard_line_breaks -o mainfile.pdf mainfile.md

with this file:

mainfile.md

This will support
hard line break.

<!--- this will not  -->
!include inclusion.md

All the hard breaks are skipped in the included file.
Also, HTML tags (such as <br />) seem to be stripped from the included file.

Implicit header references do not work, when file with header is included after the reference

Pandoc supports implicit header references, as described here.

But when using pandoc-include AND when the file with the referenced header is included after the file with the implicit reference, the reference is not rendered properly. Note that explicit references work fine in the same situation.

For example, consider the following markdown source:

---
title: Testing Implicit References
---

!include 10-chapter1.md

!include 20-chapter2.md

Here is 10-chapter1.md:

[This is Header]

[Link][This is Header]

[Explicit Link](#this-is-header)

Here is 20-chapter2.md:

# This is Header

If I convert it to HTML, I get this:

<p>[This is Header]</p>
<p>[Link][This is Header]</p>
<p><a href="#this-is-header">Explicit Link</a></p>
<h1 data-number="1" id="this-is-header"><span class="header-section-number">1</span> This is Header</h1>

Note that only the explicit reference has been properly created.
If I swap the MD files (so that 20-chapter2.md comes first), there will be no such issue.

Another issue with code blocks and horizontal lines

The following markdown file:

```
* * *
```

when compiled with

pandoc --filter pandoc-include test.md -o test.html

returns

Traceback (most recent call last):
  File "~/.local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 380, in main
    return pf.run_filter(action, doc=doc)
  File "~/.local/lib/python3.7/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "~/.local/lib/python3.7/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 363, in action
    includeType, name, config = is_code_include(elem)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 98, in is_code_include
    value, name, config = is_include_line(new_elem)
  File "~/.local/lib/python3.7/site-packages/pandoc_include/main.py", line 67, in is_include_line
    if (len(elem.content) not in [3, 4]) \
  File "~/.local/lib/python3.7/site-packages/panflute/base.py", line 106, in content
    return self._content
AttributeError: 'HorizontalRule' object has no attribute '_content'
Error running filter pandoc-include:
Filter returned error status 1

Note that the file does not perform any actual inclusion and compiles just fine without the filter.
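The traceback suggests that the include check reads `elem.content` on an element that has none. A minimal sketch of a defensive check follows. It uses stand-in classes instead of panflute so that it runs on its own; `FakePara`, `FakeHorizontalRule` and `looks_like_include_line` are hypothetical names, not the filter's actual code.

```python
class FakePara:
    """Stand-in (hypothetical) for an element that carries inline content."""

    def __init__(self, content):
        self._content = content

    @property
    def content(self):
        return self._content


class FakeHorizontalRule:
    """Stand-in (hypothetical) for an element without content.

    _content is never set, so reading .content raises AttributeError,
    which mirrors the last frame of the traceback above.
    """

    @property
    def content(self):
        return self._content


def looks_like_include_line(elem):
    """Return True only for elements with 3 or 4 content items."""
    # getattr with a default swallows the AttributeError raised by the property.
    content = getattr(elem, "content", None)
    if content is None:
        return False
    return len(content) in (3, 4)


print(looks_like_include_line(FakeHorizontalRule()))
```

An alternative would be an `isinstance` check against the element types that can hold an include line, which is stricter but needs the real panflute classes.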

Shared library error

Whenever I run pandoc with --filter pandoc-include I get an error about a shared library:

$ cat test.md | pandoc --filter pandoc-include
pandoc-include: error while loading shared libraries: libHSpandoc-types-1.17.5.1-8DdN87tlyJbAzvpbzH68uw-ghc8.4.3.so: cannot open shared object file: No such file or directory
Error running filter pandoc-include:
Filter returned error status 127

I do have this library on my system, but as a newer version:

$ locate libHSpandoc-types
/usr/lib/libHSpandoc-types-1.17.5.4-8Iu0ZQ3DY7ZCF41XD1ccvX-ghc8.6.5.so

This seems more likely to be a problem in my installation, although I'm not really sure how. I installed pandoc-include from the Arch Linux AUR (and pip, for that matter). I've already tried reinstalling a lot of the dependencies of this filter, in case something went wrong during linking.
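The missing file is a Haskell (GHC) shared library, which hints that the `pandoc-include` found on PATH may be a compiled binary rather than the Python script that pip installs. That is only a guess from the error message. A small sketch to check which executable is picked up and what kind of file it is (the helper names are made up for illustration):

```python
import shutil


def classify_header(head):
    """Classify an executable from its first bytes."""
    if head[:2] == b"#!":
        return "script"       # e.g. a pip-installed Python entry point
    if head[:4] == b"\x7fELF":
        return "elf-binary"   # a compiled Linux executable
    return "unknown"


def locate_and_describe(name="pandoc-include"):
    """Return (path, kind) for the first match on PATH, or None."""
    path = shutil.which(name)
    if path is None:
        return None
    with open(path, "rb") as handle:
        return path, classify_header(handle.read(4))


print(locate_and_describe())
```

If the result is an ELF binary, two packages with the same command name may be installed, and the order of directories in PATH decides which one pandoc runs.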

How to debug json error?

Hi,
I am using pandoc 2.11.1.1
and pandoc-include

pip show pandoc-include
Name: pandoc-include
Version: 0.8.4

I split a large markdown file that was generated from a docx and unfortunately pandoc-include fails with a json error. I checked the individual files, they all conform with utf8. So what could be the problem here?

The error message is attached below. Could it be that the pandoc update and the corresponding panflute update introduced incompatibilities with pandoc-include? I had to debug other panflute-based filters for that reason, but I have no idea how to handle the underlying JSON error.

Thanks for help

Regards
Peter.

  File "/usr/local/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pandoc_include.py", line 166, in main
    return pf.run_filter(action, doc=doc)
  File "/usr/local/lib/python3.9/site-packages/panflute/io.py", line 224, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/panflute/io.py", line 205, in run_filters
    doc = doc.walk(action, doc)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 264, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 262, in <genexpr>
    ans = ((item,) if type(item) != list else item for item in ans)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 259, in <genexpr>
    ans = (item.walk(action, doc) for item in obj)
  File "/usr/local/lib/python3.9/site-packages/panflute/base.py", line 275, in walk
    altered = action(self, doc)
  File "/usr/local/lib/python3.9/site-packages/pandoc_include.py", line 137, in action
    new_elems = pf.convert_text(
  File "/usr/local/lib/python3.9/site-packages/panflute/tools.py", line 393, in convert_text
    out = json.loads(out, object_hook=from_json)
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20078 (char 20077)
Error running filter pandoc-include:
Filter returned error status 1
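One way to narrow this down is to scan the split files for raw control characters, which a docx conversion can leave behind even when the files are valid UTF-8. Whether such characters are the actual cause here is an assumption. A minimal sketch (the function name `find_control_chars` is made up for illustration):

```python
def find_control_chars(text):
    """Return (line, column, codepoint) for each suspicious control character.

    Tabs and carriage returns are ignored; newlines only separate lines.
    """
    hits = []
    # split("\n") rather than splitlines(), because splitlines() would also
    # split on (and so hide) characters such as \x0b and \x0c.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) < 0x20 and ch not in "\t\r":
                hits.append((line_no, col_no, hex(ord(ch))))
    return hits


sample = "ok\nbad\x0bchar\n"
print(find_control_chars(sample))
```

To use it on the real files, read each one with `open(path, encoding="utf-8").read()` and pass the result to the function.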

Release 1.3.1 not available on PyPI

The latest release available on PyPI is 1.3.0. Version 1.3.1 is not available.

The release was not pushed by the latest GitHub Action run because the Action tried to push version 1.3.0, which already exists.

WARNING  Skipping pandoc_include-1.3.0-py3-none-any.whl because it appears to   
         already exist                                                          
WARNING  Skipping pandoc-include-1.3.0.tar.gz because it appears to already     
         exist

The error may come from the setup.py file, which still declares version 1.3.0:

pandoc-include/setup.py

Line 4 in 1d29355

version = '1.3.0'

Bumping this version could fix the issue.
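To catch this kind of mismatch before publishing, a release job could compare the tag with the version string in setup.py. A rough sketch, assuming tags are named like `v1.3.1` (the helper names and the tag format are assumptions, not the project's actual release setup):

```python
import re


def extract_setup_version(setup_source):
    """Return the first version = '...' value found in the source, or None."""
    match = re.search(
        r"""^\s*version\s*=\s*['"]([^'"]+)['"]""", setup_source, re.MULTILINE
    )
    return match.group(1) if match else None


def tag_matches_setup(tag, setup_source):
    """Check that a tag such as 'v1.3.1' agrees with the declared version."""
    wanted = tag[1:] if tag.startswith("v") else tag
    return extract_setup_version(setup_source) == wanted


source = "version = '1.3.0'\n"
print(tag_matches_setup("v1.3.1", source))
```

Failing the job when the check returns False would stop the upload before the package index reports the files as already existing.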

My original problem was that pandoc-include 1.3.0 is not compatible with the latest Pandoc release, 3.2.
Full error log if needed:

Traceback (most recent call last):
  File "/opt/homebrew/bin/pandoc-include", line 5, in <module>
    from pandoc_include.main import main
  File "/opt/homebrew/lib/python3.11/site-packages/pandoc_include/main.py", line 11, in <module>
    import panflute as pf
  File "/Users/user/Library/Python/3.11/lib/python/site-packages/panflute/__init__.py", line 40, in <module>
    from .autofilter import main, panfl, get_filter_dirs, stdio
  File "/Users/user/Library/Python/3.11/lib/python/site-packages/panflute/autofilter.py", line 13, in <module>
    import click
ModuleNotFoundError: No module named 'click'
Error running filter pandoc-include:
Filter returned error status 1

A workaround is to install with one of the following pip commands:

pip install --upgrade --force --no-cache git+https://github.com/DCsunset/[email protected]
# or current dev version from Readme:
pip install --upgrade --force --no-cache git+https://github.com/DCsunset/pandoc-include
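To confirm which modules are missing in the interpreter that runs the filter, a short check like the one below may help. The function name `missing_modules` is made up for illustration; `panflute` and `click` are taken from the traceback above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(["panflute", "click"]))
```

Run it with the same Python that owns the `pandoc-include` script (here, the one under /opt/homebrew), since the traceback mixes a Homebrew site-packages directory with a per-user one.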

Thank you for your work on this project.

I can open a PR if needed.

Including LaTeX usepackage in include-header results in an error

I have in my document

# Formatting Settings

documentclass: article
fontsize: 12pt
geometry: margin=1.0in
#geometry: "left=3cm,right=3cm,top=2cm,bottom=2cm"
header-includes:
  - \usepackage{times}
  - \usepackage[mathlines,displaymath]{lineno}
  - \linenumbers
urlcolor: blue

These settings work when placed directly in the document. When I move them into header.yaml and use include-header, I get the following error:

! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.55 \textbackslash
                    usepackage\{times\}
!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on /tmp/tex2pdf.-98c2dc4c9fc43ed6/input.log.
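The `\textbackslash usepackage\{times\}` in the log suggests that the header lines were treated as ordinary text and escaped, instead of being passed through as raw LaTeX. The sketch below only illustrates that kind of escaping; it is a simplified stand-in covering three characters, not Pandoc's actual writer, and the function name is made up.

```python
# Simplified escape table: only the characters relevant to this example.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
}


def escape_as_plain_text(text):
    """Escape text the way a LaTeX writer would treat non-raw content."""
    # A single pass per character, so inserted braces are not escaped again.
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_as_plain_text(r"\usepackage{times}"))
```

Once escaped like this, the command is typeset as text in the preamble, which would explain the "Missing \begin{document}" error.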