cysouw / pandoc-ling

Pandoc Lua filter for linguistic examples

License: Creative Commons Zero v1.0 Universal

Lua 100.00%

interlinear-gloss linguistic pandoc pandoc-filter

pandoc-ling's Issues

Grammaticality judgements not aligned in interlinear examples when targeting Linguex

When targeting Linguex, grammaticality judgements are not aligned in interlinear examples.

See the output for example (4.12c) on page 10 of https://github.com/cysouw/pandoc-ling/blob/main/docs/readme_linguex.pdf

I think it's because the filter is outputting the interlinear lines for Linguex using \gll, etc., with the judgment intervening.

pandoc-ling/docs/readme_linguex.tex

Lines 595 to 599 in a9eae71

    \b. Dutch (Germanic) Note the grammaticality judgement!
    \gll \textsuperscript{:--)}\emph{Deze} \emph{zin} \emph{is}
    \emph{(dit~is~test)} \emph{nederlands.} \\
    \textsc{dem} sentence \textsc{aux} ~ dutch. \\
    \glt `This sentence is dutch.'

However, the Linguex docs point out that judgments won't be aligned unless \exg., \ag., etc. are used instead of \ex. \gll (see the bottom of page 3 and the top of page 4 of their docs):

Likewise, writing \ex.\gll instead of \exg. will have the effect of not prefixing the grammaticality judgment.

I'm guessing, however, that the filter uses \gll in order to support preambles, since Linguex doesn't support them with \exg., etc. Is that right?

Question particle/marker `Q` is not rendered

I use the latest pandoc-ling 2a1e55a as of July 4, 2022. I found that the question particle/marker Q, which is listed in the Leipzig Glossing Rules, is not rendered, as shown below. More generally, the current implementation may not render any one-character abbreviation, such as Q, A, S, P, F or M.

I think the issue would be resolved if [%u%d][%u%d] in l.447 and l.455 of pandoc-ling.lua were replaced with a single [%u%d]. Would you mind pointing out whether any problem arises from this replacement, or why [%u%d][%u%d] was originally used there?

pandoc-ling/pandoc-ling.lua

Line 447 in 2a1e55a

for lower,upper in string.gmatch(s, "(.-)([%u%d][%u%d]+)") do

pandoc-ling/pandoc-ling.lua

Line 455 in 2a1e55a

for leftover in string.gmatch(s, "[%u%d][%u%d]+(.-[^%u%s])$") do
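The effect of the two-character minimum can be checked in plain Lua, outside pandoc. The sketch below is only an illustration: `upperRuns` is a hypothetical helper, not a function from pandoc-ling.lua, and it exercises only the first of the two patterns quoted above.

```lua
-- Hypothetical helper (not part of pandoc-ling.lua): collect every run of
-- uppercase letters/digits that a given Lua pattern captures in a gloss word.
local function upperRuns(s, pattern)
  local runs = {}
  for _, upper in string.gmatch(s, pattern) do
    runs[#runs + 1] = upper
  end
  return runs
end

local current  = "(.-)([%u%d][%u%d]+)" -- quoted from line 447: needs two or more characters
local proposed = "(.-)([%u%d]+)"       -- proposed change: one or more characters

local gloss = "picture-AFF.POL-Q"
print(table.concat(upperRuns(gloss, current), " "))   --> AFF POL
print(table.concat(upperRuns(gloss, proposed), " "))  --> AFF POL Q

-- Possible side effect of the proposed pattern: the initial capital of an
-- ordinary word is now captured as well.
print(table.concat(upperRuns("Dutch", proposed), " ")) --> D
```

The last line hints at one possible reason for the original two-character minimum: with a single `[%u%d]`, capitalised words and initials such as `T.` in a gloss line would presumably be treated as abbreviations too. Whether that is the actual motivation is a guess.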

MWE

::: {#Q-particle .ex formatGloss=true}
| **これは絵ですか？** `\\`{=latex}
| Kore=wa e-desu-ka?
| DEM=TOP picture-AFF.POL-Q
| Is this a picture?
:::

header-includes in the YAML header causes an error when exporting to LaTeX or pdf

When the YAML header contains header-includes, pandoc (2.13) cannot export to LaTeX or to pdf.
Exporting to docx works as expected.

Here is a minimal example and the command line run:

---
title: Testing pandoc-ling
header-includes: |
  \usepackage{hyperref}
---

# This is a test

::: ex
This is the most basic structure of a linguistic example.
:::

pandoc pandoc-ling_test.md -o pandoc-ling_test.tex --lua-filter pandoc-ling.lua

Links for cross-references with an explicit ID don't work with HTML output

With the following markdown from readme.md, the div's ID in the HTML output gets overwritten to ex4.13 instead of test.

::: {#test .ex}
This is a test
:::

pandoc-ling/docs/readme.html

Lines 722 to 731 in a9eae71

    <div id="ex4.13">
    <table class="linguistic-example">
    <tbody>
    <tr class="odd">
    <td class="linguistic-example-number" style="vertical-align: top;">(4.13)</td>
    <td class="linguistic-example-content" style="text-align: left;">This is a test</td>
    </tr>
    </tbody>
    </table>
    </div>

However, the cross-references using [@last], etc. maintain test as the ID to link to (the following is taken from readme.html, line 732):

<code>[@last]</code> will be formatted as <a href="#test">(4.13)</a>

Likewise, in line 733:

<code>[@last hA1l0]</code> will work also, leading to <a href="#test">(4.13 hA1l0)</a>

and line 717:

<code>[@test]</code>, leading to <a href="#test">(4.13)</a>

The ID for the div gets reset for HTML output in line 276 of pandoc-ling.lua

pandoc-ling/pandoc-ling.lua

Lines 268 to 277 in a9eae71

    -- reformat!
    local example
    if FORMAT:match "latex" or FORMAT:match "beamer" then
      example = texMakeExample(parsedDiv)
    else
      example = parsedDiv.examples
      example = pandocMakeExample(parsedDiv)
      example = pandoc.Div(example)
      example.attr = {id = "ex"..parsedDiv.number}
    end

Following the proposal that one should be able to link from an external source to something like webpage.url/#ex4.13, should the code that makes the cross-references for HTML also use #ex4.13 instead of #test as the link target?

pandoc-ling/pandoc-ling.lua

Lines 1538 to 1547 in a9eae71

    -- make the cross-reference
    if FORMAT:match "latex" then
      if latexPackage == "expex" then
        return pandoc.RawInline("latex", "(\\getref{"..id.."}"..suffix..")")
      else
        return pandoc.RawInline("latex", "(\\ref{"..id.."}"..suffix..")")
      end
    else
      return pandoc.Link("("..indexEx[id]..suffix..")", "#"..id)
    end
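One way to keep the div ID and the link target in sync would be to derive both from a single helper. The following is a minimal sketch in plain Lua, not the filter's actual code: `htmlAnchor` is a hypothetical function, and the idea that a user-supplied ID is available as a string (empty or nil when absent) is an assumption about the parsed example.

```lua
-- Hypothetical helper: choose one anchor for an example in HTML output.
-- userId: the ID written on the source div ("" or nil when none was given)
-- number: the example number as a string, e.g. "4.13"
local function htmlAnchor(userId, number)
  if userId ~= nil and userId ~= "" then
    return userId          -- keep an explicit ID such as "test"
  end
  return "ex" .. number    -- fall back to the generated form, e.g. "ex4.13"
end

-- The wrapping div and the cross-reference link would then share one value:
local anchor = htmlAnchor("test", "4.13")
print('<div id="' .. anchor .. '">')              --> <div id="test">
print('<a href="#' .. anchor .. '">(4.13)</a>')   --> <a href="#test">(4.13)</a>
```

The opposite choice, always using the `ex` plus number form, would work just as well, as long as the code that builds the div and the code that builds the link call the same helper.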

Subscripts in the source line of an interlinear example are not properly rendered

Professor Dr. @cysouw

I use the latest pandoc-ling 2a1e55a as of June 26, 2022. I want to add subscripts that indicate coindexation to the source line of an interlinear example, as shown below.

However, I found I cannot achieve this either using ~...~ notation of pandoc markdown or using `\textsubscript{...}`{=latex} notation, although these notations successfully work in the translation line.

Using `~...~` notation of pandoc markdown

I implemented the following notation but the subscripts are rendered as normal characters (e.g. not '_i' but 'i').

::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::

Using `\textsubscript{...}`{=latex} notation

The subscripts disappear when I add them using `\textsubscript{...}`{=latex} notation.

::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::

Would you mind telling me whether there is any workaround or what I can do to resolve the issue?

MWE

---
author: CLRR
to: pdf
title: Sub/Superscript workaround
latexPackage: gb4e
---

I want to add subscripts that indicate coindexation to the source line of an interlinear example, as shown below.

```{=latex}
\begin{exe} \judgewidth{}
  \ex [] { 
       Japanese coindex using subscript
  \gll \emph{Tarō\textsubscript{i}=wa} \emph{Jirō\textsubscript{j}=ga} \emph{zibun\textsubscript{i/j}=o} \emph{suki-da-to}
\emph{omot-te} \emph{i-ru.} \\
       T.=\textsc{top} J.=\textsc{nom} self=\textsc{acc}
like-\textsc{aff}-\textsc{quot} think-\textsc{cvb}
\textsc{prog}-\textsc{npst} \\
  \glt `Tarō\textsubscript{i} assumes that Jirō\textsubscript{j} likes
him\textsubscript{i}/himself\textsubscript{j}.' }
  \label{subscript-superscript-tex}
\end{exe}
```

However, using `~...~` notation in the source line fails to add subscripts.
The subscripts are rendered as normal characters
(e.g. not '~i~' but 'i'),
as demonstrated in [@subscript-superscript-md].

```
::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
```


::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::

Using `` `\textsubscript{i}`{=latex} `` notation in the source line also fails to add subscripts.
In this case, the subscripts disappear, as illustrated in [@subscript-superscript-tex].

```
::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
```

::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::

Interlinear examples without free translation

When the meaning of a sentence in an interlinear example is obvious from the glosses (or not important for the argument), I like to exclude the free translation. I currently approximate this with:

::: ex
| Dutch
| Het meisje dat over straat liep.
| The girl that on street walked.
|
:::

But then an empty free translation is included. This is visible with ExPex; note the space between the examples:

When I omit the empty last line, the glosses are used as a translation (not sure if this is a feature or a bug):

::: ex
| Dutch
| Het meisje dat over straat liep.
| The girl that on street walked.
:::

Would it be possible to omit the free translation when it's empty (first example) and/or unspecified (second example)? Or is there already another way to accomplish this?

I would be happy to try and implement this myself, if you let me know what the desired behaviour is.

Beamer frame attributes not always output when using pandoc-ling filter

Hello,

Thanks for this wonderful filter.

I was trying to create a slide show with Beamer using the pandoc-ling filter. However, I discovered that when using the filter, Beamer frame attributes don't always get passed through.

For example, given the following input:

---
title: In the morning
---

# In the morning {.t}
- Test list
- Test list

# Getting up {.t}

- Turn off alarm
- Get out of bed

If I run pandoc without the filter, for example pandoc -t beamer beamer-ling.md --pdf-engine=xelatex -o beamer.tex, I get the following tex output. Note the [t] option passed to the frame environments:

\begin{frame}[t]{In the morning}
\phantomsection\label{in-the-morning}
\begin{itemize}
\tightlist
\item
  Test list
\item
  Test list
\end{itemize}
\end{frame}

\begin{frame}[t]{Getting up}
\phantomsection\label{getting-up}
\begin{itemize}
\tightlist
\item
  Turn off alarm
\item
  Get out of bed
\end{itemize}
\end{frame}

However, when running pandoc with the filter (pandoc -t beamer beamer-ling.md --pdf-engine=xelatex --lua-filter ./pandoc-ling.lua -o beamer-ling.tex), even when not including any kind of numbered/glossed example, the [t] option is not included in the tex output for the frames:

\begin{frame}{In the morning}
\phantomsection\label{in-the-morning}
\begin{itemize}
\tightlist
\item
  Test list
\item
  Test list
\end{itemize}
\end{frame}

\begin{frame}{Getting up}
\phantomsection\label{getting-up}
\begin{itemize}
\tightlist
\item
  Turn off alarm
\item
  Get out of bed
\end{itemize}
\end{frame}

Any ideas?

Thanks again for this great filter!

One-line examples with a judgement cannot be processed when producing an HTML file

I am trying to produce an HTML file which contains one-line examples with a judgement. If I understand correctly, the following markdown syntax should work:

::: ex
^* This traditionally signals ungrammaticality.
:::

However, pandoc throws a Lua error when processing the markdown file, as follows:

Error running filter configuration/pandoc-ling/pandoc-ling.lua:
configuration/pandoc-ling/pandoc-ling.lua:607: attempt to index a nil value (field '?')
stack traceback:
	configuration/pandoc-ling/pandoc-ling.lua:607: in function 'pandocMakeSingle'
	configuration/pandoc-ling/pandoc-ling.lua:527: in function 'pandocMakeExample'
	configuration/pandoc-ling/pandoc-ling.lua:256: in function 'processDiv'
Error: pandoc document conversion failed with error 83

The problem looks similar to the issue at the following link, but I cannot work around it myself since I am not familiar with Lua.
https://stackoverflow.com/q/12153537/10215301

What am I missing...?

MWE

Here is an R Markdown example that uses the R package bookdown. The contents are exactly the same as in pandoc's markdown, except for what I wrote in the YAML section.

---
author: Masataka OGAWA
title: Using pandoc-ling for HTML

output: 
  bookdown::html_document2:
    pandoc_args:
      - --lua-filter=configuration/pandoc-ling/pandoc-ling.lua

  ## No problem with pdf-producing
  bookdown::pdf_document2:
    pandoc_args:
      - --lua-filter=configuration/pandoc-ling/pandoc-ling.lua
---

These two examples throw an error when I produce an HTML file:

```
Error running filter configuration/pandoc-ling/pandoc-ling.lua:
configuration/pandoc-ling/pandoc-ling.lua:607: attempt to index a nil value (field '?')
stack traceback:
	configuration/pandoc-ling/pandoc-ling.lua:607: in function 'pandocMakeSingle'
	configuration/pandoc-ling/pandoc-ling.lua:527: in function 'pandocMakeExample'
	configuration/pandoc-ling/pandoc-ling.lua:256: in function 'processDiv'
Error: pandoc document conversion failed with error 83
```

::: ex
^* This traditionally signals ungrammaticality.
:::

::: ex
a. ^* This traditionally signals ungrammaticality.
:::

<!-- 

This commented-out example works fine.

::: ex

a. ^* This traditionally signals ungrammaticality.
d. However, such long sequences sometimes lead to undesirable effects in the layout.

:::

-->

Note, however, that I can produce a PDF file from the Rmd file above without any problem, as shown in the following screen capture.

Environment

pandoc-ling: v1.4
pandoc: v2.11.4
OS: Windows 10 x64 (build 19042)

Option to output glosses without example number

Is it possible to use the filter to output aligned interlinear glosses without using an example number?

For example, with langsci-gb4e, I can use the following without an example environment and still produce aligned glosses:

    Dutch (Germanic)\\
    \gll Deze zin is in het nederlands. \\
          DEM sentence AUX in DET dutch. \\
    \glt This sentence is dutch.

I know this is a corner use case.

Currently, I'm working on a presentation with a slide where I want aligned glosses, but the example number isn't needed and doesn't make sense.

error running filter with recent pandoc

Using the filter with pandoc version 2.11.3.2 results in the following error, potentially related to jgm/pandoc#5331

Error running filter ./pandoc-ling.lua:
Could not get Meta value: Could not read value of key-value pair: Could not get MetaValue: Could not read list: Could not get Block value: Unknown block type: MetaBlocks

Fix: Changing line 83:

    tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("tex", s))

to the following fixes the problem, and seems to produce an identical result (I haven't tested this thoroughly):

    tmp[#tmp+1] = pandoc.RawBlock("tex", s)

Escape meta characters of LaTeX in "judgeOffset" to avoid the compilation failure

I found that PDF compilation via LaTeX halts when judgeMax (the longest grammaticality/acceptability judgement in the ex environment) contains one of LaTeX's special characters, such as #, % or &. As a result, the following two markdown examples fail to compile and throw an error: ! Illegal parameter number in definition of \@jwidth.

::: {.ex #grammaticality-judgement}
^# I'm not sure this is grammatical.
:::

::: {.ex #grammaticality-judgement}
a. ^# I'm not sure this is grammatical.
a. This is grammatical.
:::

I suspect that the lines that set judgeOffset in the texMakeGb4e and texMakeLangsci functions pass characters like #, % or & to LaTeX as they are (i.e. without escaping them), and that this causes the above-mentioned error.

Those lines can be replaced with the following Lua statement (I verified this works in my environment):

local judgeOffset = "\\judgewidth{"..string.gsub(pandoc.utils.stringify(judgeMax), "([#$%%&_{}~^])", "\\%1").."}"

Would you mind considering and verifying this patch?
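A variant of the escaping step, written with a lookup table, is sketched below in plain Lua. `escapeLatex` is a hypothetical name, the input is assumed to be a plain string already (for instance the result of `pandoc.utils.stringify`), and the replacement commands chosen for `~`, `^` and `\` are assumptions: a backslash in front of `~` or `^` gives an accent command in LaTeX rather than the literal character. Note also that inside a Lua character class a literal percent sign has to be written `%%`.

```lua
-- Hypothetical helper: escape LaTeX special characters in a judgement string.
local latexEscapes = {
  ["#"] = "\\#", ["$"] = "\\$", ["%"] = "\\%", ["&"] = "\\&",
  ["_"] = "\\_", ["{"] = "\\{", ["}"] = "\\}",
  ["~"] = "\\textasciitilde{}",
  ["^"] = "\\textasciicircum{}",
  ["\\"] = "\\textbackslash{}",
}

local function escapeLatex(s)
  -- "%%" is a literal percent sign and "%^" a literal caret inside the set.
  -- With a table as replacement, each matched character is looked up as a key
  -- and the stored value is inserted as-is, in a single pass over the string.
  return (string.gsub(s, "[#$%%&_{}~%^\\]", latexEscapes))
end

print("\\judgewidth{" .. escapeLatex("#") .. "}")  --> \judgewidth{\#}
print("\\judgewidth{" .. escapeLatex("%") .. "}")  --> \judgewidth{\%}
```

Because the substitution happens in one pass, the braces that appear in a replacement such as `\textbackslash{}` are not escaped a second time.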

Environment

pandoc-ling: the latest version as of 2021/02/24
pandoc: v2.11.4
OS: Windows 10 x64 (build 19042)

Beamer slides (PDF) show linguistic examples in a visible table

When I create Beamer slides including some linguistic examples produced via pandoc-ling, the output slides undesirably show these examples in a visible table, as shown in the following screen capture. This does not occur when I create ordinary LaTeX-based documents (e.g. with the article class).

The behaviour is understandable since pandoc-ling is a table-environment-based glossing system, as you stated in the manual. However, is there any good practice to make these tables invisible?

Working example

Here is an R Markdown example that uses two R packages, bookdown and officedown. The contents are exactly the same as in pandoc's markdown, except for what I wrote in the YAML section.

---
title: "Pandoc-ling with Beamer"
author: "CLRR"
output: 
  bookdown::beamer_presentation2:
    latex_engine: xelatex
    keep_tex: TRUE
    keep_md: TRUE
    toc: TRUE
    number_sections: TRUE
    slide_level: 2
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
  bookdown::pdf_document2:
    latex_engine: xelatex
    keep_tex: true
    keep_md: true
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
  officedown::rdocx_document:
    mapstyles:
      Normal: ['First Paragraph']
    keep_md: TRUE
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
always_allow_html: yes
link-citations: yes
---

## Basic example

::: ex
This is the most basic structure of a linguistic example. 
:::

## Multi-line example

::: {#id .ex formatGloss=false}

This is a multi-line example.
But that does not mean anything for the result
All these lines are simply treated as one paragraph.
They will become one example with one number.

:::

## Example with a preamble

:::ex
Preamble

This is an example with a preamble.
:::

## Multiple examples

:::ex
a. This is the first example.
b. This is the second.
a. The actual letters are not important, `pandoc-ling` will put them in order.

e. Empty lines are allowed between labelled lines
Subsequent lines are again treated as one sequential paragraph.
:::

## Examples with a judgement

:::ex
Throwing in a preamble for good measure

a. ^* This traditionally signals ungrammaticality.
b. ^? Question-marks indicate questionable grammaticality.
c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
d. However, such long sequences sometimes lead to undesirable effects in the layout.
:::

## Formatted interlinear example

::: {.ex formatGloss=true}
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

## Cross-referencing

::: {#test .ex}
This is a test
:::

@test is grammatical.

## Complex example

::: {.ex formatGloss=true}
Completely superfluous preamble, but it works ...

a. Mixing single line examples with interlinear examples.
a. This is of course highly unusual.
Just for this example, let's add some extra material in this example.

a.
| Dutch (Germanic) Note the grammaticality judgement!
| ^^:-)^ Deze zin is (dit\ is&nbsp;test) nederlands.
| DEM sentence AUX ~ dutch.
| This sentence is dutch.

b.
|
| Deze tweede zin heeft geen header.
| DEM second sentence have.3SG.PRES no header.
| This second sentence does not have a header.
:::

Environment

pandoc-ling: v1.1
pandoc: v2.11.3.2
OS: Windows 10 x64 (build 19042)
