cysouw / pandoc-ling
Pandoc Lua filter for linguistic examples
License: Creative Commons Zero v1.0 Universal
When targeting Linguex, grammaticality judgements are not aligned in interlinear examples.
See the output for example (4.12c) on page 10 of https://github.com/cysouw/pandoc-ling/blob/main/docs/readme_linguex.pdf
I think it's because the filter is outputting the interlinear lines for Linguex using `\gll`, etc., with the judgment intervening.
pandoc-ling/docs/readme_linguex.tex
Lines 595 to 599 in a9eae71
However, the Linguex docs point out that judgments won't be aligned unless using `\exg.` or `\ag.`, etc. instead of `\ex. \gll` (see the bottom of page 3 and top of page 4 of their docs):
Likewise, writing \ex.\gll instead of \exg. will have the effect of not prefixing the grammaticality judgment.
I'm guessing, however, that the filter uses `\gll` in order to support preambles, as Linguex doesn't support them when using `\exg`, etc. Is that right?
I use the latest pandoc-ling 2a1e55a as of July 4, 2022. I found that the question particle/marker `Q`, which is listed in the Leipzig Glossing Rules, is not rendered, as shown below. Generally speaking, the current implementation may not render one-character abbreviations such as `Q`, `A`, `S`, `P`, `F`, `M` and so on.
I think the issue should be resolved if `[%u%d][%u%d]` in l.447 and l.455 of pandoc-ling.lua is replaced with a single `[%u%d]`. Could you point out whether any problem arises when we replace `[%u%d][%u%d]` with `[%u%d]`, or why `[%u%d][%u%d]` was originally used there?
Line 447 in 2a1e55a
Line 455 in 2a1e55a
::: {#Q-particle .ex formatGloss=true}
| **これは絵ですか?** `\\`{=latex}
| Kore=wa e-desu-ka?
| DEM=TOP picture-AFF.POL-Q
| Is this a picture?
:::
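The difference between the two patterns can be checked in plain Lua, outside pandoc. The following is only a sketch: the helper `collect_matches` is hypothetical, and the patterns `[%u%d][%u%d]+` and `[%u%d]+` are assumed stand-ins for whatever lines 447 and 455 actually contain (the report does not show the full pattern). It hints at one possible reason for requiring two characters: a single-character class also picks up initials such as `T.` and `J.` in a gloss line.

```lua
-- Hypothetical helper (not part of pandoc-ling): collect every substring
-- of `text` that matches the Lua pattern `pattern`.
local function collect_matches(text, pattern)
  local found = {}
  for match in string.gmatch(text, pattern) do
    found[#found + 1] = match
  end
  return found
end

-- Assumed stand-ins for the patterns discussed above.
local two_or_more = "[%u%d][%u%d]+" -- at least two uppercase letters or digits
local one_or_more = "[%u%d]+"       -- at least one uppercase letter or digit

local gloss = "DEM=TOP picture-AFF.POL-Q"
print(table.concat(collect_matches(gloss, two_or_more), " ")) -- DEM TOP AFF POL
print(table.concat(collect_matches(gloss, one_or_more), " ")) -- DEM TOP AFF POL Q

-- Side effect of the shorter pattern: abbreviated names match as well.
local names = "T.=TOP J.=NOM"
print(table.concat(collect_matches(names, one_or_more), " ")) -- T TOP J NOM
```

Whether matching such initials is a problem depends on how the filter post-processes the matches, which this sketch does not model.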
When the YAML header contains `header-includes`, pandoc (2.13) cannot export to LaTeX or to pdf.
Exporting to docx works as expected.
Here is a minimal example and the command line run:
---
title: Testing pandoc-ling
header-includes: |
  \usepackage{hyperref}
---
# This is a test
::: ex
This is the most basic structure of a linguistic example.
:::
pandoc pandoc-ling_test.md -o pandoc-ling_test.tex --lua-filter pandoc-ling.lua
With the following markdown from readme.md, the div's ID in the HTML output gets overwritten to `ex4.13` instead of `test`.
::: {#test .ex}
This is a test
:::
Lines 722 to 731 in a9eae71
However, the cross-references using `[@last]`, etc. maintain `test` as the ID to link to (the following is taken from readme.html, line 732):
<code>[@last]</code> will be formatted as <a href="#test">(4.13)</a>
Likewise, in line 733:
<code>[@last hA1l0]</code> will work also, leading to <a href="#test">(4.13 hA1l0)</a>
and line 717:
<code>[@test]</code>, leading to <a href="#test">(4.13)</a>
The ID for the div gets reset for HTML output in line 276 of pandoc-ling.lua
Lines 268 to 277 in a9eae71
Following the proposal that one should be able to link from an external source to something like webpage.url/#ex4.13, should the code that makes the cross-references for HTML also change the ID to `#ex4.13` instead of `#test`?
Lines 1538 to 1547 in a9eae71
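One way to keep anchors and links consistent would be to record the generated ID for each user-supplied ID and to resolve every cross-reference through that record. The sketch below is plain Lua and purely illustrative: `register_example`, `resolve_href`, and the table `html_ids` are hypothetical names, not functions of pandoc-ling, and the `ex` prefix is simply taken from the `ex4.13` output described above.

```lua
-- Hypothetical bookkeeping, not pandoc-ling's actual code.
local html_ids = {} -- maps a user-supplied ID to the ID written to HTML

-- Record that the example with `user_id` was numbered `number` and
-- return the ID that the div would receive in the HTML output.
local function register_example(user_id, number)
  local html_id = "ex" .. number
  html_ids[user_id] = html_id
  return html_id
end

-- Build the href for a cross-reference; fall back to the user-supplied
-- ID when the example was never registered.
local function resolve_href(user_id)
  return "#" .. (html_ids[user_id] or user_id)
end

register_example("test", "4.13")
print(resolve_href("test"))    -- #ex4.13
print(resolve_href("unknown")) -- #unknown
```

The alternative design, keeping `test` as the div's ID and leaving the links alone, would avoid the lookup table but give up the predictable `#ex4.13` anchors.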
Professor Dr. @cysouw
I use the latest pandoc-ling 2a1e55a as of June 26, 2022. I want to add subscripts that indicate coindexation to the source line of an interlinear example, as shown below.
However, I found I cannot achieve this either using the `~...~` notation of pandoc markdown or using the `\textsubscript{...}`{=latex} notation, although these notations work in the translation line.
`~...~` notation of pandoc markdown
I implemented the following notation, but the subscripts are rendered as normal characters (e.g. not '~i~' but 'i').
::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
`\textsubscript{...}`{=latex} notation
The subscripts disappear when I add them using the `\textsubscript{...}`{=latex} notation.
::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
Would you mind telling me whether there is any workaround or what I can do to resolve the issue?
---
author: CLRR
to: pdf
title: Sub/Superscript workaround
latexPackage: gb4e
---
I want to add subscripts that indicate coindexation to the source line of an interlinear example, as shown below.
```{=latex}
\begin{exe} \judgewidth{}
\ex [] {
Japanese coindex using subscript
\gll \emph{Tarō\textsubscript{i}=wa} \emph{Jirō\textsubscript{j}=ga} \emph{zibun\textsubscript{i/j}=o} \emph{suki-da-to}
\emph{omot-te} \emph{i-ru.} \\
T.=\textsc{top} J.=\textsc{nom} self=\textsc{acc}
like-\textsc{aff}-\textsc{quot} think-\textsc{cvb}
\textsc{prog}-\textsc{npst} \\
\glt `Tarō\textsubscript{i} assumes that Jirō\textsubscript{j} likes
him\textsubscript{i}/himself\textsubscript{j}.' }
\label{subscript-superscript-tex}
\end{exe}
```
However, using `~...~` notation in the source line fails to add subscripts.
The subscripts are rendered as normal characters
(e.g. not '~i~' but 'i'),
as demonstrated in [@subscript-superscript-md].
```
::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
```
::: {#subscript-superscript-md .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō~i~=wa Jirō~j~=ga zibun~i/j~=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
Using `` `\textsubscript{i}`{=latex} `` notation in the source line also fails to add subscripts.
In this case, the subscripts disappear, as illustrated in [@subscript-superscript-tex].
```
::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
```
::: {#subscript-superscript-tex .ex formatGloss=true}
| Japanese coindex using subscript
| Tarō`\textsubscript{i}`{=latex}=wa Jirō~`\textsubscript{j}`{=latex}~=ga zibun`\textsubscript{i/j}`{=latex}=o suki-da-to omot-te i-ru.
| T.=TOP J.=NOM self=ACC like-AFF-QUOT think-CVB PROG-NPST
| Tarō~i~ assumes that Jirō~j~ likes him~i~/himself~j~.
:::
When the meaning of a sentence in an interlinear example is obvious from the glosses (or not important for the argument), I like to exclude the free translation. I currently approximate this with:
::: ex
| Dutch
| Het meisje dat over straat liep.
| The girl that on street walked.
|
:::
But then an empty free translation is included. This is visible with ExPex; note the space between the examples:
When I omit the empty last line, the glosses are used as a translation (not sure if this is a feature or a bug):
::: ex
| Dutch
| Het meisje dat over straat liep.
| The girl that on street walked.
:::
Would it be possible to omit the free translation when it's empty (first example) and/or unspecified (second example)? Or is there already another way to accomplish this?
I would be happy to try and implement this myself, if you let me know what the desired behaviour is.
Hello,
Thanks for this wonderful filter.
I was trying to create a slide show with Beamer using the pandoc-ling filter. However, I discovered that when using the filter, Beamer frame attributes don't always get passed through.
For example, given the following input:
---
title: In the morning
---
# In the morning {.t}
- Test list
- Test list
# Getting up {.t}
- Turn off alarm
- Get out of bed
If I run pandoc without the filter, for example `pandoc -t beamer beamer-ling.md --pdf-engine=xelatex -o beamer.tex`, I get the following tex output. Note the `[t]` option passed to the frame environments:
\begin{frame}[t]{In the morning}
\phantomsection\label{in-the-morning}
\begin{itemize}
\tightlist
\item
Test list
\item
Test list
\end{itemize}
\end{frame}
\begin{frame}[t]{Getting up}
\phantomsection\label{getting-up}
\begin{itemize}
\tightlist
\item
Turn off alarm
\item
Get out of bed
\end{itemize}
\end{frame}
However, when running pandoc with the filter (`pandoc -t beamer beamer-ling.md --pdf-engine=xelatex --lua-filter ./pandoc-ling.lua -o beamer-ling.tex`), even when not including any kind of numbered/glossed examples, the `[t]` option is not included in the tex output for the frames:
\begin{frame}{In the morning}
\phantomsection\label{in-the-morning}
\begin{itemize}
\tightlist
\item
Test list
\item
Test list
\end{itemize}
\end{frame}
\begin{frame}{Getting up}
\phantomsection\label{getting-up}
\begin{itemize}
\tightlist
\item
Turn off alarm
\item
Get out of bed
\end{itemize}
\end{frame}
Any ideas?
Thanks again for this great filter!
I am trying to produce an HTML file which contains one-line examples with a judgement. If I understand correctly, the following markdown syntax should work:
::: ex
^* This traditionally signals ungrammaticality.
:::
However, pandoc throws a Lua error when processing the markdown file, as follows:
Error running filter configuration/pandoc-ling/pandoc-ling.lua:
configuration/pandoc-ling/pandoc-ling.lua:607: attempt to index a nil value (field '?')
stack traceback:
configuration/pandoc-ling/pandoc-ling.lua:607: in function 'pandocMakeSingle'
configuration/pandoc-ling/pandoc-ling.lua:527: in function 'pandocMakeExample'
configuration/pandoc-ling/pandoc-ling.lua:256: in function 'processDiv'
Error: pandoc document conversion failed with error 83
The problem looks similar to the issue at the following link, but I cannot work around it myself since I am not familiar with Lua...
https://stackoverflow.com/q/12153537/10215301
What am I missing...?
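For readers unfamiliar with Lua, the message `attempt to index a nil value` means that the code looked up a field on something that turned out to be `nil`, typically because an earlier table lookup found nothing. The snippet below reproduces that class of error in plain Lua and shows a guard against it. It is a generic illustration only: the table `parts` and both function names are made up, and what line 607 of the filter actually does is not shown in this report.

```lua
-- Generic illustration; `parts` and both functions are hypothetical.
local parts = {} -- empty: there is no entry at index 1

-- Unguarded access: parts[1] is nil, so reading .text raises an error.
local function unguarded()
  return parts[1].text
end

-- Guarded access: check the intermediate value before indexing it.
local function guarded()
  local first = parts[1]
  if first == nil then
    return ""
  end
  return first.text
end

local ok, err = pcall(unguarded)
print(ok)        -- false
print(err)       -- an error message mentioning the nil value
print(guarded()) -- prints an empty line
```

In the filter, a guard of this kind would presumably belong wherever a single-line example with a judgement but no label is unpacked, though that remains a guess without the surrounding code.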
Here is an R Markdown example using the R package `bookdown`. The contents are exactly the same as in pandoc's markdown, except for what I wrote in the YAML section.
---
author: Masataka OGAWA
title: Using pandoc-ling for HTML
output:
  bookdown::html_document2:
    pandoc_args:
      - --lua-filter=configuration/pandoc-ling/pandoc-ling.lua
  ## No problem with pdf-producing
  bookdown::pdf_document2:
    pandoc_args:
      - --lua-filter=configuration/pandoc-ling/pandoc-ling.lua
---
These two examples throw an error when I produce an HTML file:
```
Error running filter configuration/pandoc-ling/pandoc-ling.lua:
configuration/pandoc-ling/pandoc-ling.lua:607: attempt to index a nil value (field '?')
stack traceback:
configuration/pandoc-ling/pandoc-ling.lua:607: in function 'pandocMakeSingle'
configuration/pandoc-ling/pandoc-ling.lua:527: in function 'pandocMakeExample'
configuration/pandoc-ling/pandoc-ling.lua:256: in function 'processDiv'
Error: pandoc document conversion failed with error 83
```
::: ex
^* This traditionally signals ungrammaticality.
:::
::: ex
a. ^* This traditionally signals ungrammaticality.
:::
<!--
This commented-out example works fine.
::: ex
a. ^* This traditionally signals ungrammaticality.
d. However, such long sequences sometimes lead to undesirable effects in the layout.
:::
-->
Note, however, that I can produce a PDF file without any problem from the Rmd file above, as shown in the following screen capture.
Environment
pandoc-ling: v1.4
pandoc: v2.11.4
OS: Windows 10 x64 (build 19042)
Is it possible to use the filter to output aligned interlinear glosses without using an example number?
For example, with langsci-gb4e, I can use the following without an example environment and still produce aligned glosses:
Dutch (Germanic)\\
\gll Deze zin is in het nederlands. \\
DEM sentence AUX in DET dutch. \\
\glt This sentence is dutch.
I know this is a corner use case.
Currently, I'm working on a presentation with a slide where I want aligned glosses, but the example number isn't needed and doesn't make sense.
Using the filter with pandoc version 2.11.3.2 results in the following error, potentially related to jgm/pandoc#5331
Error running filter ./pandoc-ling.lua:
Could not get Meta value: Could not read value of key-value pair: Could not get MetaValue: Could not read list: Could not get Block value: Unknown block type: MetaBlocks
Fix: Changing line 83:
tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("tex", s))
to the following fixes the problem, and seems to produce an identical result (I haven't tested this thoroughly):
tmp[#tmp+1] = pandoc.RawBlock("tex", s)
I found that PDF compilation via LaTeX halts when `judgeMax`, i.e. the longest grammaticality/acceptability judgement (in number of characters) in the `ex` environment, contains one of LaTeX's special characters such as `#`, `%` or `&`. So, the following two markdown examples are not properly processed during compilation and throw an error: `! Illegal parameter number in definition of \@jwidth.`
::: {.ex #grammaticality-judgement}
^# I'm not sure this is grammatical.
:::
or
::: {.ex #grammaticality-judgement}
a. ^# I'm not sure this is grammatical.
a. This is grammatical.
:::
I suspect that the line in the `texMakeGb4e` function and the line in the `texMakeLangsci` function pass characters like `#`, `%` or `&` to LaTeX as they are (i.e. without escaping them) and that this causes the above-mentioned error.
Those lines can be replaced with the following lua command (I verified this works in my environment):
local judgeOffset = "\\judgewidth{"..string.gsub(pandoc.utils.stringify(judgeMax), "([#$%%&_{}~^])", "\\%1").."}"
Would you mind considering and verifying this patch?
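The escaping step can be tried in isolation. The following sketch assumes a hypothetical helper `escape_latex` (not part of pandoc-ling) that operates on an already stringified judgement. Two details are worth noting: inside a Lua character class a literal percent sign has to be written `%%`, and `~` and `^` are deliberately left out here because `\~` and `\^` are accent commands in LaTeX rather than literal characters.

```lua
-- Hypothetical helper; not part of pandoc-ling.
local function escape_latex(s)
  -- Prefix each special character with a backslash.
  -- %% is a literal percent sign inside the character class.
  -- The outer parentheses keep only gsub's first return value.
  return (string.gsub(s, "([#$%%&_{}])", "\\%1"))
end

print("\\judgewidth{" .. escape_latex("#") .. "}") -- \judgewidth{\#}
print(escape_latex("50%"))                          -- 50\%
print(escape_latex("*"))                            -- *
```

Judgements containing `~` or `^` would need a separate replacement (for instance with `\textasciitilde{}` and `\textasciicircum{}`), which this sketch does not attempt.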
pandoc-ling: the latest version as of 2021/02/24
pandoc: v2.11.4
OS: Windows 10 x64 (build 19042)
When I create Beamer slides including some linguistic examples produced via `pandoc-ling`, the output slides undesirably show these examples in a visible table, as shown in the following screen capture. This does not occur when I create an ordinary LaTeX-based document (e.g. with the `article` class).
The behaviour is understandable since `pandoc-ling` is a table-environment-based glossing system, as you stated in the manual. However, is there any good practice to make these tables invisible?
Here is an R Markdown example using two R packages, `bookdown` and `officedown`. The contents are exactly the same as in pandoc's markdown, except for what I wrote in the YAML section.
---
title: "Pandoc-ling with Beamer"
author: "CLRR"
output:
  bookdown::beamer_presentation2:
    latex_engine: xelatex
    keep_tex: TRUE
    keep_md: TRUE
    toc: TRUE
    number_sections: TRUE
    slide_level: 2
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
  bookdown::pdf_document2:
    latex_engine: xelatex
    keep_tex: true
    keep_md: true
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
  officedown::rdocx_document:
    mapstyles:
      Normal: ['First Paragraph']
    keep_md: TRUE
    pandoc_args:
      - --lua-filter=path-to/pandoc-ling/pandoc-ling.lua
always_allow_html: yes
link-citations: yes
---
## Basic example
::: ex
This is the most basic structure of a linguistic example.
:::
## Multi-line example
::: {#id .ex formatGloss=false}
This is a multi-line example.
But that does not mean anything for the result
All these lines are simply treated as one paragraph.
They will become one example with one number.
:::
## Example with a preamble
:::ex
Preamble
This is an example with a preamble.
:::
## Multiple examples
:::ex
a. This is the first example.
b. This is the second.
a. The actual letters are not important, `pandoc-ling` will put them in order.
e. Empty lines are allowed between labelled lines
Subsequent lines are again treated as one sequential paragraph.
:::
## Examples with a judgement
:::ex
Throwing in a preamble for good measure
a. ^* This traditionally signals ungrammaticality.
b. ^? Question-marks indicate questionable grammaticality.
c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
d. However, such long sequences sometimes lead to undesirable effects in the layout.
:::
## Formatted interlinear example
::: {.ex formatGloss=true}
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::
## Cross-referencing
::: {#test .ex}
This is a test
:::
@test is grammatical.
## Complex example
::: {.ex formatGloss=true}
Completely superfluous preamble, but it works ...
a. Mixing single line examples with interlinear examples.
a. This is of course highly unusual.
Just for this example, let's add some extra material in this example.
a.
| Dutch (Germanic) Note the grammaticality judgement!
| ^^:-)^ Deze zin is (dit\ is test) nederlands.
| DEM sentence AUX ~ dutch.
| This sentence is dutch.
b.
|
| Deze tweede zin heeft geen header.
| DEM second sentence have.3SG.PRES no header.
| This second sentence does not have a header.
:::
pandoc-ling: v1.1
pandoc: v2.11.3.2
OS: Windows 10 x64 (build 19042)