
harf's Introduction

This is no longer maintained; future development will be integrated into luaotfload.

Harf

A HarfBuzz-based font loader and shaper for LuaTeX. It requires either the experimental luahbtex engine or the luaharfbuzz module installed for the regular luatex engine.

History

The initial version of the shaping code was inspired by luatex-harfbuzz but was completely rewritten to use new HarfBuzz APIs and features.

There are a few other projects for using HarfBuzz with LuaTeX:

harf's People

Contributors

khaledhosny

Forkers

zauguin

harf's Issues

Fix hyphenation

Currently hyphenation kinda works, but there are a few issues:

  • The hyphen glyph does not show up, because we do not set up any characters with Unicode values in the font [1], so we need to shape the hyphen to get the proper glyph there.
  • Hyphenation points in the middle of ligatures get dropped.
  • Kerning across the hyphenation point is preserved after line breaking while it shouldn’t be.

[1] All our font “characters” use glyph indices prefixed by the largest valid Unicode value, and mapping characters to glyphs happens during shaping, not font loading.
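
The scheme in the footnote can be illustrated with a small Python sketch (harf itself is written in Lua). The offset used here, one past the largest valid Unicode code point, is an assumption made for illustration and is not taken from the harf source; the helper names are hypothetical.

```python
# Hypothetical illustration: store glyph indices in the "character" slot
# by shifting them past the Unicode range, so they can never collide with
# real code points. GID_OFFSET is an assumed value, not taken from harf.
MAX_UNICODE = 0x10FFFF
GID_OFFSET = MAX_UNICODE + 1


def gid_to_char(gid: int) -> int:
    """Encode a glyph index as a pseudo character code."""
    if gid < 0:
        raise ValueError("glyph index must be non-negative")
    return GID_OFFSET + gid


def char_to_gid(char: int):
    """Return the glyph index, or None if `char` is a real code point."""
    return char - GID_OFFSET if char >= GID_OFFSET else None


hyphen = ord("-")
print(char_to_gid(hyphen))           # None
print(char_to_gid(gid_to_char(42)))  # 42
```

Under such a scheme the font has no entry at U+002D, which is consistent with the hyphen having to be shaped before it can be displayed.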

harfload and tlig-ligatures

If no mode or mode=harf is used the tlig-ligatures don't work. Does that mean that if no mode is set, harf mode is enforced? And could harf mode support tlig?

(The current behaviour is a problem for the fonts preset by the latex format in the TU-fd-files, as they don't set a mode currently, but this can naturally be changed).

\documentclass{article}
\usepackage{harfload}
\begin{document}
{\small a--b}

\font\test={name:texgyretermes:+tlig;} 
\test
a--b

\font\test={name:texgyrebonum:mode=harf;+tlig;}
\test
a--b

\font\test={name:texgyreheros:mode=node;+tlig;}
\test
a--b

\font\test={name:texgyreheros:mode=base;+tlig;}
\test
a--b

\end{document}


With fontspec luaotfload seems to still load the fonts itself

Not sure if this is new, but today the LaTeX example took so long and froze my system while consuming all available memory, which usually happens when luaotfload loads Noto Serif CJK, but it shouldn’t be loading any fonts. Might be related to the fact that with fontspec both mode=node and mode=harf are used.

Report missing characters

For missing characters we currently output the .notdef glyph (glyph 0 in the font). This is usually a box glyph or something similar, which gives a visual indication that a character is not supported by the font (in contrast to the traditional TeX behavior of outputting nothing).

This has the downside that missing-character log messages are not shown, so we need to fix this.
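
One possible approach, sketched in Python rather than harf's Lua: scan the shaper output for glyph 0 and report the character at the start of the affected cluster. The `(glyph_index, cluster)` pair layout and the function names are assumptions for illustration; the message only imitates the style of TeX's missing-character log line.

```python
# Sketch: report characters that the shaper mapped to .notdef (glyph 0).
# `glyphs` is a hypothetical list of (glyph_index, cluster) pairs, where
# cluster is the index into `text` of the first character of the cluster.

def missing_characters(text, glyphs):
    """Return the distinct characters that were shaped to glyph 0, in order."""
    seen = []
    for gid, cluster in glyphs:
        if gid == 0 and text[cluster] not in seen:
            seen.append(text[cluster])
    return seen


def log_missing(text, glyphs, fontname):
    """Build log lines in the style of TeX's 'Missing character' message."""
    return [
        f"Missing character: There is no {ch} (U+{ord(ch):04X}) in font {fontname}!"
        for ch in missing_characters(text, glyphs)
    ]


for line in log_missing("a\u2603b", [(68, 0), (0, 1), (69, 2)], "testfont"):
    print(line)
```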

Move handling /ActualText to after line breaking

We can’t have pdf_literal whatsits in discretionary nodes and this is a big limitation, so we need to move inserting these nodes to after line breaking has been done and the discretionary nodes are gone.

Noto Color Emoji is not copyable

\documentclass{minimal}
\usepackage{harfload}
\font\testnoto={name:Noto Color Emoji:mode=harf}
\begin{document}
\testnoto
😄👌☃⛄
\end{document}

The above LaTeX file produces a correct-looking PDF, but text cannot be copied and pasted from it, and it cannot be converted to a text file by, for example, the pdftotext program. When I use other emoji fonts, e.g. Symbola or Microsoft's Segoe UI Emoji, the resulting PDFs are copy-and-pastable and can be converted to text files by pdftotext.

I cannot exclude the possibility that it is a bug in Noto Color Emoji, and as harflatex is the only latex variant that can handle Noto Color Emoji, I cannot compare the above behavior with other variants of latex.

Cache fonts in memory

If the same file and font index is requested again, we should not load a new font and instead re-use the already loaded one. Even if the font size is different, we should be able to shape everything at UPEM size and scale the output.

Fonts with "Em Size" other than 1000

Fonts with Em Size 1000 work fine (Fontforge: Element→Font Info→General→Em Size), but other values have size problems. In this case Adobe Arabic has the value 2048.

\documentclass{article}
\usepackage{harfload}
\usepackage{fontspec}
\setmainfont{AdobeArabic-Regular.otf}[RawFeature={mode=harf}]

\begin{document}

åäöüñÅÄÖÜÑ

اللُّغَة العَرَبِيّة هي أكثر اللغات تحدثاً ونطقاً

\end{document}

harf


  • Git revisions 7657fe3 (harf) and e11c8e8 (harftex)

  • ./build.sh --tlopt="-C --with-system-cairo --with-system-freetype2 --with-system-gd --with-system-gmp --with-system-graphite --with-system-harfbuzz --with-system-icu --with-system-libpaper --with-system-libpng --with-system-mpfr --with-system-pixman --with-system-poppler --with-system-potrace --with-system-teckit --with-system-xpdf --with-system-zlib --with-system-zziplib --with-x-dvi-toolkit=xaw --build=x86_64-linuxmusl --host=x86_64-linuxmusl --target=x86_64-linuxmusl --datarootdir=/opt/texlive/2019 --prefix=/opt/texlive/2019 --bindir=/opt/texlive/2019/bin/ --libdir=/usr/lib --includedir=/usr/include --mandir=/usr/share/man --infodir=/usr/share/info --disable-native-texlive-build --enable-shared --disable-static --enable-largefile --disable-dependency-tracking --with-pic"

Handle over/underfull box messages in a better way

I see stuff like:

Underfull \hbox (badness 1902) in paragraph at lines 874--874
[] [][][]\TU/Sanskrit2003NM(1)/bx/n/12 �
warning  (print): bad raw byte to print (c=280), skipped

warning  (print): bad raw byte to print (c=281), skipped

warning  (print): bad raw byte to print (c=1157), skipped

Looks like LuaTeX is trying to print the glyph node’s char fields, which we (mis)use as glyph indexes and which contain invalid Unicode characters. We need to figure out what is best to show here (e.g. by checking XeTeX) and see if we can make LuaTeX show that.

Make text extraction from PDF work

Set LuaTeX’s char tounicode or use PDF’s /ActualText, employing a strategy similar to the one I used in LibreOffice:

  • If there is unique one to one or one to many mapping between each glyph index and Unicode code points, use ToUnicode CMAP.
  • If there is many to one or many to many mapping, use an ActualText span embedding the original string, since ToUnicode can’t handle these.
  • If the same glyph index is used for several Unicode code points, also use ActualText since ToUnicode can map each glyph in the font only once.
  • Limit ActualText to single cluster at a time, since using it for whole words or sentences breaks text selection and highlighting in PDF viewers (there will be no way to tell which glyphs belong to which characters).
  • Keep generating some redundant ToUnicode entries for compatibility with old tools not supporting ActualText.
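
The first three rules of the list above can be sketched as follows, in Python rather than harf's Lua. The cluster representation and all names are hypothetical. As one possible design choice, the first mapping seen for a glyph goes into ToUnicode and later conflicting uses fall back to ActualText. The sketch does not generate the redundant ToUnicode entries from the last rule.

```python
# Sketch: choose between ToUnicode and ActualText for each cluster.
# A cluster is (glyphs, text): a tuple of glyph indices and the original
# string they came from. All names here are hypothetical.

def plan_text_extraction(clusters):
    """Return (tounicode, decisions) for a list of shaped clusters."""
    tounicode = {}
    # Pass 1: a single glyph standing for one or more code points can go
    # into ToUnicode; the first mapping seen for a glyph wins.
    for glyphs, text in clusters:
        if len(glyphs) == 1:
            tounicode.setdefault(glyphs[0], text)

    # Pass 2: a cluster needs ActualText when it has several glyphs, or
    # when its single glyph is already mapped to a different string.
    decisions = []
    for glyphs, text in clusters:
        if len(glyphs) == 1 and tounicode[glyphs[0]] == text:
            decisions.append("ToUnicode")
        else:
            decisions.append("ActualText")
    return tounicode, decisions


clusters = [
    ((10,), "a"),          # one to one
    ((20,), "fi"),         # one to many (ligature)
    ((30, 31), "\u00e9"),  # many to one (base and accent glyphs)
    ((10,), "\u0430"),     # glyph 10 reused for a different code point
]
tounicode, decisions = plan_text_extraction(clusters)
print(decisions)  # ['ToUnicode', 'ToUnicode', 'ActualText', 'ActualText']
```

Working per cluster keeps each ActualText span as small as possible, in line with the rule about limiting it to a single cluster.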

Support bitmap color fonts

I think it should be possible to insert images in LuaTeX from the image data we get from HarfBuzz, need to investigate.

Add OpenType debugger

HarfBuzz has support for an OpenType debugger that shows which lookups are applied, in what order, and their input/output. See if we can make any use of that.

bad argument #4 to `shape_full` error.

Running a simple document like

\documentclass{article}

\usepackage{luatexbase}
\directlua{require("harf-luaotfload.lua")}

\begin{document}
\font\test={file:texgyreheros-regular.otf:mode=harf;}
\test abc
\end{document}

I get the error

warning  (node filter): error: .../luaotfload/texmf/tex/luatex/luahbtex-harf/ha
rf-node.lua:339: bad argument #4 to 'shape_full' (number expected, got table)

Setting shapers to 0 in harf-node.lua (instead of a table) works, but I naturally have no idea which number the function expects.

Incorrect color of text after emoji

The below is probably somewhat related to khaledhosny/harftex#11.
In beamer.cls, color of text can be white. After use of emoji in Segoe UI Emoji font, the color of text becomes black. A short example is below. PDF generated by harflatex 0.4.1 is attached below.

\documentclass{beamer}
\usepackage{harfload}
\usepackage{fontspec}
\setsansfont{Segoe UI Emoji}[
  RawFeature={mode=harf,+dist;+ccmp}]
\usetheme{Madrid}

\begin{document}
\begin{frame}{text 😂 text after emoji}
\end{frame}
\end{document}

incorrect-color.pdf

Support loading fonts by name

We need a way to load fonts by name and query system fonts. An easy shortcut would be to use luaotfload for this.

Ideally we should integrate with system font finding libraries, but linking those into LuaTeX will likely not fly with the LuaTeX team, and I’m not sure what the other options are.

PDF/A validation fails with harflatex + emoji

I am not sure if the below is a problem in harflatex. When I compile the below latex file

\begin{filecontents*}{\jobname.xmpdata}
  \Title{test}
  \Author{author}
\end{filecontents*}

\RequirePackage{harfload}
\documentclass[luatex,unicode,a4paper,12pt]{article}
\usepackage[a-2u]{pdfx}
\usepackage{fontspec}
\setmainfont{Segoe UI Emoji}[
  RawFeature={mode=harf;+dist;+ccmp},
  BoldFont={Segoe UI Bold},
  ItalicFont={Segoe UI Italic},
  BoldItalicFont={Segoe UI Bold Italic}]

\begin{document}
\section{Test}
Test 😃.
\end{document}

the generated PDF fails with the PDF/A validation at
https://www.pdf-online.com/osa/validate.aspx
with error


File: pdfa-test4.pdf
Compliance: pdfa-2u
Result: Document does not conform to PDF/A.
Details:
Validating file "pdfa-test4.pdf" for conformance level pdfa-2u
The Unicode for cid 9687 is unknown.
The Unicode for cid 9688 is unknown.
The Unicode for cid 9689 is unknown.
The Unicode for cid 9690 is unknown.
The Unicode for cid 9691 is unknown.
The document does not conform to the requested standard.
The document contains fonts without appropriate character to unicode mapping information (ToUnicode maps).
The document does not conform to the PDF/A-2u standard.
Done.

On the other hand, lualatex and the following latex file generated a PDF without validation error.

\begin{filecontents*}{\jobname.xmpdata}
  \Title{test}
  \Author{author}
\end{filecontents*}

\documentclass[luatex,unicode,a4paper,12pt]{article}
\usepackage[a-2u]{pdfx}
\usepackage{fontspec}
\setmainfont{Segoe UI Emoji}[
  RawFeature={+dist;+ccmp},
  BoldFont={Segoe UI Bold},
  ItalicFont={Segoe UI Italic},
  BoldItalicFont={Segoe UI Bold Italic}]

\begin{document}
\section{Test \textnormal{😃}}
Test 😃.
\end{document}

Two generated PDF files are attached below:
pdfa-test5.pdf
pdfa-test4.pdf

Support transparent colors

Issues #14 and #15 implemented support for color options and fonts, but transparency (alpha) is currently ignored because it needs cooperation between various LaTeX packages. We need to figure out what should be done here.

Calculate glyph bounding box after shaping

We currently calculate glyph bounding box for all glyphs in the font at font loading time, but this seems to be a waste; we should do it instead after shaping and for glyphs that are actually used in the document.
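
A minimal Python sketch of the lazy approach (harf itself is Lua). `compute_extents` is a hypothetical stand-in for the per-glyph measurement, and the returned values are made up; the point is only that glyphs are measured on first use.

```python
# Sketch: compute glyph extents lazily, only for glyphs that shaping
# actually produced. `compute_extents` is a hypothetical stand-in for the
# per-glyph query made against the font.

calls = []


def compute_extents(gid):
    """Pretend to measure one glyph; records which glyphs were measured."""
    calls.append(gid)
    return {"width": 500 + gid, "height": 700, "depth": 0}


class LazyExtents:
    """Dictionary-like cache filled on first access to each glyph."""

    def __init__(self):
        self._extents = {}

    def __getitem__(self, gid):
        if gid not in self._extents:
            self._extents[gid] = compute_extents(gid)
        return self._extents[gid]


extents = LazyExtents()
shaped_glyphs = [3, 7, 3, 3, 7]  # stand-in for the output of a shaping call
widths = [extents[g]["width"] for g in shaped_glyphs]
print(widths)  # [503, 507, 503, 503, 507]
print(calls)   # [3, 7]
```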

Support XeTeX font syntax

We don’t currently do any parsing of font names, so anything other than paths or file names will fail to load. At bare minimum, we should support a XeTeX compatible syntax.

Cache shaped words

We should be able to speed up shaping considerably by caching words (splitting at spaces) if fonts do not have any substitution or positioning rules involving spaces.

SVG in OpenType unsupported?

HarfBuzz seems to have supported color SVG in the OpenType font format since Oct. 2018, as per harfbuzz/harfbuzz#1192 (comment).

As I understand it, the PDF spec does not support SVG-in-OT, so SVG has to be converted to PDF or a Type3 font to render it in PDF. If one downloads the Abelone font (or any other) to the current directory and processes the file below with harflatex, no SVG-to-PDF or SVG-to-Type3 conversion seems to take place. Can I consider SVG-in-OT as unsupported by harftex? I have no strong opinion on whether SVG-in-OT should be supported, though using a stylish color SVG font may be fun.

Location of Abelone font: https://www.fontself.com/colorfontweek/2018#abelone

LaTeX file:

\documentclass{minimal}
\usepackage{harfload}
\font\svgcolorfont={file:./Abelone-FREE.otf:mode=harf;+svg}
\begin{document}
\noindent
\svgcolorfont ABCDEF abcdef
\end{document}

google noto color emoji attempt to compare number with boolean

I load noto color emoji font, but it fails to generate PDF correctly

\newfontfamily{\fallbackfont}{NotoColorEmoji.ttf}[RawFeature={mode=harf}]

warning (node filter): error: ./harf-node.lua:741: attempt to compare number with boolean

I fixed it locally with this change, but not sure if this will have the correct effect in all cases:

diff --git a/src/harf-node.lua b/src/harf-node.lua
index 24f6699..2cb2690 100644
--- a/src/harf-node.lua
+++ b/src/harf-node.lua
@@ -738,12 +738,14 @@ local function tonodes(head, current, run, glyphs, color)
         -- May be it is checking for the italic correction before we have had
         -- loaded the glyph?
         local prevchar, prevfontid = isglyph(current)
-        if prevchar > 0 then
-          local prevfontdata = font.getfont(prevfontid)
-          local prevcharacters = prevfontdata and prevfontdata.characters
-          local italic = prevcharacters and prevcharacters[prevchar].italic
-          if italic then
-            setkern(n, italic)
+        if type(prevchar) ~= 'boolean' then
+          if prevchar > 0 then
+            local prevfontdata = font.getfont(prevfontid)
+            local prevcharacters = prevfontdata and prevfontdata.characters
+            local italic = prevcharacters and prevcharacters[prevchar].italic
+            if italic then
+              setkern(n, italic)
+            end
           end
         end
         head, current = insertafter(head, current, n)

kernfactor is not recognized

luaotfload has a kernfactor option, which is like letterspace but takes a decimal factor instead of a percentage. The following document works fine with mode=node (both outputs are the same), but with mode=harf the kernfactor version is not letterspaced:

(kernfactor is used in the microtype code. fontspec uses letterspace)

\documentclass{article}

\usepackage{harfload}
\begin{document}

 \font \testA = "file:Iwona-Regular.otf:+liga;mode=harf;+tlig;kernfactor=0.125;"
 \testA abc -- fi ff

 \font \testA = "file:Iwona-Regular.otf:+liga;mode=harf;+tlig;letterspace=12.5;"
 \testA abc -- fi ff

\end{document}
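
The relation between the two values in the example above is simple arithmetic, shown here as a short Python sketch. The conversion follows the description in this issue (percentage versus decimal). The assumption that the extra space scales with the font size is mine, and the function names are hypothetical.

```python
# Sketch: relate the two option values used in the example above.
# Assumption: both options add the same fraction of the font size
# between glyphs, letterspace as a percentage and kernfactor as a factor.

def letterspace_to_kernfactor(percent):
    """Convert a letterspace percentage to the equivalent kernfactor."""
    return percent / 100.0


def extra_space_pt(kernfactor, font_size_pt):
    """Extra space between glyphs, assuming it scales with font size."""
    return kernfactor * font_size_pt


k = letterspace_to_kernfactor(12.5)
print(k)                        # 0.125
print(extra_space_pt(k, 10.0))  # 1.25
```

So letterspace=12.5 and kernfactor=0.125 request the same amount of spacing, and the two \font lines would be expected to give identical output once kernfactor is honoured.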


Support loading math fonts

Right now we don’t populate any of the math data in the font, also the way we load the glyphs is not compatible with the engine doing layout, so we would need to conditionally change that too.

url package gives broken output

This shows only h. in the PDF:

\documentclass{article}
\usepackage{url}
\usepackage{harfload}
\usepackage{fontspec}
\begin{document}
\url{https://example.com}
\end{document}

Do script itemization

We should resolve script for Common and Inherited characters if no script was selected during font loading.

An open question is whether we should respect the specified script at all. Automatic script resolution should allow setting multi-script text without loading the font multiple times, but there might be corner cases where the auto-detection would fail. To be investigated.
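
A minimal Python sketch of resolving Common and Inherited characters from their neighbours (harf itself is Lua). The script lookup is a deliberately crude toy covering a few ranges only; a real implementation would query the Unicode Script property. All function names are hypothetical.

```python
# Sketch: resolve Common/Inherited characters to the script of their
# neighbours. The script lookup is a toy covering only a few ranges;
# a real implementation would use the Unicode Script property.

def toy_script(ch):
    cp = ord(ch)
    if 0x0300 <= cp <= 0x036F:
        return "Inherited"  # combining diacritical marks
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if ch.isalpha() and cp < 0x0250:
        return "Latin"
    return "Common"         # spaces, digits, punctuation, ...


def itemize(text):
    """Split text into (script, substring) runs."""
    scripts = [toy_script(ch) for ch in text]
    # Forward pass: Common/Inherited take the script of the preceding text.
    last = None
    for i, s in enumerate(scripts):
        if s in ("Common", "Inherited"):
            scripts[i] = last
        else:
            last = s
    # Backward pass: leading unresolved characters take the following script.
    nxt = "Common"
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] is None:
            scripts[i] = nxt
        else:
            nxt = scripts[i]
    # Merge adjacent characters with the same script into runs.
    runs = []
    for ch, s in zip(text, scripts):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + ch)
        else:
            runs.append((s, ch))
    return runs


for script, run in itemize("abc, \u0627\u0644 1!"):
    print(script, repr(run))
```

With this input the comma and space after "abc" join the Latin run, while the trailing space, digit and exclamation mark join the Arabic run. Whether attaching to the preceding script is always right is exactly the kind of corner case the open question above refers to.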
