linebender / norad Goto Github PK

View Code? Open in Web Editor NEW

44.0 44.0 12.0 876 KB

Rust crate for working with Unified Font Object files

License: Apache License 2.0

Rust 100.00%

norad's People

Contributors

Stargazers

Watchers

Forkers

eliheuer moyogo mfek chrissimpkins benkiel xorgy cmyr simoncozens rsheeter waywardmonkeys alpha-tango-kilo cagancosan

norad's Issues

Pen Protocol

There exists in fontTools the pen protocol, which acts as a mediator between things using their own outline drawing methods. The scope of the API is to standardise the way outlines and components are drawn, without providing channels for metadata like glyph width.

What's interesting is that the protocol is not only used for mapping outline data from one internal representation to another, but also for e.g. filtering. It is a sort-of-pipeline for outline and component processing, available in a point-based and PostScript-like drawing model. Examples:

Adapting a point-based drawing model to a PostScript-like drawing model: https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/pointPen.py#L147
Reversing a contour, e.g. for after you converted cubic to quadratics for a TTF: https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/reverseContourPen.py#L9
Recording the pen movement, e.g. for later replay into a different pen or for testing (with components optionally decomposed): https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/recordingPen.py
Testing whether a point falls inside the "black" area of a glyph: https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/pointInsidePen.py#L13
Converting cubic outlines to quadratic TTF-ready ones: https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/cu2quPen.py
Compute a hash for an outline for checking if some metadata is out of date: https://github.com/fonttools/fonttools/blob/4fb22ee/Lib/fontTools/pens/hashPointPen.py

This allows e.g. ufoLib2 and glyphsLib to implement just a drawPoints method on their glyph objects and use the existing pens to transport or transform the data without reimplementing the functionality at the source or destination, e.g. for compilation inside ufo2ft.

Point out glyph name in error message when contents.plist points to glif file that does not exist

Otherwise I get just a generic "file not found" error.

Extend crate::glyph::default_file_name_for_glyph_name to reference algorithm

Needs to be used for both glyph and layer filenames.

https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/ufoLib/filenames.py

Unconditionally overwrite `meta.creator` on writing

Since the creator value is meant to represent the writer of the UFO, norad should always write down itself there. I don#t think it is necessary anymore to refuse writing out UFOs not created in norad.

Annoying edge case to think about: save() should not modify self, so it would ignore what the user put into meta. This is slightly iffy.

Equality testing

All UFO objects should support equality testing.

I'd like the following code to work:

#[test]
fn test_upconvert_ufo() {
    let ssp_v2 = Ufo::load("testdata/sourcesanspro_v2.ufo").unwrap();
    let ssp_v3 = Ufo::load("testdata/sourcesanspro_v3.ufo").unwrap();

    assert_eq!(ssp_v2, ssp_v3);
}

Some corner cases I can think of:

Everything that is lazily loaded (e.g. data, images once implemented) must be loaded in for the comparison. Potentially costly.
Some data structures like a glyph can have an empty outline container or None. A purely derived PartialEq is probably going to treat those differently. Does it matter?

Norad bloat checks

What is the approach to make the CI bloat check pass? Does this automatically fail on PR's that come in from downstream GH repos?

Can you please update crates.io?

I would like some version post-2233ab4 on crates.io if possible

Read/write performance

I am trying to figure out serial read and write performance of norad compared to Python code. Parallel reading and writing I can compare later. Test UFOs from https://gitlab.gnome.org/GNOME/cantarell-fonts/tree/6f8bb4bc/src (roughly 3 times 1000 glyphs).

Rust code in some test section somewhere:

#[test]
fn save_fancy2() {
    let mut my_ufo = Ufo::load("cantarell-fonts/src/Cantarell-Light.ufo".to_owned()).unwrap();
    my_ufo.meta.creator = "org.linebender.norad".to_owned();
    my_ufo.save("/tmp/test1.ufo").unwrap();

    let mut my_ufo = Ufo::load("cantarell-fonts/src/Cantarell-Regular.ufo".to_owned()).unwrap();
    my_ufo.meta.creator = "org.linebender.norad".to_owned();
    my_ufo.save("/tmp/test2.ufo").unwrap();

    let mut my_ufo = Ufo::load("cantarell-fonts/src/Cantarell-Bold.ufo".to_owned()).unwrap();
    my_ufo.meta.creator = "org.linebender.norad".to_owned();
    my_ufo.save("/tmp/test3.ufo").unwrap();
}

Python equivalent:

from pathlib import Path

import ufoLib2

for i, p in enumerate(Path("cantarell-fonts/src").glob("*.ufo")):
    u = ufoLib2.Font.open(p)
    for g in u:
        pass
    u.save(f"/tmp/ptest{i}.ufo", overwrite=True)

Of note may be that /tmp is a RAM-backed tmpfs on my machine. Multiple runs so that target paths are always overwritten. Fedora 31 x64, Ryzen 1700.

Rust:

$ time cargo test save_fancy2
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
     Running target/debug/deps/norad-ddd9e0934bc9c964

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 26 filtered out

     Running target/debug/deps/save-2001b18a33bb825e

running 1 test
test save_fancy2 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out

1.69user 0.12system 0:01.84elapsed 98%CPU (0avgtext+0avgdata 19560maxresident)k
0inputs+0outputs (0major+4154minor)pagefaults 0swaps

Python 3.8.1:

$ time python Untitled-1.py
1.50user 0.13system 0:01.65elapsed 99%CPU (0avgtext+0avgdata 34212maxresident)k
0inputs+0outputs (0major+7388minor)pagefaults 0swaps

😱

Glif: serialize guidelines after anchors

ufonormalizer does it.

crash when saving after renaming a glyph

Not super surprising, never really tried to exercise any of this.

thread 'main' panicked at 'all glyphs in contents must exist.', ~/.cargo/git/checkouts/norad-38f495afcd82c43a/0a18f6c/src/layer.rs:70:25

Tabs vs. spaces

defcon and FontForge seem to prefer two-space indentation in UFO XML files, but Norad uses tabs.

I see a PR in the past saying that ufonormalizer used a single tab instead, but everything else uses two spaces so I'm a bit confused. From where I stand, more tools use two spaces than one tab, and ufonormalizer is the odd one out. The UFO specification itself has all of its examples set with two spaces per indent as well, which I'm sure contributes to it being a popular option for tools.

New parameter for margin setters to counteract movements of components in composites?

I noticed that in Glyphs.app, when you manually change the sidebearings of a glyph, the app will counteract margin changes in composites that use that glyph as a component. It doesn't do it under all circumstances (e.g. using HT Letterspacer won't trigger the mechanism, at least all the time(?)), but I wonder if ufoLib2's margin setters should get a parameter to do the same?

See if we can have compile times stay low

Found this very interesting article: https://matklad.github.io/2021/09/04/fast-rust-builds.html

This would be relevant for norad itself and for client code.

UFO implementation checklist

What's missing until Norad is basically feature complete:

Very nice to have:

When serializing property lists like ~~fontinfo~~ and kerning, make an effort to write out integers when we have integers instead of <real> elements everywhere. The values are the same, but other tools make the same effort, and we want to keep down on diff noise.
Better error types/API so that e.g. loading a faulty UFO produces a legible error message? Need to check all possible errors.
- More detailed fontinfo errors
- If missing e.g. a glif file listed in contents.plist, note the name, not just the OsError.

Deferred to later:

#262
#263
#198
Norad output is ideally indistinguishable from ufonormalizer output.
- #189
- #179

~~Other things to consider~~:

The UFO roadmap wants to drop the name attribute from the glyph structure, since the name is held by the layer already. Norad should probably avoid having an API that exposes the name of a Glyph object and only ever reference it through a layer to ensure a smooth transition, should v4 ever come. Side note: I think urges to make it possible to make several names point to the same glyph object should be avoided, unless Rust makes that hard anyway. There might be weird corner cases lurking in e.g. equality testing.
The UFO roadmap furthermore mentions a charactermapping.plist, which should contain a mapping of Unicode code point to glyph name. The side-effect is that there is a canonical source of truth; right now, the same glyph names can have different Unicode values in different layers. Since a central mapping register is the only way of handling this that makes sense to me, Norad should probably implicitly do it anyway. The v3 spec explicitly allows libs to do it: http://unifiedfontobject.org/versions/ufo3/#character-mapping. When reading, the mapping should be filled with values found in the default layer first, and with other layers on a first-come-first-map basis.
~~Component circular reference checking?~~ #219
The above things may change, may need more thinking and probably shouldn't hold up a v1.

API: expose Kerning and Groups

iondrive refers to both but uses the verbatim type; would be nicer to be able to reuse the typedefs.

Handling serialization errors

Invoking Ufo::save may error out early, but erroring out on the Glyph::save level leaves a half-written UFO on disk. How to handle? Also needs a review of structures that may fail validation mid-flight, like fontinfo.

Compare validation to ufoLib

Look through https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/ufoLib/validators.py to see if we validate everything as ufoLib does.

E.g. check the Postscript blue zone lists, the number of items must be divisible by 2 because they describe start and end points each.

Glif lib: serialization ends up with unordered keys

The commented out test in #126 fails because the keys of the serialized lib section are unordered when they should be alphabetical with capital letters sorting first. See the referenced sample_period_normalized.glif file.

Glyph: remove `name` field?

The format saves the glyph name in the layer and in the glyph itself. The spec used to say (before the redesign) that this was redundant and may be changed in the next version of the format.

Removing the name would make it possible to implement Default on Glyph.

Glyph: get Vecs out of Options?

Currently, contours, components, anchors, codepoints and guidelines are Vecs hidden behind an Option, which seems to be because they allow for non-existence of attributes in a glif file. This leads to code like

fn move_glyph(glyph: &mut Glyph, delta: f32) {
    let Some(outline) = &mut glyph.outline {
        // ...
    }
}

which I don't find especially, uh, ergonomic? Same goes for anchors, etc. I'd rather have a

pub struct Glyph {
    pub name: GlyphName,
    pub format: GlifVersion,
    pub advance: Option<Advance>,
    pub codepoints: Vec<char>,
    pub note: Option<String>,
    pub guidelines: Vec<Guideline>,
    pub anchors: Vec<Anchor>,
    pub contours: Vec<Contour>,
    pub components: Vec<Component>,
    pub image: Option<Image>,
    pub lib: Option<Plist>,
}

Since empty Vecs can just not be serialized (and don't alloc until they have one member), I don't think making them optional holds semantic value? Same for lib maybe?

Related to #27.

unexpected 'data' field in some glyphs?

Playing around loading various UFOs into runebender and I hit a panic in the following glyph:

<?xml version="1.0" encoding="UTF-8"?>
<glyph name="bar" format="1">
  <advance width="247"/>
  <unicode hex="007C"/>
  <outline>
    <contour>
      <point x="169" y="-214" type="line" name="hr00"/>
      <point x="169" y="694" type="line"/>
      <point x="78" y="694" type="line"/>
      <point x="78" y="-214" type="line"/>
    </contour>
  </outline>
  <lib>
  <dict>
  <key>com.typemytype.robofont.mark</key>
  <array>
    <real>1</real>
    <real>0.6</real>
    <real>0.751</real>
    <real>1</real>
  </array>
  <key>com.adobe.type.autohint</key>
  <data>
  <hintSetList>
    <hintset pointTag="hr00">
      <hstem pos="-193" width="-21" />
      <hstem pos="694" width="-20" />
      <vstem pos="78" width="91" />
    </hintset>
  </hintSetList>
  </data>
  </dict>
  </lib>
</glyph>

The issue here is the <data> tag; we don't seem to now what to do with it. I'm really not sure what we should be doing with it; plist expects <data> elements to contain b64 encoded data, and this is just... arbitrary xml? Is this common in the wild? Clearly other tooling supports this somehow.

How to implement public.objectLibs

They glif and font-level key https://unifiedfontobject.org/versions/ufo3/glyphs/glif/#publicobjectlibs is meant to provide identifiable objects (anchors, guidelines, components, outlines, contours, points) with their own lib. Before, there was only the global lib.plist and the glif-specific embedded lib. The object lib is stored separately from the object because putting it where it belongs would require a new format plus version bump, that's reserved for UFO v4. Global guideline libs are stored in lib.plist, everything glyph-specific in the embedded lib there.

In ufoLib2, this is implemented in fonttools/ufoLib2#113. The PR added the new methods Font.objectLib(obj: Identifiable) and Glyph.objectLib(obj: Identifiable), where Identifiable is anything with an identifier attribute. The methods look at the object identifier (and assign a new UUIDv4 one if there isn't one already) and then fetch the dictionary in the font's or glyph's lib keyed to that identifier (or create and return a new one). We specifically avoided just adding a lib to the objects themselves, as would be ideal, because then you'd have a chance of font/glyph lib and object lib getting out of sync unless you store a back-reference to the containing font or glyph object and we didn't want that in ufoLib2. What if you wrote something to a glyph's objectLibs that was different from what the actual object had? Which would win at serialization time? What to do with the losing data? The (Glyph|Font).objectLib(obj: Identifiable) approach keeps the libs where they are saved and free of conflicts.

So, how to do this for Norad? There is no way yet to have a trait that just has a struct field in it, so you'd need a getter and setter.. and probably take the pub away from identifiers? Haven't looked into that, not sure how it impacts ergonomics... Maybe there's a better approach? Since the objectLib approach above modifies both object and containing lib, it is essentially a fn object_lib(&mut self, &mut Identifiable) -> &mut Dictionary, not sure how that vibes with the borrow checker.

Poor UFO writing performance on Windows

See

https://gregoryszorc.com/blog/2021/04/06/surprisingly-slow/ -> Closing File Handles on Windows
https://github.com/rust-lang/rustup/pull/1850/files
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-January/078404.html

As far as I can tell, as long as Windows Defender (and presumably other A/V scanners) are running, there's no way to make the Windows I/O APIs consistently fast. You can disable A/V scanning (at your own peril). But the trick that Mercurial employs (which has later been emulated by rustup among other tools) is to use a thread pool for calling CloseHandle(). Even if you perform all file open and write I/O on a single thread and use a background thread pool only for calling CloseHandle(), you can see a >3x speedup in time to write files. This optimization should ideally be employed by any software that creates or mutates as little as a few hundred files on Windows. This includes version control tools, installers, and archive extraction tools. Fun fact: rustup can extract tar files on Windows faster than open source and commercial fast extraction/copy tools because it employs this trick and more. I believe rustup on Windows is actually faster at extracting tar archives than it is on Linux!

Maybe we can make norad close file handles in a different thread on windows? On my Linux machine, writing a 1000 glyph UFO takes 0.2s or something, on my Windows work machine 5s.

Why `f64`?

This is in some ways a follow up to #107, but is also different.

glifparser uses f32 to represent its point data. Specifically, it converts UFO glif's to a representation which is like this simplified one:

type Outline = Vec<Contour>;
type Contour = Vec<Point>;

struct Point {
  x: f32,
  y: f32,
  a: Handle,
  b: Handle
}

enum Handle {
  At(f32, f32),
  Colocated
}

I was wondering why Norad uses f64. I think this is perhaps overly optimistic. TTF fonts, of course, always round to int. But it's my understanding that the PostScript CFF format is also f32. Per here, here. So, I think, the maximum precision of a point must be f32. Anyway, I'm uneasy about what people's rasterizers etc would do to my fonts, so consider f32 the max available precision.

You can say, "well, UFO could compile to some as yet unknown font format which will be f64", but I don't think it's very clear that future font formats would even be f64. It's a big increase in filesize for no benefit, due to UPM in the thousands in most high quality fonts.

It also leads to smaller files to use f32, as you get some free rounding.

This is obviously a major incompatibility, so I guess one of us needs to convince the other on this issue. I can be convinced, if the argument is good.

`new` constructors for Contour, ContourPoint and possibly others need to deal with identifier/lib args

E.g. https://github.com/linebender/norad/blob/v0.3.1/src/glyph/mod.rs#L397-L403 needs to be

    pub fn new(
        points: Vec<ContourPoint>,
        identifier: Option<Identifier>,
        lib: Option<Plist>,
    ) -> Self {
        let mut c = Self { points, identifier: None, lib: None };
        if let Some(id) = identifier {
            c.replace_identifier(id);
        }
        if let Some(lib) = lib {
            c.replace_lib(lib);
        }
        c
    }

or something. Same for other structs that have a identifier/lib field.

Then linebender/runebender#221 can be reverted.

Please make NameRecord fields public

I can access a bunch of info.open_type_name_records but I can't actually do anything with them, which is a bit of shame.

Color::to_rgba_string: Add precision parameter?

Serializing f32s can lead to noisy strings like "0.80700004,1,0.6,1", where our tools limit things to three decimal places to facilitate exchange with Glyphs.app and FontLab 5. Three decimal places are enough to convert losslessly from (u8, u8, u8) to (f32, f32, f32).

To what extent, if any, should norad collaborate with MFEK/glifparser.rlib?

Hello,

I just got done writing glifparser v1.0.

glifparser was written primarily for MFEKglif but it's also used in MFEKstroke, and in MFEKufo (unreleased) for previewing glyphs.

It's currently in mfek branch, awaiting @MatthewBlanchard and I to patch MFEKglif and MFEKstroke to support the changes. The biggest change is that there are no more panic's, originally, I just panicked via .expect(...) in a lot of cases because I knew MFEKglif was the only consumer and had a custom panic unwinder that would pretty print the message in red and make it clear what went wrong.

glifparser can do quite a few things norad cannot:

"flatten" a glyph with components to its constituent contours, detect loops and bail out (uses kurbo::Affine!)
allow arbitrary data storage along with points via generic type attached to points
most importantly for MFEKglif project, deal with unattached .glif files that aren't part of a parent UFO
convert glif to SkPath (Skia path), piecewise contour (needs MFEK/math.rlib for this one)
load image data if glif has a parent UFO and image exists

I was thinking, maybe norad can support using glifparser at user option, via compile-time feature. Right now what I'm doing is, using norad with the UfoDataRequest I added and just not reading glyphs with it, using it for metadata only.

I think closer collaboration is possible. Is it desirable?

Allow customizing quote char in xml header

add a quote_char method to WriteOptions
When serializing the .glifs, (which we serialize manually) write the header to the stream ourselves, using the specified character
When serializing plists, we don't control writing the header, so we'll have to rewrite. There are options here. All plist writing is now handled through a method in write.rs.
- One option is to write to disk, and then use the unix/windows specific methods to go and overwrite the header in place, before closing the file handle: see std::os::unix::fs::FileExt and std::os::windows::fs::FileExt; you'll need slightly different impls on either platform.
- the other option would be to have a custom Write type, that would wrap or replace the BufWriter; this would intercept the header and replace it, if needed, otherwise forwarding the bytes to the inner writer.

Thinking about this, I think the first option is totally reasonable. The second option is probably marginally more 'efficient', but I don't think it matters; the main thing we want to avoid is closing and reopening all the file handles.

Originally posted by @cmyr in #148 (comment)

Glyph: Treatment of empty values vs. None

While (de)serializing a glif file's <outline> element, it is written if aGlyph.outline is Some(...) and not if it is None. The official(?) https://github.com/unified-font-object/ufoNormalizer/ will omit the element during "serialization" if it is empty. I think norad should do the same, but doing so makes the tests fail because no outline element is considered None and an empty one is Some(Outline { components: [], contours: [] }).

One way to handle this is to not de/serialize values if they are empty, assigning None instead if Some(empty_something).

Another option would be to treat some values with None and empty values as the same. This would need more thought because for some values, you need to distinguish between "no value set" and "empty value".

Yet another would be to not use Option in Glyph and test for emptiness when serializing.

Mid/long-term vision for Norad in the current font lib ecosystem

Colin asked me to write down what I would like norad to be. In short, I would like to see it as the entry point into the font compilation pipeline.

Why the entry point? Because the process of compiling a font needs a library that can construct general SFNT tables and compile AFDKO feature files and possibly other formats into OpenType Layout tables, which are both much bigger efforts than writing the handling of input formats. It's a start. For the rest, a Python bridge into fontTools and friends is needed.

Norad already almost covers the UFOv3 input format. It should also be able to read and write Designspace files, which hold UFOs as children.

There's roughly two things you can do with a Designspace: make instances or pass it to fontTools.varLib to make a variable font. Making instances can be implemented in Norad, but it needs to call into Python at a few points. Variable fonts should be left to fontTools for now.

Instances: In Python, the fontmake.instantiator module loads a Designspace and wraps all the master data (glyphs, kerning values, fontinfo values, ...) into fontMath objects, which implement arithmetic on them. It then uses fontTools.varLib to compute how to mix the masters to arrive at an instance, which is applied to the fontMath wrappers. The interpolated result is put into an instance UFO object and passed on or written to disk.

So, these pieces of code would be neccessary to implement in Rust:

My Designspace reader and writer implementation: https://gist.github.com/madig/5a3e13894f75287272a06983052b4322 or the original designspaceLib.
My instantiator module: https://github.com/googlefonts/fontmake/blob/master/Lib/fontmake/instantiator.py.
fontMath
The part of varLib that computes interpolation factors?
Complicating factor: How to wire up Rust code to behave like the Python libs they are pretending to be?

The biggest gains I would expect with this are:

SPEED.
Potentially higher correctness, as Rust forces error management on you.

Implement transformation for glyph contents when kurbo available

I'd like to be able to do kurbo::Affine * Contour or some such, found at least two instances where that would come in handy. Maybe something for returning a transformed copy and for transforming in place?

glif serialization: advance of zero should result in no <advance/> element

Also, empty outline can be serialized without <outline></outline>.

Font::load <-> Font::load_ufo?

What's the difference?

Ensure default layer is always present

The layer list is currently just a Vec, maybe it should be a wrapper struct that ensures the default layer is always present. This can also enforce that the name public.default is only used for the default layer.

Once this is ensured, Ufo::get_default_layer can return &Layer instead of an Option.

Experiment with test suite layout

I'm reading https://matklad.github.io//2021/05/31/how-to-test.html and it contains several things that look like they're worth exploring for a test suite.

One suggestion is trying to avoid I/O and instead making tests work in-memory. The cargo test suite seems to use builders for setting up "multi-file" tests, see e.g. https://github.com/rust-lang/cargo/blob/master/tests/testsuite/build_plan.rs. UFOs are multi-file objects after all. The builder can take strings for input to exercise parsers. The only thing you'd cut out would be read calls to the filesystem.

Add Ufo::new_layer(name) method

Should insert an empty new named layer and return a mutable reference to it. UX problem: what if your chosen layer name exists already? Return Result<Error, &mut Layer>?

Maybe even a combination method like get_or_insert_new(name)? Or something mirroring https://doc.rust-lang.org/std/option/enum.Option.html#method.get_or_insert. Use case is mutably grabbing e.g. the public.background layer or creating it if it doesn't exist yet. For when you just want to add stuff into it.

saving file removes unicode codepoints?

reported privately by @eliheuer; this would be undesirable.

Update `rust-plist`, use new features

There's a new 1.3.1 release. This update lets us make a few desired enhancements:

we can sort plist keys on write, which is very helpful for reducing unnecessary diff noise
we can customize whitespace behaviour. I forget what the exact motivation was for this, but I remember we wanted it.
we can get more information about the position of parse errors ebarnard/rust-plist#64
we can review the code to do with plist serialization and use Value directly instead of workarounds (e.g. layerinfo_from_file?)

Use f64 for floating point numbers

Python apparently uses f64 internally for floats but norad uses f32. This leads to vanilla Python code and norad to read the odd floating point coordinate differently (the f32 is upconverted to f64 in PyO3 bindings, changing the precision and making equality testing difficult). Maybe norad should use f64 where the spec talks about floats?

API: Make `Font.{groups|kerning|features|font_info}` non-optional

This makes the API a bit easier to use and the types in use don't allocate if empty. FontInfo would need to be checked for emptiness to not write it out when all its values are None.

Ufo::default() should include all items required by spec

Such as, specifically, the default layer. See #17 for discussion.

remove 'not created here' error condition

During development of norad, this was added because there was a bunch of data we weren't round-tripping. We're now quite a bit better at that, so I think we can relax this concern.

Improve plist handling

This has been discussed but feels worth having a tracking issue.

There are a few different things going on here. I'll quote @madig on zulip:

@cmyr thinking about better (embedded) plist handling, what we need is

something that reads an entire file like fontinfo.plist, which has a rigidly defined key-value (type) schema, basically what serde excels at
something that handles kerning.plist and groups.plist which both have a defined structure but nothing else, like a simple special case of the fontinfo case?
something that handles lib.plist, which is a standalone, free-form plist file
something that handles embedded plists in .glif and layercontents.plist files
number 1 needs something that associates keys with types, number 2 is a basic parse-into-hashmap scenario and 3 and 4 need just be dictionaries that can take arbitrary plist key types plus values

fontinfo.plist: The problem with fontinfo.plist is that it's huge, and has tons of optional keys, and so using serde to create a struct ends up both generating a lot of code and also creating this huge struct that must also be impacting binary size. It would be nice to replace this with some sort of typed dictionary, kind of like druid's Env.
lib.plist: this probably doesn't need to change, and should just be a plist object. We might also want to expose API that lets people try and deserialize this to some concrete struct, if we want?
embedded plists: this is mostly a matter of passing some portion of the xml off to a plist parser to generate a plist object, and then doing that in reverse when we serialize. It might also be nice to let the user provide some custom serdeable type here, but it will be hard to make that play nice in the type system.

This all feels very doable; I'm going to start by testing out the tweaks to fontinfo, and seeing how that feels.

Python bindings and mutation patterns

This is a quick sketch of what would be involved in adapting norad so that it would be useable from languages like python, as a drop-in replacement for existing libraries; the particular challenge here is getting norad objects to have the same semantics ('reference semantics') as objects in those languages, and making that work with Rust's ownership model.

This is follow-up from the discussion in fonttools/fonttools#1095.

I am going to make a bunch of assumptions about how some of the python tools work, so please correct me wherever I'm wrong!

My understanding of what we would want in a python API is basically: everything is an 'object', which reference semantics. If I get a glyph from a layer, and I change its outline, and then I go and get another copy of that glyph from the layer, those glyphs will be identical; they point to the same underlying data.

This is not how things currently work in norad; in norad mutation works through one of two mechanisms, which i'll call "borrowing" and "check-out/check-in":

borrowing: This is how layers currently work in norad. To mutate a layer, you call a method that returns a reference to a layer object, and you can mutate this layer; however you cannot hold on to a copy of this layer, or really pass it anywhere else.
check-in/check-out: This how glyphs currently work: norad doesn't even let you get a mutable reference to a glyph, at all. In norad, if you want to mutate a glyph, you get a glyph from a layer, you modify it, and then you add it back to that layer. You can think of this as being like a check-out/check-in mechanism.

A design to support bindings

I think that trying to make norad as it is currently written fit into the python model will be tricky, but I think there's a reasonably straight-forward answer, which is that we have a separate set of types and interfaces explicitly designed to work with python. This should let us continue to share all of the base types and parsing/validation/serialization logic, while letting us build two separate APIs that will respect the two distinct use-cases.

So basically: we add a python-specific wrapper, in rust, for each type, like:

pub struct PyUfo {
    meta: PyMetaInfo,
    font_info: PyFontInfo,
   ...etc
}

pub struct PyLayer {
    glyphs: Rc<RefCell<Map<String, PyGlyph>>,
    ...etc
}

pub struct PyGlyph {
    inner: Rc<RefCell<Glyph>>
}

etcetera.

Note: This assumes that the glyph is a 'leaf' type, that is it is the finest granularity object that you're allowed to mutate and expect those mutations to show up elsewhere. This might not be the case; for instance you might expect to be able to get a Contour out of a glyph and change its properties and have those be reflected everywhere; in this case we could also need a PyContour type, and PyGlyph would look more like the Glyph that's already in norad.

You can mostly ignore the Rc<RefCell<_>> bit. The Rc means is that we're using a Reference counted pointer, and the RefCell means basically that the internal data is not subject to rust's borrowing rules at compile-time.

(Rc + RefCell is assuming that this object will not be shared between OS threads, which seems like a reasonable assumption for python; if we do want that behaviour then we would instead use Arc + Mutex, which ensures that our reference counts and data access are thread-safe).

Borrowing problems

One possible concern with this approach involves borrowing expectations; the rule with RefCell is that when you actually want to mutate the data, you acquire a kind of 'lock'. If this object is already borrowed, you can't get that lock. In practice I think we can avoid this completely by ensuring all of that acquire/release happens on the rust side; I'd have to look into this a bit, though, to make sure. It might mean we have to write something to generate the python bindings ourselves, to ensure that things like setters and getters are doing that borrowing under the covers.

If we do have to expose this somehow, what we would do is to just throw a python exception if something was already borrowed. I was initially thinking this would be a larger part of the design, as a sort of safety valve; when folks migrated existing python code to this library they might hit some new exceptions, but I actually think we can probably avoid this altogether?

other thoughts

an alternative design based on proxy objects: I think if we want a drop-in replacement for an existing tool written in python, something like what I describe here will be the best route. There are options, like having 'proxy objects' that just hold a reference to the font or layer as well as a method for mapping mutations on themselves to mutations on the shared object. This honestly has a certain nerdy appeal, especially since we could do cool stuff like having a def delete: on a Glyph object that removes it from the layer and updates the layer_contents, but I think it's probably a bit more complicated and it's a bit less clear to me how well it would work, although I'm more curious as I end this paragraph than I was when I began?

next steps

This is intended as a sketch, and an actual design will require a bit more thought and research. I'm going to hold off on doing that work until I have a better sense of how much of a priority is this, and whether it's my priority or someone else's. If @simoncozens is interested in doing the work then I'm happy to offer whatever advice and guidance I can. Otherwise if @davelab6 thinks that this is worth a week or two of my time then I'm confident we can get something working pretty quickly; the only part I'm unsure of is how to generate the python bindings in a way that would play nicely with with this interior-mutability pattern.

Error out if trying to save UFO version != 3 or glif version != 2

The code currently only supports UFO v3 and glif v2. Converting up- or downwards requires some conversion to happen.

Revamp GlyphBuilder

It is now easier to outfit a Glyph without all the Options, so maybe it's time to revisit GlyphBuilder. It currently checks that:

... all identifiers we use are unique
... some methods are not called for format 1 glyphs (e.g. adding guidelines)
... some methods are not called more than once

The last point is due to glif file parsing, some elements are supposed to occur at most once. I think I should move that logic back into the glif parser and turn GlyphBuilder into a simpler general builder that just ensures all identifiers are unique and only format-appropriate methods are called 🤔 If there's a better way to do that, the builder can maybe go away completely. As long as there is no other use-case for the builder, the once-checking may as well stay, though.

Semi-related: The split GlyphBuilder <-> OutlineBuilder is somewhat problematic in that there needs to be a handover point for components, contours and identifiers (the latter need to be merged over so later builder commands can check against outline identifiers). If the builders are to be exposed to the outside, it would currently also require to expose the Outline intermediate struct for handover (making OutlineBuilder return components, contours and identifiers separately triggers clippy). This is... suboptimal. Maybe OutlineBuilder should be merged back into Glyphbuilder?

Wrapping plist errors

We load plist files at several points during UFO loading. Plist errors are currently very... sparse; their Display impl will print things like InvalidIntegerString and that's it. The error also contains line information, but it does not seem accessible from outside the plist crate (ebarnard/rust-plist#60).

I think what's needed is a more extensive PlistError akin to GlifError?

/// An error that occurs while parsing a .plist file
#[derive(Debug)]
pub struct PlistError {
    pub path: Option<PathBuf>,
    pub inner: plist::Error,
}

And then we need lotsa wrapping everywhere?

Also, writers?

Cross-platform line ending serialization approach

Thoughts on how to approach cross-platform line endings? Should norad write consistent line endings on all platforms (limits diff when norad serialization is used x-platform)? Keep line ending format that it reads in on subsequent serialization (maybe limits diff when norad serialization is used with other serializers)? Allow custom line ending support (flexible and maybe limits diffs across serializers and platforms)?

@belluzj mentioned today that he likes the idea of a runebender config panel with Unix vs. Win line ending style options on UFO serialization. I think that we can expand the approaches used for spacing and XML declaration quote customization to address it if this is desirable. All of these serialization format options might be useful in such a config panel.

Originally posted by @chrissimpkins in #161 (comment)

Slow writing performance on Windows

Writing a UFO on Windows can be excruciatingly slow for some reason. Need to profile.

Maybe it's due to https://gregoryszorc.com/blog/2021/04/06/surprisingly-slow/ -> "Closing File Handles on Windows"