fontations's Issues

Package naming

In the spirit of the similar issue in fontmake-rs, I thought I’d create one here for bikeshedding on names.

My personal preference is that the names should reflect the fact that these are low level libraries for a particular binary format.

To get the ball rolling, I’ll throw out “croft-” as a prefix. A sort of backronym for Compile/Read OpenFont/Type. I think that checks all the boxes.

Whatever we choose, it would also be nice to have a metacrate that pulls them all together, with traverse/write feature-gated.

Next!

Allow specifying additional derived traits in codegen

It would be nice to be able to say, for instance, that you would like to derive Hash or PartialOrd for some record. We could then be somewhat judicious about this, since not all traits can be implemented for all types, and not all traits may be desirable for all types.
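
A hypothetical syntax sketch (the attribute name is illustrative; no such codegen attribute exists today):

#[extra_traits(Hash, PartialOrd)]
record ColorRecord {
    red: u8,
    green: u8,
    blue: u8,
    alpha: u8,
}

The codegen tool would then append Hash and PartialOrd to the derive list it already emits for the generated type.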

Implement Default for compile types

This will make these types much easier to work with; for instance we will be able to easily create a new empty table and then populate it, or set some fields of a table and then use ..Default::default() to fill in the blanks.
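
For instance, something like the following should work once Default is in place (ExampleTable is a stand-in, not an actual write-fonts type):

#[derive(Default)]
pub struct ExampleTable {
    pub version: u16,
    pub flags: u16,
    pub records: Vec<u16>,
}

fn main() {
    // create a new empty table and then populate it:
    let mut table = ExampleTable::default();
    table.records.push(42);

    // or set some fields and use ..Default::default() for the rest:
    let table2 = ExampleTable { version: 1, ..Default::default() };
    assert_eq!(table2.flags, 0);
    let _ = (table, table2);
}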

Importantly for this, I think the default for non-nullable offsets should be an empty table, instead of a null table. (I am starting to think that the normal OffsetMarker should not be an Option<T> internally; that detail is a consequence of wanting to allow reading a broken file and then mutating its contents. If an offset in a file is malformed, I think we should just record the issue somewhere and insert a new default table in its place, which we will be able to do once we have implemented Default for the compile types.)

This will likely also require us to add syntax to the codegen inputs to allow the user to specify a default value for a field.
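
A hypothetical sketch of that syntax (the #[default(..)] attribute does not exist yet):

table Example {
    /// Fields without an annotation would keep the derived zero-ish default.
    flags: u16,
    /// A field-level default, supplied explicitly:
    #[default(Version16Dot16::VERSION_1_0)]
    version: Version16Dot16,
}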

support vertical metrics tables vhea/vmtx

As per discussions with Rod, we are leaning towards generating a single HVhea struct for both the horizontal (hhea) and vertical (vhea) header tables, and a single HVmtx struct for both the horizontal (hmtx) and vertical (vmtx) glyph metrics tables, as harfbuzz also does, since each pair of tables shares the same or an equivalent binary structure.

This means we'll have to rename fields to work for both vertical and horizontal layout (e.g. advance_max instead of advance_width_max or advance_height_max), and we shall also gloss over some differences: vhea.advanceHeightMax is typed as a signed int16 in the TT/OT specs for legacy reasons, while hhea.advanceWidthMax is (more correctly) an unsigned uint16; and the vhea version is a fixed Version16Dot16 whereas hhea's version is a MajorMinor tuple (for version 1.0, both are encoded identically as 0x10000).
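
A rough sketch of the shared-struct idea (fields abbreviated; the real generated type would carry the full set of header fields):

/// Shared header for hhea and vhea (hypothetical merged type).
pub struct HVhea {
    pub ascender: i16,
    pub descender: i16,
    pub line_gap: i16,
    /// advanceWidthMax (hhea) / advanceHeightMax (vhea)
    pub advance_max: u16,
}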

Declare structs for types with more than a single `ReadArgs` argument

Currently, if a type has multiple read arguments, we pass them in as a tuple, but this makes it very easy to lose track of them, especially if the arguments are of the same type. It would be nice if, instead, we declared a struct with named fields and used that.

In this world,

#[read_args(number_of_h_metrics: u16, num_glyphs: u16)]
table Hmtx {
    // ...
}

would produce:

pub struct HmtxArgs {
    pub number_of_h_metrics: u16,
    pub num_glyphs: u16,
}

impl ReadArgs for Hmtx {
    type Args = HmtxArgs;
}

impl<'a> FontReadWithArgs<'a> for Hmtx<'a> {
    fn read_with_args(data: FontData<'a>, args: &HmtxArgs) -> Result<Self, ReadError> {
        let HmtxArgs { number_of_h_metrics, num_glyphs } = *args;
        // ..
    }
}

instead of,

impl ReadArgs for Hmtx {
    type Args = (u16, u16);
}

impl<'a> FontReadWithArgs<'a> for Hmtx<'a> {
    fn read_with_args(data: FontData<'a>, args: &(u16, u16)) -> Result<Self, ReadError> {
        let (number_of_h_metrics, num_glyphs) = *args;
        // ..
    }
}

which should be much easier to understand.

Fewer attributes, more types?

It feels like I should be able to tell the codegen more directly what I mean. Instead of this:

    #[nullable]
    #[read_offset_with($num_color_records)]
    color_records_array_offset: BigEndian<Offset32<ColorRecordArray>>,
// ...
#[read_args(num_color_records: u16)]
table ColorRecordArray {
    /// Array of 32-bit flag fields that describe properties of each 
    /// palette. See below for details.
    #[count($num_color_records)]
    color_records: [ColorRecord],
}

Let me write something akin to:

    #[nullable]
    color_records_array_offset: BigEndian<OffsetTo<[ColorRecord; num_color_records]>>,

Being able to spell nullable directly (NullableOffsetTo? OptionalOffsetTo?) would be nice too. IMHO this moves the codegen inputs closer to the schema we've mentioned before.

I would love to try to hack this together if it makes sense. Pointers most appreciated :)

otexplorer throws `hmtx: Error 'An offset was out of bounds'` with NotoColorEmoji

Get the font from https://github.com/googlefonts/noto-emoji/blob/main/fonts/Noto-COLRv1.ttf

then try using otexplorer to dump the hmtx table and observe this unexpected error:

$ cargo run --bin otexplorer -- -t hmtx ../noto-emoji/fonts/Noto-COLRv1.ttf
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/otexplorer -t hmtx ../noto-emoji/fonts/Noto-COLRv1.ttf`
hmtx: Error 'An offset was out of bounds'

Take ValueFormat into account when compiling ValueRecord

Currently, when compiling value records, we skip null values. This means it is impossible for us to correctly compile a value record which contains a null device offset, even when this is expected and the ValueFormat indicates that such a table will be present.

If this occurs, we will silently produce an incorrect font.

This is not a very likely scenario, but it is still something we need to address.

The solution to this will likely be some slightly fancier way of indicating custom compilation behaviour.

Have a dummy codegen input for basic testing

When playing with codegen, testing is annoying: often I end up just rebuilding everything with my current changes, which can produce mountains of errors.

I would like, instead, to have a special input file that is small but contains examples of the various patterns/attributes; for testing we can then choose to regenerate just this file, which will be included in some #[test] gated module.

Drop `#[nullable]`?

Every offset could be null, and the compiler forces the consumer to be ready for it (yay!). Encoding #[nullable] is basically capturing a validation rule: that offsets without the attribute shouldn't be null. We don't capture validation rules for other fields, so perhaps we shouldn't capture this one?

Language agnostic schema

Once #84 is done, we're getting close to the codegen input being language agnostic. It "just" needs to be in a form other than Rust, and to only have attributes that make sense beyond Rust. Strawman to incite debate: use TOML (https://toml.io/en/); it's simple, widely supported, supports comments, and more than sufficient to capture what we need.

Today

/// [COLR (Color)](https://learn.microsoft.com/en-us/typography/opentype/spec/colr#colr-header) table
table Colr {
    /// Table version number - set to 0 or 1.
    #[version]
    version: u16,
    /// Number of BaseGlyph records; may be 0 in a version 1 table.
    num_base_glyph_records: u16,
    /// Offset to baseGlyphRecords array (may be NULL).
    #[nullable]
    #[read_offset_with($num_base_glyph_records)]
    base_glyph_records_offset: Offset32<[BaseGlyph]>,
}

Tomorrow

tag = "COLR"
root = "Colr"

[table.Colr]
comment = "[COLR (Color)](https://learn.microsoft.com/en-us/typography/opentype/spec/colr#colr-header) table"

[table.Colr.version]
type = "u16"
attrib = ["version"]
comment = "Table version number - set to 0 or 1."

[table.Colr.num_base_glyph_records]
type = "u16"
comment = "Number of BaseGlyph records; may be 0 in a version 1 table."

[table.Colr.base_glyph_records_offset]
type = "Offset32<BaseGlyph[num_base_glyph_records]>"
attrib = ["nullable"]
comment = "Offset to baseGlyphRecords array (may be NULL)."

Add additional interesting test ttx files for glyph loading

Per #191 (review), we want more ttx files for testing glyph loading with interesting cases. We'll also want some variable fonts to test when we get tuple variation store support in read-fonts. And of course, hinted fonts at some point.

Making this an issue to avoid blocking that PR on ttx curation.

Add syntax for declaring external record or scalar

With the removal of BigEndian, we no longer have a mechanism whereby the user can communicate in a codegen file whether an unknown type is a scalar or a record. (It is currently possible to use unknown types that are tables, since tables are always referenced via an offset.)

This is a fairly uncommon case, but it does exist, and should be supported.

The information we will need to be able to express for these types is the following:

  • The name of the type in the read crate
  • The name of the type in the write crate (these will generally be the same?)
  • possibly metadata/attributes that explain how to read the type? (I originally assumed that a custom record would be non-trivial, and so would need external arguments in order to be parsed; ValueRecord, for instance, requires a ValueFormat argument. Actually, this is unnecessary: you need that metadata where the type is used, not at the declaration site. It's possible that other attributes may be required, though.)

What should syntax like this look like?

I think the best approach is to match the existing syntax as closely as possible. I think we can do this by introducing a new keyword, extern (which is already a Rust keyword, although it is no longer used very often).

So basically I think that ValueRecord should be declared as,

extern record ValueRecord;

And a custom scalar (let's imagine ValueFormat required a custom implementation) would look like,

extern scalar ValueFormat;

And if required, we could add attributes to this.

At parse time, I think we would probably want to parse this into a new category of top-level item.

rethink vhea/hhea & vmtx/hmtx

I have run into a few issues with using a common type to represent two distinct tables here.

First off, using a common type doesn't work with the TopLevelTable trait (added in #164). Since a single type represents multiple top-level tables, there is no way to implement this trait correctly.

The second issue is that hhea is always version 1.0, and vhea is at version 1.1. This means that we need to somehow track which table we're in when we compile, since we need to write out different versions for the different tables.

Currently, I can see two ways to address this issue: manually, or via codegen.

Addressing it manually means adding different wrapper structs that contain a common, codegen'd inner table. This is very possible, but it involves a lot of boilerplate: basically we will need to manually implement each of the traits we expect to be present. This isn't too bad, but it is annoying, and it also means that if we make any future changes to our traits, we will need to manually maintain this set of implementations. On the plus side, we can probably get away with using a macro for the implementation, so at least we only write it once, not twice; I've taken that approach in #164, where it looks roughly like the sketch below.
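
A rough sketch of that wrapper-macro shape (hypothetical; the actual macro in #164 may differ):

/// The shared, codegen'd inner table (fields abbreviated).
pub struct MetricsHeader {
    pub ascender: i16,
    pub descender: i16,
}

// One macro invocation per concrete top-level table.
macro_rules! metrics_table {
    ($name:ident, $tag:literal) => {
        pub struct $name(pub MetricsHeader);

        impl $name {
            pub const TAG: &'static [u8; 4] = $tag;
        }
    };
}

metrics_table!(Hhea, b"hhea");
metrics_table!(Vhea, b"vhea");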

Addressing it via codegen means giving each of these tables its own codegen file, which will generate a distinct type. The downside is duplication. The upside is that we can delete a bunch of hand-written code (avoiding the future maintenance cost it entails), and these tables will then 'look and feel' consistent with all the other top-level tables in the crate.

We've already had this discussion (in #104) but in light of actually working with the existing code, my feelings have changed, and I think that just having each of these tables be generated separately is probably the better compromise.

Also: one of the arguments for having shared implementations was that it would make it possible to reuse these tables in both vertical & horizontal contexts. It's really not clear to me whether this is a practical or a theoretical concern, but if it is practical, I think we could address it with a trait, when it arises?

Does anyone else have any strong feelings? cc @anthrotype

Figure out real FontBuilder API

We should perhaps create, and cite, an issue to not use Cow in the FontBuilder API?

Originally posted by @rsheeter in #153 (comment)

The current API for actually constructing a top-level font object is a hack that I put together for the subsetting experiments, and is definitely not what I want it to look like long-term.

I don't have a coherent long-term design in my head, but I do have some vague ideas:

  • with the existence of an 'SfntTable' trait (i.e. a trait associating root tables with their tags) we should generally be able to have an API that's just builder.add(table).
  • we want to support collections
  • we would also like to figure out how we handle the case where we load, modify, and recompile a font; we don't want this to require generating the compile versions of all tables.
  • figure out where the parallelization happens. Probably the uncompiled objects go into the builder, and then when we hit build we compile everything in parallel?
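
A minimal sketch of the tag-association idea (the trait and builder here are illustrative stand-ins, not the current API):

use std::collections::BTreeMap;

/// Associates a root table with its tag and knows how to compile itself.
pub trait TopLevelTable {
    const TAG: [u8; 4];
    fn compile(&self) -> Vec<u8>;
}

#[derive(Default)]
pub struct FontBuilder {
    tables: BTreeMap<[u8; 4], Vec<u8>>,
}

impl FontBuilder {
    /// The tag comes from the trait, so callers just write builder.add(&table).
    pub fn add<T: TopLevelTable>(&mut self, table: &T) -> &mut Self {
        self.tables.insert(T::TAG, table.compile());
        self
    }
}

A final build() step could then compile the queued tables in parallel and assemble the binary font.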

Come up with some approach for specifying current compilation behaviour

In codegen, we currently have a #[compile(..)] attribute. This attribute takes an expression, which is expected to resolve to the concrete type of the field; that is, if the annotation is applied to a u16, the expression must evaluate to a u16.

This is a nice handrail in many cases, but it is inadequately expressive in harder ones. Often we have some type with complicated encoding rules that cannot be trivially derived. In these cases it would be nice to have something like #[compile_with(..)]: this would accept an expression or method name resolving to any type which implements FontWrite, providing a more flexible escape hatch for cases where we want full control over how a particular field or type is compiled.
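
A hypothetical sketch of the two attributes side by side (#[compile_with(..)] does not exist yet; the field and method names are made up):

table Example {
    /// #[compile(..)] must evaluate to the field's concrete type (u16 here):
    #[compile(self.records.len() as u16)]
    num_records: u16,
    /// #[compile_with(..)] would name an expression resolving to any impl FontWrite:
    #[compile_with(self.compile_string_data())]
    string_data_offset: Offset16<StringData>,
}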

A feature like this will be useful for resolving #189, but it has come up in a number of other places as well, such as the string records in name and post.

Apache2 please

For context, we are advised not to read things without licenses.

Generate a Java reader

We have a limited in-house reader library; generating a more complete one would be both useful and a great demonstration of the generality of the codegen.

Require best practices

In a week or two, disable merging unless CI passes and there is an approving review.

Missing table tests

Every table should have tests, at the very least a basic read test akin to read-fonts/src/tables/cpal.rs. Today some tables have none.
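
A rough sketch of such a minimal read test (the fixture bytes and accessor names here are stand-ins; see the actual tests in cpal.rs for the real pattern):

#[cfg(test)]
mod tests {
    use super::*;

    // Hypothetical fixture: raw big-endian bytes for a tiny table.
    const TEST_DATA: &[u8] = &[0x00, 0x02, 0x00, 0x00];

    #[test]
    fn smoke_read() {
        let data = FontData::new(TEST_DATA);
        let table = Post::read(data).unwrap();
        assert_eq!(table.version(), Version16Dot16::VERSION_2_0);
    }
}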

[otexplorer] Minor: Table name spelling doesn't match table tag

Nit/Minor issue observed when using otexplorer:

$ cargo r /Users/drott/dev/color-fonts/build/test_glyphs-glyf_colr_1.ttf -q post | less

starts outputting

┌─────────────────────────────────────────────────────────────┬─────────────┐
│Post                                                         │             │
│ version: 2.0                                                │ 00 02 00 00 │

I.e. Post vs post - IMO it would be more consistent if the capitalization matched the tag name.

Similarly COLR vs. Colr.

Improving ergonomics with a more-harfbuzz-like sanitize step

Currently we validate each table when it is read, and this validation does not chase offsets. This has two downsides: the first is that we need to do duplicate validation work each time we access a subtable, and the second is that we need to either check or unwrap the result type on each access, which is annoying.

I've been thinking about a possible pattern that would avoid this. Basically this would look like,

  • Alongside TableRef<T> we have a type, CheckedTableRef<T>. (names to be bike-shedded)
  • TableRef<T> has a method that looks something like check(&self) -> Option<CheckedTableRef<T>> (see the sketch after this list)
  • This method recursively resolves each offset, and calls check on the underlying table. It returns Some only if all offsets can be read.
  • CheckedTableRef<T> now has offset getters that return their target without having to perform validation, and without a Result wrapper.
  • This mechanism would be opt-in, either at the codegen level or simply by the caller. That means you might choose the checked table only where re-reading is expensive and there is no recourse if an inner offset is broken (this might make sense for a layout subtable, say, where we probably want to just skip the table if something is malformed)
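
A minimal sketch of that shape (names to be bike-shedded; the offset chasing is elided):

use std::marker::PhantomData;

// Stand-ins for the real read-fonts table types.
pub struct TableRef<T>(PhantomData<T>);
pub struct CheckedTableRef<T>(PhantomData<T>);

impl<T> TableRef<T> {
    /// Recursively resolve every offset and check each subtable,
    /// returning Some only if everything is readable.
    pub fn check(&self) -> Option<CheckedTableRef<T>> {
        // validation elided in this sketch
        Some(CheckedTableRef(PhantomData))
    }
}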

Downsides that I can think of:

  • more generated code
  • more complicated API
  • complicates consumer API: callers might want to have the ability to work with either the checked or unchecked version of some table, and I can't think of a great pattern for dealing with this? But maybe the answer is that this mechanism would only be used in places where we would be happy to bail on failure?

Test data for comparison with FreeType

There are several test cases in read-fonts and punchcut where the condition is “produces results identical to FreeType.” The expected output is currently hand-curated, which is not ideal. We’d like to be able to generate it programmatically, both to ensure consistent data and to facilitate adding new test cases.

I see two potential solutions to this problem:

  1. Take on the freetype crate as a test only dependency in these projects. Use this inline to generate expected values on the fly.

  2. Add a new crate to the repo that is used to generate the test data which is then included in the relevant places.

I’m not sure if option 1 violates our new dependency rule; I suppose we should clarify whether build and test dependencies are subject to it.
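
A sketch of option 1, with freetype-rs as a dev-dependency (the font path, glyph id, and size here are illustrative):

#[cfg(test)]
mod tests {
    use freetype::{face::LoadFlag, Library};

    #[test]
    fn advances_match_freetype() {
        let lib = Library::init().unwrap();
        let face = lib.new_face("resources/test.ttf", 0).unwrap();
        face.set_pixel_sizes(0, 16).unwrap();
        face.load_glyph(3, LoadFlag::DEFAULT).unwrap();
        let expected = face.glyph().advance();
        // compare `expected.x` against our own computed advance here
        assert!(expected.x >= 0);
    }
}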

Provide an easy way to run a subset of codegen targets

Right now if I want to repeatedly regenerate the same table I copy codegen_plan.toml and make one with just what I want, e.g.

[[generate]]
mode = "parse"
source = "resources/codegen_inputs/cpal.rs"
target = "read-fonts/generated/generated_cpal.rs"

This doesn't work well, because it seemingly deletes read-fonts/generated/* and write-fonts/generated/* regardless of whether the plan touches the files in question.

It would be nice to be able to run the normal codegen and tell it "only do this/these items" such as by regex on the source or target. At a glance I think just filtering in fn run_plan(path: &Path) might do it?
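
At a sketch level, that filter might look like this (Op, load_plan, and run_op are stand-ins for the real codegen internals):

use std::path::Path;

// Stand-in for one [[generate]] entry in the plan.
struct Op {
    source: String,
    target: String,
}

fn load_plan(_path: &Path) -> Vec<Op> {
    Vec::new() // parsing the toml elided in this sketch
}

fn run_op(op: &Op) {
    println!("generating {} -> {}", op.source, op.target);
}

/// Only run plan entries whose source or target contains `filter`.
fn run_plan(path: &Path, filter: Option<&str>) {
    for op in load_plan(path).iter().filter(|op| {
        filter.map_or(true, |f| op.source.contains(f) || op.target.contains(f))
    }) {
        run_op(op);
    }
}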

When codegen fails, a backtrace is nice

I found it very helpful to have eprintln!("{}", Backtrace::capture()); before the places that return a syn::Error when modifying the codegen. Without it, the (otherwise lovely) error told me where in the input the problem was, but not where in the codegen it arose.

I wonder if it's worth having that as a standard practice?
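
For instance, a small helper along those lines (note Backtrace::capture() only records frames when RUST_BACKTRACE or RUST_LIB_BACKTRACE is set):

use std::backtrace::Backtrace;

/// Print where we are in the codegen before surfacing an error.
fn report_codegen_error(msg: &str) {
    eprintln!("{}", Backtrace::capture());
    eprintln!("codegen error: {msg}");
}

fn main() {
    report_codegen_error("expected a #[count(..)] attribute");
}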

Improvements to codegen tool behaviour

The codegen tool has evolved organically over time, and as we add more and more inputs it has started to bog down a bit. I would like to make a few general changes:

  • remove the separate 'file' and 'plan' modes: I would like there to be a single way to invoke the codegen tool, and this should use a plan.
  • filtering: I would like to be able to pass some argument to the tool that filters which files are rebuilt.
  • change clean behaviour: the idea behind the clean stage was that if you remove a given codegen output from the plan, or change an output filename, it is very easy to end up holding on to stale outputs. The current design is bad, though: we always run this clean step, and we delete files before we even know whether codegen has succeeded.

Some unstructured ideas:

  • for clean, we have options. Maybe the simplest approach is not to run it by default; instead we add an optional all command that does a full clean and rebuild. We can run this in CI, and that will ensure that the generated directories' contents are fresh.
  • for filtering, the best option might be to filter based on the output path? The provided argument could be a regex, but right now I lean towards some kind of custom syntax: you provide an input filename, and can optionally append a mode with a dividing slash or something. Basically you could write -f head to regenerate both outputs for the head table, or -f head/compile to only regenerate the write-fonts version. I'm just brainstorming.

In any case, I wanted to write this down; it isn't a huge priority, but it would likely be a nice QoL improvement.

[otexplorer] Panic when accessing COLR.version

$ cargo r /Users/drott/dev/color-fonts/build/test_glyphs-glyf_colr_1.ttf -q COLR.version

     Running `/Users/drott/dev/fontations/target/debug/otexplorer /Users/drott/dev/color-fonts/build/test_glyphs-glyf_colr_1.ttf -q COLR.version`
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', read-fonts/src/tables/../../generated/generated_colr.rs:275:42

While for example

$ cargo r /Users/drott/dev/color-fonts/build/test_glyphs-glyf_colr_1.ttf -q post.version
$ cargo r /Users/drott/dev/color-fonts/build/test_glyphs-glyf_colr_1.ttf -q hhea.version

work fine.

Cases where Rust impl details are embedded in the codegen inputs

For future discussion: I've reviewed all the inputs and come up with the following tentative list of places where the inputs feel Rust-specific:

  • ComputedArray/VarlenArray: these are irrelevant if you aren't trying to do zerocopy, since you can just iterate the items and allocate for all array types.
  • the presence of lifetimes in the codegen inputs; for instance in gpos we have:
    table PairPosFormat3 {
        // ..
        class1_records: ComputedArray<Class1Record<'a>>,
    }

    record Class1Record<'a> {
        class2_records: ComputedArray<Class2Record>,
    }

    basically we want to include the lifetime in the inner type of ComputedArray; not all types in a computed array have a lifetime, and we don't know how to guess whether we need it or not. We could let it be elided in the declaration of Class1Record, I think, but so long as we require it in one place, I think it's less confusing to require it in both.
  • we describe the target of an offset using Rust generic syntax.
  • some attributes (compile, e.g.) allow you to specify expressions inline, and we often include a self parameter in those expressions (e.g. to call a method on self)
  • we specify documentation in markdown syntax (e.g. for specifying links)
  • the idea of generic offsets (and the #[generic_offset] attribute)
  • various places that aren't exactly 'rust specific', but are specific to our particular implementation: attributes like compile, compile_type, validate, skip_font_write, skip_from_obj, to_owned, and traverse_with.

make read-fonts a hard dep of write-fonts

This will let write-fonts reuse traits and types declared in read-fonts.

One immediate consequence of this: any custom flags or enums that are declared in codegen can now be generated just once, in read-fonts, and then they can be imported and reused in write-fonts.

It will also let us reuse certain traits, such as described in #109.

Delete resources/raw_tables/?

These are single-use. As soon as you start modifying the "real" codegen input, it would be a terrible mistake to rerun generation off them. Given that they are transient, my instinct is that we shouldn't commit them?

When avar2

As we continue to make avar2 fonts, it would be great to see additional avar2 read and write features in more compilers :)
