
hamza's Introduction

Hamza

Hamza is a header-only, fast and portable C99 Unicode/OpenType shaping and rendering library. It is designed to be a small, portable and optimized shaper that is easy to integrate into existing projects. Below is an image of a short string of Arabic shaped with this library using a fairly complex font, with a random color assigned to each glyph.

UCD File Generation

Hamza includes the single-file programs update_ucd_ftp and generate_ucd_headers. The first pulls the necessary UCD files from the FTP server at ftp.unicode.org and requires curl. The second generates optimized C headers from those UCD files. Both of these programs make use of the POSIX regex library for filtering and parsing.
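As an illustration of the kind of regex-based parsing described above (not the actual code of generate_ucd_headers), the following sketch uses POSIX regcomp/regexec to pull the code point and General_Category out of a single UnicodeData.txt line. The function name is hypothetical, and it only builds on POSIX systems that provide <regex.h>.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Hamza): parse the code point and the
   General_Category field from one UnicodeData.txt line such as
   "0627;ARABIC LETTER ALEF;Lo;0;R;;;;;N;;;;;". Returns 1 on success. */
int parse_unicodedata_line(const char *line, unsigned long *code_point,
                           char general_category[3])
{
    regex_t re;
    regmatch_t m[3];
    /* group 1: hex code point, then the name field (skipped), group 2: category */
    const char *pattern = "^([0-9A-F]{4,6});[^;]*;([A-Z][a-z]);";
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 3, m, 0) == 0) {
        /* strtoul stops at the ';' that terminates the first field */
        *code_point = strtoul(line + m[1].rm_so, NULL, 16);
        memcpy(general_category, line + m[2].rm_so, 2);
        general_category[2] = '\0';
        ok = 1;
    }

    regfree(&re);
    return ok;
}
```

Compiling the pattern once and reusing it for every line would be faster; it is compiled per call here only to keep the sketch short.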

Download the UCD text files. This can take a few minutes, so only do it if the UCD headers are out of date:

./build/update_ucd_ftp

Generate the header files for the UCD versions:

./build/generate_ucd_headers
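The layout of the generated headers is not documented here. One common way to store a UCD property compactly is a sorted table of code point ranges searched with a binary search; the sketch below uses a hand-copied excerpt of Arabic joining types from ArabicShaping.txt. The table and function names are hypothetical, and defaulting unlisted code points to U (non-joining) is a simplification, since Unicode derives T (transparent) for most marks.

```c
#include <stddef.h>
#include <stdint.h>

/* Joining types: U = non-joining, R = right-joining, D = dual-joining,
   C = join-causing. */
typedef enum { JT_U, JT_R, JT_D, JT_C } joining_type;

/* Sorted, non-overlapping ranges, as a generator might emit them. */
typedef struct { uint32_t first, last; joining_type type; } jt_range;

static const jt_range jt_table[] = {
    { 0x0622, 0x0625, JT_R },
    { 0x0626, 0x0626, JT_D },
    { 0x0627, 0x0627, JT_R }, /* ALEF */
    { 0x0628, 0x0628, JT_D }, /* BEH */
    { 0x0629, 0x0629, JT_R },
    { 0x062A, 0x062E, JT_D },
    { 0x062F, 0x0632, JT_R },
    { 0x0633, 0x063F, JT_D },
    { 0x0640, 0x0640, JT_C }, /* TATWEEL */
    { 0x0641, 0x0647, JT_D },
    { 0x0648, 0x0648, JT_R },
    { 0x0649, 0x064A, JT_D },
};

/* Binary search over the ranges; unlisted code points default to JT_U. */
joining_type lookup_joining_type(uint32_t cp)
{
    size_t lo = 0, hi = sizeof(jt_table) / sizeof(jt_table[0]);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < jt_table[mid].first)
            hi = mid;
        else if (cp > jt_table[mid].last)
            lo = mid + 1;
        else
            return jt_table[mid].type;
    }
    return JT_U;
}
```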

Getting Started

To start using Hamza, define HZ_IMPLEMENTATION before including hz.h. You can optionally also define HZ_NO_STDLIB. You also need to include the UCD header for the Unicode version you require; the UCD File Generation section above describes how these headers are generated and how you can update them yourself.

#include <stdio.h> /* for fprintf in the initialization snippet */
#define HZ_IMPLEMENTATION
#include <hz/hz_ucd_15_0_0.h>
#include <hz/hz.h>
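The HZ_IMPLEMENTATION switch follows the usual single-header-library convention (the same one stb_truetype.h uses). A minimal sketch of how that convention generally works, using a made-up tiny_lib rather than hz.h, with the header's contents inlined so the example fits in one file:

```c
/* In exactly one .c file, request the implementation first. */
#define TINY_LIB_IMPLEMENTATION

/* ---- contents of tiny_lib.h, a hypothetical single-header library ---- */
#ifndef TINY_LIB_H
#define TINY_LIB_H
/* Declarations: seen by every file that includes the header. */
int tiny_add(int a, int b);
#endif /* TINY_LIB_H */

/* Definitions: compiled only where TINY_LIB_IMPLEMENTATION is defined, so
   the function bodies end up in exactly one object file. */
#ifdef TINY_LIB_IMPLEMENTATION
int tiny_add(int a, int b)
{
    return a + b;
}
#endif /* TINY_LIB_IMPLEMENTATION */
```

Every other .c file includes the header without the define and only sees the declarations; defining the macro in more than one file would typically cause duplicate-symbol link errors.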

To initialize the library, fill in a hz_config_t struct and call hz_init:

hz_config_t cfg = {0}; /* fill in configuration fields here; the rest are zero-initialized */

if (hz_init(&cfg) != HZ_OK) {
    fprintf(stderr, "%s\n", "Failed to initialize Hamza!");
    return -1;
}

Loading Fonts

Next, before you can shape any text, you must provide font data. First load the font into a stbtt_fontinfo struct; Hamza ships with stb_truetype.h for reading fonts. To create a hz_font_t from an stbtt font, write:

hz_font_t *font = hz_stbtt_font_create(&fontinfo);
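stb_truetype only parses a font that is already in memory, so the file has to be read first. A plain-C helper for that step could look like this (read_entire_file is a hypothetical name, not part of Hamza or stb_truetype):

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'd buffer and returns it, or NULL on
   failure. On success, *out_size (if non-NULL) receives the byte count.
   The caller frees the buffer once the font is no longer in use. */
unsigned char *read_entire_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long len;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf); /* short read: treat as failure */
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL && out_size != NULL)
        *out_size = (size_t)len;
    return buf;
}
```

The buffer can then be handed to stb_truetype with stbtt_InitFont(&fontinfo, ttf, stbtt_GetFontOffsetForIndex(ttf, 0)), which returns 0 on failure, before calling hz_stbtt_font_create. Keep the buffer alive while the font is in use, since stbtt_fontinfo keeps pointing into it.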

Hamza aims to let the user manage memory allocation and data as much as possible. Before shaping, the font data has to be parsed into a hz_font_data_t struct, which holds all the OpenType table data required for shaping with a specific font. The hz_font_data_init function takes as an argument the amount of memory to allocate for that font's data:

hz_font_data_t font_data;
hz_font_data_init(&font_data, 1024*1024); // 1MiB
hz_font_data_load(&font_data, font);
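Taking a byte budget up front, like the 1 MiB passed to hz_font_data_init, is characteristic of an arena (bump) allocator, and the backtrace in the issues below does pass a memory_arena through the loading functions. How Hamza's arena is actually implemented is not shown here; the following is a generic sketch of the technique with hypothetical names:

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative fixed-capacity bump arena (not Hamza's implementation). */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} bump_arena;

/* Reserves the whole budget once; returns 1 on success. */
int bump_arena_init(bump_arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out `size` bytes aligned to `align` (a power of two no larger than
   malloc's alignment), or NULL once the budget is exhausted. */
void *bump_arena_alloc(bump_arena *a, size_t size, size_t align)
{
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Frees everything at once; individual allocations are never freed. */
void bump_arena_release(bump_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = a->used = 0;
}
```

When the budget is too small, allocations simply return NULL, so a loader built this way must check every allocation result.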

Create a shaper and initialize it:

hz_shaper_t shaper;
hz_shaper_init(&shaper);

Set the shaper's required parameters:

hz_shaper_set_direction(&shaper, HZ_DIRECTION_RTL);
hz_shaper_set_script(&shaper, HZ_SCRIPT_ARABIC);
hz_shaper_set_language(&shaper, HZ_LANGUAGE_ARABIC);

Set the shaper's typography features:

hz_feature_t features[] = {
      HZ_FEATURE_ISOL,
      HZ_FEATURE_INIT,
      HZ_FEATURE_MEDI,
      HZ_FEATURE_FINA,
      HZ_FEATURE_RLIG,
      HZ_FEATURE_LIGA,
};

hz_shaper_set_features(&shaper, features, sizeof(features)/sizeof(features[0]));
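In the font file itself, OpenType identifies features by four-byte tags such as 'isol', 'init', 'medi', 'fina', 'rlig' and 'liga', stored as a big-endian uint32. Whether the HZ_FEATURE_* constants are tag values or internal enum indices is not stated here, so the following is only a generic sketch of the tag encoding (OT_TAG and ot_tag_to_string are hypothetical names):

```c
#include <stdint.h>

/* Packs four ASCII characters into an OpenType tag, first character in the
   most significant byte. */
#define OT_TAG(a, b, c, d)                                        \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |              \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Writes the tag back out as a NUL-terminated 4-character string. */
void ot_tag_to_string(uint32_t tag, char out[5])
{
    out[0] = (char)((tag >> 24) & 0xFF);
    out[1] = (char)((tag >> 16) & 0xFF);
    out[2] = (char)((tag >> 8) & 0xFF);
    out[3] = (char)(tag & 0xFF);
    out[4] = '\0';
}
```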

Create a glyph buffer and shape!

hz_buffer_t buffer;
hz_buffer_init(&buffer);
hz_shape_sz1(&shaper, &font_data, HZ_ENCODING_UTF8, "السلام عليكم", &buffer);
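hz_shape_sz1 is passed HZ_ENCODING_UTF8, so the input bytes have to be decoded into code points before any lookups run. In the string above, the first letter, ALEF (U+0627), is encoded as the two bytes D8 A7. A minimal decoder for one UTF-8 sequence, written as a standalone illustration rather than Hamza's own decoder, could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence from s (n bytes available). Stores the code
   point in *cp and returns the number of bytes consumed, or 0 if the
   sequence is malformed, truncated, overlong, a surrogate or > U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }          /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                                      /* invalid lead byte */

    if (n < len) return 0;
    for (i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return 0;            /* not a continuation */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
        (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
        c > 0x10FFFF)
        return 0;
    *cp = c;
    return len;
}
```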

After this, you can access the buffer's glyph data and render it. When you are done, release everything and deinitialize the library:

hz_buffer_release(&buffer);
hz_font_data_release(&font_data);
hz_font_destroy(font);
hz_deinit();

Tested Compilers

  • GCC 10.3.0 x86_64-w64-mingw32
  • GCC 10.3.0 x86_64-w64-mingw32 (mingw64)
  • MSVC 19.35.32217.1
  • MSVC 19.29.30148.0
  • Clang 16.0.0 x86_64-pc-windows-msvc

Features

  • Joining script support and RTL writing
  • Kerning
  • Ligatures
  • Support for new OpenType language tags (mixture of ISO 639-3 and ISO 639-2 codes)
  • Vertical layout support (mostly for CJK, Mongolian, etc.)
  • Color Emojis
  • Emoji Combinations
  • Multi-Threading
  • Unicode Normalization (NFC,NFD,NFKC,NFKD)
  • Open .aat, .woff and .woff2 formats

LICENSE

Hamza is licensed under LGPLv3.

hamza's People

Contributors

saidm00, saidwho12


Forkers

sirdody

hamza's Issues

Rendering doesn't work properly

Hardware:
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lucienne (rev c1) (prog-if 00 [VGA controller])
    Subsystem: Dell Device 0a78
    Flags: bus master, fast devsel, latency 0, IRQ 50, IOMMU group 4
    Memory at fce0000000 (64-bit, prefetchable) [size=256M]
    Memory at fcf0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at 1000 [size=256]
    Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [64] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
    Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [270] Secondary PCI Express
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] Page Request Interface (PRI)
    Capabilities: [2d0] Process Address Space ID (PASID)
    Capabilities: [400] Data Link Feature <?>
    Capabilities: [410] Physical Layer 16.0 GT/s <?>
    Capabilities: [440] Lane Margining at the Receiver <?>
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu

*-cpu
     description: CPU
     product: AMD Ryzen 7 5700U with Radeon Graphics
     vendor: Advanced Micro Devices [AMD]
     physical id: 4
     bus info: cpu@0
     version: AMD Ryzen 7 5700U with Radeon Graphics
     serial: Unknown
     slot: FP6
     size: 400MHz
     capacity: 4372MHz
     width: 64 bits
     clock: 100MHz
     capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca cpufreq
     configuration: cores=8 enabledcores=8 threads=16

Screenshot of the issue:
https://cdn.discordapp.com/attachments/565325006351892485/1029667484581757028/unknown.png

Mark-to-base attachments sometimes don't work with omar-type fonts.

Mark-to-base attachments sometimes don't work when a GSUB lookup replaces a glyph, for example a medial form, with two or three glyphs. For instance, an initial-form glyph can be replaced by a base, a tanqeet (dotting) and a khatt (stroke), after which the GPOS mark-to-base lookup naively picks the first preceding base glyph to attach to and, unsurprisingly, finds no valid attachment points. This is my best assessment of what is happening. In some cases it does find the correct base, but the font itself has no attachment points on that glyph; this is because some GPOS lookups are not yet handled.
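For reference, once the right base is found, MarkToBase positioning moves the mark so that its anchor point coincides with the base's anchor for that mark class. A minimal sketch of that arithmetic, assuming the horizontal glyph origins are already known in absolute font units (the type and function names are hypothetical):

```c
/* An anchor point in font units, relative to its glyph's origin. */
typedef struct { int x, y; } anchor_t;

/* Computes the offset to add to the mark's origin so that
   mark_origin + offset + mark_anchor == base_origin + base_anchor.
   Both glyphs are assumed to sit on the same baseline. */
void mark_to_base_offset(int base_pen_x, anchor_t base_anchor,
                         int mark_pen_x, anchor_t mark_anchor,
                         int *dx, int *dy)
{
    *dx = (base_pen_x + base_anchor.x) - (mark_pen_x + mark_anchor.x);
    *dy = base_anchor.y - mark_anchor.y;
}
```

If the base selected for the mark has no anchor for the mark's class, there is nothing to plug into this formula, which matches the "no valid attachment points" symptom described above.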

[Screenshots]

Bug in Arabic shaping with omar-type fonts: joining is inconsistent for YEH, BEH and JIM.

After a few days of working with the omar-type font Ayesha Quran, I have run into a few bugs. These fonts mainly use GSUB lookup types 1, 2, 4 and 6 (excluding type 7, the extension lookup). The isol, init, medi and fina features are implemented with lookup types 1, 2 and 6, and they appear to generate strokes between the base character glyphs depending on context. Contextual substitutions are built for exactly this kind of operation: a glyph sequence such as <abc> is compared against the original string and, if it matches, the engine applies "nested" substitutions to glyphs within that context. For some reason, though, this is not working, especially for the base characters (the dotless base strokes) YEH, BEH and JIM.
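The matching step described above, comparing backtrack, input and lookahead sequences around the current position, can be sketched as follows. This is a simplified illustration with hypothetical names: it compares literal glyph IDs (format 3 subtables compare against coverage tables instead) and ignores lookup flags such as skipping marks. As in the OpenType specification, the backtrack sequence is stored with the glyph closest to the input first.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the rule matches with its input sequence starting at `pos`.
   backtrack[0] is compared with glyphs[pos - 1], backtrack[1] with
   glyphs[pos - 2], and so on. */
int chain_rule_matches(const uint16_t *glyphs, size_t count, size_t pos,
                       const uint16_t *backtrack, size_t backtrack_len,
                       const uint16_t *input, size_t input_len,
                       const uint16_t *lookahead, size_t lookahead_len)
{
    size_t i;

    /* not enough context on either side */
    if (pos < backtrack_len || pos + input_len + lookahead_len > count)
        return 0;
    for (i = 0; i < backtrack_len; ++i)
        if (glyphs[pos - 1 - i] != backtrack[i])
            return 0;
    for (i = 0; i < input_len; ++i)
        if (glyphs[pos + i] != input[i])
            return 0;
    for (i = 0; i < lookahead_len; ++i)
        if (glyphs[pos + input_len + i] != lookahead[i])
            return 0;
    return 1;
}
```

On a match, the rule's nested lookups are then applied to glyphs inside the input sequence at the sequence indices the rule records.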

[Screenshots]

OS/Arch compilation status

Post any issues or successes you have with Hamza below, along with your system information, such as:

Device
OS
Arch
CPU
GPU
Compiler
Build commands

Feel free to post pictures and logs as well.
The goal is to track the status of every platform and configuration: those where Hamza has trouble compiling as well as those where it builds without issues.
So far, Hamza has been compiled on Windows (x86), Linux (x86/ARM), Wasm, the Nintendo Switch (Tegra X1) and a TI calculator (with some I/O-related issues). I want to start keeping a record of which platforms need work, so that we can iron out bugs and get Hamza running on more systems, and to document any difficulties people encounter when using it.
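To help fill in the Compiler and Arch fields, a small program can report what the compiler itself predefines. The sketch below relies only on widely documented predefined macros (__clang__, __GNUC__, _MSC_VER, __x86_64__, _M_X64 and so on); the function names are hypothetical.

```c
/* Names the compiler. __clang__ is tested first because Clang also
   defines __GNUC__ (and clang-cl defines _MSC_VER). */
const char *build_compiler(void)
{
#if defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "unknown compiler";
#endif
}

/* Names the target architecture. */
const char *build_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__arm__) || defined(_M_ARM)
    return "arm";
#elif defined(__wasm__)
    return "wasm";
#else
    return "unknown arch";
#endif
}
```

Calling printf("%s / %s\n", build_compiler(), build_arch()); from a test program prints the detected compiler and architecture for the report; exact compiler versions still have to be copied from the compiler's own version output.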

Nested lookups problem: mark-to-base positioning cannot get the previous base glyph.

Issue with nested lookups (depth >= 1) of type HZ_GPOS_LOOKUP_TYPE_MARK_TO_BASE_ATTACHMENT inside chained contextual lookups. The hz_buffer_t *in argument passed to the nested lookup should probably also contain the prefix and suffix ranges for this to be resolved. I suspect this is why certain mark-to-base adjustments are not applied in fonts such as the omar-type Ayesha Quran, for example with the ALEF character. This happens regardless of whether the mark is placed above or below the base; it is independent of the attachment point.
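One way to read the suggestion above: if a nested lookup only receives a copy of the input range, a mark at the start of that range has no preceding glyph to attach to. Passing a view that keeps a pointer to the whole buffer avoids this. The following sketch uses hypothetical types and GDEF-style glyph classes (1 = base, 2 = ligature, 3 = mark, 4 = component); it skips marks and returns the nearest preceding non-mark glyph without stopping at the start of the range, and a full implementation would also honor lookup flags.

```c
#include <stddef.h>

/* GDEF glyph classes. */
enum { GC_BASE = 1, GC_LIGATURE = 2, GC_MARK = 3, GC_COMPONENT = 4 };

/* Hypothetical view handed to a nested lookup: the range it applies to,
   plus the whole buffer so that context before `start` stays reachable. */
typedef struct {
    const int *glyph_class; /* class of every glyph in the full buffer */
    size_t count;           /* number of glyphs in the full buffer */
    size_t start, end;      /* sub-range the nested lookup operates on */
} glyph_run_view;

/* Walks backwards from the mark, skipping other marks, and returns the index
   of the nearest preceding non-mark glyph, or -1 if there is none. The
   search is deliberately not clamped to view->start. */
long view_find_base(const glyph_run_view *view, size_t mark_index)
{
    size_t i = mark_index;
    while (i > 0) {
        --i;
        if (view->glyph_class[i] != GC_MARK)
            return (long)i;
    }
    return -1;
}
```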

Reference rendering from LibreOffice Writer:
[Screenshot]

Hamza:
[Screenshot]

Segfault during hz_font_data_load with Microsoft-supplied arial.ttf

The offender is "C:\Windows\Fonts\arial.ttf" (the Microsoft-supplied font). To reproduce, replace the Arslan Wessam font path in the stbtt_rasterize demo with that path.

In case we have different versions, here is the file from my system: https://cdn.discordapp.com/attachments/1017579229208117299/1101262155078176890/arial_win.ttf

Here is my stack trace for reference (it is the same with and without MinGW):

0x00007ff612b3289d in hz_parser_read_u16_block (p=0x7e055fbd68, write_addr=0x2, size=15) at ../../hz/hz.h:1051
1051            *write_addr++ = hz_parser_read_u16(p);
(gdb) backtrace
#0  0x00007ff612b3289d in hz_parser_read_u16_block (p=0x7e055fbd68, write_addr=0x2, size=15) at ../../hz/hz.h:1051
#1  0x00007ff612b355ba in hz_read_coverage (memory_arena=0x7e055ff648, p=0x7e055fbd68, cov=0x1e960385fa0) at ../../hz/hz.h:2061
#2  0x00007ff612b36f2a in hz_ot_load_chained_sequence_context_format3_subtable (memory_arena=0x7e055ff648, p=0x7e055fbd68, table=0x1e960385f20) at ../../hz/hz.h:2926
#3  0x00007ff612b5db21 in hz_load_gpos_chained_context_positioning_subtable (memory_arena=0x7e055ff648, p=0x7e055fbd68, lookup=0x1e96029cd30, subtable_index=5, format=3) at ../../hz/hz.h:4528
#4  0x00007ff612b5cce2 in hz_read_gpos_lookup_subtable (memory_arena=0x7e055ff648, p=0x7e055fbd68, lookup=0x1e96029cd30, lookup_type=8, subtable_index=5) at ../../hz/hz.h:4583
#5  0x00007ff612b5cad8 in hz_load_gpos_lookup_table (memory_arena=0x7e055ff648, p=0x7e055fbd68, face=0x1e95ffc3d40, table=0x1e96029cd30) at ../../hz/hz.h:4671
#6  0x00007ff612b5b90a in hz_load_gpos_table (font_data=0x7e055ff638) at ../../hz/hz.h:4813
#7  0x00007ff612b4a661 in hz_font_data_load (font_data=0x7e055ff638, font=0x1e95ffb9870) at ../../hz/hz.h:4846
#8  0x00007ff612b4a33f in main (argc=1, argv=0x1e95ffb7700) at main.c:282
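In the backtrace, hz_parser_read_u16_block is writing through write_addr=0x2, which is not a valid pointer, so the destination array handed in by hz_read_coverage appears never to have been valid. That is a reading of the trace, not a confirmed diagnosis. A bounds-checked big-endian block read of the kind a defensive parser might use can be sketched as follows; the parser struct and function names are hypothetical and not Hamza's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a font table. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t offset;
} be_parser;

/* Reads `count` big-endian uint16 values into `out`. Returns 0 without
   writing anything if `out` is NULL or fewer than 2 * count bytes remain. */
int be_read_u16_block(be_parser *p, uint16_t *out, size_t count)
{
    size_t i;

    if (out == NULL || p->offset > p->size ||
        count > (p->size - p->offset) / 2)
        return 0;
    for (i = 0; i < count; ++i) {
        const uint8_t *b = p->data + p->offset + 2 * i;
        out[i] = (uint16_t)((b[0] << 8) | b[1]);
    }
    p->offset += 2 * count;
    return 1;
}
```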

Latin combining characters configuration question

I can't get the stbtt_rasterize demo to properly display u followed by a combining diaeresis (umlaut), "u\u0308" in C. I have little experience with Unicode text shaping, so I assume this is a configuration issue or something else I don't know about. I've tried a few different feature combinations without success.

What features do I need to enable for this to work? I have enabled ABVM and CCMP, since those seemed obvious.
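For context on this question: "u\u0308" is canonically equivalent to the precomposed U+00FC (ü), so composing the pair (NFC normalization, which appears in the feature list above) yields a single code point that a font with Latin-1 coverage will have a glyph for, sidestepping mark positioning entirely. Whether Hamza normalizes input during shaping is not stated here. A minimal sketch of canonical pair composition, using a hand-picked four-entry excerpt of the Unicode composition data (names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Excerpt of canonical compositions from UnicodeData.txt decompositions. */
typedef struct { uint32_t first, second, composed; } composition;

static const composition compositions[] = {
    { 0x0055, 0x0308, 0x00DC }, /* U + COMBINING DIAERESIS -> U+00DC */
    { 0x0061, 0x0308, 0x00E4 }, /* a + COMBINING DIAERESIS -> U+00E4 */
    { 0x006F, 0x0308, 0x00F6 }, /* o + COMBINING DIAERESIS -> U+00F6 */
    { 0x0075, 0x0308, 0x00FC }, /* u + COMBINING DIAERESIS -> U+00FC */
};

/* Returns the precomposed code point, or 0 if this excerpt has no entry. */
uint32_t compose_pair(uint32_t first, uint32_t second)
{
    size_t i;
    for (i = 0; i < sizeof(compositions) / sizeof(compositions[0]); ++i)
        if (compositions[i].first == first && compositions[i].second == second)
            return compositions[i].composed;
    return 0;
}
```

A real normalizer also reorders marks by canonical combining class, honors composition exclusions and composes Hangul algorithmically.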
