
qiji-font's Introduction

齊伋體 qiji-font

Qiji-font (齊伋體) is:

  • A Ming typeface;
  • Extracted from Ming Dynasty woodblock printed books (凌閔刻本);
  • Using semi-automatic computer vision and OCR;
  • Open source;
  • A work in progress;
  • Named in honour of 閔齊伋, a 16th-century printer;
  • Intended to be used with wenyan-lang, the Classical Chinese programming language.

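📢 Notice: My font has recently been circulating widely online, but its name is invariably given wrongly as “‘凌’东齐伋体”. My name is 令東 (Lingdong) and the font is named 齊伋 (Qiji). If you insist on putting the former in front of the latter, “令東齊伋體” would be the appropriate form. Please take note. 🤦‍♂️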


Download

Progress

Unique glyphs: 4569
Covered characters*: 5916
Books scanned: 李長吉歌詩, 淮南鴻烈解

* Simplified forms fall back to their traditional forms, and more common traditional variants fall back to less common variant forms. This is why more characters are covered than there are unique glyphs.
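The footnote's fallback rule works like a lookup chain. Below is a hypothetical sketch of that chain, not the project's code. `FALLBACKS` is a made-up table with a few entries. `resolve` follows the chain until it reaches a character that actually has a glyph.

```python
# Hypothetical fallback table: requested character -> character to try next.
# The entries are illustrative only: simplified -> traditional, then
# traditional -> a rarer variant form.
FALLBACKS = {
    "门": "門",
    "为": "為",
    "為": "爲",
}


def resolve(char, glyphs, fallbacks=FALLBACKS):
    """Return the character whose glyph is drawn for `char`, or None.

    Follows the fallback chain until a character present in `glyphs` is
    found. The `seen` set stops the loop if the table contains a cycle.
    """
    seen = set()
    while char is not None and char not in seen:
        if char in glyphs:
            return char
        seen.add(char)
        char = fallbacks.get(char)
    return None


def covered(chars, glyphs, fallbacks=FALLBACKS):
    """Characters that end up with some glyph, either directly or via fallback."""
    return {c for c in chars if resolve(c, glyphs, fallbacks) is not None}
```

For example, if the font only has 爲, then `resolve("为", {"爲"})` walks 为 → 為 → 爲 and returns 爲.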

Workflow

Step I: Download high-resolution PDFs (from shuge.org) and split the pages into images.
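Here is one possible way to do the page splitting. It assumes poppler's `pdftoppm` command-line tool is installed, and the project's own scripts may do this differently. In the command, `-r` sets the rendering resolution in DPI and `-png` selects PNG output. The function names and the 300 DPI default are illustrative.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_dir, dpi=300):
    """Build a pdftoppm call that renders every page of `pdf_path` as PNG.

    pdftoppm writes one image per page, named <prefix>-<page number>.png.
    """
    prefix = Path(out_dir) / Path(pdf_path).stem
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def split_pdf(pdf_path, out_dir, dpi=300):
    """Render the PDF's pages into `out_dir` (needs pdftoppm on PATH).

    Returns the sorted list of PNG files that were written.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob(Path(pdf_path).stem + "-*.png"))
```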

Step II: Manually lay a grid over each page to generate bounding boxes for the characters (this could potentially be replaced by an automatic corner-detection algorithm).
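Below is a minimal sketch of turning a grid into boxes. The interface is hypothetical: it assumes the user supplies the corners of the page's text block plus the number of columns and rows. The books are set in vertical columns that are read right to left, so the boxes come out column by column, starting with the rightmost column.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_boxes(left: float, top: float, right: float, bottom: float,
               n_cols: int, n_rows: int) -> List[Box]:
    """Split a rectangle into an n_cols x n_rows grid of character boxes.

    Boxes are returned in traditional reading order: the rightmost column
    first, and each column from top to bottom.
    """
    cell_w = (right - left) / n_cols
    cell_h = (bottom - top) / n_rows
    boxes = []
    for col in reversed(range(n_cols)):
        for row in range(n_rows):
            boxes.append((
                round(left + col * cell_w),
                round(top + row * cell_h),
                round(left + (col + 1) * cell_w),
                round(top + (row + 1) * cell_h),
            ))
    return boxes
```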

Step III: Generate a low-poly mask for each character on the grid, and save the thumbnails (using OpenCV). First, the red channel is subtracted from the grayscale image to clean out annotations printed in red ink. Next, the image is thresholded and fed into a contour-tracing algorithm. A metric is then used to discard shapes that are unlikely to be part of the character of interest. (This step does not produce the final glyph, only a quick-and-dirty extraction for intermediate processing.)
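The step that discards stray shapes can be sketched in pure Python. This is a swapped-in illustration, not the project's OpenCV code. It labels 4-connected components of ink instead of tracing contours, and it works on a grayscale cell that is assumed to be cleaned of red ink already. The metric is a hypothetical stand-in for the one the text leaves unspecified: a minimum area plus a margin that the component's centroid must stay inside.

```python
from collections import deque


def ink_components(gray, threshold=128):
    """Label 4-connected components of dark ("ink") pixels.

    `gray` is a list of rows of 0-255 intensities. Returns a list of
    components, each a list of (y, x) pixel coordinates.
    """
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or gray[y][x] >= threshold:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps


def keep_character_parts(gray, threshold=128, min_area=4, margin=0.15):
    """Return a binary mask (1 = ink) that keeps only plausible strokes.

    Hypothetical metric: drop specks smaller than `min_area` pixels, and
    drop components whose centroid lies within `margin` (a fraction of the
    cell size) of the cell border, since those are likely pieces of
    neighbouring characters.
    """
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    for comp in ink_components(gray, threshold):
        if len(comp) < min_area:
            continue
        cy = sum(p[0] for p in comp) / len(comp)
        cx = sum(p[1] for p in comp) / len(comp)
        if not (margin * h <= cy <= (1 - margin) * h
                and margin * w <= cx <= (1 - margin) * w):
            continue
        for y, x in comp:
            mask[y][x] = 1
    return mask
```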

Step IV: Feed each thumbnail, one at a time, into a neural-network Chinese OCR engine to recognize the character. This currently uses chineseocr/darknet-ocr, which has a low detection rate, mediocre accuracy and is very slow on CPU, so better alternatives are being sought.

Step V: Manually review the OCR output: pick the best-looking instance of each character, and flag characters that were recognized incorrectly.
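One way to connect Steps IV and V is to group the OCR results by the character that was recognized. A reviewer then sees all candidate instances of one character side by side. This is a hypothetical sketch. `recognize` stands for whatever OCR engine is plugged in, and it is assumed to return a `(character, confidence)` pair, or `None` when nothing is detected.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Result = Optional[Tuple[str, float]]


def group_for_review(thumbnails: Iterable[str],
                     recognize: Callable[[str], Result]):
    """Run OCR on each thumbnail and group the hits by recognized character.

    Returns (groups, undetected):
    - `groups` maps each character to its thumbnails, sorted by descending
      confidence, so the instances most likely to be correct come first.
    - `undetected` lists thumbnails the OCR could not read; these need
      manual labelling.
    """
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    undetected: List[str] = []
    for thumb in thumbnails:
        result = recognize(thumb)
        if result is None:
            undetected.append(thumb)
            continue
        char, confidence = result
        groups[char].append((confidence, thumb))
    ordered = {c: [t for _, t in sorted(hits, key=lambda h: -h[0])]
               for c, hits in groups.items()}
    return ordered, undetected
```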

Step VI: For the final character set, automatically generate a fine raster rendering of each character. Each character is placed at its "visual" center rather than at the center of its bounding box. To find that center, pixels are counted cumulatively from the left and right, and from the top and bottom, so that the "weight" of the character falls on the centerlines. Two thresholding methods are combined: the global threshold is dilated and acts as a mask on the adaptive threshold. This preserves details while blocking out surrounding boogers.
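Both ideas from Step VI are sketched below in pure Python so the logic is explicit. An OpenCV version would typically use `cv2.threshold`, `cv2.dilate` and `cv2.adaptiveThreshold` instead.

The sketch makes two assumptions:
- "Counting from the left and right until they meet" is read as a weighted median of the ink counts.
- The adaptive threshold is mean-based.

All parameter defaults are illustrative.

```python
def weighted_median(counts):
    """Index at which the running total first reaches half of all ink pixels."""
    total = sum(counts)
    if total == 0:
        return len(counts) // 2  # blank image: fall back to the geometric middle
    running = 0
    for i, c in enumerate(counts):
        running += c
        if 2 * running >= total:
            return i
    return len(counts) - 1  # not reached when total > 0


def visual_center(mask):
    """(x, y) of the ink "weight" center of a binary mask (rows of 0/1)."""
    col_counts = [sum(col) for col in zip(*mask)]
    row_counts = [sum(row) for row in mask]
    return weighted_median(col_counts), weighted_median(row_counts)


def centering_shift(mask):
    """(dx, dy) that moves the visual center onto the canvas center."""
    cx, cy = visual_center(mask)
    return len(mask[0]) // 2 - cx, len(mask) // 2 - cy


def global_threshold(gray, t=128):
    """1 where a pixel is darker than one fixed threshold t."""
    return [[1 if v < t else 0 for v in row] for row in gray]


def dilate(mask, r=1):
    """Grow ink regions by r pixels (square structuring element)."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))) else 0
             for x in range(w)]
            for y in range(h)]


def adaptive_threshold(gray, r=2, c=10):
    """1 where a pixel is darker than the mean of its local window minus c."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(1 if gray[y][x] < sum(window) / len(window) - c else 0)
        out.append(row)
    return out


def clean_glyph(gray, t=128, dilate_r=1, r=2, c=10):
    """Adaptive threshold, masked by the dilated global threshold."""
    mask = dilate(global_threshold(gray, t), dilate_r)
    adaptive = adaptive_threshold(gray, r, c)
    return [[a & m for a, m in zip(arow, mrow)]
            for arow, mrow in zip(adaptive, mask)]
```

The adaptive pass picks up faint detail next to strokes, which the global threshold alone would miss. It also picks up faint specks far from any stroke, and the dilated global mask removes those.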

Step VII: The raster-to-vector tracing software potrace is used to convert the raster renderings into SVGs. FontForge's Python library is used to generate the final font file. Done!
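Below is a sketch of Step VII under stated assumptions; it is not the project's script.

- Tracing: potrace accepts PBM/PGM/PPM or BMP input, `-s` selects its SVG backend, and `-o` names the output file.
- File names: renderings are assumed to be named by hexadecimal code point, like `5B87.bmp`. This is a hypothetical scheme.
- Font building: this part uses FontForge's `createChar`, `importOutlines` and `generate`. It is untested here because the `fontforge` module only ships with FontForge.

```python
import subprocess
from pathlib import Path


def potrace_command(bitmap_path, svg_path):
    """potrace invocation: bitmap in, SVG out (-s selects the SVG backend)."""
    return ["potrace", str(bitmap_path), "-s", "-o", str(svg_path)]


def codepoint_from_name(path):
    """Hypothetical naming scheme: '5B87.svg' -> 0x5B87."""
    return int(Path(path).stem, 16)


def trace_all(bitmap_dir, svg_dir):
    """Trace every .bmp in `bitmap_dir` into an SVG with the same stem."""
    Path(svg_dir).mkdir(parents=True, exist_ok=True)
    for bmp in sorted(Path(bitmap_dir).glob("*.bmp")):
        svg = Path(svg_dir) / (bmp.stem + ".svg")
        subprocess.run(potrace_command(bmp, svg), check=True)


def build_font(svg_dir, out_path, family="QIJI"):
    """Assemble the traced SVGs into a font file (untested sketch)."""
    import fontforge  # ships with FontForge, not installable from PyPI

    font = fontforge.font()
    font.familyname = font.fontname = font.fullname = family
    for svg in sorted(Path(svg_dir).glob("*.svg")):
        glyph = font.createChar(codepoint_from_name(svg))
        glyph.importOutlines(str(svg))
        # Depending on the SVG's size, the outlines may still need scaling
        # (e.g. glyph.transform(psMat.scale(...))) to fit the em square.
        glyph.width = font.em  # CJK glyphs are full-width
    font.generate(str(out_path))
```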

As the number of characters grows, the above procedure becomes less and less efficient, because each newly processed book yields fewer and fewer previously unseen characters. An alternative method, in which one clicks only on unseen characters to pick them out, is under construction.
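The diminishing returns are easy to quantify with set arithmetic. Only characters that are not yet in the font are worth picking out of a new book. This hypothetical sketch assumes a text transcription of the book is available, whereas the real workflow operates on scans. `have` is the set of characters already in the font. Only the basic CJK Unified Ideographs block is counted.

```python
from collections import Counter


def _ideographs(text):
    """Characters in the basic CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return (ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def unseen_characters(book_text, have):
    """Ideographs in `book_text` that are not in the font, most frequent first."""
    counts = Counter(ch for ch in _ideographs(book_text) if ch not in have)
    return [ch for ch, _ in counts.most_common()]


def yield_ratio(book_text, have):
    """Fraction of the book's distinct ideographs that would be new."""
    distinct = set(_ideographs(book_text))
    return len(distinct - set(have)) / len(distinct) if distinct else 0.0
```

As `have` grows, `yield_ratio` drops toward zero for each new book. That drop is the efficiency problem described above.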

Known Issues

  • Character sizes are sometimes inconsistent; manual tweaking is under way.
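Here is a hypothetical helper for finding candidates for that manual tweaking. It measures the ink bounding box of each glyph and flags glyphs whose size differs from the median by more than a tolerance. It assumes the glyphs are available as binary masks on a common canvas. The 15% tolerance is arbitrary.

```python
from statistics import median


def ink_extent(mask):
    """Longer side (in pixels) of the ink bounding box, or 0 if blank."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not ys:
        return 0
    return max(ys[-1] - ys[0] + 1, xs[-1] - xs[0] + 1)


def size_outliers(masks, tolerance=0.15):
    """Names of glyphs whose extent differs from the median by > tolerance."""
    extents = {name: ink_extent(m) for name, m in masks.items()}
    typical = median(extents.values())
    return sorted(name for name, e in extents.items()
                  if abs(e - typical) > tolerance * typical)
```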

Development

Requirements:

  • Python 3
  • OpenCV Python (pip3 install opencv-python)
  • FontForge Python library (included in brew install fontforge)
  • Chinese OCR (e.g. chineseocr/darknet-ocr)
  • Raster-to-vector tracer (e.g. potrace)

The main code is contained in /workflow and corresponds to the steps described above. Documentation for the code is yet to be written (soon), so feel free to ask if you are interested. As you might have noticed, there is a ton of work involved in making a Chinese font, so contributions are very welcome :)

Charset

Sheet of all unique glyphs, sorted by Unicode code point. (This is a lossy JPEG; for the full PNG, check here; for SVG, run node workflow/make_sheet.js.)

qiji-font's People

Contributors

antfu, lingdong-, rabbitism


qiji-font's Issues

Assign alternate Unicode values for duplicate glyphs

Assigning multiple Unicode values to each unique glyph image, instead of storing a separate copy of the glyph for every code point, will decrease the size of the font file.

Possibly lines 62-71 of forge_font.py can be rewritten as:

other = other - set(care.values())
if other:
    glyph.altuni = tuple(ord(o) for o in other)

I also suggest using glyph.simplify() to remove redundant points from each glyph.
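The suggestion amounts to grouping code points whose glyph images are identical and giving each group a single glyph. Here is a sketch under stated assumptions:
- Identity is decided by a SHA-256 hash of the image bytes.
- SVGs are named by hexadecimal code point. This naming is hypothetical.
- The FontForge half is untested. Following the issue's suggestion, it passes plain code points to `altuni`.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_duplicates(images):
    """Group code points whose glyph image bytes are identical.

    `images` maps code point -> raw image bytes. Returns a sorted list of
    (primary, alternates) pairs, where `primary` is the smallest code point
    in the group and `alternates` holds the rest in ascending order.
    """
    by_digest = defaultdict(list)
    for cp, data in images.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(cp)
    groups = []
    for cps in by_digest.values():
        cps.sort()
        groups.append((cps[0], tuple(cps[1:])))
    return sorted(groups)


def add_glyphs(font, svg_dir, groups):
    """Create one glyph per group in a fontforge.font (untested sketch)."""
    for primary, alternates in groups:
        glyph = font.createChar(primary)
        glyph.importOutlines(str(Path(svg_dir) / f"{primary:04X}.svg"))
        if alternates:
            glyph.altuni = alternates  # extra code points sharing this glyph
        glyph.simplify()  # remove redundant points
```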

Invitation to join the Danqing Ancient Script Open-Source Font Project

Dear Brother Huang,

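 Chinese characters are used by a very large population and have a very rich history. Yet in the digital age, knowledge of Chinese characters is still not widely spread, and those who care about it still have work to do. As for knowledge of ancient scripts, experts and professors in academia across different regions have made contributions. Independent researchers following sound scholarly methods have also made efforts. Together they have produced reliable and rich resources such as 「小學堂」, 「漢語多功能字庫」, 「漢字全息資源系統應用」, 「中華語文知識庫」 and 「引得市」. However, these philology websites only provide raster images of ancient characters. Such images are hard to edit and scale in word processors, so they are inconvenient for users to apply, and they often turn blurry when typeset. In the long run, producing font files in a freely scalable format is the lasting solution.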

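 In fact, font vendors recognized this need long ago and released ancient-script fonts. However, their designers do not necessarily have enough philological knowledge, so the ancient character forms in those fonts are not necessarily reliable. Copyright restrictions also prevent users from modifying and redistributing them, so in the end the fonts are of little use. The quality of such a font can only be ensured by keeping it open source, so that people with philological knowledge can refine it. Yet people who know philology well rarely know programming and font technology. They lack the skill and time to build a complete font themselves, so they have to rely on friends who are familiar with programming.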

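 In view of this, I hereby propose the Danqing Ancient Script Open-Source Font Project, in the hope that experts from these different fields can collaborate across disciplines. The first step is to make an oracle bone script font; if it turns out well, fonts for more ancient scripts can follow. The steps are as follows: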

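  1. Philology experts select the glyphs. The images are best chosen from 「漢語多功能字庫」 (the Multi-function Chinese Character Database), for two reasons. First, this database is reliable: the vast majority of the oracle bone forms it identifies are accurate and uncontroversial. Second, most of its oracle bone images have been redrawn, which improves stroke weight and clarity and keeps the traced glyphs readable at small sizes.

  2. Save the images under a uniform naming scheme. Take 「齒」 (Unicode U+9F52) as an example. The database lists five oracle bone images of it; the selector should pick the most typical one and save it as "9F52.png". If other forms also need to be included, save them in order of typicality as "9F52_01.png", "9F52_02.png", "9F52_03.png", and so on.

  3. Programming experts write an automatic tracing program that traces "9F52.png" into the glyph for U+9F52. The traced glyphs should have uniform height and width (for example, all at 1000 units per em). Their curves should be faithful yet smooth and free of jagged edges, and the glyphs should be well positioned. Variants of the same character can be stored sequentially from U+F0000 in Unicode Plane 15. Their variant relationship to U+9F52 should be set up automatically in OpenType, so that the forms can be selected or called up in applications that support it.

  4. Finally, font designers review the tracing results and retouch them by hand to perfection.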

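 I am of limited ability, but I can take charge of steps 1, 2 and 4. Step 3, however, is truly beyond me. May I invite you to join this undertaking? If you agree, I would be most grateful!

 With best wishes for your good health!

Convener, 一點字坊
Respectfully yours, 一郎
18 February 2020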
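Steps 2 and 3 of the letter imply a mapping from image file names to code points. Below is a hypothetical sketch of that mapping:
- The typical form `9F52.png` maps to U+9F52.
- Numbered variants (`9F52_01.png`, and so on) receive consecutive private-use code points, starting at U+F0000 in Plane 15, as the letter proposes.

Setting up the OpenType variant relationship (for example, through an `aalt` or `salt` feature) is not shown.

```python
import re
from pathlib import Path

NAME = re.compile(r"^([0-9A-Fa-f]{4,6})(?:_(\d+))?$")
PUA_START = 0xF0000  # first code point of Plane 15 (Supplementary Private Use Area-A)


def assign_codepoints(filenames):
    """Map image files to code points following the proposed naming scheme.

    'XXXX.png' -> U+XXXX; 'XXXX_NN.png' -> next free code point from U+F0000.
    Returns (mapping, variants), where `variants` maps each base code point
    to the private-use code points of its alternate forms, in order of NN.
    """
    parsed = []
    for name in filenames:
        m = NAME.match(Path(name).stem)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        parsed.append((int(m.group(1), 16), int(m.group(2) or 0), name))

    mapping, variants = {}, {}
    next_pua = PUA_START
    for base, index, name in sorted(parsed):
        if index == 0:
            mapping[name] = base  # the typical form keeps its own code point
        else:
            mapping[name] = next_pua
            variants.setdefault(base, []).append(next_pua)
            next_pua += 1
    return mapping, variants
```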

Characters with missing or extra strokes

Extra strokes: 珍, 詠, 振, 慮
Missing strokes: 端, 周, 贱, 睦, 弊, 組, 魄
Wrong character: 己
Uncertain: 異

Use zi2zi to patch up as many characters as possible?

It might sound dumb, but zi2zi can generate characters in a target font's style by learning from the glyphs a font already has. Since this font does not yet include every common Chinese character, would zi2zi be a good way of filling in the missing ones?

Glyph error: U+5B87 (宇)

I am using qiji-combo.ttf v0.0.1. The glyph U+5B87 (宇) looks like U+5B57 (字).

Demo

The first row is QIJIC, while the second row is Source Han Serif K.

Steps to reproduce:

<!DOCTYPE html>
<html lang="zh-Hant" dir="ltr">
<head>
<title>Test Page</title>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<style>
p {
  font-size: 64pt;
  margin: 0;
}
span {
  display: inline-block;
  text-align: center;
  min-width: 100px;
}
p:first-child {
  font-family: QIJIC;
}
p:first-child > span:first-child {
  font-size: 80pt;
  transform: translateX(-8px);
}
p:first-child > span:last-child {
  font-size: 88pt;
  transform: translateX(-10px) translateY(6px);
}
p:last-child {
  font-family: Source Han Serif K;
  font-language-override: "KOR";
}
</style>
</head>
<body>
<p><span></span><span></span></p>
<p><span></span><span></span></p>
</body>
</html>

The correct glyph should look like the one in the Zecuntang edition of the Guangyun (廣韻澤存堂本, Kuangxyonh Drakzuondang puonx):


Change License to the SIL Open Font License

This is an amazing project!

I am the program manager for Google Fonts and I would love to invite you to include this project in the Google Fonts library.

However, the MIT license for libre software is not ideal for fonts, because of its notice requirements. The SIL Open Font License is designed for the specific and unique use cases of fonts, and is required for Google Fonts.

I'd be just as happy to discuss this further on a video call as here on this issue :)

Error corrections + plan to add missing characters by hand

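Hello Mr. Huang. I really like your Qiji font. After a rough pass through it, I found some incorrect characters, which I list here for your review.

Each line gives the intended character first, then the character whose glyph is actually displayed: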
㩃 — 𢷘
䈁 — 籥
刊 — 可
婿 — 壻
宇 — 字
宾 — 賨
賓 — 賨
廋 — 庾
怅 — 帳
悵 — 帳
挻 — 挺
斡 — 𠏉
暄 — 喧
晖 — 睴
暉 — 睴
榑 — 搏
桡 — 撓
橈 — 撓
槟 — 𢷤
檳 — 𢷤
欽 — 飲
泲 — 濟
洒 — 酒
滖 — 瀼
熟 — 熱
痺 — 庳
瞻 — 贍
祇 — 袛
祓 — 袚
祗 — 𥿄
祫 — 袷
祲 — 䘲
禅 — 襌
禪 — 襌
秏 — 耗
稃 — 粰
箦 — 蔶
簀 — 蔶
篨 — 蒢
簞 — 簟
簦 — 䔲
粱 — 梁
糾 — 紏
绘 — 繒
繪 — 繒
绛 — 綘
絳 — 綘
肋 — 脇
臒 — 臞
蔐 — 蔏
蛖 — 硥
蠖 — 𧓈
衔 — 䘖
裸 — 祼
讎 — 雙
躐 — 蠟
迥 — 廻
遬 — 遫
鈎 — 鉤
鉤 — 鈎
銜 — 䘖
錀 — 鑰
铙 — 饒
鐃 — 饒
钦 — 飲
雎 — 睢
雠 — 雙
霏 — 霑
飢 — 饑
饑 — 飢
鳌 — 鬚
鰲 — 鬚
鶩 — 騖
鹜 — 騖
———2021.1.29———
痕 — 㾗
蓉 — 芰
庳 — 痺
———2021.2.1———
孑 — 子
———2021.2.17———
蒽 — 蔥

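(Obvious variant forms have already been left out, but some entries may still be variants rather than actual errors.)
This was checked by eye, so some errors may have been missed, and some may already be fixed in newer versions.

Recently I have been trying to add missing characters by hand and have finished more than a thousand (mostly simplified characters).
I hope to complete the 3,500 commonly used characters within 2021.

This is the first issue I have opened, so please excuse any shortcomings.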

Some ideas

Wow! I didn't expect to see it get such a high level of completeness and I am very surprised how wide range it already covered. Great job! 👍

I haven't set up and run the build tools myself yet (my network connection has been poor recently; I will definitely try later). Here are some quick thoughts and ideas from my superficial look.

  1. It might be better to attach the compiled font file to releases instead of keeping it in source control. Committing it does make the font easier for people to find at this early stage. However, Chinese fonts are huge (14 MB currently), so keeping them in source control is not ideal.

  2. The font name is not set (it is currently "Untitled1").

  3. Consider compiling to different font formats (TTF, OTF, WOFF, WOFF2, etc.) using fontmake. WOFF is optimized for the web and has smaller file sizes, which is useful for web display. Here is the build script of FiraCode for reference.
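Item 3 suggests fontmake. If only the web formats are needed, fontTools (which fontmake builds on) can re-wrap an existing TTF as WOFF or WOFF2. This is a lighter alternative, shown here as a sketch. It assumes fontTools is installed; WOFF2 additionally needs the `brotli` package. The output naming is illustrative.

```python
from pathlib import Path


def web_font_paths(ttf_path, flavors=("woff", "woff2")):
    """Output paths next to the input, e.g. qiji.ttf -> qiji.woff, qiji.woff2."""
    src = Path(ttf_path)
    return {flavor: src.with_suffix("." + flavor) for flavor in flavors}


def convert_to_web_fonts(ttf_path, flavors=("woff", "woff2")):
    """Re-save a TTF in WOFF/WOFF2 containers with fontTools."""
    from fontTools.ttLib import TTFont

    outputs = web_font_paths(ttf_path, flavors)
    for flavor, out in outputs.items():
        font = TTFont(ttf_path)  # reload so each save starts from a clean object
        font.flavor = flavor     # "woff" (zlib) or "woff2" (brotli)
        font.save(str(out))
    return outputs
```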

The character-selection tools look very interesting too! I'll play with them and give some feedback once I've finished setting up the tooling.

Font family

font.familyname = "QIJIFALLBACK"
font.fontname = "QIJIFALLBACK"
font.fullname = "QIJIFALLBACK"

font.familyname = "QIJIC"
font.fontname = "QIJIC"
font.fullname = "QIJIC"

font.familyname = "QIJI"
font.fontname = "QIJI"
font.fullname = "QIJI"

Should they share a common family name while keeping different font names and full names? This would work like Helvetica Bold, Helvetica Narrow, Helvetica Condensed, etc. Users could then select the family first and pick a variant in font menus.
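Below is a sketch of the shared-family idea. It uses the `familyname`, `fontname` and `fullname` attributes from the snippets in this issue. The variant labels and the hyphenated PostScript-style names are a hypothetical scheme. The helper only sets attributes, so it works on a `fontforge.font` or on any object with those fields. Whether font menus actually group the variants also depends on style and subfamily names, which this sketch does not cover.

```python
FAMILY = "QIJI"


def variant_names(variant, family=FAMILY):
    """Return (familyname, fontname, fullname) for one member of the family.

    PostScript font names may not contain spaces, so the variant is joined
    with a hyphen there. The human-readable full name uses a space.
    """
    if not variant:
        return family, family, family
    return family, f"{family}-{variant}", f"{family} {variant}"


def apply_names(font, variant, family=FAMILY):
    """Set the three naming attributes on `font` and return it."""
    font.familyname, font.fontname, font.fullname = variant_names(variant, family)
    return font
```

For example, `apply_names(font, "Fallback")` would give the family name "QIJI", the font name "QIJI-Fallback" and the full name "QIJI Fallback".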

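U+66FF 「替」: the glyph is malformed!

In your font, 「替」 (U+66FF) is drawn with 口 (the "mouth" component). But the 「替」 written with 口 is a different character, 「𠾱」 (U+20FB1), which is a variant form of 「噆」 (U+5646). It is read 子感反 and means "to hold in the mouth" (嗛, 銜), and it has nothing in common with 「替」. The character is therefore wrong. At root, this is presumably because the machine recognition was confused and could not tell the correct form from the wrong one.

I have made a comparison image of 「齊伋體」 and 「新細明體」 (PMingLiU), attached below: 「齊伋體」 first, then 「新細明體」. I hope you will examine it!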

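Glyphs are noticeably too small

Brother Lingdong, thank you for the enormous effort you put into creating this great Qiji font. While using it recently, I found that the Chinese glyphs in Qiji are noticeably smaller than the Latin glyphs of Source Han Serif (思源宋体). What is the reason for this, and could it be adjusted? I look forward to your reply.

For comparison, here is the result with 霞鹜 (LXGW) (body text 13px, headings 1.5rem):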
