irg-ws2015's People

Contributors

hfhchan

irg-ws2015's Issues

Multiple Withdrawn Characters/Glyphs

Multiple characters/glyphs were withdrawn in the WS 2015 v2 IRGN2155 UK Review, but the withdrawals are not reflected in the Working Set:

  • 00123 UTC-01423: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00130 UTC-01318: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00138 UTC-01391: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00296 UTC-01326: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00345 UTC-01329: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00348 UTC-01330: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00475 UTC-01337: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00524 UTC-01421: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00542 UTC-01338: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00560 UTC-01339: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00561 UTC-01369: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00662 UTC-01342: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00814 UTC-01345: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00827 UTC-01372: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00857 UTC-01353: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00863 UTC-01355: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00866 UTC-01441: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00871 UTC-01354: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00874 UTC-01356: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00898 UTC-01358: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00933 UTC-01437: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00938 UTC-01362: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00953 UTC-01367: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00966 UTC-01368: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00969 UTC-01363: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00970 UTC-01366: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01152 UTC-01428: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01186 UTC-01349: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01194 UTC-01371: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01319 UTC-01314: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01320 UTC-01373: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01331 UTC-01374: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01336 UTC-01387: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01493 UTC-01390: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01520 UTC-01377: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01533 UTC-01341: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01703 UTC-01442: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01721 UTC-01379: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 01850 UTC-01384: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02187 UTC-01378: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02193 UTC-01386: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02333 UTC-01388: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02387 UTC-01480: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02446 UTC-01400: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02541 UTC-01392: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02563 UTC-01457: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02641 UTC-01399: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02684 UTC-01393: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02843 UTC-01396: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 02936 UTC-01397: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03026 UTC-01398: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review; SC - 21.
  • 03332 UTC-01401: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03349 UTC-01402: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03355 UTC-01403: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03384 UTC-01405: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03487 UTC-01406: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03670 UTC-01404: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03679 UTC-01411: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03774 UTC-01412: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03775 UTC-01413: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03835 UTC-01415: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03969 UTC-01416: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 03981 UTC-01418: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 04685 UTC-01424: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 05061 UTC-01425: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 05094 UTC-01408: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 05222 UTC-01350: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
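The entries above follow a regular "SN SOURCE: note" shape, so they could be cross-checked against a working-set listing mechanically. The following is a minimal sketch only: the function names and the `working_set_sns` value are hypothetical and are not part of the repository or of any IRG tooling.

```python
import re

# Matches lines such as "• 00123 UTC-01423: Withdrawn by submitter ..."
ENTRY = re.compile(r"^\W*(\d{5})\s+(UTC-\d{5}):\s*(.+)$")

def parse_withdrawn(text):
    """Return (serial number, source reference, note) tuples, one per matching line."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries

def still_listed(entries, working_set_sns):
    """Serial numbers that were withdrawn but still appear in the working set."""
    return [sn for sn, _source, _note in entries if sn in working_set_sns]

sample = """\
  • 00123 UTC-01423: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
  • 00130 UTC-01318: Withdrawn by submitter in WS2015v2 - IRGN2155 UK Review.
"""
entries = parse_withdrawn(sample)
# Hypothetical working-set contents, for illustration only.
working_set_sns = {"00123", "00999"}
print(still_listed(entries, working_set_sns))
```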

Unification of various forms of 丘

It is suggested that all historical variants should be unified to the corresponding normalized form (closest variant form?).

The normalized form should be defined as follows:

  • 丘 (丠/㐀)
  • 虛/虚 (𧆳/虗)

Affected Characters:

  • 04681 GHZR74469.18
    IVD 𨼋

  • 04932 GHZR84846.02
    IVD 駈 / 𩢩

  • 04704 GHZR74481.06
    IVD 04703 GHZR74481.05

Unification of 夸 and 𡗢

𡗢 is a common historical variant form of 夸.

image

Existing Disunified Strict Semantic Variants:
夸 U+5938 = 𡗢 U+215E2
洿 U+6D3F = 𣴰 U+23D30
誇 U+8A87 = 𧧳 U+279F3
跨 U+8DE8 = 𨀗 U+28017
䠸 U+4838 = 𨉀 U+28240
鮬 U+9BAC = 𩶮 U+29DAE
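Listings like the one above pair each character with a written code point, and a typo in either half is easy to miss. A minimal sketch of a consistency check follows; `mismatched_annotations` is a hypothetical helper name, and the check only confirms that each character matches the code point written next to it, not that the two sides of a pair are actually variants.

```python
import re

# One "character U+XXXX" annotation, e.g. "夸 U+5938".
PAIR = re.compile(r"(\S)\s+U\+([0-9A-Fa-f]{4,6})")

def mismatched_annotations(text):
    """Return (character, stated code point, actual code point) for every
    annotation whose character does not match the code point written next to it."""
    problems = []
    for char, hex_value in PAIR.findall(text):
        stated = int(hex_value, 16)
        if ord(char) != stated:
            problems.append((char, f"U+{stated:04X}", f"U+{ord(char):04X}"))
    return problems

listing = """\
夸 U+5938 = 𡗢 U+215E2
誇 U+8A87 = 𧧳 U+279F3
"""
print(mismatched_annotations(listing))
```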

Affected Characters:

  • 00375 USAT09219
    image
    Action Item: Unify (IVD) to 刳 U+5233

  • 01648 USAT06769
    image
    Action Item: Unify (IVD) to 胯 U+80EF

  • 05026 USAT09817
    image
    Action Item: Unify (IVD) to 骻 U+9ABB

  • 03426 USAT06768
    image
    Action Item: Unify (IVD) to 胯 U+80EF (ref: issue #23)

00063 UTC-01316

IRGN2155CommentsToIRGN2107 (Chen Zhixiang)’s Comment
image

Henry’s Comment

  • WITHDRAW, reference IRGN2155CommentsToIRGN2107, OR
  • unify to U+2B85C

UTC’s Comment:
Disagree. Character is attested in two separate sources, and the right-hand side components are not unifiable.

Henry’s Additional Comment:
The value of encoding erroneous transcriptions already identified by the Chinese experts should be justified, e.g. by showing that “A Concordance to Fascicle Three of the Inscriptions from the Yin Ruins” is so academically significant that its erroneous forms should be encoded as is, as is done for the Kangxi Dictionary and/or Hanyu Dazidian.

05541 UTC-02614

Henry's Comment
IVD to U+2B733 (冉 ~ 冄)? PRC Conventions prefer 冉 > 冄.

UK's Comment
Disagree. We do not think that 冄 and 冉 are unifiable components, and 《漢語大字典》 has separate entries for both U+4DB2 䶲 and U+2A6AE 𪚮 and the two corresponding simplified forms.

Henry's Comment
冉 and 冄 should be considered unifiable for IVD. There are already many exact equivalents encoded in URO, and many more can be encoded.

Examples of exact equivalents:
U+5189 冉 = U+5184 冄
U+67DF 柟 = U+678F 枏
U+8043 聃 = U+803C 耼
U+86BA 蚺 = U+86A6 蚦
U+88A1 袡 = U+887B 衻
U+9AEF 髯 = U+9AE5 髥
U+59CC 姌 = U+36A9 㚩
U+8211 舑 = U+4459 䑙
U+82D2 苒 = U+44A3 䒣
U+279A6 𧦦 = U+46C1 䛁
U+294FF 𩓿 = U+4AC7 䫇
U+5465 呥 = U+20BCD 𠯍
[...]
U+722F 爯 = U+2DDAA 𭶪
U+7A31 稱 = U+2E0CE 𮃎

Action Item
Unify via IVD.

Unification of 冊 and 𠕋

There are various ways in which 冊 can be written that do not result in any etymological difference.

Existing UCV Rules
image

Proposed UCV Rule
image

The shape difference between 冊 and 𠕋 could be considered rather "large" for inter-locale unification. Thus, this unification could be restricted to IVD only.

Affected Character:
02460 T13-2D7C
image

Unification of 取 and ⿰耳𡿨

Henry's Comment
取 and ⿰耳𡿨 should be unified because they were common systematic variants in the past.

Affected Characters in WS2015:

  • 03991 GHZR63859.13
    image
    ghz63859 14
    Action: Unify to 䝒 (U+4752)

  • 04966 GHZR84879.02
    image
    ghz84879 02
    Action: Unify to 驟 (U+9A5F) (i.e. confirm suggestion in IRG#46)

Unification of 幸, 㚔 and 羍

In UCS, there are numerous examples of 幸, 㚔 and 羍 disunified.
There are three different etymologies for characters that contain 幸.

(1) U+3694 㚔 (niè, handcuffs).
Examples of characters include (usually as a semantic component):

  • U+57F7 執 / U+2163A 𡘺 / U+21655 𡙕 / U+2065C 𠙜 / U+26383 𦎃
  • U+5831 報 / U+21648 𡙈
  • U+776A 睪 / U+251E1 𥇡
  • U+25216 𥈖
  • U+23582 𣖂
  • U+20DBF 𠶿
  • U+2676F 𦝯
  • U+260A1 𦂡
  • U+26051 𦁑
    (etc)

(2) U+21D18 𡴘 (xìng, fortune)
Examples of characters include (usually as a phonetic component):

  • U+200B7 𠂷 (alternative transcription of 𡴘)
  • U+5548 啈 / U+20D43 𠵃
  • U+6DAC 涬 / U+23DDF 𣷟
  • U+46ED 䛭 / U+27A2B 𧨫
  • U+7DC8 緈 / U+2609C 𦂜 / U+260C9 𦃉
    (etc)

(3) U+7F8D 羍 (dá, small sheep)
Examples of characters include (usually as a phonetic component):

  • U+5548 啈 / U+20D43 𠵃
  • U+9054 達 / U+9039 逹

From the above examples, it can be shown that the shape difference between 幸, 㚔 and 羍 is not generally representative of a systematic semantic difference in Kaishu. In the case of 啈/𠵃/𠶿, the dictionary meaning/pronunciation of the characters is actually opposite to the normative meaning of its phonetic/semantic symbol.

Actually, these three forms were never really distinguished in handwriting; they would be distinguished by context. There is no need for multiple variants of the same character to be encoded. Trying to map every single variant in a dictionary into UCS would only cause confusion about the exact semantic meaning. IVD can be used to preserve the exact shape.
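As background on how IVD preserves exact shape: a registered variant is addressed by an ideographic variation sequence, which is a base ideograph followed by one of the variation selectors VS17 to VS256 (U+E0100 to U+E01EF). The sketch below only shows how such a sequence is composed and decomposed. The helper names are hypothetical, and whether any particular sequence is actually registered depends on the IVD collections, not on this code.

```python
# Ideographic variation selectors VS17..VS256 occupy U+E0100..U+E01EF.
VS17 = 0xE0100
VS256 = 0xE01EF

def make_ivs(base, selector_number):
    """Build a variation sequence: base ideograph + VS<selector_number> (17..256)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("ideographic variation selectors are VS17..VS256")
    return base + chr(VS17 + selector_number - 17)

def split_ivs(sequence):
    """Split a two-code-point sequence into (base, selector number)."""
    base, selector = sequence
    code = ord(selector)
    if not VS17 <= code <= VS256:
        raise ValueError("second code point is not an ideographic variation selector")
    return base, code - VS17 + 17

# Illustration only: this does not claim that the sequence is registered.
ivs = make_ivs("圉", 17)
print([f"U+{ord(c):04X}" for c in ivs])
```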

The following characters are semantically equivalent to the corresponding encoded ideographs, and thus should be unified in WS2015:

  • 00682 UTC-01451
    Semantic Origin: 㚔 - handcuffs
    UNIFY TO U+5709 圉

  • 00195 USAT09927
    Semantic Origin: 𡴘 (phonetic)
    UNIFY TO U+5016 倖

  • 00882 USAT09928
    Semantic Origin: 𡴘 (phonetic)
    UNIFY TO U+5A5E 婞

  • 03375 GHZR52968.20
    Semantic Origin: 㚔 (semantic of phonetic)
    UNIFY TO U+26525 𦔥

  • 03469 GHZR42270.04
    Semantic Origin: 㚔 (semantic of phonetic)
    UNIFY TO U+443E 䐾

  • 02677 T13-2E70
    Semantic Origin: 㚔 (semantic)
    UNIFY TO U+24FF9 𤿹

00198 UTC-01573

Henry’s Comment:
00198 UTC-01573 = 𠈇/𠉦, corrupted form of 𠈇 U+20207 / 𠉦 U+20266

UK’s Comment:
Disagree. UTC-01573 is a variant form of U+5BBF 宿, but U+20207 𠈇 is a variant form of U+5919 夙, so they are different characters and cannot be unified.

Henry’s Further Comments:
夙 is often exchanged with 宿 in old Hanzi. Furthermore, the phonetic of 宿 is 𠈇 (stricter transcription: 𠉦). The fact that the source says 00198 should be read as “U+5BBF 宿” does not necessarily mean that 00198 cannot be unified with 𠈇 U+20207 / 𠉦 U+20266.

Refer to the following source by Chinese University of Hong Kong (http://humanum.arts.cuhk.edu.hk/Lexis/lexi-mf/search.php?word=%E5%AE%BF):
image

Please note that the presence of the 宀 does not affect its meaning; the form with or without it is found in Oracle Bone evidence and also Jianbo Wenzi.

Given that,
(1) 𠈇 U+20207 and 𠉦 U+20266 are variants of U+5BBF 宿 (per CUHK source)
(2) U+5BBF 宿 is a variant of U+5919 夙 (per CUHK source)
(3) U+5919 夙 is a variant of 𠈇 U+20207 and 𠉦 U+20266 (provided by UK)
(4) 00198 is a variant of U+5BBF 宿 (provided by UK)
(5) 00198 is virtually indistinguishable from U+20266 𠉦 (and very likely refers to the same Oracle Bone/Old Hanzi glyph)
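The chain of relations in points (1) to (4) can be written down as a small graph to make the argument explicit. This is a sketch under a strong assumption: it treats every stated variant relation as a symmetric link and only shows that 00198 is connected to U+20266 through the cited relations. Connectivity alone does not establish that unification is warranted; point (5), the shape comparison, is not modelled at all. The function names are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(relations):
    """Treat each stated variant relation as an undirected edge."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def path(graph, start, goal):
    """Shortest chain of variant relations from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        chain = queue.popleft()
        if chain[-1] == goal:
            return chain
        for nxt in graph[chain[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(chain + [nxt])
    return None

relations = [
    ("U+20207", "U+5BBF"),  # (1), per CUHK source
    ("U+20266", "U+5BBF"),  # (1), per CUHK source
    ("U+5BBF", "U+5919"),   # (2), per CUHK source
    ("U+5919", "U+20207"),  # (3), provided by UK
    ("U+5919", "U+20266"),  # (3), provided by UK
    ("00198", "U+5BBF"),    # (4), provided by UK
]
print(path(build_graph(relations), "00198", "U+20266"))
```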

Action Item
Unify with U+20266 𠉦.

01416 UTC-02632

WS2015 v.3 Discussion Record
unified by U+2BF4A (GZFY-00688) for IVS, irg47.
image

Henry's Comment
unified with U+632C for IVS, irg47. NOT unified with U+2BF4A (GZFY-00688).

UK Response
Disagree. It is a non-unifiable variant of U+632C, and should be encoded separately. As reported in IRGN2108Andrew_WG2N4682.pdf, the glyph form of U+2BF4A is incorrect, and should be corrected to ⿰扌学, so UTC-02632 cannot be unified with U+2BF4A.

Henry's Additional Comment
According to my meeting notes, the resolution was unification with U+632C for IVS instead of U+2BF4A (GZFY-00688).

--
Additional Info:

image

u 632c

03798 UTC-01941

Henry's Comment
03798 UTC-01941 UNIFY to 褝 (U+891D)

UTC, HK Comment
Unify with U+891D 褝

UK's Comment
Disagree. We think that the unification of the components 単 单 for U+7985 禅 was a mistake, and causes problems for users when a default font shows an unacceptable glyph form. The G-source glyph for U+891D has 単 on the right, so font developers will follow this glyph form when designing fonts for PRC, but the G-source glyph form for U+891D is unacceptable as the simplified form of U+891D 襌.
Therefore we strongly think it will serve users best to encode UTC-01941 (⿰衤单) as a separate character.

Henry's Comment
The above problem will also occur for U+20219 𠈙 and U+2548E 𥒎, even though they are in Extension B. However, China has agreed to correct multiple erroneous glyphs in the GE standard in IRGN 2170 (involving U+8669 and U+3B9D). Therefore, the refusal to correct U+891D in line with China's normalized glyphs should not be accepted.

Action Item
Unify

"One-off Corruptions"

Owing to IRGN2211 Section B Item 3 “One-off corruptions found on tombstone carvings”, the following characters should be rejected (or unified):

SN / Source / Treatment / Reason
02854 T13-2F48 IVD 碑 碑別字新編
02286 T13-2D55 IVD 燦 碑別字新編
02270 TE-6F6B IVD 瞧 廣碑別字
02246 T13-2D4B IVD 照 偏類碑別字
02821 T13-2F42 IVD 穎 碑別字新編
02812 T13-2F3F IVD 智 碑別字新編
02804 T13-2F3D IVD 矢 碑別字新編
02734 T13-2F29 IVD 旹 廣碑別字
02750 T13-2F2F IVD 督 碑別字新編
02154 T13-2D32 IVD 灮 廣碑別字
02088 T13-2D27 IVD 澡 碑別字新編
02086 T13-2D24 IVD 㴱 碑別字新編
04289 T13-3138 IVD 溯 廣碑別字
04299 T13-313C IVD 逮 廣碑別字
02073 T13-2D25 IVD 淄 廣碑別字
02574 T13-2E49 IVD 暴 偏類碑別字
02061 T13-2D21 IVD 潰 偏類碑別字
01948 T13-2C54 IVD 步 碑別字新編
02564 T13-2E48 IVD 當 廣碑別字
02557 T13-2E42 IVD 星 碑別字新編
01931 T13-2C4C IVD 氤 碑別字新編
01933 T13-2C4D IVD 氤 碑別字新編
01934 T13-2C4E IVD 氤 碑別字新編
01929 T13-2C4A IVD 氣 偏類碑別字
02448 T13-2D79 IVD 玉 偏類碑別字
02453 T13-2D7A IVD 珍 偏類碑別字
02437 T13-2D76 IVD 敵 碑別字新編
04003 T13-3128 IVD 敗 偏類碑別字
03968 T13-3124 IVD 短 碑別字新編
03803 T13-3075 IVD 福 碑別字新編
03777 T13-3074 IVD 礼 碑別字新編
03490 T13-304F IVD 歸 偏類碑別字
03594 T13-3063 IVD 𤼲 廣碑別字
03593 T13-3062 IVD 𤼲 偏類碑別字
03367 T13-3044 IVD 孝 廣碑別字
03522 T13-3054 IVD 朝 偏類碑別字
01949 T13-2C53 IVD 沉 廣碑別字
02642 T13-2E5F IVD 癸 碑別字新編, missing stroke
02644 T13-2E60 IVD 癸 碑別字新編, protruding stroke

03610 T13-3068 IVD 發/彂 金石文字辨異
02036 T13-2C71 IVD 淑 金石文字辨異
04000 T13-3129 IVD 真 金石文字辨異
02845 T13-2F47 IVD 碑 金石文字辨異
02225 T13-2D45 IVD 婆 金石文字辨異
02097 T13-2D2B IVD 演 金石文字辨異
02598 T13-2E51 IVD 病 金石文字辨異
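Since every row above follows the "SN / Source / Treatment / Reason" layout, the list can be turned into records and tallied per cited reference work. A minimal sketch, assuming the rows are available as plain text exactly as written here; `parse_rows` and `count_by_reference` are hypothetical names.

```python
import re
from collections import Counter

# SN, source reference, "IVD" treatment with its target, then the reason.
# The space after "IVD" is optional because some rows omit it.
ROW = re.compile(r"^(\d{5})\s+(\S+)\s+IVD\s*(\S+)\s+(.+)$")

def parse_rows(text):
    """Parse table rows into dictionaries; lines that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sn, source, target, reason = match.groups()
            rows.append({"sn": sn, "source": source, "target": target, "reason": reason})
    return rows

def count_by_reference(rows):
    """Count rows per cited reference work (the text before any comma in the reason)."""
    return Counter(row["reason"].split(",")[0].strip() for row in rows)

sample = """\
SN / Source / Treatment / Reason
02854 T13-2F48 IVD碑 碑別字新編
02270 TE-6F6B IVD 瞧 廣碑別字
02642 T13-2E5F IVD 癸 碑別字新編, missing stroke
"""
print(count_by_reference(parse_rows(sample)))
```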

00069 UTC-02765

Henry's Comment
= 𠘧 (U+20627) (variant, protrusion of strokes.)

UK's Comment
Disagree. Non-cognate, and stroke variation is significant.

Henry's Comments
Below is UK's evidence of 00069:
image

Below is HYDZD evidence of 𠘧 (U+20627):
image

First, the pronunciations are the same.
Second, the second meaning of 00069 is the same as the strict meaning of U+20627.
Third, the first meaning of 00069 is used as a grammatical suffix to show extent. Very likely, it is a character borrowed for its sound.
Lastly, the top left part shape of 00069 matches the Shuowen shape of U+20627. 00069 is likely simply another transcription of the same character.

It is likely they are the same character.

Action Item
Unify or Postpone.

Zhuang Character Normalization Issues

There are multiple normalization issues with the Zhuang characters submitted by Guangxi University. For example, a 橫 stroke should always be changed to a 提 on the left side, but this has not been done in the Zhuang characters. In many cases, the evidence submitted is in the correct normalized form, but the font provided by Guangxi University is not.
Once the characters are encoded, it is very troublesome to change the representative glyph. Therefore, it is suggested that Guangxi University normalize the Zhuang characters properly before submission.

01594 G_Z3561201: left side 星 does not follow PRC conventions - should be a 提 not 橫
01618 G_Z3551104: left side 星 does not follow PRC conventions - should be a 提 not 橫
04883 G_Z0721301: Does not match PRC conventions. Compare with 養.
02445 G_Z1402302: Does not match PRC conventions. Second stroke of 馬 should not be joined with the 6th stroke.
05239 G_Z0211201: Does not match PRC conventions, last stroke of left component should be 點
03354 G_Z2231201: Does not match PRC conventions, last stroke of left component should be 點
00315 G_Z3842301: Does not match PRC Conventions, last stroke of left component should be 點
04621 G_Z2382304: Does not match PRC Conventions: The third stroke of 犬 should be 點, not 捺.
00065 G_Z1652501: Does not match PRC Conventions: The last stroke of 及 should be 點, not 捺; or the structure should be changed to enclosure.
00264 G_Z4291302: Does not match PRC Conventions: Right hand side should be 尨 (⿷尤彡).
00523 G_Z2042303: Does not match PRC Conventions: last stroke of top left component should be 點.
00629 G_Z2302202: Does not match PRC Conventions: last stroke of top left component should be 點.
00534 G_Z1592101: Does not match PRC Conventions: last stroke of left component should be 點.
00536 G_Z0811201: Does not match PRC Conventions: last stroke of left component should be 提.
00659 G_Z1831401: Does not match PRC Conventions: last stroke of top left component should be 點.
01147 G_PGLG2017 doesn't match PRC conventions, last stroke of left component should be 點.
03527 G_Z2181407 doesn't match PRC conventions; last stroke of left component should be 提.
01149 G_Z3112502 doesn't match PRC conventions, last stroke of left component should be 點.
01150 G_Z0431401 doesn't match PRC conventions, last stroke of left component should be 點.
03665 G_Z1501101 doesn't match PRC conventions, last stroke of left component should be 提.
03805 G_Z1202503 doesn't match PRC conventions, last stroke of left component should be 點.
03961 G_Z2582201 doesn't match PRC conventions, fourth stroke of left component should be 點.
03974 G_Z1412404 doesn't match PRC conventions, last stroke of left component should be 提.
05311 G_PGLG3052 doesn't match PRC conventions, fifth stroke of left component should be 點.
02577 G_Z1432204 doesn't match PRC conventions, last stroke of left component should be 點.
02581 G_Z2782104 doesn't match PRC conventions, last stroke of left component should be 點.
02328 G_Z1602601 doesn't match PRC conventions, last stroke of left component should be 點.
03144 G_Z3651201 doesn't match PRC conventions, last stroke of left component should be 點.

00597 UTC-02810

image

UTC's Comment
Unify with U+2D227 𭈧
image

UK's Comment
Agree.

Henry's Comment
Disagree. 叁 and 参 are cognate but have different semantics in modern day. The phonetic of U+2D227 should be confirmed to be 参 instead of 叁 before unification.

Action Item
Disunify or Postpone.

05197 UTC-02557

image

Henry's Comment
Does not match PRC conventions: right hand side should be normalized to 恒; unify to 05198.

UK Response
Disagree. Hanyu Dazidian has separate entries for both characters.

Henry's Response
恒 and 恆 are unifiable.

04035 UTC-01167

image

Henry’s Comment:
04035 UTC-01167: more evidence? (賔=賓) no ⿰貝賓 exists yet.

UTC’s Comment:
The UTC does not agree. The supplied evidence is clear.

Henry’s Additional Comments:
賔 is a strict transcription of the character which is more popularly written as “賓” in modern times. The evidence from Grammata Serica Recensa shows only the transcribed form and not the original Oracle Bone, Bronze or Seal Script:
image

This transcription has caught my attention because no character composed of ⿰貝賓 exists yet – so it is possible that UTC-01167 is an incorrect transcription of a certain character, or the character has completely vanished in medieval and modern usage so no modern transcription exists.

If possible, the original source that this transcribed character was based on should be provided, so the transcription can be verified.

Nevertheless, given the historical significance of Grammata Serica Recensa, unless the original oracle/bronze/seal sources show clear evidence that this character has been badly mis-transcribed, the character in its current form is still worth encoding.

Action Item
Postpone or Keep

02315 T13-2D5B

image

The left hand side is not Claw 爪 but 心 (忄). 悑 (U+6091) is already encoded.

According to Kangxi, 悑 is the variant of 怖, which is synonymous with 懼.

Action Item:
Unify/IVD to U+6091.

Unification of 肉 and ⺼

There are a large number of characters in which U+2EBC ⺼ has been replaced with the full 肉 radical. Since there are 500+ characters with U+2EBC, IRG should consider allowing these characters to be encoded via IVD. Variants using 肉 on the left are sufficiently rare in modern usage.
It is suggested to use IVD only if 肉 is on the left, and to disunify if 肉 is at the bottom. Zhuang characters in particular should be studied on a case-by-case basis because 肉 may play a phonetic part instead of a semantic part. Those with 肉 at the bottom have been included for completeness.

00282 USAT08988 IVD 䏌: 肉 vs U+2EBC
02178 USAT08962 IVD 炙
02670 T13-2E6E IVD 䏢
03390 USAT06462 IVD 肊: text indicates 乙 as phonetic; 肉 vs U+2EBC
03392 USAT09914 IVD 肋: 肉 vs U+2EBC
03393 USAT06266 IVD 肙: 肉 vs U+2EBC
03395 USAT08303 IVD 𦘺: 肉 vs U+2EBC
03397 USAT10232 IVD 肧: 肉 vs U+2EBC
03403 USAT05646 IVD 肴: 肉 vs U+2EBC, top 爻 (ref: 希~𢁫 // 04335)
03405 USAT90295 IVD 股/肢: 肉 vs U+2EBC
03407 USAT08919 IVD 育: 肉 vs U+2EBC
03411 USAT06030 IVD 背: 肉 vs U+2EBC
03414 USAT08746 IVD 䏣: 肉 vs U+2EBC
03416 USAT90297 IVD 胜: 肉 vs U+2EBC
03419 USAT06375 IVD 肺: 肉 vs U+2EBC; IDS: ⿰肉巿
03420 USAT10231 IVD 胎: 肉 vs U+2EBC
03422 USAT06361 IVD 胷: 肉 vs U+2EBC
03423 USAT08968 IVD 脃: 肉 vs U+2EBC
03426 USAT06768 IVD 胯: 肉 vs U+2EBC; (𡗢 ~ 夸 // 誇 ~ 𧧳 // 跨 ~ 𨀗 // 䠸 ~ 𨉀 // 鮬 ~ 𩶮 // 洿 ~ 𣴰);SC - 6
03437 USAT07202 IVD 腴: 肉 vs U+2EBC; SC = 9
03463 GHZR10093.03 IVD 𦠒
03470 USAT90298 IVD 臗

03800 UTC-01942

UTC, HK Comments
Unify with U+2B304 𫌄
image

UK Comment
Agree.

Henry's Comment
Disagree.

The pronunciation of U+2B304 𫌄 is given as tươm, 叁 as tam, and 參 as tham/sam/sâm/khươm on the Nom Foundation Nom Lookup Tool. It is highly probable that the phonetic of U+2B304 𫌄 is 叁 instead of 參.

Before the phonetic of U+2B304 𫌄 can be truly confirmed, U+2B304 and 03800 should not be unified.

Action Item
Postpone or Disunify.

00765 UTC-01217, 00777 UTC-01219

00765 UTC-01217
image

00777 UTC-01219
image

Henry’s Comment:
00765 UTC-01217 UNIFY WITH 00777 UTC-01219 (keep 00777).

UTC’s Comment:
IDSes are ⿰土⿱𥃭木 and ⿰土⿱直木. The UTC does not agree.
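The two IDSes quoted above differ in a single leaf component, which can be shown by parsing them. The sketch below is a simplified parser for prefix-notation IDS strings. It assumes only the twelve description characters U+2FF0 to U+2FFB (two of which take three operands) and treats every other character as a leaf; the function names are hypothetical.

```python
# Ideographic Description Characters U+2FF0..U+2FFB and their operand counts.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = 3  # left-middle-right
ARITY["\u2FF3"] = 3  # top-middle-bottom

def parse_ids(ids):
    """Parse a prefix-notation IDS into nested tuples (operator, operand, ...)."""
    def parse(pos):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were read")
        char = ids[pos]
        pos += 1
        if char not in ARITY:
            return char, pos
        operands = []
        for _ in range(ARITY[char]):
            node, pos = parse(pos)
            operands.append(node)
        return (char, *operands), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree

def leaves(tree):
    """Flatten a parsed IDS into its leaf components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

a = leaves(parse_ids("⿰土⿱𥃭木"))
b = leaves(parse_ids("⿰土⿱直木"))
# Print the positions where the two sequences use different components.
print([(x, y) for x, y in zip(a, b) if x != y])
```

A component-level difference like this is what the UTC's objection rests on; whether the two components are unifiable is a separate question that the parser does not answer.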

Henry’s Additional Comments:
The sources provided by UTC are as follows:
image
image

Given the evidence provided by the UTC, it is rather obvious from the context that they should be referring to the same person, and thus be the same character.

A Wikipedia article shows that according to 《明實錄》, the name of the person is 志㙞. If this can be confirmed by any expert familiar with the members of the Ming dynasty imperial family, then the two corrupted forms (UTC-01217, UTC-01219) should be unified to 㙞 (U+365E) (via IVD).

《明實錄》 can be found on **研究所 歷史語言研究所 明實錄、朝鮮王朝實錄、清實錄資料庫:
image

Action Item
Postpone or Withdraw

Unification of 耎 and 䎡

䎡 is a common variant of 耎.

Encoded Characters with IDS containing 䎡:
U+43A1 䎡 = U+800E 耎
U+24322 𤌢 = U+7157 煗
U+24B81 𤮁 = U+3F32 㼲
U+25C47 𥱇 = U+25BEC 𥯬
U+273E6 𧏦 = U+8761 蝡
U+28AB3 𨪳 = U+28A30 𨨰
U+28EE2 𨻢 = U+967E 陾
U+29C4A 𩱊 = U+29C44 𩱄
U+2C8BD 𬢽 (UNKNOWN ORIGIN - JK-65739)

Affected Characters:

  • 01592 GHZR31640.06 IVD 㬉 (耎 ~ 䎡)

  • 03004 GHZR52807.16 IVD 稬 (耎 ~ 䎡)

  • 03447 GHZR42254.03 IVD 腝 (耎 ~ 䎡)

  • 00208 USAT09305 IVD 偄 (耎 ~ 䎡)

  • 01275 GHZR42502.05
    image
    image
    Action: Withdraw / IVD 愞. (䙳 on the right hand side is an error form of 䎡. 䙳 = 票)

04924 T13-314F

image

The glyph image and the IDS/SC submitted by TCA are not an accurate transcription of the evidence:
image

艺 is a character invented as a simplified glyph of 藝. The right hand side is not 艺 but 𠃟.

Thus, it should be transcribed as ⿰馬𠃟.

𠃟 is an alternative transcription of 也:

image

Action Item

  • Correct glyph to ⿰馬𠃟.
  • Unify (IVD) to U+99B3 馳.

02179 UTC-02651

Henry's Comment
02179 UTC-02651 = 𤇆 (U+241C6) / 烟 (因 ~ 囙 -- 第一批异体字整理表)

Japan's Comment
Unify with U+241C6 𤇆

UK's Comment
Disagree. We do not believe that 回 and 囙 are unifiable components.

Henry's Comment
It is common for the middle of 回 to be written as 囙 in print, such as:
u 91c1
although the reverse is rather uncommon.

In the evidence provided, it is given that 02179 is a variant of 烟:
image

To also quote from MOE Dictionary (http://dict2.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QTAyNDIwLTAwMQ):
image

𤇆 is a variant of 烟, and the pair 因 / 囙 is included in 《第一批异体字整理表》. Therefore, the equivalence between 02179 and 𤇆 is beyond reasonable doubt.

Suggested Action Item
Unify/IVD

03555 UTC-01950

WS2015v3.0 Discussion Record
IRGN2179PostponedV3.0
pending for solutions (not unified by U+82B2, G source of U+82B2 may be changed), irg47.
unified by U+82B2, irg46.

Henry's Comment

  • U+82B2 is used in 第一批异体字整理表 and 《通用規範漢字表》異體字 to mean 花;
    Keep G-source of U+82B2 unchanged.
  • 03555 should be separately encoded.

UK Response
Strongly agree!

Action Item
Suggested to annotate the Code Charts with the correct semantics
image
image

Character Stroke Count for 丽

丽 is present in the Kangxi Dictionary with a SC = 7. The radical is Dot (丶). Therefore, the total SC should be 8.

It is also counted as total SC = 8 when on top of 鹿
image

However, U+4E3D in the Code Charts has an SC = 6 (total SC = 7)
U+4E3D in the Code Charts
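The discrepancy is a matter of simple arithmetic: total stroke count is the radical's stroke count plus the residual stroke count. A minimal sketch using the figures quoted above, with the variable names chosen here for illustration only:

```python
def total_stroke_count(radical_strokes, residual_strokes):
    """Total SC = strokes in the radical + strokes outside the radical."""
    return radical_strokes + residual_strokes

DOT_RADICAL_STROKES = 1  # the radical 丶 is a single stroke

# Residual stroke counts as quoted in the text above.
kangxi_total = total_stroke_count(DOT_RADICAL_STROKES, 7)
code_chart_total = total_stroke_count(DOT_RADICAL_STROKES, 6)

print(kangxi_total, code_chart_total, kangxi_total - code_chart_total)
```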

Affected Characters as follows:

  • 02746 UTC-01877
    image
  • 04094 UTC-02120
    image
  • 00376 UTC-01690
    image
  • 03824 UTC-02112
    image
  • 03793 UTC-01940
    image
  • 01398 UTC-01809
    image

Action Item
Add 丽 to IRGN954AR with total SC of 8.

Unification of 犮 and 叐

image
image

Henry’s Comment:
00698 UTC-01204 UNIFY TO 坺 U+577A
00861 USAT90292 UNIFY TO 妭 U+59AD

UTC’s Comment:
IDS is ⿰土叐. The UTC does not agree.

Henry’s Supplementary Information:
叐 and 犮 are variant forms of the same component*. In another version of 弇山堂別集卷三十三, the character 坺 (U+577A) is used instead:

image
Source: http://ctext.org/library.pl?if=gb&file=58118&page=103, which is 《欽定四庫全書》本·史部五·雜史類。

*: Referencing variants of 拔 (U+62D4) from the MOE Dictionary:
image
image
image

The source quoted by MOE Dictionary has been omitted for brevity.

A list of coded characters containing 叐 are listed as follows:

  • U+53D0 叐: Kangxi / Hanyu Dazidian: variant of 犮
  • U+39DE 㧞: variant of 拔 based on evidence of 拔 in MOE Dictionary
  • U+47E6 䟦: variant of 跋 based on evidence of 跋 in MOE Dictionary
  • U+2209B 𢂛: unverifiable; G4K-sourced character; original sources could not be found
  • U+2342A 𣐪: TF-sourced character; pronunciation same as 柭
  • U+24923 𤤣: TF-sourced character; pronunciation same as 𤤒
  • U+2595C 𥥜: variant of 突 according to Hanyu Dazidian
  • U+25FC8 𥿈: variant of 𥿈 @ MOE Dictionary, HYD and Koseki
  • U+26B5E 𦭞: variant of 𦭞 @ HYD
  • U+296BF 𩚿: variant of 飫 @ MOE Dictionary, HYD and Koseki
  • U+2989A 𩢚: variant of 䮂 @ KX, MOE, HYD, Koseki
  • U+2C4B2 image : TC-sourced character; pronunciation same as 祓
  • U+2CAC6 image : TD-sourced character; pronunciation same as 鈸
  • U+2CF74 image : variant of 伏 according to SAT Database
  • U+2D71E image : used in name of person in SAT database or as a phonetic transcription; meaning could not be verified.
  • U+2D805: variant of 戾 according to Mojikiban
    image
  • U+2DF7D: variant of 鉢 according to SAT Database, which is variant of 盋 according to PRC 第一批异体字整理表, Koseki & HYD
    image
  • U+2E0BA: variant of 秡 according to Mojikiban “読み・字形による類推” with remark “地名外字”
    image
  • U+2E2DF: variant of 黻 according to footnote in SAT Database.
    image
  • U+2E3DF: variant of 沷 according to footnote in SAT Database. The next character was 若, so a possible phenomenon of a “類化”
    image

To summarize, in the majority of cases, the 叐 component is a strict variant of 犮. In 2 cases it is equivalent to 犬, a property also shared by 犮 itself. In 2 cases the source of the character could not be verified, and in 1 case there were multiple sources showing that it was a “corruption” of another similarly shaped component.

Therefore I think there is enough evidence to regard 叐 as unifiable with 犮.

In fact, this rule is regarded as a normalization by ROK in IRGN2154:
image

Conclusion:

  • 00698 UTC-01204 unify/IVD to 坺 U+577A
    image
  • 00861 USAT90292 unify/IVD to 妭 U+59AD
    image

00346 G_Z1841301

image
The 几 (U+51E0) is the phonetic, not 𠘧 (U+20627). The character 00346 cannot be changed to use 𠘧 (U+20627). It should use 几 (with hook).
The presence or absence of the hook is not a location variant form issue. Refer to U+28972 where the hook is present even when the component is situated at the top:
image

Zhuang Character Normalization Issue (Components)

The following submitted Zhuang characters do not use components that are considered standard by PRC. PRC may wish to say that Zhuang characters are not expected to be normalized. However, ideographs used by Han languages and dialects may turn out to be the same characters; if such evidence is discovered, those characters will likely be unified (or registered via IVD) with the Zhuang characters. The existing Zhuang glyphs would then need to be modified to use the PRC normalized form, creating many problems for existing fonts.
Therefore, it is wise to normalize the glyphs to the PRC normalized form first. If the exact form in the dictionary is to be preserved, an IVD sequence should be registered after the normalized form is encoded.

  • 00980 G_Z3951603
    Consider normalizing the right hand side to 兒 (U+5152) instead of 𫤘 (U+2B918).
    image

  • 00224 G_Z2281101
    Consider normalizing to 觉 as 覚 is not a normalized form.
    image

  • 02271 G_Z2302301
    Consider normalizing to 觉 as 覚 is not a normalized form.
    image

  • 04111 G_Z1231301
    The phonetic of the character is 忝 (tim1). It should be normalized to 忝.
    image
    The confusion of 氺 and 心 for 忝 is common:
    image
    image

  • 00739 G_Z1191201
    same as above; then unify to 𡍞 (U+2135E)
    image

  • 03110 G_Z4442201
    斉 is not the PRC normative form. Consider normalizing to 齐.
    image

  • 03187 G_Z0671501
    埀 is not the PRC normative form. Consider normalizing to 垂.
    image

  • 05030 G_Z3621104
    悪 is not the normative form. Consider normalizing to 惡.
    悪 and 惡 are unifiable, IRG#47
    image

  • 03008 G_Z3271501
    宻 is not the PRC normative form. Consider normalizing to 密.
    image
    Please also refer to IRGN2154 ROK Normalization Rule 5-1:
    image

  • 03140 G_Z4491201
    𪮫 is not the PRC normative form. Consider normalizing to 撒.
    image
