[Question] How to decode Face.postscript_name? (freetype-py, closed)

moi15moi commented on June 16, 2024

[Question] How to decode Face.postscript_name?

from freetype-py.

Comments (26)

rougier commented on June 16, 2024

Maybe Face.postscript_name.decode("utf-8")?

moi15moi commented on June 16, 2024

A PostScript name could be in any encoding, so it's not a good idea to always assume UTF-8.

HinTak commented on June 16, 2024

PostScript names containing non-ASCII characters should be escaped the PostScript way (hex with a prefix)? Read the PostScript reference manual.

moi15moi commented on June 16, 2024

> Read the postscript reference manual.

Where is it available?

Or, is there any way to get the name of the encoding used by postscript_name? With that name, I could easily convert it to a string.

moi15moi commented on June 16, 2024

I would also like to know how to decode an SFNT name:

import freetype
face = freetype.Face("F5AJJI3A.TTF")

for i in range(face.sfnt_name_count):
    name = face.get_sfnt_name(i)
    print(name.string) # can return bytes

Here is a font example: https://mega.nz/file/S9ERDRpQ#bcPhS06kv-D5jt64aTNDbZVd6gZr6ZfJDYT91yYsoWk

rougier commented on June 16, 2024

For the SFNT name, see https://freetype.org/freetype2/docs/reference/ft2-sfnt_names.html
For postscript_name, see https://freetype.org/freetype2/docs/reference/ft2-base_interface.html#ft_get_postscript_name

HinTak commented on June 16, 2024

The PostScript name is in plain ASCII; the SFNT name is in SJIS encoding, as the combination of platform/encoding/language IDs says. You need to call one of Python's decoding functions to decode the bytes as SJIS.

The PostScript name is Fj-Ima310; the SFNT name should decode to "Fjイーマ310" from "Fj\x83C\x81[\x83}310".
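
The decoding step described here can be checked without the font file. This is a minimal sketch, assuming the raw name bytes are exactly the ones quoted in this comment; the variable names are illustrative only.

```python
# Raw SFNT name bytes as quoted above: ASCII "Fj", three Shift JIS
# double-byte katakana, then ASCII "310".
raw = b"Fj\x83C\x81[\x83}310"

# "shift_jis" is Python's codec name for SJIS; "sjis" is an accepted alias.
decoded = raw.decode("shift_jis")
print(decoded)
```

The bytes 'C', '[' and '}' look like ASCII in the escaped form, but here they are the second halves of two-byte SJIS characters.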

HinTak commented on June 16, 2024

In your code above, you also need to read name.platform_id, name.encoding_id and name.language_id before deciding how to decode name.string in general.

HinTak commented on June 16, 2024

>>> name = face.get_sfnt_name(1)
>>> print((name.string).decode("sjis"))
Fjイーマ310
>>> print(name.encoding_id)
0
>>> print(name.language_id)
11
>>> print(name.platform_id)
1

(1, 0, 11) is Japanese SJIS. There is a table linked from https://freetype.org/freetype2/docs/reference/ft2-sfnt_names.html which tells you what (platform, encoding, language) = (1, 0, 11) means. You basically need to check that the triple is (1, 0, 11) before passing "sjis" as the decode argument.

HinTak commented on June 16, 2024

Extracted from the freetype doc -

#define TT_PLATFORM_MACINTOSH    1
#define TT_MAC_ID_ROMAN          0
#define TT_MAC_LANGID_JAPANESE   11

HinTak commented on June 16, 2024

Some of the other entries in this font look broken.

>>> name = face.get_sfnt_name(8)
>>> print(name.platform_id)
3
>>> print(name.encoding_id)
2
>>> print(hex(name.language_id))
0x411

#define TT_PLATFORM_MICROSOFT         3
#define TT_MS_ID_SJIS                 2
#define TT_MS_LANGID_JAPANESE_JAPAN   0x0411

This suggests it is in SJIS too. However, it won't decode as SJIS; it needs to be decoded as UTF-16-BE:

>>> print(name.string.decode("sjis"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0xfc in position 7: illegal multibyte sequence
>>> print(name.string.decode("utf-16-be"))
Fjイーマ310
>>>

i.e. this font is slightly broken in some of its SFNT names.
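
One way to cope with a mislabelled entry like this is to try the declared codec first and fall back to UTF-16-BE only when that fails. The following is a sketch, not something freetype or freetype-py provides: `decode_with_fallback` is a hypothetical helper, and the sample bytes are simply the UTF-16-BE encoding of the name shown above rather than data read from the font.

```python
def decode_with_fallback(raw, declared, fallback="utf-16-be"):
    """Decode raw name bytes, trying the declared codec before the fallback."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # The tags promised `declared`, but the bytes disagree.
        return raw.decode(fallback)


# Stand-in for the (3, 2, 0x411) entry: tagged SJIS, stored as UTF-16-BE.
mislabelled = "Fjイーマ310".encode("utf-16-be")
print(decode_with_fallback(mislabelled, "shift_jis"))
```

A fallback like this can silently produce wrong text when the bytes happen to be valid in the declared codec, so it is a heuristic at best.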

moi15moi commented on June 16, 2024

I know that the encoding depends on the platform_id and encoding_id (if platform_id = 0, it also depends on language_id).

My problem is that I don't understand how to get the right encoding from these parameters. Which method should I call? In your example, you hardcoded sjis.

But if I remove all the SFNT names except the ones for platform ID 3, I can't decode it. Still, I get the correct name with Windows and with libass, which uses freetype: https://mega.nz/file/Oo9ygbJA#7Ri7rlZ0oCxS6slXtfxIpP_VJa1HE6h24PcRRtGDr0E

HinTak commented on June 16, 2024

FreeType itself does not care about text encodings. It is a library that maps text in any encoding (sometimes localized, sometimes even custom, like a private collection of symbols) to shapes. There is nothing inside FreeType to call.

There are a few combinations of platform/encoding/language where it means Unicode (the newer common standard). For the rest, it means the corresponding localised encoding, from back in the 1980s, before Unicode. Japan used SJIS, and still does.

As I said, this particular font is buggy in the (3,2,0x411) strings. (3,2,0x411) means Japanese and SJIS, but the bytes are wrongly in UTF-16-BE.

There is no quick way of choosing the decoding parameter, given that there are 10+ common localised encodings (CJK alone accounts for 4, e.g. simplified Chinese = gb18030 vs traditional = big5). The logic is fairly messy:

if (combo is one of the Unicode ones)
    do Unicode
else if (combo is one of lang 1)
    do lang 1
else if (combo is one of lang 2)
    ... etc.
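
In Python, that chain might look like the sketch below. The table is deliberately tiny and is an assumption of this sketch: it covers only the combinations discussed in this thread plus the usual Unicode ones, and `pick_codec` and `decode_name` are hypothetical helpers, not part of freetype-py. The record is a stand-in object carrying the same attribute names used in the thread (platform_id, encoding_id, language_id, string).

```python
from types import SimpleNamespace


def pick_codec(platform_id, encoding_id, language_id):
    """Return a Python codec name for an SFNT name record, or None if unknown."""
    if platform_id == 0:                                # Unicode platform
        return "utf-16-be"
    if platform_id == 3 and encoding_id in (0, 1, 10):  # Microsoft, Unicode variants
        return "utf-16-be"
    if platform_id == 3 and encoding_id == 2:           # Microsoft, SJIS
        return "shift_jis"
    if (platform_id, encoding_id, language_id) == (1, 0, 11):
        return "shift_jis"                              # Macintosh, Japanese (per this thread)
    return None  # every other localised encoding is left out of this sketch


def decode_name(name):
    """Decode name.string if the combination is known; otherwise return the bytes."""
    codec = pick_codec(name.platform_id, name.encoding_id, name.language_id)
    return name.string.decode(codec) if codec else name.string


# Stand-in for the (1, 0, 11) record shown earlier; no font file is needed.
record = SimpleNamespace(platform_id=1, encoding_id=0, language_id=11,
                         string=b"Fj\x83C\x81[\x83}310")
print(decode_name(record))
```

Returning the raw bytes for unknown combinations keeps the caller aware that nothing was decoded, instead of guessing a codec.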

As I write for a third time now, this particular font is buggy in its (3,2,0x411) name strings. Anyway, you can run "ftdump -n ..." on most fonts, and ftdump (one of the freetype2-demos programs written by the FreeType people to demonstrate the FreeType API; available on most Linux platforms, and buildable for Windows too) will try to decode all the strings to Unicode or hex for you. The actual decoding routine, "Print_Sfnt_Names", is about 110 lines (the total is ~1400, so a bit under 10% of it), from about line 340 onwards, and it is not a pretty thing: it is a few large, nested "switch (x_id) case: ..." blocks.

Consider that even the FreeType people need 110 lines of C code to demonstrate how to decode the SFNT names, and that code only converts the UTF-16-BE ones and does nothing for the others. UTF-16-BE is special: it is the native encoding of the first Apple Mac in the 1980s, when TrueType was created. That's your answer: you need to copy those 110 lines of C code, convert them to Python, and add a few lines to decode arbitrary names for arbitrary fonts, if that's your goal.

I'll write it a fourth time: this particular font is buggy (i.e. off-spec) in the names department. Don't use it for testing your code in this area.

HinTak commented on June 16, 2024

I have just been reminded in #156 that in our example directory, there is a python version of ftdump.py : https://github.com/rougier/freetype-py/blob/master/examples/ftdump.py - you can see the piles of "if ... elif ..." for the name decoding part.

moi15moi commented on June 16, 2024

Ok, thank you.

The font is not really "buggy", but it is a special case. With a modified version of fonttools, I can decode everything correctly.

HinTak commented on June 16, 2024

The font is buggy. The platform/encoding/language tags for the SFNT names don't reflect their encoding correctly. Maybe it is not seriously buggy, but buggy nonetheless. If fonttools shows every string in human-readable form (more than "ftdump -n" is able to show), then it is behaving in a friendly though off-spec (i.e. buggy) manner.

moi15moi commented on June 16, 2024

Since the FreeType and Adobe documentation state that a PostScript name should only contain ASCII characters, this seems to be a solution:

if font.postscript_name is not None:
    try:
        decoded_postscript_name = font.postscript_name.decode("ASCII")
    except UnicodeDecodeError:
        print("The font you specified contain an invalid postscript name")

HinTak commented on June 16, 2024

It is a workaround: font designers and font editing software do all sorts of things until the community (font creators and font-consuming tech) reaches consensus about what is good and what to avoid, and the spec gets updated to reflect that consensus. Often old buggy fonts, which are sufficiently useful nonetheless, do not get updated.

I think "contains ASCII only" is a "recommendation". Many fonts were created with non-ASCII names (for non-English markets, like Japanese in this case) before it was stated to be poor practice.

HinTak commented on June 16, 2024

The PostScript reference manual is freely downloadable from Adobe. I am not quite sure what you are asking now. If it is an encoding issue, then as I said, the correct way is in the reference; if it is a missing API issue (not all FreeType routines are exposed in freetype-py), then we can add it, though I doubt that's the case, since getting at the PostScript name is quite old functionality and should be in; if it is a lack of documentation, consulting upstream (FreeType's) is in order. Lastly, for some fonts, it is also possible that the font creator mistakenly put off-spec bytes/encoding there. Actually, looking at examples/ftdump.py (in the examples directory of the freetype-py source), it should just work. What exactly is your problem? The PostScript # hex encoding one?

JeremieBergeron commented on June 16, 2024

> If it is an encoding issue, it is as I said, the correct way is in the reference

The Adobe reference doesn't say how to decode it.
It only says how to create a PostScript name.

> what exactly is your problem?

Face.postscript_name can return bytes.
It should always return a string.

It seems FreeType always returns ASCII bytes, so I think freetype-py should do this:

if font.postscript_name is not None:
    try:
        decoded_postscript_name = font.postscript_name.decode("ASCII")
    except UnicodeDecodeError:
        print("The font you specified contain an invalid postscript name")

HinTak commented on June 16, 2024

@JeremieBergeron I have already pointed out that the correct way to interpret those bytes is as in the ftdump.py example. The example does return a string. It is not a neat two-line answer, but it is the answer. The fact that this particular code does not work on this particular font is because this particular font is buggy, as in off-spec. That the font still (partially) works (in some circumstances, for some usage) is beside the point. Some other part of the font is not buggy; that's what you are claiming, really.

HinTak commented on June 16, 2024

If you are proposing copying that 100 lines of ftdump.py as a wrapper into the core, that's debatable.

JeremieBergeron commented on June 16, 2024

Why are you talking about ftdump.py?

It does not decode the byte:

freetype-py/examples/ftdump.py

Line 34 in 0084212

ps_name = face.postscript_name or "UNAVAILABLE"

HinTak commented on June 16, 2024

I am not sure what you are asking here. There is an implicit conversion on print. As I explained quite a few times, localised names are handled as done in ftdump. If the font name is not ASCII, it is not ASCII. Blindly converting to ASCII seems wrong.

There is a better way of encoding localised names (and some font vendors still get it wrong). Historically, the PostScript name is whatever the font vendor put there, and it works for their intended purpose. It looks as if font vendors put ASCII names, localized names (for the intended locale), UTF-8 names recently in some cases, and PostScript-encoded hex in others. What it should be was specified later.

If you think the conversion to ASCII should be done, it could be added on the client side...
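
A client-side conversion along those lines could be sketched as follows. `postscript_name_to_str` is a hypothetical helper, not part of freetype-py; it assumes the property yields bytes or None as reported in this thread, and it keeps any non-ASCII bytes visible as escapes instead of guessing an encoding.

```python
def postscript_name_to_str(raw):
    """Return the PostScript name as str, or None if the face has none."""
    if raw is None:
        return None
    if isinstance(raw, str):
        return raw
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Keep the offending bytes visible as \xNN escapes instead of guessing.
        return raw.decode("ascii", errors="backslashreplace")


print(postscript_name_to_str(b"Fj-Ima310"))
```

With a real face, the call would be something like postscript_name_to_str(face.postscript_name).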

JeremieBergeron commented on June 16, 2024

I am talking about postscript_name. The FreeType documentation says: "Retrieve the ASCII PostScript name of a given face, if available. This only works with PostScript, TrueType, and OpenType fonts."

So it always returns ASCII bytes.

Of course, this won't work if I try to directly decode a name in the os2 table, but that's totally different (also, the code in ftdump does not always pick the right encoding; see what fonttools has done).

HinTak commented on June 16, 2024

In the case of it being completely normal and ascii, print(font.postscript_name.decode("ASCII")) and print(font.postscript_name) are not that different, visually. One might argue not to convert - python 3 strings internally are not single byte representations, so that will surprise some other people.
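
For reference, the visible difference is the b'...' wrapper that print adds around bytes. A small sketch, assuming postscript_name yields bytes as reported in this thread; the literal below stands in for the real property value.

```python
name = b"Fj-Ima310"             # stand-in for face.postscript_name

as_bytes = str(name)            # the text print(name) would show
as_text = name.decode("ascii")  # the text print(name.decode("ascii")) would show

print(as_bytes)
print(as_text)
```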
