Comments (26)

rougier commented on June 16, 2024

Maybe Face.postscript_name.decode("utf-8")?

from freetype-py.

moi15moi commented on June 16, 2024

A PostScript name could be in any encoding, so it is not a good idea to always assume UTF-8.

HinTak commented on June 16, 2024

moi15moi commented on June 16, 2024

> Read the postscript reference manual.

Where is it available?

Or is there any way to get the name of the encoding used by postscript_name? With that name, I could easily convert it to a string.

moi15moi commented on June 16, 2024

I would also like to know how to decode an SFNT name:

import freetype
face = freetype.Face("F5AJJI3A.TTF")

for i in range(face.sfnt_name_count):
    name = face.get_sfnt_name(i)
    print(name.string) # can return bytes

Here is a font example: https://mega.nz/file/S9ERDRpQ#bcPhS06kv-D5jt64aTNDbZVd6gZr6ZfJDYT91yYsoWk

rougier commented on June 16, 2024

For the SFNT name, see https://freetype.org/freetype2/docs/reference/ft2-sfnt_names.html
For postscript_name, see https://freetype.org/freetype2/docs/reference/ft2-base_interface.html#ft_get_postscript_name=

HinTak commented on June 16, 2024

The PostScript name is in plain ASCII; the SFNT name is in SJIS encoding, as the combination of platform/encoding/language IDs says. You need to call one of Python's decoding functions to decode the bytes as SJIS.

The PostScript name is Fj-Ima310; the SFNT name should decode to "Fjイーマ310" from "Fj\x83C\x81[\x83}310".

HinTak commented on June 16, 2024

In your code above, you also need to read name.platform_id, name.encoding_id and name.language_id before deciding how to decode name.string in general.

HinTak commented on June 16, 2024

>>> name = face.get_sfnt_name(1)
>>> print((name.string).decode("sjis"))
Fjイーマ310
>>> print(name.encoding_id)
0
>>> print(name.language_id)
11
>>> print(name.platform_id)
1

(1,0,11) is Japanese SJIS. There is a table linked from https://freetype.org/freetype2/docs/reference/ft2-sfnt_names.html which tells you what (platform, encoding, language) = (1,0,11) means. You basically need to check that the combination is (1,0,11) before passing "sjis" as the decode argument.

HinTak commented on June 16, 2024

Extracted from the FreeType docs:

#define TT_PLATFORM_MACINTOSH   1
#define TT_MAC_ID_ROMAN         0
#define TT_MAC_LANGID_JAPANESE  11

HinTak commented on June 16, 2024

Some of the other entries in this font look broken.

>>> name = face.get_sfnt_name(8)
>>> print(name.platform_id)
3
>>> print(name.encoding_id)
2
>>> print(hex(name.language_id))
0x411

#define TT_PLATFORM_MICROSOFT        3
#define TT_MS_ID_SJIS                2
#define TT_MS_LANGID_JAPANESE_JAPAN  0x0411

This suggests it is in SJIS too. However, it won't decode as SJIS; it needs to be decoded as utf-16-be instead:

>>> print(name.string.decode("sjis"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0xfc in position 7: illegal multibyte sequence
>>> print(name.string.decode("utf-16-be"))
Fjイーマ310
>>> 

I.e., this font is slightly broken in some of its SFNT names.
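A minimal sketch of one possible workaround for such mislabeled entries: try the codec that the IDs declare, and fall back to utf-16-be if that fails. The helper name `decode_with_fallback` is hypothetical and not part of freetype-py; the sample bytes are produced by encoding the expected name rather than read from the font. This is a heuristic only, since a mislabeled string can also decode without error into garbage.

```python
def decode_with_fallback(raw: bytes, declared_codec: str) -> str:
    """Decode raw name bytes with the declared codec, else try UTF-16BE.

    Hypothetical helper for off-spec fonts; not a freetype-py API.
    """
    try:
        return raw.decode(declared_codec)
    except UnicodeDecodeError:
        # The declared codec rejected the bytes; assume the entry was
        # actually stored as big-endian UTF-16.
        return raw.decode("utf_16_be")


# Stand-in for the mislabeled entry: UTF-16BE bytes tagged as SJIS.
mislabeled = "Fjイーマ310".encode("utf_16_be")
print(decode_with_fallback(mislabeled, "shift_jis"))  # Fjイーマ310
```

With a real face, `raw` would be `name.string` from `face.get_sfnt_name(i)`.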

moi15moi commented on June 16, 2024

I know that the encoding depends on platform_id and encoding_id (if platform_id = 0, it also depends on language_id).

My problem is that I don't understand how to get the right encoding from these parameters. Which method should I call? In your example, you hardcoded sjis.

But if I remove all the SFNT names except the one with platformID 3, I can't decode it. Still, I get the correct name with Windows and with libass, which uses FreeType: https://mega.nz/file/Oo9ygbJA#7Ri7rlZ0oCxS6slXtfxIpP_VJa1HE6h24PcRRtGDr0E

HinTak commented on June 16, 2024

FreeType itself does not care about text encodings. It is a library for mapping text in any encoding (sometimes localised, sometimes even custom, like a private collection of symbols) to shapes. There is nothing inside FreeType to call.

There are a few combinations of platform/encoding/language that mean Unicode (the newer common standard). The rest mean the corresponding localised encoding from the 1980s, before Unicode. Japan used SJIS, and still does.

As I said, this particular font is buggy in the (3,2,0x411) strings. (3,2,0x411) means Japanese and SJIS, but the bytes are wrongly in utf-16-be.

There is no quick way of setting the decoding parameter, given there are 10+ common localised encodings (CJK alone accounts for 4, e.g. simplified Chinese = gb18030 vs traditional = big5). The logic is fairly messy:

if (combo is one of the Unicode ones)
    do Unicode
else if (combo is one of lang 1)
    do lang 1
else if (combo is one of lang 2)
    ... etc.
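The chain above might be sketched in Python roughly as follows. The function name `pick_codec` and the table `LEGACY_CODECS` are hypothetical, and the table holds only the two triples discussed in this thread (read the way the comments above read them); a real table would need many more rows, taken from the FreeType SFNT names reference.

```python
from typing import Optional

# Illustrative table only (an assumption, not a complete mapping):
# the two legacy triples mentioned in this thread.
LEGACY_CODECS = {
    (1, 0, 11): "shift_jis",     # Macintosh platform, Japanese language
    (3, 2, 0x411): "shift_jis",  # Microsoft platform, SJIS, Japanese (Japan)
}


def pick_codec(platform_id: int, encoding_id: int, language_id: int) -> Optional[str]:
    """Return a Python codec name for an SFNT name record, or None if unknown."""
    # Unicode combinations first: the Apple Unicode platform (0) and the
    # Microsoft Unicode encodings (1 and 10) store big-endian UTF-16.
    if platform_id == 0:
        return "utf_16_be"
    elif platform_id == 3 and encoding_id in (1, 10):
        return "utf_16_be"
    # Everything else is a localised encoding, looked up by the full triple.
    return LEGACY_CODECS.get((platform_id, encoding_id, language_id))


print(pick_codec(3, 1, 0x409))  # utf_16_be
print(pick_codec(1, 0, 11))     # shift_jis
print(pick_codec(1, 0, 0))      # None (not covered by this sketch)
```

Returning None for unknown triples leaves the caller free to keep the raw bytes instead of guessing.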

As I write for the 3rd time now, this particular font is buggy in its (3,2,0x411) name strings. Anyway, you can run "ftdump -n ..." on most fonts, and ftdump will try to decode all the strings to Unicode or hex for you. (ftdump is one of the freetype2-demo programs, written by the FreeType people to demonstrate the FreeType APIs; it is available on most Linux platforms and buildable for Windows too.) The actual decoding routine, "Print_Sfnt_Names", is about 110 lines (of about 1400 in total, so nearly 10% of it!), from about line 340 onwards, and it is not a pretty thing: a few large, nested "switch (x_id) case: ..." blocks.

Consider that even the FreeType people need 110 lines of C code to demonstrate how to decode the SFNT names, and that code converts only the utf-16-be ones and does nothing for the others. Utf-16-be is special: big-endian was the native byte order of the Apple Macs of the 1980s, when TrueType was created. That's your answer: you need to copy those 110 lines of C code, convert them to Python, and add a few lines to decode arbitrary names for arbitrary fonts, if that's your goal.

I'll write it a 4th time: this particular font is buggy (i.e. off-spec) in the names department. Don't use it for testing your code in this area.

HinTak commented on June 16, 2024

I have just been reminded in #156 that our examples directory has a Python version of ftdump: https://github.com/rougier/freetype-py/blob/master/examples/ftdump.py - you can see the piles of "if ... elif ..." in the name-decoding part.

moi15moi commented on June 16, 2024

Ok, thank you.

The font is not really "buggy", but it is a special case. With a modified version of fonttools, I can decode everything correctly.

HinTak commented on June 16, 2024

The font is buggy. The platform/encoding/language tags for the SFNT names don't reflect their encoding correctly. Maybe it is not seriously buggy, but it is buggy nonetheless. If fonttools shows every string in human-readable form (more than "ftdump -n" is able to show), then it is behaving in a friendly though off-spec (i.e. buggy) manner.

moi15moi commented on June 16, 2024

Since the FreeType and Adobe documentation states that a PostScript name should contain only ASCII characters, this seems to be a solution:

if font.postscript_name is not None:
    try:
        decoded_postscript_name = font.postscript_name.decode("ASCII")
    except UnicodeDecodeError:
        print("The font you specified contains an invalid PostScript name")

HinTak commented on June 16, 2024

BTW you can see "copyright 1998" for this particular font. Some of the specs/docs were written later.

It is a work-around: font designers / font editing software do all sorts of things until the community (font creators and font-consuming tech) reaches consensus about what is good and what to avoid, and the spec gets updated to reflect that consensus. Often old buggy fonts that are still sufficiently useful do not get updated.

I think "contains ASCII only" is a "recommendation". Many fonts were created with non-ASCII names (for non-English markets, like Japanese in this case) before it was stated to be poor practice.

HinTak commented on June 16, 2024

JeremieBergeron commented on June 16, 2024

> If it is an encoding issue, it is as I said, the correct way is in the reference

The Adobe reference doesn't say how to decode it.
It only says how to create a PostScript name.

> what exactly is your problem?

Face.postscript_name can return bytes.
It should always return a string.

It seems FreeType always returns ASCII bytes, so I think freetype-py should do this:

if font.postscript_name is not None:
    try:
        decoded_postscript_name = font.postscript_name.decode("ASCII")
    except UnicodeDecodeError:
        print("The font you specified contains an invalid PostScript name")

HinTak commented on June 16, 2024

@JeremieBergeron I have already pointed out that the correct way to interpret those bytes is shown in the ftdump.py example. The example does return a string. It is not a neat two-line answer, but it is the answer. That this particular code does not work on this particular font is because the font is buggy, as in off-spec. That the font still (partially) works (in some circumstances, for some usage) is beside the point. Some other part of the font is not buggy; that's what you are claiming, really.

HinTak commented on June 16, 2024

If you are proposing copying that 100 lines of ftdump.py as a wrapper into the core, that's debatable.

JeremieBergeron commented on June 16, 2024

Why are you talking about ftdump.py?

It does not decode the bytes:

ps_name = face.postscript_name or "UNAVAILABLE"

HinTak commented on June 16, 2024

I am not sure what you are asking here. There is an implicit conversion on print. As I have explained quite a few times, localised names are handled as in ftdump. If the font name is not ASCII, it is not ASCII. Blindly converting to ASCII seems wrong.

There is a better way of encoding localised names (and some font vendors still get it wrong). Historically, the PostScript name is whatever the font vendor put there, and it works for their intended purpose. It looks as if font vendors have put in ASCII names, localised names (for the intended locale), UTF-8 names in some recent cases, and PostScript-encoded hex in others. What it should be was specified later.

If you think the conversion to ascii should be done, it could be added on the client side...
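A client-side conversion could look something like the sketch below. The helper name `postscript_name_to_str` is hypothetical and not part of freetype-py. It assumes the name arrives as bytes or None, and it uses the `backslashreplace` error handler so that any non-ASCII bytes stay visible as escapes instead of raising or being dropped.

```python
from typing import Optional


def postscript_name_to_str(raw: Optional[bytes]) -> Optional[str]:
    """Convert a PostScript name given as bytes to str on the client side.

    Hypothetical helper. Non-ASCII bytes are kept as \\xNN escapes so an
    off-spec name is still recognisable in the result.
    """
    if raw is None:
        return None
    return raw.decode("ascii", errors="backslashreplace")


print(postscript_name_to_str(b"Fj-Ima310"))  # Fj-Ima310
print(postscript_name_to_str(b"Fj\x83C"))    # Fj\x83C
print(postscript_name_to_str(None))          # None
```

With a real face this would be called as `postscript_name_to_str(face.postscript_name)`.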

JeremieBergeron commented on June 16, 2024

I am talking about postscript_name. The FreeType documentation says: "Retrieve the ASCII PostScript name of a given face, if available. This only works with PostScript, TrueType, and OpenType fonts."

So it always returns ASCII bytes.

Of course, this wouldn't work if I were trying to directly decode a name in the os2 table, but that's totally different (also, the code in ftdump does not always pick the right encoding; see what fonttools has done).

HinTak commented on June 16, 2024

In the case of it being completely normal and ASCII, print(font.postscript_name.decode("ASCII")) and print(font.postscript_name) are not that different, visually. One might argue against converting: Python 3 strings are not single-byte representations internally, so that will surprise some other people.
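For reference, a small sketch of how the two forms behave in Python 3, using a bytes literal as a stand-in for what `face.postscript_name` may return. Printing differs only by the `b'...'` wrapper, but the bytes value does not compare equal to the corresponding str, which matters for lookups and comparisons.

```python
# Stand-in for the bytes value under discussion (the name quoted earlier).
raw_name = b"Fj-Ima310"
text_name = raw_name.decode("ascii")

print(raw_name)   # b'Fj-Ima310'
print(text_name)  # Fj-Ima310

# bytes and str never compare equal in Python 3, even for pure ASCII.
print(raw_name == "Fj-Ima310")   # False
print(text_name == "Fj-Ima310")  # True
```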
