
Comments (4)

synth-me commented on August 20, 2024

Thanks!

That actually solved my problem. I used your library here: https://github.com/synth-me/torch_kr, but I have not yet upgraded it with this new feature.

from jamotools.

HaebinShin commented on August 20, 2024

Hi. From what I understand, you mean it raises an error when vectorizing a number bigger than 62?
If you use the built-in vectorization function on numbers or English text, the input is encoded character by character.

>>> v = jamotools.Vectorizationer(rule=jamotools.rules.RULE_1, \
...                                   max_length=None, \
...                                   prefix_padding_size=0)
>>> print(v.vectorize("62"))
[101  97]
>>> print(v.vectorize("622"))
[101  97  97]

>>> len("Typhoon Maemi was the most powerful typhoon to strike South Korea since records began in 1904. Maemi formed on September 4, 2003 in the western Pacific and became a typhoon on September 8. Passing over the Japanese island of Miyako-jima on September 10, it left 95 percent of residents without power and caused 58.5 mm (2.30 in) of rainfall in an hour and 402.5 mm (15.85 in) in 24 hours. Maintaining much of its intensity, it made landfall west of Busan, South Korea, on September 12, where winds reached 154 km/h (96 mph). The port sustained heavy damage, restricting exports for months. On Jeju Island, it produced a peak wind gust of 216 km/h (134 mph) and a minimum pressure of 950 mbar (28 inHg), both records for the country. Nationwide, high winds destroyed about 5,000 houses and damaged 13,000 buildings, leaving 25,000 people homeless. Crop damage resulted in the poorest rice harvest in 23 years. Across South Korea, Maemi killed 117 people, and damage totaled 5.52 trillion won (US$4.8 billion).")
1008
>>> len(v.vectorize("Typhoon Maemi was the most powerful typhoon to strike South Korea since records began in 1904. Maemi formed on September 4, 2003 in the western Pacific and became a typhoon on September 8. Passing over the Japanese island of Miyako-jima on September 10, it left 95 percent of residents without power and caused 58.5 mm (2.30 in) of rainfall in an hour and 402.5 mm (15.85 in) in 24 hours. Maintaining much of its intensity, it made landfall west of Busan, South Korea, on September 12, where winds reached 154 km/h (96 mph). The port sustained heavy damage, restricting exports for months. On Jeju Island, it produced a peak wind gust of 216 km/h (134 mph) and a minimum pressure of 950 mbar (28 inHg), both records for the country. Nationwide, high winds destroyed about 5,000 houses and damaged 13,000 buildings, leaving 25,000 people homeless. Crop damage resulted in the poorest rice harvest in 23 years. Across South Korea, Maemi killed 117 people, and damage totaled 5.52 trillion won (US$4.8 billion)."))
1008

Could you give me more information, along with your code?
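
The character-by-character encoding described in this reply can be sketched without the library. This is a minimal sketch, assuming only the letter and digit indices shown in the symbol_map printed elsewhere in this thread. `vectorize_chars` and `CHAR_TO_INDEX` are hypothetical names, not part of jamotools, and the fallback to `<UNK>` for unmapped characters is an assumption.

```python
# Sketch only: a hand-built slice of the table, not the jamotools implementation.
import string

PAD, UNK = 0, 1  # indices of '<PAD>' and '<UNK>' in the printed symbol_map

# Lowercase letters occupy 69..94 and digits 95..104 in the printed table.
CHAR_TO_INDEX = {c: 69 + i for i, c in enumerate(string.ascii_lowercase)}
CHAR_TO_INDEX.update({d: 95 + i for i, d in enumerate(string.digits)})


def vectorize_chars(text):
    """Hypothetical helper: one index per character, UNK for anything unmapped."""
    return [CHAR_TO_INDEX.get(ch, UNK) for ch in text]


print(vectorize_chars("62"))   # [101, 97]
print(vectorize_chars("622"))  # [101, 97, 97]
```

Because each digit is looked up on its own, "62" is never treated as the number 62, and the output length equals the input length for this kind of text.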


synth-me commented on August 20, 2024

I think I did not make myself clear. Let's say we have "가" as input and [12,21] as the result. My question is: what is the highest value a single jamo can have? Or is there no such limit? I may simply be wrong.


HaebinShin commented on August 20, 2024

Sorry for the late reply. You can check the mapping table through the symbol_map property of jamotools.Vectorizationer.

>>> v = jamotools.Vectorizationer(rule=jamotools.rules.RULE_1, max_length=None, prefix_padding_size=0)
>>> v.symbol_map
{'<PAD>': 0, '<UNK>': 1, 'ᄀ': 2, 'ᄁ': 3, 'ᄂ': 4, 'ᄃ': 5, 'ᄄ': 6, 'ᄅ': 7, 'ᄆ': 8, 'ᄇ': 9, 'ᄈ': 10, 'ᄉ': 11, 'ᄊ': 12, 'ᄋ': 13, 'ᄌ': 14, 'ᄍ': 15, 'ᄎ': 16, 'ᄏ': 17, 'ᄐ': 18, 'ᄑ': 19, 'ᄒ': 20, 'ᅡ': 21, 'ᅢ': 22, 'ᅣ': 23, 'ᅤ': 24, 'ᅥ': 25, 'ᅦ': 26, 'ᅧ': 27, 'ᅨ': 28, 'ᅩ': 29, 'ᅪ': 30, 'ᅫ': 31, 'ᅬ': 32, 'ᅭ': 33, 'ᅮ': 34, 'ᅯ': 35, 'ᅰ': 36, 'ᅱ': 37, 'ᅲ': 38, 'ᅳ': 39, 'ᅴ': 40, 'ᅵ': 41, 'ᆨ': 42, 'ᆩ': 43, 'ᆪ': 44, 'ᆫ': 45, 'ᆬ': 46, 'ᆭ': 47, 'ᆮ': 48, 'ᆯ': 49, 'ᆰ': 50, 'ᆱ': 51, 'ᆲ': 52, 'ᆳ': 53, 'ᆴ': 54, 'ᆵ': 55, 'ᆶ': 56, 'ᆷ': 57, 'ᆸ': 58, 'ᆹ': 59, 'ᆺ': 60, 'ᆻ': 61, 'ᆼ': 62, 'ᆽ': 63, 'ᆾ': 64, 'ᆿ': 65, 'ᇀ': 66, 'ᇁ': 67, 'ᇂ': 68, 'a': 69, 'b': 70, 'c': 71, 'd': 72, 'e': 73, 'f': 74, 'g': 75, 'h': 76, 'i': 77, 'j': 78, 'k': 79, 'l': 80, 'm': 81, 'n': 82, 'o': 83, 'p': 84, 'q': 85, 'r': 86, 's': 87, 't': 88, 'u': 89, 'v': 90, 'w': 91, 'x': 92, 'y': 93, 'z': 94, '0': 95, '1': 96, '2': 97, '3': 98, '4': 99, '5': 100, '6': 101, '7': 102, '8': 103, '9': 104, ' ': 105, '.': 106, ',': 107, '/': 108, '(': 109, ')': 110, '"': 111, '*': 112, ':': 113, '-': 114, '%': 115}

So, for the default setting, the highest value for a single jamo is 68.
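
The value 68 can be cross-checked without reading the table by eye. This is a rough sketch, assuming the jamo entries follow Unicode code point order after the two special symbols, which matches the table printed above. `build_jamo_map` and `max_jamo_index` are hypothetical helpers, not part of jamotools.

```python
# Sketch only: rebuilds the jamo portion of the printed table from Unicode
# ranges; it does not call jamotools.
SPECIALS = ["<PAD>", "<UNK>"]
CHOSEONG = [chr(cp) for cp in range(0x1100, 0x1113)]   # 19 leading consonants
JUNGSEONG = [chr(cp) for cp in range(0x1161, 0x1176)]  # 21 vowels
JONGSEONG = [chr(cp) for cp in range(0x11A8, 0x11C3)]  # 27 trailing consonants


def build_jamo_map():
    """Hypothetical helper: index symbols in the order the table lists them."""
    symbols = SPECIALS + CHOSEONG + JUNGSEONG + JONGSEONG
    return {symbol: index for index, symbol in enumerate(symbols)}


def is_jamo(symbol):
    """True for a single character in the Hangul Jamo block (U+1100..U+11FF)."""
    return len(symbol) == 1 and 0x1100 <= ord(symbol) <= 0x11FF


def max_jamo_index(symbol_map):
    """Largest index assigned to any jamo key in the given mapping."""
    return max(index for symbol, index in symbol_map.items() if is_jamo(symbol))


symbol_map = build_jamo_map()
print(symbol_map["\u1100"], symbol_map["\u1161"])  # 2 21
print(max_jamo_index(symbol_map))                  # 68
```

With the library installed, passing `v.symbol_map` to `max_jamo_index` should give the same kind of answer for whichever rule is configured, since the filter only looks at keys in the Hangul Jamo block.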

