Comments (8)

vanderlee commented on June 1, 2024

I'm currently working on some phpSyllable.
Do you have some examples I could use? A Russian text and a version of that text with dashes where the hyphens should be, so I can double-check that the parser handles the characters properly.

from phpsyllable.

Krknv commented on June 1, 2024

text generator
http://referats.yandex.ru/philosophy.xml

text hyphenator
http://iproc.ru/interesting/hyphenation/test/
http://quittance.ru/hyphenator.php

some plugins
https://github.com/kozachenko/jQuery-Russian-Hyphenation
https://code.google.com/p/hyphenator/

vanderlee commented on June 1, 2024

Thanks for the links, they helped a lot.

I don't quite know how reliable those hyphenators are, so I'm a bit wary of using them for validation.

Currently I'm getting these results from a minor test with some random words:
Де-дукт-ив-ный ме-тод ярем-но-го
I'm pretty sure none of that makes sense grammatically, since I have absolutely no idea what it means. Could you please verify the hyphens in those three words? In particular, I suspect "Дедуктивный" may be wrong.

The code is not yet ready for public release, as I discovered some oddities specific to the Russian .tex file which need further research. Mostly to do with some LaTeX-specific hacks which may cause conflicts and unexpected (non-standard)

Also (before I forget), you'll need to remove any cache files as the .tex parsing was invalid as well.
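
For readers who want to script that cleanup, here is a minimal sketch in Python (not phpSyllable code). The function name `clear_cache` and the idea that the cache is a flat directory of files are assumptions made for illustration; the thread does not say where the cache lives or how its files are named, so pass in whatever directory your setup uses.

```python
from pathlib import Path


def clear_cache(cache_dir: str) -> int:
    """Delete every regular file directly inside cache_dir.

    Hypothetical helper: the cache location is an assumption,
    not something taken from the library. Subdirectories are left alone.
    Returns the number of files removed.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

Stale cache files would otherwise keep serving results produced by the old, invalid .tex parsing.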

Krknv commented on June 1, 2024

You can check some words with a dictionary

де-дук-тив-ный
http://slovonline.ru/slovar_el_fonetic/b-4/id-28417/deduktivnyj.html

ме-тод
http://slovonline.ru/slovar_el_fonetic/b-13/id-64071/metod.html

ярем-ный -> ярем-но-го
http://slovonline.ru/slovar_el_fonetic/b-30/id-161819/yaremnyj.html


one more hyphenator
http://www.ushuaia.pl/hyphen/

some recommendations for hyphenation (in Russian)
http://raal100.narod.ru/index/0-8
http://tutrus.com/orthography/pravila-perenosa-slov
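
The dictionary forms above can be checked against the earlier test output mechanically. The following is a rough sketch in Python rather than phpSyllable code; the helper `find_mismatches` is a hypothetical name, and the only data used are the three words quoted in this thread.

```python
# Dictionary-verified forms from this thread, keyed by the unhyphenated word.
expected = {
    "дедуктивный": "де-дук-тив-ный",
    "метод": "ме-тод",
    "яремного": "ярем-но-го",
}

# Output reported earlier in the thread, lower-cased for comparison.
actual = "Де-дукт-ив-ный ме-тод ярем-но-го".lower().split()


def find_mismatches(actual_words, expected_forms):
    """Return (word, got, wanted) for each word whose hyphenation differs."""
    mismatches = []
    for got in actual_words:
        word = got.replace("-", "")
        wanted = expected_forms.get(word)
        if wanted is not None and got != wanted:
            mismatches.append((word, got, wanted))
    return mismatches


for word, got, wanted in find_mismatches(actual, expected):
    print(f"{word}: got {got}, expected {wanted}")
# prints: дедуктивный: got де-дукт-ив-ный, expected де-дук-тив-ный
```

Only "дедуктивный" differs; "метод" and "яремного" already match the dictionary.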

vanderlee commented on June 1, 2024

I think I've ironed out all the bugs. There was a dumb bug in the .tex file parsing which caused the bad hyphenation on дедуктивный. "Luckily" this also affected the other languages, so I could use those to track down the issue.

I've checked in a version which seems to handle Cyrillic characters as well as the hyphenators you gave. It should also handle any other non-Latin language. Could you please verify Russian?

Krknv commented on June 1, 2024

No errors and no hyphens

current
http://cl.ly/image/1m0n173b3n3R

old
http://cl.ly/image/1n3i0m0B442A

** The old version works a lot faster

vanderlee commented on June 1, 2024

Did you clear out the cache directory?

P.S. The new version will indeed be slower, because it has to handle multibyte UTF-8 characters instead of guaranteed single-byte ASCII. Not only are fixed-width character sets a lot easier to handle, single-byte text also allows me to use some hacks. In order to support non-Latin characters, I am forced to undo many of those tricks (though some still remain in the .tex parser). Just compare the old code (mostly the parseWord method) to see how some very fast optimizations had to be reverted.
There may still be some optimizations possible in the code, mostly in the regex functions and the choice of character encoding, but it'll never be as fast as a Latin-only version.
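
To illustrate why single-byte shortcuts stop working, here is a small sketch in Python (phpSyllable itself is PHP, and this is not its code). The helper names are hypothetical. It shows that a Cyrillic word has more bytes than characters in UTF-8, so a hyphen inserted at a byte offset can land inside a character.

```python
word = "метод"
encoded = word.encode("utf-8")

print(len(word))     # 5 characters
print(len(encoded))  # 10 bytes: each of these Cyrillic letters takes 2 bytes


def insert_hyphen_by_chars(text: str, index: int) -> str:
    """Insert a hyphen before the character at position index."""
    return text[:index] + "-" + text[index:]


def insert_hyphen_by_bytes(data: bytes, index: int) -> bytes:
    """Insert a hyphen before the byte at position index (ASCII-style shortcut)."""
    return data[:index] + b"-" + data[index:]


print(insert_hyphen_by_chars(word, 2))  # ме-тод

# Byte offset 3 falls in the middle of the second letter,
# so the result is no longer valid UTF-8.
broken = insert_hyphen_by_bytes(encoded, 3)
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("byte offset 3 splits a character")
```

With pure ASCII input, character and byte offsets coincide, which is what made the faster byte-level tricks safe.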

Krknv commented on June 1, 2024

Yep, after clearing the cache it works. Good job =)