Comments (8)
I'm currently working on phpSyllable.
Do you have some examples I could use? A Russian text and a version of that text with dashes where the hyphens should be, so I can double-check that the parser handles the characters properly.
from phpsyllable.
text generator
http://referats.yandex.ru/philosophy.xml
text hyphenator
http://iproc.ru/interesting/hyphenation/test/
http://quittance.ru/hyphenator.php
some plugins
https://github.com/kozachenko/jQuery-Russian-Hyphenation
https://code.google.com/p/hyphenator/
from phpsyllable.
Thanks for the links, they helped a lot.
I don't quite know how reliable those hyphenators are, so I'm a bit wary of using them as validation.
Currently I'm getting these results from a minor test with some random words:
Де-дукт-ив-ный ме-тод ярем-но-го
I'm pretty sure none of that makes sense grammatically, since I have absolutely no idea what it means. Could you please verify the hyphens in those three words? In particular, I suspect "Дедуктивный" may be wrong.
The code is not yet ready for the public, as I discovered some oddities specific to the Russian .tex file which need further research. Mostly to do with some LaTeX-specific hacks which may cause conflicts and unexpected (non-standard)
Also (before I forget), you'll need to remove any cache files as the .tex parsing was invalid as well.
from phpsyllable.
You can check some words against a dictionary:
де-дук-тив-ный
http://slovonline.ru/slovar_el_fonetic/b-4/id-28417/deduktivnyj.html
ме-тод
http://slovonline.ru/slovar_el_fonetic/b-13/id-64071/metod.html
ярем-ный -> ярем-но-го
http://slovonline.ru/slovar_el_fonetic/b-30/id-161819/yaremnyj.html
one more hyphenator
http://www.ushuaia.pl/hyphen/
some hyphenation guidelines (in Russian)
http://raal100.narod.ru/index/0-8
http://tutrus.com/orthography/pravila-perenosa-slov
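A check like this can be scripted instead of compared by eye. The following is only a minimal sketch in Python, not phpSyllable code: the helper names `break_positions` and `diff` are made up for illustration, the expected values are the dictionary forms quoted above, and the actual values are the parser output reported earlier in this thread.

```python
# Reference hyphenations taken from the dictionary entries quoted in this thread.
expected = {
    "дедуктивный": "де-дук-тив-ный",
    "метод": "ме-тод",
    "яремного": "ярем-но-го",
}

# Output reported earlier in the thread by the work-in-progress parser.
actual = {
    "дедуктивный": "де-дукт-ив-ный",
    "метод": "ме-тод",
    "яремного": "ярем-но-го",
}


def break_positions(hyphenated, sep="-"):
    """Return the character offsets (in the unhyphenated word) where a break occurs."""
    positions = set()
    offset = 0
    for ch in hyphenated:
        if ch == sep:
            positions.add(offset)
        else:
            offset += 1
    return positions


def diff(expected, actual):
    """Map each mismatching word to its missing and unexpected break offsets."""
    report = {}
    for word, want in expected.items():
        want_pos = break_positions(want)
        got_pos = break_positions(actual[word])
        if want_pos != got_pos:
            report[word] = {
                "missing": sorted(want_pos - got_pos),
                "unexpected": sorted(got_pos - want_pos),
            }
    return report


if __name__ == "__main__":
    for word, problems in diff(expected, actual).items():
        print(word, problems)
```

Comparing break offsets rather than raw strings shows exactly which hyphen moved: for "дедуктивный" the break expected after "дук" was placed after "дукт" instead.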
from phpsyllable.
I think I've ironed out all the bugs. There was a dumb bug in the .tex file parsing which caused the bad hyphenation on дедуктивный. "Luckily" this also affected the other languages, so I could use those to track down the issue.
I've checked in a version which seems to handle Cyrillic characters as well as the hyphenators you gave. It should also handle any other non-Latin language. Could you please verify Russian?
from phpsyllable.
No errors and no hyphens
current
http://cl.ly/image/1m0n173b3n3R
old
http://cl.ly/image/1n3i0m0B442A
** The old version works a lot faster
from phpsyllable.
Did you clear out the cache directory?
p.s. The new version will indeed be slower, due to the need to handle multibyte UTF-8 characters instead of guaranteed single-byte ASCII. Not only are fixed-width character sets a lot easier to handle, single-byte allows me to use some hacks. In order to support non-Latin characters, I am forced to undo many of those tricks (though some still remain in the .tex parser). Just compare the old code (mostly the parseWord method) to see how some very fast optimizations had to be reverted.
There still may be some optimizations possible in the code, mostly in the regex functions and choice of character encoding, but it'll never be as fast as a Latin-only version.
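To illustrate why byte-based shortcuts stop working, here is a small sketch in Python rather than PHP. It is not the phpSyllable implementation; `split_at` and `byte_split_is_safe` are hypothetical helpers written only for this example.

```python
word = "дедуктивный"
encoded = word.encode("utf-8")

# Each Cyrillic letter takes two bytes in UTF-8, so byte offsets and
# character offsets no longer coincide as they do for ASCII.
char_count = len(word)      # number of characters
byte_count = len(encoded)   # number of bytes


def split_at(text, positions):
    """Split text at the given character offsets."""
    parts = []
    start = 0
    for pos in sorted(positions):
        parts.append(text[start:pos])
        start = pos
    parts.append(text[start:])
    return parts


def byte_split_is_safe(data, offset):
    """Return True if cutting the byte string at offset keeps both halves valid UTF-8."""
    try:
        data[:offset].decode("utf-8")
        data[offset:].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(char_count, byte_count)
    print("-".join(split_at(word, [2, 5, 8])))
    print(byte_split_is_safe(encoded, 3))  # cuts through the middle of a letter
    print(byte_split_is_safe(encoded, 4))  # falls on a character boundary
```

With single-byte text every offset is a valid split point, which is what makes the fast tricks possible. With UTF-8, a hyphenation position has to be tracked in characters (or mapped carefully to byte boundaries), and that bookkeeping costs time.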
from phpsyllable.
Yep, after clearing the cache it works. Good job =)
from phpsyllable.