opendatakosovo / cyrillic-transliteration
Transliterate Cyrillic script to Latin script and vice versa.
Home Page: https://pypi.python.org/pypi/cyrtranslit
License: MIT License
Couldn't you also make this concession?
As you know, it is rather common in Slavic texts to use accent marks to disambiguate between homographs, or to indicate intonation or vowel pronunciation.
https://en.wikipedia.org/wiki/Ye_with_grave and https://en.wikipedia.org/wiki/I_with_grave_(Cyrillic) are rather common in Macedonian.
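One possible way to accommodate such text, sketched here as a pre-processing step rather than as anything cyrtranslit itself does: decompose the string, drop only the combining grave and acute marks, and recompose. `strip_stress_marks` and `STRESS_MARKS` are hypothetical names. The sketch assumes the marks carry no information the caller wants to keep, and it would also strip those accents from Latin letters such as é.

```python
import unicodedata

# Combining grave (U+0300) and combining acute (U+0301).
# Assumption: these are the only marks used for stress or disambiguation.
STRESS_MARKS = {"\u0300", "\u0301"}


def strip_stress_marks(text):
    """Remove grave/acute accents while leaving letters such as й and ё intact."""
    # NFD splits precomposed letters (e.g. U+0450) into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    # NFC puts the remaining marks (breve, diaeresis) back onto their base letters.
    return unicodedata.normalize("NFC", kept)
```

The cleaned string could then be passed to `cyrtranslit.to_latin` as usual.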
Thank you for developing and sharing this great project. I wonder what the reference or historical background of the Cyrillic-Latin mappings in CyrTranslit is. Is it based on a Soviet project, or on some similar existing transliteration scheme?
Allow a .txt file to be given as input and processed through a command such as "CyrTranslit -i text.txt"
python 3.5.2
cyrtranslit-0.3
Traceback:
>>> import cyrtranslit
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[...]/lib/python3.5/site-packages/cyrtranslit/__init__.py", line 2, in <module>
from mapping import TRANSLIT_DICT
ImportError: No module named 'mapping'
string_to_transliterate = string_to_transliterate.decode('utf-8')
does not work in Python 3; it raises the error
AttributeError: 'str' object has no attribute 'decode'
Similarly, return latinized_str.encode('utf-8')
means a byte-string is returned in Python 3.
b'my latin string'
Removing the encode and decode calls in __init__.py makes the package work with Python 3.
Related to issue #6
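If the package needs to keep accepting byte strings, a guarded decode is one alternative to removing the calls outright. This is only a sketch: `ensure_text` is a hypothetical helper, not part of cyrtranslit, and it assumes that any bytes passed in are UTF-8 encoded.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        # Byte strings (Python 2 str, Python 3 bytes) still need decoding.
        return value.decode(encoding)
    # Already text: Python 3 str has no .decode(), so leave it untouched.
    return value
```

With a helper like this, the function can return the transliterated text directly instead of calling `.encode('utf-8')` on it.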
installed cyrtranslit-1.1.1
echo 'Її' | cyrtranslit -l UA
output:
Ïï
(i.e. not changed)
I received different outputs for the same inputs based on code version:
Version: 1.0
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Я часто пью водку", "ru")
"JA chasto p'ju vodku"
Version: 1.1
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Я часто пью водку", "ru")
"YA chasto p'yu vodku"
When I try to transliterate any word written in Latin that contains the letter q, the q is kept as is.
Example:
In [1]: import cyrtranslit
In [2]: cyrtranslit.to_cyrillic('Question', lang_code='ru')
Out[2]: 'Qуестион'
The letter q doesn't exist in the Russian alphabet, and in most cases q should be replaced with к.
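Until the mapping covers q, a caller could work around the q case above by substituting a Latin fallback before transliterating. The sketch below is an assumption-laden workaround, not library behaviour: `FALLBACKS` and `apply_fallbacks` are hypothetical names, and mapping q to k is only the "most cases" choice suggested in the report.

```python
# Hypothetical fallback table: Latin letters with no Cyrillic counterpart
# are rewritten to a Latin letter that the mapping does cover.
FALLBACKS = {"q": "k", "Q": "K"}


def apply_fallbacks(text, fallbacks=FALLBACKS):
    """Replace unmapped Latin letters character by character, preserving case."""
    return "".join(fallbacks.get(ch, ch) for ch in text)
```

The result can then be passed on, for example as `cyrtranslit.to_cyrillic(apply_fallbacks('Question'), lang_code='ru')`.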
Was fine before commit 26a63d6
The Mongolian mapping does not handle casing variations of two-letter Latin digraphs, e.g.:
>>> import cyrtranslit
>>> cyrtranslit.to_cyrillic("Kh", "mn")
'Х'
>>> cyrtranslit.to_cyrillic("KH", "mn")
'КH'
>>> cyrtranslit.to_cyrillic("kh", "mn")
'х'
>>> cyrtranslit.to_cyrillic("kH", "mn")
'кH'
>>> cyrtranslit.to_cyrillic("Sh", "mn")
'Ш'
>>> cyrtranslit.to_cyrillic("SH", "mn")
'СH'
>>> cyrtranslit.to_cyrillic("TS", "mn")
'ТС'
>>> cyrtranslit.to_cyrillic("Ts", "mn")
'Ц'
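The behaviour above suggests that digraphs are matched only in their exact "Kh"/"kh" spellings. A case-insensitive lookup is one way to cover the other variants. The sketch below uses a toy three-digraph table and a hypothetical function name, not the library's real Mongolian mapping, and it takes the case of the output letter from the first Latin character. It also assumes that "TS" always means Ц, which is ambiguous where Т followed by С is intended.

```python
# Toy subsets for illustration only (assumption: not the real "mn" tables).
DIGRAPHS = {"kh": "х", "sh": "ш", "ts": "ц"}
SINGLES = {"k": "к", "s": "с", "t": "т"}


def to_cyrillic_sketch(text):
    """Transliterate with digraphs matched regardless of letter case."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # The first Latin letter decides the case of the Cyrillic letter.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            out.append(ch)  # unmapped characters pass through unchanged
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```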
It seems to me that the safest code would simply combine all 3 - sr, me and mk.
The extra letters in me and mk do not conflict with any sr letters.
That way, a client can pass any Western South Slavic Cyrillic text, and be guaranteed an output.
(Likewise, in production code I would want Russian or Bulgarian letters to be handled in some way, in case they occur in text my code must process.)
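The combined table proposed above could be built by merging the per-language dictionaries and failing loudly if two of them disagree about a letter. This is a minimal sketch with illustrative toy tables: `SR_SAMPLE`, `MK_SAMPLE`, and `merge_mappings` are hypothetical, and the real structure of the library's mapping data may differ.

```python
# Illustrative subsets only (assumption: not the library's actual tables).
SR_SAMPLE = {"ђ": "đ", "ћ": "ć", "љ": "lj"}
MK_SAMPLE = {"ѓ": "ǵ", "ќ": "ḱ", "ѕ": "dz", "љ": "lj"}


def merge_mappings(*mappings):
    """Union several Cyrillic-to-Latin tables, rejecting conflicting entries."""
    merged = {}
    for mapping in mappings:
        for cyr, lat in mapping.items():
            if cyr in merged and merged[cyr] != lat:
                raise ValueError(
                    "conflicting mapping for %r: %r vs %r" % (cyr, merged[cyr], lat)
                )
            merged[cyr] = lat
    return merged


COMBINED = merge_mappings(SR_SAMPLE, MK_SAMPLE)
```

The conflict check makes the "do not conflict" claim testable: if a later table ever maps a shared letter differently, the merge raises instead of silently overriding it.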
Sorry, but you should know that "Republika Kosovo" doesn't exist. And most of all THERE IS NO "Serbian phrase 'Republika kosovo'". Please, don't spread the confusion.