camel-lab / camel-guidelines
Home Page: https://camel-guidelines.readthedocs.io/
Several issues concern the transcription of prepositions and clitics in the CODA* Guidelines. After a series of research observations, we found that these items are easier to process and write when they are separated by a space from the nouns that follow them.
We consequently came up with a new method for transcribing prepositions in Tunisian Arabic: م (+ال), من, لي, ع (+ال), على, في, بي, لي and كي.
We also propose keeping the conjunction و separate from the word that directly follows it.
I should credit Hager Ben Ammar (https://tn.linkedin.com/in/ben-ammar-hager-9b670ab7) for this proposal that worked well in practice although it is different from MSA methods.
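One way to see the processing benefit is that, once these particles are written as separate words, plain whitespace tokenization is enough to locate them. The sketch below is only an illustration under that assumption: the set `STANDALONE_PARTICLES` and the helper `find_particles` are hypothetical names, and the sample phrase in the usage note is a made-up example, not corpus data.

```python
# Particles listed in the proposal, plus the conjunction و.
# Hypothetical helper names; nothing here is defined by CODA*.
STANDALONE_PARTICLES = {"م", "من", "لي", "ع", "على", "في", "بي", "كي", "و"}


def find_particles(text):
    """Return (token index, token) for each standalone particle in text."""
    tokens = text.split()  # whitespace tokenization is sufficient here
    return [(i, tok) for i, tok in enumerate(tokens) if tok in STANDALONE_PARTICLES]
```

For instance, `find_particles("في الدار و في الجامع")` reports the tokens at positions 0, 2 and 3. With attached particles, the same lookup would need morphological analysis to avoid splitting words that merely begin with the same letters.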
The spelling of inti (meaning "you") is controversial. The CODA* Guidelines say that their spelling is motivated by the MSA guidelines. However, they transcribe this personal pronoun as إنتي.
In this particular situation, إنتِ seems to be phonologically motivated and morphologically similar to MSA guidelines.
The same applies to the past tense: كتبتِ or كتبتي.
The Phonology part lacks a significant consideration of assimilation. It should be expanded by taking into consideration what we have written at https://en.wikipedia.org/wiki/Tunisian_Arabic.
We are a group of researchers who tested the CODA guidelines, among other Arabic-script conventions, on real users from Tunisia with the contribution of Derja Association. We held three demo sessions in late 2019. Given that developing a large-scale writing convention for Arabic dialects is more important than developing a convention for Tunisian Arabic alone, we decided to share our findings with you so that they can be taken into consideration in enriching the CODA* Guidelines.
In "Unified guidelines and resources for Arabic dialect orthography", you specified this:
Alif Maqsura: The MSA rules for spelling the Alif Maqsura (ى ý), which are sometimes based on roots and sometimes on patterns, apply in CODA*.
This is not explicit as a rule. We propose to decide the transcription of Alif Maqsura in verbs according to their present-tense form.
Example:
جاء (to come) becomes جا in Tunisian Arabic. We propose to write it as جى as its present is يجي.
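The proposed rule can be sketched as a small function. This is a minimal illustration, not the CODA* rule itself: it assumes that a present form ending in ي selects ى in the past, and that any other ending selects ا. The second half of that assumption is not stated in the proposal, and the function name `spell_final_weak_past` is hypothetical.

```python
ALIF = "\u0627"          # ا
ALIF_MAQSURA = "\u0649"  # ى
YA = "\u064A"            # ي


def spell_final_weak_past(past, present):
    """Spell the final long vowel of a past-tense verb from its present form.

    Assumption (ours): present ends in ي -> write ى, otherwise write ا.
    """
    if not past or past[-1] not in (ALIF, ALIF_MAQSURA):
        return past  # the rule only concerns verbs ending in ا or ى
    final = ALIF_MAQSURA if present.endswith(YA) else ALIF
    return past[:-1] + final
```

Called with the pair from the example, `spell_final_weak_past("جا", "يجي")` yields the spelling جى.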
Several consonants in the Arabic dialects do not exist in MSA, for example [p], [g] and [v] in Tunisian. Figure 2 in "Unified Guidelines and Resources for Arabic Dialect Orthography" provides several insights about this topic. However, there is no explicit explanation of why [g] should be written as q in Tunisian Arabic and as k in Moroccan Arabic.
In MSA, there is no shaddah at the beginning of the word. However, in Arabic dialects, this exists.
The CODA* Guidelines do not seem to consider this issue. Example: مّالح (salted olives and vegetables in Tunisian).
I think that this should be included in the CODA* Guidelines.
The MSA rule is related to "العرب لا تبدأ إلا بمتحرك ولا تقف إلا على ساكن".
This is not valid for the Arabic dialects.
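Word-initial gemination is easy to detect in text that keeps the shaddah, which may help when auditing a corpus for such cases. The following is a minimal sketch: `starts_with_geminate` is a hypothetical helper, and it assumes the shaddah appears among the combining marks that directly follow the first letter.

```python
import unicodedata

SHADDAH = "\u0651"  # ARABIC SHADDA


def starts_with_geminate(word):
    """Return True if the first letter of word carries a shaddah."""
    # Other marks may sit before or after the shaddah, so scan the whole
    # run of combining marks attached to the first character.
    for ch in word[1:]:
        if unicodedata.combining(ch) == 0:
            break  # reached the second base letter
        if ch == SHADDAH:
            return True
    return False
```

With the example above, `starts_with_geminate("مّالح")` is true, while a word such as سلّم, where the shaddah sits on the second letter, is not flagged.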
Sometimes, the use of Shaddah is needed to disambiguate between lexemes:
سلّم: say hello to someone (Tunisian)
سلم: being safe (Tunisian)
I think that Shaddah should be added in such an important situation.
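The ambiguity can be shown directly: once all marks are stripped, the two lexemes become the same string, whereas a dediacritization step that keeps the shaddah preserves the contrast. This is a sketch under the assumption that diacritics are removed by dropping Unicode combining marks; `strip_marks` is a hypothetical helper, not a CODA* tool.

```python
import unicodedata

SHADDAH = "\u0651"  # ARABIC SHADDA


def strip_marks(word, keep=frozenset()):
    """Remove combining marks (harakat, shaddah, sukun) except those in keep."""
    return "".join(
        ch for ch in word
        if unicodedata.combining(ch) == 0 or ch in keep
    )


greet = "\u0633\u0644\u0651\u0645"  # سلّم, say hello to someone
safe = "\u0633\u0644\u0645"         # سلم, being safe

# Full stripping merges the two lexemes; keeping the shaddah does not.
collide = strip_marks(greet) == strip_marks(safe)
distinct = strip_marks(greet, keep={SHADDAH}) != strip_marks(safe, keep={SHADDAH})
```

The same `keep` argument could carry the sukun (U+0652) if a convention chose to retain it as well.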
A haraka can also be useful to differentiate between the article "Al-" and an Alif Madda followed by an l inside a given word:
بالْغة: Pubescent
باليمين: On the right
Adding a haraka to the l in this situation seems very helpful.
Another example where Sukun can be useful in undiacritized text is the differentiation between two types of noun phrases:
كلمة باهية, قول باهي: Good word (adjective and noun compound)
كلمةْ حق, قولْ حقيقة: True word (additional phrase)
I think that mentioning this in the guidelines would be very useful.
There is a guideline in Maghrebi CODA that has not been given enough consideration. In this brief guideline, we propose that ha-nekteb in Egyptian should be written as حنكتب and not as هنكتب, since ح has developed etymologically from رح. It would be reasonable to adopt this guideline on a large scale across Arabic dialects, as it resolves controversies.
In Tunisian, Algerian and Moroccan, there are four long vowels, including [a:] and [e:]. I have seen that CODA*, in contrast to CODA, does not take this into account.
In the Maghrebi CODA guidelines, iA is used to differentiate [a:] from [e:]. We propose to keep this convention and to use Zwarakay (U+0659) + Alef Madd to note this variant in the Arabic script. This is similar to the Pashto script (https://en.wikipedia.org/wiki/File:Harakat_pashto.svg). The main reason is that writing a kasra before Alef Madd would not look good visually.
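At the code-point level, the proposed notation could look like the sketch below. Two assumptions are ours and not stated in the proposal: "Alef Madd" is taken to be the plain Alef (U+0627) used as a long-vowel letter, and the Zwarakay is stored after the consonant it sits on, since Unicode combining marks follow their base character. The helper name `mark_long_vowel` is hypothetical.

```python
import unicodedata

ZWARAKAY = "\u0659"  # ARABIC ZWARAKAY, a combining mark
ALEF = "\u0627"      # plain Alef (our reading of "Alef Madd")


def mark_long_vowel(consonant):
    """Return consonant + Zwarakay + Alef, the proposed marked sequence."""
    return consonant + ZWARAKAY + ALEF


def describe(text):
    """List the code point and Unicode name of each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "?")) for ch in text]
```

Calling `describe(mark_long_vowel("ب"))` lists the three code points in storage order, which is a quick way to check that an input method or font produces the intended sequence.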