
Comments (7)

santhoshtr commented on May 30, 2024

On Friday, December 6, 2013, Rafael Xavier de Souza wrote:

Hi Santhosh,

Timo (@Krinkle https://github.com/krinkle) has commented about your
library on our jQuery (cc @jzaefferer https://github.com/jzaefferer
@scottgonzalez https://github.com/scottgonzalez) recent talk about
Globalize / CLDR.

Some general comments I have about it follow below.

PS: Please, forgive me for opening an issue to talk about it. If you
prefer to have this conversation using another channel, just let me know.

Thanks for writing, no issues in using this channel :)

In brief, first I would like to understand some architectural decisions
made here, and second to point out suggestions and a possible cooperation.

1. CLDRPluralRuleParser

1.1. LDML vs JSON

I've noticed demos are extracting the plural information from the LDML
file. Any reason for not using the official CLDR JSON instead, and avoiding
this https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/demo/demo.js#L37-L63 ?

That demo was written before CLDR started shipping json formatted data. No
issues in using json format as input too.

The CLDRPluralRuleParser has a small js parser for the LDML to create a
json format that is optimized for our usecases too. See
https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/tools/PluralXML2JSON.html
Sample output:
http://thottingal.in/projects/js/plural/tools/PluralXML2JSON.html

Optimizations:

  1. Avoid samples in the rules. CLDR 24 has number samples using @integer
    and @decimal inside the rules themselves. For processing rules we are
    not interested in them, and they take storage space, so we strip them.
  2. CLDR contains a lot of languages whose plural rules are the same as
    those of the usual fallback language, English. So we don't put them in
    the output json.
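The two optimizations above can be sketched roughly as follows. This is a minimal illustration, not the code of PluralXML2JSON.html: the function names (`stripSamples`, `sameRules`, `compactPluralData`) and the input shape (a map of language to category to rule string) are assumptions made for the example, and keeping the fallback language itself in the output is also an assumption.

```javascript
// Hypothetical helpers; the input shape is language -> category -> rule string.

// Drop the "@integer ..." / "@decimal ..." sample section from one rule.
function stripSamples(rule) {
  var at = rule.indexOf("@");
  return (at === -1 ? rule : rule.slice(0, at)).trim();
}

// True when two category -> rule maps have the same keys and the same rules.
function sameRules(a, b) {
  var keysA = Object.keys(a).sort();
  var keysB = Object.keys(b).sort();
  if (keysA.length !== keysB.length) { return false; }
  return keysA.every(function (key, i) {
    return key === keysB[i] && a[key] === b[key];
  });
}

// Strip samples everywhere, then omit every language whose rules equal the
// fallback language's rules. The fallback itself is kept (an assumption) so
// that a lookup for an omitted language has something to fall back to.
function compactPluralData(data, fallbackLanguage) {
  var stripped = {};
  Object.keys(data).forEach(function (language) {
    stripped[language] = {};
    Object.keys(data[language]).forEach(function (category) {
      stripped[language][category] = stripSamples(data[language][category]);
    });
  });

  var fallback = stripped[fallbackLanguage];
  var output = {};
  Object.keys(stripped).forEach(function (language) {
    if (language === fallbackLanguage || !sameRules(stripped[language], fallback)) {
      output[language] = stripped[language];
    }
  });
  return output;
}
```

A consumer of such output would look up the requested language and, when it is absent, use the fallback language's rules instead.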

The jquery.i18n library https://github.com/wikimedia/jquery.i18n which
Wikimedia uses for front-end internationalization uses this custom
formatted plural json output along with CLDRPluralRuleParser for plural
rule calculation. See
https://github.com/wikimedia/jquery.i18n/blob/master/src/jquery.i18n.language.js

Suggestion

Unicode officially distributes CLDR in the JSON format
[1 http://cldr.unicode.org/index/cldr-spec/json]
[2 https://github.com/rxaviers/cldr#how-to-get-cldr-json-data]. Using
JSON in javascript is more straightforward, requiring simpler algorithms to
manipulate it.

You could use this lightweight library https://github.com/rxaviers/cldr
to help with the JSON CLDR manipulation, which is what we use in jQuery
Globalize.

This whole block https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/demo/demo.js#L37-L63
would become this:

    var locale = new Cldr( localeStr );
    var rules = locale.supplemental("plurals-type-cardinal/{language}");
    /**
     * Example for "en":
     * {
     *   "pluralRule-count-one": "i = 1 and v = 0 @integer 1",
     *   "pluralRule-count-other": " @integer 0, 2~16, 100, 1000, 10000, 100000, 1000000, … @decimal 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …"
     * }
     */

1.2. API

I've noticed the API requires user (I'm naming user the code that uses
this library) to pass the rule as an argument on user-code-level:
https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/demo/demo.js#L45

Suggestions

First of all, congrats for making the library succinct, so all it does is
what it's supposed to do: handle plural CLDR rules. However, this
https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/demo/demo.js#L37-L63
is still cumbersome boilerplate code to reside in the user code.

Usually, user needs to know what's the cardinal form given a locale, and a
number. Something like this:

plural("en", 1) -> one
plural("en", 3) -> other

If you consider using CLDR in the JSON form, and using cldr.js
https://github.com/rxaviers/cldr, it would be a piece of cake to offer an
API like the above with very little extra code
https://gist.github.com/rxaviers/7828731.
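To make the proposed `plural("en", 1) -> one` shape concrete, here is a rough, self-contained sketch. It is not the code from the gist: the `cardinalRules` table is hand-written in the shape of the "en" example quoted earlier (samples removed), and `evaluateToyRule` is a deliberately tiny stand-in for a real TR35 rule parser such as CLDRPluralRuleParser. All function names here are hypothetical.

```javascript
// Hand-written data in the shape of the CLDR JSON example for "en".
// In real use this would be loaded from the CLDR JSON plurals data.
var cardinalRules = {
  en: {
    "pluralRule-count-one": "i = 1 and v = 0",
    "pluralRule-count-other": ""
  }
};

// Plural operands as defined in TR35: i = integer digits of the number,
// v = number of visible fraction digits (trailing zeros included).
// Pass a string such as "1.0" to preserve trailing zeros.
function operands(number) {
  var parts = String(number).replace(/^-/, "").split(".");
  return { i: parseInt(parts[0], 10), v: parts[1] ? parts[1].length : 0 };
}

// Toy evaluator: understands only "operand = integer" terms joined by "and".
// A full parser (ranges, "or", "%", "!=") would be plugged in instead.
function evaluateToyRule(rule, number) {
  if (rule.trim() === "") { return true; }
  var ops = operands(number);
  return rule.split(/\s+and\s+/).every(function (term) {
    var match = /^\s*([iv])\s*=\s*(\d+)\s*$/.exec(term);
    if (!match) { throw new Error("Unsupported term: " + term); }
    return ops[match[1]] === parseInt(match[2], 10);
  });
}

// Return the first category whose rule matches; "other" is the default.
function plural(language, number, evaluate) {
  var rules = cardinalRules[language];
  if (!rules) { throw new Error("No plural rules loaded for " + language); }
  var prefix = "pluralRule-count-";
  var check = evaluate || evaluateToyRule;
  var keys = Object.keys(rules);
  for (var k = 0; k < keys.length; k++) {
    var category = keys[k].slice(prefix.length);
    if (category !== "other" && check(rules[keys[k]], number)) {
      return category;
    }
  }
  return "other";
}

console.log(plural("en", 1)); // one
console.log(plural("en", 3)); // other
```

The optional third argument is where a real rule evaluator could be injected, which keeps the lookup logic separate from both the data loading and the rule parsing.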

Yes, your perspective about this is correct. I must explain why the parser
is not providing this abstraction at this point in time. That will also
help us figure out a solution that satisfies both our use case and the
general use cases you pointed out.

In MediaWiki https://www.mediawiki.org, we used to have custom handwritten
plural calculation rules
https://github.com/wikimedia/mediawiki-core/commit/bbbcf089dbbb84dcd92e2332d2cf7222d7140647#diff-d92658e2bddb028407444a76eb1e5797L78
for all languages we support, in PHP and JavaScript, for server side and
client side respectively. In 2011 we decided to choose a data-driven
approach and avoid handwritten logic for each language. We wanted to use
CLDR data, with parsers/tools on top of this data, to do plural
calculation (plural is just one example; number formats, language names
etc. are there too). But we immediately realized the difference between
CLDR data and the plural calculation code that existed in MediaWiki:
sometimes MediaWiki has plural rules that are not defined in CLDR yet, and
sometimes MediaWiki has plural rules that do not match what CLDR defines.
So we had to apply an overriding and extending mechanism on top of
plural.xml.
See https://github.com/wikimedia/mediawiki-core/tree/master/languages/data
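As a rough illustration of what "overriding and extending" means here, the sketch below layers a local rule set on top of a CLDR-derived one. It is not MediaWiki's actual mechanism (which works on XML files on the server); the function name `applyLocalRules`, the whole-language replacement strategy, and the placeholder language codes and rules are all assumptions made for the example.

```javascript
// Layer local plural rules over CLDR rules, language by language.
// A language present in both sets is overridden; a language present only
// in the local set extends the result. Inputs are left unmodified.
function applyLocalRules(cldrRules, localRules) {
  var merged = {};
  Object.keys(cldrRules).forEach(function (language) {
    merged[language] = cldrRules[language];
  });
  Object.keys(localRules).forEach(function (language) {
    merged[language] = localRules[language];
  });
  return merged;
}

// Placeholder codes and rules, invented purely for illustration.
var fromCldr = {
  aa: { one: "n = 1" },
  bb: { one: "n = 1" }
};
var local = {
  bb: { one: "n = 1", two: "n = 2" }, // override: differs from CLDR
  cc: { one: "n = 0..1" }             // extension: not in CLDR yet
};

var effective = applyLocalRules(fromCldr, local);
console.log(Object.keys(effective)); // [ 'aa', 'bb', 'cc' ]
```

Once a CLDR release resolves a difference, the corresponding entry can simply be removed from the local set and the CLDR rule takes over again.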

In the past 2 years, the difference has become small, but it has not gone away.

So, with this XML data at the server, and cached, for a given language, our
PHP plural parser https://github.com/wikimedia/mediawiki-core/blob/master/languages/utils/CLDRPluralRuleEvaluator.php
and JavaScript plural parser
https://github.com/wikimedia/mediawiki-core/blob/master/resources/mediawiki.libs/CLDRPluralRuleParser.js
evaluate the plural rules for the context language. I wrote
the CLDRPluralRuleParser.js.

Since the data exists on the server as xml, and our client-side resource
loading mechanism https://www.mediawiki.org/wiki/ResourceLoader pushes the
specific plural rules for a given language, we did not keep duplicate
copies of the plural rules, in either LDML or json format.

Since the CLDRPluralRuleParser.js was written with this specific use case
in mind, it does not bundle the plural rules and does not provide an
abstraction like plural("en", 1) -> one. In its current form it acts as a
parser alone, but I provided an example of using LDML.

For our use case (MediaWiki/Wikipedia), mediawiki.cldr.js
https://github.com/wikimedia/mediawiki-core/blob/master/resources/mediawiki.language/mediawiki.cldr.js
provides this abstraction. jquery.i18n provides the same.

So I hope I explained the reasons behind the current architecture. But I am
very happy to improve and make the library more general. Discussions like
this are in this direction.

Question is whether we need to keep this library as a parser alone for
plural rules and provide evaluations in libraries like globalize.js or
jquery.i18n. Or whether we need to have plural rules bundled with the help
of cldr.js and provide the abstraction.

I must add that Wikimedia/MediaWiki has another customization on top of the
CLDR plural forms, named explicit plural forms. E.g.: There {{PLURAL:$1|is
one egg|are $1 eggs|12=is a dozen eggs}} in the basket. For 12, this string
gets transformed to "There is a dozen eggs in the basket" by our i18n
libraries. Here "12=is a dozen" is an explicit plural form; it is outside
the one, two, few, many etc. plural forms given by CLDR.
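A minimal sketch of how explicit plural forms can take precedence over the CLDR categories is shown below. It is not the MediaWiki or jquery.i18n implementation; `choosePluralForm` is a hypothetical name, and the caller is assumed to supply `categoryIndex`, the position of the CLDR category among the regular forms (for English: one is 0, other is 1).

```javascript
// Pick a form for `count` from the list of PLURAL arguments.
// Forms written as "N=text" are explicit: they win when N equals count.
// All remaining forms are regular and are selected by categoryIndex.
function choosePluralForm(count, forms, categoryIndex) {
  var regular = [];
  for (var f = 0; f < forms.length; f++) {
    var match = /^(\d+)=([\s\S]*)$/.exec(forms[f]);
    if (match) {
      if (parseInt(match[1], 10) === count) {
        return match[2];
      }
    } else {
      regular.push(forms[f]);
    }
  }
  if (regular.length === 0) { return ""; }
  // Fall back to the last regular form when fewer forms were supplied.
  return regular[Math.min(categoryIndex, regular.length - 1)];
}

var forms = ["is one egg", "are $1 eggs", "12=is a dozen eggs"];

function basket(count, categoryIndex) {
  var form = choosePluralForm(count, forms, categoryIndex).replace("$1", String(count));
  return "There " + form + " in the basket";
}

console.log(basket(1, 0));  // There is one egg in the basket
console.log(basket(5, 1));  // There are 5 eggs in the basket
console.log(basket(12, 1)); // There is a dozen eggs in the basket
```

Note that the explicit form is checked before any CLDR rule evaluation matters, which is why it sits outside the one/two/few/many categories.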

1.3. UMD

If it works in browsers, why not support AMD as well in your UMD
wrapper https://github.com/santhoshtr/CLDRPluralRuleParser/blob/master/src/CLDRPluralRuleParser.js#L305-L307 ?

Suggestions

https://github.com/umdjs/umd/blob/master/returnExports.js

Thanks for pointing out. I will be happy to do this.
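For reference, a wrapper in the style of the returnExports pattern linked above could look roughly like this. It is a sketch, not the change that was later made to the library: the global name `pluralParserSketch` and the placeholder factory body are invented for the example.

```javascript
(function (root, factory) {
  if (typeof define === "function" && define.amd) {
    // AMD loader (e.g. RequireJS): register as an anonymous module.
    define([], factory);
  } else if (typeof module === "object" && module.exports) {
    // CommonJS-like environments such as Node.js.
    module.exports = factory();
  } else {
    // Plain browser script: attach to the global object.
    root.pluralParserSketch = factory();
  }
}(typeof globalThis !== "undefined" ? globalThis : this, function () {
  // Placeholder body; the real parser implementation would live here.
  return {
    describe: function () {
      return "plural parser placeholder";
    }
  };
}));
```

The same factory runs in all three branches, so only the way the result is published differs between environments.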

2. Cooperation

Globalize https://github.com/jquery/globalize/ and moment.js
https://github.com/moment/moment (cc @ichernev https://github.com/ichernev)
are working on the migration to CLDR. To make a long story short, moment.js
needs plural support in parts of its code (e.g.
https://github.com/moment/moment/blob/develop/moment.js#L713-L726).
Although I have not consulted either party, perhaps you both could
cooperate.

Timo (@Krinkle https://github.com/krinkle) told me about this, and I am
happy to help. We have a scheduled meeting sometime this month.

Feel free to ask if anything in the above reply was not clear!

from cldrpluralruleparser.

rxaviers commented on May 30, 2024

Is CLDRPluralRuleParser of any use for non-CLDR plural rules? I didn't understand whether your plan is to support a custom variation of CLDR rules, or to implement CLDR specification strictly (while allowing rules to be optimized)?

Have you considered filing bugs on CLDR bug tracker for the cases where there are differences between your former custom data vs. CLDR data? One or another may be wrong.

If the plan is to implement a custom variation of CLDR, I have no more to say. Otherwise, please read on.

The foundation behind cldr.js and Globalize 1.0.0 is precisely not to bundle i18n data within the code. Maintaining such bundled data is what we're avoiding by "outsourcing" it to CLDR. Instead, we're helping them by reporting any problems we find in their data or in their spec documentation.

Our goal is to provide a set of tools that leverage the official CLDR JSON data; allow users to load as much or as little data as they need; avoid duplicating data if using multiple i18n libraries that leverage CLDR; run in browsers and node.js.

If you share the same interests as we do above, I do think we can converge efforts.


santhoshtr commented on May 30, 2024

Is CLDRPluralRuleParser of any use for non-CLDR plural rules? I didn't understand whether your plan is to support a custom variation of CLDR rules, or to implement CLDR specification strictly (while allowing rules to be optimized)?

We strictly follow TR35 specification for plural rules for numbers. (http://www.unicode.org/reports/tr35/tr35-33/tr35-numbers.html#Language_Plural_Rules)

When we override or extend the CLDR data set, we write them as per the tr35 syntax.

To give a better idea, please see our enhancement(override or extend) of CLDR 23 data https://github.com/wikimedia/mediawiki-core/blob/master/languages/data/plurals-mediawiki.xml

Have you considered filing bugs on CLDR bug tracker for the cases where there are differences between your former custom data vs. CLDR data? One or another may be wrong.

Of course, we work with language communities to resolve the differences in CLDR and our data set. If you look at our custom data, the difference is small compared to 400+ languages we support. And thanks to CLDR and language communities, this difference is decreasing with every CLDR release. (Wikimedia is a liaison member in Unicode consortium)

You will see Hebrew override in the above link, but in CLDR 24, this was resolved and we are going to remove that override and we can use CLDR data for Hebrew.

If the plan is to implement a custom variation of CLDR, I have no more to say. Otherwise, please read on.

Not at all. We want to strictly follow CLDR while at the same time respecting community consensus about the language rules; in most cases an override only lasts until a new version of the CLDR data set is released. But because of the large number of languages we operate in, there will be one more language that is not yet in the CLDR data set :)

If you share the same interests as we do above, I do think we can converge efforts.

Do you still have doubt? :)


rxaviers commented on May 30, 2024

Awesome. So, getting back to the question about adopting cldr.js...

Question is whether we need to keep this library as a parser alone for plural rules and provide evaluations in libraries like globalize.js or jquery.i18n. Or whether we need to have plural rules bundled with the help of cldr.js and provide the abstraction.

Cldr.js is unopinionated about how data should be loaded. It's a decision deferred to the user (I'm naming user the code that uses cldr.js library). Cldr.js simply takes care of the cldr manipulation. So, if you pick cldr.js, you are still able to postpone the how-to-load-the-data decision to the user.

The benefit is that you don't require user to parse / manipulate the data by himself to provide your library the rules. You avoid that boilerplate code (we've talked about) in user level function, while you still allow him to load the data the way he wants, ie: by using mediawiki's Resource Loader, or by using AMD (example), or by using $.ajax, or even by embedding the data into the code like it's being done in https://github.com/wikimedia/jquery.i18n/blob/master/src/jquery.i18n.language.js (<- although, I do not suggest that for obvious reasons of maintainability).

Picking cldr.js still leaves your library as a parser alone for plural.

What do you think?


santhoshtr commented on May 30, 2024

All right, so I added plurals.json from the json.zip of CLDR 24, and used that for the demo instead of plurals.xml.
See f4ae51a

I added another demo using cldr.js in commit 87dbfa5 (That was very easy!) and it is running at
http://thottingal.in/projects/js/plural/demo/cldrjs.html

So if a library like moment.js (or any piece of js) wants to use plural rules, it should be able to use cldr.js along with CLDRPluralRuleParser. Do you see anything missing from our side to make this smooth?


rxaviers commented on May 30, 2024

@santhoshtr thanks for taking this on. I am also very happy to hear that using cldr.js made it easier.


rxaviers commented on May 30, 2024

This has been addressed, closing...

