cldr-number-pm5's People

Contributors

mnlagrasta, oalders, patch

cldr-number-pm5's Issues

load locale data for each locale from a different module

Suggested by @aarondcohen:

[13:43] Aaron Cohen: as an added benefit, you could break CLDR::Number::Data::* up by locale
[13:43] Aaron Cohen: so people would load less into memory if they aren't using the other locales

Although the number-related locale data is relatively small per locale, the aggregate is increasingly large with each CLDR release. Another idea is to remove any data from a locale that is the same as what would already be inherited.

add FAQ about fallback for nonexistent locales

Users occasionally report that the wrong formatting is used for several nonexistent locales, including Mexican English (en-MX) and Brazilian Spanish (es-BR). We should document that, because these locales don’t exist in the CLDR, they fall back to English (en) and Spanish (es), respectively.

Note that I also plan to raise the issue with the CLDR Technical Committee that es-XX, where XX is any country within Latin America (419), should fall back to es-419 even if es-XX is not a valid locale. This would, however, require a new structure in the LDML spec, unless a locale were created for each combination of es with each remaining country within 419.
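
To make the documented behavior concrete, here is a minimal sketch of truncation fallback. It is not the code CLDR::Number uses; the `fallback_chain` helper and the tiny `%known` set are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample of locales that "exist"; real CLDR data is far larger.
my %known = map { $_ => 1 } qw(en es es-419 es-MX);

# Build the fallback chain by dropping the last subtag until nothing is left,
# keeping only the locales that exist, and always ending at root.
sub fallback_chain {
    my ($locale) = @_;
    my @chain;
    while (length $locale) {
        push @chain, $locale if $known{$locale};
        last unless $locale =~ s/-[^-]+\z//;    # drop the last subtag
    }
    push @chain, 'root';
    return @chain;
}

print join(' -> ', fallback_chain('en-MX')), "\n";
print join(' -> ', fallback_chain('es-BR')), "\n";
```

Because en-MX and es-BR are absent from the sample set, their chains start at en and es, which is the behavior users have been reporting as surprising.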

Moo::Role-related bug in Perl 5.8.1 through 5.8.3

Most releases of CLDR::Number have had inconsistent but common Moo::Role-related test failures in Perl v5.8.1 through v5.8.3. The oldest version of Perl that has not been known to have this problem is v5.8.4, although there are very few reports on that version.

We should either figure out the problem and fix it, or raise the minimum version of Perl from v5.8.1 (September 2003) to v5.8.4 (April 2004), which I would not be against.

Test reports:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.19

Typical output:

Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Can't locate object method "is_role" via package "Moo::Role" at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
BEGIN failed--compilation aborted at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number/Role/Format.pm line 13.
Compilation failed in require at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Module/Runtime.pm line 313.
Compilation failed in require at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number.pm line 32.

support different rounding modes

As per the CLDR spec (see below), default rounding is half-even. There is no current way to change the rounding mode. We use Math::BigFloat, which supports the following modes: even, odd, +inf, -inf, zero, trunc, common. Let's add a rounding_mode attribute and decide if we should use the same modes and names as Math::BigFloat.

An implementation may allow the specification of a rounding mode to determine how values are rounded. In the absence of such choices, the default is to round "half-even", as described in IEEE arithmetic. That is, it rounds towards the "nearest neighbor" unless both neighbors are equidistant, in which case, it rounds towards the even neighbor. Behaves as for round "half-up" if the digit to the left of the discarded fraction is odd; behaves as for round "half-down" if it's even. Note that this is the rounding mode that minimizes cumulative error when applied repeatedly over a sequence of calculations.
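
As a rough sketch of how a rounding_mode attribute could map onto Math::BigFloat, the snippet below passes the mode name straight through to `bfround`. The `round_to_cents` helper is hypothetical, and values are given as strings so that binary floating point does not disturb the ties.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: round a decimal string to two fraction digits
# using one of the Math::BigFloat mode names listed above.
sub round_to_cents {
    my ($value, $mode) = @_;
    return Math::BigFloat->new($value)->bfround(-2, $mode)->bstr;
}

for my $mode (qw(even common trunc)) {
    printf "%-6s 0.125 => %s, 0.135 => %s\n",
        $mode,
        round_to_cents('0.125', $mode),
        round_to_cents('0.135', $mode);
}
```

Under half-even, the tie 0.125 goes to the even neighbor 0.12 while 0.135 goes to 0.14; under common, both ties round away from zero.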

use Math::BigFloat as much as possible

We’re already using Math::BigFloat in most situations for rounding using the round_mode and ffround methods. Let’s continue to use it for any functionality we can, replacing existing code in CLDR::Number: is_nan, is_inf, is_pos, is_neg, etc.

locales should inherit from defined parent locales when available

Right now, the inheritance works like zh-Hant-MO → zh-Hant → zh → root, but Part 1 Core §4.1.1 Parent Locales defines exceptions in the LDML for different parents.

For example:

 <parentLocale parent="zh_Hant_HK" locales="zh_Hant_MO"/>

This would modify the inheritance to zh-Hant-MO → zh-Hant-HK → zh-Hant → zh → root.

Others are defined with a parent of root to skip normal steps altogether. The most notable problem with the current inheritance is that es-US (US Spanish), es-MX (Mexican Spanish), es-CR (Costa Rican Spanish), etc., inherit directly from es (European Spanish) instead of es-419 (Latin American Spanish).
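
One possible shape for this, sketched with a hand-written `%parent_of` table that contains only the examples mentioned in this issue (the real list would come from the CLDR supplemental data). The `inheritance_chain` name is an assumption, not an existing method.

```perl
use strict;
use warnings;

# Illustrative subset of explicit parents, taken from the examples above.
my %parent_of = (
    'zh-Hant-MO' => 'zh-Hant-HK',
    'es-MX'      => 'es-419',
    'es-US'      => 'es-419',
    'es-CR'      => 'es-419',
);

# Walk up from a locale to root, preferring an explicit parent over
# plain subtag truncation whenever one is defined.
sub inheritance_chain {
    my ($locale) = @_;
    my @chain = ($locale);
    while ($locale ne 'root') {
        if (exists $parent_of{$locale}) {
            $locale = $parent_of{$locale};     # explicit parent wins
        }
        elsif ($locale =~ /-/) {
            $locale =~ s/-[^-]+\z//;           # drop the last subtag
        }
        else {
            $locale = 'root';                  # nothing left to truncate
        }
        push @chain, $locale;
    }
    return @chain;
}

print join(' -> ', inheritance_chain('zh-Hant-MO')), "\n";
print join(' -> ', inheritance_chain('es-MX')), "\n";
```

With this table, es-MX passes through es-419 before reaching es, and a locale without an entry keeps the current truncation behavior.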

improve docs for a broader audience

tl;dr: Let’s improve the docs! Please add doc requests or suggestions in the comments here.

The first goal of this project was to implement the standardized Unicode CLDR–based localized number formatting defined in UTS #35, Part 3: Numbers. Much of the CLDR::Number documentation, however, does not go into detail to describe functionality to developers without existing familiarity with the CLDR. This project shouldn’t require external knowledge in order to use it. One problem is that it allows for a lot of advanced customization that most developers will never need to use or know about when they can instead depend on the defaults provided for the requested locale (and currency for prices). Perhaps the docs should be split into 100% self-contained intro-level with more examples, and advanced-level with all the gritty options and external references. These days I write much more documentation for developers than actual coding, and while I have less time for maintaining my CPAN modules, I’d like to commit some time to improve these docs.

Thanks to @Ovid for bringing this to my attention.

maximum integer digits

Implement the functionality supplied by the maximum_integer_digits attribute, which already exists as a stub. There doesn’t appear to be a symbol associated with this.

UTS #35, Part 3, §3.3:

If the number of actual integer digits exceeds the maximum integer digits, then only the least significant digits are shown. For example, 1997 is formatted as 97 if the maximum integer digits is set to 2.
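
The quoted rule amounts to keeping the rightmost digits of the integer part. A minimal sketch, assuming the integer digits are already available as a string and that the maximum is at least 1 (the `limit_integer_digits` helper is hypothetical):

```perl
use strict;
use warnings;

# Keep only the least significant $max integer digits.
# Assumes $max >= 1 and $integer_digits contains digits only.
sub limit_integer_digits {
    my ($integer_digits, $max) = @_;
    return $integer_digits if length($integer_digits) <= $max;
    return substr($integer_digits, -$max);
}

print limit_integer_digits('1997', 2), "\n";    # 97
```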

remove Math::BigFloat for Inf/NaN checking

We started using Math::BigFloat in CLDR::Number v0.14 [issue #45] to check for infinity, NaN, and negatives, but this addition has created many failing test reports:

http://matrix.cpantesters.org/?dist=CLDR-Number+0.14

It turns out that Perl 5.22 overhauled infinity and NaN values to be more consistent across platforms and operations, including stringifying to Inf and NaN instead of the previous inf and nan; however, Math::BigFloat doesn’t understand those titlecased values and treats them both as NaN. We’re better off performing the checks ourselves for now, as well as submitting an issue for the Math::BigInt project.
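
A sketch of what performing the checks ourselves might look like, using only numeric comparisons so that the stringified spelling never matters. It assumes a platform with IEEE 754 doubles; the function names mirror the checks mentioned above but are not the module's actual internals.

```perl
use strict;
use warnings;

my $INF = 9**9**9;        # overflows to positive infinity
my $NAN = $INF - $INF;    # infinity minus infinity yields NaN

# NaN is the only value that is not equal to itself.
sub is_nan { my ($n) = @_; return $n != $n ? 1 : 0 }

# Infinity compares equal to itself, with either sign.
sub is_inf { my ($n) = @_; return ($n == $INF || $n == -$INF) ? 1 : 0 }

# NaN is never negative because every comparison with NaN is false.
sub is_neg { my ($n) = @_; return $n < 0 ? 1 : 0 }

printf "nan: %d, inf: %d, -inf: %d, -inf negative: %d\n",
    is_nan($NAN), is_inf($INF), is_inf(-$INF), is_neg(-$INF);
```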

add algorithmic (non-decimal) numbering systems

We now support numbering systems other than Latin (latn), but only decimal systems, not algorithmic systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), roman (Roman Numerals), etc.

Using Locale::CLDR corrupts CLDR::Number

I have been using CLDR::Number in a project for quite some time to format numbers in the right locale.

Now I want to use Locale::CLDR to get country names in the correct language. However, as soon as I use Locale::CLDR, formatting an integer number via CLDR::Number fails with the message:
Can't locate object method "ffround" via package "Math::BigInt" at <path_to>/perllib/CLDR/Number/Role/Format.pm line 260

I can easily reproduce this using the following script:

#!/usr/bin/perl

use strict;

use CLDR::Number;
use Locale::CLDR;

my $cldr = CLDR::Number->new(locale => 'en');
my $formatter = $cldr->decimal_formatter(minimum_fraction_digits => 2, maximum_fraction_digits => 2);
print 'Success: ', $formatter->format(15.23), "\n";
print 'Fail: ', $formatter->format(42.0), "\n";

Here, the formatting of the number 42 will fail with the indicated message. As soon as I remove the line use Locale::CLDR, the formatting works as expected.

Do you know why using Locale::CLDR causes CLDR::Number to break? I know that the latter is a somewhat older module, but I do not want to let go of it. If there is a more up-to-date module with an interface similar to that of CLDR::Number, then I will definitely check it out.

deprecate mutable locales

The locale attribute being mutable has caused additional code, complexity, and bugs. The problem is that it is a rw attribute that sets a dozen or so other rw attributes. It's difficult to maintain these inherited attributes that should be lazy, publicly writable, and change based on changes to locale. The solution is to change locale from rw to ro. This is backward-incompatible, but there are no known real-world uses of a mutable locale other than convenience in unit tests and examples.

  1. Publicly announce upcoming deprecation of the locale method used as a setter and request feedback.
  2. Document the deprecation in the next release of CLDR::Number.
  3. Warn when mutating the locale in a further release.
  4. Finally, change the locale from rw to ro and remove related code.

Comments and suggestions highly appreciated!

round half-even with rounding increment

By default we use Math::BigFloat for rounding and round in half-even mode. If a rounding increment greater than 1 is provided in the pattern or via the rounding_increment attribute, we instead use Math::Round::nearest because it supports rounding increments; however, it does not support half-even rounding, which I believe we should be performing along with rounding increments. We need to investigate alternatives and possibly ask for clarification on the CLDR mailing list.

See also issue #30.
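
One alternative to investigate is doing the tie-breaking ourselves on integers. The sketch below assumes the value and the increment have already been scaled to the same integer unit (for example thousandths, so 1.225 with an increment of 0.05 becomes 1225 and 50) and that the value is non-negative. The function name is made up for this example.

```perl
use strict;
use warnings;

# Round $units to the nearest multiple of $inc, sending exact ties to the
# multiple whose quotient is even. Both arguments are non-negative integers
# expressed in the same unit.
sub round_half_even_to_increment {
    my ($units, $inc) = @_;
    my $quotient  = int($units / $inc);
    my $remainder = $units % $inc;
    my $twice     = 2 * $remainder;
    if ($twice > $inc || ($twice == $inc && $quotient % 2 == 1)) {
        $quotient++;
    }
    return $quotient * $inc;
}

print round_half_even_to_increment(1225, 50), "\n";    # tie, even quotient:  1200
print round_half_even_to_increment(1275, 50), "\n";    # tie, odd quotient:   1300
print round_half_even_to_increment(1230, 50), "\n";    # above the midpoint:  1250
```

Whether half-even is really what the spec intends for rounding increments is still the open question for the mailing list; this only shows that the combination is straightforward to compute.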

quiet down expected warnings in tests

Use Test::Warnings so we don't actually warn to STDERR while running tests.

Here's the only current problem:

t/inheritance.t ....... ok
default_locale 'xx' is unknown at (eval 36) line 44.

escaped quoting bug in Perl v5.8.8

Escaped quotes are being returned in formats as \xF7\xB0\x80\x84 (utf8-encoded \x{1F0000}) instead of the proper '. This is happening in both CPAN Testers’ reports for Perl v5.8.8 and in no other versions. The other reports from v5.8.x are for v5.8.5 and v5.8.9, which do not have this problem.

Here are the related CPAN Testers’ reports:

Here are the three failing tests, which are the same in both reports:

#   Failed test 'single quote itself'
#   at t/from_uts35.t line 57.
#          got: '1 o÷°€„clock'
#     expected: '1 o'clock'
# Looks like you failed 1 test of 41.
t/from_uts35.t ........ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/41 subtests 

#   Failed test at t/quoting.t line 16.
#          got: '÷°€„123÷°€„'
#     expected: ''123''

#   Failed test at t/quoting.t line 17.
#          got: '#÷°€„#'
#     expected: '#'#'
# Looks like you failed 2 tests of 7.
t/quoting.t ........... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests

change internal placeholder non-Unicode codepoints to PUA

Change non-Unicode codepoints to Private Use Area codepoints. These are internally used as placeholders. We're currently using U+1F0000, U+1F0001, U+1F0002, U+1F0003, and U+1F0004, but this caused bug #20, which required a hacky workaround.
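
As an illustration only, the five placeholders could move to the start of the BMP Private Use Area (U+E000 through U+E004). The exact range is an assumption for this sketch; any block with General Category Co would do.

```perl
use strict;
use warnings;

# Hypothetical replacement placeholders: U+E000 .. U+E004.
my @PLACEHOLDERS = map { chr(0xE000 + $_) } 0 .. 4;

# Each one should be a single Private Use character (General Category Co).
for my $char (@PLACEHOLDERS) {
    printf "U+%04X private use: %s\n",
        ord($char), ($char =~ /\A\p{Co}\z/ ? 'yes' : 'no');
}

# The current placeholders lie beyond the last Unicode codepoint, U+10FFFF.
printf "U+1F0000 within Unicode: %s\n", (0x1F0000 <= 0x10FFFF ? 'yes' : 'no');
```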

Tests fail (with latest Moo?)

There are new test failures — see http://www.cpantesters.org/cpan/report/487b0514-e060-11e5-a971-eac272d7c31d for a sample.

Statistical analysis from test failures generated on my machine suggests that the problem is caused by the latest Moo (negative theta is bad):

****************************************************************
Regression 'mod:Moo'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='const']           1.0000          0.0000    30849180474401392.00
[1='eq_1.007000']             0.0000          0.0000       0.00
[2='eq_2.000001']             0.0000          0.0000       1.98
[3='eq_2.000002']             0.0000          0.0000       3.36
[4='eq_2.001000']            -1.0000          0.0000    -28977759259709780.00

R^2= 1.000, N= 74, K= 5
****************************************************************

format inf, -inf, and nan

Perl treats inf, -inf, and nan as numbers; CLDR has formats for infinity, nan, and the negative sign; so let's format them appropriately.

support spelled-out currencies

Add support for spelled-out currencies using the unitPattern and displayName with a count attribute. For example, 5000 JPY (Japanese Yen) in ja (Japanese) would be 5,000 円 (as opposed to ¥5,000), which uses the unitPattern {0} {1} and displayName with the count other. See UTS #35, Part 3, §4: Currencies for details.
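
The substitution step itself is small. Here is a sketch that fills a unitPattern with an already formatted number and a display name; the `apply_unit_pattern` helper is hypothetical, and looking up the right displayName for a given count is left out.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Replace each {N} in the pattern with the Nth argument:
# {0} is the formatted number and {1} is the currency display name.
sub apply_unit_pattern {
    my ($pattern, @args) = @_;
    (my $out = $pattern) =~ s/\{(\d+)\}/$args[$1]/g;
    return $out;
}

print apply_unit_pattern('{0} {1}', '5,000', '円'), "\n";    # 5,000 円
```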

Review the ICU API for this feature and determine what attribute should be used to enable it. Also consider how to best store and load the data because it will take much more memory than the other currency data.

This feature has been requested by users.

support CLDR v27

CLDR v25 was released today:
http://unicode-inc.blogspot.com/2014/03/cldr-version-25-released.html

The changes are primarily structural in nature and very few of these changes affect numbers, while none of these structural changes affect the implemented portions of CLDR::Number.

Here are the locale data changes that affect us:

  • new locales fy (West Frisian), fy-NL, ug (Uyghur), ug-Arab, ug-Arab-CN, prg (Prussian)
  • data improvements for official languages
  • number symbol fixes

Additionally there is "Better locale matching, with better fallbacks; likely subtags for regions; added scripts for various languages" but our locale matching and fallbacks were already rather minimal. We should obviously use the new version when implementing matching/fallback improvements.

add minimum grouping digits

Minimum grouping digits were added to the spec in CLDR v26 (#33). LDML stores the related value as minimumGroupingDigits and we should add the minimum_grouping_digits attribute.

http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Elements

The minimumGroupingDigits can be used to suppress groupings below a certain value. This is used for languages such as Polish, where one would only write the grouping separator for values above 9999. The minimumGroupingDigits contains the default for the locale.

http://cldr.unicode.org/translation/numbering-systems

In some languages, the grouping separator is suppressed in certain cases. For example, see china-auf-wachstumskurs.gif, where there is a grouping separator in 12 080 but not in 4720. The minimumGroupingDigits determines what the default for a locale is.
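
A minimal sketch of how the attribute could affect grouping, assuming a uniform group size of 3 and a plain string of integer digits. The `group_integer` helper and its argument order are made up for this example.

```perl
use strict;
use warnings;

# Insert $separator between groups of three digits, but only when the
# integer has at least 3 + $min_grouping_digits digits.
sub group_integer {
    my ($digits, $separator, $min_grouping_digits) = @_;
    return $digits if length($digits) < 3 + $min_grouping_digits;
    1 while $digits =~ s/^(\d+)(\d{3})/$1$separator$2/;
    return $digits;
}

print group_integer('4720',  ' ', 2), "\n";    # 4720
print group_integer('12080', ' ', 2), "\n";    # 12 080
print group_integer('4720',  ' ', 1), "\n";    # 4 720
```

With a value of 2, four-digit numbers stay ungrouped, which matches the 12 080 versus 4720 example above.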
