patch / cldr-number-pm5
Localized number formatters using the Unicode CLDR
Home Page: https://metacpan.org/pod/CLDR::Number
License: Other
Suggested by @aarondcohen:
[13:43] Aaron Cohen: as an added benefit, you could break CLDR::Number::Data::* up by locale
[13:43] Aaron Cohen: so people would load less into memory if they aren't using the other locales
Although the number-related locale data is relatively small per locale, the aggregate is increasingly large with each CLDR release. Another idea is to remove any data from a locale that is the same as what would already be inherited.
So we can define $N, $P, $C, $M, and $Q all in one place.
Users occasionally report that the wrong formatting is used for several nonexistent locales, including Mexican English (en-MX) and Brazilian Spanish (es-BR). We should document that since these locales don’t exist, they fall back to English (en) and Spanish (es), respectively.
Note that I also plan on raising the issue with the CLDR Technical Committee that es-XX, where XX is any country within Latin America (419), should fall back to es-419 even if es-XX is not a valid locale. This would, however, require a new structure added to the LDML spec unless a locale was created for each combination of es with each remaining country within 419.
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
CLDR::Number::Role::Base already has the length attribute, which is not currently used. Valid lengths are full, long, medium, short, and narrow.
The desired functionality is described in UTS #35:
We should create a new test file: t/length.t
The cldr29 branch was generated with the CLDR v29-beta1:
https://github.com/patch/cldr-number-pm5/compare/cldr29
Everything looks good and no tests were broken. When CLDR v29 is officially released, we can regenerate, document in Changes, and release to CPAN.
See also: CLDR 29 Release Note (DRAFT)
Most releases of CLDR::Number have had inconsistent but common Moo::Role-related test failures in Perl v5.8.1 through v5.8.3. The oldest version of Perl that has not been known to have this problem is v5.8.4, although there are very few reports on that version.
We should either figure out the problem and fix it, or raise the minimum version of Perl from v5.8.1 (September 2003) to v5.8.4 (April 2004), which I would not be against.
Test reports:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.19
Typical output:
Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Use of uninitialized value in method lookup at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
Can't locate object method "is_role" via package "Moo::Role" at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Moo/Role.pm line 138.
BEGIN failed--compilation aborted at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number/Role/Format.pm line 13.
Compilation failed in require at /home/njh/perl5/perlbrew/perls/perl-5.8.1/lib/site_perl/5.8.1/Module/Runtime.pm line 313.
Compilation failed in require at /home/njh/.cpan/build/CLDR-Number-0.19-M6Ajmp/blib/lib/CLDR/Number.pm line 32.
We've been getting a lot of reports like this with 3 failing tests since the first CPAN upload this morning:
http://www.cpantesters.org/cpan/report/f66b08b0-6612-11e3-8a8a-6b1ebd322218
In fact, they all seem to be failing:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.00_02
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
As per the CLDR spec (see below), default rounding is half-even. There is currently no way to change the rounding mode. We use Math::BigFloat, which supports the following modes: even, odd, +inf, -inf, zero, trunc, common. Let's add a rounding_mode attribute and decide if we should use the same modes and names as Math::BigFloat.
An implementation may allow the specification of a rounding mode to determine how values are rounded. In the absence of such choices, the default is to round "half-even", as described in IEEE arithmetic. That is, it rounds towards the "nearest neighbor" unless both neighbors are equidistant, in which case, it rounds towards the even neighbor. Behaves as for round "half-up" if the digit to the left of the discarded fraction is odd; behaves as for round "half-down" if it's even. Note that this is the rounding mode that minimizes cumulative error when applied repeatedly over a sequence of calculations.
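As a rough illustration of what a rounding_mode option could delegate to, here is a minimal sketch that passes a mode name straight through to Math::BigFloat's bfround method. The helper name round_with_mode is hypothetical and not part of CLDR::Number, and the sketch is only exercised with one or more fraction digits.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper (not part of CLDR::Number): round $value to $places
# fraction digits using one of the Math::BigFloat rounding mode names.
sub round_with_mode {
    my ($value, $places, $mode) = @_;
    $mode = 'even' unless defined $mode;    # half-even, the CLDR default

    # bfround() takes a negative scale for digits right of the decimal point.
    return Math::BigFloat->new($value)->bfround(-$places, $mode)->bstr;
}

# Ties go to the even neighbor by default; 'common' rounds ties away from zero.
print round_with_mode('2.345', 2), "\n";
print round_with_mode('2.355', 2), "\n";
print round_with_mode('2.345', 2, 'common'), "\n";
```

Passing values as strings avoids binary floating-point surprises before the number reaches Math::BigFloat.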
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
From @JCEmmons:
To: cldr-users
Subject: Preliminary JSON available for release 28
From: John Emmons
Date: Tue, 1 Sep 2015 00:09:28 -0500
A preliminary version of the JSON for the upcoming CLDR release 28 is now
available on github for testing. Please see
https://github.com/unicode-cldr/cldr-json for details. Any errors or
omissions should be reported via CLDR trac by filing a new ticket at
http://unicode.org/cldr/trac/newticket
We’re already using Math::BigFloat in most situations for rounding using the round_mode and ffround methods. Let’s continue to use it for any functionality we can, replacing existing code in CLDR::Number: is_nan, is_inf, is_pos, is_neg, etc.
Handle undef by warning and returning undef like core Perl functions.
Right now, the inheritance works like zh-Hant-MO → zh-Hant → zh → root, but Part 1 Core §4.1.1 Parent Locales defines exceptions in the LDML for different parents.
For example:
<parentLocale parent="zh_Hant_HK" locales="zh-Hant-MO"/>
This would modify the inheritance to zh-Hant-MO → zh-Hant-HK → zh-Hant → zh → root.
Others are defined with a parent of root to skip normal steps altogether. The most notable problem with the current inheritance is that es-US (US Spanish), es-MX (Mexican Spanish), es-CR (Costa Rican Spanish), etc., inherit directly from es (European Spanish) instead of es-419 (Latin American Spanish).
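The lookup described here can be sketched as truncation-based inheritance with an override table consulted first. This is only an illustration: the %PARENT_LOCALE hash and the inheritance_chain function are hypothetical names, and the table holds a tiny hand-written subset (the zh-Hant-MO entry from the example, plus assumed es-419 entries for the Spanish locales mentioned), not the real CLDR supplemental data.

```perl
use strict;
use warnings;

# Illustrative subset only; a real implementation would load the full
# parentLocale table from CLDR supplemental data.
my %PARENT_LOCALE = (
    'zh-Hant-MO' => 'zh-Hant-HK',    # from the parentLocale example
    'es-US'      => 'es-419',        # assumed entries for Latin American Spanish
    'es-MX'      => 'es-419',
    'es-CR'      => 'es-419',
);

# Return the list of locales consulted, from most specific down to root.
sub inheritance_chain {
    my ($locale) = @_;
    my @chain = ($locale);

    while ($locale ne 'root') {
        my $parent = $PARENT_LOCALE{$locale};    # explicit parent wins
        if (!defined $parent) {
            # Otherwise drop the last subtag, or fall back to root.
            $parent = $locale;
            $parent = 'root' unless $parent =~ s/-[^-]+\z//;
        }
        push @chain, $parent;
        $locale = $parent;
    }

    return @chain;
}

print join(' -> ', inheritance_chain('zh-Hant-MO')), "\n";
print join(' -> ', inheritance_chain('es-MX')), "\n";
```

A locale with no table entry, such as en-MX, simply truncates to en and then root.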
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
Three CPAN Testers’ reports show massive test failures that may be related to Moo v1.000006 and v1.000007. More investigation is needed.
Here are the related CPAN Testers’ reports:
tl;dr: Let’s improve the docs! Please add doc requests or suggestions in the comments here.
The first goal of this project was to implement the standardized Unicode CLDR–based localized number formatting defined in UTS #35, Part 3: Numbers. Much of the CLDR::Number documentation, however, does not go into detail to describe functionality to developers without existing familiarity with the CLDR. This project shouldn’t require external knowledge in order to use it. One problem is that it allows for a lot of advanced customization that most developers will never need to use or know about when they can instead depend on the defaults provided for the requested locale (and currency for prices). Perhaps the docs should be split into 100% self-contained intro-level with more examples, and advanced-level with all the gritty options and external references. These days I write much more documentation for developers than actual coding, and while I have less time for maintaining my CPAN modules, I’d like to commit some time to improve these docs.
Thanks to @Ovid for bringing this to my attention:
Started this in commit fdc3e57.
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
Implement the functionality supplied by the maximum_integer_digits attribute, which already exists as a stub. There doesn’t appear to be a symbol associated with this.
If the number of actual integer digits exceeds the maximum integer digits, then only the least significant digits are shown. For example, 1997 is formatted as 97 if the maximum integer digits is set to 2.
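A minimal sketch of that truncation rule, assuming the number arrives as a non-negative plain decimal string. The function name limit_integer_digits is hypothetical and not part of CLDR::Number.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the $max least significant integer digits.
# Assumes a non-negative plain decimal string such as '1997' or '1997.5'.
sub limit_integer_digits {
    my ($number, $max) = @_;
    my ($int, $frac) = split /\./, $number, 2;

    $int = substr($int, -$max) if length($int) > $max;

    return defined $frac ? "$int.$frac" : $int;
}

print limit_integer_digits('1997', 2), "\n";    # the example from the spec text
```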
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
May be easier to start out with numbering systems that have a @type of numeric as well as a value for @digits.
See also: http://www.unicode.org/repos/cldr-aux/json/22.1/supplemental/numberingSystems.json
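For numeric systems, the @digits value is a string of ten digits in order from zero to nine, so support can start as a simple substitution on an already formatted string. The sketch below assumes that approach; transliterate_digits is a hypothetical name, and the digit string is built from the Arabic-Indic digits U+0660 through U+0669 as an example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace ASCII digits with the digits of a numeric
# numbering system, given its ten-character @digits string (0 through 9).
sub transliterate_digits {
    my ($formatted, $digits) = @_;
    my @digit = split //, $digits;

    $formatted =~ s/([0-9])/$digit[$1]/g;

    return $formatted;
}

# Arabic-Indic digits, U+0660 .. U+0669, used here as sample @digits data.
my $arab_digits = join '', map { chr(0x0660 + $_) } 0 .. 9;
my $localized   = transliterate_digits('1,234.5', $arab_digits);
```

Separators are left alone here; a real formatter would take them from the locale's symbols for that numbering system.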
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
We started using Math::BigFloat in CLDR::Number v0.14 [issue #45] to check for infinity, NaN, and negatives, but this addition has created many failing test reports:
http://matrix.cpantesters.org/?dist=CLDR-Number+0.14
It turns out that Perl 5.22 overhauled infinity and NaN values to be more consistent across platforms and operations, including stringifying to Inf and NaN instead of the previous inf and nan; however, Math::BigFloat doesn’t understand those titlecased values and treats them both as NaN. We’re better off performing the checks ourselves for now, as well as submitting an issue for the Math::BigInt project.
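One way to perform these checks without relying on how the values stringify is to compare numerically. This is a sketch only, assuming a perl built with IEEE 754 doubles; the function names mirror the Math::BigFloat ones but are plain hypothetical subs here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Math::BigFloat checks, using only numeric
# comparisons so the spelling of "Inf"/"NaN" never matters.
sub is_nan {
    my ($n) = @_;
    return $n != $n;    # NaN is the only value not equal to itself
}

sub is_inf {
    my ($n) = @_;
    my $inf = 9**9**9;    # overflows to infinity with IEEE 754 doubles
    return $n == $inf || $n == -$inf;
}

sub is_neg {
    my ($n) = @_;
    return !is_nan($n) && $n < 0;
}
```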
We now support non-Latin (latn) numbering systems, but only decimal systems, not algorithmic systems like hant (Traditional Chinese Numerals), hebr (Hebrew Numerals), roman (Roman Numerals), etc.
I have been using CLDR::Number in a project for quite some time to format numbers in the right locale.
Now I want to use Locale::CLDR to get country names in the correct language. However, as soon as I use Locale::CLDR, formatting an integer number via CLDR::Number fails with the message:
Can't locate object method "ffround" via package "Math::BigInt" at <path_to>/perllib/CLDR/Number/Role/Format.pm line 260
I can easily reproduce this using the following script:
#!/usr/bin/perl
use strict;
use CLDR::Number;
use Locale::CLDR;
my $cldr = CLDR::Number->new(locale => 'en');
my $formatter = $cldr->decimal_formatter(minimum_fraction_digits => 2, maximum_fraction_digits => 2);
print 'Success: ', $formatter->format(15.23), "\n";
print 'Fail: ', $formatter->format(42.0), "\n";
Here, the formatting of the number 42 will fail with the indicated message. As soon as I remove the line use Locale::CLDR, the formatting works as expected.
Do you know why using Locale::CLDR causes CLDR::Number to break? I know that the latter is a somewhat older module, but I do not want to let go of it. If there is a more up-to-date module with a similar interface to CLDR::Number, then I will definitely check it out.
The locale attribute being mutable has caused additional code, complexity, and bugs. The problem is that it is a rw attribute that sets a dozen or so other rw attributes. It's difficult to maintain these inherited attributes that should be lazy, publicly writable, and change based on changes to locale. The solution is to change locale from rw to ro. This is backward-incompatible, but there are no known real-world uses of a mutable locale other than convenience in unit tests and examples.
locale method used as a setter and request feedback.
rw to ro and remove related code.
Comments and suggestions highly appreciated!
By default we use Math::BigFloat for rounding and round in half-even mode. If a rounding increment greater than 1 is provided in the pattern or via the rounding_increment attribute, we instead use Math::Round::nearest because it supports rounding increments; however, it does not support half-even rounding, which I believe we should be performing along with rounding increments. We need to investigate alternatives and possibly ask for clarification on the CLDR mailing list.
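One possible alternative, sketched below, is to stay in Math::BigFloat: divide by the increment, round the quotient to an integer with an explicit half-even rule, and multiply back. The function name is hypothetical, and whether half-even is the right behavior for increments is exactly the open question above.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: round $value to the nearest multiple of $increment,
# sending exact ties to the even multiple (half-even).
sub round_to_increment_half_even {
    my ($value, $increment) = @_;
    my $inc   = Math::BigFloat->new($increment);
    my $q     = Math::BigFloat->new($value)->bdiv($inc);    # value / increment
    my $floor = $q->copy->bfloor;
    my $frac  = $q->copy->bsub($floor);                      # in [0, 1)

    # Step up when past the midpoint, or exactly on it with an odd floor.
    my $cmp = $frac->bcmp('0.5');
    my $n   = $floor->copy;
    $n->binc if $cmp > 0 || ($cmp == 0 && $floor->is_odd);

    return $n->bmul($inc)->bstr;
}

print round_to_increment_half_even('12.5', '5'), "\n";    # tie between 10 and 15
print round_to_increment_half_even('17.5', '5'), "\n";    # tie between 15 and 20
```

Values and increments are passed as strings so that fractional increments such as '0.05' are represented exactly.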
See also issue #30.
Add number parsers under CLDR::Number::Parse as described in UTS #35, Part 3, §7.
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
Some older perls on some systems don’t support inf and nan. Here are a few failing test reports from CLDR::Number v0.12.
I think we should just test for support in the test file t/inf-nan.t and skip with a diag warning when not supported, as well as documenting that the feature depends on perl’s support for the given system.
Use Test::Warnings so we don't actually warn to STDERR while running tests.
Here's the only current problem:
t/inheritance.t ....... ok
default_locale 'xx' is unknown at (eval 36) line 44.
Escaped quotes are being returned in formats as \xF7\xB0\x80\x84 (utf8-encoded \x{1F0000}) instead of the proper '. This is happening in both CPAN Testers’ reports for Perl v5.8.8 and no other versions. The other reports from v5.8.x are v5.8.5 and v5.8.9, which do not have this problem.
Here are the related CPAN Testers’ reports:
Here are the three failing tests, which are the same in both reports:
# Failed test 'single quote itself'
# at t/from_uts35.t line 57.
# got: '1 o÷°€„clock'
# expected: '1 o'clock'
# Looks like you failed 1 test of 41.
t/from_uts35.t ........
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/41 subtests
# Failed test at t/quoting.t line 16.
# got: '÷°€„123÷°€„'
# expected: ''123''
# Failed test at t/quoting.t line 17.
# got: '#÷°€„#'
# expected: '#'#'
# Looks like you failed 2 tests of 7.
t/quoting.t ...........
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/7 subtests
Change non-Unicode codepoints to Private Use Area codepoints. These are internally used as placeholders. We're currently using U+1F0000, U+1F0001, U+1F0002, U+1F0003, and U+1F0004, but this caused bug #20, which required a hacky workaround.
There are new test failures — see http://www.cpantesters.org/cpan/report/487b0514-e060-11e5-a971-eac272d7c31d for a sample.
Statistical analysis from test failures generated on my machine suggests that the problem is caused by the latest Moo (negative theta is bad):
****************************************************************
Regression 'mod:Moo'
****************************************************************
Name Theta StdErr T-stat
[0='const'] 1.0000 0.0000 30849180474401392.00
[1='eq_1.007000'] 0.0000 0.0000 0.00
[2='eq_2.000001'] 0.0000 0.0000 1.98
[3='eq_2.000002'] 0.0000 0.0000 3.36
[4='eq_2.001000'] -1.0000 0.0000 -28977759259709780.00
R^2= 1.000, N= 74, K= 5
****************************************************************
Issue imported from the TODO:
https://github.com/perl-cldr/cldr-number-perl5/blob/master/lib/CLDR/Number/TODO.pod
Add the significant_digits attribute, the @ symbol in patterns, and associated functionality described in UTS #35, Part 3, §3.5.
Perl treats inf, -inf, and nan as numbers; CLDR has formats for infinity, nan, and the negative sign; so let's format them appropriately.
Add support for spelled-out currencies using the unitPattern and displayName with a count attribute. For example, 5000 JPY (Japanese Yen) in ja (Japanese) would be 5,000 円 (as opposed to ¥5,000), which uses the unitPattern {0} {1} and displayName 円 with the count other. See UTS #35, Part 3, §4: Currencies for details.
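The substitution itself is small. As a sketch, assuming the number has already been formatted and the plural category has already selected the right pattern and display name, it could look like the following. apply_unit_pattern is a hypothetical name, and the data is just the example from above.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a unitPattern, where {0} is the formatted
# number and {1} is the currency display name for the plural category.
sub apply_unit_pattern {
    my ($pattern, $formatted_number, $display_name) = @_;
    my @arg = ($formatted_number, $display_name);

    (my $result = $pattern) =~ s/\{([01])\}/$arg[$1]/g;

    return $result;
}

# Sample data for JPY in ja with count "other"; U+5186 is the yen
# character used as the display name.
my $spelled_out = apply_unit_pattern('{0} {1}', '5,000', "\x{5186}");
```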
Review the ICU API for this feature and determine what attribute should be used to enable it. Also consider how to best store and load the data because it will take much more memory than the other currency data.
This feature has been requested by users.
Consider using Params::Validate. See also #22 for handling undef.
CLDR v25 was released today:
http://unicode-inc.blogspot.com/2014/03/cldr-version-25-released.html
The changes are primarily structural in nature and very few of these changes affect numbers, while none of these structural changes affect the implemented portions of CLDR::Number.
Here are the locale data changes that affect us:
fy (West Frisian), fy-NL, ug (Uyghur), ug-Arab, ug-Arab-CN, prg (Prussian)
Additionally there is "Better locale matching, with better fallbacks; likely subtags for regions; added scripts for various languages" but our locale matching and fallbacks were already rather minimal. We should obviously use the new version when implementing matching/fallback improvements.
Minimum grouping digits were added to the spec in CLDR v26 (#33). LDML stores the related value as minimumGroupingDigits and we should add the minimum_grouping_digits attribute.
http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Elements
The minimumGroupingDigits can be used to suppress groupings below a certain value. This is used for languages such as Polish, where one would only write the grouping separator for values above 9999. The minimumGroupingDigits contains the default for the locale.
http://cldr.unicode.org/translation/numbering-systems
In some languages, the grouping separator is suppressed in certain cases. For example, see china-auf-wachstumskurs.gif, where there is a grouping separator in 12 080 but not in 4720. The minimumGroupingDigits determines what the default for a locale is.
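A minimal sketch of how minimum_grouping_digits could gate grouping, assuming a single grouping size and an integer part given as a string of ASCII digits. The function name and argument order are hypothetical. Grouping is applied only when there are at least that many digits to the left of the lowest separator.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $separator every $size digits, but only when
# the integer has at least $size + $min_grouping_digits digits in total.
sub group_integer {
    my ($int, $separator, $size, $min_grouping_digits) = @_;

    return $int if length($int) < $size + $min_grouping_digits;

    # Repeatedly split off the last ungrouped run of $size digits.
    1 while $int =~ s/^(\d+)(\d{$size})/$1$separator$2/;

    return $int;
}

# With a minimum of 2, four-digit values stay ungrouped.
print group_integer('4720',  ' ', 3, 2), "\n";
print group_integer('12080', ' ', 3, 2), "\n";
```

With a minimum of 1, the same function groups every value of four or more digits.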