tc39 / proposal-intl-displaynames
Get localized display names for languages, scripts, regions and others. https://tc39.github.io/proposal-intl-displaynames/
License: MIT License
It seems to me that, at the moment, DisplayNames doesn't handle the abbreviated length of the field.
Is there a reason we decided to skip it, or is it just an omission?
The current spec includes type: "script". After careful study of all the usage on the web, I found this feature is not really needed and cannot see a good use case for it. I propose we remove this support from the current spec if no one objects. This will help us reduce the size impact of locale data. We can add it back later if someone shows a strong use case for it.
I couldn't find an issue for that, so please, feel free to dupe this against it if there is one.
In the process we trimmed down the scope of this API, and I'd like to verify that the current intention is to later add weekday and month display names.
The Intl.DateTimeFormat API supports a calendar option; we should provide display names for these calendars.
A TypeError should be thrown by step 4 of Intl.DisplayNames.prototype.of(code) when Type(code) is not String, Number, or Object. Step 5 invokes ToString(code) only when Type(code) is Object. So step 6 can be reached with a numeric code, but the IsValidCodeForDisplayNames() operation invoked by this step does not make sense for a numeric code: it does not make sense to ask whether some number (unlike its textual representation) matches some production rule. Moreover, the IsWellFormedCurrencyCode() operation states explicitly that its argument must be a String.
It seems that ToString(code) in step 5 of Intl.DisplayNames.prototype.of(code) should be invoked when Type(code) is Number as well. In fact, the combination of steps 4 and 5 looks a bit strange to me. Why does the specification take special care of some types of code? Wouldn't it be simpler and more natural to convert code to String using ToString(code) unconditionally?
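To make the question concrete, here is a rough JavaScript model of the two orderings. The function names coerceCodeCurrentSteps and coerceCodeUnconditionally are hypothetical; they mirror only the type check and the ToString call discussed above, not the rest of the spec algorithm.

```javascript
// Hypothetical model of steps 4-5 of Intl.DisplayNames.prototype.of(code)
// as described above: only Object inputs are passed through ToString.
function coerceCodeCurrentSteps(code) {
  const t = typeof code;
  const isObject = (t === "object" && code !== null) || t === "function";
  // Step 4: anything other than String, Number or Object is a TypeError.
  if (t !== "string" && t !== "number" && !isObject) {
    throw new TypeError("code must be a String, Number, or Object");
  }
  // Step 5: ToString is applied to Objects only, so a Number slips through
  // unchanged and reaches the validity check as a number.
  return isObject ? `${code}` : code;
}

// The alternative suggested above: call ToString unconditionally.
// Template-literal interpolation performs ToString, so Symbols still throw.
function coerceCodeUnconditionally(code) {
  return `${code}`;
}
```

With the first model, `coerceCodeCurrentSteps(42)` returns the number 42, which is exactly the value the later validity check is not designed to handle; the second model always hands a string to that check.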
Similar to #80 but under different constraints
In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "script part" only, since the issue with region is different.
I have concerns about this. There is no standalone, predefined process in UTS35 for this. The process for the script subtag within unicode_language_id stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers is part of the whole process. Also, there is currently only one entry in
https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalMetadata.xml
<scriptAlias type="Qaai" replacement="Zinh" reason="deprecated"/>
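With only one alias in the data, a script-only canonicalization step would reduce to a tiny lookup. The sketch below is hypothetical (SCRIPT_ALIASES and canonicalizeScriptSubtag are illustrative names) and hard-codes just the single entry quoted above; a real implementation would read whatever the current CLDR release contains.

```javascript
// Hypothetical script-alias table mirroring the single scriptAlias entry
// quoted above from CLDR's supplementalMetadata.xml.
const SCRIPT_ALIASES = new Map([["Qaai", "Zinh"]]);

// Replace a deprecated script subtag with its preferred value, if any.
// Assumes the input is already a well-formed, title-cased script subtag.
function canonicalizeScriptSubtag(script) {
  return SCRIPT_ALIASES.get(script) ?? script;
}
```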
Hi, I'm not sure if DisplayNames is the right place to ask this, or whether it is already supported by another Intl API. I believe that "Months" or "Weekdays" should be supported, like:
Actual Behaviour
var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('monday')); // "Monday"
Nice to have Behaviour
var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('months')); // ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
console.log(dateTimeNames.of('weekdays')); // ["Sunday", "Monday", ...]
OR
var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime', style: 'short'});
console.log(dateTimeNames.of('months')); // ["Jan", "Feb", "Mar" ... ]
console.log(dateTimeNames.of('weekdays')); // ["Sun", "Mon", ...]
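Until something like this exists, similar lists can be derived from Intl.DateTimeFormat, which already carries the data. The helpers monthNames and weekdayNames below are hypothetical and assume the locale's default (here Gregorian) calendar. Note that a month formatted on its own uses the stand-alone form, which in some languages differs from the form used inside a full date.

```javascript
// Build month and weekday name lists from Intl.DateTimeFormat.
// Fixing timeZone to "UTC" keeps the dates below from shifting into a
// neighbouring day or month in the host's local time zone.
function monthNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { month: width, timeZone: "UTC" });
  // Day 15 of each month of 2017.
  return Array.from({ length: 12 }, (_, m) => fmt.format(new Date(Date.UTC(2017, m, 15))));
}

function weekdayNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: "UTC" });
  // 1 January 2017 was a Sunday, so index 0 is Sunday.
  return Array.from({ length: 7 }, (_, d) => fmt.format(new Date(Date.UTC(2017, 0, 1 + d))));
}

const longMonths = monthNames("en-US");          // ["January", ..., "December"]
const shortWeekdays = weekdayNames("en-US", "short"); // ["Sun", ..., "Sat"]
```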
This is a spin-off from #8. I think it is better to have one request per issue to make the discussion easier.
https://tc39.es/proposal-intl-displaynames/#sec-Intl.DisplayNames-internal-slots
The following text states that there are "six" display name types, but in reality there are nine:
[[LocaleData]].[[<locale>]] must have a [[types]] field for all locale values locale.
The value of this field must be a Record, which must have fields with the names of
the six display name types: "language", "region", "script", "currency", "weekday",
"month", "quarter", "dayPeriod", and "dateTimeField".
We should fix this.
I was talking to @pedberg-icu yesterday about how to index month names for the purpose of the Intl.DisplayNames API. He suggested that it could make sense to add an optional second argument to the .of() method for type: "month" to indicate whether the month is a leap month. Opening an issue to continue this discussion.
When type is set to "dateSymbol", the codes are things like the days of the week and months of the year. These are things:
Because of 1 and 2, the case for having them in the API isn't very strong. Item 3 shows a major drawback to including them: their presence may entice uninformed developers to try to build their own date formatters, which would almost certainly be less correct and less performant than the date formatters already present in JavaScript.
https://tc39.es/ecma402/#conformance lists options which are allowed to have implementation-defined behaviour. For Intl.DisplayNames, the "style" and "type" options should probably be listed there. I'm not yet sure if it makes sense to allow implementation-defined behaviour for the "fallback" option, because I don't exactly know what other kind of fallback should happen here.
CLDR has display names for measurement units. Maybe expose them through Intl.DisplayNames?
Is there a reason the type field does not follow the recommendations of https://w3ctag.github.io/design-principles/#casing-rules? Does JavaScript have different rules?
Most Intl types have a section under internal slots of the constructor that describes how the locale data is organized. E.g., see this section for DateTimeFormat. I don't see this description for DisplayNames.
The proposal currently has the syntax
symbolNames = new Intl.DisplayNames(
['en'], {type: 'dateSymbol', style: 'short'});
symbolNames.of('saturday'); // => "Sa"
A few questions/comments:
type: "date-symbol" be better?
currencyDisplay: "short". Would symbolDisplay: "short" or just display: "short" be better? Note: dateStyle/timeStyle uses the word "style".
Intl.DisplayNames.prototype.of was recently (#67) changed to use plain ToString to convert the input. This makes Intl.DisplayNames inconsistent with CanonicalizeLocaleList, which only allows String and Object inputs to avoid accepting NaN as "nan".
js> new Intl.DisplayNames("en", {type: "language"}).of(NaN)
"Min Nan Chinese"
But we no longer cared about this case when specifying Intl.Locale:
js> new Intl.Locale("und", {language: NaN}).toString()
"nan"
so maybe unconditionally calling ToString is okay. We should just make sure everybody is on board with this decision.
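The inconsistency can be observed through Intl.getCanonicalLocales, which uses CanonicalizeLocaleList. This is a small illustration of the two coercion paths, not a proposal for either behaviour.

```javascript
// CanonicalizeLocaleList does not call ToString on a Number: NaN is turned
// into a Number wrapper object whose "length" is undefined (treated as 0),
// so the resulting list is empty.
const fromNumber = Intl.getCanonicalLocales(NaN); // []

// Calling ToString first, as the changed `of` does, yields "NaN", which is
// a well-formed three-letter language subtag and canonicalizes to "nan".
const fromString = Intl.getCanonicalLocales(String(NaN)); // ["nan"]
```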
We currently don't have any capitalization related options exposed in any of our Intl APIs.
I don't know the exact reason for this decision, but IIRC it has something to do with potentially high payload required to provide proper capitalization across all supported locales.
If we add it here, it would be good to evaluate this cost again, and verify if we want to also add it to DateTimeFormat (weekDay standalone for example), RelativeTimeFormat etc.
The proposal as is includes regions, locales, and language display names. Would it include scripts and variants too?
https://github.com/unicode-cldr/cldr-localenames-modern/tree/master/main/en
--
back reference: brawer/proposal-intl-displaynames#6
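For reference, the API as it later shipped does expose script names through type: "script", taking ISO 15924 codes. A minimal example:

```javascript
// Script display names via the shipped Intl.DisplayNames API.
const scriptNames = new Intl.DisplayNames(["en"], { type: "script" });
const latin = scriptNames.of("Latn");    // "Latin"
const cyrillic = scriptNames.of("Cyrl"); // "Cyrillic"
```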
I may have missed it, but it might be worth considering offering convenience method(s) for going from specific locales other than English.
This could roughly take the form of displayNames.for(‹name›, ‹locale›), with more or less verbose parameters where applicable.
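A userland approximation of such a helper is straightforward. The sketch below is hypothetical (displayNameFor and displayNamesCache are illustrative names, not part of any API); it simply reuses one Intl.DisplayNames instance per target locale and type.

```javascript
// Hypothetical convenience helper: look up a display name in an arbitrary
// target locale, caching one Intl.DisplayNames instance per (locale, type).
const displayNamesCache = new Map();

function displayNameFor(code, locale, type = "region") {
  const key = `${locale}|${type}`;
  let names = displayNamesCache.get(key);
  if (!names) {
    names = new Intl.DisplayNames([locale], { type });
    displayNamesCache.set(key, names);
  }
  return names.of(code);
}

const inGerman = displayNameFor("FR", "de");  // "Frankreich"
const inEnglish = displayNameFor("FR", "en"); // "France"
```

Caching matters because constructing an Intl.DisplayNames instance resolves locale data, while repeated `of` calls on one instance are cheap.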
@ljharb wrote:
https://tc39.es/proposal-intl-displaynames/#sec-Intl.DisplayNames.prototype.of step 4 needs an Oxford comma after "Number"
It would be good to explain this in the README.
This is a spin-off from #8. I think it is better to have one request per issue to make the discussion easier.
Create Documentation for Intl.DisplayNames
MDN Pages:
Interactive Examples MDN:
Browser compat-data:
Just noticed that CLDR (and therefore ICU) has different display names for "month", "weekday", "quarter" and "dayPeriod" for different calendars.
Therefore, we should
In calendar systems with month names that are not Gregorian month names, like Hebrew, how do you get the month names? We have the data already in Intl.DateTimeFormat:
new Date().toLocaleDateString("en-us-u-ca-hebrew", { month: "long" })
// "Tamuz"
Language, script, and region tags should be canonicalised, because
For example, when new Intl.DisplayNames("en", {type: "language"}).of("de-DD") returns "German (Germany)", for consistency we should then ensure that new Intl.DisplayNames("en", {type: "region"}).of("DD") returns "Germany".
Do we really want to throw an error if data is not available, or just return null? If we return null, then we can also use that behavior when exporting a list.
Unless the spec explicitly lists which region codes have to be supported, for example, I do not like the idea of throwing an exception here, because then it means that the normal, expected way to call the function is to wrap it in a try-catch just in case the implementation does not have the needed data.
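For comparison, the API as it later shipped addresses this with the fallback option: with fallback: "none", `of` returns undefined when there is no display name for a well-formed code, while malformed codes still throw a RangeError. The wrapper name regionNameOrNull is hypothetical.

```javascript
// Return null instead of undefined when the implementation has no data.
// A malformed code (e.g. a one-letter region) still throws a RangeError.
function regionNameOrNull(locale, code) {
  const names = new Intl.DisplayNames([locale], { type: "region", fallback: "none" });
  return names.of(code) ?? null;
}

const us = regionNameOrNull("en", "US"); // "United States"
```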
Unless we envision Intl.DisplayNames to be primarily used to retrieve the localised name for language tags, it may make sense to change "type" to a mandatory option instead of defaulting it to "language" in the Intl.DisplayNames constructor.
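For reference, in the API as it eventually shipped "type" is required: omitting it makes the constructor throw a TypeError. The check below (typeIsRequired is a hypothetical name) just observes that behaviour.

```javascript
// Returns true if constructing Intl.DisplayNames without a "type" option
// throws a TypeError, as the shipped API does.
function typeIsRequired() {
  try {
    new Intl.DisplayNames(["en"]);
    return false;
  } catch (e) {
    return e instanceof TypeError;
  }
}
```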
Temporal uses "dayOfWeek" instead of "weekday" as the getter for the ISO 8601 weekday number. We should consider using consistent naming (either changing Temporal or Intl.DisplayNames). @pipobscure @gibson042
In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "region part" only, since the issue with script is different.
I have concerns about this (canonicalizing the region code). There is no predefined process in UTS35 for this. The process for the region subtag within unicode_language_id stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers depends on the language code (and the script code, if present) when multiple territories are listed in the replacement attribute of territoryAlias.
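To illustrate why a standalone region canonicalization is awkward, here is a hypothetical sketch of the UTS #35 rule as I understand it: when a territoryAlias lists several replacements, take the region from the likely subtags of the language if it appears in the list, otherwise the first listed region. Both tables are small illustrative excerpts, not complete CLDR data, and the sketch ignores the script, which the real rule also takes into account.

```javascript
// Illustrative excerpt of a multi-valued territoryAlias (CLDR lists more
// replacement regions for SU than shown here).
const TERRITORY_ALIASES = new Map([
  ["SU", ["RU", "AM", "AZ", "BY", "UA"]],
]);
// Illustrative excerpt of the region part of likely subtags.
const LIKELY_REGION = new Map([
  ["ru", "RU"],
  ["hy", "AM"],
  ["uk", "UA"],
]);

function replaceRegion(language, region) {
  const candidates = TERRITORY_ALIASES.get(region);
  if (!candidates) return region; // not a deprecated region
  if (candidates.length === 1) return candidates[0];
  // The result depends on the language, so it cannot be computed from the
  // region code alone, which is the point raised above.
  const likely = LIKELY_REGION.get(language);
  return candidates.includes(likely) ? likely : candidates[0];
}
```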
https://unicode.org/reports/tr35/tr35-general.html#Display_Name_Elements
Example: nl-BE = "Flemish", not just "Dutch (Belgium)"
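A later revision of the API exposes this choice through the languageDisplay option: "dialect" (the default) uses dedicated names such as "Flemish", while "standard" composes the name from the language and region parts.

```javascript
// Dialect names versus composed "standard" names for the same tag.
const dialect = new Intl.DisplayNames(["en"], { type: "language", languageDisplay: "dialect" });
const standard = new Intl.DisplayNames(["en"], { type: "language", languageDisplay: "standard" });

const dialectName = dialect.of("nl-BE");   // "Flemish"
const standardName = standard.of("nl-BE"); // "Dutch (Belgium)"
```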
An analysis of a payload cost for SpiderMonkey to ship this API - https://bugzilla.mozilla.org/show_bug.cgi?id=1557727#c7
The codes are "monday", "tuesday", ..., "january", "february", ...
Those are long and easy to mistype. They may also not be as friendly to non-English speakers. Did you consider shortening them to their common 3-letter abbreviations?
Weekdays:
Months:
Several Intl APIs accept numberingSystem as an option; we should provide display names for these numbering systems.
_record_.[[<_fieldname_>]]
Currency? We already have this due to NumberFormat
Time zone names? We already have this data due to DateTimeFormat
Emoji names? Widely requested, cc @nolanlawson
The current proposal specifies ofLanguage as accepting either a language or a full language tag (language-script-region).
I'm not sure why this decision was made, but it seems a bit counter-intuitive and an outlier.
Why not ofLocale for the whole locale, and ofLanguage just for languages?
Currently, ECMA262 defines 0-based, not 1-based, indexes for month and weekday in
https://ecma-international.org/ecma-262/#sec-todatestring-day-names
and
https://ecma-international.org/ecma-262/#sec-todatestring-month-names
But in https://tc39.es/proposal-intl-displaynames/#sec-isvalidweekdaycode
it defines https://tc39.es/proposal-intl-displaynames/#table-validcodeforweekday
And the spec text refers to this table as:
2. If weekday is listed in Table 1, return true.
We should remove this Table 1 and instead change the spec text to refer to Table 49 in ECMA262, thereby shifting to a 0-based index. This will make the spec cleaner and align it with ECMA262.
Same issue for month.
In https://tc39.es/proposal-intl-displaynames/#sec-isvalidmonthcode
we define https://tc39.es/proposal-intl-displaynames/#table-validcodeformonth
@sffc @leobalter @rwaldron @zbraniecki @littledan @mbeck @ljharb @zenparsing
Does anyone know what I should put in the source file to refer to a table in ECMA262? Could someone give me an example?
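The 0-based convention is visible from JavaScript itself: the day and month name tables used by Date.prototype.toUTCString line up with the indexes returned by getUTCDay() and getUTCMonth(). The validity helpers below are hypothetical stand-ins for a 0-based version of the proposal's weekday and month checks.

```javascript
// ECMA-262's day and month name tables, indexed from 0 to match
// Date.prototype.getUTCDay() and getUTCMonth().
const DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical 0-based validity checks replacing a 1-based table.
function isValidWeekdayIndex(i) {
  return Number.isInteger(i) && i >= 0 && i < DAY_NAMES.length;
}
function isValidMonthIndex(i) {
  return Number.isInteger(i) && i >= 0 && i < MONTH_NAMES.length;
}

// toUTCString() has the form "Sun, 01 Jan 2017 00:00:00 GMT"; pull out the
// day and month names so they can be compared with the tables above.
function namesFromUTCString(date) {
  const [day, , month] = date.toUTCString().replace(",", "").split(" ");
  return { day, month };
}
```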
This API was originally designed to expose data that is already carried to support ECMA402 Edition 3.
Items like names of months or weekdays in the Gregorian calendar are carried by all environments that implement Intl.DateTimeFormat.
Two other types of data are now optionally carried, potentially in limited form: time zone names and currency names. DateTimeFormat can use a human-readable name such as America/Los_Angeles and US Dollar, or fall back on the code names GMT-7 and USD.
The new iteration of the API adds tables and columns that are currently not used by ECMA402 (year, month, day, hour, second, etc.), and more are being considered:
I believe that this API in particular has the potential to continue increasing in size as new fields will be requested.
As the scope of the API increases, it brings back the concern raised by Apple a long time ago, and shared by Mozilla, that as we increase the surface of ECMA402, the data package carried by engines implementing ECMA402 will keep growing.
While each of those data fields may not be large on its own, carrying data for ~100 locales, often with three or more styles (short/medium/long etc.), may be prohibitive for certain implementations (for example, those aimed at IoT or other low-capacity environments).
I'm not sure we have a good solution for how to enable such an engine to implement the API in a way that is compatible with how users will use it, while retaining the ability to opt out of, or limit, the data fields it carries.
For example, just returning null for unavailable fields will likely lead to developers assuming that data carried by the most popular engine in the most common environment is available everywhere and write code such as:
let [yearTitle, monthTitle, dayTitle] = (new Intl.DisplayNames(navigator.locales, {type: "dateField"})).of(["year", "month", "day"]);
th1.textContent = yearTitle;
th2.textContent = monthTitle;
th3.textContent = dayTitle;
While this would work with currency, language and timezone, where the input code is a valid fallback output, I'm concerned that for dateFields, emojiNames, regionNames and dateSymbols we do not have the ability to fall back on the input as output.
If we don't design the API to encourage taking into account a scenario where the data is not present, developers will likely not consider it in their work.
I don't have a ready solution to the problem, but I believe we should consider it as part of this API design.
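One pattern that keeps missing data visible is to request no fallback and let the application supply its own defaults. The helper labelsOrDefaults below is hypothetical; it relies only on the shipped behaviour that fallback: "none" makes `of` return undefined when no name is available.

```javascript
// Hypothetical defensive helper: look up each code with fallback "none"
// and substitute an application-provided label when no data is available.
function labelsOrDefaults(locales, type, defaults) {
  const names = new Intl.DisplayNames(locales, { type, fallback: "none" });
  const result = {};
  for (const [code, fallbackLabel] of Object.entries(defaults)) {
    result[code] = names.of(code) ?? fallbackLabel;
  }
  return result;
}

const regions = labelsOrDefaults(["en"], "region", { US: "US", FR: "FR" });
```

In a browser the same helper could be called with the later-added type "dateTimeField", e.g. `labelsOrDefaults(navigator.languages, "dateTimeField", { year: "Year", month: "Month", day: "Day" })`, so that table headers never end up empty.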
While the WebKit/JavaScriptCore team was implementing Intl.DisplayNames, we noticed that V8 and SpiderMonkey do not accept the _ separator in unicode_language_id, even though it is defined in UTS35 (https://unicode.org/reports/tr35/#unicode_language_id):
sep | = [-_] ;
V8
V8 version 8.5.153
d8> new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
(d8):1: RangeError: invalid_argument
SpiderMonkey
js> addIntlExtras(Intl)
js> new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
typein:2:70 RangeError: invalid value "en_US" for option language
Stack:
@typein:2:70
Is this behavior intentional, and does it need to be integrated into this spec? Or is it just an implementation issue?
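The same restriction shows up in Intl.getCanonicalLocales, since ECMA-402's structural validity check accepts only "-" as a separator. A caller that wants to accept both forms can normalize first; normalizeSeparators is a hypothetical helper, not something the spec provides.

```javascript
// Convert UTS #35 "_" separators to the "-" form that ECMA-402 accepts.
function normalizeSeparators(tag) {
  return tag.replace(/_/g, "-");
}

// Intl.getCanonicalLocales("en_US") throws a RangeError, while the
// normalized tag canonicalizes to "en-US".
const canonical = Intl.getCanonicalLocales(normalizeSeparators("en_US"));
```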
@littledan wrote in #11
This design decision still seems strange to me. It was never clarified why Array.prototype.map doesn't make sense for these sorts of cases. I think getCanonicalLocales is different, since a locale fallback list is a fundamental concept all over ECMA-402, whereas a list of things that you want to apply .of to is not.
@FrankYFTang replied
My concern with the "return a map" approach is the unnecessary waste of runtime memory and performance. Let's say a JavaScript engine knows the names of 150 languages, 240 regions and 90 scripts, and the application only needs 6 display names, for "en-US", "en-GB", "zh-Hant", "ja", "ja-Latn" and "ko". With the current approach, the app makes one call passing in [ "en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko"], and the JavaScript engine creates an array, inserts 6 strings and returns it. If we take the "return a map" approach, the JavaScript engine needs to create a Map and then create 150 x 240 x 90 x 2 = 6,480,000 strings and return it (x2 because you need a key and a value in the map). Even if the JavaScript engine only dealt with the 150 languages, it would still need to create 150 strings to insert into the returned map. Of course, this map can be cached per locale, but it will then still use runtime memory. And consider the runtime cost of creating these strings, most of which the caller (the app) won't care about.
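For comparison, the Array.prototype.map approach mentioned above keeps `of` single-valued and lets the caller batch exactly the lookups it needs, so only the requested strings are created. A minimal sketch using the shipped API:

```javascript
// Batch lookups with Array.prototype.map instead of an array-accepting `of`.
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });
const codes = ["en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko"];
const labels = codes.map((code) => languageNames.of(code));
```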
As currently specified, Intl.DisplayNames.prototype.of returns code in the input case for the "code" fallback. For example:
js> new Intl.DisplayNames("en", {type: "language", fallback: "code"}).of("BB")
"BB"
Returning the "code" fallback in the canonical case (*) for the requested type is probably a better choice.
(*) That means lower-case for language subtags, title-case for script subtags, and upper-case for region subtags and currency codes.
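The casing rule in the footnote can be sketched as a small helper. canonicalCaseForType is hypothetical and assumes `code` is a single, already well-formed subtag or currency code; it only adjusts letter case and does not replace aliases.

```javascript
// Hypothetical helper: canonical letter case for a "code" fallback,
// following the footnote above. Assumes a single well-formed subtag.
function canonicalCaseForType(type, code) {
  switch (type) {
    case "language":
      return code.toLowerCase();
    case "script":
      return code.charAt(0).toUpperCase() + code.slice(1).toLowerCase();
    case "region":
    case "currency":
      return code.toUpperCase();
    default:
      return code;
  }
}

const lang = canonicalCaseForType("language", "BB"); // "bb"
```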
This will make it easier for people to find high-level docs, and also might make it unnecessary to duplicate some of the description of motivation and scope (especially since some of this may change).
Internationalized websites already have large catalogs of strings that they need translated into all the languages the site supports. It's unclear what is different about these specific strings to make them handled differently than the site's existing workflow.
Currently, the proposed API focuses on retrieving a single language/region/script/locale name.
When designing the API for Mozilla internal use, I discussed it with the ECMA402 task group, and we came to the conclusion that in almost all cases we can imagine, the user will want to retrieve a large number of items at once.
For that reason, similarly to what we did with getCanonicalLocales, we recommended that the input argument be an array (or be ToArray-ed), rather than a single string.
I suggest we do the same here.
Currently, the value of code for type: "dayPeriod" is either "am" or "pm" as a string. There was discussion about moving this to the numbers 0 and 1, or 1 and 2. This issue tracks the discussion.