
proposal-intl-displaynames's Issues

Consider adding abbreviated length

It seems to me that DisplayNames currently doesn't handle an abbreviated length for the field.
Is there a reason we decided to skip it, or is it just an omission?

Removing display name of Script code

The current spec includes type: "script". After a careful study of usage on the web, I found that this feature is not really needed, and I cannot see a good use case for it. I propose we remove this support from the current spec if no one objects. This will help reduce the size impact of locale data. We can add it back later if someone shows a strong use case for it.

@gsathya @zbraniecki @littledan @sffc @jungshik @anab

Add "weekday" and "month" display names

I couldn't find an issue for that, so please, feel free to dupe this against it if there is one.

In the process we trimmed down the scope of this API, and I'd like to verify that the current intention is to later add weekday and month display names.

Suspicious handling of numeric code in Intl.DisplayNames.prototype.of()

Step 4 of Intl.DisplayNames.prototype.of(code) throws a TypeError when Type(code) is not String, Number, or Object. Step 5 invokes ToString(code) only when Type(code) is Object. So step 6 can be reached with a numeric code, but the IsValidCodeForDisplayNames() operation invoked by this step does not make sense for a numeric code: it is meaningless to ask whether a number (as opposed to its textual representation) matches a production rule. Moreover, the IsWellFormedCurrencyCode() operation states explicitly that its argument must be a String.

It seems that ToString(code) in step 5 of Intl.DisplayNames.prototype.of(code) should be invoked when Type(code) is Number as well. In fact, the combination of steps 4 and 5 looks a bit strange to me. Why does the specification take special care of certain types of code? Wouldn't it be simpler and more natural to convert code to a String using ToString(code) unconditionally?
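
The difference between the two conversion strategies can be sketched in plain JavaScript. This is only an illustration: the helper names convertAsSpecified and convertUnconditionally are hypothetical, and typeof only approximates the spec's Type() operation.

```javascript
// Hypothetical helpers; they approximate the spec steps, they are not spec text.

// Steps 4-5 as described above: reject other types, ToString only for Objects.
function convertAsSpecified(code) {
  const isObject =
    (typeof code === 'object' && code !== null) || typeof code === 'function';
  if (typeof code !== 'string' && typeof code !== 'number' && !isObject) {
    throw new TypeError('code must be a String, Number or Object');
  }
  // A Number falls through unchanged and reaches validation as a Number.
  return isObject ? `${code}` : code;
}

// The suggested alternative: always apply ToString.
function convertUnconditionally(code) {
  // Template literals apply ToString, which throws a TypeError for Symbols.
  return `${code}`;
}

console.log(typeof convertAsSpecified(419));     // "number"
console.log(typeof convertUnconditionally(419)); // "string"
```

With the unconditional variant, validation always sees a String, so operations such as IsWellFormedCurrencyCode() never receive a Number.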

Should we map the code if the type is script ?

Similar to #80, but under different constraints.
In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "script part" only, since the issue with region is different.

I have a concern about this. There is no standalone predefined process in UTS35 for this. The process for the script subtag within unicode_language_id, stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers, is part of the whole process. Also, there is currently only one entry in
https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalMetadata.xml

        <scriptAlias type="Qaai" replacement="Zinh" reason="deprecated"/>

Names of Months or Week days

Hi, I'm not sure if DisplayNames is the right place to ask this, or whether it is already supported by another Intl.x API. I believe that "Months" or "Weekdays" should be supported like this:

Actual Behaviour

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('monday')); // "Monday"

Nice to have Behaviour

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('months')); // ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
console.log(dateTimeNames.of('weekdays')); // ["Sunday", "Monday", ...]

OR

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime', style: 'short'});
console.log(dateTimeNames.of('months')); // ["Jan", "Feb", "Mar" ... ]
console.log(dateTimeNames.of('weekdays')); // ["Sun", "Mon", ...]
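
For comparison, lists like these can already be derived from Intl.DateTimeFormat. A minimal sketch, assuming an environment with English locale data; the helper name weekdayNames and the choice of reference dates are illustrative only:

```javascript
// Hypothetical helper: derive weekday names by formatting seven known dates.
function weekdayNames(locale, width) {
  const fmt = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: 'UTC' });
  const names = [];
  // 2020-01-05 was a Sunday, so days 5..11 cover Sunday through Saturday.
  for (let day = 5; day <= 11; day += 1) {
    names.push(fmt.format(new Date(Date.UTC(2020, 0, day))));
  }
  return names;
}

console.log(weekdayNames('en', 'long'));  // starts with "Sunday", "Monday"
console.log(weekdayNames('en', 'short')); // starts with "Sun", "Mon"
```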

The number of types stated in the Internal Slot should be "nine" instead of "six"

https://tc39.es/proposal-intl-displaynames/#sec-Intl.DisplayNames-internal-slots
The following text states that there are "six" display name types, but in reality there are "nine":

[[LocaleData]].[[<locale>]] must have a [[types]] field for all locale values locale. 
The value of this field must be a Record, which must have fields with the names of
 the six display name types: "language", "region", "script", "currency", "weekday", 
"month", "quarter", "dayPeriod", and "dateTimeField".

We should fix this.

Indexing month names for leap month

I was talking to @pedberg-icu yesterday about how to index month names for the purpose of the Intl.DisplayNames API. He suggested that it could make sense to add an optional second argument to the .of() method for type: "month" to indicate whether the month is a leap-month. Opening an issue to continue this discussion.

Consider removing the dateSymbol type

When type is set to "dateSymbol", the codes are things like the days of the week and months of the year. These are things:

  1. That rarely (never?) change
  2. Are not plentiful; there are only a handful of them
  3. Are constituents of date formatters

Because of 1 and 2, the case for having them in the API isn't very strong. Item 3 shows a major drawback to including them: their presence may entice uninformed developers to try to build their own date formatters, which would almost certainly be less correct and less performant than the date formatters already present in JavaScript.

Unit display name?

CLDR has display names for measurement units. Maybe expose them through Intl.DisplayNames?

Naming of dateSymbol and style

The proposal currently has the syntax

symbolNames = new Intl.DisplayNames(
  ['en'], {type: 'dateSymbol', style: 'short'});
symbolNames.of('saturday'); // => "Sa"

A few questions/comments:

  1. We usually use kebab-case instead of camelCase for string values. Would type: "date-symbol" be better?
  2. Intl.NumberFormat uses the word "display" instead of "style" for the width: currencyDisplay: "short". Would symbolDisplay: "short" or just display: "short" be better? Note: dateStyle/timeStyle uses the word "style".
  3. Is there a better name than "dateSymbol"? I don't have any suggestions right now. It's just that the word "symbol" makes me think of a symbol character, not a word.

Input conversion for language tags inconsistent with CanonicalizeLocaleList

Intl.DisplayNames.prototype.of was recently (#67) changed to use plain ToString to convert the input. This makes Intl.DisplayNames inconsistent with CanonicalizeLocaleList, which only allows String and Object inputs to avoid accepting NaN as "nan".

js> new Intl.DisplayNames("en", {type: "language"}).of(NaN)
"Min Nan Chinese"

But we didn't care about this case anymore when specifying Intl.Locale:

js> new Intl.Locale("und", {language: NaN}).toString()
"nan"

so maybe unconditionally calling ToString is okay. We should just make sure everybody is on board with this decision.

Evaluate the cost of capitalization rules

We currently don't have any capitalization related options exposed in any of our Intl APIs.

I don't know the exact reason for this decision, but IIRC it has something to do with potentially high payload required to provide proper capitalization across all supported locales.

If we add it here, it would be good to evaluate this cost again, and verify if we want to also add it to DateTimeFormat (weekDay standalone for example), RelativeTimeFormat etc.

Docs(MDN) : Intl.DisplayNames

Create Documentation for Intl.DisplayNames

MDN Pages :

  • prototype
  • of
  • options

Interactive Examples MDN :

  • Example of usage

Browser compat-data :

  • Browser compat

How to get non-Gregorian month names?

In calendar systems with month names that are not Gregorian month names, like Hebrew, how do you get the month names? We have the data already in Intl.DateTimeFormat:

new Date().toLocaleDateString("en-us-u-ca-hebrew", { month: "long" })
// "Tamuz"
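
Building on the snippet above, the full set of month names for a calendar can be approximated today by formatting a run of dates and collecting the distinct results. This is a sketch only: the helper name monthNames is hypothetical, the output depends on the locale data the engine ships, and an engine without the requested calendar falls back to its default.

```javascript
// Hypothetical helper: collect the distinct month names a calendar produces
// over roughly 13 months, in order of first appearance.
function monthNames(locale, calendar) {
  const fmt = new Intl.DateTimeFormat(`${locale}-u-ca-${calendar}`, {
    month: 'long',
    timeZone: 'UTC',
  });
  const names = [];
  const start = Date.UTC(2020, 0, 1);
  const weekMs = 7 * 24 * 60 * 60 * 1000;
  // Weekly steps are shorter than any month, so no month is skipped.
  for (let i = 0; i < 58; i += 1) {
    const name = fmt.format(new Date(start + i * weekMs));
    if (!names.includes(name)) names.push(name);
  }
  return names;
}

console.log(monthNames('en', 'hebrew'));
```

This workaround has no way to ask for one specific month by code, which is the gap the question points at.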

Canonicalise language, script, and region tags

Language, script, and region tags should be canonicalised, because

  1. This matches how ECMA-402 works for other APIs.
  2. ICU does this implicitly for some APIs, while other ICU APIs require canonicalised input to produce any result.

For example when new Intl.DisplayNames("en", {type: "language"}).of("de-DD") returns "German (Germany)", for consistency we should then ensure that new Intl.DisplayNames("en",{type: "region"}).of("DD") returns "Germany".
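
One way to experiment with this is to route a standalone region code through the existing locale canonicalisation. The helper canonicalRegion below is hypothetical, and whether deprecated subtags such as "DD" are actually replaced depends on the engine and its data, so no output is claimed for those lines.

```javascript
// Hypothetical helper: canonicalise a standalone region code by wrapping it
// in an "und" (undetermined language) tag and reading the region back.
function canonicalRegion(region) {
  return new Intl.Locale(`und-${region}`).region;
}

console.log(Intl.getCanonicalLocales('de-DD')); // result depends on alias data
console.log(canonicalRegion('DD'));             // result depends on alias data
console.log(canonicalRegion('us'));             // "US": case is canonicalised
```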

Error handling when there is no name for the code

@sffc wrote in #11

Do we really want to throw an error if data is not available, or just return null? If we return null, then we can also use that behavior when exporting a list.

Unless the spec explicitly lists which region codes have to be supported, for example, I do not like the idea of throwing an exception here, because then it means that the normal, expected way to call the function is to wrap it in a try-catch just in case the implementation does not have the needed data.
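
The call-site difference between the two options can be sketched as follows. Everything here is hypothetical: regionNames stands in for an implementation's locale data, and ofThrowing / ofNullable are illustrative names, not proposed API.

```javascript
// Hypothetical data table standing in for an implementation's locale data.
const regionNames = new Map([['US', 'United States'], ['DE', 'Germany']]);

// Variant A: throw when no name is available.
function ofThrowing(code) {
  if (!regionNames.has(code)) throw new RangeError(`no display name for ${code}`);
  return regionNames.get(code);
}

// Variant B: return null when no name is available.
function ofNullable(code) {
  return regionNames.has(code) ? regionNames.get(code) : null;
}

// Call sites: variant A needs try/catch, variant B only needs a default.
function labelA(code) {
  try {
    return ofThrowing(code);
  } catch (e) {
    return code;
  }
}
function labelB(code) {
  const name = ofNullable(code);
  return name === null ? code : name;
}

console.log(labelA('XX'), labelB('XX')); // XX XX
console.log(labelA('DE'), labelB('DE')); // Germany Germany
```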

Weekday or dayOfWeek?

Temporal uses "dayOfWeek" instead of "weekday" as the getter for the ISO 8601 weekday number. We should consider using consistent naming (either changing Temporal or Intl.DisplayNames). @pipobscure @gibson042

Should we map the code if the type is region ?

In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "region part" only, since the issue with script is different.

I have a concern about this (canonicalizing the region code). There is no predefined process in UTS35 for this. The process for the region subtag within unicode_language_id, stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers, depends on the language code (and the script code, if present), since multiple territories can be listed in the replacement attribute of territoryAlias.

Shorten codes for weekdays and month names?

The codes are "monday", "tuesday", ..., "january", "february", ...

Those are long and easy to mistype. They may also be less friendly to non-English speakers. Did you consider shortening them to their common 3-letter abbreviations?

Weekdays:

  • sun
  • mon
  • tue
  • wed
  • thu
  • fri
  • sat

Months:

  • jan
  • feb
  • mar
  • apr
  • may
  • jun
  • jul
  • aug
  • sep
  • oct
  • nov
  • dec

Editorial nits on fields/slots

  • Make sure to list [[Fields]] in your internal slot list
  • When looking up a record field by variable name, use the syntax _record_.[[<_fieldname_>]]

Separate Language from Locale

The current proposal specifies ofLanguage as accepting either a language or a full language tag (language-script-region).

I'm not sure why this decision was made, but it seems a bit counter-intuitive and an outlier.

Why not ofLocale for whole locale, and ofLanguage just for languages?

Change the "month" and "weekday" type to 0-based index from 1-based

Currently, ECMA262 defines 0-based, not 1-based, indices for month and weekday in
https://ecma-international.org/ecma-262/#sec-todatestring-day-names
and
https://ecma-international.org/ecma-262/#sec-todatestring-month-names

But https://tc39.es/proposal-intl-displaynames/#sec-isvalidweekdaycode
defines https://tc39.es/proposal-intl-displaynames/#table-validcodeforweekday
and the spec text refers to this table as

2. If weekday is listed in Table 1, return true.

We should remove this Table 1 and instead change the spec text to refer to Table 49 in ECMA262, thereby shifting to a 0-based index. This will make the spec cleaner and align it with ECMA262.

The same issue applies to month:
https://tc39.es/proposal-intl-displaynames/#sec-isvalidmonthcode
defines https://tc39.es/proposal-intl-displaynames/#table-validcodeformonth
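
To illustrate the mismatch, the sketch below compares the 0-based indices that Date already returns with a 1-based scheme. The converters are hypothetical, and the 1-based ordering used here (1 = Monday through 7 = Sunday, 1 = January through 12 = December) is an assumption made for illustration, not taken from the proposal's tables.

```javascript
// Date uses 0-based indices: getUTCDay() is 0 for Sunday,
// getUTCMonth() is 0 for January.
const date = new Date(Date.UTC(2020, 0, 5)); // Sunday, 5 January 2020

// Hypothetical converters to a 1-based scheme.
// Assumption: 1 = Monday ... 7 = Sunday, and 1 = January ... 12 = December.
function toOneBasedWeekday(zeroBasedDay) {
  return zeroBasedDay === 0 ? 7 : zeroBasedDay;
}
function toOneBasedMonth(zeroBasedMonth) {
  return zeroBasedMonth + 1;
}

console.log(date.getUTCDay(), toOneBasedWeekday(date.getUTCDay()));   // 0 7
console.log(date.getUTCMonth(), toOneBasedMonth(date.getUTCMonth())); // 0 1
```

With a 0-based scheme in the spec, values from getUTCDay() and getUTCMonth() could be passed through without such converters.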

@sffc @leobalter @rwaldron @zbraniecki @littledan @mbeck @ljharb @zenparsing

Does anyone know what I should put in the source file to refer to a table in ECMA262? Could someone give me an example?

Evaluate options for environments to opt-out of carrying the data necessary for this API

This API was originally designed to expose data that is already carried to support ECMA402 Edition 3.
Items like the names of months or weekdays in the Gregorian calendar are carried by all environments that implement Intl.DateTimeFormat.
Two other types of data are now optionally carried in potentially limited form: timezone names and currency names. DateTimeFormat can use a human-readable name such as America/Los_Angeles and US Dollar, or fall back on the code name GMT-7 and USD.

The new iteration of the API adds tables and columns that currently are not in use by ECMA402:

  • Language Display Names
  • Region Display Names
  • Script Display Names
  • Currency Display Names
  • Date Field Names (year, month, day, hour, second etc.)

and more are considered:

  • TimeZone names (#17)
  • Emoji names (#16)

I believe that this API in particular has the potential to continue increasing in size as new fields will be requested.

As the scope of the API increases, it brings back the concern raised by Apple a long time ago, and shared by Mozilla: as we increase the surface of ECMA402, the data package carried by any engine aiming to implement ECMA402 will grow.
While each of those data fields may not be large on its own, carrying data for ~100 locales, often with three or more styles (short/medium/long etc.), may make implementation impossible for certain environments (for example, those aimed at IoT or other low-capacity devices).

I'm not sure we have a good solution for enabling such an engine to implement the API in a way that is compatible with how users will use it, while retaining the ability to opt out of, or limit, the data fields it carries.

For example, just returning null for unavailable fields will likely lead to developers assuming that data carried by the most popular engine in the most common environment is available everywhere, and writing code such as:

let [yearTitle, monthTitle, dayTitle] = (new Intl.DisplayNames(navigator.locales, {type: "dateField"})).of(["year", "month", "day"]);
th1.textContent = yearTitle;
th2.textContent = monthTitle;
th3.textContent = dayTitle;

While this would work with currency, language and timezone, where the input code is a valid fallback output, I'm concerned that for dateFields, emojiNames, regionNames and dateSymbols we cannot fall back on the input as output.

If we don't design the API to encourage taking into account a scenario where the data is not present, developers will likely not consider it in their work.

I don't have a ready solution to the problem, but I believe we should consider it as part of this API design.

unicode_language_id's `_` separator and the implementations in the wild

While the WebKit/JavaScriptCore team was implementing Intl.DisplayNames, we noticed that V8 and SpiderMonkey do not accept the _ separator in unicode_language_id, even though it is defined in UTS35 (https://unicode.org/reports/tr35/#unicode_language_id):

sep | = [-_] ;

V8

V8 version 8.5.153
d8> new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
(d8):1: RangeError: invalid_argument

SpiderMonkey

js> addIntlExtras(Intl)
js>  new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
typein:2:70 RangeError: invalid value "en_US" for option language
Stack:
  @typein:2:70

Is this behavior intentional, and does it need to be integrated into this spec? Or is it just an implementation issue?
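
Until this is settled, callers holding identifiers that use the _ separator can normalise them before calling the API. A minimal sketch; the helper name normalizeSeparators is hypothetical:

```javascript
// Hypothetical helper: replace the UTS 35 "_" separator with "-" before
// handing a language identifier to an API that only accepts "-".
function normalizeSeparators(languageId) {
  return languageId.replace(/_/g, '-');
}

console.log(normalizeSeparators('en_US'));      // "en-US"
console.log(normalizeSeparators('zh_Hant_TW')); // "zh-Hant-TW"
```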

Should we change back to take one input string and return one output string?

@littledan wrote in #11

This design decision still seems strange to me. It was never clarified why Array.prototype.map doesn't make sense for these sorts of cases. I think getCanonicalLocales is different, since a locale fallback list is a fundamental concept all over ECMA-402, whereas a list of things that you want to apply .of to is not.

@FrankYFTang replied

My concern with the "return a map" approach is the unnecessary waste of runtime memory and performance. Let's say a JavaScript engine knows the names of 150 languages, 240 regions, and 90 scripts, and the application only needs 6 display names for "en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko". With the current approach, the app makes one call passing in [ "en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko"], and the JavaScript engine creates an array, inserts 6 strings, and returns it. If we take the "return a map" approach, the JavaScript engine needs to create a Map and then create 150 x 240 x 90 x 2 = 6,480,000 strings and return them (x 2 because each map entry needs a key and a value). Even if the JavaScript engine only deals with 150 languages, it still needs to create 150 strings to insert into the returned map. Of course, this map can be cached per locale, but it will still use runtime memory. Also consider the runtime cost of creating these strings, most of which the caller (the app) won't care about.
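
For reference, the Array.prototype.map alternative mentioned in the first comment could look like the sketch below for the same six codes. It assumes an engine that ships Intl.DisplayNames with a single-code .of(); the exact names returned depend on the engine's locale data, so none are shown.

```javascript
const codes = ['en-US', 'en-GB', 'zh-Hant', 'ja', 'ja-Latn', 'ko'];
const languageNames = new Intl.DisplayNames(['en'], { type: 'language' });

// One .of() call per code; only the six requested names are ever created.
const names = codes.map((code) => languageNames.of(code));
console.log(names);
```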

Change Intl.DisplayNames.prototype.of to return the "code" fallback in canonical case?

As currently spec'ed, Intl.DisplayNames.prototype.of returns code in the input case for the "code" fallback. For example:

js> new Intl.DisplayNames("en", {type: "language", fallback: "code"}).of("BB")
"BB"

Returning the "code" fallback in the canonical case (*) for the requested type is probably a better choice.

(*) That means lower-case for language subtags, title-case for script subtags, and upper-case for region subtags and currency codes.
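
The casing rules in (*) can be sketched as a small helper. The name canonicalCase is hypothetical, and the sketch assumes single-subtag codes only; a full language tag such as "en-US" would need per-subtag handling.

```javascript
// Hypothetical helper, not spec text: put a single-subtag code into the
// canonical case for its type.
function canonicalCase(type, code) {
  switch (type) {
    case 'language':
      return code.toLowerCase();
    case 'script':
      return code.charAt(0).toUpperCase() + code.slice(1).toLowerCase();
    case 'region':
    case 'currency':
      return code.toUpperCase();
    default:
      return code;
  }
}

console.log(canonicalCase('language', 'BB'));  // "bb"
console.log(canonicalCase('script', 'lATN'));  // "Latn"
console.log(canonicalCase('region', 'us'));    // "US"
console.log(canonicalCase('currency', 'usd')); // "USD"
```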

minor: link to explainer from spec introduction

This will make it easier for people to find high-level docs, and also might make it unnecessary to duplicate some of the description of motivation and scope (especially since some of this may change).

Accept lists of values on input

Currently, the proposed API focuses on retrieving a single language/region/script/locale name.

When designing the API for Mozilla internal use, I discussed it with the ECMA402 task group, and we came to the conclusion that in almost all cases we can imagine, the user will want to retrieve a large number of items at once.

For that reason, similarly to what we did with getCanonicalLocales, we recommended that the input argument be an array (or be ToArrayed), rather than a single string.

I suggest we do the same here.

Values of code for type: "dayPeriod"

Currently, the value of code for type: "dayPeriod" is either "am" or "pm" as a string. There was discussion about moving this to the numbers 0 and 1, or 1 and 2. This issue tracks the discussion.
