tc39 / proposal-intl-displaynames

Get localized display names for languages, scripts, regions and others. https://tc39.github.io/proposal-intl-displaynames/

License: MIT License

HTML 99.21% Shell 0.68% HCL 0.11%

proposal-intl-displaynames's Issues

Consider adding abbreviated length

It seems to me that at the moment DisplayNames doesn't handle abbreviated length of the field.
Is there a reason we decided to skip it, or is it just an omission?

Removing display name of Script code

The current spec includes type: "script". After careful study of the usage on the web, I found that this feature is not really needed, and I cannot see a good use case for it. I propose we remove this support from the current spec if no one objects. This will help us reduce the size impact of locale data. We can add it back later if someone shows a strong use case for it.

@gsathya @zbraniecki @littledan @sffc @jungshik @anab

Add "weekday" and "month" display names

I couldn't find an issue for that, so please, feel free to dupe this against it if there is one.

In the process we trimmed down the scope of this API, and I'd like to verify that the current intention is to later add weekday and month display names.

Support calendar display names

The Intl.DateTimeFormat API supports a calendar option, so we should provide display names for these calendars.

Suspicious handling of numeric code in Intl.DisplayNames.prototype.of()

TypeError should be thrown by step 4. of Intl.DisplayNames.prototype.of(code) when Type(code) is not String, Number or Object. Step 5. invokes ToString(code) when Type(code) is Object only. So, step 6. can be reached by a numeric code but IsValidCodeForDisplayNames() operation invoked by this step does not make sense for a numeric code. It does not make sense to ask whether some number (unlike its textual representation) matches some production rule. Moreover, IsWellFormedCurrencyCode() operation states explicitly that its argument must be string.

It seems that ToString(code) in step 5. of Intl.DisplayNames.prototype.of(code) should be invoked when Type(code) is Number as well. In fact, the combination of steps 4. and 5. looks a bit strange to me. Why does the specification take special care of some types of code? Wouldn't it be simpler and more natural to convert code to String using ToString(code) unconditionally?
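
The unconditional conversion suggested above can be sketched in user code as follows. This is only an illustration, assuming an engine that already ships Intl.DisplayNames; the helper name `displayNameOf` is hypothetical and not part of the proposal.

```javascript
// Hypothetical helper: always convert the code to a string first,
// mirroring an unconditional ToString(code) step.
function displayNameOf(displayNames, code) {
  return displayNames.of(String(code));
}

const regionNames = new Intl.DisplayNames(["en"], { type: "region" });

// A numeric region code and its textual form are looked up identically.
const fromNumber = displayNameOf(regionNames, 419);
const fromString = displayNameOf(regionNames, "419");
console.log(fromNumber === fromString); // true
```

With this shape, the validity check only ever sees a string, so it never has to decide whether a Number matches a production rule.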

Should we map the code if the type is script ?

Similar to #80, but under different constraints.
In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "script part" only, since the issue with region is different.

I have a concern about this. There is no standalone predefined process in UTS35 for this. The process for the script subtag within unicode_language_id, stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers, is part of the whole process. Also, there is currently only one entry in
https://github.com/unicode-org/cldr/blob/master/common/supplemental/supplementalMetadata.xml

        <scriptAlias type="Qaai" replacement="Zinh" reason="deprecated"/>

Names of Months or Week days

Hi, I'm not sure whether DisplayNames is the right place to ask this, or whether it is already supported by another Intl.x API. I believe that "Months" or "Weekdays" should be supported like this:

Actual Behaviour

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('monday')); // "Monday"

Nice to have Behaviour

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime'});
console.log(dateTimeNames.of('months')); // ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
console.log(dateTimeNames.of('weekdays')); // ["Sunday", "Monday", ...]

var dateTimeNames = new Intl.DisplayNames(['en'], {type: 'dateTime', style: 'short'});
console.log(dateTimeNames.of('months')); // ["Jan", "Feb", "Mar" ... ]
console.log(dateTimeNames.of('weekdays')); // ["Sun", "Mon", ...]
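
For comparison, lists like the ones requested above can already be derived from Intl.DateTimeFormat. The following is a rough sketch, not part of the proposal; the function names `monthNames` and `weekdayNames` are hypothetical, and the results are the names as they appear in formatted dates, which in some locales differ from standalone names.

```javascript
// Sketch: derive Gregorian month names by formatting the first day of each month.
function monthNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { month: width, timeZone: "UTC" });
  return Array.from({ length: 12 }, (_, m) =>
    fmt.format(new Date(Date.UTC(2021, m, 1))));
}

// Sketch: derive weekday names by formatting seven consecutive days.
function weekdayNames(locale, width = "long") {
  const fmt = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: "UTC" });
  // 2021-08-01 was a Sunday, so these seven days run Sunday to Saturday.
  return Array.from({ length: 7 }, (_, d) =>
    fmt.format(new Date(Date.UTC(2021, 7, 1 + d))));
}

console.log(monthNames("en"));            // starts with "January"
console.log(weekdayNames("en", "short")); // starts with "Sun"
```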

Supporting TimeZone names in Intl.DisplayNames API

This is a spin-off from #8. I think it is better to have one request per issue to make the discussion easier.

The number of types stated in the Internal Slot should be "nine" instead of "six"

https://tc39.es/proposal-intl-displaynames/#sec-Intl.DisplayNames-internal-slots
The following text states that there are "six" display name types, but in reality there are "nine":

[[LocaleData]].[[<locale>]] must have a [[types]] field for all locale values locale. 
The value of this field must be a Record, which must have fields with the names of
 the six display name types: "language", "region", "script", "currency", "weekday", 
"month", "quarter", "dayPeriod", and "dateTimeField".

We should fix this.

Indexing month names for leap month

I was talking to @pedberg-icu yesterday about how to index month names for the purpose of the Intl.DisplayNames API. He suggested that it could make sense to add an optional second argument to the .of() method for type: "month" to indicate whether the month is a leap-month. Opening an issue to continue this discussion.

Consider removing the dateSymbol type

When type is set to "dateSymbol", the codes are things like the days of the week and months of the year. These are things that:

1. Rarely (never?) change
2. Are not plentiful; there are only a handful of them
3. Are constituents of date formatters

Because of 1 and 2, the case for having them in the API isn't very strong. Item 3 shows a major drawback to including them: their presence may entice uninformed developers to try to build their own date formatters, which would almost certainly be less correct and less performant than the date formatters already present in JavaScript.

Add applicable options to the "Conformance" section

https://tc39.es/ecma402/#conformance lists options which are allowed to have implementation-defined behaviour. For Intl.DisplayNames, the "style" and "type" options should probably be listed there. I'm not yet sure if it makes sense to allow implementation-defined behaviour for the "fallback" option, because I don't exactly know what other kind of fallback should happen here.

Unit display name?

CLDR has display names for measurement units. Maybe expose them through Intl.DisplayNames?

Casing of enums

Is there a reason the type field does not follow the recommendations of https://w3ctag.github.io/design-principles/#casing-rules? Does JavaScript have different rules?

Describe schema for locale data

Most Intl types have a section under internal slots of the constructor that describes how the locale data is organized. E.g., see this section for DateTimeFormat. I don't see this description for DisplayNames.

Naming of dateSymbol and style

The proposal currently has the syntax

symbolNames = new Intl.DisplayNames(
  ['en'], {type: 'dateSymbol', style: 'short'});
symbolNames.of('saturday'); // => "Sa"

A few questions/comments:

We usually use kebab-case instead of camelCase for string values. Would type: "date-symbol" be better?
Intl.NumberFormat uses the word "display" instead of "style" for the width: currencyDisplay: "short". Would symbolDisplay: "short" or just display: "short" be better? Note: dateStyle/timeStyle uses the word "style".
Is there a better name than "dateSymbol"? I don't have any suggestions right now. It's just that the word "symbol" makes me think of a symbol character, not a word.

Stage 3 review

Reviewers

@littledan
more reviewers to be found async

Editors

Input conversion for language tags inconsistent with CanonicalizeLocaleList

Intl.DisplayNames.prototype.of was recently (#67) changed to use plain ToString to convert the input. This makes Intl.DisplayNames inconsistent with CanonicalizeLocaleList, which only allows String and Object inputs to avoid accepting NaN as "nan".

js> new Intl.DisplayNames("en", {type: "language"}).of(NaN)
"Min Nan Chinese"

But we didn't care about this case anymore when specifying Intl.Locale:

js> new Intl.Locale("und", {language: NaN}).toString()
"nan"

so maybe unconditionally calling ToString is okay. We should just make sure everybody is on board with this decision.
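
For callers who want the stricter CanonicalizeLocaleList-style behaviour regardless of what the spec decides, a guard can be layered on top. This is a minimal sketch; the function name `strictLanguageCode` is hypothetical.

```javascript
// Hypothetical guard: accept only String or Object inputs, as
// CanonicalizeLocaleList does, so that NaN is never read as "nan".
function strictLanguageCode(code) {
  const isObject =
    (typeof code === "object" && code !== null) || typeof code === "function";
  if (typeof code !== "string" && !isObject) {
    throw new TypeError("language code must be a String or an Object");
  }
  return String(code);
}

console.log(strictLanguageCode("nan")); // "nan"
// strictLanguageCode(NaN) throws a TypeError instead of returning "NaN".
```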

Evaluate the cost of capitalization rules

We currently don't have any capitalization related options exposed in any of our Intl APIs.

I don't know the exact reason for this decision, but IIRC it has something to do with potentially high payload required to provide proper capitalization across all supported locales.

If we add it here, it would be good to evaluate this cost again, and verify if we want to also add it to DateTimeFormat (weekDay standalone for example), RelativeTimeFormat etc.

Script and variants info

The proposal as is includes regions, locales, and language display names. Would it include scripts and variants too?

https://github.com/unicode-cldr/cldr-localenames-modern/tree/master/main/en

back reference: brawer/proposal-intl-displaynames#6

Update README for new dateTime types

See #48 and #49

Round-trip conversion — eg displayNames.for(‹name›, ‹locale›)

I may have missed it, but it might be worth considering to offer a convenience method(s) for going from specific locales other than English.

This could roughly take the form of displayNames.for(‹name›, ‹locale›) with more or less verbose parameters where applicable.

Need to add Oxford comma after "Number"

@ljharb wrote:
https://tc39.es/proposal-intl-displaynames/#sec-Intl.DisplayNames.prototype.of step 4 needs an Oxford comma after "Number"

Why is locales an array?

It would be good to explain this in the README.

Supporting Emoji names in Intl.DisplayNames API

This is a spin-off from #8. I think it is better to have one request per issue to make the discussion easier.

Docs(MDN) : Intl.DisplayNames

Create Documentation for Intl.DisplayNames

Review Readme documentation and examples
Create MDN Main Docs Page

MDN Pages :

prototype
of
options

Interactive Examples MDN :

Example of usage

Browser compat-data :

Browser compat

Need to add calendar option to get different display names of type: "month", "weekday", "quarter" and "dayPeriod"

I just noticed that CLDR (and therefore ICU) has different display names for "month", "weekday", "quarter" and "dayPeriod" depending on the "calendar".

Therefore, we should:

accept a "calendar" option in the Intl.DisplayNames constructor if the type is "month", "weekday", "quarter" or "dayPeriod"
set the internal fields according to the "calendar" option.

How to get non-Gregorian month names?

In calendar systems with month names that are not Gregorian month names, like Hebrew, how do you get the month names? We have the data already in Intl.DateTimeFormat:

new Date().toLocaleDateString("en-us-u-ca-hebrew", { month: "long" })
// "Tamuz"

Canonicalise language, script, and region tags

Language, script, and region tags should be canonicalised, because

This matches how ECMA-402 works for other APIs.
ICU does this implicitly for some APIs, while other ICU APIs require a canonicalised input to produce any result.

For example when new Intl.DisplayNames("en", {type: "language"}).of("de-DD") returns "German (Germany)", for consistency we should then ensure that new Intl.DisplayNames("en",{type: "region"}).of("DD") returns "Germany".
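
One way to experiment with region canonicalisation in user code is to route the subtag through Intl.Locale. This is only a sketch, assuming an engine that ships Intl.Locale; the helper name `canonicalRegion` is hypothetical. Case canonicalisation is reliable, but whether a deprecated subtag such as "DD" is replaced with its preferred value depends on the engine and its locale data.

```javascript
// Hypothetical helper: canonicalise a region subtag via Intl.Locale.
function canonicalRegion(region) {
  // "und" is the undetermined language; only the region subtag matters here.
  return new Intl.Locale(`und-${region}`).region;
}

console.log(canonicalRegion("us")); // "US"
console.log(canonicalRegion("DD")); // engine-dependent; may be replaced with its preferred value
```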

Error handling while there are no name for the code

@sffc wrote in #11

Do we really want to throw an error if data is not available, or just return null? If we return null, then we can also use that behavior when exporting a list.

Unless the spec explicitly lists which region codes have to be supported, for example, I do not like the idea of throwing an exception here, because then it means that the normal, expected way to call the function is to wrap it in a try-catch just in case the implementation does not have the needed data.
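
A non-throwing design lets callers handle missing data without try-catch. The sketch below assumes the `fallback: "none"` option (which appears in examples elsewhere in these issues) makes of() return undefined when no name is available; the helper name `nameOrDefault` is hypothetical.

```javascript
// Hypothetical helper: substitute a caller-supplied default when of()
// yields undefined because the implementation has no data for the code.
function nameOrDefault(displayNames, code, defaultValue) {
  const name = displayNames.of(code);
  return name === undefined ? defaultValue : name;
}

const regions = new Intl.DisplayNames(["en"], { type: "region", fallback: "none" });
console.log(nameOrDefault(regions, "US", "(unknown region)"));
```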

Should "type" be changed to a mandatory option instead of defaulting it to "language"?

Unless we envision Intl.DisplayNames to be primarily used to retrieve the localised name for language tags, it may make sense to change "type" to a mandatory option instead of defaulting it to "language" in the Intl.DisplayNames constructor.

Weekday or dayOfWeek?

Temporal uses "dayOfWeek" instead of "weekday" as the getter for the ISO 8601 weekday number. We should consider using consistent naming (either changing Temporal or Intl.DisplayNames). @pipobscure @gibson042

Should we map the code if the type is region ?

In #77 (comment), @anba suggested:
"Region (and scripts) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks the "region part" only, since the issue with script is different.

I have a concern about canonicalizing the region code. There is no predefined process in UTS35 for this. The process for the region subtag within unicode_language_id, stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers, depends on the language code (and the script code, if present), because multiple territories can be listed in the replacement attribute of territoryAlias.

Dialect support?

https://unicode.org/reports/tr35/tr35-general.html#Display_Name_Elements

example nl-BE = "Flemish", not just "Dutch (Belgium)"

Intl.DisplayNames payload

An analysis of a payload cost for SpiderMonkey to ship this API - https://bugzilla.mozilla.org/show_bug.cgi?id=1557727#c7

Shorten codes for weekdays and month names?

The codes are "monday", "tuesday", ..., "january", "february", ...

Those are long and easy to mistype. They may also be less friendly to non-English speakers. Did you consider shortening them to their common 3-letter abbreviations?

Weekdays:

Months:

Support name of the Numbering System

Several Intl APIs accept numberingSystem as an option, so we should provide display names for these numbering systems.

Editorial nits on fields/slots

Make sure to list [[Fields]] in your internal slot list
When looking up a record field by variable name, use the syntax _record_.[[<_fieldname_>]]

Possible additional strings to expose

Currency? We already have this due to NumberFormat

Time zone names? We already have this data due to DateTimeFormat

Emoji names? Widely requested, cc @nolanlawson

Separate Language from Locale

The current proposal specifies ofLanguage as accepting either a language or a full language tag (language-script-region).

I'm not sure why this decision has been made, but it seems a bit counter-intuitive and an outlier.

Why not ofLocale for whole locale, and ofLanguage just for languages?

Change the "month" and "weekday" type to 0-based index from 1-based

Currently, ECMA262 defines a 0-based, not 1-based, index for month and weekday in
https://ecma-international.org/ecma-262/#sec-todatestring-day-names
and
https://ecma-international.org/ecma-262/#sec-todatestring-month-names

But https://tc39.es/proposal-intl-displaynames/#sec-isvalidweekdaycode
defines https://tc39.es/proposal-intl-displaynames/#table-validcodeforweekday
and the spec text refers to this table as

2. If weekday is listed in Table 1, return true.

We should remove this Table 1 and instead change the spec text to refer to Table 49 in ECMA262, and therefore shift to a 0-based index. This will make the spec cleaner and align it with ECMA262.

The same issue applies to month:
in https://tc39.es/proposal-intl-displaynames/#sec-isvalidmonthcode
we define https://tc39.es/proposal-intl-displaynames/#table-validcodeformonth

@sffc @leobalter @rwaldron @zbraniecki @littledan @mbeck @ljharb @zenparsing

Does anyone know what I should put into the source file to refer to a table in ECMA262? Could someone give me an example?

Evaluate options for environments to opt-out of carrying the data necessary for this API

This API was originally designed to expose data that is already carried to support ECMA402 Edition 3.
Items like the names of months or weekdays in the Gregorian calendar are carried by all environments that implement Intl.DateTimeFormat.
Two other types of data are now optionally carried in potentially limited form: time zone names and currency names. DateTimeFormat can use a human-readable name such as America/Los_Angeles or US Dollar, or fall back on a code such as GMT-7 or USD.

The new iteration of the API adds tables and columns that currently are not in use by ECMA402:

Language Display Names
Region Display Names
Script Display Names
Currency Display Names
Date Field Names (year, month, day, hour, second etc.)

and more are considered:

TimeZone names (#17)
Emoji names (#16)

I believe that this API in particular has the potential to continue increasing in size as new fields will be requested.

As the scope of the API increases, it brings back the concern raised by Apple a long time ago, and shared by Mozilla: as we increase the surface of ECMA402, the data package carried by an engine aiming to implement ECMA402 will keep growing.
While each of those data fields may not be large on its own, carrying data for ~100 locales, often with three or more styles (short/medium/long etc.), may make the API impossible to ship for certain implementations (for example, those targeting IoT or other low-capacity environments).

I'm not sure we have a good solution for how to enable such an engine to implement the API in a way that is compatible with how users will use it, while retaining the ability to opt out of or limit the data fields it carries.

For example, just returning null for unavailable fields will likely lead developers to assume that the data carried by the most popular engine in the most common environment is available everywhere, and to write code such as:

let {yearTitle, monthTitle, dayTitle} = (new Intl.DisplayNames(navigator.locales, {type: "dateField"})).of(["year", "month", "day"]);
th1.textContent = yearTitle;
th2.textContent = monthTitle;
th3.textContent = dayTitle;

While this would work with currency, language and timezone, where the input code is a valid fallback output, I'm concerned that for dateFields, emojiNames, regionNames and dateSymbols, we do not have the ability to fall back on the input as output.

If we don't design the API to encourage taking into account a scenario where the data is not present, developers will likely not consider it in their work.

I don't have a ready solution to the problem, but I believe we should consider it as part of this API design.
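
To illustrate the kind of defensive code the API would need to encourage, here is a minimal sketch that treats the data as optional. The helper name `fieldLabel` and the caller-supplied fallback label are hypothetical; the sketch assumes a "dateTimeField" type and a `fallback: "none"` option, both of which are mentioned elsewhere in these issues but may not be available in every engine.

```javascript
// Hypothetical helper: label a date field, falling back to a string the
// application ships itself when the engine cannot provide one.
function fieldLabel(locales, field, fallbackLabel) {
  if (typeof Intl === "undefined" || typeof Intl.DisplayNames !== "function") {
    return fallbackLabel; // the API is not present at all
  }
  try {
    const names = new Intl.DisplayNames(locales, {
      type: "dateTimeField",
      fallback: "none",
    });
    const label = names.of(field);
    return label === undefined ? fallbackLabel : label;
  } catch (e) {
    // For example, a RangeError when the engine does not support this type.
    return fallbackLabel;
  }
}

console.log(fieldLabel(["en"], "year", "Year"));
```

The point of the sketch is that the application must still carry its own fallback strings, which is exactly the cost the issue above is worried developers will not anticipate.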

unicode_language_id's `_` separator and the implementations in the wild

While the WebKit/JavaScriptCore team was implementing Intl.DisplayNames, we noticed that V8 and SpiderMonkey do not accept the _ separator in unicode_language_id, even though it is defined in UTS35 (https://unicode.org/reports/tr35/#unicode_language_id):

sep | = [-_] ;

V8 version 8.5.153
d8> new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
(d8):1: RangeError: invalid_argument

SpiderMonkey

js> addIntlExtras(Intl)
js>  new Intl.DisplayNames(['en'], {type: 'language', fallback: 'none'}).of('en_US')
typein:2:70 RangeError: invalid value "en_US" for option language
Stack:
  @typein:2:70

Is this behavior intentional, and does it need to be integrated into this spec? Or is it just an implementation issue?
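
Until the question is settled, callers holding underscore-separated identifiers can normalise them before the lookup. A minimal sketch; the helper name `toHyphenated` is hypothetical.

```javascript
// Hypothetical helper: replace UTS35's alternative "_" separator with "-".
function toHyphenated(code) {
  return code.replace(/_/g, "-");
}

const langNames = new Intl.DisplayNames(["en"], { type: "language", fallback: "none" });
console.log(toHyphenated("en_US"));               // "en-US"
console.log(langNames.of(toHyphenated("en_US"))); // no RangeError for the hyphenated form
```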

Should we change back to take one input string and return one output string?

@littledan wrote in #11

This design decision still seems strange to me. It was never clarified why Array.prototype.map doesn't make sense for these sorts of cases. I think getCanonicalLocales is different, since a locale fallback list is a fundamental concept all over ECMA-402, whereas a list of things that you want to apply .of to is not.

@FrankYFTang replied

My concern with the "return a map" approach is the unnecessary waste of run-time memory and performance. Let's say a JavaScript engine knows the names of 150 languages, 240 regions and 90 scripts, and the application only needs 6 display names, for "en-US", "en-GB", "zh-Hant", "ja", "ja-Latn" and "ko". With the current approach, the app makes one call passing in ["en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko"], and the JavaScript engine creates an array, inserts 6 strings and returns it. If we take the "return a map" approach, the JavaScript engine needs to create a Map and then create 150 x 240 x 90 x 2 = 6,480,000 strings and return them (x 2 because each map entry needs a key and a value). Even if the JavaScript engine only deals with 150 languages, it still needs to create 150 strings to insert into the returned map. Of course, this map can be cached per locale, but it will still use run-time memory. Also consider the run-time cost of creating these strings, most of which the caller (the app) won't care about.
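
For reference, the single-string alternative raised in the first comment composes with Array.prototype.map as follows. This is a sketch assuming of() takes one code and returns one string; only the six requested names are ever created.

```javascript
const codes = ["en-US", "en-GB", "zh-Hant", "ja", "ja-Latn", "ko"];
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });

// One lookup per requested code; nothing is created for unrequested codes.
const names = codes.map((code) => languageNames.of(code));
console.log(names.length); // 6
```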

Change Intl.DisplayNames.prototype.of to return the "code" fallback in canonical case?

As currently spec'ed, Intl.DisplayNames.prototype.of returns code in the input case for the "code" fallback. For example:

js> new Intl.DisplayNames("en", {type: "language", fallback: "code"}).of("BB")
"BB"

Returning the "code" fallback in the canonical case (*) for the requested type is probably a better choice.

(*) That means lower-case for language subtags, title-case for script subtags, and upper-case for region subtags and currency codes.
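
The casing rules in the footnote can be sketched as a small function. This is an illustration only: the name `canonicalCase` is hypothetical, and for type "language" the sketch handles a single language subtag rather than a full language tag.

```javascript
// Hypothetical helper: return a code in the canonical case for its type.
function canonicalCase(type, code) {
  switch (type) {
    case "language": // single language subtag only, e.g. "bb"
      return code.toLowerCase();
    case "script":   // title-case, e.g. "Latn"
      return code.charAt(0).toUpperCase() + code.slice(1).toLowerCase();
    case "region":   // upper-case, e.g. "BB"
    case "currency": // upper-case, e.g. "USD"
      return code.toUpperCase();
    default:
      throw new RangeError(`unsupported type: ${type}`);
  }
}

console.log(canonicalCase("language", "BB")); // "bb"
console.log(canonicalCase("script", "LATN")); // "Latn"
console.log(canonicalCase("region", "bb"));   // "BB"
```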

minor: link to explainer from spec introduction

This will make it easier for people to find high-level docs, and also might make it unnecessary to duplicate some of the description of motivation and scope (especially since some of this may change).

Unclear why these strings shouldn't be handled by the website's existing internationalization workflow

Internationalized websites already have large catalogs of strings that they need translated into all the languages the site supports. It's unclear what is different about these specific strings to make them handled differently than the site's existing workflow.

Accept lists of values on input

Currently, the proposed API focuses on retrieving a single language/region/script/locale name.

When designing the API for Mozilla internal use, I discussed it with the ECMA402 task group, and we came to the conclusion that in almost all cases we can imagine, the user will want to retrieve a large number of items at once.

For that reason, similarly to what we did with getCanonicalLocales we recommended the input argument to be an array (or be ToArrayed), rather than a single string.

I suggest we do the same here.

Values of code for type: "dayPeriod"

Currently, the value of code for type: "dayPeriod" is either "am" or "pm" as a string. There was discussion about moving this to the numbers 0 and 1, or 1 and 2. This issue tracks that discussion.
