
fluent's Introduction

Fluent

Fluent is a localization system designed to unleash the expressive power of natural language.

This repository contains the specification, the reference implementation of the parser and the documentation for Fluent.

Fluent Syntax (FTL)

FTL is the syntax for describing translation resources in Project Fluent. FTL stands for Fluent Translation List. Read the Fluent Syntax Guide to get started learning Fluent.

The syntax/ directory contains the reference implementation of the syntax as an LL(infinity) parser.

The spec/ directory contains the formal EBNF grammar, autogenerated from the reference implementation.

Development

While working on the reference parser, use the following commands to test and validate your work:

npm test                   # Test the parser against JSON AST fixtures.
npm run lint               # Lint the parser code.

npm run generate:ebnf      # Generate the EBNF from syntax/grammar.js.
npm run generate:fixtures  # Generate test fixtures (FTL → JSON AST).

npm run build:guide        # Build the HTML version of the Guide.

npm run bench              # Run the performance benchmark on large FTL.

Other Implementations

This repository contains the reference implementation of the parser. Other implementations exist which should be preferred for use in production and in tooling.

We also know about the following community-driven implementations:

Learn More and Discuss

Find out more about Project Fluent at projectfluent.org and discuss the future of Fluent at Mozilla Discourse.

fluent's People

Contributors

alabamenhu, alerque, be-we, cimbali, danilobuerger, demivan, dependabot[bot], eemeli, flodolo, glendc, guest20, hkasemir, joycebabu, jrmajor, koivunej, kreibaum, lus, mailaender, mathjazz, mgol, missmatsuko, msujaws, pike, robintown, spagy, spookylukey, stasm, stoyandimitrov, willfarrell, zbraniecki

fluent's Issues

Dynamic message references

It is sometimes desired to parametrize message references in placeables. In this issue I'd like to propose a new argument type, extending FluentType which could be used to programmatically pass message references as arguments to messages.

Problem Statement

Redundancy is considered good for localization. It allows localizers to tailor the wording and the grammar of the translation of each particular case. Also see Fluent Good Practices.

In general, the pattern of having one message per item is preferred over factoring the action out to its own message (Delete This { $item }) and passing the translated item in some way.

# Having two separate messages allows localizers
# to customize translations in each, if needed.
delete-picture = Delete This Picture
delete-video = Delete This Video

In some cases, however, this pattern doesn't scale well.

Consider this example from Firefox (source):

# %S is the website origin (e.g. www.mozilla.org)
getUserMedia.sharingMenuCamera = %S (camera)
getUserMedia.sharingMenuMicrophone = %S (microphone)
getUserMedia.sharingMenuAudioCapture = %S (tab audio)
getUserMedia.sharingMenuApplication = %S (application)
getUserMedia.sharingMenuScreen = %S (screen)
getUserMedia.sharingMenuWindow = %S (window)
getUserMedia.sharingMenuBrowser = %S (tab)
getUserMedia.sharingMenuCameraMicrophone = %S (camera and microphone)
getUserMedia.sharingMenuCameraMicrophoneApplication = %S (camera, microphone and application)
getUserMedia.sharingMenuCameraMicrophoneScreen = %S (camera, microphone and screen)
getUserMedia.sharingMenuCameraMicrophoneWindow = %S (camera, microphone and window)
getUserMedia.sharingMenuCameraMicrophoneBrowser = %S (camera, microphone and tab)
getUserMedia.sharingMenuCameraAudioCapture = %S (camera and tab audio)
getUserMedia.sharingMenuCameraAudioCaptureApplication = %S (camera, tab audio and application)
getUserMedia.sharingMenuCameraAudioCaptureScreen = %S (camera, tab audio and screen)
getUserMedia.sharingMenuCameraAudioCaptureWindow = %S (camera, tab audio and window)
getUserMedia.sharingMenuCameraAudioCaptureBrowser = %S (camera, tab audio and tab)
getUserMedia.sharingMenuCameraApplication = %S (camera and application)
getUserMedia.sharingMenuCameraScreen = %S (camera and screen)
getUserMedia.sharingMenuCameraWindow = %S (camera and window)
getUserMedia.sharingMenuCameraBrowser = %S (camera and tab)
getUserMedia.sharingMenuMicrophoneApplication = %S (microphone and application)
getUserMedia.sharingMenuMicrophoneScreen = %S (microphone and screen)
getUserMedia.sharingMenuMicrophoneWindow = %S (microphone and window)
getUserMedia.sharingMenuMicrophoneBrowser = %S (microphone and tab)
getUserMedia.sharingMenuAudioCaptureApplication = %S (tab audio and application)
getUserMedia.sharingMenuAudioCaptureScreen = %S (tab audio and screen)
getUserMedia.sharingMenuAudioCaptureWindow = %S (tab audio and window)
getUserMedia.sharingMenuAudioCaptureBrowser = %S (tab audio and tab)

Or the use-case @cruelbob gives in #79 (comment):

Collect meat from cows, pigs and sheep.

One of my favorite games, Heroes of Might and Magic III, pits armies consisting of over 140 different unit types in battles against each other. After every move, the battle log reads:

The Bone Dragon does 46 damage. 2 Griffins perish.

Or:

The Cyclops Kings do 233 damage. One Giant perishes.

If we wanted to avoid concatenation of sentences (two sentences per creature: one for do X damage and one for X creatures perish), we'd end up with 141² = 19,881 different permutations of creature pairs.

This doesn't scale well.

Proposed Solution

Introducing some redundancy should still be preferred for small sets of items. For large sets leading to lots and lots of permutations, it should be possible to parametrize the translation of placeables.

I'll use the example of HoMM3 because the other two also require the List Formatting feature to make sense.

I'd like to make it possible to pass external arguments which resolve to message references. Given the following FTL:

-creature-bone-dragon =
    {
       *[singular] Bone Dragon
        [plural] Bone Dragons
    }
-creature-griffin =
    {
       *[singular] Griffin
        [plural] Griffins
    }

# … Hundreds more …

battle-log-attack-perish =
    { $attacker_count ->
        [one] The { $attacker_name[singular] } does
       *[other] The { $attacker_name[plural] } do
    } { $damage_points } damage. { $perish_count ->
        [one] One { $defender_name[singular] } perishes.
       *[other] { $perish_count } { $defender_name[plural] } perish.
    }

…both $attacker_name and $defender_name would be arguments of type FluentReference (extending FluentType; same as FluentNumber and FluentDateTime). The developer would pass them like so:

let msg = ctx.getMessage("battle-log-attack-perish");
log(ctx.format(msg, {
    attacker_name: new FluentReference("-creature-bone-dragon"),
    attacker_count: 1,
    defender_name: new FluentReference("-creature-griffin"),
    perish_count: 2,
    damage_points: 46
}));

This change mostly requires additions to the MessageContext resolution logic. Syntax-wise, the VariantExpression and the AttributeExpression should be changed to accept both message identifiers as well as external arguments as parent objects (like in the $attacker_name[singular] example above).
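To make the proposed resolution step more concrete, here is a minimal JavaScript sketch of how a resolver could evaluate { $defender_name[plural] } when the argument is a FluentReference. FluentReference, the terms map and resolveVariantOfArgument are hypothetical stand-ins invented for illustration; they are not part of fluent.js, and a real MessageContext would build the terms from parsed FTL instead of hard-coding them.

```javascript
// Hypothetical: FluentReference is the proposed argument type, not a shipped API.
class FluentReference {
  constructor(id) {
    this.id = id; // e.g. "-creature-griffin"
  }
}

// Hypothetical term store; a real context would build this from parsed FTL.
// `defaultKey` plays the role of the *[...] default variant.
const terms = new Map([
  ["-creature-bone-dragon", { defaultKey: "singular", variants: { singular: "Bone Dragon", plural: "Bone Dragons" } }],
  ["-creature-griffin", { defaultKey: "singular", variants: { singular: "Griffin", plural: "Griffins" } }],
]);

// Evaluate `{ $name[key] }` where $name is an external argument.
function resolveVariantOfArgument(args, name, key) {
  const arg = args[name];
  if (!(arg instanceof FluentReference)) {
    throw new TypeError(`$${name} is not a FluentReference`);
  }
  const term = terms.get(arg.id);
  if (term === undefined) {
    throw new ReferenceError(`Unknown term: ${arg.id}`);
  }
  // Unknown variant keys fall back to the default variant.
  return term.variants[key] ?? term.variants[term.defaultKey];
}

const args = {
  attacker_name: new FluentReference("-creature-bone-dragon"),
  defender_name: new FluentReference("-creature-griffin"),
};
console.log(resolveVariantOfArgument(args, "defender_name", "plural")); // "Griffins"
```

Because the fallback mirrors the *[singular] default, an unknown variant key degrades to the singular form instead of failing.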

Open Questions

  1. Should we also allow public messages to be dynamically referenced like this?

Sign-offs

@Pike
  • I support this.
  • I don't care.
  • I object to this.
@stasm
  • I support this.
  • I don't care.
  • I object to this.
@zbraniecki
  • I support this.
  • I don't care.
  • I object to this.

Also CC @flodolo.

Remove `?` from allowed characters in ID, Symbol, Keyword and NamedArgument

The current spec allows for ? character to be present in ID, Keyword, Symbol and NamedArgument.

I believe that this is more confusing than helpful, and we should not allow it.

Examples of things that are allowed by the spec:


[[ ? ]]
? = Value

? = { ? }

? = { ?[?] }

? = { ? =>
        [?] Variant
    }

? = { ?() }

? = { ?(?=0) }

I recognize that characters like - and _ could be called out the same way, but I do see particular value in being able to use them in IDs and function names at least (although I'd prefer to limit them to non-first characters).
The ? character is the only one that I see confusing and unnecessary in all identifiers and keywords.

Since the parsers were never updated to handle that, I also believe that it would be an easy change to make.
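For comparison, a sketch of an identifier rule without the ? character, assuming identifiers must start with an ASCII letter and may continue with letters, digits, _ and -. This is one possible rule for the discussion, not the current spec; note that it also forbids - and _ as the first character.

```javascript
// Proposed rule (an assumption for discussion): an ASCII letter first,
// then ASCII letters, digits, "_" or "-". "?" is not allowed anywhere.
const IDENTIFIER = /^[a-zA-Z][a-zA-Z0-9_-]*$/;

function isValidIdentifier(id) {
  return IDENTIFIER.test(id);
}

// Ids containing "?" are rejected, as is a leading "-"; ordinary ids pass.
for (const id of ["brand-name", "foo_bar2", "?", "a?cde", "?bcde", "-abcde"]) {
  console.log(id, isValidIdentifier(id));
}
```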

@Pike , @stasm , @flodolo , @mathjazz - opinions?

Introduce Glossary Messages

Goal

Recognize glossary translations as a distinct type of message which cannot be retrieved by the calling code.

Summary

The proposed syntax for glossary messages is via an identifier starting with - (a dash):

-brand-name = Firefox
app-title = { -brand-name }

Glossary messages cannot be retrieved from the MessageContext they're defined in and they can only be referenced in other messages. Glossary messages and public messages are separate AST nodes extending the message node in the ASDL spec.

Description

Glossary translations are not intended to be used directly by the calling code. Instead they should go through another, public, message. The current syntax already hints at this by disallowing tags and attributes to be defined on the same message. The introduction of glossary messages will make this design explicit.

-brand-name = Firefox
application-title = { -brand-name }

This change boils down to allowing - as the first character of the identifier. Its consequences are, however, far-reaching. Tools like compare-locales should not inspect attributes of glossary messages. These attributes are consequently private and cannot be used for translating widget attributes (like in HTML). They may be used for language-specific information and they could replace tags.
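A rough sketch of the intended visibility rule, assuming a toy context class (SketchContext, hypothetical and unrelated to the real MessageContext) that stores each message as a plain pattern string and only understands simple { id } placeables: getMessage refuses ids starting with -, while format can still substitute glossary messages into public ones.

```javascript
// Hypothetical toy context: messages are plain strings with { id } placeables.
class SketchContext {
  constructor(entries) {
    this.entries = new Map(Object.entries(entries));
  }

  // Public API: glossary messages (ids starting with "-") are not returned.
  getMessage(id) {
    return id.startsWith("-") ? undefined : this.entries.get(id);
  }

  // Formatting a public message may still substitute glossary messages.
  // Only one level of references is resolved in this sketch.
  format(id) {
    const pattern = this.getMessage(id);
    if (pattern === undefined) {
      return undefined;
    }
    return pattern.replace(/\{\s*(-?[a-zA-Z][a-zA-Z0-9_-]*)\s*\}/g, (placeable, ref) =>
      this.entries.get(ref) ?? placeable
    );
  }
}

const ctx = new SketchContext({
  "-brand-name": "Firefox",
  "app-title": "{ -brand-name }",
  "has-updated": "{ -brand-name } has been updated.",
});
console.log(ctx.getMessage("-brand-name")); // undefined
console.log(ctx.format("has-updated")); // "Firefox has been updated."
```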

Example

English:

-brand-name = Firefox

app-title = { -brand-name }
has-updated = { -brand-name } has been updated.

Polish (with glossary attributes replacing tags):

-brand-name =
    {
       *[nominative] Firefox
        [genitive] Firefoksa
    }
    .gender = masculine

app-title = { -brand-name }
has-updated =
    { -brand-name.gender ->
        [masculine] { -brand-name } został zaktualizowany.
        [feminine] { -brand-name } została zaktualizowana.
       *[other] Program { -brand-name } został zaktualizowany.
    }

Private Messages

By allowing - as the first character of the identifier we will also enable ids of the form --foo. These could be used by tools to differentiate between glossary messages which must be present in an FTL file and private messages which are localizer-defined.

--tab = tab
open-tab = Open a new { --tab }

I would still encourage everyone to spell the whole sentence out:

open-tab = Open a new tab

Signoffs

Support for array data in externals

In L20n v4 we were able to pass arrays in externals, have them formatted, and transform them (to an extent) using the TAKE and DROP functions. Since the inception of Fluent and L20n v5, attempts to use arrays result in an "Unsupported external type: array, object" TypeError. I was unable to find any mention of arrays in the Fluent specification or the fluent/l20n source code, nor any mention of their removal in L20n's changelog.

Therefore I'd like to ask: was array support removed on purpose? If so, what's recommended for projects which were using arrays? If not, what's the plan for bringing them back?
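For reference, a minimal sketch of what TAKE, DROP and LEN did, with semantics inferred from calls like TAKE(2, $people) and LEN(DROP(2, $people)) in the syntax guide example; the exact L20n v4 behavior is an assumption here. One possible stopgap, not an official recommendation, is to run helpers like these in the calling code and pass only a pre-formatted string to Fluent.

```javascript
// Hypothetical re-implementations; semantics inferred from the guide's usage.
// TAKE(n, list): the first n items.
function TAKE(n, list) {
  return list.slice(0, n);
}

// DROP(n, list): everything after the first n items.
function DROP(n, list) {
  return list.slice(n);
}

// LEN(list): the number of items.
function LEN(list) {
  return list.length;
}

const people = ["Anna", "Bob", "Cid", "Dee"];
console.log(TAKE(2, people), LEN(DROP(2, people))); // [ 'Anna', 'Bob' ] 2
```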

Decide on the section body

We currently place all messages that belong to a section inside a section body.

Since there's no semantic meaning to sections, maybe we want to reconsider that, and flatten the structure.

I'm not sure what the consequences would be, for example for the ability of l10n tools to pick up section comments and apply them to all messages that belong to the section, but I'd like to discuss it.

How to return an empty message

Hello!

I am just trying to do this:

test = {  $num ->
    *[0]         
     [other]         No empty text
}

but it gets an error:

TypeError
Cannot read property 'undefined' of undefined

How can an empty string be returned?

Allow more characters in keys, and trim whitespace around them

Variant keys can be language-specific and as such should accept a wide range of Unicode characters. For example, this will allow localizers to define grammatical cases in their native language.

Let's also be more liberal about the whitespace around them and always trim it on both ends.

Disallow multiple expressions in a placeable

Goal

Simplify the grammar and the AST.

Description

Placeables currently allow more than one expression separated by a comma. The resulting list must be implicitly formatted with a language-specific list formatter before being interpolated into the parent pattern. Separating with a comma conflicts with the planned change to allow lists as selectors in select-expressions.

The implicit list is also a rare construct, and at the same time it impacts the shape of the AST for every placeable. I suggest removing this feature. The { LIST(…) } syntax can be used instead. In the future we can also consider a syntax for list literals, but the placeable would still have only one expression.
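A sketch of a LIST function built on the standard Intl.ListFormat API. The factory takes the context's locale; the calling convention (positional arguments passed as an array) is an assumption modeled on how fluent.js passes arguments to custom functions, and the style and type options are illustrative choices.

```javascript
// Build a LIST function bound to one locale. Assumption: custom functions
// receive their positional arguments as an array.
function makeLIST(locale) {
  const formatter = new Intl.ListFormat(locale, { style: "long", type: "conjunction" });
  return function LIST(positional) {
    // String() covers plain strings and numbers; real Fluent values may
    // need their own locale-aware formatting first.
    return formatter.format(positional.map(String));
  };
}

const LIST = makeLIST("en");
console.log(LIST(["Ann", "Bob", "Cid"])); // "Ann, Bob, and Cid"
```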

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/YBNCSp7J0zU

Allow newline between arguments to a call-expression

After #2 is fixed, we'll run into a problem similar to the one described in l20n/spec#14, but for arglist: the current grammar doesn't allow newlines in the list of arguments to a call-expression. This makes the complex examples from the syntax guide invalid:

liked-photo = { LEN($people) ->
    [1] { $people } likes
    [2] { $people } like
    [3] { LIST(TAKE(2, $people), "one more person") } like

   *[other] { LIST(
        TAKE(2, $people),
        "{ LEN(DROP(2, $people)) ->
            [1]    one more person like
           *[other]  { LEN(DROP(2, $people)) } more people like
        }"
    ) }
} your photo.

Enforce indentation for multiline string and variants

Currently, this form is allowed:

key =
|Value

I think it's likely to be confusing and in the spirit of "it's easier to relax than tighten", I'd like to suggest enforcing a requirement for at least a single whitespace before | character.

If we also do the same for selector variants to prevent:

key = { $OS ->
[one] Foo { $OS2 ->
[two] Faa
}
}

then we'll make it easier to visually spot new entries because the only characters at the beginning of the new line will be #, [[, ident_start and } which is easy to scan through in a big file.
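A sketch of a lint check for the proposed rule, assuming an identifier start is an ASCII letter: every non-blank line that starts at column 0 must begin with #, [[, an identifier start or }, and anything else (like an unindented | or [variant]) is reported with its 1-based line number.

```javascript
// Assumption: an identifier starts with an ASCII letter.
function findUnindentedContinuations(source) {
  const problems = [];
  source.split("\n").forEach((line, index) => {
    // Blank and indented lines are continuations and are not checked here.
    if (line.trim() === "" || /^\s/.test(line)) {
      return;
    }
    // Column-0 lines must start a comment, a section, a message, or close a brace.
    if (!/^(?:#|\[\[|[a-zA-Z]|\})/.test(line)) {
      problems.push(index + 1); // 1-based line number
    }
  });
  return problems;
}

const example = ["key =", "|Value", "", "ok =", "    |Value"].join("\n");
console.log(findUnindentedContinuations(example)); // [ 2 ]
```

Run against the nested variant example above, it reports the two unindented variant lines.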

Alignment rules for linting and serialization

In #24 (comment), @zbraniecki, @Pike, @flodolo and I had a discussion about the standard formatting and whitespace alignment of values in FTL.

The argument raised by @flodolo against pretty-aligning of values was that it breaks the blame and creates diff noise.

@zbraniecki then created an example file and expressed concerns about its readability without value-aligning:

0.1 syntax: https://github.com/projectfluent/fluent-rs/blob/0.1/tests/workload-low.ftl
0.2 syntax: https://github.com/projectfluent/fluent-rs/blob/master/tests/workload-low.ftl

To recap the principles:
- the file should be readable without prior knowledge of FTL
- the file should be easy to scan through
- the file should be editable with minimal FTL knowledge

For one, I'd say we did manage to make it look cleaner (fewer sigils!). But I also believe the attribute alignment significantly helps with readability, so I'm not sure I agree that preserving blame is worth it.
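To illustrate the blame concern @flodolo raised, here is a sketch of value alignment that pads every identifier to the longest one (an assumption about what "pretty-aligning" means here). Adding a single longer identifier changes the padding, and therefore the diff, of every existing line.

```javascript
// Pad each identifier to the longest one so that the "=" signs line up.
// Assumes a non-empty list of [id, value] pairs.
function alignEntries(entries) {
  const width = Math.max(...entries.map(([id]) => id.length));
  return entries.map(([id, value]) => `${id.padEnd(width)} = ${value}`);
}

const before = alignEntries([["my-shots", "My Shots"], ["home-link", "Home"]]);
const after = alignEntries([
  ["my-shots", "My Shots"],
  ["home-link", "Home"],
  ["creating-page-title", "Creating { $title }"],
]);

// Count how many of the original lines changed after adding one entry.
const changed = before.filter((line, i) => line !== after[i]).length;
console.log(`${changed} of ${before.length} existing lines changed`); // "2 of 2 existing lines changed"
```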

I'm creating a new issue to give this topic its own place for discussion.

Explicit precedence of expressions

Goal

Give users more control when working with lists.

Description

We want to allow list-typed selectors in select-expression which would lead to ambiguous syntax in constructs like { LIST(foo, bar, baz -> … )}. A work-around currently exists:

{ LIST(foo, "{bar, baz -> … }")}

I suggest we introduce explicit syntax for grouping which will instruct the parser unambiguously about the order of expressions. The proposed syntax is { … } for grouping:

{ LIST(foo, {bar, baz -> … })}

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/YBNCSp7J0zU

Maybe rename private messages?

We've seen some confusion about how to talk about private messages and their attributes. The message itself is not really private -- tools will still check if it exists. What's private is its interface: the value and any attributes.

Perhaps we could find a better wording to make this easier to understand? In #62 I originally proposed glossary messages; the feedback was that in the world of localization, glossaries already have a defined meaning and it's better not to confuse people further.

Identifier definition inconsistency?

Hi,

I'm not familiar with language grammar but I find something weird with the identifier definition.

According to the railroad diagrams and ebnf file here is the definition:
[Screenshot: railroad diagram of the identifier definition]
This is supposed to match strings like:

-abcde = abcde
a?cde = abcde
?bcde = abcde

Unfortunately, testing these cases in the fluent playground leads to a parser error.
http://projectfluent.io/play/?gist=6c28357ea7079cfc83f65789e551b67b

Can you confirm which implementation is correct? And do you know where I can find the source of the parser used in the playground?

Thanks

Disallow whitespace around member keys

I'd like to suggest tightening a bit how we handle whitespace in member keys.

Currently this is allowed:

key =
  [ foo     ] Value

I propose that:

  1. Member keys cannot have leading or trailing whitespace.
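Two sketches of member-key matching side by side: a strict regex for this proposal, which rejects whitespace just inside the brackets, and a lenient one for the earlier proposal to trim it. Both are illustrative, not the reference parser; neither restricts keys to ASCII, so localizer-defined keys in other scripts still match.

```javascript
// Strict: the key must start and end with a non-space character.
const STRICT_KEY = /^\[([^\s\]](?:[^\]]*[^\s\]])?)\]/;
// Lenient: surrounding whitespace is allowed and trimmed.
const LENIENT_KEY = /^\[\s*([^\]]*?)\s*\]/;

// Return the key text of a variant line such as "[foo] Value", or null.
function parseKey(line, regex) {
  const match = regex.exec(line.trimStart());
  return match === null ? null : match[1];
}

console.log(parseKey("  [ foo     ] Value", STRICT_KEY)); // null
console.log(parseKey("  [ foo     ] Value", LENIENT_KEY)); // foo
```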

Forbid "?" in the identifier

As per Stas' request in projectfluent/fluent.js#84 (comment)

This is one of those things that I don't think developers will actively ask us to introduce but they might
appreciate it if it's there. If you feel strongly about forbidding the question mark (at least right now) and my
above arguments haven't convinced you otherwise, please file a new issue in the Fluent Syntax repo.

I am not convinced, and I see value in not adding this sigil at this point.

My reasoning boils down to two items:

  1. I think of the l10n-id as an identifier, not a quasi-representation of the English sentence (much like Axel's point in the linked discussion). Dashes stretch it; a question mark breaks it.
  2. Adding sigils is irreversible and as a result should be avoided when possible. There's no request for it, there's no issue of parity with other systems, there's no prior experience of people using such a thing, and overall, trying to convey the role of the phrase via the "?" in the ID seems to me like a misplaced goal.

I'd love to use as few sigils in the ID as possible, and [a-zA-Z0-9_-] is IMHO the far end of what we should start with.

Rename traits to attributes and change syntax

Goal

Provide easier syntax for multi-valued messages.

Description

While #5 is about multiple facets of the same value, this is about multiple values. This is useful in case of web components or other UI widgets.

Currently we'd write:

key = Value
  [label] Label

This looks similar to a variant of a select-expression but actually defines a trait and "Label" is a separate value, one of two values of the key message. When #1 is fixed this will become even more confusing, because lists of traits will not require defaults, but will use the same syntax as lists of variants.

The proposed solution is to use . (the dot) as a new syntax for defining traits. Also, to better reflect their purpose, we want to rename traits to attributes.

key = Value
key.label = Label

If there is no value, its definition can be omitted:

file-open.label = Open File
file-open.accesskey = O 

This will also require an addition of attribute-expression:

attribute-expression ::= identifier '.' keyword;
file-open.label = Open File
file-open.accesskey = O 
file-open.title = To open an existing file press Ctrl + { file-open.accesskey }.
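A minimal sketch of the dot syntax, assuming one entry per line and no multiline values: parseMessages (a hypothetical helper) collects id = Value and id.attr = Value lines into messages with a possibly-null value and a map of attributes, and resolveAttribute plays the role of attribute-expression.

```javascript
// Collect "id = Value" and "id.attr = Value" lines into message objects.
function parseMessages(source) {
  const messages = new Map();
  for (const line of source.split("\n")) {
    const match = /^([a-zA-Z][a-zA-Z0-9_-]*)(?:\.([a-zA-Z][a-zA-Z0-9_-]*))?\s*=\s*(.*)$/.exec(line);
    if (match === null) continue; // skip blank or unsupported lines
    const [, id, attr, text] = match;
    if (!messages.has(id)) {
      // The value stays null when only attributes are defined.
      messages.set(id, { value: null, attributes: new Map() });
    }
    const message = messages.get(id);
    if (attr === undefined) {
      message.value = text;
    } else {
      message.attributes.set(attr, text);
    }
  }
  return messages;
}

// attribute-expression ::= identifier '.' keyword
function resolveAttribute(messages, id, attr) {
  return messages.get(id)?.attributes.get(attr);
}

const messages = parseMessages([
  "file-open.label = Open File",
  "file-open.accesskey = O",
  "key = Value",
  "key.label = Label",
].join("\n"));
console.log(messages.get("file-open").value); // null
console.log(resolveAttribute(messages, "file-open", "accesskey")); // O
```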

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/dhWfBXHzuZI

Semantic comments

Having a way to semantically describe a message would benefit tooling. It would allow tools to better inform the user what they can do in the translation, and give hints and suggestions.

Perhaps we could consider using something similar to JSDoc, in particular the @param tag: http://usejsdoc.org/tags-param.html. JSDoc conveniently allows specifying the type, the description and the default value, which tools could use to display an example of a formatted translation.

# @param {number} [$num = 4] Number of new messages
new-messages = { $num ->
   *[one] You have 1 new message.
    [other] You have { $num } new messages.
}
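A sketch of how a tool might extract that metadata, assuming exactly the shape used in the example (@param {type} [$name = default] description, with or without a leading #). Real JSDoc accepts more forms; this regex is illustrative only.

```javascript
// Matches "# @param {type} [$name = default] description".
const PARAM = /^#?\s*@param\s+\{(\w+)\}\s+\[\$(\w+)\s*=\s*([^\]]+?)\s*\]\s+(.*)$/;

function parseParamComment(line) {
  const match = PARAM.exec(line);
  if (match === null) return null;
  const [, type, name, defaultValue, description] = match;
  return { type, name, defaultValue, description };
}

console.log(parseParamComment("# @param {number} [$num = 4] Number of new messages"));
```

A tool could then substitute the default 4 for $num to preview the formatted translation.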

Would it make sense to make this meta-information first-class? Rust differentiates between regular comments (//) and doc comments (///). We could do something similar by making the @ sigil special:

@param {number} [$num = 4] Number of new messages
new-messages = { $num ->
   *[one] You have 1 new message.
    [other] You have { $num } new messages.
}

This is possibly related to #7.

Add an intermediate Placeable production for Expressions within Patterns

We currently don't expose the information about the exact position of the { … } placeable in the source. Consider:

foo = foo { bar }
      ^_^          TextElement span
            ^_^    MessageReference span
      ^---------^  Pattern span

The suggested solution is to add an intermediate Placeable node for Expressions which are elements of Patterns:

foo = foo { bar }
      ^_^          TextElement span
            ^_^    MessageReference span
          ^-----^  Placeable span
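A sketch of a pattern parser that produces the intermediate Placeable node, handling only text and bare message references. Spans are [start, end) offsets into the full line; the node names follow the diagram above, and everything else is an assumption. Unlike the diagram, the sketch's TextElement includes the space before the brace.

```javascript
// Parse a one-line pattern starting at `start` into text and placeables.
function parsePattern(source, start) {
  const elements = [];
  let i = start;
  while (i < source.length) {
    if (source[i] === "{") {
      const close = source.indexOf("}", i);
      if (close === -1) throw new SyntaxError("Unclosed placeable");
      const inner = source.slice(i + 1, close);
      // Offset of the reference inside the braces, skipping leading spaces.
      const offset = i + 1 + (inner.length - inner.trimStart().length);
      const id = inner.trim();
      elements.push({
        type: "Placeable",
        span: [i, close + 1], // includes the braces
        expression: { type: "MessageReference", id, span: [offset, offset + id.length] },
      });
      i = close + 1;
    } else {
      let end = source.indexOf("{", i);
      if (end === -1) end = source.length;
      elements.push({ type: "TextElement", value: source.slice(i, end), span: [i, end] });
      i = end;
    }
  }
  return { type: "Pattern", span: [start, source.length], elements };
}

const line = "foo = foo { bar }";
const pattern = parsePattern(line, line.indexOf("=") + 2);
console.log(pattern.elements[1].span, pattern.elements[1].expression.span); // [ 10, 17 ] [ 12, 15 ]
```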
      ^---------^  Pattern span

Change tag sigil

Goal

Allow using # for comments, as proposed in #58.

Description

We chose # for tags in #7 and actually had to change the comment sigil from # to // in #28. If the Comment Levels proposal (#58) gets enough support, I'd like to reclaim # for comments and find a new sigil for tags.

I propose + which was also my second proposal in #7.

Example

brand-name = Firefox
    +masculine

updated =
    { brand-name ->
        [masculine] { brand-name } został zaktualizowany.
        [feminine] { brand-name } została zaktualizowana.
       *[other] Program { brand-name } został zaktualizowany.
    }

Relax EBNF restriction for selectors?

According to the EBNF, neither message references, variant expressions, nor attributes of public messages can be used as selectors:

fluent/spec/fluent.ebnf, lines 70 to 74 at commit 589a7e9:

selector-expression ::= quoted-text
| number
| external-identifier
| private-attribute-expression
| call-expression

OTOH, attributes of private messages can be used as selectors because that's how we can allow choosing the right variant for genders etc. (encoded as attributes). Attributes of private messages are language-specific which means that localizers can add any number of them to private messages.

These facts combined, it's possible to work around the EBNF like so:

-brand-name = Firefox
    .hack-message-reference = { foo }
    .hack-variant-expression = { -brand-name[genitive] }
    .hack-public-attribute = { bar.label }

hack-demo =
    { -brand-name.hack-message-reference ->
       *[Foo] I used { foo } as the selector!
    }

Should we relax the EBNF?

Syntax for empty string value

Right now, there's no syntax for an empty string value, AFAICT.

One way to think about that would be to just use

my-text =

The other way is to try to create a non-"" AST value that evaluates to "". One candidate would be

my-text = {""}

But that's also not allowed, because quoted-text is '"' (text-char - '"' | '\"')+ '"'; note the + instead of a *.

I didn't investigate what the fall-out of allowing * would be.
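A quick way to see the difference is to encode both quantifiers as regular expressions. The character class here is a simplification (any character other than " and \, or the escape \"), not the exact text-char rule.

```javascript
// Simplified character class: any char except '"' and '\', or the escape \".
const QUOTED_TEXT_PLUS = /^"(?:[^"\\]|\\")+"$/; // as written: at least one char
const QUOTED_TEXT_STAR = /^"(?:[^"\\]|\\")*"$/; // relaxed: zero or more

for (const input of ['""', '"Firefox"']) {
  console.log(input, QUOTED_TEXT_PLUS.test(input), QUOTED_TEXT_STAR.test(input));
}
```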

New syntax for meta-data

Goal

Provide a simple means for defining private meta-data for messages.

Description

Currently, meta-data can be added to messages by using traits. Traits without namespaces are considered private.

brand-name =Firefox
  [gender] masculine

#5 and #6 will simplify traits and we'll need a new way to encode meta-data.

The proposal is to use binary tags attached to the value:

#masculine
brand-name = Firefox

The benefit of the binary approach is that there's usually no need to name the property in question (gender).

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/dhWfBXHzuZI

Comment Levels

Goal

Allow comments to define the outline of the FTL file in a Markdown-like fashion.

Description

This proposal is inspired by Markdown and how multiple #, ##, ### etc. can be used to define headers which in turn define the outline of the document.

In FTL, the number of # would correspond to the comment level:

  • # stands for a level-1 (regular) comment: standalone or attached to a message,
  • ## stands for a standalone level-2 comment, or a group-level comment,
  • ### stands for a standalone level-3 comment, or a resource-level comment.

FTL currently allows sections ([[ Section Name ]]) for defining the outline of the FTL file. Entries under a section are not explicitly grouped in the AST but tools may use sections for grouping messages. Looking at some examples of sections in the wild it's clear that sections are mostly used for their comments. I propose to remove sections and replace them with group-level comments (##) to streamline this practice.

Example

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

### Localization for Server-side strings of Firefox Screenshots
### Please don't localize Firefox, Firefox Screenshots, or Screenshots

## Global phrases shared across pages

my-shots = My Shots
home-link = Home
screenshots-description =
    Screenshots made simple. Take, save, and
    share screenshots without leaving Firefox.

## Creating page

# Note: { $title } is a placeholder for the title of the web page
# captured in the screenshot. The default, for pages without titles, is
# creating-page-title-default.
creating-page-title = Creating { $title }
creating-page-title-default = page
creating-page-wait-message = Saving your shot…

Details

A change to the comment level automatically starts a new comment. The above example could also be written without the blank line after ## Creating page:

## Creating page
# Note: { $title } is a placeholder for the title of the web page
# captured in the screenshot. The default, for pages without titles, is
# creating-page-title-default.
creating-page-title = Creating { $title }
creating-page-title-default = page
creating-page-wait-message = Saving your shot…

Only # comments can be attached to a message. In the above example we can also remove the blank line after ## Global phrases shared across pages:

## Global phrases shared across pages
my-shots = My Shots
home-link = Home
screenshots-description =
    Screenshots made simple. Take, save, and
    share screenshots without leaving Firefox.

Because ### would now uniquely identify resource-level comments, I would also like to remove the comment field from the Resource production and remove the parser heuristic that treats the first standalone comment as the resource comment.
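A sketch of the comment grouping rules described above: one to three # set the level, a change of level starts a new comment, and only a level-1 comment directly followed by a message is attached to it. Treating a blank line as the end of a comment is an assumption of this sketch, and it ignores section syntax entirely.

```javascript
// Group comment lines into comments. Assumption: a blank line or any
// non-comment line ends the current comment.
function groupComments(source) {
  const comments = [];
  let current = null;
  for (const line of source.split("\n")) {
    const comment = /^(#{1,3})(?: (.*))?$/.exec(line);
    if (comment !== null) {
      const level = comment[1].length;
      // A change of level starts a new comment.
      if (current === null || current.level !== level) {
        current = { level, lines: [], attachedTo: null };
        comments.push(current);
      }
      current.lines.push(comment[2] ?? "");
      continue;
    }
    // Only a level-1 comment directly above a message is attached to it.
    const message = /^([a-zA-Z][a-zA-Z0-9_-]*)\s*=/.exec(line);
    if (message !== null && current !== null && current.level === 1) {
      current.attachedTo = message[1];
    }
    current = null;
  }
  return comments;
}

const groups = groupComments([
  "## Creating page",
  "# Note: { $title } is a placeholder for the title of the web page",
  "creating-page-title = Creating { $title }",
  "",
  "### Resource comment",
].join("\n"));
```

A commented-out line such as "# ## Creating page" comes out as a level-1 comment whose text is "## Creating page".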

Commenting out parts of the FTL file is still possible although it should use whitespace to make it unambiguous:

# ## Creating page
# # Note: { $title } is a placeholder for the title of the web page
# # captured in the screenshot. The default, for pages without titles, is
# # creating-page-title-default.
# creating-page-title = Creating { $title }
# creating-page-title-default = page
# creating-page-wait-message = Saving your shot…

Dependencies

If this proposal gets enough support, I'd like to reclaim # for comments and use a different sigil for tags (see #59).

We should also plan a transition period during which both the comment level syntax and the current section syntax are valid. This will make it much easier to handle migrations in live projects.

Signoffs

Syntax sugar for messages without value

In #63 I'd like to make the = after the identifier required.

foo-bar =
    .attr = Attribute Value

@zbraniecki suggested to also allow the syntax without the = as sugar.

foo-bar
    .attr = Attribute Value

I don't feel like this is an improvement or a convenience big enough to justify the syntax sugar. I'm not opposed to it per se, just not convinced at this point.

I'd like to suggest a different syntax shortcut, however. The following would parse as a Message with a null value and an Attribute.

foo-bar.attr = Attribute Value

This syntax sugar would only be available for writing messages with a single attribute. If more attributes are needed, the regular syntax would apply:

foo-bar =
    .attr1 = Attribute 1 Value
    .attr2 = Attribute 2 Value

My intention is to optimize the syntax for the common case of a message with a single attribute. The examples include HTML's input element or XUL's key element.

# Regular syntax.
username-input =
    .placeholder = Your username
open-file-command =
    .key = o

# Shorthand syntax.
username-input.placeholder = Your username
open-file-command.key = o

Specifying accesskeys

It would be great if you could specify which letter of the label to use as the accesskey, instead of only specifying the accesskey itself. Otherwise you might end up with an accesskey in the middle of a word, while an accesskey at the start of a word is more visible. Also, some words, e.g. verbs, are better suited to carry the accesskey.

Another use case may be accesskeys combined with variables:

move-again
.label = { Move to "{ $directory }" again }
.accesskey = a

Here you might end up with an accesskey in the directory placeholder, and it would be better to have it on a fixed place.

See also https://bugzil.la/1145116 for a similar bug on l10n.

Definition recursion: inline-expression vs argument

Hi,

I'm not familiar with language grammar but I find something weird with the inline-expression & argument definitions.

According to the ebnf,

  • inline-expression is defined by call-expression
  • call-expression is defined by argument
  • argument is defined by inline-expression

This is weird because it introduces recursion into the parser implementation. Is it intended? Shouldn't call-expression be defined by named-argument instead?

To give context, I work on a Rust fluent parser.
https://github.com/ctjhoa/fluent-rs

Thanks
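For what it's worth, this kind of mutual recursion is normal for nested grammars: arguments of a call can themselves be calls, as in LIST(TAKE(2, $people), …) from the syntax guide. A minimal recursive-descent sketch, supporting only $variables, integers and uppercase function calls, shows the cycle inline-expression → call-expression → argument → inline-expression in practice. It is illustrative only and not the reference parser.

```javascript
// Each parse function takes the source and a position, and returns
// [node, nextPosition].
function parseInlineExpression(src, pos) {
  pos = skipSpaces(src, pos);
  const rest = src.slice(pos);
  let m;
  if ((m = /^\$[a-zA-Z][a-zA-Z0-9_-]*/.exec(rest)) !== null) {
    return [{ type: "VariableReference", name: m[0].slice(1) }, pos + m[0].length];
  }
  if ((m = /^[0-9]+/.exec(rest)) !== null) {
    return [{ type: "NumberLiteral", value: m[0] }, pos + m[0].length];
  }
  if ((m = /^[A-Z][A-Z0-9_-]*\(/.exec(rest)) !== null) {
    const callee = m[0].slice(0, -1); // drop the "("
    return parseCallExpression(src, pos + m[0].length, callee);
  }
  throw new SyntaxError(`Unexpected input at position ${pos}`);
}

function parseCallExpression(src, pos, callee) {
  const args = [];
  pos = skipSpaces(src, pos);
  while (src[pos] !== ")") {
    // Each argument is parsed as an inline-expression: the recursive step.
    const [arg, next] = parseInlineExpression(src, pos);
    args.push(arg);
    pos = skipSpaces(src, next);
    if (src[pos] === ",") {
      pos = skipSpaces(src, pos + 1);
    } else if (src[pos] !== ")") {
      throw new SyntaxError(`Expected "," or ")" at position ${pos}`);
    }
  }
  return [{ type: "CallExpression", callee, args }, pos + 1];
}

function skipSpaces(src, pos) {
  while (src[pos] === " ") pos++;
  return pos;
}

const [ast] = parseInlineExpression("LIST(TAKE(2, $people), 3)", 0);
console.log(JSON.stringify(ast));
```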

Implementers feedback on EBNF

I started looking into aligning the parser with the full EBNF and encountered a few gotchas:

First of all, the fact that inline-space and line-break are not single characters but ranges with + seems to be unnecessary. You still seem to be using them (and their aliases _ and NL) with quantifiers that seems as if you wanted to indicate if there is a single white space allowed, or multiple and if it is a + or * or ? quantifier.

Capturing multiple newlines within a single line-break, only to put a ? on it in the caller, seems unnecessary and confusing.

On to details:

  • body
    ** As mentioned above, the _* is confusing given that _ = []+
    ** It doesn't seem to allow whitespace (other than NL) between entries. Is that intentional? Should an empty line containing spaces between entries be a parser error?
  • comment
    ** Nothing seems to capture NL, so multiline comments don't work.
  • number
    ** Negative numbers such as -100.23 don't seem to be supported.

That's it for now. I'll wait for the test infrastructure before implementing this, and will write tests for the edge cases at the same time, so I expect to discover more.
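For the number point, a production along the lines of `number ::= "-"? digit+ ("." digit+)?` would cover negative values. The regular expression below is a hypothetical JavaScript translation of that production for illustration; it is not taken from the Fluent EBNF.

```javascript
// Hypothetical number production with an optional leading minus sign:
//   number ::= "-"? [0-9]+ ("." [0-9]+)?
const NUMBER = /^-?[0-9]+(\.[0-9]+)?$/;

for (const candidate of ["100", "-100.23", "3.", "--1"]) {
  console.log(candidate, NUMBER.test(candidate));
}
// 100 true
// -100.23 true
// 3. false
// --1 false
```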

How to create custom functions like the built-in functions?

Hi again

I am trying to write a function like this:

game_message = { FLAG($is_ready) ->
    *[inactive]  Insert coin
     [active]   Choose your hero
}

But I have no idea how to do it; I don't know how to get flag_value:

    const functions = {
        FLAG(flag_value) {
            return flag_value ? 'active' : 'inactive';
        }
    };
    const mc = new MessageContext(config.locale, { functions });
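As far as I understand fluent.js, custom functions are called with two arguments, an array of positional arguments and an object of named options, so a function written as `FLAG(flag_value)` receives the whole array rather than the value. The self-contained sketch below does not use fluent.js: `callFunction` is a hypothetical stand-in for the runtime, and the exact calling convention (including whether argument values are wrapped in objects) should be checked against the fluent.js version in use.

```javascript
// Custom function in the assumed (positional[], named{}) calling convention.
// Destructuring `[flagValue]` picks out the first positional argument.
const functions = {
  FLAG([flagValue], _named) {
    // valueOf() unwraps a value object if the runtime wrapped the argument;
    // for plain primitives it returns the value itself.
    const raw = flagValue != null ? flagValue.valueOf() : flagValue;
    return raw ? "active" : "inactive";
  },
};

// Hypothetical stand-in for the runtime evaluating { FLAG($is_ready) }.
function callFunction(name, positional, named = {}) {
  const fn = functions[name];
  if (typeof fn !== "function") throw new ReferenceError(`Unknown function: ${name}`);
  return fn(positional, named);
}

console.log(callFunction("FLAG", [true]));  // active
console.log(callFunction("FLAG", [false])); // inactive
```

The returned string would then presumably be matched against the [active] and [inactive] variant keys of the select-expression.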

Forbid `-` at the beginning of ID/Symbol

The parsers currently allow "-" in the middle of ID and Symbol, but not at the beginning.

So an ID such as -foo is not allowed, while foo-faa is. I'd like to propose adding this restriction to the syntax instead of removing it from the parsers.
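A minimal sketch of the proposed rule, assuming identifiers consist of ASCII letters, digits, `_` and `-` (that character set is an assumption for illustration): the first character must be a letter, so a leading `-` is rejected while an inner one is allowed.

```javascript
// Hypothetical identifier rule: must start with a letter; later characters
// may be letters, digits, '_' or '-'.
const IDENTIFIER = /^[a-zA-Z][a-zA-Z0-9_-]*$/;

console.log(IDENTIFIER.test("foo-faa")); // true
console.log(IDENTIFIER.test("-foo"));    // false
```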

Define the behavior of backslash

In #12 (comment) I said we'd need to define the exact behavior of the backslash character \ for the purposes of escaping. This includes defining:

  • the list of known escape sequences (\ followed by a space, \t, \n, \*, \[, \{, \u, \\, others?),

  • how the Unicode escapes work: is \u20 valid and the same as \u0020?

  • the behavior of unknown sequences, like \a (does the backslash take the following character out of the syntax parsing?),

  • the behavior for edge-cases, like:

      foo\bar = Foobar
    

    Is that a syntax error? If not, what is the name of the identifier?

      foo = Foo\
      bar = Bar
    

    Is that an escaped new-line?
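To make the open questions concrete, here is one possible policy, sketched as a hypothetical `unescapeText` helper; none of these choices are decided in the issue. It recognizes the escapes listed above, requires exactly four hex digits after `\u` (so `\u20` is rejected rather than treated as `\u0020`), and reports unknown sequences such as `\a` as errors. It only covers text values; whether a backslash may appear in an identifier like `foo\bar` is a separate question.

```javascript
// One possible (hypothetical) escape policy for text values.
const SIMPLE_ESCAPES = { " ": " ", t: "\t", n: "\n", "*": "*", "[": "[", "{": "{", "\\": "\\" };

function unescapeText(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "\\") {
      out += text[i];
      continue;
    }
    const next = text[i + 1];
    if (next === undefined) {
      // Under this policy a trailing backslash (as in `foo = Foo\`) is an
      // error rather than an escaped newline.
      throw new SyntaxError("Backslash at end of input");
    }
    if (next === "u") {
      const hex = text.slice(i + 2, i + 6);
      if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
        throw new SyntaxError(`Invalid Unicode escape at ${i}`);
      }
      out += String.fromCharCode(parseInt(hex, 16));
      i += 5; // skip "u" and the four hex digits
    } else if (Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, next)) {
      out += SIMPLE_ESCAPES[next];
      i += 1; // skip the escaped character
    } else {
      throw new SyntaxError(`Unknown escape sequence \\${next} at ${i}`);
    }
  }
  return out;
}

console.log(unescapeText("\\{ not a placeable")); // { not a placeable
```

The alternative for unknown sequences, letting the backslash simply take the next character out of syntax parsing, would replace the final `throw` with `out += next`.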

Allow selectors to be lists

Goal

Allow for alternative syntax for nested variants.

Description

Nested variants are currently hard to write and result in some repetition. ICU's experience with MessageFormat also suggests that it is easier for localizers to work with full sentences as variants.

The proposed solution is to allow lists as selectors in select-expression and as variant keys:

key = { $items, $buckets ->
  [one, one]     Found 1 item in 1 bucket.
  [one, other]   Found 1 item in { $buckets } buckets.
  [other, one]   Found { $items } items in 1 bucket.
  [other, other] Found { $items } items in { $buckets } buckets.
}
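A sketch of how matching could work for list selectors, assuming each selector value is mapped to a CLDR plural category with the standard `Intl.PluralRules` API, and that a variant matches only when every key in its list equals the category at the same position. The variant representation and the `selectVariant` name are hypothetical.

```javascript
// Variants from the example above, keyed by [items, buckets] categories.
const variants = [
  { keys: ["one", "one"], render: () => "Found 1 item in 1 bucket." },
  { keys: ["one", "other"], render: ({ buckets }) => `Found 1 item in ${buckets} buckets.` },
  { keys: ["other", "one"], render: ({ items }) => `Found ${items} items in 1 bucket.` },
  { keys: ["other", "other"], render: ({ items, buckets }) => `Found ${items} items in ${buckets} buckets.` },
];

function selectVariant(locale, selectorValues, variants) {
  const rules = new Intl.PluralRules(locale);
  const categories = selectorValues.map((n) => rules.select(n));
  // A variant matches only if its key list has the same length and every
  // key equals the category at the same position. Returns undefined if
  // nothing matches (this sketch has no default variant).
  return variants.find(
    (v) => v.keys.length === categories.length && v.keys.every((key, i) => key === categories[i])
  );
}

const args = { items: 1, buckets: 5 };
console.log(selectVariant("en", [args.items, args.buckets], variants).render(args));
// Found 1 item in 5 buckets.
```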

Dependencies

#2, #3.

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/YBNCSp7J0zU

Unify the naming scheme

We currently switch fairly casually between calling Fluent a "library", a "framework", and a "family of libraries". I recently noticed that gettext uses the term "localization system".

It's particularly confusing because on the wiki [1] we list "localization framework" as a non-goal, and then in the fluent.js README.md [2] we state that it is a "localization framework".

I have to say I like how "localization system" sounds for us as well. Libraries are part of the implementation of Fluent, and to me a framework is a wider term that covers a library and the tools around it, but Fluent is even wider than that.

It introduces a new paradigm and methodology that holistically guides how localization should be done, and informs any bindings, libraries, frameworks, tools, and workflows based on its paradigms and features.

[1] https://github.com/projectfluent/fluent/wiki
[2] https://github.com/projectfluent/fluent.js

Relax the indentation requirement

This is a follow-up to #42. I'd like to go further and relax the indentation requirement for everything other than text content. This includes attribute and variant keys as well as expressions.

This doesn't mean that serializers would stop indenting message bodies for readability. We'll still need to define the style guidelines for linting FTL files in #27.

Thanks to #63 this change shouldn't hinder parsers' ability to recover at the next nearest message definition.

A few canonical examples of what would become possible:

multiline-text =
    Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore
    magna aliqua.

block-expr = { $num ->
   *[one]
        Lorem ipsum dolor sit amet, consectetur adipiscing elit,
        sed do eiusmod tempor incididunt ut labore et dolore
        magna aliqua.
}

And a few extreme ones (please don't write your FTL this way):

key1 = {
$num ->
*[one]
 One
}

key2 =
.attr = Attribute

key3 = {
key1
}

key4 = {
DATETIME
(
$date,
weekday
:
"short"
)
}

The goal is not to encourage people to write FTL like the above, but rather to accept as much input as possible and be as forgiving as possible.
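One way to see why relaxing indentation costs the parser little: if whitespace (including newlines) between tokens inside a placeable is skipped, the extreme key4 tokenizes exactly like its one-line form. The tokenizer below is a hypothetical sketch covering only a few expression tokens, not the reference parser.

```javascript
// Hypothetical tokenizer for a subset of FTL expression syntax. Whitespace,
// including line breaks, before each token is skipped, so layout is ignored.
function tokenize(src) {
  const re = /\s*(->|\$?[A-Za-z][\w-]*|"[^"]*"|[{}():,])/y;
  const tokens = [];
  let pos = 0;
  for (;;) {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (m === null) break;
    tokens.push(m[1]);
    pos = re.lastIndex; // track position ourselves: a failed sticky match resets lastIndex
  }
  if (src.slice(pos).trim() !== "") throw new SyntaxError(`Unexpected input at ${pos}`);
  return tokens;
}

const multiLine = '{\nDATETIME\n(\n$date,\nweekday\n:\n"short"\n)\n}';
const oneLine = '{ DATETIME($date, weekday: "short") }';
console.log(JSON.stringify(tokenize(multiLine)) === JSON.stringify(tokenize(oneLine))); // true
```

Text content is the exception: there, whitespace is significant, which is why the requirement stays for multiline text.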

Consider using a namespace for Fluent attributes (directives) instead of a prefix

Many new frameworks rely on directives rather than prefixed attributes, e.g.:

<p data-l10n-foo="x">
vs 
<p l10n:foo="x">

The benefit is that a namespaced attribute can have a very specific meaning, including the option of not actually adding it to the DOM. Using data-, on the other hand, could conflict with regular data attributes and forces you to add the attribute to the element in the DOM, which implies a performance penalty for something that doesn't need to be in the DOM at all.

Additionally, there is a proposal for HTML to claim namespaces for attributes, which in the long run might help apply special behavior to new custom elements that receive attributes with the corresponding namespace.

Cannot find definition of FTL acronym

It is hard to find, at least, if it is documented anywhere at all.

How about putting it in README.md, where it is most likely to be encountered first?

Add ability to present a message as a simple string

In parts of UI (e.g. a list of messages), translation tools cannot present the entire FTL message due to space limitations. For pluralized messages, they usually only show the singular form. Similarly, they will only show the default variant for messages with multiple variants.

Fluent should provide a simplified presentation of each message as a string, either directly in the AST or through a method that takes the AST as an argument. This would allow tool developers to quickly get a simple string presentation of a message instead of writing their own logic, which would lead to inconsistencies among tools.
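A minimal sketch of such a helper, assuming a simplified JSON AST loosely modeled on the fluent-syntax one (the node shapes and the `toSimpleString` name are assumptions): text is kept, select-expressions collapse to their default variant, and variable references are shown in their source form. Showing the singular form instead of the default would be a different policy.

```javascript
// Simplified, hypothetical AST node shapes:
//   Pattern:           { type: "Pattern", elements: [...] }
//   TextElement:       { type: "TextElement", value }
//   Placeable:         { type: "Placeable", expression }
//   SelectExpression:  { type: "SelectExpression", selector, variants: [{ default, value: Pattern }] }
//   VariableReference: { type: "VariableReference", name }
function toSimpleString(pattern) {
  return pattern.elements.map(elementToString).join("");
}

function elementToString(element) {
  if (element.type === "TextElement") return element.value;
  if (element.type === "Placeable") return expressionToString(element.expression);
  throw new TypeError(`Unsupported element: ${element.type}`);
}

function expressionToString(expr) {
  switch (expr.type) {
    case "SelectExpression": {
      // Show only the default variant (first variant if none is marked).
      const chosen = expr.variants.find((v) => v.default) || expr.variants[0];
      return toSimpleString(chosen.value);
    }
    case "VariableReference":
      return `{ $${expr.name} }`;
    case "Placeable":
      return expressionToString(expr.expression); // nested placeable
    default:
      throw new TypeError(`Unsupported expression: ${expr.type}`);
  }
}

// emails = { $count -> [one] You have one email. *[other] You have { $count } emails. }
const text = (value) => ({ type: "TextElement", value });
const pattern = (...elements) => ({ type: "Pattern", elements });
const count = { type: "VariableReference", name: "count" };
const emails = pattern({
  type: "Placeable",
  expression: {
    type: "SelectExpression",
    selector: count,
    variants: [
      { default: false, value: pattern(text("You have one email.")) },
      { default: true, value: pattern(text("You have "), { type: "Placeable", expression: count }, text(" emails.")) },
    ],
  },
});
console.log(toSimpleString(emails)); // You have { $count } emails.
```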

Private message references are confusable with negative number literals

In 0.5 we introduced:

-brand-short-name = Firefox

key = This is a { -brand-short-name }

which is fairly non-trivial for a parser to differentiate from:

key = This is a { -5 }

It's not impossible, but I think we didn't consider it before accepting - as a sigil. This pushes us further away from a context-free syntax.

@stasm - are you ok with that?
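As for the "not impossible" part: a single character of lookahead after the `-` is enough to tell the two apart. The sketch below is a hypothetical illustration of that check (the function name and classification labels are made up), not code from any Fluent parser.

```javascript
// Classify an inline expression starting at `pos` that may begin with "-".
// After the "-", a digit means a negative number literal and a letter means
// a private message reference.
function classifyDash(src, pos) {
  if (src[pos] !== "-") return "other";
  const next = src[pos + 1];
  if (next !== undefined && next >= "0" && next <= "9") return "number";
  if (next !== undefined && /[A-Za-z]/.test(next)) return "private-reference";
  return "error";
}

console.log(classifyDash("-brand-short-name", 0)); // private-reference
console.log(classifyDash("-5", 0));                // number
```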

Allow select-expression with an empty selector

Goal

Define variants of a message which can be accessed from other strings.

Description

We currently use traits to define variants of a message that are local and specific to the natural language. For instance, we'd write:

brand-name =
   *[nominative] Firefox
    [genitive]    Firefox's

This is a bit weird because in the above example "Firefox" is the real value of the message, and nominative and genitive are the value's facets. However, the value field of the message remains empty because the facets are defined as traits. These traits also need to remain private, which in turn necessitates a way to mark public traits as public; we currently use namespaces for this purpose.

The proposed change is to allow defining facets of a value as variants of a select-expression which doesn't have a selector:

brand-name = {
   *[nominative] Firefox
    [genitive]    Firefox's
}

This would also allow us to remove namespaces from member keys (keywords). The member-expression would change its meaning: it would now be used to access a named variant of the first placeable in the value of the message. If that placeable isn't a select-expression, or already has a selector, normal evaluation would follow.

about1 = About { brand-name }
about2 = About { brand-name[nominative] }
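Under the proposed semantics, `brand-name[nominative]` looks at the first placeable of the message value. A hypothetical sketch, assuming a simplified representation in which the value is a list of parts (strings or select-expressions) and a selector-less select-expression has `selector: null`; both the shapes and `resolveMember` are assumptions.

```javascript
// Hypothetical simplified representation of:
//   brand-name = {
//      *[nominative] Firefox
//       [genitive]    Firefox's
//   }
const brandName = {
  value: [
    {
      type: "SelectExpression",
      selector: null, // no selector: the variants are facets of the value
      variants: [
        { key: "nominative", default: true, value: "Firefox" },
        { key: "genitive", default: false, value: "Firefox's" },
      ],
    },
  ],
};

// Resolve `message[key]`; calling it without a key resolves `{ message }`.
function resolveMember(message, key) {
  const first = message.value[0];
  if (first && first.type === "SelectExpression" && first.selector === null) {
    const named = first.variants.find((v) => v.key === key);
    const fallback = first.variants.find((v) => v.default);
    return (named || fallback).value;
  }
  // Stand-in for "normal evaluation": join the plain-text parts.
  return message.value.filter((part) => typeof part === "string").join("");
}

console.log(`About ${resolveMember(brandName)}`);               // About Firefox
console.log(`About ${resolveMember(brandName, "nominative")}`); // About Firefox
```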

Dependencies

#1.

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/dhWfBXHzuZI

List formatting

Support for list formatting would be extremely useful. I noticed that you dropped support for it on the grounds that it was too complex.

529600f

But complexity can't be a reason to drop it entirely: right now we have no alternative.
There is a proposal for a list formatting API in ECMA-402:

https://github.com/tc39/proposal-intl-list-format

This could be an argument for reconsidering this functionality.
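For reference, the ECMA-402 proposal linked above has since shipped as the standard `Intl.ListFormat` API. A minimal sketch of what a Fluent-facing list function could delegate to; the `LIST` name and its signature are hypothetical, not part of Fluent.

```javascript
// Hypothetical list function delegating to Intl.ListFormat.
function LIST(locale, items, { type = "conjunction", style = "long" } = {}) {
  return new Intl.ListFormat(locale, { type, style }).format(items);
}

console.log(LIST("en", ["Alice", "Bob", "Carol"]));                          // Alice, Bob, and Carol
console.log(LIST("en", ["Alice", "Bob", "Carol"], { type: "disjunction" })); // Alice, Bob, or Carol
```

Delegating to the platform keeps the locale-specific rules out of Fluent itself, which may address some of the complexity concerns.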

Require = after the identifier

Right now the equals sign (=) following the identifier is part of the value production in the EBNF. Messages without values are thus written like this:

foo-bar
    .attr = Attribute

I would like to change this to always require the = after the identifier. This would clearly demarcate the identifier and the indented body of the message, similar to Python's :.

foo-bar =
    .attr = Attribute

In the rare cases when the message should have an empty string for its value, the same explicit syntax as in #32 can be used:

foo-bar = {""}
    .attr = Attribute

Signoffs

incomplete unquoted-text definition

Currently the definition of an unquoted-text is as follows:

unquoted-text        ::= ([^{] | '\{')+;

source: https://github.com/projectfluent/syntax/blob/master/fluent.ebnf#L28

In most cases this is fine; however, relying on this definition alone breaks the following example:

opened-new-window = { brandName[gender] ->
 *[masculine] { brandName } otworzyl nowe okno.
  [feminine] { brandName } otworzyla nowe okno. }

In the example given above, it will read the entire line, including the '}' character, which is actually the closing bracket of the root placeable. This problem is fixed if you define an unquoted-text as follows:

unquoted-text        ::= ([^{}] | '\{' | '\}')+;
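The difference can be checked with regular expressions that mirror the two productions. This is a hypothetical JavaScript translation: the escape alternatives are listed first because JavaScript alternation is ordered, whereas EBNF alternation is not.

```javascript
// Original production: ([^{] | '\{')+
const ORIGINAL = /^(?:\\\{|[^{])+/;
// Proposed production: ([^{}] | '\{' | '\}')+
const PROPOSED = /^(?:\\[{}]|[^{}])+/;

// Text following `{ brandName }` on the last variant line.
const rest = " otworzyla nowe okno. }";
console.log(JSON.stringify(rest.match(ORIGINAL)[0])); // " otworzyla nowe okno. }"
console.log(JSON.stringify(rest.match(PROPOSED)[0])); // " otworzyla nowe okno. "
```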

Remove tags

This is a counter-proposal to #59. If #62 is approved we could go further than only allowing tags on glossary/private messages and replace tags with attributes defined on private messages.

The main reason why tags exist is to encode language-specific data like grammatical genders etc. We decided to add a separate data structure so that it's easy to distinguish between data which is part of the message's public interface and data which is private. In other words, the use-case for tags is tooling.

With private/glossary messages we satisfy this use-case by marking the message as special. Tools can choose to check for the existence of glossary messages but ignore their attributes. In such a scenario, those attributes can be used to store language-specific information about the translation.

By removing tags we:

  • remove an entire data type which makes Fluent easier to understand,
  • free up the tag's sigil making the syntax easier to learn,
  • simplify the message lookup in selectors.

Regarding the last point, tag lookup currently uses the message reference in the selector directly. Messages are not matched to variant keys by value but by their tags. If more than one tag is present, any matching tag will select the variant. We haven't yet defined a way to AND tags in variant keys.

brand-name = Aurora
    #żeński

has-updated =
    { brand-name ->
        [męski] { brand-name } został zaktualizowany.
        [żeński] { brand-name } została zaktualizowana.
       *[inny] Program { brand-name } został zaktualizowany.
    }

Without tags, the above becomes a selector with an AttributeExpression on a private message:

-brand-name = Aurora
    .gender = żeński

has-updated =
    { -brand-name.gender ->
        [męski] { -brand-name } został zaktualizowany.
        [żeński] { -brand-name } została zaktualizowana.
       *[inny] Program { -brand-name } został zaktualizowany.
    }

Once we implement #4 it will become easy and consistent with the rest of the syntax to use more than one private attribute for the selection logic.
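A hypothetical sketch contrasting the two lookups on simplified data (the object shapes and function names are assumptions): with tags, any tag on the message can select a variant; with a private attribute, the selector is a single value matched against the variant keys like any other string, falling back to the default.

```javascript
const variantKeys = ["męski", "żeński", "inny"];
const defaultKey = "inny";

// Tag-based lookup: the first variant key found among the message's tags.
function selectByTags(message, keys, fallback) {
  return keys.find((key) => message.tags.includes(key)) || fallback;
}

// Attribute-based lookup, as in { -brand-name.gender -> ... }.
function selectByAttribute(message, attr, keys, fallback) {
  const value = message.attributes[attr];
  return keys.includes(value) ? value : fallback;
}

const taggedBrand = { value: "Aurora", tags: ["żeński"] };
const privateBrand = { value: "Aurora", attributes: { gender: "żeński" } };

console.log(selectByTags(taggedBrand, variantKeys, defaultKey));                 // żeński
console.log(selectByAttribute(privateBrand, "gender", variantKeys, defaultKey)); // żeński
```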

Signoffs

Require one variant to be the default

Goal

Guarantee that a select-expression always evaluates to a string.

Description

Placeables can contain a select-expression, whose value should always be a string. We can enforce this by requiring one of the variants to be marked as the default using the * syntax.

The member-list class in the BNF grammar is used both for traits and variants. We will likely want to introduce a new class: variant-list.
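A sketch of the check this implies, over a hypothetical simplified variant list in which each variant has a `default` flag: exactly one variant must be marked with `*`, which guarantees that resolution always finds a string, even for an unknown key.

```javascript
// Validate that a select-expression has exactly one default variant.
function validateVariants(variants) {
  const defaults = variants.filter((v) => v.default).length;
  if (defaults === 0) throw new SyntaxError("Expected one variant to be marked as default (*)");
  if (defaults > 1) throw new SyntaxError("Only one variant can be marked as default (*)");
  return true;
}

// With validation in place, resolution always returns a variant's value.
function resolveSelect(variants, key) {
  validateVariants(variants);
  const match = variants.find((v) => v.key === key) || variants.find((v) => v.default);
  return match.value;
}

const brandVariants = [
  { key: "nominative", default: true, value: "Firefox" },
  { key: "genitive", default: false, value: "Firefox's" },
];
console.log(resolveSelect(brandVariants, "dative")); // Firefox
```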

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/cpVonPQS0sY

Enforce whitespace in multiline pattern (and comment?)

Currently, we allow a single space to precede the value of a multiline pattern and of a comment.
It works like this:

# Test
# Longer comment
key =
 | Value
 | Longer value

This was meant to increase readability, and I believe it does achieve its goal, but the fact that it's conditional causes problems at the parser level.

Things like this have to be considered:

key =
 | Value
 |Longer Value
 | And more

key =
 |Value
 |Longer Value
 | And more

The most intuitive solution would be to look ahead and determine whether all lines have at least one leading whitespace character, and if so, remove a single space from each line of the block.

But for the sake of simplicity, why don't we start by enforcing a single space after each "|" for now?

We can always introduce a more sophisticated, forward-compatible algorithm later, but simplify our lives now without any visible drawbacks (I haven't seen a single case where someone would prefer not to use the whitespace).

The same applies to comments.
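The two options can be compared in a short sketch. Both helpers are hypothetical and operate on block lines whose indentation before the `|` has already been removed: the strict rule requires exactly `| ` and strips it, while the lookahead rule strips one space after `|` only when every line in the block has one.

```javascript
// Strict rule: every line must start with "| "; the marker and one space are stripped.
function stripStrict(lines) {
  return lines.map((line, i) => {
    if (!line.startsWith("| ")) throw new SyntaxError(`Line ${i + 1}: expected "| "`);
    return line.slice(2);
  });
}

// Lookahead rule: strip "|", then strip one space only if all lines have it.
function stripLookahead(lines) {
  const bodies = lines.map((line) => line.replace(/^\|/, ""));
  const allSpaced = bodies.every((b) => b.startsWith(" "));
  return allSpaced ? bodies.map((b) => b.slice(1)) : bodies;
}

// The second example from above, with leading indentation removed.
const block = [" | Value", " |Longer Value", " | And more"].map((l) => l.trim());
console.log(JSON.stringify(stripLookahead(block))); // [" Value","Longer Value"," And more"]
```

The lookahead variant needs the whole block before it can emit any line, which is the extra parser complexity the strict rule avoids.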
