projectfluent / fluent-rs


995.0 17.0 93.0 1.31 MB

Rust implementation of Project Fluent

Home Page: https://projectfluent.org

License: Apache License 2.0

Rust 50.77% Fluent 49.23%

l10n localization rust ftl plural internationalization i18n

fluent-rs's People

Contributors


zbraniecki stasm glendc alexxnica kryndex nvzqz behnam nox waywardmonkeys criloz johndoneth mmstick unclenachoduh keats xd009642 markpritchard clikengo johnheitmann openstax-poland arathunku kazimuth zyxxoo jakubadamw 3c1u manishearth kgv cmyr inferiorhumanorgans xampprocky ankitects fairingrey alerque csnover michael-f-bryan djg str4d age-rs dminor fhoehle transparencies isgasho pike adamaq01 mathjazz emilio voultapher someguynamedjosh evant nordzilla rumovz seanpm2001 seanwallawalla-forks seanpm2001-all avitex kirinse d34db4b3 kovel saona-raimundo oliver-ni joseluis juliancoffee mathieutricoire timjentzsch waldmatias gregtatum makotokato gybrish oyelowo nftico jasperdesutter squ1dd13 teamhartex eemeli carbon-vault iq-scm bbqsrc necessary-nu ernstvanderlinden progval martinvonz x0f5c3 dtolnay-contrib agriconnect holgergottchristensen getong urgau pitbuster wiiznokes clubby789 feefladder pi-cla

fluent-rs's Issues

Renames

In JS we recently performed a number of renames in preparation for 1.0 - projectfluent/fluent.js#276

We should apply them to Rust.

We should revise our error UX to keep it in sync with Python and JS while meeting the very high standards set by Rust's error display.
Last time I was working on that, the Rust team said they were looking into extracting a generic slice display library for errors out of Rust. Maybe they did, maybe we can push them to, maybe we can release one on our own.

Generalize PluralRules

Until we can get our hands on a minimal CLDR/PluralRules API I believe it would be good to mock it much like l20n used to - https://github.com/l20n/l20n.js/blob/v3.x/src/lib/plurals.js

This is important as plural rules are the most common variant selector and without them the library is not of much use.

With such a minimal approach we can lobby for a more complete solution to be prioritized by the Rust team.

"www" in project description's URL leads to an invalid-certificate page

Open "https://projectfluent.org" from projectfluent/fluent => home page of the project
Open "https://www.projectfluent.org" from projectfluent/fluent-rs => big "this connection is not private" warning from my browser

ResolveValue::to_value should return a Result

Right now it returns an Option. By returning a Result instead we could capture error messages and fallback values.
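
A minimal sketch of the difference, not taken from the fluent-rs codebase: `ResolveError`, its fields, and this standalone `to_value` function are all hypothetical stand-ins. The point is only that a Result can carry both an explanation and a fallback string, which an Option cannot.

```rust
use std::collections::HashMap;

/// Hypothetical error type: says what went wrong and what to render instead.
#[derive(Debug, PartialEq)]
pub struct ResolveError {
    pub message: String,
    pub fallback: String,
}

/// Hypothetical stand-in for a resolver step that looks up a variable by name.
pub fn to_value(args: &HashMap<&str, String>, name: &str) -> Result<String, ResolveError> {
    match args.get(name) {
        Some(v) => Ok(v.clone()),
        // An Option could only report "nothing here"; the Err arm explains why
        // and proposes a fallback value for the caller to display.
        None => Err(ResolveError {
            message: format!("unknown variable: ${}", name),
            fallback: format!("{{${}}}", name),
        }),
    }
}
```

A caller can then log `message` and still render `fallback` in place of the missing value.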

Design an AST for fluent-rs

Let's start an issue to design what we want our AST to look like.

Call expressions

We need to add CallExpressions support to our resolver and some built-in functions for things like date formatting (it's ok if we mock them for now)

Consider replacing RefCell<Vec> with a HashSet in Scope

Could this instead be a HashSet with us removing the element when done? That seems faster on the .contains() call, plus we don't have to manually hash anything.

The only hazard with this is that cycles will reuse the same entry for the same element, however you forbid cycles above so it's fine.

The Vec isn't necessary here since we already have the call stack for keeping track of what got pushed last. A Vec would be useful if we decided to abandon recursion in favor of iteration, but if we're not doing that we should just use a HashSet and take the win on the .contains() call
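
A rough sketch of the suggestion, under simplifying assumptions: messages are modeled as a plain map where a value starting with `=` is literal text and anything else is a reference to another id. The `resolve` function and this encoding are made up for illustration and are not the actual Scope code.

```rust
use std::collections::{HashMap, HashSet};

/// Resolves `id` by following references in `messages`.
/// `in_progress` holds the ids that are currently on the call stack.
pub fn resolve<'a>(
    messages: &HashMap<&'a str, &'a str>,
    id: &'a str,
    in_progress: &mut HashSet<&'a str>,
) -> Option<String> {
    // `insert` returns false when the id is already being resolved: a cycle.
    if !in_progress.insert(id) {
        return None;
    }
    let result = match messages.get(id).copied() {
        Some(body) => match body.strip_prefix('=') {
            Some(text) => Some(text.to_string()),
            None => resolve(messages, body, in_progress),
        },
        None => None,
    };
    // Remove the entry on the way out; the recursion already records the order,
    // so no Vec is needed to know what was pushed last.
    in_progress.remove(id);
    result
}
```

Because an id is removed only by the frame that inserted it, a cyclic reference is rejected at the second `insert` and never clobbers the outer frame's entry.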

Originally posted by @Manishearth in #93

Can't publish to crates.io

$ cargo publish --dry-run
    Updating registry `https://github.com/rust-lang/crates.io-index`
error: all dependencies must come from the same source.
dependency `itertools` comes from https://github.com/bluss/rust-itertools instead

What would it take to support stable Rust?

Hello! fluent looks like a really amazing library, and I'd love to use it to localize the open source Rust substudy application. Unfortunately, substudy needs to be able to build under stable Rust (to make it easier for people to install it), and both fluent and fluent-locale use unstable Rust features for things like box_patterns.

Is there any plan to produce a version of fluent that is compatible with stable Rust? I'd love to be able to use it at some point. And if so, would you be interested in a PR that made it work?

Release 0.1

Checklist:

MessageContext.add_messages should not fail

I noticed that #55 changed the signature of MessageContext.add_messages to -> Result<(), FluentError>.

fluent-rs/fluent/src/context.rs

Lines 162 to 169 in 2d9ddb0

match self.map.entry(id.clone()) {
    HashEntry::Vacant(empty) => {
        empty.insert(entry);
    }
    HashEntry::Occupied(_) => {
        return Err(FluentError::Overriding { kind, id });
    }
}

I don't think this is right. The Fluent philosophy is to be tolerant of errors found in localization resources. If the localizer happens to duplicate a message, MessageContext should not reject the whole resource.

I noticed this thanks to cargo test reporting a lot of warnings about an unused Result from add_messages. If we end up keeping the Result, we should at least fix the warning.
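
One way to reconcile both concerns could look like the sketch below: keep adding entries after a duplicate and hand the collected errors back to the caller. The `Bundle` and `BundleError` types are simplified stand-ins invented here, and "first definition wins" is an assumption of the sketch, not a documented Fluent rule.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum BundleError {
    Overriding { id: String },
}

#[derive(Default)]
pub struct Bundle {
    map: HashMap<String, String>,
}

impl Bundle {
    /// Adds every entry it can. Duplicates are reported but do not abort the
    /// loop, so one bad message never rejects the rest of the resource.
    pub fn add_messages(&mut self, entries: Vec<(String, String)>) -> Vec<BundleError> {
        let mut errors = Vec::new();
        for (id, value) in entries {
            match self.map.entry(id) {
                Entry::Vacant(slot) => {
                    slot.insert(value);
                }
                Entry::Occupied(slot) => {
                    // Keep the existing value and record the problem.
                    errors.push(BundleError::Overriding { id: slot.key().clone() });
                }
            }
        }
        errors
    }

    pub fn get(&self, id: &str) -> Option<&str> {
        self.map.get(id).map(String::as_str)
    }
}
```

Returning a plain Vec instead of a Result also sidesteps the unused-Result warning, at the cost of making it easier to ignore the errors.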

`FluentValue::into_number` may cause issues with decimal conversions

impl FluentValue {
    pub fn into_number<S: ToString>(v: S) -> Result<Self, ParseFloatError> {
        f64::from_str(&v.to_string()).map(|_| FluentValue::Number(v.to_string()))
    }
    // ...
}

https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/types.rs#L32

f64 has limited precision which may not accurately represent an input number, for instance if I'm using a d128 to represent currency.

It might be worth writing a custom parser that validates inputs are usable numbers without modifying them.
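
A small sketch of what such a validator might look like. It assumes that "usable number" means an optional minus sign, digits, and an optional fractional part; the function names and the String error type are placeholders, not the crate's API.

```rust
/// Returns true if `s` is an optional '-', one or more ASCII digits, and an
/// optional fractional part ('.' followed by one or more ASCII digits).
pub fn is_plain_decimal(s: &str) -> bool {
    let unsigned = s.strip_prefix('-').unwrap_or(s);
    let (int_part, frac_part) = match unsigned.split_once('.') {
        Some((i, f)) => (i, Some(f)),
        None => (unsigned, None),
    };
    let all_digits = |p: &str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
    all_digits(int_part) && frac_part.map_or(true, all_digits)
}

/// Validates the input without round-tripping it through f64, so the original
/// digits are stored exactly as written.
pub fn into_number(s: &str) -> Result<String, String> {
    if is_plain_decimal(s) {
        Ok(s.to_string())
    } else {
        Err(format!("not a number: {}", s))
    }
}
```

With this approach a value such as 0.10000000000000000001 is accepted and kept digit for digit, whereas parsing it as f64 would round it.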

Add pseudolocales

We should enable basic pseudolocales ported from JS

MessageContext::new should take a slice as its locales argument

MessageContext instances don't need to own locales which are currently passed as a vector. Let's change it to taking a slice.

Documentation link does not work

Implement calling built-in Functions

CallExpressions currently don't implement the ResolveValue trait. In the first iteration let's add a simple NUMBER() built-in which doesn't even take any additional arguments.

@zbraniecki What are our options for locale-aware number- and date-formatting?

Consider rewriting the parser to make it copy-free

At the moment the parser is copying some parts of the slice while creating the AST.

Most Rust parsers I can find in the wild instead build an AST that stores slices of the input and aim for zero-copy parsing.
Latest example: https://github.com/viperscape/lichen/blob/master/src/parse.rs

I think we could do this here and it would probably benefit performance and memory. The cost is that if you want to modify the AST you'll have to copy it, but that seems to be a reasonable cost.
Most use cases only need to read the AST from what I know.

Consider readding unsafe slicing in the parser

Thanks for removing the unsafe block here, at least for now. Would you mind filing a follow-up for discussing re-adding it, please? I'd like to look into it a bit more, given that we're likely protected from invalid UTF8 input by the sheer fact of accepting String into FluentResource::try_new.

Originally posted by @stasm in #76

Bring fluent-bundle test coverage above 90%

With 0.5 we got fluent-syntax to ~95% coverage! Now it's time to do the same to bundle

https://coveralls.io/github/projectfluent/fluent-rs?branch=master

Align parser testing with fluent-syntax in JS and Python

We currently test the parser using separate methodology and fixtures from what JS and Python do.

I'd like to unify this for 0.2 to make it easier to maintain a single corpus of test fixtures and make it easier to keep parsers in sync.

Release 0.2

Checklist:

Consider switching FluentSyntax::AST to use Cow instead of &str

AST in FluentSyntax currently uses &str, which means that it's basically read-only. As we mature the fluent-syntax crate we'll likely want to add a serializer and the ability for users to operate on the AST.

There are two ways we could approach it:

1. We could add OwnedMessage, which would be added to the Resource::body like any other node, along with a way to copy a Message to a new OwnedMessage. This would be quite limiting, but likely sufficient for most use cases.
2. We could switch the AST to use Cow<'a, str>, which means it'll still handle &str for all basic parsing/runtime work, but allow it to be replaced with an owned String when modified/added.

(2) sounds tempting, but I'm not sure what the performance/memory consequences are.
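
To make the Cow option concrete, here is a toy sketch using `std::borrow::Cow`. The `Message` struct and the one-line `parse_line` parser are invented for illustration and are far simpler than the real fluent-syntax AST.

```rust
use std::borrow::Cow;

#[derive(Debug, PartialEq)]
pub struct Message<'s> {
    pub id: Cow<'s, str>,
    pub value: Cow<'s, str>,
}

/// Parses a single `id = value` line; both fields borrow from the input.
pub fn parse_line(line: &str) -> Option<Message<'_>> {
    let (id, value) = line.split_once('=')?;
    Some(Message {
        id: Cow::Borrowed(id.trim()),
        value: Cow::Borrowed(value.trim()),
    })
}

/// Helper for inspecting whether a field still points into the source.
pub fn is_borrowed(c: &Cow<'_, str>) -> bool {
    matches!(c, Cow::Borrowed(_))
}
```

Parsing allocates nothing for the strings; calling `to_mut()` on a field copies just that field into an owned String the first time it is modified. The memory cost is that a `Cow<str>` is larger than a `&str`, which is the kind of consequence that would need measuring.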

An idea to leverage the type and macro system

This is an idea that I had that leverages the Rust type system and macro system, rather than using stringly-typed arguments (well, sort of). It's inspired by Diesel.

First the client that the user has to write:

translate_module!("/path/to/translation/files");

translate!(English, "hello-world");
translate!(Dutch, "intro", "Thomas");

(I know it still looks stringly-typed, but under the hood it isn't.)

The translate_module will create a new module based on the provided directory. The only (public) API this will define is the translate macro, which has the following API:

macro_rules! translate {
    ($lang:tt, $msg:expr, $($arg:tt)*) => { ... };
}

It takes a Language item (will get back to that), a message (just like MessageContext.get_message now) and optional arguments used to format the message (like MessageContext.format).

Now to use Rust's type system. First we'll start off by generating an index or hash for each available translation message, something like:

const _TRANSLATE_HELLO_WORLD: usize = 0;
const _TRANSLATE_INTO: usize = 1;

And the message in an array or a hash map, for each language:

// This could also be hash maps or something.
const _TRANSLATIONS_ENGLISH: [&'static str; 2] = ["Hello, world!", "Welcome, { $name }."];
const _TRANSLATIONS_DUTCH: [&'static str; 2] = ["Hallo, wereld!", "Welkom, { $name }."];

We can check at compile time whether the index/hash and message exist, and fail the compilation if they don't. The Language item provided to translate will define which array/hash map to use, e.g. _TRANSLATIONS_DUTCH for Dutch.

Next, for the formatting we'll have a single function inside either the fluent crate or the generated module, to which the translation string and the other arguments provided to format the message get passed. Something like this:

// translate!(English, "hello-world") translates into:
translate(_TRANSLATIONS_ENGLISH[_TRANSLATE_HELLO_WORLD])
// translate!(Dutch, "intro", "Thomas")
translate(_TRANSLATIONS_DUTCH[_TRANSLATE_INTO], "Thomas")

Looking forward to a reply, although I don't expect one very soon since this got quite long.

parse_fixtures_compare test fails on Windows due to different line endings

Jared Wein@LAPTOP-BRR0AH2E ~/Documents/GitHub/fluent-rs/fluent-syntax
$ cargo build
Updating crates.io index
Compiling fluent-syntax v0.1.1 (C:\Users\Jared Wein\Documents\GitHub\fluent-rs\fluent-syntax)
Finished dev [unoptimized + debuginfo] target(s) in 17.25s

Jared Wein@LAPTOP-BRR0AH2E ~/Documents/GitHub/fluent-rs/fluent-syntax
$ cargo test
Downloaded glob v0.2.11
Downloaded criterion v0.2.9
Downloaded serde_json v1.0.37
Downloaded assert-json-diff v0.2.1
Downloaded rand_xoshiro v0.1.0
Downloaded atty v0.2.11
Downloaded csv v1.0.5
Downloaded itertools v0.8.0
Downloaded rayon v1.0.3
Downloaded serde_derive v1.0.85
Downloaded serde v1.0.85
Downloaded rand_core v0.3.1
Downloaded walkdir v2.2.7
Downloaded cast v0.2.2
Downloaded criterion-plot v0.3.0
Downloaded clap v2.32.0
Downloaded lazy_static v1.2.0
Downloaded tinytemplate v1.0.1
Downloaded byteorder v1.3.1
Downloaded rand_core v0.4.0
Downloaded rayon-core v1.4.1
Downloaded rand_os v0.1.1
Downloaded bitflags v1.0.4
Downloaded winapi-util v0.1.1
Downloaded unicode-width v0.1.5
Downloaded quote v0.6.11
Downloaded either v1.5.0
Downloaded num-traits v0.2.6
Downloaded winapi v0.3.6
Downloaded textwrap v0.10.0
Downloaded same-file v1.0.4
Downloaded itoa v0.4.3
Downloaded crossbeam-deque v0.2.0
Downloaded proc-macro2 v0.4.26
Downloaded ryu v0.2.7
Downloaded csv-core v0.1.5
Downloaded libc v0.2.48
Downloaded crossbeam-utils v0.2.2
Downloaded crossbeam-epoch v0.3.1
Downloaded unicode-xid v0.1.0
Downloaded memchr v2.1.3
Downloaded cfg-if v0.1.6
Downloaded scopeguard v0.3.3
Downloaded memoffset v0.2.1
Downloaded nodrop v0.1.13
Downloaded arrayvec v0.4.10
Downloaded syn v0.15.26
Downloaded num_cpus v1.9.0
Compiling arrayvec v0.4.10
Compiling winapi v0.3.6
Compiling nodrop v0.1.13
Compiling libc v0.2.48
Compiling proc-macro2 v0.4.26
Compiling cfg-if v0.1.6
Compiling memoffset v0.2.1
Compiling lazy_static v1.2.0
Compiling memchr v2.1.3
Compiling ryu v0.2.7
Compiling unicode-xid v0.1.0
Compiling serde v1.0.85
Compiling scopeguard v0.3.3
Compiling rayon-core v1.4.1
Compiling byteorder v1.3.1
Compiling unicode-width v0.1.5
Compiling num-traits v0.2.6
Compiling rayon v1.0.3
Compiling itoa v0.4.3
Compiling either v1.5.0
Compiling rand_core v0.4.0
Compiling cast v0.2.2
Compiling bitflags v1.0.4
Compiling glob v0.2.11
Compiling crossbeam-utils v0.2.2
Compiling textwrap v0.10.0
Compiling itertools v0.8.0
Compiling rand_core v0.3.1
Compiling num_cpus v1.9.0
Compiling clap v2.32.0
Compiling crossbeam-epoch v0.3.1
Compiling quote v0.6.11
Compiling rand_xoshiro v0.1.0
Compiling criterion-plot v0.3.0
Compiling csv-core v0.1.5
Compiling syn v0.15.26
Compiling crossbeam-deque v0.2.0
Compiling winapi-util v0.1.1
Compiling atty v0.2.11
Compiling rand_os v0.1.1
Compiling same-file v1.0.4
Compiling serde_derive v1.0.85
Compiling walkdir v2.2.7
Compiling serde_json v1.0.37
Compiling csv v1.0.5
Compiling tinytemplate v1.0.1
Compiling assert-json-diff v0.2.1
Compiling criterion v0.2.9
Compiling fluent-syntax v0.1.1 (C:\Users\Jared Wein\Documents\GitHub\fluent-rs\fluent-syntax)
Finished dev [unoptimized + debuginfo] target(s) in 46.62s
Running c:\Users\Jared Wein\Documents\GitHub\fluent-rs\target\debug\deps\fluent_syntax-50e75f5cdb1124cd.exe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

 Running c:\Users\Jared Wein\Documents\GitHub\fluent-rs\target\debug\deps\parser_fixtures-cf1c70fdc28839b7.exe

running 2 tests
test parse_fixtures_compare ... FAILED
test parse_fixtures ... ok

failures:

---- parse_fixtures_compare stdout ----
Parsing: "tests\fixtures\any_char.ftl"
Parsing: "tests\fixtures\astral.ftl"
thread 'parse_fixtures_compare' panicked at '

json atoms at path ".body[8].content" are not equal:
expected:
"err-😂 = Value\n\n"
actual:
"err-😂 = Value\r\n\r\n"

json atoms at path ".body[10].content" are not equal:
expected:
"err-invalid-expression = { 😂 }\n\n"
actual:
"err-invalid-expression = { 😂 }\r\n\r\n"

json atoms at path ".body[12].content" are not equal:
expected:
"err-invalid-variant-key = { $sel ->\n *[😂] Value\n}\n"
actual:
"err-invalid-variant-key = { $sel ->\r\n *[😂] Value\r\n}\r\n"

', fluent-syntax\tests\parser_fixtures.rs:17:5
note: Run with RUST_BACKTRACE=1 for a backtrace.

failures:
parse_fixtures_compare

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--test parser_fixtures'

Jared Wein@LAPTOP-BRR0AH2E ~/Documents/GitHub/fluent-rs/fluent-syntax
$
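
The failure comes down to the fixtures being checked out with CRLF line endings on Windows while the expected JSON contains LF. One possible workaround, sketched here as an assumption rather than the fix the project adopted, is to normalize line endings in the test harness before comparing. `normalize_newlines` is a hypothetical helper name.

```rust
/// Converts Windows (CRLF) line endings to Unix (LF) and leaves LF-only text unchanged.
pub fn normalize_newlines(s: &str) -> String {
    s.replace("\r\n", "\n")
}
```

An alternative that avoids touching the test code would be to pin the fixture files to LF in the repository's Git attributes, so that every platform reads the same bytes.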

Simple ResourceManager

With the 0.5 release, we reorganized our memory management story for resources, which now ties in more closely with the idea of using a ResourceManager to retrieve (and optionally cache) resources in memory.

I'd like to introduce a simple, file based resource manager for Fluent that can be used for simple projects.
This should lower the learning curve and help people kickstart their projects with less overhead. They would also likely want to use the example ResourceManager as a model for their own integration into their system.

I have a branch that adds a simple ResourceManager, but I'm also developing a more sophisticated one for Mozilla's needs and I'd like to consider bringing fluent-resmgr as a base for l10nregistry-rs.

Decouple locales lifetime from bundle

Hi,

We are exploring using Rust to accelerate our Ruby app (internationalisation shows up as a bottleneck in profiles) and came across fluent-rs.

Thanks very much for creating and maintaining the library - it looks like a perfect fit for our use case.

As a disclaimer, I'm quite new to Rust so may have misunderstood this, but we ran into a problem with dynamically constructing FluentBundle instances and storing them in a map (e.g. keyed by a locale string we receive in a browser header).

The issue appeared to be that the locales input parameter to FluentBundle::new is tagged with the same lifetime as the bundle itself. #68 adds an explicit, separate lifetime to the locales parameter.

Let me know if there are any issues with this PR (it's my first one), and I'll be happy to address them.

Cheers,

Mark

Protect against cyclic references

See the skipped test added in 165a75c.

Remove unwrap from Resolver

Resolver code uses some unwrap which should be replaced with error handling.

Reduce allocation

The fluent API is currently very allocation-intensive. It could be possible to reduce allocations by e.g. changing the arguments of format from HashMap<&'a str, FluentValue> to something like &'a [(&'a str, FluentValue<'a>)], and letting FluentValue store references instead of owned strings.
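
A sketch of the proposed shape, assuming a simplified `Value` enum in place of FluentValue and a linear lookup helper named `lookup`; both are illustrative only.

```rust
/// Borrowing value type: string arguments point at caller-owned data.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value<'a> {
    Str(&'a str),
    Num(f64),
}

/// Finds an argument by name in a slice of (name, value) pairs.
/// No HashMap is built and no string is copied.
pub fn lookup<'a>(args: &[(&str, Value<'a>)], name: &str) -> Option<Value<'a>> {
    args.iter().find(|(k, _)| *k == name).map(|(_, v)| *v)
}
```

The caller can build the arguments as a stack array, for example `[("name", Value::Str(&user))]`. A linear scan is usually fine for the handful of arguments a message takes, though that trade-off is a guess and would need benchmarking.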

Why does FluentBundle::add_resource() take a reference instead of moving?

TL;DR: Why does the FluentBundle::add_resource() function take a reference to a FluentResource, rather than directly taking the value/ownership (a move)? The FluentResource is useless outside of FluentBundle, as far as I'm concerned.

I'm working on a fluent-rs wrapper (as a part of something else) which takes care of all the fluent loading and FluentBundle object constructing, because of #64 .
Because the project uses Rocket and the fluent part is used in/as a fairing, everything here (FluentBundle & FluentResource) should live inside the fairing or be static (which is not possible(?)). It would make a lot of sense if FluentBundle could take ownership of all its FluentResources, so that the fairing (struct) only stores the FluentBundle.
So why doesn't FluentBundle::add_resource() take the value as the parameter? The only reason I can think of is to allow the same FluentResource to be passed to two FluentBundles. But this doesn't make sense, because FluentBundle::format() can return None and the programmer can fall back to another FluentBundle.

(Since I still don't have much experience in Rust, I have not found a way to make the FluentResource live in the fairing struct with FluentBundle (in the same struct) pointing to it, nor do I know if this is possible at all. It would be much easier for new people to use fluent-rs if this investigation could be avoided.)

Add FSI/PDI marks support

We should port the withIsolation part for the Resolver.

Remove MessageContext.get_message

This is similar to projectfluent/fluent.js#208.

I'd like to change MessageContext.format() to accept an identifier of a message and format it directly to a string. It would avoid the intermediate step of having to first get_message the message from the context, which is a bit clunky and is an artifact of how the JavaScript bindings retrieve translated attributes.

0.6

Update syntax to 0.9

Is it safe to expose ids?

I got vaguely worried seeing the error fallback of outputting message ids as formatted messages in error cases. I don't think developers will be thinking about their ids being customer visible when they are creating them. I admit I'm hard-pressed to come up with a great example of how this might cause harm, but I wanted to make sure the discussion happened just in case.

Mild worries:

Internal code names in ids
Ids that reflect unreleased secret features, e.g. rebranded-title = ...
Ids that seem insensitive without internal/cultural context (or that are just plain insensitive), e.g. dickbar-title = ...

Maybe the Fluent spec itself should call out id-exposing error handling? I'll try to clarify it in the fluent-rs docs.

Consider removing `last_non_blank`

Yeah, but why? :) What does last_non_blank keep track of?

Originally posted by @stasm in #76

Review documentation

With the upcoming 0.5 release, we should have all the low-level bits and pieces in place. It'll be a great time to review the docs and make sure that both the API docs and tutorials are clean and guide the aspiring user through the decisions they'll have to make as they integrate a localization framework.

Switch benchmark code to use criterion

Criterion seems to be a much better benchmarking env than the default one. We should use it.

Unify and document lifetimes used across the codebase

As per reviewers' feedback, we should list the lifetimes and document them.

Consider switching FluentBundle to take Rc<FluentResource>

Currently, we create a FluentBundle by passing it references to FluentResource. That requires quite a bit of sophisticated lifetime hand-holding. We could instead take Rc<FluentResource>, which would make lifetimes easier.
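
A sketch of what the Rc approach buys, using stand-in `Resource` and `Bundle` structs rather than the real types. With shared ownership, a bundle can be returned from a function or stored in a long-lived struct without a lifetime parameter tying it to a separately stored resource.

```rust
use std::rc::Rc;

pub struct Resource {
    pub source: String,
}

#[derive(Default)]
pub struct Bundle {
    resources: Vec<Rc<Resource>>,
}

impl Bundle {
    /// Takes shared ownership; no lifetime parameter is needed on Bundle.
    pub fn add_resource(&mut self, res: Rc<Resource>) {
        self.resources.push(res);
    }

    pub fn resource_count(&self) -> usize {
        self.resources.len()
    }
}

/// Builds a bundle from a resource created locally. With `&Resource` this
/// would not compile, because the resource would not outlive the function.
pub fn build_bundle() -> Bundle {
    let res = Rc::new(Resource { source: String::from("hello = Hello") });
    let mut bundle = Bundle::default();
    bundle.add_resource(Rc::clone(&res));
    bundle // the local `res` is dropped here; the bundle's clone keeps the data alive
}
```

The same resource can still be shared between several bundles by cloning the Rc, at the cost of a reference count. Rc is not thread-safe, so a multi-threaded host would need Arc instead.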

Handle errors from parser and resolver

This issue is a super-set of #62. We should unify how we handle errors from parser and resolver in the context.

We currently return Result from add_messages and Option from format and format_message.

I expect that what we really want are:

Result from add_messages which would return all parser errors collected during parsing and context errors collected during context building (currently, duplicated entry)
Option<Result<>> from format and format_message, where Option handles whether the value was there, and Result would return the formatted message or the best string we can produce, plus the errors recorded during production.

Consider storing FluentNumber as String

Our new intl-pluralrules implementation accepts as input any type that can be stringified. That means that you can do pr.select(5); or pr.select(5.0); or pr.select("5");.

FluentNumber is stored as f32 which means that it always will have the decimal portion.

That means that a number encoded in FTL as 5 will get stored as FluentNumber(5.0) and then passed to PluralRules as such.
A better way would probably be to store FluentNumber as a string, so that it preserves its original form when passed to PluralRules.
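
The written form matters because CLDR plural rules take the number of visible fraction digits into account, so "1" and "1.0" can select different categories. The sketch below shows that this information survives in a string but not in a float round-trip; both helper functions are hypothetical and exist only for the demonstration.

```rust
/// Counts the digits written after the decimal point, trailing zeros included.
pub fn visible_fraction_digits(literal: &str) -> usize {
    literal.split_once('.').map_or(0, |(_, frac)| frac.len())
}

/// Round-trips a literal through f64, as a float-backed number type would,
/// and returns the float's default string form.
pub fn through_float(literal: &str) -> Option<String> {
    literal.parse::<f64>().ok().map(|n| n.to_string())
}
```

For the literal "1.50" the string form reports two fraction digits, while the float round-trip yields a string with only one.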

Add a lazy cache for Intl objects on MessageContext

Now that we have a proper PluralRules object, we should cache it on MessageContext.

Support custom functions in MessageContext

We want to support passing arbitrary generic functions as arguments to MessageContext constructor to allow developers to provide builtins into localization contexts.
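
One possible shape for this, sketched with a hypothetical `Context` type that stores boxed closures by name. The signature `Fn(&[&str]) -> String` is a placeholder; real built-ins would take Fluent values and named arguments.

```rust
use std::collections::HashMap;

/// A custom function receives positional arguments and returns a string.
type Function = Box<dyn Fn(&[&str]) -> String>;

#[derive(Default)]
pub struct Context {
    functions: HashMap<String, Function>,
}

impl Context {
    /// Registers any closure or fn with the matching signature under `name`.
    pub fn add_function<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&[&str]) -> String + 'static,
    {
        self.functions.insert(name.to_string(), Box::new(f));
    }

    /// Calls a registered function; returns None if the name is unknown.
    pub fn call(&self, name: &str, args: &[&str]) -> Option<String> {
        self.functions.get(name).map(|f| f(args))
    }
}
```

Registration here happens through a method rather than the constructor, which is a deviation from the wording above made only to keep the sketch short.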

Please dual-license

The gold standard for Rust projects is to dual-license them under the Apache license and MIT. Most Rust crates are licensed that way. The Apache license gives users better patent protection, while the MIT license is compatible with (L)GPL version 2 only projects. It would be nice if you would switch to a dual license. The earlier, the easier it is :).

Thanks!

Decode raw StringLiteral into unicode in Resolver

Originally posted by @stasm in #76

Introduce lexer

In my early experiments, a lexer seems to have a very nice perf impact on parsing.

I'll investigate more, but if someone gets to it first, feel free to take it!

A good background read - https://medium.com/@retep007/javascript-lexing-for-high-performance-f9a800ec930d

Inconsistent error handling

If my bundle has the message:

foo = { bar }

Then format('foo') returns ('___', []). That alone was surprising. It seems like it's common behavior for most "stuff is missing" problems. Either None or ('___', [ warnings]) feel better to me. Middle layers should be able to learn and log about message glitches.

If my message was instead:

foo =
    -attr= something

Then format('foo') returns ('foo', [error]). Those two kinds of errors feel like they should be handled the same way.

To get the best of graceful degradation, clear error signaling, and consistent error handling, a good return type might be (string, [warnings]). If you want to let higher layers skip a pointer chase for the NotFound case, then maybe instead of [warnings] you can use something like:

enum Warning {
    None,
    MessageNotFound,
    IncompleteMessage([details])
}
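
Here is a sketch of the (string, [warnings]) variant of that return type. The `Part` enum, the `format_message` function, and the `___` placeholder are simplified inventions for illustration; since an empty list already means "no warnings", the `None` variant is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Warning {
    MessageNotFound,
    IncompleteMessage(Vec<String>),
}

/// A message is modeled as literal text mixed with variable placeholders.
pub enum Part {
    Text(&'static str),
    Var(&'static str),
}

/// Always returns something displayable, plus warnings describing any glitch.
pub fn format_message(
    messages: &HashMap<&str, Vec<Part>>,
    args: &HashMap<&str, &str>,
    id: &str,
) -> (String, Vec<Warning>) {
    let parts = match messages.get(id) {
        Some(parts) => parts,
        // Missing message: fall back to the id, but say so.
        None => return (id.to_string(), vec![Warning::MessageNotFound]),
    };
    let mut out = String::new();
    let mut details = Vec::new();
    for part in parts {
        match part {
            Part::Text(text) => out.push_str(text),
            Part::Var(name) => match args.get(*name) {
                Some(value) => out.push_str(value),
                None => {
                    out.push_str("___");
                    details.push(format!("unknown variable: {}", name));
                }
            },
        }
    }
    let warnings = if details.is_empty() {
        Vec::new()
    } else {
        vec![Warning::IncompleteMessage(details)]
    };
    (out, warnings)
}
```

Both failure modes now go through the same channel: the caller always gets a string to show and can log whatever is in the warnings list.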

`MessageContext::AddResource`

In JS we now have addMessages which takes an FTL string, and addResource which takes a parsed AST.

We should add the latter.

Confusion about usage in a multi-locale environment

I guess the goal of fluent-rs is to provide a fully functioning i18n system with Fluent syntax, so multi-locale environments will be encountered.

The signature of MessageContext::new() takes a slice of strings as the parameter, which is said to

represents the best possible fallback chain for a given locale

However, in the next section:

MessageContext stores messages in a single locale, but keeps a locale fallback chain for the purpose of language negotiation with i18n formatters.

The design seems to conflict with itself: if MessageContext only stores values for one locale, how does it do locale fallback?

This is quite confusing to me, and I don't really know how to use fluent-rs in a multi-locale environment where fallback is really needed.
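
One reading of the quoted documentation is that each context holds messages for a single locale and that fallback between locales happens in a layer above it, by trying one context after another. The sketch below illustrates that reading only; `Bundle` and `format_with_fallback` are made-up names and nothing here is the confirmed fluent-rs design.

```rust
use std::collections::HashMap;

/// Stand-in for a single-locale message context.
pub struct Bundle {
    pub locale: &'static str,
    pub messages: HashMap<&'static str, &'static str>,
}

/// Tries each bundle in order of preference and returns the first translation
/// found, together with the locale that supplied it.
pub fn format_with_fallback<'a>(bundles: &'a [Bundle], id: &str) -> Option<(&'a str, &'a str)> {
    bundles
        .iter()
        .find_map(|b| b.messages.get(id).map(|m| (b.locale, *m)))
}
```

Under this reading, the locale list given to each context would be used only for things like plural rules and number formatting, not for finding messages.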

