Comments (5)
As far as I can tell, starlark-rust's current implementation of elems does what the spec says elem_ords should do.
elems itself is interestingly specified... Rust doesn't really have a native concept of a byte string absent a charset in the way that Go does, so there isn't a particularly natural implementation for elems...
from starlark-rust.
Probably the spec should say something about this, but I’m guessing the answer is UTF-8. cc @alandonovan
There is consensus that it is infeasible to specify the string encoding in a way that does not impose an intolerable performance penalty on at least one implementation, so each implementation is free to define the element type of string as it wishes. Starlark-Java's strings are sequences of UTF-16 code units, Starlark-Go's strings are sequences of UTF-8 bytes, and judging from the test suite, Starlark-Rust's strings are sequences of Unicode code points. (A poor choice from a performance perspective, but a legal one.) All of the string operations are defined in terms of elements, not bytes/chars/codepoints. The elem_ords method for Starlark-Rust should return a sequence of code points as integers.
Update: to the Starlark program, the Rust implementation of string appears to be a sequence of code points, but in fact its representation is a Rust UTF-8 byte string, and all the Starlark string operations call the String.chars method to decode on the fly. Random-access operations that should be constant time, such as s[i] and s[i:j], become linear time. I suggest you expose the UTF-8ness directly, so strings behave just like in the Go implementation. It'll be much more efficient.
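To make the cost difference concrete, here is a minimal Rust sketch. The helper names char_at and byte_at are hypothetical and are not part of starlark-rust; they only contrast the two ways s[i] could be answered over the same UTF-8 buffer.

```rust
/// Code-point indexing: the UTF-8 data must be decoded from the start
/// of the string, so the cost grows with the index (linear time).
fn char_at(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

/// Byte indexing: a bounds-checked lookup into the underlying buffer,
/// which takes constant time regardless of the index.
fn byte_at(s: &str, i: usize) -> Option<u8> {
    s.as_bytes().get(i).copied()
}
```

For the string "aäb", index 2 is the code point 'b' under the first scheme and the second byte of 'ä' (0xA4) under the second, so the two choices are visible to Starlark programs, not just to the implementation.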
"The elem_ords method for Starlark-Rust should return a sequence of code points as integers."
Yeah, elem_ords is easy: bytes can be represented as numbers in an obvious way. The question here is about elems, which appears to return a string representation of arbitrary single bytes. In the Go implementation, these end up as strings like \xe4, but because Rust treats strings much more as UTF-8 and much less as bytes, there isn't an obvious stringified version of a byte. We could use an escaped string \\xe4, but that is more of a debug representation: if you were to join it with some other bytes, you would get literal backslashes rather than bytes merging into multi-byte characters.
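A small Rust sketch of the problem described above, under the assumption that elements are bytes. The functions byte_ords, escaped_elems and join_bytes are hypothetical illustrations, not starlark-rust APIs.

```rust
use std::string::FromUtf8Error;

/// Bytes as integers: always representable.
fn byte_ords(s: &str) -> Vec<u8> {
    s.bytes().collect()
}

/// Bytes as escaped text such as `\xc3`: each element is a valid Rust
/// String, but it is a four-character debug rendering, not the byte itself.
fn escaped_elems(s: &str) -> Vec<String> {
    s.bytes().map(|b| format!("\\x{:02x}", b)).collect()
}

/// Joining raw bytes only yields a Rust String if the result is valid UTF-8.
fn join_bytes(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}
```

Concatenating the escaped elements of "ä" gives the eight-character text \xc3\xa4 rather than "ä", while a lone byte such as 0xC3 cannot be turned into a Rust String at all. Joining both raw bytes does recover "ä".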
Ah, so Rust strings are represented using bytes but may contain only valid UTF-8 encodings. In that case you have two choices: treat Starlark strings as sequences of Unicode code points, in which case indexing goes from constant time to linear-time (ugh). Or, treat Starlark strings as arbitrary byte strings, and provide conversions at the boundary with Rust. Converting a Rust string to a Starlark string is O(1): just alias the bytes. Converting a Starlark string to a Rust string requires scanning it for UTF-8 validity and returning an error or a Rust string (ugh, but less so). The API should provide users with a way to access the raw bytes in the non-UTF-8 case.
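The second option could look roughly like the following sketch. StarlarkString and all of its methods are hypothetical names chosen for illustration; nothing here is taken from the starlark-rust codebase.

```rust
use std::str::Utf8Error;

/// Hypothetical Starlark string value backed by arbitrary bytes.
#[derive(Debug, Clone, PartialEq)]
struct StarlarkString(Vec<u8>);

impl StarlarkString {
    /// Rust -> Starlark: reuse the String's buffer, no copy and no scan.
    fn from_rust(s: String) -> Self {
        StarlarkString(s.into_bytes())
    }

    /// Build a value from raw bytes that need not be valid UTF-8.
    fn from_bytes(bytes: Vec<u8>) -> Self {
        StarlarkString(bytes)
    }

    /// Starlark -> Rust: scan for UTF-8 validity (linear time) and
    /// return either a borrowed &str or an error.
    fn to_rust(&self) -> Result<&str, Utf8Error> {
        std::str::from_utf8(&self.0)
    }

    /// Raw access for callers that must handle the non-UTF-8 case.
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// s[i] as a constant-time byte lookup.
    fn elem_at(&self, i: usize) -> Option<u8> {
        self.0.get(i).copied()
    }
}
```

With this shape, a value created from the single byte 0xE4 is a legal Starlark string whose to_rust call fails, and as_bytes still lets the embedder read it.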