Comments (6)
Aargh, I should have written the rationale in the README. ;) First, the different behavior is due to how the sequences are grouped (sequences marked * are invalid and are therefore converted to U+FFFD):
what WHATWG expects: [BF EC] [BF CD] [FF]* [BE D3]
what rust-encoding expects: [BF EC] [BF CD] [FF BE]* [D3]*
Both are acceptable for most uses. The latter behavior was chosen to simplify the API; if the former behavior were the default, the EncoderError would look like this:
pub struct EncoderError<'self> {
    remaining1: ~str, // a portion of the input consumed by the prior `feed` that should be re-fed
    remaining2: &'self str, // a portion of the input given to the current `feed` and not processed
    problem: ~str,
    cause: ~str,
}
For example, the UTF-8 sequence EA B0 7F is invalid, but if the caller gave EA B0 and 7F in different calls, the second call should return the remaining bytes starting with B0 7F, whose first byte is not in the current input. This is rather inconvenient, so I chose to eliminate remaining1 and keep only remaining2 (aka remaining in the current API).
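The split-input problem can be reproduced with the standard library alone. The following is only a sketch in current Rust, not rust-encoding's API: `ChunkStatus` and `check_chunk` are hypothetical names made up for illustration, and the only real calls are `std::str::from_utf8`, `Utf8Error::valid_up_to` and `Utf8Error::error_len`.

```rust
use std::str;

/// Result of validating one chunk on its own (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum ChunkStatus {
    /// The whole chunk is valid UTF-8.
    Valid,
    /// The chunk ends inside a multi-byte sequence; the last `pending`
    /// bytes cannot be judged until more input arrives.
    Incomplete { pending: usize },
    /// A definite error of `len` bytes starting at offset `at`.
    Invalid { at: usize, len: usize },
}

/// Hypothetical helper: classify a single chunk without any carried-over state.
fn check_chunk(bytes: &[u8]) -> ChunkStatus {
    match str::from_utf8(bytes) {
        Ok(_) => ChunkStatus::Valid,
        Err(e) => match e.error_len() {
            // `None` means the input ended unexpectedly in the middle of a sequence.
            None => ChunkStatus::Incomplete {
                pending: bytes.len() - e.valid_up_to(),
            },
            Some(len) => ChunkStatus::Invalid {
                at: e.valid_up_to(),
                len,
            },
        },
    }
}

fn main() {
    // First call: EA B0 is a legal prefix, so no error can be reported yet.
    println!("{:?}", check_chunk(&[0xEA, 0xB0])); // Incomplete { pending: 2 }
    // Second call seen in isolation: 7F is perfectly valid ASCII.
    println!("{:?}", check_chunk(&[0x7F])); // Valid
    // Only the concatenation shows the error, and it covers bytes from the first call.
    println!("{:?}", check_chunk(&[0xEA, 0xB0, 0x7F])); // Invalid { at: 0, len: 2 }
}
```

The erroneous bytes all come from the earlier call, which is why an error value that borrows only from the current input cannot point at them without something like remaining1.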
I admit the entire idea of the custom error range is hard (we can't have an arbitrarily large error range anyway, maybe that was the rationale of the WHATWG behavior), and if we can have a better design I strongly vote to revert this behavior.
from rust-encoding.
@annevk, fyi
FWIW, browsers do not seem to have interop on error handling: data:text/plain;charset=utf8,%EA%B0%33 gives �3 in Firefox and ��3 in Chrome.
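Both outputs can be reproduced for this particular input with a small std-only sketch. `Policy` and `decode_lossy` are hypothetical names, and the per-byte policy is only an assumption that happens to match the ��3 result here; it is not a claim about how any browser is implemented.

```rust
use std::str;

/// How many U+FFFD to emit per invalid span (hypothetical type, for illustration only).
#[derive(Clone, Copy)]
enum Policy {
    /// One U+FFFD for the whole invalid span reported by the validator.
    MaximalSubpart,
    /// One U+FFFD for every byte of the invalid span.
    PerByte,
}

/// Hypothetical helper: decode UTF-8, replacing invalid spans according to `policy`.
fn decode_lossy(mut input: &[u8], policy: Policy) -> String {
    let mut out = String::new();
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(e) => {
                let (valid, rest) = input.split_at(e.valid_up_to());
                // The prefix up to `valid_up_to()` is guaranteed to be valid UTF-8.
                out.push_str(str::from_utf8(valid).expect("prefix already validated"));
                // `error_len()` is `None` when the input ends mid-sequence;
                // in that case everything that is left forms the invalid span.
                let bad = e.error_len().unwrap_or(rest.len());
                let replacements = match policy {
                    Policy::MaximalSubpart => 1,
                    Policy::PerByte => bad,
                };
                for _ in 0..replacements {
                    out.push('\u{FFFD}');
                }
                input = &rest[bad..];
            }
        }
    }
}

fn main() {
    let bytes = [0xEA, 0xB0, 0x33];
    println!("{}", decode_lossy(&bytes, Policy::MaximalSubpart)); // �3
    println!("{}", decode_lossy(&bytes, Policy::PerByte)); // ��3
}
```

For this input the `MaximalSubpart` result is the same as what `String::from_utf8_lossy` returns.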
FWIW, utf-8 error handling is aligned with the best practice recommendation of the Unicode standard (as noted in the Encoding Standard).
I’ll just leave this here: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
I now have new error-handling semantics compatible with the Encoding Standard as of a5b0938. They are intertwined with the removal/renewal of TextEncoder/TextDecoder (#4), so they cannot be merged right now, but this issue is now close to being resolved.