samthor / fast-text-encoding
Fast polyfill for TextEncoder and TextDecoder, only supports UTF-8.
License: Apache License 2.0
Source:
https://github.com/samthor/fast-text-encoding/blob/master/src/polyfill.js#L8-L9
scope['TextEncoder'] = scope['TextEncoder'] || FastTextEncoder;
scope['TextDecoder'] = scope['TextDecoder'] || FastTextDecoder;
Output:
https://github.com/samthor/fast-text-encoding/blob/master/text.min.js
l.TextEncoder = m;
l.TextDecoder = k;
Should the above read this instead?
l.TextEncoder = l.TextEncoder || m;
l.TextDecoder = l.TextDecoder || k;
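For reference, a minimal sketch of the guarded-assignment pattern the source uses (the FastTextEncoder/FastTextDecoder classes here are empty stand-in stubs, not the real implementations): the `||` guard keeps the native implementation when one already exists, so a minifier that renames the local classes must still preserve the guard, otherwise the polyfill clobbers the native classes.

```javascript
// Stand-in stubs for the polyfill classes (assumption: bodies elided).
class FastTextEncoder {}
class FastTextDecoder {}

// Only install the fallback when no native implementation exists.
const scope = typeof globalThis !== 'undefined' ? globalThis : self;
scope['TextEncoder'] = scope['TextEncoder'] || FastTextEncoder;
scope['TextDecoder'] = scope['TextDecoder'] || FastTextDecoder;
```

In any environment with a native TextEncoder (modern browsers, Node), the guard leaves the native class in place; the unguarded minified output shown above would replace it unconditionally.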
A minor note, but it looks like the LICENSE file includes the template but doesn't fill it out:
Line 189 in 0423145
FastTextDecoder and FastTextEncoder should also support a utfLabel argument with the value "utf8". Currently the constructors throw an error if the argument is not exactly equal to "utf-8" (https://github.com/samthor/fast-text-encoding/blob/master/text.js#L36).
According to https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/TextDecoder, both "utf8" and "utf-8" are acceptable values for utfLabel.
Some libraries, e.g. https://github.com/kripken/sql.js, pass the value "utf8" to the TextDecoder (or TextEncoder) constructor.
I found this issue by accident, since our project uses the following libraries (amongst others):
TypeORM attempts to require sql.js, which then fails with this error:
RangeError: Failed to construct 'TextDecoder': The encoding label provided ('utf8') is invalid.
Apparently, by some strange coincidence, this issue has not occurred earlier in our codebase. Perhaps it has something to do with the order in which some transitive dependencies are required.
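A hedged sketch of one way the constructors could accept the additional labels (the helper name checkUtf8Label and the exact label set are assumptions, not the library's API; the WHATWG Encoding Standard lists "utf-8", "utf8", and "unicode-1-1-utf-8" among the labels for UTF-8):

```javascript
// Labels the Encoding Standard maps to UTF-8 (subset; assumption).
const UTF8_LABELS = new Set(['utf-8', 'utf8', 'unicode-1-1-utf-8']);

// Normalize the label (lowercase, strip surrounding whitespace) and
// reject anything that is not a known UTF-8 label.
function checkUtf8Label(label = 'utf-8') {
  const normalized = String(label).trim().toLowerCase();
  if (!UTF8_LABELS.has(normalized)) {
    throw new RangeError(
      `Failed to construct 'TextDecoder': The encoding label provided ('${label}') is invalid.`);
  }
  return 'utf-8';
}
```

With such a check in the constructor, sql.js passing "utf8" would no longer throw.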
I'm sorry. I really am. It appears that you have put a lot of time into this project. Nevertheless, there is now a polyfill here that appears to be better in every aspect:
- TextEncoder.prototype.encodeInto polyfill
- TextEncoder-only file
- TextDecoder-only file
I realize that you have put a lot of work into this polyfill, and I respect the amount of work you have put into it. I hope you understand.
(Disclaimer: I created FastestSmallestTextEncoderDecoder)
Time for deprecation and archiving the GitHub repo? All green environments have TextEncoder/TextDecoder nowadays.
When I use
new TextDecoder().decode([72, 101, 108, 108, 111])
I get [Error: Cannot create URL for blob!] in React Native.
Any solution?
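One thing worth noting independently of the React Native blob error: TextDecoder#decode expects a BufferSource (an ArrayBuffer or a typed-array view), not a plain array, so the bytes should be wrapped in a Uint8Array first.

```javascript
// decode() takes a BufferSource; wrap the raw byte values in a
// Uint8Array rather than passing a plain JavaScript array.
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = new TextDecoder().decode(bytes);
// text === 'Hello'
```

This does not by itself explain the blob error, which appears to come from the React Native environment rather than from the polyfill.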
The call to String.fromCharCode.apply(null, out) passes each char code as an argument on the stack. This will cause IE11 to generate an out-of-stack-space error if the string is several hundred thousand characters long. Other browsers have similar limits.
In the Closure compiler, they work around this by converting chunks at a time.
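The chunking workaround can be sketched as follows (the helper name and the chunk size of 8192 are assumptions; real argument-count limits vary per engine):

```javascript
// Convert an array of UTF-16 code units to a string without ever
// passing more than CHUNK arguments to String.fromCharCode.apply,
// avoiding the per-call stack/argument limit.
function codeUnitsToString(units) {
  const CHUNK = 8192; // assumed safe chunk size
  let out = '';
  for (let i = 0; i < units.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, units.slice(i, i + CHUNK));
  }
  return out;
}
```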
When dealing with certain buffers from other systems, the string is padded with nulls. The native implementations preserve this and simply copy the nulls over to the string, ensuring that the length is correct. This part of the decode does not allow that to happen in fast-text-encoding:
Lines 149 to 151 in 227d7d2
Some packages that depend on this package won't work on Node.js:
RangeError: Failed to construct 'TextDecoder': The encoding label provided ('utf-16le') is invalid.
at new k (.../node_modules/fast-text-encoding/text.min.js:1:134)
at Object.<anonymous> (.../node_modules/rustbn.js/lib/index.asm.js:1:17613)
The reason is that the fast-text-encoding package doesn't support utf-16le.
This is part feature request, part request for clarification.
fast-text-encoding/src/lowlevel.js
Lines 93 to 97 in 60d0a6c
charCodeAt gives UTF-16 code units, and you need to deal with surrogate pairs.
The charCodeAt() method returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt
For UTF-8 encoding (https://en.wikipedia.org/wiki/UTF-8) you just need the Unicode code point, which is what codePointAt provides.
The codePointAt() method returns a non-negative integer that is the Unicode code point value at the given position.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt
Am I missing something? I am not an expert on this topic.
I wrote an implementation that uses codePointAt instead of charCodeAt.
https://jsfiddle.net/pzh6ofnj/2/
const f = (s) => new Uint8Array([...s].map(c => c.codePointAt(0)).flatMap(x => {
  if (x < 0x80) {
    // first 128 code points need 1 byte
    return x;
  }
  if (x < 0x800) {
    // next 1920 code points need 2 bytes
    return [((x >>> 6) & 0x1F) | 0xC0, (x & 0x3F) | 0x80];
  }
  if (x < 0x10000) {
    // next 63488 (really only 61440 are in use) code points need 3 bytes
    return [((x >>> 12) & 0x0F) | 0xE0, ((x >>> 6) & 0x3F) | 0x80, (x & 0x3F) | 0x80];
  }
  // rest need 4 bytes
  return [
    ((x >>> 18) & 0x07) | 0xF0,
    ((x >>> 12) & 0x3F) | 0x80,
    ((x >>> 6) & 0x3F) | 0x80,
    (x & 0x3F) | 0x80,
  ];
}));

function main() {
  // const s = 'abcd😊efgh\n012345689\t€\r🧑🏽🍳helloworld!';
  const s = '$£Иह€한𐍈';
  console.log(s);
  const arr = f(s);
  console.log(Array.from(arr).map(x => x.toString(16)));

  // test against TextEncoder
  {
    const expected = new TextEncoder().encode(s);
    console.log(Array.from(expected).map(x => x.toString(16)));
    const actual = Array.from(arr);
    if (Array.from(expected).some((x, i) => x !== actual[i])) {
      throw new Error('the encoded bytes do not match');
    }
  }

  // test against TextDecoder
  {
    const actual = new TextDecoder().decode(arr);
    console.log(actual);
    if (actual !== s) {
      throw new Error(`the decoded string does not match the original: ${actual}`);
    }
  }
}

main();
The concept of a performant npm package is a nice sentiment, but there are so many different issues with this library. Let me list a few.

1. Function.prototype.apply breaks when passed an array that is too big. In the console of Chrome 74, this limit appears to be 125833: [].push.apply([], (new Array(125833)).fill(0))
2. Array.prototype.push + String.fromCharCode = performance bottleneck. Use binary string concatenation if you must accumulate a string.
3. target is reassigned in the FastTextEncoder.prototype.encode function. Even if this does not happen often, the mere presence of reassignment forces the JIT compiler to insert extra code in case of recompilation due to potential constructor changes of target. More code means that aligning to the page line is less likely. See this SO post.
4. In FastTextDecoder.prototype.decode, the assertion if (byte1 === 0) break; is completely wrong because you should use a null byte ("\x00") instead.
5. In FastTextDecoder.prototype.decode, you should try to reuse the underlying buffer instead of copying everything into a new duplicate array in case a typed array is passed instead of an ArrayBuffer.
6. Use the || operator instead to ensure greater standards compliance and a smaller file size at a minimal overhead in performance.
7. Use Array as a shim-like fallback for TypedArrays to provide limited backwards support into IE5.5-IE9.
8. Use "use strict", which your minified file (the only file that actually counts) does not do.
9. Polyfill TextEncoder and TextDecoder separately. There is simply a right way to do polyfilling and there is simply a wrong way to do polyfilling. Except with vendor-specific features, you should never assume that just because one item does not exist, the other one does not exist too.
10. You use !== instead of != when comparing typeof. The browser is able to assume that it will be a string-only comparison, so the extra byte used is a waste of space.
11. If you use Object.defineProperty, then wrap it in try/catch with alternative code for IE5.5-IE8. The try/catch is needed because Object.defineProperty exists only for DOM objects in IE8.
12. Drop new in front of Error/RangeError. It is not needed, and it wastes precious space on something that is not a performance priority in the slightest.
13. Neither window, global, nor this (in strict mode) is available in Service Workers. Use self instead of window to support the Service Worker environment instead of relying on your minifier to strip the "use strict" so that this can be used.
14. Set module.exports if available for drop-in use in NodeJS.
15. Replace "undefined" with ""+void 0 in the minified code to save 2 bytes with each occurrence, without a performance penalty.

I came up with my own original solution that fixed many problems, greatly increased performance, and reduced file size.
Hi,
I am wondering why the fatal option is not supported. Will setting fatal=true cause any issues when using your library as a polyfill?
Many thanks
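For context, this is the native behaviour the fatal flag controls: with fatal unset, invalid byte sequences decode to the replacement character U+FFFD; with fatal: true, decode throws a TypeError instead.

```javascript
// 0xFF is never valid in UTF-8, so it exercises both error modes.
const bad = new Uint8Array([0xff]);

const lenient = new TextDecoder('utf-8').decode(bad);
// lenient === '\uFFFD' (replacement character)

// new TextDecoder('utf-8', { fatal: true }).decode(bad); // throws TypeError
```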
Object doesn't support property or method 'slice', when trying to encode something in IE11
(new TextEncoder()).encode('sample string')
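IE11 implements TypedArray#subarray but not TypedArray#slice, so one hedged workaround (the helper name trim is an assumption) is to prefer subarray when trimming the encoder's output buffer to its final length, falling back to a manual copy:

```javascript
// Trim a typed array (or array-like) to `length` elements without
// relying on TypedArray#slice, which IE11 lacks. subarray returns a
// view over the same buffer; the final branch copies element by
// element for array-like fallbacks.
function trim(target, length) {
  if (target.subarray) return target.subarray(0, length);
  if (target.slice) return target.slice(0, length);
  const out = new Uint8Array(length);
  for (let i = 0; i < length; i++) out[i] = target[i];
  return out;
}
```

Note that subarray returns a view rather than a copy, which is usually fine for a freshly allocated scratch buffer but differs from slice if the caller later writes to the original.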