
fast-text-encoding's Introduction

👋

fast-text-encoding's People

Contributors

dependabot[bot], jilvin, kevinushey, reeywhaar, samthor, vapier


fast-text-encoding's Issues

FastTextEncoder and FastTextDecoder being assigned to `scope` regardless of whether instances are already present?

Source:
https://github.com/samthor/fast-text-encoding/blob/master/src/polyfill.js#L8-L9

scope['TextEncoder'] = scope['TextEncoder'] || FastTextEncoder;
scope['TextDecoder'] = scope['TextDecoder'] || FastTextDecoder;

Output:
https://github.com/samthor/fast-text-encoding/blob/master/text.min.js

l.TextEncoder = m;
l.TextDecoder = k;

Should the above read this instead?

l.TextEncoder = l.TextEncoder || m;
l.TextDecoder = l.TextDecoder || k;

utfLabel argument does not support "utf8"

FastTextDecoder and FastTextEncoder should also support a utfLabel argument with the value "utf8". Currently the constructors throw an error if the argument is not exactly equal to "utf-8" (https://github.com/samthor/fast-text-encoding/blob/master/text.js#L36).

According to https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/TextDecoder, both "utf8" and "utf-8" are acceptable values for utfLabel.

How to reproduce

Some libraries, e.g. https://github.com/kripken/sql.js, pass the value "utf8" to the TextDecoder (or Encoder) constructor.

I found this issue by accident, since our project uses the following libraries (amongst others):

TypeORM attempts to require sql.js, which then fails with this error:

RangeError: Failed to construct 'TextDecoder': The encoding label provided ('utf8') is invalid.

Apparently, by some strange coincidence, this issue has not occurred earlier in our codebase. Perhaps it has something to do with the order in which some transitive dependencies are required.
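For reference, here is a minimal sketch of a constructor that accepts the common UTF-8 label aliases; the label list and the FastTextDecoder shape shown are assumptions for illustration, not the library's actual code:

// Sketch only: normalize and accept the usual UTF-8 label aliases before rejecting.
var VALID_UTF8_LABELS = ['utf-8', 'utf8', 'unicode-1-1-utf-8'];

function FastTextDecoder(utfLabel) {
  utfLabel = (utfLabel || 'utf-8').toLowerCase();
  if (VALID_UTF8_LABELS.indexOf(utfLabel) === -1) {
    throw new RangeError("Failed to construct 'TextDecoder': The encoding label provided ('" + utfLabel + "') is invalid.");
  }
  this.encoding = 'utf-8'; // report the canonical name regardless of the label passed in
  this.fatal = false;
  this.ignoreBOM = false;
}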

I'm Sorry

I'm sorry. I really am. You have clearly put a lot of time into this project. Nevertheless, there is now a polyfill here that appears to be better in every respect.

  • It is smaller
  • It is faster on every benchmark. In particular, when decoding a huge ASCII array (which is quite common), the linked polyfill is twice as fast.
  • It offers a TextEncoder.prototype.encodeInto polyfill
  • It offers a solo TextEncoder-only file
  • It offers a solo TextDecoder-only file

I realize that you have put a lot of work into this polyfill, and I respect the amount of work you have put into it. I hope you understand.

(Disclaimer: I created FastestSmallestTextEncoderDecoder)

out of stack space when string is too long

The call to String.fromCharCode.apply(null, out) passes each char code as an argument on the stack. This will cause IE11 to generate an "out of stack space" error if the string is several hundred thousand characters long. Other browsers have similar limits.

The Closure compiler works around this by processing the array a chunk at a time.
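For illustration, a sketch of that chunked workaround, assuming out is the array of char codes being flattened into a string:

// Sketch: build the string in chunks to stay under engine argument-count limits.
function charCodesToString(out) {
  var CHUNK = 0x8000; // comfortably below the apply() limits of IE11 and others
  var result = '';
  for (var i = 0; i < out.length; i += CHUNK) {
    result += String.fromCharCode.apply(null, out.slice(i, i + CHUNK));
  }
  return result;
}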

Null padded string length not preserved

When dealing with certain buffers from other systems, the string is padded with null. The native implementations preserve this and simply copy the nulls over to the string, ensuring that the length is correct.

This part of the decoder in fast-text-encoding does not allow that to happen:

fast-text-encoding/text.js

Lines 149 to 151 in 227d7d2

if (byte1 === 0) {
  break; // NULL
}
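For comparison, the native decoder keeps NUL bytes in the output rather than stopping at them:

// Native behavior: trailing NUL bytes are preserved, so the length matches the input.
var padded = new Uint8Array([104, 105, 0, 0]); // "hi" plus two NUL padding bytes
var s = new TextDecoder().decode(padded);
console.log(s.length);               // 4, not 2
console.log(s === 'hi\u0000\u0000'); // true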

Could you guys please support utf-16le

Some packages that depend on this package won't work on Node.js:

RangeError: Failed to construct 'TextDecoder': The encoding label provided ('utf-16le') is invalid.
    at new k (.../node_modules/fast-text-encoding/text.min.js:1:134)
    at Object.<anonymous> (.../node_modules/rustbn.js/lib/index.asm.js:1:17613)

The reason is that the fast-text-encoding package doesn't support utf-16le.
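For reference, UTF-16LE decoding itself is little code; a minimal sketch (not the library's implementation, and without fatal/BOM handling):

// Sketch: decode UTF-16LE from a Uint8Array; surrogate pairs pass through as
// code units, which is what JavaScript strings expect.
function decodeUtf16le(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var result = '';
  for (var i = 0; i + 1 < bytes.byteLength; i += 2) {
    result += String.fromCharCode(view.getUint16(i, /* littleEndian */ true));
  }
  return result;
}

console.log(decodeUtf16le(new Uint8Array([0x68, 0x00, 0x69, 0x00]))); // "hi"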

feat: why not use `codePointAt`?

Overview

This is a feature request / request for clarification.

var value = string.charCodeAt(pos++);
if (value >= 0xd800 && value <= 0xdbff) {
  // high surrogate
  if (pos < len) {
    var extra = string.charCodeAt(pos);

charCodeAt gives UTF-16 code units, so you need to deal with surrogate pairs yourself.

The charCodeAt() method returns an integer between 0 and 65535 representing the UTF-16 code unit at the given index.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt

For UTF-8 encoding (https://en.wikipedia.org/wiki/UTF-8) you just need the Unicode code point, which is what codePointAt provides.

The codePointAt() method returns a non-negative integer that is the Unicode code point value at the given position.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt

Am I missing something? I am not an expert on this topic.

Code

I wrote an implementation that uses codePointAt instead of charCodeAt.
https://jsfiddle.net/pzh6ofnj/2/

const f = (s) => new Uint8Array([...s].map(c => c.codePointAt(0)).flatMap(x => {
  if (x < 0x80) {
    // first 128 code points need 1 byte
    return x;
  }
  if (x < 0x800) {
    // next 1920 code points need 2 bytes
    return [((x >>> 6) & 0x1F) | 0xC0, (x & 0x3F) | 0x80];
  }
  if (x < 0x10000) {
    // next 63488 (really only 61440 are in use) code points need 3 bytes
    return [((x >>> 12) & 0x0F) | 0xE0, ((x >>> 6) & 0x3F) | 0x80, (x & 0x3F) | 0x80];
  }
  // rest need 4 bytes
  return [
    ((x >>> 18) & 0x07) | 0xF0,
    ((x >>> 12) & 0x3F) | 0x80,
    ((x >>> 6) & 0x3F) | 0x80,
    (x & 0x3F) | 0x80,
  ];
}));

function main() {
  // const s = 'abcd😊efgh\n012345689\t€\r🧑🏽‍🍳helloworld!';
  const s = '$£Иह€한𐍈';

  console.log(s);
  const arr = f(s);
  console.log(Array.from(arr).map(x => x.toString(16)));

  // test against TextEncoder
  {
    const expected = new TextEncoder().encode(s);
    console.log(Array.from(expected).map(x => x.toString(16)));
    const actual = Array.from(arr);
    if (Array.from(expected).some((x, i) => x !== actual[i])) {
      throw new Error('the encoded bytes do not match');
    }
  }
  // test against TextDecoder
  {
    const actual = new TextDecoder().decode(arr);
    console.log(actual);
    if (actual !== s) {
      throw new Error(`the decoded string does not match the original: ${actual}`);
    }

  }
}

main();

19 Reasons Why It Is Not "Fast"

The concept of a performance-focused npm package is a nice sentiment, but there are many issues with this library. Let me list a few.

  1. Function.prototype.apply breaks when passed an array that is too big. In the console of Chrome 74, this limit appears to be 125833: [].push.apply([], (new Array(125833)).fill(0))
  2. Array.prototype.push + String.fromCharCode = performance bottleneck. Use binary string concatenation if you must accumulate a string.
  3. Stop reassigning the typed array target in the FastTextEncoder.prototype.encode function. Even if this does not happen often, the mere presence of reassignment forces the JIT compiler to insert extra code in case of recompilation due to potential constructor changes of target. More code means that aligning to the page line is less likely. See this SO post.
  4. In FastTextDecoder.prototype.decode, the assertion that if (byte1 === 0) break; is completely wrong because you should use a null byte ("\x00") instead.
  5. In FastTextDecoder.prototype.decode, you should try to reuse the underlying buffer instead of copying everything into a new duplicate array in case if a typed array is passed instead of an ArrayBuffer.
  6. You should support NodeJS<3.0's native Buffer and use it as an alternative to typed arrays.
  7. Stop using rest parameters. Minifiers are horrible at bloating the code size on account of them. Use the || operator instead to ensure greater standards compliance and a smaller file size at the minimal overhead of performance.
  8. In the browser, you should use Array as a shim-like fallback for TypedArrays to provide limited backwards support into IE5.5-IE9.
  9. Bring in the new and the shiny "use strict", which your minified file (the only file that actually counts) does not do.
  10. Test TextEncoder and TextDecoder separately. There is a right way to do polyfilling and a wrong way. Except with vendor-specific features, you should never assume that just because one item does not exist, the other one does not exist either.
  11. Apply SMI integer optimizations to lines 55, 59 x2, 63 x2, 67, 69, 70, 79, 80, 81 x4, 90, 93, 95, 96, 98-100, 106, 144, 148 x2, 156, 159, 160, and 163-165 to maximize performance.
  12. Use != instead of !== when comparing typeof results. The browser is able to assume that it will be a string-only comparison, thus the extra byte of !== is a waste of space.
  13. If you must use Object.defineProperty, then use try/catch around Object.defineProperty with alternative code for IE5.5-IE8. The try/catch is because Object.defineProperty exists only for DOM in IE8.
  14. Stop using new in front of Error/RangeError. It is not needed and it wastes precious space for something that is not a performance priority in the slightest.
  15. Support AMD.
  16. Add in support for Service Worker usage. Neither window, global, nor this (in strict mode) are available in Service Workers. Use self instead of window to support the Service Worker environment instead of relying on your minifier to strip the "use strict" so that this can be used (see the sketch after this list).
  17. Write to module.exports if available for drop-in use in NodeJS.
  18. Use a different minifier. Your current minified code reassigns argument variables corresponding to what were originally rest parameters, thus greatly diminishing the stack overhead optimizations that the V8 JIT compiler is able to apply to the function and delaying entry into the function.
  19. Replace "undefined" with ""+void 0 in the minified code to save 2 bytes of space with each occurrence without performance penalty.
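A self-contained sketch of items 16 and 17; FakePolyfill is a stand-in for the real class, not the library's code:

// Resolve the global scope without relying on window/this, then export for Node.
function FakePolyfill() {} // stand-in for the real TextEncoder polyfill class

var scope = typeof self !== 'undefined' ? self :     // browsers and Service Workers
            typeof global !== 'undefined' ? global : // Node.js
            Function('return this')();               // last-resort fallback

scope.TextEncoder = scope.TextEncoder || FakePolyfill;

if (typeof module !== 'undefined' && module.exports) {
  module.exports = { TextEncoder: scope.TextEncoder };
}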

I came up with my own original solution that fixed many problems, greatly increased performance, and reduced file size.

FastestSmallestTextEncoderDecoder

Why `fatal` option is not supported?

Hi,

I am wondering why the fatal option is not supported. Will setting fatal=true cause any issues when using your library as a polyfill?
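For context, this is what fatal changes in the native implementation; a polyfill that ignores it would silently produce replacement characters where native code would throw:

var bad = new Uint8Array([0xff]); // not valid UTF-8
console.log(new TextDecoder('utf-8').decode(bad)); // "\uFFFD" (replacement character)
try {
  new TextDecoder('utf-8', { fatal: true }).decode(bad);
} catch (e) {
  console.log(e instanceof TypeError); // true: a fatal decoder throws on invalid input
}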

Many thanks

IE11 doesn't work

IE11 throws "Object doesn't support property or method 'slice'" when trying to encode something:
(new TextEncoder()).encode('sample string')
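The likely cause is that IE11's typed arrays lack .slice(), while .subarray() is available there. A guarded fallback along these lines would work (a sketch, assuming the encoder trims its output buffer with slice):

// Sketch: trim a Uint8Array to `at` bytes, falling back to subarray() on IE11.
function trimTo(target, at) {
  return target.slice ? target.slice(0, at)
                      : new Uint8Array(target.subarray(0, at));
}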
