proposal-arraybuffer-base64's Introduction

Uint8Array to/from base64 and hex

base64 is a common way to represent arbitrary binary data as ASCII. JavaScript has Uint8Arrays to work with binary data, but no built-in mechanism to encode that data as base64, nor to take base64'd data and produce a corresponding Uint8Array. This is a proposal to fix that. It also adds methods for converting between hex strings and Uint8Arrays.

It is currently at Stage 2 of the TC39 process.

Try it out on the playground.

Spec text is available here, and test262 tests in this PR.

Basic API

let arr = new Uint8Array([72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]);
console.log(arr.toBase64());
// 'SGVsbG8gV29ybGQ='
console.log(arr.toHex());
// '48656c6c6f20576f726c64'
let string = 'SGVsbG8gV29ybGQ=';
console.log(Uint8Array.fromBase64(string));
// Uint8Array([72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100])

string = '48656c6c6f20576f726c64';
console.log(Uint8Array.fromHex(string));
// Uint8Array([72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100])

This would add Uint8Array.prototype.toBase64/Uint8Array.prototype.toHex and Uint8Array.fromBase64/Uint8Array.fromHex methods. The latter pair would throw if given a string which is not properly encoded.
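For readers on engines that do not ship the proposal, the hex half of the behaviour described above can be approximated in userland. This is only a sketch: `toHexSketch` and `fromHexSketch` are hypothetical names, not part of the proposal, and throwing a `SyntaxError` on bad input is an assumption (the text above only says the decoder throws).

```javascript
// Hypothetical stand-in for the proposed Uint8Array.prototype.toHex.
function toHexSketch(bytes) {
  // Each byte becomes exactly two lowercase hex digits.
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Hypothetical stand-in for the proposed Uint8Array.fromHex.
function fromHexSketch(string) {
  // Reject odd lengths and any non-hex character (error type is an assumption).
  if (string.length % 2 !== 0 || /[^0-9a-fA-F]/.test(string)) {
    throw new SyntaxError('string is not properly hex-encoded');
  }
  const bytes = new Uint8Array(string.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(string.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

console.log(toHexSketch(new Uint8Array([72, 101, 108, 108, 111])));
// '48656c6c6f'
```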

Base64 options

Additional options are supplied in an options bag argument:

  • alphabet: Allows specifying the alphabet as either base64 or base64url.

  • lastChunkHandling: Recall that base64 decoding operates on chunks of 4 characters at a time, but the input may have some characters which don't fit evenly into such a chunk of 4 characters. This option determines how the final chunk of characters should be handled. The three options are "loose" (the default), which treats the chunk as if it had any necessary = padding (but throws if this is not possible, i.e. there is exactly one extra character); "strict", which enforces that the chunk has exactly 4 characters (counting = padding) and that overflow bits are 0; and "stop-before-partial", which stops decoding before the final chunk unless the final chunk has exactly 4 characters.

The hex methods do not take any options.
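The three lastChunkHandling modes can be illustrated with a small userland decoder. This is a rough sketch, not the spec algorithm: `decodeBase64Sketch` and `decodeChunk` are hypothetical helpers, only the standard alphabet is handled, whitespace is not accepted, and the `SyntaxError` type is an assumption.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes 2 to 4 alphabet characters (padding already removed) into bytes.
function decodeChunk(chars, strict) {
  let bits = 0;
  for (const c of chars) {
    const value = ALPHABET.indexOf(c);
    if (value === -1) throw new SyntaxError(`unexpected character ${c}`);
    bits = (bits << 6) | value;
  }
  // 2 characters carry 4 unused bits, 3 characters carry 2, 4 carry none.
  const unusedBits = chars.length === 2 ? 4 : chars.length === 3 ? 2 : 0;
  if (strict && (bits & ((1 << unusedBits) - 1)) !== 0) {
    throw new SyntaxError('non-zero padding bits');
  }
  bits >>= unusedBits;
  const out = [];
  for (let k = chars.length - 2; k >= 0; k--) out.push((bits >> (8 * k)) & 0xff);
  return out;
}

function decodeBase64Sketch(input, lastChunkHandling = 'loose') {
  const bytes = [];
  let read = 0;
  // Every chunk except the final one must be 4 unpadded characters.
  while (input.length - read > 4) {
    bytes.push(...decodeChunk(input.slice(read, read + 4), false));
    read += 4;
  }
  const last = input.slice(read);
  if (last.length === 0) return { bytes: Uint8Array.from(bytes), read };
  const data = last.replace(/=+$/, '');
  const complete = last.length === 4; // counting = padding
  if (!complete && lastChunkHandling === 'stop-before-partial') {
    return { bytes: Uint8Array.from(bytes), read }; // leave the partial chunk unread
  }
  const hasPadding = data.length !== last.length;
  if (data.length < 2 || (!complete && (hasPadding || lastChunkHandling === 'strict'))) {
    throw new SyntaxError('malformed final chunk');
  }
  bytes.push(...decodeChunk(data, lastChunkHandling === 'strict'));
  return { bytes: Uint8Array.from(bytes), read: input.length };
}

console.log(decodeBase64Sketch('SGVsbG8').bytes);                        // 5 bytes: "Hello"
console.log(decodeBase64Sketch('SGVsbG8', 'stop-before-partial').read);  // 4
```

With "strict", the unpadded input 'SGVsbG8' is rejected in this sketch, as is 'SGVsbG9=', whose final character leaves non-zero overflow bits.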

Writing to an existing Uint8Array

The Uint8Array.fromBase64Into method allows writing to an existing Uint8Array. Like the TextEncoder encodeInto method, it returns a { read, written } pair.

let target = new Uint8Array(8);
let { read, written } = Uint8Array.fromBase64Into('Zm9vYmFy', target);
assert.deepStrictEqual([...target], [102, 111, 111, 98, 97, 114, 0, 0]);
assert.deepStrictEqual({ read, written }, { read: 8, written: 6 });

This method takes an optional final options bag with the same options as above.

As with encodeInto, there is no explicit support for writing to a specified offset of the target, but you can accomplish that by creating a subarray.
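The subarray technique can be seen with TextEncoder's encodeInto, which is already widely available. The sketch below uses encodeInto purely as a stand-in; the assumption is that the proposed fromBase64Into would be passed a subarray in the same way. The variable names are illustrative.

```javascript
const offsetTarget = new Uint8Array(8);

// subarray(2) is a view onto the same memory, starting at index 2,
// so writes through it land at offset 2 of offsetTarget.
const offsetResult = new TextEncoder().encodeInto('foo', offsetTarget.subarray(2));

console.log([...offsetTarget]);
// [0, 0, 102, 111, 111, 0, 0, 0]
console.log(offsetResult);
// { read: 3, written: 3 }
```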

Uint8Array.fromHexInto is the same except for hex.

Streaming

There is no explicit support for streaming. However, it is relatively straightforward to do efficiently in userland on top of this API, with support for all the same options as the underlying functions.
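One possible shape for the encoding side of a userland streaming wrapper is sketched below. The idea is to hold back any bytes that do not fill a 3-byte group, so padding only appears at the very end. `makeStreamingEncoder` is a hypothetical helper, and `encodeBytes` uses btoa as a stand-in for the proposed toBase64 so the sketch runs where the proposal is not yet available.

```javascript
// Stand-in for bytes.toBase64(); spreading is fine for small chunks only.
const encodeBytes = (bytes) => btoa(String.fromCharCode(...bytes));

// Returns a function that encodes one chunk at a time.
function makeStreamingEncoder() {
  let leftover = new Uint8Array(0);
  return function encodeChunk(chunk, { final = false } = {}) {
    // Prepend bytes held back from the previous call.
    const combined = new Uint8Array(leftover.length + chunk.length);
    combined.set(leftover);
    combined.set(chunk, leftover.length);
    // Unless this is the last chunk, only encode whole 3-byte groups.
    const usable = final ? combined.length : combined.length - (combined.length % 3);
    leftover = combined.slice(usable);
    return encodeBytes(combined.subarray(0, usable));
  };
}

const encodeChunk = makeStreamingEncoder();
let streamed = '';
streamed += encodeChunk(new Uint8Array([72, 101, 108, 108]));
streamed += encodeChunk(new Uint8Array([111, 32, 87, 111, 114]));
streamed += encodeChunk(new Uint8Array([108, 100]), { final: true });
console.log(streamed);
// 'SGVsbG8gV29ybGQ='
```

A decoder could follow the same pattern in reverse, holding back characters that do not complete a 4-character chunk, which is what the "stop-before-partial" option is suited to.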

FAQ

What variation exists among base64 implementations in standards, in other languages, and in existing JavaScript libraries?

I have a whole page on that, with tables and footnotes and everything. There is relatively little room for variation, but languages and libraries manage to explore almost all of the room there is.

To summarize, base64 encoders can vary in the following ways:

  • Standard or URL-safe alphabet
  • Whether = is included in output
  • Whether to add linebreaks after a certain number of characters

and decoders can vary in the following ways:

  • Standard or URL-safe alphabet
  • Whether = is required in input, and how to handle malformed padding (e.g. extra =)
  • Whether to fail on non-zero padding bits
  • Whether lines must be of a limited length
  • How non-base64-alphabet characters are handled (sometimes with special handling for only a subset, like whitespace)

What alphabets are supported?

For base64, you can specify either base64 or base64url for both the encoder and the decoder.

For hex, both lowercase and uppercase characters (including mixed within the same string) will decode successfully. Output is always lowercase.
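The two base64 alphabets differ only in their last two characters: base64 uses + and /, while base64url uses - and _. A minimal sketch of the mapping, using hypothetical helper names and btoa as a stand-in encoder:

```javascript
// Hypothetical helpers: translate already-encoded strings between alphabets.
const toBase64UrlAlphabet = (s) => s.replace(/\+/g, '-').replace(/\//g, '_');
const toBase64Alphabet = (s) => s.replace(/-/g, '+').replace(/_/g, '/');

// The bytes [251, 255] happen to use both differing characters.
const standardForm = btoa(String.fromCharCode(251, 255));
console.log(standardForm);
// '+/8='
console.log(toBase64UrlAlphabet(standardForm));
// '-_8='
```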

How is = padding handled?

Padding is always generated. The base64 decoder allows specifying how to handle inputs without it with the lastChunkHandling option.

How are the extra padding bits handled?

If the length of your input data isn't exactly a multiple of 3 bytes, then encoding it will use either 2 or 3 base64 characters to encode the final 1 or 2 bytes. Since each base64 character is 6 bits, this means you'll be using either 12 or 18 bits to represent 8 or 16 bits, which means you have an extra 4 or 2 bits which don't encode anything.
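The arithmetic above can be checked with a few lines of code. `finalChunkInfo` is a hypothetical helper written for this illustration.

```javascript
// For a given input length in bytes, reports how many base64 characters the
// final partial chunk uses (not counting = padding) and how many of their
// bits encode nothing.
function finalChunkInfo(byteLength) {
  const leftoverBytes = byteLength % 3;
  if (leftoverBytes === 0) return { chars: 0, unusedBits: 0 };
  const chars = leftoverBytes + 1;
  return { chars, unusedBits: chars * 6 - leftoverBytes * 8 };
}

console.log(finalChunkInfo(4)); // { chars: 2, unusedBits: 4 }
console.log(finalChunkInfo(5)); // { chars: 3, unusedBits: 2 }
console.log(finalChunkInfo(6)); // { chars: 0, unusedBits: 0 }
```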

Per the RFC, decoders MAY reject input strings where the padding bits are non-zero. Here, non-zero padding bits are silently ignored unless lastChunkHandling: "strict" is specified.

How is whitespace handled?

The encoders do not output whitespace. The hex decoder does not allow it as input. The base64 decoder allows ASCII whitespace anywhere in the string.
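In effect, the base64 decoder behaves as though whitespace were removed before decoding. A one-line sketch of that step follows; it assumes "ASCII whitespace" means tab, line feed, form feed, carriage return, and space, and `stripAsciiWhitespace` is a hypothetical name.

```javascript
// Removes tab, LF, FF, CR, and space; everything else is left untouched.
const stripAsciiWhitespace = (s) => s.replace(/[\t\n\f\r ]/g, '');

console.log(stripAsciiWhitespace('SGVs bG8g\nV29y\tbGQ='));
// 'SGVsbG8gV29ybGQ='
```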

How are other characters handled?

The presence of any other characters causes an exception.

Why are these synchronous?

In practice most base64'd data I encounter is on the order of hundreds of bytes (e.g. SSH keys), which can be encoded and decoded extremely quickly. It would be a shame to require Promises to deal with such data, I think, especially given that the alternatives people currently use all appear to be synchronous.

Why just these encodings?

While other string encodings exist, none are nearly as commonly used as these two.

See issues #7, #8, and #11.

Why not just use atob and btoa?

Those methods take and return strings, rather than translating between a string and a Uint8Array.
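To see why that is awkward, here is roughly what going through atob and btoa looks like today: the bytes have to be smuggled through an intermediate "binary string" in which each character stands for one byte. The helper names are hypothetical.

```javascript
// Uint8Array -> base64, via a binary string.
function bytesToBase64ViaBtoa(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// base64 -> Uint8Array, again via a binary string.
function base64ToBytesViaAtob(string) {
  return Uint8Array.from(atob(string), (c) => c.charCodeAt(0));
}

console.log(bytesToBase64ViaBtoa(new Uint8Array([72, 101, 108, 108, 111])));
// 'SGVsbG8='
```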

Why not TextEncoder?

base64 is not a text encoding format; there are no code points involved. So despite fitting with the type signature of TextEncoder/TextDecoder, base64 encoding and decoding is not a conceptually appropriate thing for those APIs to do.

That's also been the consensus when it's come up previously.

What if I just want to encode a portion of an ArrayBuffer?

Uint8Arrays can be partial views of an underlying buffer, so you can create such a view and invoke .toBase64 on it.
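A short sketch of creating such a view. The buffer contents and variable names are illustrative, and the toBase64 call is guarded because the method only exists where the proposal is implemented.

```javascript
const wholeBuffer = new ArrayBuffer(16);
new Uint8Array(wholeBuffer).set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);

// A view over 3 bytes starting at byte offset 4; no data is copied.
const partialView = new Uint8Array(wholeBuffer, 4, 3);
console.log([...partialView]);
// [4, 5, 6]

// Where available, this encodes only the 3 bytes covered by the view.
const partialEncoded =
  typeof partialView.toBase64 === 'function' ? partialView.toBase64() : undefined;
```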

Contributors

bakkot, ljharb, michaelficarra, arjunsajeev, septs
