cborg - fast CBOR with a focus on strictness

CBOR is "Concise Binary Object Representation", defined by RFC 8949. Like JSON, but binary, more compact, and supporting a much broader range of data types.

cborg focuses on strictness and deterministic data representations. CBOR's flexibility causes problems where determinism matters, such as in content-addressed data, where an encoding should converge on the same bytes for the same data. cborg helps alleviate these challenges.

cborg is also fast, and is suitable for both the browser (it is Uint8Array-native) and Node.js.

cborg supports CBOR tags, but does not ship with them enabled by default. If you want tags, you need to plug them into the encoder and decoder.

Example

import { encode, decode } from 'cborg'

const decoded = decode(Buffer.from('a16474686973a26269736543424f522163796179f5', 'hex'))
console.log('decoded:', decoded)
console.log('encoded:', encode(decoded))
decoded: { this: { is: 'CBOR!', yay: true } }
encoded: Uint8Array(21) [
  161, 100, 116, 104, 105, 115,
  162,  98, 105, 115, 101,  67,
   66,  79,  82,  33,  99, 121,
   97, 121, 245
]

CLI

When installed globally via npm (with npm install cborg --global), a cborg command becomes available, providing some handy CBOR CLI utilities. Run cborg help for additional details.

The following commands take input either from the command line or, if no input is supplied, from stdin. Output is printed to stdout, so you can cat foo | cborg <command>.

cborg bin2diag [binary input]

Convert CBOR from binary input to a CBOR diagnostic output format which explains the byte contents.

$ cborg hex2bin 84616161620164f09f9880 | cborg bin2diag
84                                                # array(4)
  61                                              #   string(1)
    61                                            #     "a"
  61                                              #   string(1)
    62                                            #     "b"
  01                                              #   uint(1)
  64 f09f                                         #   string(2)
    f09f9880                                      #     "😀"

cborg bin2hex [binary string]

A utility command to convert a binary input (stdin only) to hexadecimal output (does not involve CBOR).

cborg bin2json [--pretty] [binary input]

Convert CBOR from binary input to JSON format.

$ cborg hex2bin 84616161620164f09f9880 | cborg bin2json
["a","b",1,"😀"]

cborg diag2bin [diagnostic string]

Convert a CBOR diagnostic string to a binary data form of the CBOR.

$ cborg json2diag '["a","b",1,"😀"]' | cborg diag2bin | cborg bin2hex
84616161620164f09f9880

cborg diag2hex [diagnostic string]

Convert a CBOR diagnostic string to the CBOR bytes in hexadecimal format.

$ cborg json2diag '["a","b",1,"😀"]' | cborg diag2hex
84616161620164f09f9880

cborg diag2json [--pretty] [diagnostic string]

Convert a CBOR diagnostic string to JSON format.

$ cborg json2diag '["a","b",1,"😀"]' | cborg diag2json
["a","b",1,"😀"]

cborg hex2bin [hex string]

A utility command to convert a hex string to binary output (does not involve CBOR).

cborg hex2diag [hex string]

Convert CBOR from a hexadecimal string to a CBOR diagnostic output format which explains the byte contents.

$ cborg hex2diag 84616161620164f09f9880
84                                                # array(4)
  61                                              #   string(1)
    61                                            #     "a"
  61                                              #   string(1)
    62                                            #     "b"
  01                                              #   uint(1)
  64 f09f                                         #   string(2)
    f09f9880                                      #     "😀"

cborg hex2json [--pretty] [hex string]

Convert CBOR from a hexadecimal string to JSON format.

$ cborg hex2json 84616161620164f09f9880
["a","b",1,"😀"]
$ cborg hex2json --pretty 84616161620164f09f9880
[
  "a",
  "b",
  1,
  "😀"
]

cborg json2bin [json string]

Convert a JSON object into a binary data form of the CBOR.

$ cborg json2bin '["a","b",1,"😀"]' | cborg bin2hex
84616161620164f09f9880

cborg json2diag [json string]

Convert a JSON object into a CBOR diagnostic output format which explains the contents of the CBOR form of the input object.

$ cborg json2diag '["a", "b", 1, "😀"]'
84                                                # array(4)
  61                                              #   string(1)
    61                                            #     "a"
  61                                              #   string(1)
    62                                            #     "b"
  01                                              #   uint(1)
  64 f09f                                         #   string(2)
    f09f9880                                      #     "😀"

cborg json2hex '[json string]'

Convert a JSON object into CBOR bytes in hexadecimal format.

$ cborg json2hex '["a", "b", 1, "😀"]'
84616161620164f09f9880

API

encode(object[, options])

import { encode } from 'cborg'

Encode a JavaScript object and return a Uint8Array with the CBOR byte representation.

  • Objects containing circular references will be rejected.
  • JavaScript objects that don't have standard CBOR type representations (without tags) may be rejected or encoded in surprising ways. If you need to encode a Date or a RegExp or another exotic type, you should either form them into intermediate forms before encoding or enable a tag encoder (see Type encoders).
    • Natively supported types are: null, undefined, number, bigint, string, boolean, Array, Object, Map, Buffer, ArrayBuffer, DataView, Uint8Array and all other TypedArrays (the underlying byte array of TypedArrays is encoded, so they will all round-trip as a Uint8Array since the type information is lost).
  • Numbers will be encoded as integers if they don't have a fractional part (1 and 1.0 are both considered integers, they are identical in JavaScript). Otherwise they will be encoded as floats.
  • Integers will be encoded to their smallest possible representations: compacted (into the type byte), 8-bit, 16-bit, 32-bit or 64-bit.
  • Integers larger than Number.MAX_SAFE_INTEGER or less than Number.MIN_SAFE_INTEGER will be encoded as floats. There is no way to safely determine whether a number has a fractional part outside of this range.
  • BigInts are supported by default within the 64-bit unsigned range but will also be encoded to their smallest possible representation (so will not round-trip as a BigInt if they are smaller than Number.MAX_SAFE_INTEGER). Larger BigInts require a tag (officially tags 2 and 3).
  • Floats will be encoded in their smallest possible representation (16-bit, 32-bit or 64-bit) unless the float64 option is supplied.
  • Object properties are sorted according to the original RFC 7049 canonical representation recommended method: length-first and then bytewise. Note that this recommendation has changed in RFC 8949 to be plain bytewise (this is not currently supported but pull requests are welcome to add it as an option).
  • The only CBOR major 7 "simple values" supported are true, false, undefined and null. "Simple values" outside of this range are intentionally not supported (pull requests welcome to enable them with an option).
  • Objects, arrays, strings and bytes are encoded as fixed-length, encoding as indefinite length is intentionally not supported.
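The smallest-possible integer rule can be sketched in plain JavaScript. This is a simplified illustration of the sizing boundaries, not cborg's actual implementation (which lives in its uint handling code):

```javascript
// Simplified sketch of CBOR's smallest-possible unsigned integer sizing.
// Returns the total encoded size in bytes, counting the initial type byte.
// Illustration only; not cborg's real encoder.
function cborUintSize (n) {
  if (n < 24) return 1            // value packed directly into the type byte
  if (n < 256) return 2           // type byte + uint8
  if (n < 65536) return 3         // type byte + uint16
  if (n < 4294967296) return 5    // type byte + uint32
  return 9                        // type byte + uint64
}

console.log(cborUintSize(10))     // 1
console.log(cborUintSize(1000))   // 3
console.log(cborUintSize(100000)) // 5
```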

Options

  • float64 (boolean, default false): do not attempt to store floats as their smallest possible form, store all floats as 64-bit
  • typeEncoders (object): a mapping of type name to function that can encode that type into cborg tokens. This may also be used to reject or transform types as objects are dissected for encoding. See the Type encoders section below for more information.
  • mapSorter (function): a function taking two arguments, where each argument is a Token, or an array of Tokens, representing the keys of a map being encoded. As with other JavaScript compare functions, it should return -1, 1 or 0 (which shouldn't be possible for distinct keys) depending on the sorting order of the keys. See the source code for the default sorting order, which uses the length-first rule recommended by RFC 7049.
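For string keys, the length-first rule can be sketched like this. Note this is a simplified illustration: cborg's real default sorter compares encoded Tokens, not raw strings:

```javascript
// Simplified sketch of the RFC 7049 "length-first, then bytewise" map key
// ordering that cborg applies by default. cborg's actual sorter operates on
// encoded Tokens; this version compares the UTF-8 bytes of string keys.
function lengthFirstCompare (a, b) {
  const ab = new TextEncoder().encode(a)
  const bb = new TextEncoder().encode(b)
  if (ab.length !== bb.length) return ab.length < bb.length ? -1 : 1
  for (let i = 0; i < ab.length; i++) {
    if (ab[i] !== bb[i]) return ab[i] < bb[i] ? -1 : 1
  }
  return 0
}

console.log(['bb', 'a', 'ab'].sort(lengthFirstCompare)) // [ 'a', 'ab', 'bb' ]
```

Shorter keys always sort first, then same-length keys sort bytewise, which is why 'a' precedes both two-byte keys above.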

decode(data[, options])

import { decode } from 'cborg'

Decode valid CBOR bytes from a Uint8Array (or Buffer) and return a JavaScript object.

  • Integers (major 0 and 1) that are outside of the safe integer range will be converted to a BigInt.
  • The only CBOR major 7 "simple values" supported are true, false, undefined and null. "Simple values" outside of this range are intentionally not supported (pull requests welcome to enable them with an option).
  • Indefinite length strings and byte arrays are intentionally not supported (pull requests welcome to enable them with an option). Although indefinite length arrays and maps are supported by default.

Options

  • allowIndefinite (boolean, default true): when set to false, an error will be thrown if the indefinite length additional information value (31) is encountered for any type (arrays, maps, strings, bytes) or a "break" is encountered.
  • allowUndefined (boolean, default true): when set to false, an error will be thrown if major 7, minor 23 (undefined) is encountered. To disallow undefined on encode, a custom type encoder for 'undefined' will need to be supplied.
  • coerceUndefinedToNull (boolean, default false): when both allowUndefined and coerceUndefinedToNull are set to true, all undefined tokens (major 7, minor 23: 0xf7) will be coerced to null tokens, such that undefined is an allowed token but will not appear in decoded values.
  • allowInfinity (boolean, default true): when set to false, an error will be thrown if an IEEE 754 Infinity or -Infinity value is encountered when decoding major 7. To disallow Infinity and -Infinity on encode, a custom type encoder for 'number' will need to be supplied.
  • allowNaN (boolean, default true): when set to false, an error will be thrown if an IEEE 754 NaN value is encountered when decoding major 7. To disallow NaN on encode, a custom type encoder for 'number' will need to be supplied.
  • allowBigInt (boolean, default true): when set to false, an error will be thrown if an integer outside of the safe integer range is encountered. To disallow BigInts on encode, a custom type encoder for 'bigint' will need to be supplied.
  • strict (boolean, default false): when decoding integers, including for lengths (arrays, maps, strings, bytes), values will be checked to see whether they were encoded in their smallest possible form. If not, an error will be thrown.
    • Currently, this form of deterministic strictness cannot be enforced for float representations, or map key ordering (pull requests very welcome).
  • useMaps (boolean, default false): when decoding major 5 (map) entries, use a Map rather than a plain Object. This will nest for any encountered map. During encode, a Map will be interpreted as an Object and will round-trip as such unless useMaps is supplied, in which case, all Maps and Objects will round-trip as Maps. There is no way to retain the distinction during round-trip without using a custom tag.
  • rejectDuplicateMapKeys (boolean, default false): when the decoder encounters duplicate keys for the same map, an error will be thrown when this option is set. This is an additional strictness option, disallowing data-hiding and reducing the number of same-data different-bytes possibilities where it matters.
  • retainStringBytes (boolean, default false): when decoding strings, retain the original bytes on the Token object as byteValue. Since it is possible to encode non-UTF-8 characters in strings in CBOR, and JavaScript doesn't properly handle non-UTF-8 in its conversion from bytes (TextDecoder or Buffer), this can result in a loss of data (and an inability to round-trip). Where this is important, a token stream should be consumed instead of a plain decode() and the byteValue property on string tokens can be inspected (see lib/diagnostic.js for an example of its use).
  • tags (array): a mapping of tag number to tag decoder function. By default no tags are supported. See Tag decoders.
  • tokenizer (object): an object with three methods: next() which returns a Token, done() which returns a boolean, and pos() which returns the current byte position being decoded. Can be used to implement custom input decoding. See the source code for examples. (Note the en-US spelling "tokenizer" is used throughout exported methods and types, while "tokeniser" appears elsewhere in these docs.)

decodeFirst(data[, options])

import { decodeFirst } from 'cborg'

Decode valid CBOR bytes from a Uint8Array (or Buffer) and return a JavaScript object and the remainder of the original byte array that was not consumed by the decode. This can be useful for decoding concatenated CBOR objects, which is often used in streaming modes of CBOR.

The returned remainder Uint8Array is a subarray of the original input Uint8Array and will share the same underlying buffer. This means that there are no new allocations performed by this function and it is as efficient to use as decode but without the additional byte-consumption check.
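The no-copy relationship between the remainder and the input follows from how Uint8Array.prototype.subarray behaves; this can be seen with plain Uint8Arrays, no cborg involved:

```javascript
// subarray() creates a view over the same underlying ArrayBuffer, so no
// bytes are copied -- mutations through the view are visible in the original.
const original = new Uint8Array([1, 2, 3, 4])
const remainder = original.subarray(2)

console.log(remainder.buffer === original.buffer) // true
remainder[0] = 99
console.log(original[2]) // 99
```

If you need the remainder to outlive (or be independent of) the input buffer, copy it explicitly with remainder.slice().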

The options for decodeFirst are the same as for decode(), but the return type is different and decodeFirst() will not error if a decode operation doesn't consume all of the input bytes.

The return value is an array with two elements:

  • value: the decoded JavaScript object
  • remainder: a Uint8Array containing the bytes that were not consumed by the decode operation

import { decodeFirst } from 'cborg'

let buf = Buffer.from('a16474686973a26269736543424f522163796179f564746869736269736543424f522163796179f5', 'hex')
while (buf.length) {
  const [value, remainder] = decodeFirst(buf)
  console.log('decoded:', value)
  buf = remainder
}
decoded: { this: { is: 'CBOR!', yay: true } }
decoded: this
decoded: is
decoded: CBOR!
decoded: yay
decoded: true

encodedLength(data[, options])

import { encodedLength } from 'cborg/length'

Calculate the byte length of the given data when encoded as CBOR with the options provided. The options are the same as for an encode() call. This calculation will be accurate if the same options are used as when performing a normal encode(). Some encode options can change the encoding output length.

A tokensToLength() function is also available which deals directly with a tokenized form of the object, but it is only recommended for advanced users.

Type encoders

The typeEncoders property of the options argument to encode() allows you to add additional functionality to cborg, or to override existing functionality.

When converting JavaScript objects, types are differentiated using the method and naming used by @sindresorhus/is (a custom implementation is used internally for performance reasons) and an internal set of type encoders are used to convert objects to their appropriate CBOR form. Supported types are: null, undefined, number, bigint, string, boolean, Array, Object, Map, Buffer, ArrayBuffer, DataView, Uint8Array and all other TypedArrays (their underlying byte array is encoded, so they will all round-trip as a Uint8Array since the type information is lost). Any object that doesn't match a type in this list will cause an error to be thrown during encode, e.g. encode(new Date()) will throw an error because there is no internal Date type encoder.

The typeEncoders option is an object whose property names match to @sindresorhus/is type names. When this option is provided and a property exists for any given object's type, the function provided as the value to that property is called with the object as an argument.

If a type encoder function returns null, the default encoder, if any, is used instead.

If a type encoder function returns an array, cborg will expect it to contain zero or more Token objects that will be encoded to binary form.

Tokens map directly to CBOR entities. Each one has a Type and a value. A type encoder is responsible for turning a JavaScript object into a set of tokens.

This example is available from the cborg taglib as bigIntEncoder (import { bigIntEncoder } from 'cborg/taglib') and implements CBOR tags 2 and 3 (bigint and negative bigint). This function would be registered using an options parameter of { typeEncoders: { bigint: bigIntEncoder } }. All objects that have the type bigint will pass through this function.

import { Token, Type } from 'cborg'

function bigIntEncoder (obj) {
  // check whether this BigInt could fit within a standard CBOR 64-bit int or less
  if (obj >= -1n * (2n ** 64n) && obj <= (2n ** 64n) - 1n) {
    return null // handle this as a standard int or negint
  }
  // it's larger than a 64-bit int, encode as tag 2 (positive) or 3 (negative)
  return [
    new Token(Type.tag, obj >= 0n ? 2 : 3),
    new Token(Type.bytes, fromBigInt(obj >= 0n ? obj : obj * -1n - 1n))
  ]
}

function fromBigInt (i) { /* returns a Uint8Array, omitted from example */ }

This example encoder demonstrates the ability to pass-through to the default encoder, or convert to a series of custom tags. In this case we can put any arbitrarily large BigInt into a byte array using the standard CBOR tag 2 and 3 types.
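The fromBigInt helper is omitted above; a minimal version might convert a non-negative BigInt to its big-endian bytes. The implementation below is a hypothetical sketch for illustration, not cborg's actual helper:

```javascript
// HYPOTHETICAL sketch of the omitted fromBigInt helper: convert a
// non-negative BigInt to its minimal big-endian byte representation.
// This is an illustration only, not cborg's actual implementation.
function fromBigInt (i) {
  if (i < 0n) throw new RangeError('expects a non-negative BigInt')
  if (i === 0n) return new Uint8Array([0])
  const bytes = []
  while (i > 0n) {
    bytes.unshift(Number(i & 0xffn)) // peel off the low byte, prepend it
    i >>= 8n
  }
  return new Uint8Array(bytes)
}

console.log(fromBigInt(2n ** 64n)) // Uint8Array(9) [ 1, 0, 0, 0, 0, 0, 0, 0, 0 ]
```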

Valid Token types for the first argument to new Token() are:

Type.uint
Type.negint
Type.bytes
Type.string
Type.array
Type.map
Type.tag
Type.float
Type.false
Type.true
Type.null
Type.undefined
Type.break

Using type encoders we can:

  • Override the default encoder entirely (always return an array of Tokens)
  • Override the default encoder for a subset of values (use null as a pass-through)
  • Omit an object type entirely from the encode (return an empty array)
  • Convert an object to something else entirely (such as a tag, or make all numbers into floats)
  • Throw if something that is supported should be unsupported (e.g. undefined)

Tag decoders

By default cborg does not support decoding of any tags. Where a tag is encountered during decode, an error will be thrown. If tag support is needed, they will need to be supplied as options to the decode() function. The tags property should contain an array where the indexes correspond to the tag numbers that are encountered during decode, and the values are functions that are able to turn the following token(s) into a JavaScript object. Each tag token in CBOR is followed by a data item, often a byte array of arbitrary length, but can be a more complex series of tokens that form a nested data item. This token is supplied to the tag decoder function.

This example is available from the cborg taglib as bigIntDecoder and bigNegIntDecoder (import { bigIntDecoder, bigNegIntDecoder } from 'cborg/taglib') and implements CBOR tags 2 and 3 (bigint and negative bigint). These functions would be registered using an options parameter:

const tags = []
tags[2] = bigIntDecoder
tags[3] = bigNegIntDecoder

decode(bytes, { tags })

Implementation:

function bigIntDecoder (bytes) {
  let bi = 0n
  for (let ii = 0; ii < bytes.length; ii++) {
    bi = (bi << 8n) + BigInt(bytes[ii])
  }
  return bi
}

function bigNegIntDecoder (bytes) {
  return -1n - bigIntDecoder(bytes)
}

Decoding with a custom tokeniser

decode() allows overriding the tokenizer option to provide a custom tokeniser. This object can be described with the following interface:

export interface DecodeTokenizer {
  next(): Token,
  done(): boolean,
  pos(): number,
}

next() should return the next token in the stream, done() should return true when the stream is finished, and pos() should return the current byte position in the stream.

Overriding the default tokeniser can be useful for changing the rules of decode. For example, it is used to turn cborg into a JSON decoder by changing parsing rules on how to turn bytes into tokens. See the source code for how this works.

The default Tokenizer class is exported by cborg. Providing options.tokenizer = new Tokenizer(bytes, options) would result in the same decode path using this tokeniser. However, it can also be used to override or modify the default decode path by intercepting the token stream. For example, to perform a decode that disallows bytes, the following code would work:

import { decode, Tokenizer, Type } from 'cborg'

class CustomTokeniser extends Tokenizer {
  next () {
    const nextToken = super.next()
    if (nextToken.type === Type.bytes) {
      throw new Error('Unsupported type: bytes')
    }
    return nextToken
  }
}

function customDecode (data, options) {
  options = Object.assign({}, options, {
    tokenizer: new CustomTokeniser(data, options)
  })
  return decode(data, options)
}

Deterministic encoding recommendations

cborg is designed with deterministic encoding forms as a primary feature. It is suitable for use with content addressed systems or other systems where convergence of binary forms is important. The ideal is to have strictly one way of mapping a set of data into a binary form. Unfortunately CBOR has many opportunities for flexibility, including:

  • Varying number sizes and no strict requirement for their encoding - e.g. a 1 may be encoded as 0x01, 0x1801, 0x190001, 0x1a00000001 or 0x1b0000000000000001.
  • Varying int sizes used as lengths for lengthed objects (maps, arrays, strings, bytes) - e.g. a single entry array could specify its length using any of the above forms for 1. Tags can also vary in size and still represent the same number.
  • IEEE 754 allows for NaN, Infinity and -Infinity to be represented in many different ways, meaning it is possible to represent the same data using many different byte forms.
  • Indefinite length items, where the length is omitted from the additional information of the entity token and a "break" is inserted to indicate the end of the object. This provides two ways to encode the same object.
  • Tags that can allow alternative representations of objects - e.g. using the bigint or negative bigint tags to represent standard size integers.
  • Map ordering is flexible by default, so a single map can be represented in many different forms by shuffling the keys.
  • Many CBOR decoders ignore trailing bytes that are not part of an initial object. This can be helpful to support streaming-CBOR, but opens avenues for byte padding.

By default, cborg will always encode objects to the same bytes by applying some strictness rules:

  • Using smallest-possible representations for ints, negative ints, floats and lengthed object lengths.
  • Always sorting maps using the original recommended RFC 7049 map key ordering rules.
  • Omitting support for tags (therefore omitting support for exotic object types).
  • Applying deterministic rules to number differentiation - if a fractional part is missing and it's within the safe integer boundary, it's encoded as an integer, otherwise it's encoded as a float.

By default, cborg allows for some flexibility on decode of objects, which will present some challenges if users wish to impose strictness requirements at both serialization and deserialization. Options that can be provided to decode() to impose some strictness requirements are:

  • strict: true to impose strict sizing rules for int, negative ints and lengths of lengthed objects
  • allowNaN: false and allowInfinity: false to prevent decoding of any value that would resolve to NaN, Infinity or -Infinity, whether via CBOR tokens or IEEE 754 representation, as long as your application can do without these symbols.
  • allowIndefinite: false to disallow indefinite lengthed objects and the "break" token
  • Not providing any tag decoders, or ensuring that tag decoders are strict about their forms (e.g. a bigint decoder could reject bigints that could have fit into a standard major 0 64-bit integer).
  • Overriding type decoders where they may introduce undesired flexibility.

Currently, there are two areas that cborg cannot impose strictness requirements (pull requests welcome!):

  • Smallest-possible floats, or always-float64 cannot be enforced on decode.
  • Map ordering cannot be enforced on decode.

Round-trip consistency

There are a number of forms where an object will not round-trip precisely. If this matters for an application, care should be taken, or certain types should be disallowed entirely during encode.

  • All TypedArrays will decode as Uint8Arrays, unless a custom tag is used.
  • Both Map and Object will be encoded as a CBOR map, as will any other object that inherits from Object that can't be differentiated by the @sindresorhus/is algorithm. They will all decode as Object by default, or Map if useMaps is set to true. e.g. { foo: new Map() } will round-trip to { foo: {} } by default.

JSON mode

cborg can also encode and decode JSON using the same pipeline and many of the same settings. For most (but not all) cases it will be faster to use JSON.parse() and JSON.stringify(), however cborg provides much more control over the process to handle determinism and be more restrictive in allowable forms. It also operates natively with Uint8Arrays rather than strings which may also offer some minor efficiency or usability gains in some circumstances.

Use import { encode, decode, decodeFirst } from 'cborg/json' to access the JSON handling encoder and decoder.

Many of the same encode and decode options available for CBOR can be used to manage JSON handling. These include strictness requirements for decode and custom tag encoders for encode. Tag encoders can't create new tags as there are no tags in JSON, but they can replace JavaScript object forms with custom JSON forms (e.g. convert a Uint8Array to a valid JSON form rather than having the encoder throw an error). The inverse is also possible, turning specific JSON forms into JavaScript forms, by using a custom tokeniser on decode.

Special notes on options specific to JSON handling:

  • Decoder allowBigInt option: is repurposed for the JSON decoder and defaults to false. When false, all numbers are decoded as Number, possibly losing precision when encountering numbers outside of the JavaScript safe integer range. When true numbers that have a decimal point (., even if just .0) are returned as a Number, but for numbers without a decimal point and that are outside of the JavaScript safe integer range, they are returned as BigInts. This behaviour differs from CBOR decoding which will error when decoding integer and negative integer tokens that are outside of the JavaScript safe integer range if allowBigInt is false.
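The decision rule described above can be sketched in plain JavaScript. This is a simplified illustration of the documented behaviour (it ignores exponent notation), not cborg's actual JSON parser:

```javascript
// Simplified sketch of the JSON-mode allowBigInt rule: numbers with a
// decimal point stay Number; integers outside the safe integer range
// become BigInt only when allowBigInt is true. Illustration only.
function parseJsonNumber (str, allowBigInt) {
  if (!allowBigInt || str.includes('.')) return Number(str)
  const n = Number(str)
  if (n >= Number.MIN_SAFE_INTEGER && n <= Number.MAX_SAFE_INTEGER) return n
  return BigInt(str) // outside the safe range with no decimal point
}

console.log(parseJsonNumber('9007199254740993', true))  // 9007199254740993n (exact)
console.log(parseJsonNumber('9007199254740993', false)) // 9007199254740992 (precision lost)
console.log(parseJsonNumber('1.5', true))               // 1.5
```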

See @ipld/dag-json for an advanced use of the cborg JSON encoder and decoder including round-tripping of Uint8Arrays and custom JavaScript classes (IPLD CID objects in this case).

Example

Similar to the CBOR example above, using JSON:

import { encode, decode } from 'cborg/json'

const decoded = decode(Buffer.from('7b2274686973223a7b226973223a224a534f4e21222c22796179223a747275657d7d', 'hex'))
console.log('decoded:', decoded)
console.log('encoded:', encode(decoded))
console.log('encoded (string):', Buffer.from(encode(decoded)).toString())
decoded: { this: { is: 'JSON!', yay: true } }
encoded: Uint8Array(34) [
  123,  34, 116, 104, 105, 115,  34,  58,
  123,  34, 105, 115,  34,  58,  34,  74,
   83,  79,  78,  33,  34,  44,  34, 121,
   97, 121,  34,  58, 116, 114, 117, 101,
  125, 125
]
encoded (string): {"this":{"is":"JSON!","yay":true}}

Advanced types and tags

As demonstrated above, the ability to provide custom typeEncoders to encode(), tags and even a custom tokenizer to decode() allow for quite a bit of flexibility in manipulating both the encode and decode process. An advanced example that uses all of these features can be found in example-bytestrings.js which demonstrates how one might implement RFC 8746 to allow typed arrays to round-trip through CBOR and retain their original types. Since cborg is designed to speak purely in terms of Uint8Arrays, its default behaviour will squash all typed arrays down to their byte array forms and materialise them as plain Uint8Arrays. Where round-trip fidelity is important and CBOR tags are an option, this form of usage is an option.

License and Copyright

Copyright 2020 Rod Vagg

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

cborg's Issues

** operator

This line gets transpiled to Math.pow by the default settings of create-react-app which means you end up doing Math.pow(BigInt(2), BigInt(64)) which fails with Uncaught TypeError: Cannot convert a BigInt value to a number

Needs a main entry in package.json

Browserify doesn't understand exports in package.json. Even if it did, it'd probably take the browser conditional export which would give it esm which would make it explode without extra config.

This module should be published with a main field that provides a cjs fallback for environments that don't understand exports or esm.

[Feature Request]: Encode as blobParts (may include blobs)

I just came up with a grate idea that i posted in cbor-x about supporting blobs and having them be kind of exactly the same things as an ArrayBuffer.

just fyi, it's not about supporting encoding/decoding them as a Blob or a File with a custom tags that may include a filename, mimetype and lastModified date. it's just about having blobs be the same equivalent representation as byte arrays

it was a long post so I'm just going to link to kriszyp/cbor-x#57 (comment)


now if i would like to support things such as File tags then i could just as equally just encode it as:

encode({
  name: file.name,
  lastModified: file.lastModified,
  type: file.type,
  content: file // or await file.arrayBuffer()
})

And decode/map it to a file when i later decode it... or write a tag plugin to support this transformation automatically.

more low level decoding

Hi, this pkg looks neat. but it wasn't as low level as i would have hope.
I want to decode a buffer partially in steps... i would like to iterate over each Token and want to read data more manually and search around.
I would like to jump around with offsets and skip reading particular sections.

eg: i would like to read one token, realize that it's a byte array, knowing how large the buffer is and skip reading n bytes. without having that particular buffer in memory. i want to be able to search inside of my bundle

I assume i'm after the toToken functionality but also without the
assertEnoughData and slice method... just wanna know how large the byte array is...

Decoding concatenated data items

It would be convenient to be able to decode concatenated data items, which has applications in streaming settings as is suggested in the streaming section (link) of the spec:

In a streaming application, a data stream may be composed of a sequence of CBOR data items concatenated back-to-back. In such an environment, the decoder immediately begins decoding a new data item if data is found after the end of a previous data item.

If you are interested in catering to the use-case but don't want to add a new API for it, one light way to support this might be to make the default Tokeniser class public, which I expect would make it simpler for folks to add this support themselves as needed. There are also precedents for libraries supporting this through a streaming interface or a separate method specifically for decoding multiple data items.
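As a sketch of what the multi-item loop could look like, assuming a decodeFirst(data) helper that returns a [value, remainder] pair were available (whether hand-rolled on top of an exported tokenizer or provided by the library; decodeAll here is a hypothetical name):

```javascript
// Drain a buffer of back-to-back CBOR items using an injected decodeFirst
// that returns [value, remainingBytes]; loops until nothing is left.
function decodeAll (data, decodeFirst) {
  const out = []
  while (data.length > 0) {
    const [value, remainder] = decodeFirst(data)
    out.push(value)
    data = remainder
  }
  return out
}
```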

Safari error

cborg's use of native BigInt would cause some glitches on older Safari versions (which lack BigInt support).

Here is the diff that solved my problem:

diff --git a/node_modules/cborg/cjs/lib/0uint.js b/node_modules/cborg/cjs/lib/0uint.js
index cda2986..30beaf5 100644
--- a/node_modules/cborg/cjs/lib/0uint.js
+++ b/node_modules/cborg/cjs/lib/0uint.js
@@ -4,13 +4,14 @@ Object.defineProperty(exports, '__esModule', { value: true });
 
 var token = require('./token.js');
 var common = require('./common.js');
+var biginteger = require('big-integer');
 
 const uintBoundaries = [
   24,
   256,
   65536,
   4294967296,
-  BigInt('18446744073709551616')
+  biginteger('18446744073709551616')
 ];
 function readUint8(data, offset, options) {
   common.assertEnoughData(data, offset, 1);
@@ -40,7 +41,7 @@ function readUint64(data, offset, options) {
   common.assertEnoughData(data, offset, 8);
   const hi = data[offset] * 16777216 + (data[offset + 1] << 16) + (data[offset + 2] << 8) + data[offset + 3];
   const lo = data[offset + 4] * 16777216 + (data[offset + 5] << 16) + (data[offset + 6] << 8) + data[offset + 7];
-  const value = (BigInt(hi) << BigInt(32)) + BigInt(lo);
+  const value = (biginteger(hi) << biginteger(32)) + biginteger(lo);
   if (options.strict === true && value < uintBoundaries[3]) {
     throw new Error(`${ common.decodeErrPrefix } integer encoded in more bytes than necessary (strict decode)`);
   }
@@ -94,7 +95,7 @@ function encodeUintValue(buf, major, uint) {
       nuint & 255
     ]);
   } else {
-    const buint = BigInt(uint);
+    const buint = biginteger(uint);
     if (buint < uintBoundaries[4]) {
       const set = [
         major | 27,
@@ -106,8 +107,8 @@ function encodeUintValue(buf, major, uint) {
         0,
         0
       ];
-      let lo = Number(buint & BigInt(4294967295));
-      let hi = Number(buint >> BigInt(32) & BigInt(4294967295));
+      let lo = Number(buint & biginteger(4294967295));
+      let hi = Number(buint >> biginteger(32) & biginteger(4294967295));
       set[8] = lo & 255;
       lo = lo >> 8;
       set[7] = lo & 255;
@@ -124,7 +125,7 @@ function encodeUintValue(buf, major, uint) {
       set[1] = hi & 255;
       buf.push(set);
     } else {
-      throw new Error(`${ common.decodeErrPrefix } encountered BigInt larger than allowable range`);
+      throw new Error(`${ common.decodeErrPrefix } encountered biginteger larger than allowable range`);
     }
   }
 }
diff --git a/node_modules/cborg/cjs/lib/1negint.js b/node_modules/cborg/cjs/lib/1negint.js
index f456e92..5272b52 100644
--- a/node_modules/cborg/cjs/lib/1negint.js
+++ b/node_modules/cborg/cjs/lib/1negint.js
@@ -5,6 +5,7 @@ Object.defineProperty(exports, '__esModule', { value: true });
 var token = require('./token.js');
 var _0uint = require('./0uint.js');
 var common = require('./common.js');
+var biginteger = require('big-integer');
 
 function decodeNegint8(data, pos, _minor, options) {
   return new token.Token(token.Type.negint, -1 - _0uint.readUint8(data, pos + 1, options), 2);
@@ -15,8 +16,8 @@ function decodeNegint16(data, pos, _minor, options) {
 function decodeNegint32(data, pos, _minor, options) {
   return new token.Token(token.Type.negint, -1 - _0uint.readUint32(data, pos + 1, options), 5);
 }
-const neg1b = BigInt(-1);
-const pos1b = BigInt(1);
+const neg1b = biginteger(-1);
+const pos1b = biginteger(1);
 function decodeNegint64(data, pos, _minor, options) {
   const int = _0uint.readUint64(data, pos + 1, options);
   if (typeof int !== 'bigint') {
@@ -28,7 +29,7 @@ function decodeNegint64(data, pos, _minor, options) {
   if (options.allowBigInt !== true) {
     throw new Error(`${ common.decodeErrPrefix } integers outside of the safe integer range are not supported`);
   }
-  return new token.Token(token.Type.negint, neg1b - BigInt(int), 9);
+  return new token.Token(token.Type.negint, neg1b - biginteger(int), 9);
 }
 function encodeNegint(buf, token) {
   const negint = token.value;

This issue body was partially generated by patch-package.
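For reference, the readUint64 hot spot in the diff above relies on native BigInt (available in Safari 14+); standalone it looks like this. One caveat with the patch: JavaScript operators such as `<<` and `+` silently coerce big-integer objects to Numbers, so a faithful port to the big-integer package would need its .shiftLeft()/.add() methods instead of operators:

```javascript
// Combine two 32-bit halves into one 64-bit value with native BigInt,
// as readUint64 does internally.
const hi = 0x01020304
const lo = 0x05060708
const value = (BigInt(hi) << 32n) + BigInt(lo)
// value === 0x0102030405060708n
```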

parity with cbor-x

I'm trying out both cbor-x and cborg...

Here is the deal:

  • cbor-x supports both typed arrays and ArrayBuffer
  • cborg doesn't support typed arrays and produces a regular byte string instead (the same as for ArrayBuffer)
  • when I use ArrayBuffer with cbor-x, it produces a similar result to cborg (example at the bottom)

So here is an example where I use cbor-x to encode a Uint8Array (it produces a tagged typed array):

const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))

/*
HEX: b900016464617461d84043616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   d8 40       #   typed array of u8, tag(64)
      43       #     bytes(3)
         61    #       unsigned(97)
         62    #       unsigned(98)
         63    #       unsigned(99)
*/

Whereas when I use cborg, it produces a different result:

const { encode } = await import('https://cdn.jsdelivr.net/npm/cborg/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]) }))
/*
HEX: a1646461746143616263

a1             # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc"
*/
To achieve the same result using `cbor-x`, I can use regular ArrayBuffers:

const { encode } = await import('https://cdn.jsdelivr.net/npm/cbor-x/+esm')
const { encode: toHex } = await import('https://jspm.dev/hex-string')

toHex(encode({ data: new Uint8Array([97,98,99]).buffer }))

/*
HEX: b90001646461746143616263

b9 0001        # map(1)
   64          #   text(4)
      64617461 #     "data"
   43          #   bytes(3)
      616263   #     "abc"
*/

Now the actual issue/feature request:

I'm not exactly asking you to support TypedArrays or any other tags.
I'm asking you instead to stop using Uint8Array in your codebase and your code examples, and to switch to ArrayBuffer.

Why?

  • to make it easier to switch between other CBOR encoders/decoders
  • to achieve parity, so different encoders work similarly
  • and lastly, to allow plugins to add tag support for Uint8Array and other typed arrays

(credit to @Nemo157 for his online diagnostic tool)
(also want to ping @kriszyp to ask if he has any say in this)
(PS: I don't like how every package names its functions encode/decode; it conflicts with other hex/base64/TextDecoder etc. utilities. Just name them according to what they do, like `encodeToHex(input)`.)

Buffer detection

I'm trying to build this module with webpack in js-ipfs/examples/browser-webpack and the Buffer detection doesn't work:

ERROR in ../../node_modules/cborg/esm/lib/byte-utils.js 1:32-39
Module not found: Error: Can't resolve 'process/browser' in '/Users/alex/Documents/Workspaces/ipfs/js-ipfs/node_modules/cborg/esm/lib'
Did you mean 'browser.js'?
BREAKING CHANGE: The request 'process/browser' failed to resolve only because it was resolved as fully specified
(probably because the origin is a '*.mjs' file or a '*.js' file where the package.json contains '"type": "module"').
The extension in the request is mandatory for it to be fully specified.
Add the extension to the request.
 @ ../../node_modules/cborg/esm/lib/encode.js 9:0-40 192:13-18
 @ ../../node_modules/cborg/esm/cborg.js 1:0-41 7:0-12:2
 @ ../../packages/ipfs-core/src/components/pin/pin-manager.js
 @ ../../packages/ipfs-core/src/components/index.js 93:20-48
 @ ../../packages/ipfs-core/src/index.js 27:4-27
 @ ../../packages/ipfs/src/index.js 12:35-55
 @ ./src/components/app.js 11:11-26
 @ ./src/components/index.js 7:10-26

It's good to use Buffer because Buffer.allocUnsafe(size) is significantly faster than new Uint8Array(size), but it's bad to rely on Node globals.

Solutions:

  1. Use the process module:

import process from 'process'

export const useBuffer = !process.browser && global.Buffer && typeof global.Buffer.isBuffer === 'function';

  2. Use the buffer module:

import { Buffer } from 'buffer'

export const useBuffer = true;

// replace global.Buffer with Buffer

Ability to configure width for cli diagnostic output

The CLI does not offer any way to adjust the width of the diagnostic output and always defaults to 100.
The library offers that as an argument:

function * tokensToDiagnostic (inp, width = 100) {

I looked at the arg parsing and I am not sure if it is a good idea to put a width argument in there, but I do believe adding a check for an optional environment variable would be a pretty easy & pragmatic solution. I can propose a small PR if feedback is positive.
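A sketch of the environment-variable approach (CBORG_DIAG_WIDTH is an assumed name, not an existing cborg option):

```javascript
// Resolve the diagnostic output width: a positive integer from the
// environment wins, otherwise fall back to the library default of 100.
function diagWidth (env = process.env) {
  const w = parseInt(env.CBORG_DIAG_WIDTH ?? '', 10)
  return Number.isInteger(w) && w > 0 ? w : 100
}
```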

Max encodable BigInt is smaller than max encodable Number?

I just came across this weird edge case:

it('encodes big numbers', () => {
  const obj = {
    number: Number.MAX_VALUE
  }
  const buf = encode(obj)
  const out = decode(buf)

  assert.deepEqual(out, obj)
})
// passes
it('encodes big bigints', () => {
  const obj = {
    bigint: BigInt(Number.MAX_VALUE)
  }
  const buf = encode(obj) // <-- throws: CBOR decode error: encountered BigInt larger than allowable range
  const out = decode(buf)

  assert.deepEqual(out, obj)
})

The error is thrown during encoding (not decoding, as the message suggests), and it seems the range of encodable Numbers is bigger than that of encodable BigInts?
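The asymmetry makes sense once you compare the ranges involved: CBOR's integer major types carry at most a 64-bit argument, while Number.MAX_VALUE is a float that encodes as a 64-bit double. A quick check:

```javascript
// The largest integer CBOR major type 0 can represent is 2^64 - 1 (~1.8e19);
// Number.MAX_VALUE is ~1.8e308, so BigInt(Number.MAX_VALUE) is far outside
// the integer range -- hence the encode-time throw -- while the Number
// itself fits comfortably as a float64 (major type 7).
const maxUint64 = (1n << 64n) - 1n
const tooBig = BigInt(Number.MAX_VALUE) > maxUint64  // true
```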

Typings don't work when cborg is imported in a ESM project

I'm importing cborg from an ESM project, and I got this error when trying to use it from VS Code:

Could not find a declaration file for module 'cborg'. '~/projects/my-project/node_modules/cborg/esm/cborg.js' implicitly has an 'any' type.
  There are types at '~/projects/my-project/node_modules/cborg/types/cborg.d.ts', but this result could not be resolved when respecting package.json "exports". The 'cborg' library may need to update its package.json or typings.ts(7016)

I could make the error go away by changing the exports in package.json and I'm happy to PR that change, but I'm not sure what exactly best practices are in this realm.
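For reference, one common shape for types-aware conditional exports looks like this (an assumption about what the fix might involve, not cborg's actual package.json; note that the "types" condition must come first within each entry):

```json
{
  "exports": {
    ".": {
      "types": "./types/cborg.d.ts",
      "import": "./esm/cborg.js",
      "require": "./cjs/cborg.js"
    }
  }
}
```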

Decoder failure

Hello.
I attempted to decode the following hex sequence:

a50102032620012158209b58eab4a0bd78474117f6f23a6c457cc5351d7b6bbeab62271d5ce2c8fdbda4225820db189f7af9cf4eb6149d6204ebe0821b0ed95193f10dc84044fac1e3f26c4a50a16b6372656450726f7465637402

using:

decode(buffer, { useMaps : true })

but cborg throws Error: CBOR decode error: too many terminals, data makes no sense

Expected

To decode the following data:

{1: 2, 3: -7, -1: 1, -2: h'9B58EAB4A0BD78474117F6F23A6C457CC5351D7B6BBEAB62271D5CE2C8FDBDA4', -3: h'DB189F7AF9CF4EB6149D6204EBE0821B0ED95193F10DC84044FAC1E3F26C4A50'}

Diagnostics from https://cbor.me

A5                                      # map(5)
   01                                   # unsigned(1)
   02                                   # unsigned(2)
   03                                   # unsigned(3)
   26                                   # negative(6)
   20                                   # negative(0)
   01                                   # unsigned(1)
   21                                   # negative(1)
   58 20                                # bytes(32)
      9B58EAB4A0BD78474117F6F23A6C457CC5351D7B6BBEAB62271D5CE2C8FDBDA4
   22                                   # negative(2)
   58 20                                # bytes(32)
      DB189F7AF9CF4EB6149D6204EBE0821B0ED95193F10DC84044FAC1E3F26C4A50


##### 14 unused bytes after the end of the data item:

A1 6B 63 72 65 64 50 72 6F 74 65 63 74 02

Can I get a second pair of eyes on this? Thanks!

Disallow certain types when decoding

I want to decode a buffer while supporting only a subset of CBOR value types: only those that round-trip to JSON, erroring out if any non-compliant values are found.

I can control some of this with existing options allowBigInt, useMaps, etc but not everything (Uint8Arrays, for example).

Is it possible to override decoding of certain types to throw?

I saw the tokenizer option to decode, but looking at the source it seems a bit all-or-nothing, and the default Tokenizer needs access to the jump and quick tables, so it's not easily used from outside cborg.

I think what I really want is something similar to the tags option that lets me supply a decoder for individual data types, or it might be as simple as exporting the default Tokenizer for extension and throwing based on the output of super.next()?
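To illustrate the super.next() idea, here is a rough sketch, assuming the default tokenizer were exported (DisallowingTokenizer and the exact shape of the inner tokenizer interface are hypothetical; cborg's real interface may differ):

```javascript
// Wraps a tokenizer (anything with pos/done/next) and throws when a token
// of a disallowed type comes out -- e.g. new Set(['bytes']) to reject
// Uint8Array values during decode.
class DisallowingTokenizer {
  constructor (inner, disallow) {
    this.inner = inner
    this.disallow = disallow
  }

  pos () { return this.inner.pos() }
  done () { return this.inner.done() }

  next () {
    const token = this.inner.next()
    if (this.disallow.has(token.type.name)) {
      throw new Error(`disallowed CBOR type: ${token.type.name}`)
    }
    return token
  }
}
```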
