multiformats / cid

Self-describing content-addressed identifiers for distributed systems

License: Other

cid's Introduction

multiformats

The main repository for discussing multiformats

Multiformats is a set of self-describing protocol values. These values are foundational in that they are low-level building blocks for both data and network layers of the composable protocols making up IPFS, IPLD, libp2p, and many other decentralized data systems. This repository's issues and pull requests are currently the primary venue for the coordination between the various registries making up the group, each of which is separately being hardened as specifications and public, formal registries over time.
See contributing.md for more details on governance and process.

Current Registries

Currently, we have the following formats, each of which corresponds to a specification and a registry. More formats are being discussed and may be added over time, but the following are the mature ones to date:

Repo	Status	Specification
multiaddr	stable	TBD
multibase	stable	W3C CCG
multicodec	stable	TBD
multihash	stable	W3C CCG

See the project directory, below, for implementations and other related repositories.

Current Registries
Table of Contents
Background
- A note on the word Multiformats
Project Directory
- Implementations
Maintainers
Contribute
License

Background

Every choice in computing has a tradeoff. This includes formats, algorithms, encodings, and so on. And even with a great deal of planning, decisions may lead to breaking changes down the road, or to solutions which are no longer optimal. Allowing systems to evolve and grow is important.

Multiformats is a collection of protocols which aim to future-proof systems, today. They do this mainly by allowing data to be self-describable. This allows interoperability, protocol agility, and helps us avoid various forms of lock-in. Currently, these interlocking protocols (both works in progress and implemented) cover the following areas:

multiaddr: network addresses
multibase: base encodings
multicodec: serialization codes
multihash: cryptographic hashes
multikey: cryptographic keys and artifacts

Several of the multiformats are stable, and work on the others is ongoing. Implementers and refiners of the drafts of any one of these registries or their tooling are welcome to contribute, without needing to understand deeply or track progress on the others. Across these otherwise different use-cases and mechanisms, the self-describing aspects of the protocols have a few design goals in common:

the "prefixes" used to self-describe a value must be inline with the value (not passed out-of-band, in function calls, implicit choices, or documentation);
they must be compact and have a binary-packed representation (as opposed to a sparser encoding) or they will hinder performance;
they must have a human-readable representation.

A note on the word Multiformats

Multiformats is the name for the community (and the "organization" in GitHub's access control model), but multiformats can also be used to refer to protocols; for instance, in the sentence "Use one of the multiformats". Formats is interchangeable with protocols, here, as each format is designed in tandem with one or more protocols which handle those self-describing values centrally. We try to capitalize Multiformats when it refers to the organization.

Project Directory

Below is a list of all of the projects in the Multiformats organization.

Maintainers are the active leads for each project, even if the specification is still under construction. Their responsibilities are to make sure that issues and pull requests are attended to in a timely manner, and general upkeep. If you have questions about a repository, or need feedback, please contact them as appropriate. If any of the specifications defining these formats are formalized in a standards body, these maintainers may continue on as Registrars of the table of entries which can keep growing after stabilizing the syntax and tooling interfaces.

Implementations

There are no official or maintained implementations of the entire set of multiformats specifications and registries. The readme file of each multiformat specification repository includes a list of known implementations, some of which are hosted in this GitHub organization.

Maintainers

Maintainers of each multiformats specification are listed in the appropriate repositories. The external standardization of multiformats specifications and registries is currently managed and coordinated by @bumblefudge of learningProof UG.

Contribute

Check out our contributing document for more information on how we work, and about contributing in general.

License

cid's Issues

`<multibase-prefix>` in CID definition is unnecessary

The spec explicitly states that a CID begins with <multibase-prefix>, which is not actually true when the CID is sent over the wire in binary form. Even when a CID is sent as a multibase-encoded string, all an implementer of the spec needs to know is that it is a multibase-encoded byte array following the definition:

<cidv1> ::= <cid-version><multicodec-packed-content-type><multihash-content-address>

There is no reason to mention <multibase-prefix> in the spec, for the same reason there is no need to mention <multihash-code> in front of <multihash-content-address>. The multibase prefix is a concern for the multibase specification, not the CID specification.

Putting it there suggests that the prefix is prepended either to the binary form, which is wrong because it is not, or to the multibase-encoded form, which is redundant information.
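To make the grammar above concrete, here is a minimal Python sketch of reading the binary form. It assumes the version and codec are unsigned varints (LEB128), as the multiformats specs use elsewhere; the names `read_varint`, `CidV1` and `parse_cidv1` are made up for illustration and are not part of any library. Note that no multibase prefix appears anywhere in the byte layout.

```python
from typing import NamedTuple, Tuple


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


class CidV1(NamedTuple):
    version: int
    codec: int
    multihash: bytes


def parse_cidv1(data: bytes) -> CidV1:
    """Split <cid-version><multicodec><multihash> into its three parts."""
    version, pos = read_varint(data)
    if version != 1:
        raise ValueError(f"not a CIDv1 (version varint = {version})")
    codec, pos = read_varint(data, pos)
    # Everything after the codec is the multihash, left unparsed here.
    return CidV1(version, codec, data[pos:])


# Example input: version 1, codec 0x55 (raw), sha2-256 multihash of 32 zero bytes.
example = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)
parsed = parse_cidv1(example)
```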

Move CIDs to the "multiformats" project

So, when proposing to use CIDs as peer IDs, I got a bit of push-back as CIDs currently imply IPLD and bring in the concept of IPLD formats, links, dags, data-structures, etc. However, that doesn't have to be the case. Really, CIDs are just (codec, content-addr) tuples. That is, they're just "typed pointers".

Proposal: Move the CID spec/concept to multiformats.

This would mean that the codec in <version><codec><mhash> wouldn't necessarily map to a defined IPLD codec. Instead, it would just be some multicodec that may or may not have an IPLD format defined (yet).

Why?

People can start using CIDs without thinking about IPLD. Typed pointers are really nice and it would be great to be able to use them everywhere without having to sell all of IPLD.
We can retroactively define IPLD formats for these codecs. Then, when projects realize "oh shoot, I wish I had used IPLD", they'll have an easier time upgrading.

Why not?

This may encourage a proliferation of multicodecs/IPLD formats. In IPLD, we use multicodecs to indicate structure, not semantics.

Thoughts? Should we just keep CIDs where they are but say "you can use these without the rest of IPLD" (not sure we can really sell that).

CIDv2 idea: include the heights of trees in the CID

The other week I was brainstorming whether it would be possible to use IPLD as a potential data storage/transfer format for a future version of Bazel’s Remote Execution protocol, most notably its Content Addressable Storage (CAS). See bazelbuild/remote-apis#250 for details. Where this use case differs from IPFS is that Bazel remote execution follows a more traditional client-server model. There is no immediate intent to use peer-to-peer sharing of objects.

One thing I was thinking about, was what an efficient algorithm for replicating a DAG from a client to a server (i.e., uploading source code to build), or from a server to a client (i.e., downloading build outputs) would look like. Considering that IPLD/IPFS relies on chunking more heavily compared to what Bazel does right now, it’d be pretty important for build clients/servers to use heavy parallelism to transfer such data across.

That said, you do want to place bounds on the amount of parallelism to prevent exhaustion in case of large data sets. It’s fine for a DAG to have large fanout, and it’s fine for a DAG to be deep. But if a DAG has large fanout and is very deep, then it might be necessary to limit the amount of parallelism when traversing the DAG to avoid keeping too many partially replicated blocks in memory.

One piece of information that would be very useful for implementing a properly bounded parallel replication algorithm is tree height. If this information were attached to every link, then a replication algorithm could at any point make smart choices about how aggressively to fan out when traversing.

Unfortunately, this information is not part of CIDv0/CIDv1, meaning that if a storage system using IPLD wants to use such information, it would need to track it out of band. Alternatively, one could use IPLD with a custom link system, but that has all sorts of unfortunate implications, such as being unable to export build results into IPFS for archiving purposes.

My question is therefore whether a future version of CID, if ever created, could include tree height (in blocks) as well.

Mapping to and from strings (for URLs)

Is there a canonical format for mapping IPFS identifiers into URLs? We need to represent IPFS objects in applications, so that we can know to open them in IPFS (not HTTP, for example), and I'd rather not reinvent the wheel or define yet another format. Naively I was using strings like: ipfs:/ipfs/Qm.... as was recommended somewhere (I can't find that reference now).

However, these strings don't work, because depending on how that hash was generated it has to be passed to a different API. For example, the hash returned by JS.Block.put has to go to JS.Block.get; the hash returned by the HTTP API has to be passed to either JS.Files.get or JS.Block.get depending on the size of the file (see ipfs/js-ipfs#1049). Passing any of these to the wrong API results in infinite waiting, a return of 0 bytes, or an exception, so it sounds like I have to encode which API to use in the actual URL.

Note: encoding the CID doesn't work, because the HTTP API returns the hash rather than the CID, and Block.put returns a CID which tells you the codec is dag-cbor (which may itself be a bug), so it looks just like the correct CID for getting a long file from files.get.

I can't work out which of these are bugs in the code, which are bugs in the design, and which are intentional and need kludges to work around.

Some places I've checked that don't appear to have it: the JS implementation; Github/ipfs/cid; git/multiformats/multicodec.

Add types to appease mypy!

Mypy complains and I have to silence it like this (which is not ideal).

from multiformats_cid import from_bytes  # type: ignore

If the project is already typed, then it should be as easy as adding an empty file called py.typed to the base module.

and adding this to setup configuration:

[options.package_data]
<YOUR_SRC_DIR> = py.typed

However, I went looking for the code here and I don't see any:

Move CID spec to specs repo

As part of my spec repo refactor I included a copy of the current CID spec.

https://github.com/ipld/specs/pull/72/files

However, before that lands I think we need to discuss moving the spec to that repo. There's obviously a lot of history in this repo discussing the spec, but it would also be nice to consolidate IPLD spec conversations and RFCs.

CID Specification does not define "base58btc"

The CID Specification refers to an encoding/decoding method "base58btc" but does not define what this method is.

I suspect the intended meaning is "Base58 encoding as used by Bitcoin". It might also mean "Base58Check encoding as defined by Bitcoin". Base58Check appears to have properties like: 1. a well-defined way of handling leading zero bytes in the binary payload, and thus the ability to represent any length of binary payload; 2. a notion of a one-byte "prefix" to which the binary payload is concatenated; 3. a 4-byte checksum which is concatenated to the prefix and payload. It is not clear to me whether or not the CID Specification's term "base58btc" is supposed to include these properties.

I am new to Multiformats and IPFS and Filecoin, so I have a very naive reading of the CID Specification. I did a little web searching, and could not find something which purported to be an authoritative spec for something named "base58btc". There are many pages which describe Base58 representation, and/or Bitcoin's Base58Check encoding, but none of these seemed to claim to be an authoritative specification. I may well have overlooked something.

The Base58 Encoding Scheme is an IETF internet draft by M Sporny. It describes the Base58 alphabet used by Bitcoin, and property 1. above. It does not include properties 2. or 3. This draft is labelled as a "draft", and as expiring in May 2021, so it may not be robust enough to refer to in the CID Specification.

Base58check encoding from the Bitcoin Cash Protocol website appears to have a pretty clear specification for the Base58Check encoding as defined by Bitcoin. I don't know how authoritative it is.

I saw a piece of source code which implemented Base58Check encoding and which quoted extensively from the Bitcoin source code defining Base58Check encoding. The Bitcoin content was attributed to Satoshi Nakamoto, and was very clear. I don't know where to find an authoritative copy of the Bitcoin source code module containing this quote. Maybe that source code is a good spec to refer to from the CID Specification.

In any case, as a naive software developer attempting to understand the CID Specification, it would be helpful to have a clear reference to what is specified by "base58btc".
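For readers with the same question, the following Python sketch shows plain Base58 with the Bitcoin alphabet, which is what the multibase table registers as base58btc: it has property 1 (leading zero bytes map to leading "1" characters) but no version prefix byte and no checksum, so it is not Base58Check. The function names are illustrative only, and this is a sketch rather than a normative definition.

```python
# Bitcoin's Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as plain Base58 (no prefix byte, no checksum)."""
    # Each leading zero byte is written as one leading '1'.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(ALPHABET[rem])
    return "1" * zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode plain Base58; raises ValueError on characters outside the alphabet."""
    zeros = len(text) - len(text.lstrip("1"))
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * zeros + body
```

Base58Check would additionally prepend a version byte and append a 4-byte checksum before this encoding step; neither is done here.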

Create cid from signature

I cannot create a CID from the signature of a signed message.
Is it possible to get a CID from a signed message?

multicodec-packed gives a 404

On the README page the link to multicodec-packed gives a 404

Currently points to :

https://github.com/multiformats/multicodec/blob/master/multicodec-packed.md

Not 100% sure where the correct URI is

Revisit language of the whole spec

This CID spec is a pretty old piece and language around some of the things has changed. For example <multicodec-content-type> would now be called something like <ipld-codec>.

Cite URI specification

CID is a standard that references URIs.

Please reference the dependent specification:

Uniform Resource Identifier (URI): Generic Syntax. https://tools.ietf.org/html/rfc3986

CID length and identity hashes

Moved from: ipfs/kubo#4918 as this isn't go-ipfs specific and will affect the spec.

Basically, we'd like to allow inlining small blocks into CIDs (using the identity hash function) for performance reasons. However, the larger the block we allow to be inlined, the less user friendly CIDs get. Unfortunately, we have to pick a "default inlined size" up front or we'll end up changing a bunch of hashes later.

Open questions:

Do we have a hard limit? That is, do we say that all CIDs must be shorter than X?
What should be the maximum size of CIDs created by default?
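To get a feel for the trade-off, here is a rough Python sketch of how long a CIDv1 becomes when a block is inlined with the identity hash. It assumes the version, the codec, and the identity hash code each fit in one varint byte, and that the text form is unpadded base32 behind a one-character multibase prefix; the helper names are hypothetical.

```python
import math


def varint_len(n: int) -> int:
    """Number of bytes an unsigned varint needs to hold n."""
    return max(1, math.ceil(n.bit_length() / 7))


def identity_cid_len(block_size: int) -> int:
    """Binary length of <version><codec><identity code><length><inlined block>."""
    # Assumption: version, codec and the identity code take one byte each.
    return 1 + 1 + 1 + varint_len(block_size) + block_size


def base32_text_len(n_bytes: int) -> int:
    """One multibase prefix character plus unpadded base32 (5 bits per character)."""
    return 1 + math.ceil(n_bytes * 8 / 5)


# Print binary and text lengths for a few candidate inline limits.
for size in (32, 64, 128, 256):
    binary = identity_cid_len(size)
    print(size, binary, base32_text_len(binary))
```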

@whyrusleeping @kevina @diasdavid @vmx @kyledrake

Redefine CIDv1 to be <cidv1-multicodec><format-multicodec><multihash>

That is, define 0x1 to be a multicodec for CIDv1. Future versions will get their own multicodecs which will be guaranteed to be distinct from all other multicodecs. We may just use 0x2 for v2 but we'd put that in the multicodec table.

Motivation

Currently, the CID format is <version><format-multicodec><multihash>. Unfortunately, because the version is "just a number", we already have to be careful to skip version 0x12. If we don't, we'll clash with the sha2-256 multihash (and existing V0 CIDs).

We just ran into this same issue when considering the upgrade path for libp2p peer IDs to CIDs. If the version were a multicodec, we'd never have any ambiguity issues even if we allowed peer IDs to be arbitrary hashes (for now) and later decided to make them CIDs. That is, we'd always be able to say "this multicodec is a multihash, therefore it's a bare multihash" or "this multicodec is the CIDv1 multicodec, therefore we have a CID".

TL;DR: Make CIDv1 self describing (that is, make it declare that it is, in fact, a CIDv1).

Generalize CIDv0 to PeerIDs

Currently, CIDv0 is defined to be a protobuf encoded IPFS node. Unfortunately, we're also using hashes that look like CIDv0s to point to public keys (PeerIDs). It would be nice to be able to treat these keys as IPLD objects.

Luckily, due to the way protobufs are encoded, we can use some ✨ magic numbers ✨ to make this work:

PBNodes always start with either 0x0a or 0x12 (1<<3|2 or 2<<3|2).
Public/Private keys always start with: 0x08
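The magic numbers follow from how protobuf builds the first byte of a field, (field_number << 3) | wire_type. A small Python sketch of that arithmetic follows; `classify_first_byte` is a hypothetical helper for illustration, not existing code.

```python
VARINT = 0            # protobuf wire type 0
LENGTH_DELIMITED = 2  # protobuf wire type 2


def protobuf_tag(field_number: int, wire_type: int) -> int:
    """Tag byte of a protobuf field with a small field number."""
    return (field_number << 3) | wire_type


def classify_first_byte(block: bytes) -> str:
    """Guess what a block is from its first byte, per the two rules above."""
    if not block:
        return "empty"
    first = block[0]
    if first in (protobuf_tag(1, LENGTH_DELIMITED), protobuf_tag(2, LENGTH_DELIMITED)):
        return "pbnode"  # 0x0a or 0x12
    if first == protobuf_tag(1, VARINT):
        return "key"     # 0x08
    return "unknown"
```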

Define CIDv0 to allow sha256 multihashes only.

All of our code assumes this by checking for the Qm prefix. Furthermore, the argument for CIDv0 is backwards compatibility and, as far as I know, nobody is using non-sha256 CIDv0 CIDs at the moment (at least I hope not as our code can't really handle them...).

Link to py-cid is outdated

This link is to a public archive: https://github.com/ipld/py-cid

This appears to be the actively maintained version: https://github.com/pinnaculum/py-multiformats-cid

Implementations: should a v0 be equal to v1

This issue was raised in rust-cid, but is a general topic for all implementations.

If you convert a V0 to a V1 CID and then compare those two with an "equal" operation, should it return true?

If we want to make it return true that would mean for specific languages:

Rust: implement PartialEq for CID manually, so that if a V1 is DagPB with SHA2-256 and has the same hash digest as the V0, they are considered equal.
JS: Cid.equals() would need to be changed, currently it expects the version to be the same.

I think making this change makes sense, as then "equal" CIDs would be the ones that point to the same content.
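As a language-neutral illustration of the proposal, here is a minimal Python sketch in which two CIDs count as pointing to the same content when they agree after upgrading V0 to V1. The `Cid` class and `same_content` function are made up for this example and do not mirror the rust-cid or JS APIs.

```python
from dataclasses import dataclass

DAG_PB = 0x70  # multicodec code for dag-pb


@dataclass(frozen=True)
class Cid:
    version: int
    codec: int
    multihash: bytes

    def to_v1(self) -> "Cid":
        # A CIDv0 is implicitly dag-pb; the multihash is carried over unchanged.
        if self.version == 0:
            return Cid(1, DAG_PB, self.multihash)
        return self


def same_content(a: Cid, b: Cid) -> bool:
    """Compare two CIDs after normalizing both to V1."""
    return a.to_v1() == b.to_v1()
```

Keeping strict field-by-field equality separate from `same_content` is one possible design; it leaves callers who care about the exact version able to tell V0 and V1 apart.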

Specify CID without a specific binary encoding

In the current spec, CIDs are tied to a specific binary encoding. I propose splitting the CID spec into a definition of the values it describes and some default binary encoding.

The description of the values would talk about the version, the IPLD codec and the Multihash. It would be independent of how it is represented. For example, rust-CID supports encoding a CID using the SCALE codec, which is not the default binary encoding of the CID.

There would then be a default binary encoding (as it is today) with the varints.

"identity" multibase CIDv1 binary

Given a binary CID cid, let N be the first varint in cid. If N == 0, could it not be a CIDv1 encoded with the identity multibase? I ask because dag-cbor specifies that a CIDv1 should be encoded with the identity multibase. However, following the Decoding Algorithm, this would raise an error instead of decoding it.

So, should the decoding algorithm be modified, or does CID not natively support encoding/decoding a CIDv1 with the raw-binary identity multibase?

CID Decoding Algorithm does not define "string", and allows false positives

The CID Decoding Algorithm contains a step, "1. If it's a string (ASCII/UTF-8):". This is not sufficiently specified, and is subject to false positives.

The first problem is the consequences of the term "UTF-8", and the lack of an algorithm to determine if the target byte sequence "is a string (in UTF-8)". UTF-8 strings may consist of byte values less than, or greater than, or equal to 0x80. But not all sequences of byte values are valid UTF-8. The Unicode Standard, Section 3.9, Unicode Encoding Forms, goes into the details of what is valid and what is not. But some of the ins and outs are non-obvious, and probably undesirable in a CID decoding algorithm. If the intention is that the test should be, "1. if the target byte sequence consists of byte values from 0x21 to 0x7F inclusive", then say that. If the intention is that the test should be, "1. if the target byte sequence is a valid base58

The second problem is that any byte sequence which can be interpreted as a valid ASCII string, or a valid UTF-8 string, is still also a valid binary byte sequence. Thus the decoding algorithm does not rule out the case where someone constructs a CID which happens to consist of a sequence of bytes with values from 0x21 to 0x7F inclusive. This might accidentally get parsed according to case 1 of the algorithm, when it should get processed according to case 2. If there is something about the CID structure which means that a binary CID will never pass the test of case 1, it would be clearer for the documentation to say so.

A third problem is ambiguity about what a decoder should do if it attempts to decode a string according to case 1, but the contents are not valid according to the base58btc or multibase specifications. Should the decoder treat the string as a binary byte sequence and attempt to decode that way, or should the decoding attempt fail with an error?

I am new to the CID design, but I am a software engineer with experience working with text-based formats, including UTF-8. These sorts of ambiguities can cause real problems. It may be that they are made clear elsewhere. This issue reflects my naive reading of the CID Decoding Algorithm in isolation. I suggest that this algorithm should not be ambiguous even when read in isolation.

Suggested rewording of the spec to address the first problem above:

The algorithm to decode a byte sequence into a CID is as follows.

If the bytes in the sequence, when interpreted as a UTF-8 string, consist only of the characters of the Base58 alphabet ("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"), then …
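The test proposed in this rewording can be written without any UTF-8 validation, since every character of the alphabet is a single ASCII byte. A minimal Python sketch, with a hypothetical function name:

```python
# The Base58 alphabet from the suggested wording, as a set of byte values.
BASE58_ALPHABET = frozenset(
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
)


def is_base58_text(data: bytes) -> bool:
    """True if data is non-empty and every byte is a Base58 alphabet character."""
    return len(data) > 0 and all(byte in BASE58_ALPHABET for byte in data)
```

A binary CIDv0 starts with the byte 0x12 and a binary CIDv1 with 0x01, and neither value is in the alphabet, so under this particular test those binary forms would not be mistaken for strings. Whether that holds for every input a decoder may see is exactly the kind of guarantee the spec would need to state.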

Defining CID without Multibase prefix

In the current version of the spec a CID is defined as containing a multibase prefix.

This doesn't align with the current understanding across the IPLD team. We regard a CID as <cid-version><ipld-codec><multihash>. In cases where a CID is not represented as bytes (e.g. for display purpose) it is Multibase encoded.

Reproducible CID

Say I download a file from IPFS:

ipfs get zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA

Now I have the file, and I also have the CID. However the CID is not very useful to verify the authenticity of the file, since it doesn't specify details like ipld document layout, chunking algorithm, max chunk sizes etc. Also, it does not even specify whether the resource is a file or directory (or whatever else it could be).

I think it's possible to design a CID scheme that captures all the information needed to verify the file against the CID, without any need for guesswork (while still allowing for "exotic" layouts/special cases, see last paragraph).

A major advantage of having this would be the ability to retrieve a file from any source (a gateway, a normal http server, a usb drive, etc) and given its CID (assuming that it is available somewhere, either within the filename or as external metadata) it would be possible to reconstruct the exact layout and chunking specified by the CID, and properly share it on IPFS with the exact original identifier.

Something like:

ipfs add --force-cid zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA myimage.jpg

And also the ability to verify a file against a known CID (where the CID is effectively used as a normal hash):

ipfs verify --cid zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA myimage.jpg

[The same applies for sharing and verifying entire directories against a known CID]

I do understand that IPFS allows using "exotic"/variable chunking algorithms, sizes and layouts. These would be supported by the CID scheme as well, but they would simply render the CID as non-reproducible (it would be clearly marked in the CID itself).

(Edit: ipfs add --force-cid and ipfs verify --cid might also work even with non-reproducible CIDs, as long as the original dag document can be retrieved from the network)

Decide how to handle -1 in Prefix.

Currently, Prefix.MhLength is an int and -1 can be (and is) used to mean "default length". Unfortunately, this means:

cid1.Bytes() == cid2.Bytes() does not imply cid1.Prefix() == cid2.Prefix().
Prefix.Bytes() is broken.

Solutions:

Make it a uint64 and provide some convenience constructors (e.g. func V1Prefix(codec uint64, mhType uint64) Prefix). This will break things.
Fix Prefix.Bytes() and provide an Equals method (less convenient in the long run).

Thoughts?

Universal container format based on progressive specialization

[This is a work-in-progress draft design which has been heavily edited since it was first published]

This is an attempt at designing a highly flexible, yet compact, multipurpose container format that can function as a content/entity identifier, as a file header, as a part of a protocol message, or even as a container for both metadata and data by itself.

Basically there's a very simple underlying concept here: that successive type enumerations can be used to progressively "namespace" into more and more specialized contexts describing more fine-grained information. Note these type enumerations don't have to be limited to built-in fields (like entity domain or schema version) -- they can be dynamically inferred from fields whose semantics are progressively refined by the schema itself (somewhat like a state machine).

(This is mostly an illustrative example of how such format could be designed, but I did put a lot of thought into it so I think it's a worthwhile read)

It starts with a message encoding identifier (1 character), which can be any one of raw-binary, base64, base32 etc:

<message encoding [1 char]>

Now that we're in binary, a version number for the container format (varint):

<container version [varint]>

Now a varint for an entity domain identifier (e.g. file, ipfs, ipns, https, bitcoin, ethereum etc.)

<entity domain [varint]>

And now a varint version number of the schema for the domain (each domain independently maintains its own schema versioning):

<domain-specific schema version [varint]>

Now the base payload (AKA required fields), where its schema is specialized for the particular domain and version number, (note that total length is included to allow for a client to segment it even if it is unfamiliar with the particular combination):

<base payload length [varint]>
<base payload [arbitrary binary layout - can be variable length]>

And now field data (AKA optional fields), in a simplified protocol buffer like encoding (roughly described below):

<field data [unspecified total length]>

That's all really. It's not bound to contain a hash of any sort, or to be associated with a particular category within a set of predefined codec types.

Example: say we want to encode [raw-binary, container version 2, IPFS, schema version 1] so the first required field would be resource type, say it's UnixFS File, which in turn would refine the schema further to expect <dag hash type [varint]> and <dag hash [binary string]> as following fields.

The base document would look something like:

<encoding: "b" [1 character]>
<container version: 2 [1 byte]>
<entity domain: IPFS [1 byte]>
<domain-specific schema version: 1 [1 byte]>
<base payload length: 34 [1 byte]>
<resource type: UnixFS file [1 byte]>
<dag hash type: sha-256 [1 byte]>
<dag hash [32 bytes]>

(Total length: 1 char + 38 bytes)
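The byte count can be checked with a short Python sketch that assembles the binary part of this base document. The numeric codes for the entity domain and resource type are invented placeholders (the draft does not assign any); 0x12 is the multihash table's code for sha2-256, used here on the assumption that the format would reuse it.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Hypothetical codes, chosen only for this example.
DOMAIN_IPFS = 1
RESOURCE_UNIXFS_FILE = 1
HASH_SHA2_256 = 0x12


def encode_base_document(dag_hash: bytes) -> bytes:
    """Build <version><domain><schema version><payload length><payload>."""
    payload = (
        encode_varint(RESOURCE_UNIXFS_FILE)
        + encode_varint(HASH_SHA2_256)
        + dag_hash
    )
    return (
        encode_varint(2)               # container version
        + encode_varint(DOMAIN_IPFS)   # entity domain
        + encode_varint(1)             # domain-specific schema version
        + encode_varint(len(payload))  # base payload length
        + payload
    )


document = encode_base_document(bytes(32))  # 32 zero bytes stand in for a digest
```

The one-character message encoding ("b") is prepended separately once the bytes are rendered as text.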

Optional fields:

Each optional field is structured as:

<data type and field identifier [varint]>
<field payload>

Where the first bit of data type and field identifier represents the type and the rest the field identifier (specific to the particular schema), which can grow indefinitely since it's a varint (fitting into a single byte would allow for 6 bits, which can support up to 64 different field IDs).

Data type can be:

0: varint 
1: length prepended binary string (where length is a varint)

(I'm not sure if there's a need for anything else, since booleans can be contained in bitfields and floats can be stored in binary strings)

So let's say for the example we wanted to add a file size, chunking algorithm and max chunk size optional fields to the base CID:

<data type: 0, field id: file size (#0) [1 byte]>
<field payload [6 bytes]>
<data type: 0, field id: chunking algorithm (#1) [1 byte]>
<field payload [1 byte]>
<data type: 0, field id: max chunk size (#2) [1 byte]>
<field payload [3 bytes]>

Totals (file size: 7 bytes, chunking algorithm: 2 bytes, chunk size: 4 bytes). Of course if the information cannot be represented here (say, chunking is variable): it may simply not be included at all.

Now let's say the user wants to also add a signature for the hash, and that is not supported in the base schema, so they would need to use their own application specific field identifier in a reserved range (for this example say 4096+ is reserved [4096 is roughly midway within the range available for 2 byte identifiers]).

<data type: 1, field id: hmac-sha-256 hash signature (#4096) [2 bytes]>
<field payload [1 for length + 32 bytes for data]>

Even if the client doesn't understand this field, it can safely ignore and skip it since all the length information is available through the encoding itself.
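A Python sketch of the optional-field encoding shows why unknown fields can be skipped. It assumes the type flag is the lowest bit of the header varint and the field identifier occupies the remaining bits; the draft only says "first bit", so this bit order is a guess, and all names are illustrative.

```python
from typing import Iterator, Tuple, Union

VARINT, BYTES = 0, 1  # the two data types of the draft


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def encode_field(field_id: int, value: Union[int, bytes]) -> bytes:
    # Assumed header layout: field id in the upper bits, type flag in the lowest bit.
    if isinstance(value, int):
        return encode_varint((field_id << 1) | VARINT) + encode_varint(value)
    return encode_varint((field_id << 1) | BYTES) + encode_varint(len(value)) + value


def iter_fields(data: bytes) -> Iterator[Tuple[int, Union[int, bytes]]]:
    """Walk every field; a reader can simply ignore field ids it does not know."""
    pos = 0
    while pos < len(data):
        header, pos = read_varint(data, pos)
        field_id, data_type = header >> 1, header & 1
        if data_type == VARINT:
            value, pos = read_varint(data, pos)
        else:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        yield field_id, value
```

Under this layout, field identifiers 0 to 63 have a one-byte header and identifier 4096 has a two-byte header, which matches the sizes quoted above.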

Note that it's possible to standardize identifiers within the range 4096+ as application reserved globally for all domains. This would mean that application-specific fields could be added to a document even if its schema is not understood by the client.

Comparison and relationship with W3C DID standard

It is a bit annoying that the DID spec does not mention CID in any way, and the CID spec does not acknowledge that DID exists either.

It would be nice to have something that describes the similarities and differences between the two, to make it easier for developers to decide which one to use (or both) for a particular use case.

There is CID mention in https://did-ipid.github.io/ipid-did-method/ draft, but it is a draft and it is old.

Right now if I am citing this I have to call it a "DRAFT" specification because nothing is said in the normative specification that it is a final version.

Ambiguity of implicit multibase for CIDv0

This current version of ipld/cid specifies that for CIDv0:

the multibase is always base58 and implicit (not written).

However, the list in multiformats/multibase doesn't include any base exactly named base58...

base58flickr  Z       highest char
base58btc     z       highest char

...so I'm not sure which alphabet I should use.

This spec may need to be updated with the appropriate multibase name.

Why length in bits instead of bytes in Human Readable CIDs?

# example CID
zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA
# corresponding human readable CID
base58btc - cidv1 - raw - sha2-256-256-6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95

In multihash, the length is in bytes:

Format

<varint hash function code><varint digest size in bytes><hash function output>

Binary example (only 4 bytes for simplicity):

fn code  dig size hash digest
-------- -------- ------------------------------------
00010001 00000100 101101100 11111000 01011100 10110101
sha1     4 bytes  4 byte sha1 digest
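For comparison, a small Python sketch that reads the multihash layout above and converts the digest size to the bit count shown in the human-readable CID. The function names are hypothetical, and the four digest bytes in the example are arbitrary values rather than a real SHA-1 prefix.

```python
from typing import Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned varint; return (value, next_position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def parse_multihash(data: bytes) -> Tuple[int, int, bytes]:
    """Return (hash function code, digest size in bytes, digest)."""
    code, pos = read_varint(data, 0)
    size, pos = read_varint(data, pos)
    digest = data[pos:pos + size]
    if len(digest) != size:
        raise ValueError("digest shorter than the declared size")
    return code, size, digest


def size_in_bits(size_in_bytes: int) -> int:
    """Length as the human-readable CID form prints it."""
    return size_in_bytes * 8


# 0x11 is sha1 in the multihash table; the digest bytes are arbitrary.
code, size, digest = parse_multihash(bytes([0x11, 0x04, 0xB6, 0xF8, 0x5C, 0xB5]))
```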