cabal-club / cable

A lightweight peer-to-peer chat protocol.

Home Page: https://cabal.chat

p2p p2p-chat specification

cable's Issues

Handshake: Post-Handshake Operation missing end-of-stream marker to protect against stream truncation

hi @hackergrrl,

i was looking at the new "handshake.md" draft, however i noticed one important feature missing in the "Post-Handshake Operation": an end-of-stream marker to protect against stream truncation.

from the Noise spec section 13: "Application Responsibilities":

Session termination: Applications must consider that a sequence of Noise transport messages could be truncated by an attacker. Applications should include explicit length fields or termination signals inside of transport payloads to signal the end of an interactive session, or the end of a one-way stream of transport messages.

i've been working on a similar post-handshake stream: "Secret Channel".

for Cable, i can recommend what i did in "Secret Channel":

disallow a content length of 0, since that doesn't make sense anyways
interpret a length of 0 to actually mean an end-of-stream

then Cable should be protected against stream truncation, since end-of-stream messages are authenticated.
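A minimal sketch of the zero-length end-of-stream idea, for illustration only. The 4-byte big-endian length prefix and the names `write_frames`, `read_frames`, and `TruncatedStream` are assumptions made here, not part of the Cable handshake draft or Secret Channel. Encryption and authentication are left out entirely; in a real stream the length and payload would be authenticated, which is what makes the marker trustworthy.

```python
import io
import struct

END_OF_STREAM = 0  # a zero length is reserved as the termination signal


class TruncatedStream(Exception):
    """Raised when the stream ends without an end-of-stream marker."""


def write_frames(out, payloads):
    """Write each payload with a length prefix, then the end-of-stream marker."""
    for payload in payloads:
        if len(payload) == 0:
            # Empty content is disallowed so that 0 can only mean "end".
            raise ValueError("empty payloads are not allowed")
        out.write(struct.pack(">I", len(payload)))
        out.write(payload)
    out.write(struct.pack(">I", END_OF_STREAM))


def read_frames(inp):
    """Return all payloads, or raise TruncatedStream if no marker was seen."""
    payloads = []
    while True:
        header = inp.read(4)
        if len(header) < 4:
            raise TruncatedStream("stream ended before the end-of-stream marker")
        (length,) = struct.unpack(">I", header)
        if length == END_OF_STREAM:
            return payloads
        body = inp.read(length)
        if len(body) < length:
            raise TruncatedStream("stream ended in the middle of a payload")
        payloads.append(body)
```

With this shape, a reader that hits the end of the connection without having seen the marker knows the session was cut short, rather than treating the messages received so far as the whole conversation.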

a figurative attack to illustrate this is described in pull-box-stream:

Alice: hey bob, just calling to say that I think TLS is really great, really elegant protocol, and that I love everything about it.

Mallory (man in the middle): (SNIP! ...terminates connection...)

Alice: NOT!!!!! (Bob never receives this!)

Bob... WTF, I thought Alice had taste!

Bob never gets the punchline, so thinks that Alice's childish humor was
actually her sincere belief.

Questions & Comments

I did a read-through this morning, and typed up my thoughts below!

Hashes

The spec notes that hashes are 32-byte blake2b hashes. blake2b can produce digests of up to 64 bytes. What fueled the decision for 32 bytes?

Would it make sense to have the spec permit a variable blake2b hash length (32+ bytes)? Thinking about cable in a very Deep Future sense, I wonder whether being able to use more bits in our hashes would (as I understand it) help offset faster hardware in the years ahead being able to reverse hashes more easily.
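For reference, here is a small sketch of how the digest length is chosen with Python's standard `hashlib.blake2b`. The input bytes are a placeholder, not a real cable post. One thing it may help to keep in mind when weighing a variable length: in BLAKE2 the digest length is a parameter of the hash itself, so a 32-byte digest is not simply a truncated 64-byte digest, and peers would need to agree on (or encode) the length to compare hashes.

```python
import hashlib

data = b"example post bytes"  # placeholder input, not a real cable post

# digest_size selects the output length in bytes (1 to 64 for blake2b).
short = hashlib.blake2b(data, digest_size=32).digest()
long = hashlib.blake2b(data, digest_size=64).digest()

# The two digests have different lengths, and the shorter one is not
# a prefix of the longer one.
print(len(short), len(long), long[:32] == short)
```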

Timestamps

what are the consequences of mal-intentioned people being able to post messages to the past?
what are the consequences of mal-intentioned people being able to post messages to the future?
how do we want clients to handle these cases?

`channel time range` Request, `channel state` Request

As I understand it, this means, if a user is in 30 channels, they'd be sending up to 30 * 2 * numPeers of these requests to track all of the channels they're in. They'd also have up to that many incoming requests when they join the swarm. I'm a bit worried this'll make joining the swarm pretty heavy & network intensive, and I'm still thinking about if there are alternatives that'd make sense.

I also wasn't clear on whether the channel time range request will return info/state messages as they occur? Like, when someone joins a channel, will you get that post from both requests (if they're awaiting live data)?

Nitpick

s/length of the channel in/length of the channel name in

String Encoding

I think it would make sense to specify that strings are UTF-8 encoded across the board, e.g. channel names. If we allow arbitrary binary data here, clients wouldn't know necessarily how to interpret them to present them to the user.

`channel state` Request

This mentions that deletes get sent. Would ALL historic deletes be included every time the req is made? This could be LOTS, e.g. 1000s, for an old channel where someone decided to erase their whole history.

`channel list` Request

What do y'all think about having an offset param in addition to limit? I'm having a hard time thinking up a use-case where you'd want limit without offset -- you can't paginate.

Also thinking whether we'd want to specify that impls should try to produce a stable sort of channels to return to this query. It would be unhelpful imo to get different results every time you queried the list from a given peer.
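A rough sketch of what offset plus limit with a stable order could look like on the responding side. The function name `list_channels`, the choice of plain lexicographic sorting, and treating a missing limit as "no limit" are all assumptions for illustration; the spec does not define any of them.

```python
def list_channels(channels, offset=0, limit=None):
    """Return one page of channel names in a stable (sorted) order."""
    # Sorting the de-duplicated names gives the same order on every call
    # as long as the set of known channels has not changed.
    ordered = sorted(set(channels))
    end = None if limit is None else offset + limit
    return ordered[offset:end]


known = ["dev", "cabal", "gardening", "bots", "cabal"]
page_1 = list_channels(known, offset=0, limit=2)
page_2 = list_channels(known, offset=2, limit=2)
```

Because the order is deterministic, consecutive pages neither overlap nor skip entries, which is what makes pagination usable.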

"post"

Suggestion: specify what a "post" is at top of this section.

Also, "post" == "data chunk" == "message"? They're used semi-interchangeably in the document. I think it'd be great if we could firm up on the terminology and make sure we use it the same way everywhere.

`link`

How does an impl choose a "latest" message? Biggest timestamp? Choose randomly between heads?

Blocks, Hides

The spec mentions 'blocks' and 'hides' but doesn't specify their meaning. I like that we're putting hides & blocks on the table (they'd be PERFECT for filtering out unwanted users' content from queries), but if we're talking about it here then it sounds like something the spec should explicitly address. Otherwise we should leave it out entirely.

`post/topic`

This should specify that topic is a string, unless some reason to allow non-string data here?

`post/join`

I was unclear on what "Peers can obtain a link to anchor their join message by requesting a list of channels." means? Does it mean you can get the latest msgs from a channel first, to find the latest, so you have something to link to? What happens if a post/join is made w/ no link set?

sub-protocols

There is already some interest in using cable for non-cabal chat purposes as well as questions about how to layer other features such as sharing files on top of cabal with cable. This could be done by adding more message types, but that would require more coordination (each protocol needs to reserve message type varints for each of its message types) and integration than allowing for sub-protocols where each sub-protocol only needs to reserve a single sub-protocol varint. Using sub-protocols would also let cable focus more on low-level concerns such as routing and cryptography while more application-specific tasks such as text chat can be versioned separately. This is already somewhat the case with the post vs message parts of the cable spec, but using sub-protocols would also move the query logic into a separate realm from the networking and hash/data delivery parts of the protocol.

work-in-progress branch for sub-protocol refactoring: https://github.com/cabal-club/cable/tree/protocol

Eventual consistency issue + proposed solution

Hey y'all. Posting this here to hopefully get some smart eyeballs on it for feedback before I start making the changes.

Premises

1. In Cable there are implied Conflict-free Replicated Data Types:
   1. each user's latest post/info
   2. each user's, per channel, membership status (regulated by post/{join,leave})
   3. each channel's topic (regulated by post/topic)
2. For any given post/{topic,join,leave,info}, P, a node could miss it during normal sync, by means of missing it in their usual sync window. Imagining a sync window of 1 week, a node might sync right before P was created, then go offline for just over a week, and then not receive it using the normal time range request.
3. For each of these, there is a single "latest value".
4. Latest value is determined using a combination of post timestamps and causal chains.
5. Causal chains are treated as authoritative over timestamps, when available.
6. A < C is true if A.timestamp < C.timestamp or the node knows of a causal chain such that A <- ... <- C.
7. If no causal chain between A and C is known, timestamp alone is used to determine sort order.
8. Given any posts A and C, a node is always able to order the posts, using timestamps. This is not always true with causal chains, where some intermediary posts may be missing from the local node.
9. Assume the following:
   1. The A <- B <- C causal chain exists.
   2. N1 is a node that knows about A and C but not B.
   3. N2 is a node that knows about A, B, and C.
10. If we assume A.timestamp < C.timestamp, N1 would report C as the latest value. N2 would report the same. No conflicts here.
11. However, if we assume A.timestamp > C.timestamp, then N1 would think A is the latest value, while N2 would still think C is the latest, thanks to N2's knowledge of the causal chain.
12. (11) is a problem that inhibits eventual consistency. N1 will report A as latest even after syncing with N2. The only way N1 would ever stop considering A as latest is if N1 acquired all posts in a causal chain such that A <- ... <- C. However, due to (2), this may never happen.
13. This problem also arises for deletions. Thanks to (2), if a deletion is missed during periodic time range sync, there is no way for a node to learn about it after.
14. The result is that some nodes may see and report old "latest" values for user info, membership status, and channel topics, including deleted ones, even after syncing with clients who "know" what the newest values are.
15. Cable is not eventually consistent for these data types.
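The disagreement between N1 and N2 can be reproduced with a toy model. This is only a sketch of the ordering rule as described in the premises, not code from any cable implementation: the `Post` class, the `latest` function, and the concrete timestamps are made up for illustration. "Causal chains first, timestamps as the fallback" is modeled by dropping every post that is a known ancestor of another known post and then taking the highest timestamp among what remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    name: str           # stands in for the post hash
    timestamp: int
    links: tuple = ()   # names of the posts this post links back to


def known_ancestors(post, store):
    """Names reachable from `post` by following links that exist in `store`."""
    seen, stack = set(), list(post.links)
    while stack:
        name = stack.pop()
        if name in seen or name not in store:
            continue  # a missing post breaks the chain at this point
        seen.add(name)
        stack.extend(store[name].links)
    return seen


def latest(store):
    """Pick the latest value: causal chains first, timestamps as the fallback."""
    superseded = set()
    for post in store.values():
        superseded |= known_ancestors(post, store)
    heads = [p for p in store.values() if p.name not in superseded]
    return max(heads, key=lambda p: p.timestamp)


# A carries a timestamp newer than the posts that causally follow it.
A = Post("A", timestamp=10)
B = Post("B", timestamp=4, links=("A",))
C = Post("C", timestamp=5, links=("B",))

n1 = {p.name: p for p in (A, C)}      # N1 is missing B
n2 = {p.name: p for p in (A, B, C)}   # N2 has the whole chain

print(latest(n1).name, latest(n2).name)
```

N1 reports A and N2 reports C. Handing C to N1 again changes nothing, since N1 already has it; only B closes the gap.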

Summary

To summarize, any time someone posts a post/{topic,info,join,leave}, due to clock skew or malice, with a timestamp newer than those that follow it, this situation can happen, where some nodes continue to display and return old data, due to an inaccurate timestamp and lack of causal chain information.

What's nefarious about this is that no amount of syncing with nodes who know about the causal chains will fix it. The nodes "stuck" on old data will only use a newer value once a new post, D, is made where D.timestamp > A.timestamp. In a malicious case where the timestamp is set to the VERY distant future, a manufactured value could get locked in essentially forever.

This has, imo, big consequences:

deleted posts can continue to circulate
old user info can continue to circulate
old channel membership info can continue to circulate
old channel topics can continue to circulate

Solution 1

Premise: If a node N2 knows the causal chain A <- ... <- C, they must also know whether A.timestamp > C.timestamp.

Proposed Edit 1: First, when N2 responds to a Channel State request, they would detect the case outlined in the premises section above, and choose to send ALL of the hashes that make up the causal chain between A and C. This would let a node with a broken causal chain "fill in the blanks" and realize that C is actually sorted after A, and is thus the latest value.

Proposed Edit 2: This solution also proposes making a change to how deletions are made. In order to be able to send the complete causal chain when needed, holes, like those caused by deletions, can't exist. However, if deletions were to copy the links from the post they are deleting and make those their own links, clients would still be able to traverse the causal chain even in the presence of deletions. e.g. given A <- B <- C, a deletion of B, D, would have D.links = [A]. So, the causal chain sent would include D instead of B, and send {A,D,C}.
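A toy sketch of the traversal this would enable. The dict layout, the `deletes` field, and the helpers `make_delete` and `chain` are hypothetical and only follow a single link per post to keep things short; real posts can carry several links.

```python
def make_delete(name, target):
    """Build a deletion that inherits the links of the post it deletes."""
    return {
        "name": name,
        "type": "delete",
        "deletes": target["name"],
        "links": list(target["links"]),
    }


def chain(start, store):
    """Walk back from `start`, stepping through a deletion where a post is gone."""
    # Map each deleted post's name to the deletion that replaced it.
    tombstones = {p["deletes"]: p for p in store.values() if p["type"] == "delete"}
    out, current = [], store[start]
    while current is not None:
        out.append(current["name"])
        following = None
        for link in current["links"]:
            following = store.get(link) or tombstones.get(link)
            if following is not None:
                break
        current = following
    return out[::-1]


A = {"name": "A", "type": "topic", "links": []}
B = {"name": "B", "type": "topic", "links": ["A"]}
C = {"name": "C", "type": "topic", "links": ["B"]}
D = make_delete("D", B)

# B itself has been removed from the store; only its deletion D remains.
store = {p["name"]: p for p in (A, C, D)}
print(chain("C", store))
```

Without the copied links, the walk from C would stop at the missing B. With them it continues through D and reaches A.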

Pros:

Solves the problem without needing any major changes, and no wire format changes.
If a new post E is made s.t. E.timestamp > A.timestamp, the causal chain won't need to be sent any longer, avoiding the problem of this creating grow-only sets.

Cons:

The node would need to send ALL of these hashes on EVERY Channel State request where it's relevant. For post/info this effort may be duplicated across each channel the requester and requestee have in common.
In the case of a maliciously futuristic timestamp, like A.timestamp = Jan 1 2035111, the chain effectively becomes a grow-only set, since the full causal chain will be needed for the foreseeable future.
Another piece of implementation complexity.

Mitigations

For Con#2, we could add a rule in the spec that says that posts with a timestamp newer than local_time + N SHOULD be ignored and not ingested. This would put more of an upper limit on how long a must-send causal chain could get.

Proposed Edit 3: Add a rule to the wire spec that says to discard any posts with timestamp > local_time + 6 hours.
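As a sketch, the rule amounts to one comparison at ingest time. The function name `should_ingest` is made up, and the code assumes timestamps are in seconds; if the wire format uses a different unit the constant would need scaling.

```python
import time

MAX_FUTURE_SKEW_SECONDS = 6 * 60 * 60  # the 6-hour bound from Proposed Edit 3


def should_ingest(post_timestamp, local_time=None):
    """Return False for posts stamped more than 6 hours ahead of the local clock."""
    if local_time is None:
        local_time = time.time()
    return post_timestamp <= local_time + MAX_FUTURE_SKEW_SECONDS
```

Note that the outcome depends on each node's own clock, so two nodes with skewed clocks can disagree about a post near the boundary.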

Other

Proposed Edit 4: I also propose we get rid of this clause under Channel State Request:

"If future = 1 and a post that is part of the latest state for a channel is deleted, the responder MUST immediately send the hash of the next-latest piece of state of that same type as a Hash Response. For example, if the latest post/topic setting the channel's topic string is deleted by its author with a post/delete post, the hash of the second-latest post/topic for that channel SHOULD be sent."

I think it's redundant and adds extra complexity. It's already written that whenever state changes, the relevant update hashes are sent.

Oh! Also:

Proposed Edit 5: Add a new section to the wire spec that explains how these implied CRDTs work. Right now we just imply their existence. I think it'd be useful to make it explicit. It would also make it simpler to add similar data types in the future!

SHA-256 or SHA-512 when signing?

Hey friends, just wondering which hash function should be used when signing messages? sha256 or sha512?

after v1 review: create test vectors to facilitate interoperating implementations

when i was helping out on the ssb ngi project that was developing partial replication, we had a concept called testvectors (example) that i think could be really helpful as a resource to have for cable.

given that we're going through a spec review, this should come at the earliest after the v1 has been reviewed, RFCs process is done, and a final version decided on and landed in the spec repo.

so onto what a testvector is: a piece of documentation x example x testbed that gives pairs of [plaintext input, binary output] together with a description of what use case is being demonstrated by the pair.

so, a test vector for a cable hash response would:

list the exact values used in one instance of a hash response for the fields that make up the message type
give the corresponding buffer that a correct cable implementation would produce, for the specified input

we can also provide test vectors that do not work, e.g. faulty input that should not result in a cablegram

the end result is that implementors have, in addition to the spec, a set of clear language-independent test cases they can try their implementation against. they can input the inputs and compare it against the generated buffer, as well as take the buffer and decode it to hopefully arrive at the given inputs.

i don't care so much about the format in particular, the example above is just one format we could choose. i would personally be happy with having a markdown document for test vectors, where each vector is listed as a description, the inputs, as well as the base16 encoded bytes representing the cablegram.
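to make the shape concrete, here is a sketch of how vectors like that could be run against an implementation. everything in it is a stand-in: `toy_encode` is NOT cable's wire format (it is one length byte followed by UTF-8 text), and the vector fields `description`, `input` and `hex` are just one possible layout. a vector with `hex` set to `None` represents faulty input that must be rejected.

```python
def toy_encode(fields):
    """Stand-in encoder: one length byte, then the UTF-8 bytes of `text`."""
    body = fields["text"].encode("utf-8")
    if len(body) > 255:
        raise ValueError("text too long for the toy format")
    return bytes([len(body)]) + body


VECTORS = [
    {"description": "short ASCII text", "input": {"text": "hi"}, "hex": "026869"},
    {"description": "overlong text must be rejected", "input": {"text": "x" * 256}, "hex": None},
]


def run_vectors(encode, vectors):
    """Return the descriptions of all vectors the encoder gets wrong."""
    failures = []
    for vector in vectors:
        try:
            got = encode(vector["input"]).hex()
        except ValueError:
            got = None  # the encoder refused the input
        if got != vector["hex"]:
            failures.append(vector["description"])
    return failures


print(run_vectors(toy_encode, VECTORS))
```

the same list of vectors could be fed to a decoder in the other direction, taking the base16 bytes and checking that the decoded fields match the inputs.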

reusable circuits

If you broadcast messages to multiple peers and receive a response, you might want to send additional messages to the same peer through the same intermediaries, saving bandwidth so that follow-up requests do not need to do a flood-fill up to N hops. This branch adds a circuit_id field to requests and responses:

https://github.com/cabal-club/cable/tree/circuit_id

Peers responding to requests can set a circuit_id so that the requesting peer can reach them by the same route. Peers between a requester and responder can keep a routing table mapping circuit ids to a peer connection to send replies along so that messages with the same id do not need to be broadcast on all connections.

Circuits are not intended to be durable or long-lasting. Once a peer disconnects, all circuits on that interface can be forgotten. Peers keeping a circuit table might remember only a fixed number of circuits for each connection, vacating circuits not recently used for circuits more recently used.
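A minimal sketch of such a circuit table, assuming a single bounded table with least-recently-used eviction. The class and method names are invented here and are not taken from the circuit_id branch; a per-connection limit, as described above, could be built the same way with one table per connection.

```python
from collections import OrderedDict


class CircuitTable:
    """Maps circuit ids to connections, keeping only the most recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._routes = OrderedDict()  # least recently used entry comes first

    def remember(self, circuit_id, connection):
        """Record which connection replies for this circuit should go to."""
        self._routes[circuit_id] = connection
        self._routes.move_to_end(circuit_id)
        while len(self._routes) > self.capacity:
            self._routes.popitem(last=False)  # vacate the least recently used

    def route(self, circuit_id):
        """Return the connection for a circuit, or None if it is unknown."""
        connection = self._routes.get(circuit_id)
        if connection is not None:
            self._routes.move_to_end(circuit_id)  # mark as recently used
        return connection

    def forget_connection(self, connection):
        """Drop every circuit that ran over a connection that has closed."""
        stale = [cid for cid, conn in self._routes.items() if conn == connection]
        for cid in stale:
            del self._routes[cid]
```

When `route` returns `None`, the peer would fall back to the normal broadcast behaviour for that message.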

Some Questions

Heya :-)

I read through the protocol and it looks quite interesting. I like the goals too. I have a few questions if you don't mind.

1. What do data responses in general (or specifically a channel list response) look like?
2. Do they contain at least a single (latest?) hash for the latest message/post for peers to anchor join msgs?
3. Are join and leave messages just for politeness or should others ignore posts until they see a join?
4. Do time range requests usually return text, topic, join, leave, info messages without the ones deleted?
   - can you filter them based on being part of a specific "conversation thread"?
   - or to only include messages from specific peers?
5. Do channel state requests only ask for topic, join, leave, info messages without the ones deleted?
   - do they include text? (it is not mentioned in the readme)
   - can I filter them (history and live) to only include messages from specific peers?
6. Is it planned to allow to query sparsely for the current list of (active) peers in a channel?
7. Is there a size limit when it comes to state fields? { name, blocks, hides, max_age }
   - would people list e.g. a hyperdrive for larger files?

Also:

Most post types will link to the most recent post in a channel from any user (from their perspective) but self-actions such as naming or moderation will link to the most recent self-action.

Was it considered to allow posts with more than one back link to enable topological sorting and requests?

a text message could be a response to multiple messages at once ...some kind of "merge text message"
the delete message already has at least two back links as far as i can see (link and hash)

I imagine that could allow:

to respond to specific earlier messages of one or more threads.
to sparsely find out the latest name of a peer's state without downloading all their state messages?

add version number query

version numbers in protocols ease upgrading after serious use unveils necessary improvements. adding them to cable (based on the version bumps we've had with cabal proper) seems like it could be a win.

one option for cable would be adding a version number as part of the cable frame (alongside e.g. signatures). the downside of this is that all messages are bloated by the few bytes. another option would be to add a handshake / introduction to cable. this second option has the seeming drawback of introducing connection state into what i think is an otherwise state-less protocol; more complexity for marginal up-front gain.

another avenue of approach could be to introduce a general version number query, where clients respond with what version of cable they are running as far as they know. this could see additional utility when combined with cable's subprotocols, such that the version number query returns a key:val list of subprotocol:version number instead of just a single version number for cable.

post header: move timestamp to be header field 5

was spending the morning getting reacquainted with the spec and a facet of the post types stood out to me. timestamp is present in all posts, but the position it has in a post depends on the message type. e.g. for post/text it is field 7, after channel_size + channel, while for post/delete it is field 5, after the post header's always present 4 fields.

since it seems reasonable that any type of post will contain timestamp information, i think we should make it part of the post header as header field 5. imo this could make it easier for implementors, since its position doesn't arbitrarily shift depending on the particular post type—consistency 👌

what kind of varint

I think it would be good to outline what the varint encoding is.

I saw @substack uses desert in the first rust draft but I couldn't really tell how they are turned into bytes from the crate docs.

possible inspiration for an already specced-out unsigned varint (not sure if cable needs negative integers): https://github.com/AljoschaMeyer/varu64
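For comparison, here is a sketch of one widely used unsigned varint scheme, the LEB128-style encoding (7 bits per byte, least significant group first, high bit set on every byte except the last). This is an assumption about what "varint" might mean here, not a statement of what cable or desert actually does, and it is a different encoding from varu64.

```python
def encode_varint(n):
    """Encode a non-negative integer as a LEB128-style unsigned varint."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(buf, offset=0):
    """Decode a varint from `buf`, returning (value, offset after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]  # raises IndexError if the input is cut short
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Under this scheme 300 encodes to the two bytes `ac 02`. Note that it allows non-canonical encodings (for example padding with `80` bytes), which is one of the things varu64 is designed to rule out, so a spec adopting it would probably want to say whether those are accepted.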

handling of deletes and channel state

This is more of a conceptual question to get more understanding than about specific wording of the spec.

Why aren't `post/delete`s part of channel state replies?

The way i'm seeing it, getting e.g. two topics without the delete in the middle, where the 2nd/older topic would have non-recent links, would require a certain amount of trust in the peer, no?

Adding the delete in the middle would at least show "well this thing is gone" now and maybe i still have that old topic anyway.

tangentially related: can all posts be deleted?

If i create a new channel and then delete the join, what's the most recent state? empty channel? no channel? if somebody else joins and i join again, should i reference the delete?

Time? Whose?

Regarding time range queries, should peers just optimistically trust each other's clocks (bad idea) or is there another approach implied?