quicwg / load-balancers



In-progress version of draft-ietf-quic-load-balancers

Home Page: https://quicwg.org/

License: Other

Makefile 100.00%

quic ietf

load-balancers's People

huitema ianswett bisdhdh neo-zk lingtaonju zcnzcn33 getdnsapi martinthomson martinduke nandsky

load-balancers's Issues

Add retry_source_connection_id to Retry Service Token Format

The server now needs to include this field.

For the no-shared-state service, we just need language saying that the Retry service must encode enough information to validate the packet DCID as well, and drop the packet if it fails validation. If it passes, the server MUST use the packet DCID in the retry_source_connection_id TP.

For the shared state service, the retry source connection ID is going to have to be in the token. We might be able to compress this by xoring some fields; I'll think about it.

Applicability to DTLS 1.3

Almost by accident, QUIC-LB can also route DTLS 1.3 associations over UDP, as long as the client agrees to support connection IDs.

The ciphertext packets match the short header in the relevant ways, and so can be routed without any issues.

The plaintext packets appear to be QUIC short headers, but as the second byte happens to always be 0xfe, QUIC-LB will route them by 4-tuple.
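A minimal sketch of why this happens, assuming (as the config rotation discussion elsewhere in these issues suggests) that the top two bits of the first DCID octet are the config rotation codepoint and that 0b11 means "route by 4-tuple". The function name and return strings are hypothetical. A DTLS 1.3 plaintext record starts with its content type (0x16 for handshake, high bit clear, so it parses as a short header) followed by legacy_record_version, whose first octet is 0xfe. A DTLS 1.3 ciphertext record with a connection ID (e.g. first byte 0x3c, the unified header with the C, S and L bits set) carries the CID immediately after the first byte, exactly where QUIC puts the DCID.

```python
FOUR_TUPLE_CODEPOINT = 0b11  # assumed: config rotation value meaning "route by 4-tuple"


def classify(datagram: bytes) -> str:
    """Hypothetical QUIC-LB view of a UDP payload (a sketch, not the draft's text)."""
    if not datagram:
        return "drop"
    if datagram[0] & 0x80:
        return "long header"
    if len(datagram) < 2:
        return "drop"
    # Short header: the DCID begins right after the first octet, and its top
    # two bits are the config rotation codepoint.
    codepoint = datagram[1] >> 6
    if codepoint == FOUR_TUPLE_CODEPOINT:
        return "4-tuple"
    return f"cid (config {codepoint:#04b})"


# DTLS 1.3 plaintext handshake record: content type 0x16, version 0xfefd
print(classify(bytes([0x16, 0xFE, 0xFD])))  # 4-tuple
# DTLS 1.3 ciphertext record whose CID starts with 0x41
print(classify(bytes([0x3C, 0x41, 0x99])))  # cid (config 0b01)
```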

IIUC, this doesn't apply to earlier versions of DTLS. So, one can deploy DTLS behind a QUIC-LB infrastructure as long as:

Servers reject DTLS < 1.3
Servers reject ClientHellos that do not have the connection_id extension.

One could also write a slightly different version of QUIC-LB that supported DTLS without reservations, but you'd have to be aware that the packet was DTLS and leverage information about the format of the first byte.

Bytes for server use

Make sure each algorithm always provides 2-3 octets for server use. The current SID limits don't do that.

Server ID: bits instead of octets?

@ianswett asks if there is added value in expressing server ID lengths in bits instead of octets.

This adds a bit of implementation complexity and a lot of churn in the spec and the handful of implementations that exist, but it does give the configuration agent a little more granularity.

Does anyone find the value of this granularity to be compelling?

Switch to QUIC notation

There are so many variable-length fields that the ASCII art doesn't serve much of a purpose. Just switch to the notation used in quic-transport for the various CID and token formats.

Load Balancing Invariant Longer Header Packets

We've had some discussion in this area in the past, and we decided that the best way to statelessly (and consistently) load balance long header packets would be to use a hash of the UDP 4-tuple and the client's source CID, since these are the only constants across all incoming (to the server) long header packets. I have come up with a few problems with this approach:

(1) As far as I know, there is no statement in the Invariants that says these must all stay constant for all future versions of QUIC.
(2) The hash approach can only function statelessly if there is no change in the DIP configuration. If the set of servers being load balanced changes (which we must assume to be common), then whatever stateless logic maps a hash to a DIP would also change, most likely resulting in long header packets being routed incorrectly.
(3) Following up on (2): if long header packets therefore cannot be routed statelessly based on the hash, state must be tracked to keep routing all long header packets consistently until they are no longer used. At what point can the LB discard this state? By design, there is no on-path signal to indicate "long header packets are no longer used". Any heuristic added here would be affected by (1) too.

Because of these issues, I'm left scratching my head on the best way to recommend to LBs on how to load balance invariant long header packets. The best thing I can think of is:

Use the hash mentioned above, but maintain state for each flow (tuple+client_cid).
Reset a timer (5 sec? 10 sec? 30 sec?) after each new packet is received for the flow. Discard the state when the timer fires.
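The two steps above could look roughly like this. It's a sketch: the class name, the 10-second default and the pluggable pick function (standing in for the hash over the 4-tuple and client CID) are all hypothetical. Storing the chosen server is what lets a flow survive a change in the server pool, the problem described above; the table itself is unbounded, which is the DoS exposure described next.

```python
from typing import Callable, Dict, List, Tuple

# (src IP, src port, dst IP, dst port, client SCID)
FlowKey = Tuple[str, int, str, int, bytes]


class FlowTable:
    """Per-flow routing state for long-header packets with an idle timer (hypothetical)."""

    def __init__(self, servers: List[str], pick: Callable[[FlowKey, List[str]], str],
                 idle_timeout: float = 10.0):
        self.servers = servers
        self.pick = pick  # e.g. a hash of the key, reduced onto the server list
        self.idle_timeout = idle_timeout
        self.flows: Dict[FlowKey, Tuple[str, float]] = {}  # key -> (server, last seen)

    def route(self, key: FlowKey, now: float) -> str:
        self.expire(now)
        entry = self.flows.get(key)
        server = entry[0] if entry else self.pick(key, self.servers)
        self.flows[key] = (server, now)  # (re)start the idle timer
        return server

    def expire(self, now: float) -> None:
        """Discard flows that have been idle for at least idle_timeout seconds."""
        stale = [k for k, (_, seen) in self.flows.items() if now - seen >= self.idle_timeout]
        for k in stale:
            del self.flows[k]
```

A real LB would also cap the table size; this sketch scans the whole table on every packet for simplicity.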

Because the client's CID is included in the flow calculation, an attacker can create a nearly unlimited number of flow states on the LB. You might argue that a (cooperating) DoS appliance could first use Retry to validate the source address, but even after that is done, this attack can still be executed. Protecting against it would then require some heuristics on the LB.

@martinduke @martinthomson any ideas here?

Consider an alternative name to 'arbitrary' algorithm

I'd suggest 'Fallback' or another word instead of 'arbitrary'.

A definition of Arbitrary is: "Based on random choice or personal whim, rather than any reason or system."

In fact, there are sensible constraints on this algorithm to ensure routing works correctly for non-compliant CIDs and connections don't fail.

Fix terminology of Config Rotation

Erik Fuller points out via email:

This term “configuration phase” had me confused. These two sentences are the only place we reference it. It’s basically a configuration ID so we can distinguish between settings across connections during a deployment, right? Once a new config is deployed, what happens to all the connections in the old format?

After reading through I’m still not certain what “phase of the algorithm” means

He's right. We should just call it a configuration ID and be clearer on what's what.

Fix Figures 3 and 4

The variable-length bit fields are wrong and don't match the text. Fix them.

SNI switching

Some load balancers today switch based on the SNI. Obviously, this is not version-invariant. We should add some language about this.

What should such an LB do when it encounters an unknown version? It probably has no choice but to forward the packet based on the CID and hope for the best.

Replace configuration pseudocode with YANG

As the config gets more complicated, the C-ish pseudocode is getting more unwieldy. YANG is the standard for configuration models, so I should bite the bullet and just figure out YANG.

Keepalive design discussion

Hi author:
Will quic-lb design a Keepalive mechanism next? Surely it's a very important mechanism in a load balancer.

Allow-list and deny-list for QUIC versions

It would help the anti-DDoS properties if the Retry Service could receive explicit instructions about which QUIC versions the server might support. This provides a way to deploy new versions without having to upgrade the retry service software or hardware.
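A minimal sketch of how a Retry Service might apply such instructions. The allowed/denied sets stand in for whatever the configuration would carry, and the policy (drop denied or non-allowed versions, validate versions the service understands, forward the rest untouched) is an assumption, not text from the draft. Only long headers carry a version, so short headers pass through.

```python
import struct

QUIC_V1 = 0x00000001  # RFC 9000


def long_header_version(datagram: bytes):
    """Version field of a long-header packet, or None for short headers."""
    if len(datagram) < 5 or not datagram[0] & 0x80:
        return None
    return struct.unpack("!I", datagram[1:5])[0]


def retry_service_action(datagram: bytes, allowed: set, denied: set,
                         understood: frozenset = frozenset({QUIC_V1})) -> str:
    """Hypothetical policy; an empty allowed set means "no allow-list configured"."""
    version = long_header_version(datagram)
    if version is None:
        return "forward"   # short headers carry no version
    if version in denied or (allowed and version not in allowed):
        return "drop"      # the server has said it won't accept this version
    if version in understood:
        return "validate"  # normal Retry/token processing
    return "forward"       # allowed, but this service can't parse it
```

The last branch is what would let an operator roll out a new version on the servers before the Retry Service learns to parse it.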

Extend low-config concept to all algorithms

In principle, the low-config algorithm's method of extracting a server ID can be extended to all the algorithms. Thus the "server ID allocation method" would be an independent variable, with values 'dynamic' and 'static', and the algorithms could operate with either method.

This is a significant refactor of the routing section, but will make future decisions about static/dynamic much cleaner to discuss.

Retry service NATs and ports

A few more issues with Retry tokens:

For Retry, we're supposed to check the client port. Because this isn't true for NEW_TOKEN, we probably can't put it in the shared-state pseudoheader.

Relatedly, we currently just say that the shared-state Retry Service has to be behind any NAT. This isn't enough.

For non-shared-state, we're probably fine. For a Retry, the NAT will keep the 4-tuple binding so that on either side there is something to validate.

For shared-state:

if the service is in front of the NAT, the server can't validate the address.
if the service is behind the NAT, it'll "work" but to add any value at all there really has to be a port in there somewhere.

We can probably assume that the alternate path that creates the need for shared state won't cause a service-generated token to suddenly appear on the unprotected path.

Configuration ID might be too small

As server clusters increase in size, the need to reallocate server identifiers becomes more acute.

In one model, the configuration ID is used to indicate a stable routing configuration. Server identifiers for a given configuration ID are routed to the same server, no matter how many other instances are added or removed. In order to allow for changes in the cluster, the configuration ID is used so that old servers can be removed from consideration and new ones added.

If these changes happen frequently enough, the number of bits allocated to identifying a configuration might be insufficient. Why not make the length of the identifier flexible? That might mean that you need to make the length of the length similarly configurable.

Rules for Resumption Tokens

We should tighten up the rules for Resumption Token processing by the Retry Service.

When active, it should reject the packet, but it should send Retry. I believe we can distinguish resumption from Retry because the CID length fields are zero; as any Retry token must have a ODCIDL of at least 8, this would appear to be robust.

On a related note, the requirement on servers to encode a way to distinguish the two token types is silly, because of this property.

Unguessable connection IDs

There is a requirement that it be difficult for a party other than the server and load balancer to guess a CID that will be accepted as valid for a target connection.

This requirement needs to be validated for the schemes described in the draft. This might impose some constraints on the designs chosen.

For instance, I don't believe that the plaintext algorithm meets this goal. The server ID can take all the available space, which is probably wrong. Clearly it is impossible to create sufficient connection IDs for even a single connection if there is only one valid identifier per server. However, it might be argued that even an 18 byte server ID makes it too easy to guess a valid connection ID for a connection (just 16 guesses would be enough to get a 50% chance at that). So it seems to me that a shorter connection ID is necessary.

The same applies to any attempt at obfuscation.

The encrypted versions might be similarly challenging to get right. The For Server Use field in the stream cipher variant needs to be sufficiently long as to avoid engineered collisions. The value used for the stream cipher is malleable, which means that an attacker isn't prevented from guessing. In many ways, this is more challenging than the plaintext variant because the nonce consumes space.

The zero-padding in the block cipher mode might be the best way of preventing guessing, if it were sufficiently long. Similarly, if "Encrypted bits for server use" were sufficiently sparsely populated, then guessing can be hard enough.

Discuss uniqueness of config across load balancers

Add a security consideration to avoid the following scenario:

MyCloudProvider has a single QUIC-LB config for all its load balancers. It rotates keys periodically, etc, but everyone gets the same config. Obviously, all the attacker has to do is open an account with MyCloudProvider and it is able to recover all the server IDs.

Configs ought to be restricted to load balancers serving a finite set of servers. It is possible another MyCloudProvider customer is in the pool behind that load balancer, but that's already a privileged position as already described in the draft.

Obviously, this will require some wordsmithing, as the statement above isn't very precise.

A question about retry token format

In Figure 6 (Cleartext format of shared-state retry tokens), the token format only encodes the original DCID and the Retry SCID, but not the initial SCID.

The transport draft describes all three of these CIDs as being carried in the transport parameters for authentication:

Packet Number in Retry Token

Currently the Initial Packet number MAY be encoded in the Retry Token. We must either:

Include language that a server MUST NOT reject a token because this information is not present; OR
Just make it part of the format. (IMO this would be bad, because then the Retry service would have to decrypt the packet number)

Rework Shared-State Token Security

I'm not an encryption expert, but IIUC it's insufficiently hard to forge a shared-state retry token.

By inducing a Retry, an attacker can obtain the Retry SCID Length, and then focus entirely on ODCIDs that make the CID + CID Length part of the token a multiple of 16 bytes. For example, if RSCIDL = 10 B, then make ODCIDL = 20 B -> 32 bytes for the block.

This then breaks Retry forgery into three separate problems:
(1) Obtaining the mapping of IP Address to the first 16 Bytes of the token. A well-positioned observer could build a database of these in the time scale between token key rotations.
(2) Generate lots of Retry tokens with an ODCID of the correct length, so there is a range of valid CID blocks.
(3) Obtain a valid Retry every few seconds, using the right ODCID length, so that we have a valid timestamp.

Thus, the attacker has a database of

valid encrypted IP addresses
valid encrypted CIDs (really, it's the CID length that would cause validation to fail)
valid timestamps

As the spec uses AES-ECB, these blocks can be mixed and matched to create valid Retry tokens.
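The block alignment that makes this mixing possible can be checked with a little arithmetic. This sketch assumes, as the example above implies, that the CID section starts on a 16-octet block boundary (right after the 16-octet encrypted IP address) and holds a 1-octet length plus the CID for both the ODCID and the Retry SCID; the function name is hypothetical.

```python
def aligned_odcid_lengths(rscid_len: int, block: int = 16,
                          min_len: int = 8, max_len: int = 20) -> list:
    """ODCID lengths for which ODCIDL + ODCID + RSCIDL + RSCID fills whole AES blocks.

    Assumes 1-octet length fields and that this section starts on a block
    boundary; 8..20 octets is the range of a client's first DCID in QUIC v1.
    """
    return [n for n in range(min_len, max_len + 1)
            if (2 + n + rscid_len) % block == 0]


print(aligned_odcid_lengths(10))  # [20], the example above
```

For any Retry SCID length there is at most one or two such ODCID lengths, so an attacker loses very little by restricting itself to them.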

This is not exactly trivially open to attack [1], but it does feel like we're conceding a lot of entropy here. I would like someone to propose an alternate design that restores some of that entropy.

[1] Step (1) seems to require a fairly privileged position in the network.

Is the non-shared-state use case realistic?

Buried deep in Sec 7.2.2:

In inactive mode, the service MUST forward all packets that have no token or a token with the first bit set to '1'. It MUST validate all tokens with the first bit set to '0'. If successful, the service MUST forward the packet with the token intact. If unsuccessful, it MUST either drop the packet or forward it with the token removed. The latter requires decryption and re-encryption of the entire Initial packet to avoid authentication failure. Forwarding the packet causes the server to respond without the original_destination_connection_id transport parameter, which preserves the normal QUIC signal to the client that there is an on-path attacker.

My understanding of these services is that they will be injected in the path only when under DoS attack. According to this, something has to hang around to validate Retry tokens. Is this feasible?

PCID without SID configuration?

@ianswett proposed a PCID design that assigns SIDs on the fly instead of having to pre-configure them. There are drawbacks but it has some nice properties.

Consider an alternative name to 'non-compliant'

In PR #95 I suggested non-routable, though I realize non-conformant may also be an option.

The sentence which caused me to suggest non-routable is: https://github.com/quicwg/load-balancers/blob/master/draft-ietf-quic-load-balancers.md#non-compliant-connection-ids-non-compliant

These client-generated CIDs might not conform to the expectations of the routing algorithm and therefore not be routable by the load balancer. Those that are not routable are "non-compliant DCIDs" and receive similar treatment regardless of why they're non-compliant:

I'm happy to write a PR if others find non-compliant potentially confusing as well.

Support of server generated HCID with retry tokens

Since draft 28, the retry mechanism includes a requirement that the client DCID in the retried connection matches the server SCID in the retry packet. I do not see a discussion of mechanisms to verify the retried DCID in section 5 of the draft.

SCID acronym

This is often used to mean Source Connection ID in other contexts. A collision here is likely to cause confusion.

Giving the client more information

QUIC-LB has a bit of an incentive mismatch. The server infrastructure decides how linkable the CID algorithm is, but the client bears most of the cost of the CIDs being linkable. Worse yet, the client has no idea, without a lot of effort, what the servers are doing. Even worse, the servers have some incentives to pick something that's easily linkable.

In Section 8, it says:

Servers that are running the Plaintext CID algorithm SHOULD only use it to generate new CIDs for the Server Initial Packet and SHOULD NOT send CIDs in QUIC NEW_CONNECTION_ID frames

This is a concise way of not giving the client tools to link itself by trying an unsafe migration.

We could just stick with that. A richer way to go would be to create a new transport parameter (e.g. cid_is_linkable, cid_not_encrypted) that would explicitly communicate the risks to the client. We could have a different value for OCID or batch PCID and OCID together.

Dynamic SIDs and High Availability

How does a dynamic framework survive an HA handover? It would seem to lose all the SID allocations and break all connections.

Cut obfuscated server ID algorithm

As discussed at IETF 108. Split from #8.

Retry service handling of non-Initial

In Section 7:
"Retry services MUST forward all QUIC packets that are not of type Initial or 0-RTT. Other packet types might involve changed IP addresses or connection IDs, so it is not practical for Retry Services to identify such packets as valid or invalid."

MUST is too strong. If it keeps any state (e.g., tracking 4-tuples), it can drop non-Initial packets. (However, this would break migration.)

If the Retry Service is in front of a QUIC-LB load balancer, the LB will drop random 1-RTT packets but not Handshake and 0-RTT unless it is version-aware, so passing 1-RTT is "safe."

If the Retry Service is behind the load balancer, which is probably better because LBs are often NATs, then random 1-RTT is already dropped and it is safe to forward 1-RTT to preserve migration.

If there's a non-QUIC-LB load balancer, migration doesn't work anyway; might as well drop it.

If it's a single server and the CIDs are random, admitting 1-RTT is weakening the DoS defense.

And then there is the issue with QUIC versions and admitting/dropping them, which is hard to adjudicate with short headers.

A little confused about configuration agent

Hi Author:
I am a little confused about the 'configuration agent'. From the description in the draft, I think it should be a centralized control plane for the 'load balancer' and 'server', but from the name 'agent', it seems like it should be an agent component that receives messages from the control plane. So, what is the correct definition of 'configuration agent'?

Using ECB for retry tokens seems sub optimal

Section 5.3, which specifies a Shared-State Retry Service, describes a token format in which the token includes the ODCID, a client IP encoded in 128 bits, and a 20-octet date-time, plus additional data. The token is encrypted using AES-ECB. This seems sub-optimal:

Using AES GCM or another AEAD format seems more natural. AEAD checks will immediately detect an invalid token, while using ECB forces reliance on invalidity heuristics.
If using AEAD, there is no need to encode the IP address in the token. It can be derived from the IP header and placed in a pseudo header. The pseudo header can then be authenticated as part of AEAD decryption.
The pseudo header approach can be used to authenticate other fields, e.g. verify that the DCID matches the SCID sent in the Retry packet.
Encoding the time as 64 bits time64_t seems more natural than ASCII encoding, and also shorter.
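A sketch of the pseudo-header idea. Python's standard library has no AES-GCM, so HMAC-SHA256 truncated to 16 octets stands in for the AEAD tag here: the token is authenticated but not encrypted, and a real design would pass the pseudo-header as associated data to an AEAD such as AES-GCM. The field layout, the 30-second lifetime and the function names are hypothetical. The client IP is never stored in the token, and the DCID of the retried Initial must equal the Retry SCID for the tag to verify.

```python
import hashlib
import hmac
import ipaddress
import struct

TAG_LEN = 16


def pseudo_header(client_ip: str, dcid: bytes) -> bytes:
    """Fields taken from the packet itself; authenticated but never stored in the token."""
    ip = ipaddress.ip_address(client_ip).packed
    return bytes([len(ip)]) + ip + bytes([len(dcid)]) + dcid


def _tag(key: bytes, header: bytes, body: bytes) -> bytes:
    # Stand-in for an AEAD tag: HMAC-SHA256 over pseudo-header || token body.
    return hmac.new(key, header + body, hashlib.sha256).digest()[:TAG_LEN]


def make_token(key: bytes, odcid: bytes, client_ip: str, retry_scid: bytes, now: int) -> bytes:
    body = struct.pack("!Q", now) + bytes([len(odcid)]) + odcid  # 64-bit timestamp
    return body + _tag(key, pseudo_header(client_ip, retry_scid), body)


def check_token(key: bytes, token: bytes, client_ip: str, dcid: bytes,
                now: int, lifetime: int = 30):
    """Return the ODCID if the token is valid for this packet, else None."""
    if len(token) < 9 + TAG_LEN:
        return None
    body, tag = token[:-TAG_LEN], token[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, pseudo_header(client_ip, dcid), body)):
        return None  # wrong IP, DCID != Retry SCID, or tampered token
    (issued,) = struct.unpack("!Q", body[:8])
    if not 0 <= now - issued <= lifetime:
        return None
    return body[9:9 + body[8]]
```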

ODCID is at least 64 bits

The diagrams indicate they are in the range 0..160. This is incorrect.

Security Considerations for Shared-State Retry Keys

@huitema has done some nice analysis on the constraints of using a 96-bit random nonce to encrypt the Shared-state retry keys. This analysis should be in security considerations.
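For a rough feel of the constraint (this is the generic birthday bound, not @huitema's analysis itself): with n tokens each encrypted under a fresh random 96-bit nonce, the chance that some nonce repeats is about n^2 / 2^97. NIST SP 800-38D caps AES-GCM at 2^32 invocations per key with random 96-bit IVs, which keeps this near 2^-33. The function name is hypothetical.

```python
import math


def nonce_collision_probability(n_tokens: int, nonce_bits: int = 96) -> float:
    """Approximate birthday-bound probability that n uniformly random nonces contain a repeat."""
    pairs = n_tokens * (n_tokens - 1) / 2
    return -math.expm1(-pairs / 2.0 ** nonce_bits)


print(nonce_collision_probability(2 ** 32))  # roughly 2**-33, about 1.16e-10
```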

Simplify the configuration by merging server-id and zero-pad

The encrypted CID format includes a zero-pad field that is used to detect whether the decryption succeeded or not. I suggest merging this field with the server ID field, and testing whether the decryption succeeded by checking whether the server ID is valid. This assumes that the server ID field is sparsely populated. For example, if there are just 256 servers, in theory a 1-octet field would be sufficient; instead, we could use a 4- or 5-octet server ID field that would be sparsely populated, allowing for error detection.

This would allow for unified validity detection across all supported methods:

clear text: verify that the server ID is valid;
obfuscated: the divider needs to have the same size as the full-length server ID; the modulo is the server ID; validity can be verified there.
stream: decrypt and verify that the server-id is valid
encrypt: decrypt and verify that the server-id is valid

It would also simplify the configuration for the encrypted method, by specifying just one field instead of two.
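With a sparse server ID field, the chance that a random (wrongly decrypted) value still passes the validity check is just the fraction of the field's codepoints that are assigned. A sketch using the example above (256 servers), assuming decryption of a non-compliant CID yields uniformly random bits; the function name is hypothetical.

```python
def false_accept_rate(num_servers: int, sid_octets: int) -> float:
    """Chance that a garbage server ID field (e.g. from a failed decryption)
    matches one of num_servers valid IDs, assuming uniformly random output."""
    return num_servers / 2 ** (8 * sid_octets)


for octets in (1, 4, 5):
    print(octets, false_accept_rate(256, octets))
```

With 256 servers, a 1-octet field accepts everything, while a 4-octet field falsely accepts about one value in 2^24 (roughly 6e-8).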

Low-Config CID creating huge problems with coexistence of configurations

For low-config CID (LCID), which dynamically allocates server IDs, the current editor's draft has a heuristic to extract a server ID from a client-generated non-compliant CID. The fundamental issue is that the server has to get an SID from an incoming Initial, even if the SR bits of the Connection ID imply that it doesn't map to the LCID config.

If the DCID references the 4-tuple routing bits or an undefined configuration, use the following procedure to establish a predictable template for server ID extraction:

Identify the instance of Low-Config CID configuration with the largest config rotation codepoint. For example, if configurations 0b10, 0b01, and 0b00 all use the low-config CID algorithm and have server ID lengths of 3, 5, and 7 octets, respectively, and a packet comes in with codepoint 0b11, the load balancer would extract 3 octets for the server ID.

Extract the appropriate number of octets.

If the server ID matches one already in the table, forward the packet to that server.

If not, the load balancer runs the algorithm of its choosing and adds the extracted server ID to the table corresponding to the highest-value Low-Config CID Configuration codepoint.

This doesn't work with config rotation. The main principle in CR is that the load balancer needs to get the configuration first -- otherwise the server might generate CIDs that the LB can't route, and we break ongoing connections. Given that, we have a problem. Consider:

LB and server both have LCID configuration 0b00 with SIDL of 1.
LB gets LCID configuration 0b01 with an SIDL of 2.
LB gets a packet with CR bits 0b01 and octets 2-3 0xfa13, and randomly forwards it to the server
The LB will add an SID entry for 0xfa13, but the server will add an SID entry for 0xfa.
The server will generate CIDs with CR 0b00 and SID 0xfa, and the LB will not route them correctly.
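The mismatch in this example can be reproduced in a few lines. This is a hypothetical sketch: it models only codepoint and SID extraction (top two bits of the first octet, SID in the following octets), with the fallback to the highest known Low-Config codepoint from the heuristic quoted above; 0x5c and the zero octets are arbitrary filler.

```python
def extract_sid(cid: bytes, lcid_sid_lens: dict) -> tuple:
    """Return (config codepoint, server ID) as one party would extract them.

    lcid_sid_lens maps each Low-Config CID codepoint the party knows to its
    server ID length in octets. For an unknown codepoint, fall back to the
    known config with the largest codepoint.
    """
    codepoint = cid[0] >> 6
    if codepoint not in lcid_sid_lens:
        codepoint = max(lcid_sid_lens)
    return codepoint, cid[1:1 + lcid_sid_lens[codepoint]]


lb_configs = {0b00: 1, 0b01: 2}  # LB already has config 0b01
server_configs = {0b00: 1}       # the server does not, yet

client_cid = bytes([0b01 << 6, 0xFA, 0x13, 0x5C, 0x00, 0x00, 0x00, 0x00])
lb_table = {extract_sid(client_cid, lb_configs): "server"}  # key (1, b'\xfa\x13')
server_entry = extract_sid(client_cid, server_configs)      # (0, b'\xfa')

# The server then issues a CID under config 0b00 with SID 0xfa ...
server_cid = bytes([0b00 << 6, 0xFA]) + bytes(6)
print(extract_sid(server_cid, lb_configs) in lb_table)  # False: the LB can't route it
```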

Stated more generally: in the current design, the LB can never be sure if the server has a given configuration and that makes it very hard for the LB to infer what the server is going to do with a given CID to get an SID for its table. We don't even know if it's non-compliant at the server or not!

There are some potential fixes here:

(1) Give up on dynamic server ID allocation.
(2) Have the LB keep track of the config rotation bits it has observed in short header packets to each server; this indicates that the server has those CR bits. This also seems vulnerable to attack via injection of random short-header packets.
(3) Reserve the SID corresponding to all configurations when routing a packet. In effect, this makes it impossible to adapt the crypto algorithms to use dynamically allocated server IDs and have them coexist with plaintext ones: the crypto algorithms will essentially have randomly distributed plaintext fields, so the table will fill up fast.
(4) Remove config rotation from dynamic allocation; this makes changes to config a site maintenance event. This also effectively prevents using crypto with dynamic allocation, because you can't rotate keys.
(5) Change the behavior of a server when it gets CR bits it doesn't understand. Don't extract a server ID. If the server has no IDs because it just booted, simply echo the client-generated CID. When the server gets a config that lets it extract an SID from this, or gets a CID on another connection that it can decode, then it can update the CID on this connection. This implies that we could have valid short header packets with non-compliant DCIDs, so we'd have to have the LB admit these instead of dropping them as it does currently.
(6) Same as (5), but instead of echoing the client-generated CID, the server uses the 4-tuple routing bits for a new CID. This will make the CID compliant, but abandons the whole purpose of QUIC-LB until an Initial with the right CR bits arrives (1 in 4 chance per connection).

Unfortunately, this breaks even more badly when it coexists with a configuration with static SIDs. Another example:

LB and server have LCID config 0b00 with an SIDL of 1. Server A has SID 0x01 and Server B has SID 0x02.
LB gets stream CID config 0b01 (encrypted, static SIDs)
An Initial packet has CR bits 0b01. The second octet is 0x02 but the proper SCID decoding maps to Server A. So, it routes to Server A.
Server A doesn't have config 0b01 yet, so it extracts an SID from the CID and adds 0x02 to its list of SIDs. It generates a Low-Config CID that the LB will route to Server B!

From this, I conclude that dynamically and statically configured SIDs can't safely coexist in the same config space.

This is all very hard to reason about. All the options are ugly but my instinct is to retreat to the first option and just abandon this dynamic design.

Moving connections between server instances

Some text on how a server cluster might support moving connections from one server instance to another would be useful. The current design might permit portability under certain conditions, but there are things that might need to be considered, such as the way in which stateless resets are generated.

Effect on stateless resets

The draft doesn't address the impact of each method of connection ID generation on how servers can use stateless resets.

Most of this is likely bound up in decisions stemming from #8. If you can guess a valid but unused connection ID, then you might be able to induce a stateless reset that could be used to kill an open connection.

As the draft only includes methods that include an explicit server identifier, it is possible that as long as valid values cannot be guessed, the effect is minimal and each server instance can have its own configured stateless reset key (or a shared key from which a per-server key is derived using a KDF).
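A sketch of the "shared key plus KDF" option, using HKDF (RFC 5869) built from the standard library's HMAC. The info label, the key sizes and the function names are assumptions. The reset token follows the pattern RFC 9000 suggests (a keyed hash of the connection ID, truncated to 16 octets), so a server instance holding only its derived key can regenerate tokens for its own CIDs, while the tokens differ across instances.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Extract then HKDF-Expand (RFC 5869) with SHA-256."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # empty salt -> zeros
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def per_server_reset_key(shared_key: bytes, server_id: bytes) -> bytes:
    # Hypothetical label; binds the derived key to one server instance.
    return hkdf_sha256(shared_key, salt=b"", info=b"quic-lb stateless reset" + server_id)


def stateless_reset_token(reset_key: bytes, cid: bytes) -> bytes:
    return hmac.new(reset_key, cid, hashlib.sha256).digest()[:16]
```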

Cryptographic agility

The stream and block cipher configurations are locked to AES-ECB, and specifically AES-128-ECB (this needs to be made clear).

This is a fine design. If a better design is required, that can be achieved by adding a new arm to routing_algorithm. It might pay to say that and cite https://tools.ietf.org/html/rfc7696 at the same time.

Tweak non-compliant DCID recommendation

@martinthomson sayeth on the list:

That routing will rely on the stability of a subset of fields. I would select from (source IP, source port, destination IP, destination port, DCID) and no others.

It would be useful to include something like this in Section 4 as a non-normative hint on what fields to use, since our hint currently consists of an example that just uses the DCID.

Similarly, make that assumption clearer in Section 8.
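A sketch of that hint: a stateless choice of server computed only from (source IP, source port, destination IP, destination port, DCID). SHA-256 and the modulo mapping are assumptions for illustration; any stable hash the load balancers share would do, although a plain modulo remaps most flows whenever the server list changes. The function name is hypothetical.

```python
import hashlib
import ipaddress
import struct


def route_noncompliant(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                       dcid: bytes, servers: list) -> str:
    """Stateless server choice from the 4-tuple and DCID only (hypothetical)."""
    h = hashlib.sha256()
    for ip, port in ((src_ip, src_port), (dst_ip, dst_port)):
        h.update(ipaddress.ip_address(ip).packed + struct.pack("!H", port))
    h.update(bytes([len(dcid)]) + dcid)
    return servers[int.from_bytes(h.digest()[:8], "big") % len(servers)]
```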

Could a bit in the CID be used to mark whether server info is encoded in the long header?

Dear sir,
As you know, there is no bit in the CID to mark whether the CID encodes server info or not. So, whether or not server info is encoded, the load balancer needs to decrypt or decode the CID. Isn't this wasted effort when the packet is the client's first Initial? Do you think it would be useful to extend the CID format so that the first bit marks whether the CID is encoded or not? In that case, however, the client would need to obey the rule.

Reduce load of Dynamic SID allocation

Today, an "lb_timeout" parameter tells LBs how long they need to save an SID allocation after it's last observed on an incoming packet. Servers may have to retire CIDs when they're approaching this limit. It's the only current method of getting rid of these, as we lack any sort of in-band mechanism for the two entities to communicate.

Advantages:

servers have more SIDs to choose from, which dramatically improves the entropy of PCID (where, admittedly, the operator has chosen not to care about concealing the mapping)
the server is seldom compelled to retire active CIDs, so the overhead of NEW/RETIRE_CONNECTION_ID frames is low

Disadvantages:

If the server ID space >> the number of servers, over time the LB's table gets very large, to the extent it could challenge the RAM of the device. Careful selection of a server ID length can help, but that's putting a lot on the operator.
The lb_timeout mechanism is annoying because LBs that store a little bit of state have to inspect every packet to make a note of the last time it observed an SID. Servers have to do the same to check when things expire. This seems easy to mess up.
Servers also have to decode every incoming Initial CID to extract the SID.

An alternative would configure servers to keep a small number of SIDs. This could be as low as 1 but might be 8 or 16. When an allocatable SID arrives at the server, if the server does not immediately use that SID in the CID it sends with the Server Hello, it forfeits the allocation.

Example of acceptance:

SID 0x14 arrives in a Client Initial. LB hashes it to a server and adds it to a hash table of provisional allocations
Server generates a CID using 0x14, sends it in the Server Initial, and adds it to its list of server IDs
LB notes that the first inbound short header encoded 0x14 and makes it a permanent allocation

Example of rejection

SID 0x15 arrives in a Client Initial. LB hashes it to a server and adds it to a hash table of provisional allocations
Other long headers with 0x15, with any 4-tuple, also go to the server thanks to the provisional list (you can't be provisional to two servers!)
Server generates a CID that encodes 0x14.
LB notes that the first inbound short header did not encode 0x15 and decrements the ref count for the provisional allocation, deleting it if the ref count is zero.
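The load balancer's side of the two examples above might look like the following. It's a hypothetical sketch: the class and method names are invented, pick stands in for the hash that chooses a server, and it assumes the LB can associate each connection's first short header with the SID offered in that connection's Client Initial.

```python
from collections import Counter


class ProvisionalSids:
    """LB-side provisional/permanent SID allocation (hypothetical sketch)."""

    def __init__(self):
        self.permanent = {}     # sid -> server
        self.provisional = {}   # sid -> server
        self.refs = Counter()   # sid -> connections still pending a decision

    def on_client_initial(self, sid, pick):
        if sid in self.permanent:
            return self.permanent[sid]
        server = self.provisional.get(sid)
        if server is None:  # a SID can only be provisional to one server
            server = self.provisional[sid] = pick(sid)
        self.refs[sid] += 1
        return server

    def on_first_short_header(self, offered_sid, used_sid):
        """Called once per connection with the SID offered in its Client Initial
        and the SID encoded in the DCID of its first inbound short header."""
        if offered_sid not in self.provisional:
            return  # already permanent, or never provisional
        if used_sid == offered_sid:  # acceptance
            self.permanent[offered_sid] = self.provisional.pop(offered_sid)
            del self.refs[offered_sid]
            return
        self.refs[offered_sid] -= 1  # rejection
        if self.refs[offered_sid] == 0:
            del self.provisional[offered_sid]
            del self.refs[offered_sid]
```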

We could even make it so the server MUST accept the first n that it can; then, the LB need not even do a provisional allocation after that peer has reached the threshold.

Advantages:

much fewer resources for SID tables
no timers, less CID parsing & inspection
the architecture doesn't mandate more CID retiring due to changing SIDs

Disadvantages:

Any suggestion about transmit client ip from quic-lb to quic-server?

Dear author:
As you know, in many production scenarios the quic-server needs to know the real IP/port of the client. But when there is a quic-lb in the middle (a full-NAT quic-lb), there is no standard way to implement this function. It is actually not difficult to implement; will the quic-lb draft suggest or define a standard way to do it later?

Setup CI

The editor's draft and the gh-pages branch are currently empty.

Routing of ICMP Packet too big messages

QUIC-LB LBs will not have the ability to properly route ICMP PTB messages without some additional work.

Servers SHOULD prepend a garbage Handshake packet to their MSS Probes, so that the SCID is there.
LBs SHOULD learn to parse these to extract the SCID and route them as they would a packet with that DCID.

Alternatively, it could keep track of client IPs/ports and their mapping to servers.
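The LB-side parsing in the first option only needs the version-independent long header layout from the QUIC invariants (RFC 8999): a first octet with the high bit set, a 4-octet version, then length-prefixed DCID and SCID. A sketch, assuming the UDP payload of the server's original packet has already been pulled out of the ICMP message and that enough of it was quoted; the function name is hypothetical. Since the quoted packet was sent by the server, its SCID is a server-chosen CID and can be routed like an incoming DCID.

```python
def long_header_scid(udp_payload: bytes):
    """SCID of a QUIC long-header packet per the invariant layout, or None."""
    if len(udp_payload) < 7 or not udp_payload[0] & 0x80:
        return None  # too short, or a short header (no SCID)
    pos = 5  # skip the first octet and the 4-octet version
    dcid_len = udp_payload[pos]
    pos += 1 + dcid_len
    if pos >= len(udp_payload):
        return None  # truncated before the SCID length
    scid_len = udp_payload[pos]
    pos += 1
    if pos + scid_len > len(udp_payload):
        return None  # truncated inside the SCID
    return udp_payload[pos:pos + scid_len]
```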

Make it clear the server might do length encoding on its own

The length encoding is mainly there for crypto offload, and the server MAY use this option even if the load balancer and config agent don't need it.

Low-config PCID: why stop using some server IDs?

This part is unclear to me:

A server SHOULD have a mechanism to stop using some server IDs if the list gets large relative to its share of the codepoint space, so that these allocations time out and are freed for reuse by servers that have recently joined the pool.

It is not obvious why these server IDs would be used by new server instances.

question about timestamp in token

As the draft describes:

'The date-time string is a total of 20 octets and encodes
the time the token was generated. The format of date-time is
described in Section 5.6 of [RFC3339].'

This needs 20 octets in ASCII. Wouldn't Unix time (the number of seconds since the Unix epoch), encoded in 8 octets, be a better choice for both transmission and computation?
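For comparison, the two encodings side by side. This assumes whole-second UTC times with a 'Z' offset, which is what makes the RFC 3339 form exactly 20 octets; the function names are hypothetical.

```python
import struct
from datetime import datetime, timezone


def rfc3339_octets(ts: int) -> bytes:
    # e.g. b"1970-01-01T00:00:00Z": 20 ASCII octets that must be parsed back
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ").encode("ascii")


def unix_octets(ts: int) -> bytes:
    # 8 octets, big-endian; checking its age is one integer subtraction
    return struct.pack("!Q", ts)


print(len(rfc3339_octets(0)), len(unix_octets(0)))  # 20 8
```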

Add Acknowledgments

Martin Thomson
Christian Huitema

others I'm missing?