quicwg / load-balancers
In-progress version of draft-ietf-quic-load-balancers
Home Page: https://quicwg.org/
License: Other
The server now needs to include this field.
For the no-shared-state service, we just need language saying that the Retry service needs to encode enough information to validate the packet DCID as well, and to drop the packet if validation fails. If validation succeeds, the server MUST use the packet DCID in the retry_source_connection_id TP.
For the shared state service, the retry source connection ID is going to have to be in the token. We might be able to compress this by xoring some fields; I'll think about it.
Almost by accident, QUIC-LB can also route DTLS 1.3 associations over UDP, as long as the client agrees to support connection IDs.
The Ciphertext packets match the short header in the relevant ways, and so can be routed without any issues.
The plaintext packets appear to be QUIC short headers, but as the second byte happens to always be 0xfe, QUIC-LB will 4-tuple route it.
IIUC, this doesn't apply to earlier versions of DTLS. So, one can deploy DTLS behind a QUIC-LB infrastructure as long as
One could also write a slightly different version of QUIC-LB that supported DTLS without reservations, but you'd have to be aware that the packet was DTLS and leverage information about the format of the first byte.
Make sure each algorithm always provides 2-3 octets for server use. The current SID limits don't do that.
@ianswett asks if there is added value in expressing server ID lengths in bits instead of octets.
This is a bit of implementation complexity and a lot of churn in the spec and the handful of implementations that exist, but it does give the configuration agent a little more granularity.
Does anyone find the value of this granularity to be compelling?
There are so many variable length fields that the ASCII art doesn't serve much of a purpose. Just switch to the notation used in quic-transport for the various CID and token formats.
We've had some discussion in this area in the past, and we decided that the best way to statelessly (and consistently) load balance long header packets would be to use a hash of the UDP 4-tuple and the client's source CID; since these are the only constants for all incoming (to the server) long header packets. I have come up with a couple of problems with this approach:
(1) As far as I know, there is no statement in the Invariants that says these must all stay constant for all future versions of QUIC.
(2) Using the hash approach can only function statelessly if there is no change in the DIP configuration. If the set of servers being load balanced changes (which we must assume to be common), then whatever stateless logic maps hash to DIP would also change, most likely resulting in long header packets getting routed incorrectly.
(3) A follow-up to (2): if you assume that the long header packets therefore cannot be routed statelessly based on the hash, and state must be tracked to keep routing all long header packets consistently until they are no longer used, at what point can the LB discard this state? By design, there is no on-path signal to indicate "long header packets are no longer used". Any heuristic that might be added here would be affected by (1) too.
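The remapping concern in (2) can be illustrated with a minimal sketch. It assumes a plain modulo mapping from hash to DIP, which is only one possible stateless scheme; the function names, addresses, and DIP labels are hypothetical and not taken from the draft.

```python
import hashlib

def pick_server(four_tuple, client_scid, servers):
    """Map a flow to a DIP by hashing the UDP 4-tuple and the client's SCID.

    Assumption: plain modulo mapping; the draft does not mandate this.
    """
    src_ip, src_port, dst_ip, dst_port = four_tuple
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|".encode() + client_scid
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

def remapped_fraction(flows, before, after):
    """Fraction of flows whose DIP changes when the server set changes."""
    moved = sum(
        1 for ft, scid in flows
        if pick_server(ft, scid, before) != pick_server(ft, scid, after)
    )
    return moved / len(flows)

# 1000 synthetic flows that differ in source port and client SCID.
flows = [
    (("192.0.2.1", 40000 + i, "198.51.100.1", 443), i.to_bytes(8, "big"))
    for i in range(1000)
]
before = ["dip-a", "dip-b", "dip-c", "dip-d"]
after = before + ["dip-e"]  # one server added to the pool
fraction = remapped_fraction(flows, before, after)
print(f"{fraction:.0%} of flows changed DIP")
```

With a uniform hash and this modulo mapping, about four fifths of flows are expected to move when going from four to five servers. A consistent-hashing scheme would move fewer, but it would not remove the problem for the flows that do move.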
Because of these issues, I'm left scratching my head on the best way to recommend to LBs on how to load balance invariant long header packets. The best thing I can think of is:
Because the client's CID is included in the flow calculation, an attacker can create a nearly unlimited number of flow states on the LB. You might argue that a (cooperating) DoS appliance could first use Retry to validate the source address, but once that is done, this attack can still be executed. It would then require some heuristics on the LB to protect against it.
@martinduke @martinthomson any ideas here?
I'd suggest 'Fallback' or another word than arbitrary.
A definition of Arbitrary is: "Based on random choice or personal whim, rather than any reason or system."
In fact, there are sensible constraints on this algorithm to ensure routing works correctly for non-compliant CIDs and connections don't fail.
Erik Fuller points out via email:
This term “configuration phase” had me confused. These two sentences are the only place we reference it. It’s basically a configuration ID so we can distinguish between settings across connections during a deployment, right? Once a new config is deployed, what happens to all the connections in the old format?
After reading through I’m still not certain what “phase of the algorithm” means
He's right. We should just call it a configuration ID and be clearer on what's what.
The variable-length bit fields are wrong, and don't match the text. Fix them.
Some load balancers today switch based on the SNI. Obviously, this is not version-invariant. We should add some language about this.
What should such an LB do when it encounters an unknown version? It probably has no choice but to forward it based on CID and hope for the best.
As the config gets more complicated, the C-ish pseudocode is getting more unwieldy. YANG is the standard for configuration models, so I should bite the bullet and just figure out YANG.
Hi author:
Will QUIC-LB design a keepalive mechanism next? Surely it is a very important mechanism in a load balancer.
It would help the anti-DDoS properties if the Retry Service could receive explicit instructions about which QUIC versions the server might support. This provides a way to deploy new versions without having to upgrade the retry service software or hardware.
In principle, the low-config algorithm's method of extracting a server ID can be extended to all the algorithms. Thus the "server ID allocation method" would be an independent variable, with values 'dynamic' and 'static', and the algorithms could operate with either method.
This is a significant refactor of the routing section, but will make future decisions about static/dynamic much cleaner to discuss.
A few more issues with Retry tokens:
For Retry, we're supposed to check the client port. Because this isn't true for NEW_TOKEN, we probably can't put it in the shared-state pseudoheader.
Relatedly, we currently just say that the shared-state Retry Service has to be behind any NAT. This isn't enough.
For non-shared-state, we're probably fine. For a Retry, the NAT will keep the 4-tuple binding so that on either side there is something to validate.
For shared-state:
We can probably assume that the alternate path that creates the need for shared state won't cause a service-generated token to suddenly appear on the unprotected path.
As server clusters increase in size, the need to reallocate server identifiers becomes more acute.
In one model, the configuration ID is used to indicate a stable routing configuration. Server identifiers for a given configuration ID are routed to the same server, no matter how many other instances are added or removed. In order to allow for changes in the cluster, the configuration ID is used so that old servers can be removed from consideration and new ones added.
If these changes happen frequently enough, the number of bits allocated to identifying a configuration might be insufficient. Why not make the length of the identifier flexible? That might mean that you need to make the length of the length similarly configurable.
We should tighten up the rules for Resumption Token processing by the Retry Service.
When active, it should reject the packet, but it should send Retry. I believe we can distinguish resumption from Retry because the CID length fields are zero; as any Retry token must have an ODCIDL of at least 8, this would appear to be robust.
On a related note, the requirement on servers to encode a way to distinguish the two token types is silly, because of this property.
There is a requirement that it be difficult for a party other than the server and load balancer to guess a CID that will be accepted as valid for a target connection.
This requirement needs to be validated for the schemes described in the draft. This might impose some constraints on the designs chosen.
For instance, I don't believe that the plaintext algorithm meets this goal. The server ID can take all the available space, which is probably wrong. Clearly it is impossible to create sufficient connection IDs for even a single connection if there is only one valid identifier per server. However, it might be argued that even an 18 byte server ID makes it too easy to guess a valid connection ID for a connection (just 16 guesses would be enough to get a 50% chance at that). So it seems to me that a shorter connection ID is necessary.
The same applies to any attempt at obfuscation.
The encrypted versions might be similarly challenging to get right. The For Server Use field in the stream cipher variant needs to be sufficiently long as to avoid engineered collisions. The value used for the stream cipher is malleable, which means that an attacker isn't prevented from guessing. In many ways, this is more challenging than the plaintext variant because the nonce consumes space.
The zero-padding in the block cipher mode might be the best way of preventing guessing, if it were sufficiently long. Similarly, if "Encrypted bits for server use" were sufficiently sparsely populated, then guessing can be hard enough.
Add a security consideration to avoid the following scenario:
MyCloudProvider has a single QUIC-LB config for all its load balancers. It rotates keys periodically, etc, but everyone gets the same config. Obviously, all the attacker has to do is open an account with MyCloudProvider and it is able to recover all the server IDs.
Configs ought to be restricted to load balancers serving a finite set of servers. It is possible another MyCloudProvider customer is in the pool behind that load balancer, but that's already a privileged position as already described in the draft.
Obviously, this will require some wordsmithing, as the statement above isn't very precise.
Currently the Initial Packet number MAY be encoded in the Retry Token. We must either:
I'm not an encryption expert, but if IIUC it's insufficiently hard to forge a shared-state retry token.
By inducing a Retry, an attacker can obtain the Retry SCID Length, and then focus entirely on ODCIDs that allow the CID + CID Length part of the token to be a multiple of 16 B. For example, if RSCIDL = 10 B, then make ODCIDL = 20 B -> 32 bytes for the block.
This then breaks Retry forgery into three separate problems:
(1) Obtaining the mapping of IP Address to the first 16 Bytes of the token. A well-positioned observer could build a database of these in the time scale between token key rotations.
(2) Generate lots of Retry tokens with an ODCID of the correct length, so there is a range of valid CID blocks.
(3) Obtain a valid Retry every few seconds, using the right ODCID length, so that we have a valid timestamp.
Thus, the attacker has a database of
As the spec uses AES-ECB, these blocks can be mixed and matched to create valid Retry tokens.
This is not exactly trivially open to attack [1], but it does feel like we're conceding a lot of entropy here. I would like someone to propose an alternate design that restores some of that entropy.
[1] Step (1) seems to require a fairly privileged position in the network.
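The alignment arithmetic behind the RSCIDL = 10 B, ODCIDL = 20 B example can be checked with a short sketch. It assumes one length octet in front of each of the two CIDs, which is inferred from the 32-byte total in the example rather than taken from the token format itself. The bounds of 8 and 20 octets are the minimum ODCID length mentioned for Retry tokens and the QUIC version 1 CID maximum. The function name is hypothetical.

```python
BLOCK = 16  # AES block size in octets

def aligned_odcid_lengths(rscid_len, min_len=8, max_len=20):
    """Return ODCID lengths for which the CID part of the token fills whole blocks.

    Assumption: the CID part is laid out as
    [1-octet length][Retry SCID][1-octet length][ODCID].
    """
    return [
        n for n in range(min_len, max_len + 1)
        if (1 + rscid_len + 1 + n) % BLOCK == 0
    ]

lengths = aligned_odcid_lengths(10)
print(lengths)
```

Under these assumptions an attacker who learns the Retry SCID length has very few ODCID lengths to choose from, so picking an aligned one costs nothing.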
Buried deep in Sec 7.2.2:
In inactive mode, the service MUST forward all packets that have no token or a token with the first bit set to '1'. It MUST validate all tokens with the first bit set to '0'. If successful, the service MUST forward the packet with the token intact. If unsuccessful, it MUST either drop the packet or forward it with the token removed. The latter requires decryption and re-encryption of the entire Initial packet to avoid authentication failure. Forwarding the packet causes the server to respond without the original_destination_connection_id transport parameter, which preserves the normal QUIC signal to the client that there is an on-path attacker.
My understanding of these services is that they will be injected in the path only when under DoS attack. According to this, something has to hang around to validate Retry tokens. Is this feasible?
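The inactive-mode rules quoted above amount to a small decision procedure, sketched here. The sketch assumes "first bit" means the most significant bit of the first token octet, and it takes token validation as a caller-supplied function because the validation itself depends on the service type. All names are hypothetical.

```python
from enum import Enum

class Action(Enum):
    FORWARD = "forward"
    DROP = "drop"
    FORWARD_STRIPPED = "forward with token removed"

def inactive_mode_action(token, validate, strip_on_failure=False):
    """Decide what an inactive Retry service does with an Initial packet.

    token: token bytes from the Initial packet (empty if absent).
    validate: callable returning True if a service-generated token is valid.
    strip_on_failure: choose "forward with token removed" over "drop".
    """
    # No token, or first bit set to '1': not a service token, so forward.
    if not token or token[0] & 0x80:
        return Action.FORWARD
    # First bit '0': the token claims to come from the service; validate it.
    if validate(token):
        return Action.FORWARD
    # Stripping requires decrypting and re-encrypting the whole Initial packet.
    return Action.FORWARD_STRIPPED if strip_on_failure else Action.DROP
```

Even in this minimal form the validation branch needs the token keys, which is the point of the question above: something must stay in the path holding that state while the service is nominally inactive.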
@ianswett proposed a PCID design that assigns SIDs on the fly instead of having to pre-configure them. There are drawbacks but it has some nice properties.
In PR #95 I suggested non-routable, though I realize non-conformant may also be an option.
The sentence which caused me to suggest non-routable is: https://github.com/quicwg/load-balancers/blob/master/draft-ietf-quic-load-balancers.md#non-compliant-connection-ids-non-compliant
These client-generated CIDs might not conform to the expectations of the routing algorithm and therefore not be routable by the load balancer. Those that are not routable are "non-compliant DCIDs" and receive similar treatment regardless of why they're non-compliant:
I'm happy to write a PR if others find non-compliant potentially confusing as well.
Since draft 28, the retry mechanism includes a requirement that the client DCID in the retried connection matches the server SCID in the retry packet. I do not see a discussion of mechanisms to verify the retried DCID in section 5 of the draft.
This is often used to mean Source Connection ID in other contexts. A collision here is likely to cause confusion.
QUIC-LB has a bit of an incentive mismatch. The server infrastructure decides how linkable the CID algorithm is, but the client bears most of the cost of the CIDs being linkable. Worse yet, the client has no idea, without a lot of effort, what the servers are doing. Even worse, the servers have some incentives to pick something that's easily linkable.
In Section 8, it says:
Servers that are running the Plaintext CID algorithm SHOULD only use it to generate new CIDs for the Server Initial Packet and SHOULD NOT send CIDs in QUIC NEW_CONNECTION_ID frames
This is a concise way of not giving the client tools to link itself by trying an unsafe migration.
We could just stick with that. A richer way to go would be to create a new transport parameter (e.g. cid_is_linkable, cid_not_encrypted) that would explicitly communicate the risks to the client. We could have a different value for OCID or batch PCID and OCID together.
How does a dynamic framework survive an HA handover? It would seem to lose all the SID allocations and break all connections.
As discussed at IETF 108. Split from #8.
In Section 7:
"Retry services MUST forward all QUIC packets that are not of type Initial or 0-RTT. Other packet types might involve changed IP addresses or connection IDs, so it is not practical for Retry Services to identify such packets as valid or invalid."
MUST is too strong. If it keeps any state (i.e. tracking 4-tuples) it can drop non-initial packets. (However, this would make migration not work).
If the Retry Service is in front of a QUIC-LB load balancer, the LB will drop random 1-RTT packets but not Handshake and 0-RTT unless it is version-aware, so passing 1-RTT is "safe."
If the Retry Service is behind the load balancer, which is probably better because LBs are often NATs, then random 1-RTT is already dropped and it is safe to forward 1-RTT to preserve migration.
If there's a non-QUIC-LB load balancer, migration doesn't work anyway; might as well drop it.
If it's a single server and the CIDs are random, admitting 1-RTT is weakening the DoS defense.
And then there is the issue with QUIC versions and admitting/dropping them, which is hard to adjudicate with short headers.
Hi Author:
I am a little confused about 'configuration agent'. From the description in the draft, I think it should be a centralized control plane for the 'load balancer' and 'server', but the name 'agent' suggests a component that receives messages from a control plane. So, what is the correct definition of 'configuration agent'?
Section 5.3 specifies a Shared-State Retry Service and describes a token format in which the token includes the ODCID, a client IP encoded on 128 bits, and a 20-octet date-time, plus additional data. The token is encrypted using AES-ECB. This seems sub-optimal:
Using AES GCM or another AEAD format seems more natural. AEAD checks will immediately detect an invalid token, while using ECB forces reliance on invalidity heuristics.
If using AEAD, there is no need to encode the IP address in the token. It can be derived from the IP header and placed in a pseudo header. The pseudo header can then be authenticated as part of AEAD decryption.
The pseudo header approach can be used to authenticate other fields, e.g. verify that the DCID matches the SCID sent in the Retry packet.
Encoding the time as 64 bits time64_t seems more natural than ASCII encoding, and also shorter.
The diagrams indicate they are in the range 0..160. This is incorrect.
@huitema has done some nice analysis on the constraints of using a 96-bit random nonce to encrypt the Shared-state retry keys. This analysis should be in security considerations.
The encrypted CID format includes a zero-pad field that is used to detect whether the decryption succeeded or not. I suggest merging this field with the server ID field, and testing whether the decryption succeeded by checking whether the server ID is valid or not. This assumes that the server ID field is sparsely populated. For example, if there are just 256 servers, in theory a 1-octet field would be sufficient; instead, we could use a 4 or 5 octet server ID field that would be sparsely populated, allowing for error detection.
This would allow for unified validity detection across all supported methods:
It would also allow for simplification of the configuration for the encrypted method, by specifying just one field instead of two.
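A minimal sketch of the sparse server ID check, using the 256-server, 4-octet example from the proposal. Assigning IDs at random is an assumption made for illustration; the proposal only requires that the field be sparsely populated. Function names are hypothetical.

```python
import secrets

SID_LEN = 4  # octets; the 4-octet case from the example above

def make_sparse_ids(count, sid_len=SID_LEN):
    """Pick `count` distinct random server IDs of `sid_len` octets."""
    ids = set()
    while len(ids) < count:
        ids.add(secrets.token_bytes(sid_len))
    return ids

def decryption_succeeded(server_id_field, valid_ids):
    """Treat decryption as successful only if the server ID is a known one."""
    return server_id_field in valid_ids

def false_accept_probability(count, sid_len=SID_LEN):
    """Chance that a uniformly random field value matches a valid server ID."""
    return count / 2 ** (8 * sid_len)

valid_ids = make_sparse_ids(256)
print(false_accept_probability(256))  # 256 / 2**32 == 2**-24
```

In this example the validity check gives 24 bits of error detection, which is what a 3-octet zero-pad field would provide, while the configuration carries only one field length.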
For low-config CID (LCID), which dynamically allocates server IDs, the current editor's draft has a heuristic to extract a server ID from a client-generated non-compliant CID. The fundamental issue is that the server has to get an SID from an incoming Initial, even if the SR bits of the Connection ID imply that it doesn't map to the LCID config.
If the DCID references the 4-tuple routing bits or an undefined configuration, use the following procedure to establish a predictable template for server ID extraction:
- Identify the instance of Low-Config CID configuration with the largest config rotation codepoint. For example, if configurations 0b10, 0b01, and 0b00 all use the low-config CID algorithm and have server ID lengths of 3, 5, and 7 octets, respectively, and a packet comes in with codepoint 0b11, the load balancer would extract 3 octets for the server ID.
- Extract the appropriate number of octets.
- If the server ID matches one already in the table, forward the packet to that server.
- If not, the load balancer runs the algorithm of its choosing and adds the extracted server ID to the table corresponding to the highest-value Low-Config CID Configuration codepoint.
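The procedure above can be sketched as follows. The sketch assumes the server ID starts at the second octet of the DCID, directly after the octet carrying the config rotation bits; that layout, the function name, and the table structure are assumptions made for illustration.

```python
def extract_server_id(dcid, lcid_sid_lengths, table, choose_server):
    """Route a DCID that references no usable configuration.

    lcid_sid_lengths: config rotation codepoint -> server ID length in octets,
        for the configurations that use the low-config CID algorithm.
    table: codepoint -> {server ID -> server}; updated in place.
    choose_server: callable implementing the load balancer's own algorithm.
    """
    # Step 1: use the low-config configuration with the largest codepoint.
    codepoint = max(lcid_sid_lengths)
    sid_len = lcid_sid_lengths[codepoint]
    # Step 2: extract that many octets (assumed to follow the first octet).
    server_id = dcid[1:1 + sid_len]
    sid_table = table.setdefault(codepoint, {})
    # Steps 3 and 4: reuse a known mapping, otherwise allocate and record one.
    if server_id not in sid_table:
        sid_table[server_id] = choose_server()
    return sid_table[server_id]

# The example from the text: codepoints 0b10, 0b01, 0b00 with 3, 5, 7 octets.
configs = {0b10: 3, 0b01: 5, 0b00: 7}
table = {}
dcid = bytes([0b11 << 6]) + b"abcdefghij"  # codepoint 0b11 in the top two bits
server = extract_server_id(dcid, configs, table, lambda: "server-1")
```

After this call the table holds the 3-octet server ID `b"abc"` under codepoint 0b10, matching the worked example in the first step.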
This doesn't work with config rotation. The main principle in CR is that the load balancer needs to get the configuration first -- otherwise the server might generate CIDs that the LB can't route, and we break ongoing connections. Given that, we have a problem. Consider:
Stated more generally: in the current design, the LB can never be sure if the server has a given configuration and that makes it very hard for the LB to infer what the server is going to do with a given CID to get an SID for its table. We don't even know if it's non-compliant at the server or not!
There are some potential fixes here:
Unfortunately, this breaks even more badly when it coexists with a configuration with static SIDs. Another example:
From this, I conclude that dynamically and statically configured SIDs can't safely coexist in the same config space.
This is all very hard to reason about. All the options are ugly but my instinct is to retreat to the first option and just abandon this dynamic design.
Some text on how a server cluster might support moving of connections from one server instance to another would be useful. The current design might permit portability under certain conditions, but there are things that might need to be considered, such as the way in which stateless resets are generated.
The draft doesn't address the impact of each method of connection ID generation on how servers can use stateless resets.
Most of this is likely bound up in decisions stemming from #8. If you can guess a valid but unused connection ID, then you might be able to induce a stateless reset that could be used to kill an open connection.
As the draft only includes methods that include an explicit server identifier, it is possible that as long as valid values cannot be guessed, the effect is minimal and each server instance can have its own configured stateless reset key (or a shared key from which a per-server key is derived using a KDF).
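The "shared key from which a per-server key is derived" option might look like the sketch below. HMAC-SHA-256 stands in for the KDF here; it is a swapped-in choice for illustration, not something the draft specifies, and the label string and function names are hypothetical. The 16-octet token length is the stateless reset token size in QUIC.

```python
import hashlib
import hmac

def per_server_reset_key(shared_key, server_id):
    """Derive a per-server stateless reset key from a cluster-wide shared key.

    Assumption: HMAC-SHA-256 with a fixed label is used as a simple KDF.
    """
    return hmac.new(
        shared_key, b"stateless-reset-key" + server_id, hashlib.sha256
    ).digest()

def stateless_reset_token(server_key, connection_id):
    """Compute the 16-octet stateless reset token for a connection ID."""
    return hmac.new(server_key, connection_id, hashlib.sha256).digest()[:16]

shared = bytes(32)  # placeholder key, for the example only
key_a = per_server_reset_key(shared, b"\x00\x01")
key_b = per_server_reset_key(shared, b"\x00\x02")
token_a = stateless_reset_token(key_a, b"\x40" + b"\x00\x01" + b"rest-of-cid")
```

Because the per-server key depends only on the shared key and the server ID, any instance that can recover the server ID from a CID can recompute the token. That is relevant when a connection moves between instances.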
The stream and block cipher configuration are locked to AES-ECB. AES-128-ECB too (this needs to be clear).
This is a fine design. If a better design is required, that can be achieved by adding a new arm to routing_algorithm. It might pay to say that and cite https://tools.ietf.org/html/rfc7696 at the same time.
@martinthomson sayeth on the list:
That routing will rely on the stability of a subset of fields. I would select from (source IP, source port, destination IP, destination port, DCID) and no others.
It would be useful to include something like this in Section 4 as a non-normative hint on what fields to use, since our hint currently consists of an example that just uses the DCID.
Similarly, make that assumption clearer in Section 8.
Dear sir,
As you know, there is no bit in the CID to mark whether or not the CID encodes the server info. As a result, the load balancer needs to decrypt or decode the CID whether or not the server info is encoded. Do you think this is wasted effort when the packet is the first Initial packet? Would it be useful to extend the CID format so that the first bit marks whether the CID is encoded? In that case, however, the client would need to obey the rule.
Today, an "lb_timeout" parameter tells LBs how long they need to save an SID allocation after it's last observed on an incoming packet. Servers may have to retire CIDs when they're approaching this limit. It's the only current method of getting rid of these, as we lack any sort of in-band mechanism for the two entities to communicate.
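A minimal sketch of how a load balancer might apply lb_timeout to its SID allocations. The class and method names are hypothetical, and time is passed in explicitly so that the behavior is easy to follow.

```python
class SidTable:
    """SID allocations that expire lb_timeout seconds after last being observed."""

    def __init__(self, lb_timeout):
        self.lb_timeout = lb_timeout
        self._entries = {}  # server ID -> (server, time last observed)

    def observe(self, sid, server, now):
        """Record a new allocation for sid, observed at time `now`."""
        self._entries[sid] = (server, now)

    def lookup(self, sid, now):
        """Return the server for sid on an incoming packet, or None if absent or expired."""
        entry = self._entries.get(sid)
        if entry is None:
            return None
        server, last_seen = entry
        if now - last_seen > self.lb_timeout:
            del self._entries[sid]  # allocation timed out and is freed for reuse
            return None
        self._entries[sid] = (server, now)  # an incoming packet refreshes the timer
        return server

table = SidTable(lb_timeout=30)
table.observe(b"\x01\x02", "server-a", now=0)
```

The sketch shows the limitation noted above: the load balancer learns about liveness only from incoming packets, so a server that wants to free an SID has to stop using it and wait out the timer.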
Advantages:
Disadvantages:
An alternative would configure servers to keep a small number of SIDs. This could be as low as 1 but might be 8 or 16. When an allocatable SID arrives at the server, the server forfeits the allocation unless it immediately uses that SID in the CID it sends with the server hello.
Example of acceptance:
Example of rejection
We could even make it so the server MUST accept the first n that it can; then, the LB need not even do a provisional allocation after that peer has reached the threshold.
Advantages:
Disadvantages:
Dear author:
As you know, in many production scenarios, a QUIC server needs to know the real IP and port of the client. But when there is a QUIC-LB in the middle (a full-NAT QUIC-LB), there is no standard way to implement this function. It is not difficult to implement; will the QUIC-LB draft suggest or define a standard way to do this later?
The editor's draft and the gh-pages branch are currently empty.
QUIC-LB LBs will not have the ability to properly route ICMP PTB messages without some additional work.
Alternatively, it could keep track of client IPs/ports and their mapping to servers.
The length encoding is mainly there for crypto offload, and the server MAY use this option even if the load balancer and config agent don't need it.
This part is unclear to me:
A server SHOULD have a mechanism to stop using some server IDs if the list gets large relative to its share of the codepoint space, so that these allocations time out and are freed for reuse by servers that have recently joined the pool.
It is not obvious why these server IDs would be used by new server instances.
As the draft describes:
'The date-time string is a total of 20 octets and encodes the time the token was generated. The format of date-time is described in Section 5.6 of [RFC3339].'
This needs 20 octets in ASCII. Would Unix time, the number of seconds since the Unix epoch encoded in 8 octets, be a better choice for transmission and computation?
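The size difference between the two encodings can be seen in a short sketch. It assumes an unsigned 64-bit big-endian integer for the Unix time variant; the function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

def encode_rfc3339(ts):
    """Encode a Unix timestamp as an RFC 3339 date-time string in UTC."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ").encode("ascii")

def encode_unix64(ts):
    """Encode a Unix timestamp as 8 octets in network byte order."""
    return struct.pack("!Q", ts)

def decode_unix64(data):
    """Decode the 8-octet form back into an integer timestamp."""
    return struct.unpack("!Q", data)[0]

ts = 1_600_000_000
print(len(encode_rfc3339(ts)), len(encode_unix64(ts)))  # 20 versus 8 octets
```

The integer form also allows expiry checks by plain subtraction, with no string parsing on the validation path.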
Martin Thomson
Christian Huitema
others I'm missing?