quicwg / datagram

In-progress version of draft-ietf-quic-datagram

datagram's Introduction

An Unreliable Datagram Extension to QUIC

The datagram repository is the historical home of the QUIC Datagram specification, which was written by the QUIC Working Group.

The document has now been published as an RFC. Technical or editorial errata can be reported to the RFC Editor using the errata tool.

datagram's Issues

Specify Max Payload Size instead of Max Frame Size

Right now, the TP specifies a maximum frame size, including frame type, length and payload. This makes certain values invalid (0, 1?). Also, since this value is in practice a kind of flow control, indicating how much data I'm willing to receive at a time, it's the payload length that's important here, not the framing.

For these reasons, I'm arguing to change this to specifying a maximum payload length. Then there is the question of what a value of zero means. Should a value of 0 be the same thing as not present, or should it mean that only 0-length datagrams are allowed? I think it is simpler to say that a value of zero is the same as not present (i.e. disabled).
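To make the difference between a frame-size limit and a payload-size limit concrete, here is a minimal sketch that computes the largest payload a given frame-size limit admits. It assumes the frame layout from the published specification (a one-byte type, 0x30 without a Length field or 0x31 with one) and the variable-length integer sizes from RFC 9000; the function name `max_payload` is hypothetical.

```python
from typing import Optional

# Largest value a QUIC variable-length integer can hold for each encoded size
# (RFC 9000, Section 16): 1, 2, 4 or 8 bytes.
VARINT_CAPACITY = {1: (1 << 6) - 1, 2: (1 << 14) - 1, 4: (1 << 30) - 1, 8: (1 << 62) - 1}


def max_payload(max_frame_size: int, with_length: bool) -> Optional[int]:
    """Largest payload that fits in a frame of max_frame_size bytes.

    Returns None when no DATAGRAM frame of that kind fits at all.
    """
    type_bytes = 1  # 0x30 and 0x31 each encode in a single byte
    if not with_length:
        room = max_frame_size - type_bytes
        return room if room >= 0 else None
    best = None
    for length_bytes, capacity in VARINT_CAPACITY.items():
        room = max_frame_size - type_bytes - length_bytes
        if room < 0:
            continue  # not even the frame header fits
        candidate = min(room, capacity)
        if best is None or candidate > best:
            best = candidate
    return best


for size in (0, 1, 2, 66, 1200):
    print(size, max_payload(size, False), max_payload(size, True))
```

Under these assumptions a limit of 0 admits no frame at all, and a limit of 1 admits only an empty datagram sent without a Length field, which is the awkwardness the issue points at.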

(Issue copied from individual draft repo, by @nibanks on 2019-11-18)

For clarity, may this sentence be updated as follows? Original (comment 3)

For clarity, may this sentence be updated as follows?

Original:

   Identifiers used to multiplex different kinds of datagrams, or flows
   of datagrams, are the responsibility of the application protocol
   running over QUIC to define.

Perhaps:
Defining the identifiers used to multiplex different kinds of
datagrams or flows of datagrams is the responsibility of the
application protocol running over QUIC.

RFC Editor comment 1

Please insert any keywords (beyond those that appear in
the title) for use on https://www.rfc-editor.org/search.

Text on Datagram's Interaction with Loss Recovery is a bit Light

The draft has some good text in the "Acknowledgement Handling" and "Congestion Control" sections that essentially states that DATAGRAM is just like any other ACK-eliciting packet, but is not automatically retransmitted by the transport. That's all reasonable, but I think there needs to be more text on how the (suspected) loss of a packet with a DATAGRAM frame, or a PTO with only an outstanding DATAGRAM packet, should be handled.

I can see two possible models:

1. It's just like any other packet. The goal is to elicit some ACK from the peer to get accurate loss information about the outstanding packet as soon as possible. If there is nothing outstanding we could use to send in a new packet, just send a PING frame.
2. It's special. Because we don't necessarily intend to retransmit the data in the packet if it is actually lost, we don't actually care about immediate loss information/feedback. Don't force anything to be sent immediately to elicit the ACK.

So far as I have it coded up in MsQuic, I've assumed (1). This essentially results in an immediate PING frame/packet being sent out if I have nothing else to retransmit to try to elicit an ACK for the DATAGRAM frame/packet. This could result in a slightly noisier connection if the app doesn't care about all loss information about their datagrams, but, IMO makes for a cleaner design. I don't know what the general consequences to congestion control might be if we don't do this.
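Model (1) can be sketched as follows. This is an illustration only, not MsQuic's code: the set of retransmittable frame kinds and the function name `probe_frames_on_pto` are assumptions made for the example.

```python
# Frame kinds that this sketch treats as retransmittable by the transport.
# The real set is larger; DATAGRAM is deliberately not in it.
RETRANSMITTABLE = {"STREAM", "CRYPTO", "MAX_DATA"}


def probe_frames_on_pto(outstanding_frames):
    """Choose what to put in the probe packet when the PTO timer fires."""
    to_resend = [kind for kind in outstanding_frames if kind in RETRANSMITTABLE]
    if to_resend:
        return to_resend
    # Only DATAGRAM data is outstanding: nothing will be retransmitted,
    # so send a PING purely to elicit an ACK from the peer.
    return ["PING"]


print(probe_frames_on_pto(["DATAGRAM"]))
print(probe_frames_on_pto(["DATAGRAM", "STREAM"]))
```

Model (2) would instead return an empty list in the DATAGRAM-only case and leave the timer unarmed.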

Assuming folks are in agreement, we should have some text on this topic in the draft.

Allow a Sender to Control Datagram ACKs

In discussions with a few parties using MsQuic, we've come across scenarios where ACKs for datagrams were either not necessary or should (almost must) not be sent until some other data was being sent as well. While thinking through these, the simplest solution I've been able to come up with is for a sender to indicate to the peer that it should not treat DATAGRAM frames as ACK-eliciting. How do folks feel about adding another (optional) transport parameter to this spec that, when present, indicates DATAGRAM frames are not ACK-eliciting? Obviously, the parameter is simply ignored if the peer does not advertise support to receive the frames.

Consider retransmission bit leakage

Rephrasing what I mentioned at the mic, imagine a scenario where an application uses DATAGRAM to send a single fixed message ("fire the missile"). An adversary on path can start selectively dropping packets and checking to see whether or not they're retransmitted to learn whether or not this special message was sent. (Retransmission detection could be done by looking at the size of the QUIC packet carrying the DATAGRAM, for example.)

I don't claim this is easy to do in practice, or useful, but I think it does raise interesting questions about how this new frame affects QUIC's security posture. Perhaps some text in the security considerations is needed?

(Issue copied from individual draft repo, by @chris-wood on 2019-11-19)

Clarify 0-RTT handling

Section 3 says:
“An endpoint MUST NOT send DATAGRAM frames until it has received the max_datagram_frame_size transport parameter with a non-zero value.”

I guess you assume that having received max_datagram_frame_size in a previous connection counts when 0-RTT is used, but interpreting this MUST strictly, you cannot send datagrams in 0-RTT. Maybe that can be clarified.

Also if datagrams are used with 0-RTT but for some reason are not supported by the server anymore, it would actually be useful to have a more specific error message if the connection is closed by the server.
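One possible reading of the rule, in which a value remembered from a previous connection counts while sending 0-RTT, can be sketched like this. The dictionaries standing in for transport parameters and the function name `datagram_send_limit` are hypothetical; a result of 0 means DATAGRAM frames must not be sent.

```python
from typing import Optional


def datagram_send_limit(handshake_params: Optional[dict],
                        remembered_params: Optional[dict]) -> int:
    """Return the frame-size limit that applies to outgoing DATAGRAM frames."""
    key = "max_datagram_frame_size"
    if handshake_params is not None:
        # The peer's parameters for this connection have arrived:
        # they always take precedence over remembered state.
        return handshake_params.get(key, 0)
    if remembered_params is not None:
        # 0-RTT: fall back to what the server advertised last time.
        return remembered_params.get(key, 0)
    return 0


print(datagram_send_limit(None, {"max_datagram_frame_size": 1200}))
print(datagram_send_limit({}, {"max_datagram_frame_size": 1200}))
```

The second call shows the case the issue raises: the server advertised support earlier but omits the parameter now, so datagrams already sent in 0-RTT were sent against a limit that no longer exists.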

Exposing datagram acknowledgements

An issue came up in discussions around WebTransport: w3c/webtransport#168

The problem is that datagram frames may be acknowledged for the sake of congestion control, but dropped for the sake of flow control.

A receiver with a full datagram buffer would have two choices:

1. Drop any packets with datagram frames and do not acknowledge them.
2. Drop the datagram frames but acknowledge the packets.

The first approach seems terrible because it drops any other frames bundled with the datagram (e.g. STREAM) and causes retransmissions. It would also be treated as network congestion by the sender, limiting bandwidth for otherwise unrelated STREAM frames.

The second approach has ramifications for any implementations that MAY expose datagram acknowledgements. Specifically, the sender would believe that the datagram has been received (although not necessarily processed), when in reality it could have been dropped and will never be processed.
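The second approach can be sketched as a receiver that always acknowledges the packet but silently discards DATAGRAM payloads once its buffer is full. The class `DatagramReceiver` and its fields are hypothetical names used only for this illustration.

```python
from collections import deque


class DatagramReceiver:
    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of buffered datagrams
        self.buffer = deque()     # datagrams waiting for the application
        self.dropped = 0          # datagrams discarded because the buffer was full

    def on_packet(self, packet_number, frames):
        """Process every frame, then return the packet number to acknowledge."""
        for kind, payload in frames:
            if kind == "DATAGRAM":
                if len(self.buffer) < self.capacity:
                    self.buffer.append(payload)
                else:
                    self.dropped += 1  # the sender is never told about this
            # Other frame kinds (e.g. STREAM) would be processed normally here.
        return packet_number


rx = DatagramReceiver(capacity=1)
acked = [rx.on_packet(1, [("DATAGRAM", b"a")]),
         rx.on_packet(2, [("DATAGRAM", b"b"), ("STREAM", b"x")])]
print(acked, list(rx.buffer), rx.dropped)
```

Both packets are acknowledged, yet only the first datagram ever reaches the application, which is why a sender-side "datagram acknowledged" signal can mislead.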

Can DATAGRAM frame belong to stream?

The draft-ietf-quic-datagram has the following words: "DATAGRAM frames belong to a QUIC connection as a whole, and are not strongly associated with any stream ID at the QUIC layer."
Can a DATAGRAM frame belong to a stream? A stream would then have two types of transmission: one unreliable (DATAGRAM frames) and one reliable (STREAM frames).

Not "strongly" associated

From list discussion:

  DATAGRAM frames belong to a QUIC connection as a whole, and are not 
  strongly associated with any stream ID at the QUIC layer

What does "strongly associated" mean in this context?

We should remove "strongly".

RFC Editor comment 2

Please review whether the "type" attribute should be set for
the sourcecode element in the XML file. If the current list of preferred
values for "type"
(https://www.rfc-editor.org/materials/sourcecode-types.txt) does not
contain an applicable type, then feel free to suggest a new one.
Also, it is acceptable to leave the "type" attribute not set.

Update acknowledgements section to include WG contributors

We should update the acknowledgements.

Question about scope: why only unreliable datagrams? Why not reliable datagrams too?

Some applications might want to send messages, and not care about the order of delivery, but still get reliability.

QUIC streams give them reliability, but within a stream it enforces order of delivery.

This datagram proposal gives unordered delivery of messages, but without reliability.

Why not provide datagrams with optional (opt-in) reliability?

It shouldn't be a lot of extra work for QUIC implementors to provide reliable datagrams as well as unreliable, given that retransmission is already there for QUIC streams.

(I guess an application could create a separate QUIC stream for each unordered reliable message. That might work, but seems ugly and may have some overhead.)

Please define the frame using RFC 9000 style

If you're receptive to this suggestion I am happy to file a PR
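For readers unfamiliar with the frame, here is a small parsing sketch. It assumes the layout used in the published specification: a type of 0x30 (no Length field, data runs to the end of the packet) or 0x31 (a variable-length-integer Length followed by that many bytes), with integers encoded as in RFC 9000, Section 16. The helper names are hypothetical.

```python
def decode_varint(buf: bytes, offset: int = 0):
    """Decode a QUIC variable-length integer; return (value, next_offset)."""
    if offset >= len(buf):
        raise ValueError("truncated varint")
    first = buf[offset]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for i in range(1, length):
        value = (value << 8) | buf[offset + i]
    return value, offset + length


def parse_datagram_frame(buf: bytes):
    """Return (datagram_data, bytes_consumed) for a frame at the start of buf."""
    frame_type, pos = decode_varint(buf)
    if frame_type == 0x30:
        # No Length field: the data extends to the end of the packet.
        return buf[pos:], len(buf)
    if frame_type == 0x31:
        data_len, pos = decode_varint(buf, pos)
        if pos + data_len > len(buf):
            raise ValueError("Length exceeds remaining packet bytes")
        return buf[pos:pos + data_len], pos + data_len
    raise ValueError("not a DATAGRAM frame")


print(parse_datagram_frame(b"\x30hello"))
print(parse_datagram_frame(b"\x31\x02hi\x01"))
```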

Awkward phrasing of pacing requirement

Implementations that use packet pacing SHOULD support delaying the transmission of DATAGRAM frames for at least the time it takes to send the paced packets allowed by the congestion controller to avoid dropping frames excessively.

I am having trouble parsing this. I think that this is saying that if you have a pacer, then an endpoint might delay DATAGRAM frames to avoid overriding the sending rate of the pacer.

However, this seems to imply that the maximum delay imposed here is the inter-packet interval of the pacer. I think that is a bad assumption, even if it might be a simplifying assumption that some implementations might choose to adopt. In some cases, the pacer interval will be too slow for the application (for low BDP, high RTT connections especially) or in some cases the pacer interval might be short enough to allow multiple intervals before dropping the packet (for high BDP, short RTT connections).

The goal should just be to avoid deferring DATAGRAM sending indefinitely, with controls over any delays being available to applications.

Maybe:

Implementations that use packet pacing ({{Section 7.7 of RFC9002}}) can delay transmission of DATAGRAM frames to avoid having packets sent faster than pacing would otherwise allow. Any transmission delays will need to allow for application-level constraints on frame delivery times.
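A minimal sketch of the proposed behaviour: DATAGRAM frames wait for the pacer, but an application-chosen bound limits how long they may wait. The class `PacedDatagramQueue` and the `max_delay_ms` knob are assumptions for illustration, not part of any draft text or implementation.

```python
from collections import deque


class PacedDatagramQueue:
    def __init__(self, max_delay_ms):
        self.max_delay_ms = max_delay_ms  # application-level bound on queueing delay
        self.queue = deque()              # entries are (enqueue_time_ms, payload)
        self.expired = 0                  # datagrams dropped for waiting too long

    def push(self, now_ms, payload):
        self.queue.append((now_ms, payload))

    def pop_for_send(self, now_ms):
        """Called when the pacer allows one more packet; returns a payload or None."""
        while self.queue:
            enqueued_ms, payload = self.queue.popleft()
            if now_ms - enqueued_ms <= self.max_delay_ms:
                return payload
            self.expired += 1  # too stale to be useful to the application
        return None


q = PacedDatagramQueue(max_delay_ms=50)
q.push(0, b"old")
q.push(40, b"new")
print(q.pop_for_send(60), q.expired)
```

Here the pacer's own interval plays no part in the drop decision; only the application's bound does, which matches the goal of not tying the maximum delay to the inter-packet interval.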

Would you like to change "and" to "-" here? Current (comment 4)

Would you like to change "and" to "-" here?

Current: 0x30 and 0x31
Perhaps: 0x30-0x31

The latter would match the IANA registry and would
match how 0x12-0x13 (for example) appears in RFC 9000
(https://www.rfc-editor.org/rfc/rfc9000.html#section-12.4).

Is reliability really stream-based?

In the introduction, it says
"Reliability within QUIC is performed on a per-stream basis, so some frame types are not eligible for retransmission"

Is it really per-stream? Some frames are retransmitted even though they aren't associated with a stream.

Really, the detection mechanism is packet-based, the retransmission decision is made on a per-frame basis, and for stream-specific frames there is the added consideration of stream state (i.e. do not resend if reset).

I suggest replacing the sentence above with "Some QUIC frame types are not eligible for retransmission."

Instances of lowercase "may"

There are several instances of lowercase "may" in this document. It is ambiguous if these are normative requirements or not.

Congestion related information to the application

Real-time interactive media applications will likely be among the biggest consumers of QUIC Datagram, since such applications can have both reliable and unreliable data to send. These applications usually have a rate controller that controls the media rate, and it is possible to ignore the underlying congestion control completely and build something that only looks at the sender queue. In that case, QUIC datagram implementations must provide that information to applications. However, from my experience and from my conversations with other real-time application folks (see the SCReAM and NADA algorithms from the RMCAT working group), looking only at the sender queue is not optimal: congestion events (such as loss and ECN) need to be propagated to the application for the rate controller to work efficiently. The current text only says that implementations MAY report sender-buffer or drop-at-sender information to the application, which does not seem sufficient here, as it will lead to mediocre performance of the media application.

My suggestion is to amend the text so that implementations always provide sender queue information (and/or congestion-related information) to the application, or so that the application can query for that information. It is also worth saying that media applications need to think about their own needs and tune their rate controller, and should not expect that whatever congestion controller is used for unreliable traffic is media aware. However, I don't think Section 5.4 needs to detail the exact behavior of the application's media rate controller.

Why do IANA considerations duplicate information from the body?

Noticed during Shepherd review.

All of the core types registered in RFC 9000 simply link to the relevant document (RFC 9000) and section in the document where the protocol element is defined.

In contrast, in datagram's IANA considerations section, the transport parameter and frame type registrations include the required specification field with some text that is already included in the document. This will manifest as a weird juxtaposition in the IANA table.

I suggest you simply reference the section 3 and section 4 respectively. If you really want some prose (I don't recommend that) then it can be added as a note to the table.

The

The draft-ietf-quic-datagram has the following words: "This frame SHOULD be sent as soon as possible, and MAY be coalesced with other frames." Suppose a DATAGRAM frame is coalesced with a STREAM frame in one QUIC packet, and that packet is lost at a middlebox. According to RFC 9000, the QUIC packet must be retransmitted, but the DATAGRAM frame does not need retransmission.

remind implementors of the "fun" parts of no flow control

Flow control is painful because getting it wrong is painful.

Not having flow control doesn't mean having it done right!

Would be great to have a brief reminder around this in the draft!

(Issue copied from individual draft repo, by @grmocg on 2019-11-19)

consequence of not protecting DATAGRAM with 0-RTT or 1-RTT

Section 5: says

DATAGRAM frames MUST be protected with either 0-RTT or 1-RTT keys.

I would suggest adding the "otherwise what happens" part. A one- or two-line explanation would be helpful and, I think, would improve the understanding of this requirement.

Why is the recommended `max_datagram_frame_size` 65536?

Current text says:

It is RECOMMENDED to send the value 65536 in the max_datagram_frame_size transport parameter as that indicates to the peer that this endpoint will accept any DATAGRAM frame that fits inside a QUIC packet.

This is nice, but I'm wondering if it should be 65535. The max UDP payload for a QUIC packet is 65527, which is lower than both. It's nice for an implementation to only need to store 16-bit integers for max datagram size to avoid excess memory, and it's of course easy to round down from 65536 to 65535, but it just seems arbitrary to have it be 1 more than the max value for a 16-bit integer...
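The numbers in question can be checked with a few lines of arithmetic. The sketch assumes only that the 65527 figure comes from the 16-bit UDP length field minus the 8-byte UDP header.

```python
UDP_HEADER_BYTES = 8
MAX_UDP_LENGTH_FIELD = 0xFFFF  # the UDP length field is 16 bits and covers the header

max_udp_payload = MAX_UDP_LENGTH_FIELD - UDP_HEADER_BYTES
print(max_udp_payload)  # 65527

for value in (65535, 65536):
    # bit_length() shows whether the value still fits in a 16-bit integer;
    # the comparison shows that both already exceed any possible QUIC packet.
    print(value, value.bit_length(), value > max_udp_payload)
```

Both candidates signal "any DATAGRAM frame that fits in a packet", but only 65535 fits in 16 bits.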

What happens if an application wants to send a too large datagram?

Section 5 says:
“DATAGRAM frames cannot be fragmented; therefore, application protocols need to handle cases where the maximum datagram size is limited by other factors.“

However, this section does not say what the transport should do if the application tries to send a datagram that is too large. Is that datagram just dropped, eventually indicating an error to the application? It would be good to spell this out explicitly.

No streams in datagrams?

I finally sat down and read this draft (nice work, BTW) and was somewhat sad to learn that QUIC won't natively support stream multiplexing for datagrams.

It is true that applications can implement this if they choose. But one nice thing about QUIC is that it takes care of this machinery on behalf of the application. This is a somewhat aesthetic concern, but it feels awkward to have one layer of stream multiplexing in QUIC and then another layer in the application. It would be ugly in our implementation, at least.

Moving on to non-aesthetic issues: I don't have a full grasp of all the use cases for DATAGRAM, but if stream multiplexing is a common requirement, it would be good to move this into the transport. Applications that don't need this are free to do it all over one stream at a tiny loss in wire efficiency, or we could have a stream-less version if people are deeply concerned about one byte per frame.

Lack of application-defined format

Neither this draft nor H3-Datagrams says what the behavior should be when the transport supports DATAGRAM frames but no supported application format has been negotiated for them. While I think the correct behavior would be to drop them, since there's no ability to interpret their payload, no document actually says that. There's also a legitimate argument for closing the connection, since something was sent that the application layer can't deal with.

H3-Datagrams probably can't say that, since the possibility exists that a different datagram-payload format will be deployed for HTTP/3 in the future. Should this document have a sentence about what happens when the application-layer doesn't define what to do with incoming datagrams / doesn't consume them?

(There's also the question of simultaneously supporting two different formats, but I think that's an application-layer issue this document can ignore.)

State clearly the IANA registration type of TP and frame type

Noticed during the shepherd writeup of IANA considerations:

The document is attempting to register:

max_datagram_frame_size Transport Parameter with value 0x0020
DATAGRAM frame type with values 0x30 and 0x31

According to IANA (https://www.iana.org/assignments/quic/quic.xhtml) registrations can be of different types. Here the requested values are in the permanent, 0x00-0x3f | Standards Action or IESG Approval bracket.

So I suggest that the document state clearly that this is a request for permanent registration of these types.

Nit picking error condition for supported but not enabled

Sorry/not sorry for being pernickety. The current draft says:

An endpoint that includes this parameter supports the DATAGRAM frame types and is willing to receive such frames on this connection.

and

An endpoint that receives a DATAGRAM frame when it has not sent the max_datagram_frame_size transport parameter MUST terminate the connection with error PROTOCOL_VIOLATION.

which is slightly ambiguous if we consider a vanilla QUIC endpoint that does not implement this extension at all. Absence of the TP can indicate that the parameter is totally unsupported, or that it is supported but not desired for the current connection. In the totally unsupported case, the receiving endpoint is likely to act according to transport Section 12.4, which says

An endpoint MUST treat the receipt of a frame of unknown type as a
connection error of type FRAME_ENCODING_ERROR.

Maybe this doesn't need to be fixed because one shouldn't expect the requirements of an extension to apply if it is not implemented. But perhaps some editorial tweaks could tighten things up.

For the case where an endpoint does support DATAGRAM, you might also want to consider a wire message that more clearly describes the error condition of "extension supported but not enabled". This could be a new error code, or a reason phrase.
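The two cases discussed here can be laid out as a small decision function. This is a sketch of the ambiguity as described in this issue, not a statement of what any implementation does; the function and argument names are hypothetical.

```python
def error_for_datagram_frame(implements_extension, sent_max_datagram_frame_size):
    """Return the connection error a receiver raises for a DATAGRAM frame, or None."""
    if not implements_extension:
        # A vanilla endpoint does not recognise frame types 0x30/0x31 at all
        # and falls back to the unknown-frame rule quoted above.
        return "FRAME_ENCODING_ERROR"
    if not sent_max_datagram_frame_size:
        # Extension implemented but not enabled for this connection.
        return "PROTOCOL_VIOLATION"
    return None


print(error_for_datagram_frame(False, 0))
print(error_for_datagram_frame(True, 0))
print(error_for_datagram_frame(True, 65535))
```

From the sender's point of view the first two outcomes both follow from the same observation, namely that the peer sent no transport parameter, which is why a more specific error code or reason phrase is suggested.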

Question about DATAGRAM frame

The draft has the following words:
This document defines two new DATAGRAM QUIC frame types, which carry application data without requiring retransmissions.

The draft does not say something like: when a DATAGRAM frame is lost, the implementation should not retransmit it. That is to say, when a DATAGRAM frame is lost, the implementation has two strategies: one retransmits it and the other does not. Is that right?

Bandwidth distribution to media and non-media traffic - applicability statements

There is another matter: how available bandwidth is distributed between media and non-media bulk traffic. If the application mixes these two sorts of traffic on the same QUIC connection, the bulk traffic can choke the latency-sensitive media or game-control traffic unless priorities are applied or the bandwidth distribution is provisioned (fixed for media/bulk traffic). The current draft does not talk about this. As this traffic mix is application dependent, I think it is fine to leave out the details of how to do the distribution and what priority to use. However, this is something users of QUIC should consider. This piece of information will be very important, so we may want to add it to applicability statements for QUIC datagram usage. I would like to know what others think about this, and where such applicability statements should go for QUIC extensions when the applicability draft is already published.

Can be acknowledged?

QUIC datagrams, while unreliable, can support acknowledgements, allowing applications to be aware of whether a datagram was successfully received.
-- https://www.ietf.org/archive/id/draft-ietf-quic-datagram-03.html#section-2-2.3

Isn't it the case that they are acknowledged?

(This one is just a nit, the next issue will be more serious.)

Is max_datagram_frame_size Unidirectional Configuration?

The spec clearly states the following:

The max_datagram_frame_size transport parameter is an integer value (represented as a variable-length integer) that represents the maximum size of a DATAGRAM frame (including the frame type, length, and payload) the endpoint is willing to receive, in bytes.

But it's not completely clear what happens if both sides send different values. Is the value purely a unidirectional configuration? For instance, if the server advertises a value of 500, and the client advertises a value of 100, can the client still send 500 byte datagrams? Or does this essentially negotiate the max value either side can use to 100?

If this is a unidirectional configuration, why require the peer to send the TP at all, if all they want to do is send datagrams, and not receive them? I'm loosely basing my thoughts on what the design could be on how we negotiate the number of streams an endpoint is willing to accept. Following that model, I'd recommend a design where, if an endpoint is willing to receive datagrams, it advertises a max_datagram_frame_size it's willing to accept. The TP has absolutely no meaning for the send direction. The protocol on top of QUIC decides how to interpret only a single direction allowing datagrams to be sent.
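Under the unidirectional reading recommended here, each direction is checked only against the value the receiving side advertised. A minimal sketch using the 500/100 example from above (the function name `may_send` is hypothetical):

```python
def may_send(frame_size, peer_max_datagram_frame_size):
    """A frame may be sent only if it fits the limit the *peer* advertised."""
    return 0 < frame_size <= peer_max_datagram_frame_size


server_advertised = 500  # what the server is willing to receive
client_advertised = 100  # what the client is willing to receive

print(may_send(500, server_advertised))  # client -> server
print(may_send(500, client_advertised))  # server -> client
print(may_send(100, client_advertised))  # server -> client, small enough
```

With a negotiated-minimum reading, the first call would instead be checked against 100 and fail.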

Question about: "not used for loss recovery"

Can someone explain why the ack from Datagram-only packets are not used for loss recovery?

Receivers SHOULD support delaying ACK frames (within the limits specified by max_ack_delay) in response to receiving packets that only contain DATAGRAM frames, since the timing of these acknowledgements is not used for loss recovery.

I would appreciate if someone can point me to an existing discussion or explanation. Thanks.

Anti-affinity for unreliable datagrams

Coming out of some discussion at the WebTransport BoF during IETF 107, @enygren created an issue on the WebTransport API w3c/webtransport#109 (comment):

There should be a way to specify that unreliable datagrams do not end up in the same packet, at least when the underlying QUIC or HTTP/3 interface is used. For HTTP/2, an equivalent behavior may be a way to indicate which packets get dropped or thinned when this is needed (eg, to disprefer adjacent packets from being dropped).

While I don't think the DATAGRAM draft itself should concern itself with the API too much, I do wonder if there is some guidance or some considerations that could be captured about coalescing of DATAGRAM frames in packets.

Overstating value of acknowledgments

QUIC datagrams, while unreliable, can support acknowledgements, allowing applications to be aware of whether a datagram was successfully received.
-- https://www.ietf.org/archive/id/draft-ietf-quic-datagram-03.html#section-2-2.3

This is not an application-layer signal. While the peer might have received the packet and processed it at the QUIC layer, this does not guarantee that the DATAGRAM frame contents were processed by the application.

The draft already acknowledges this in Section 5.2. However, the treatment in the motivation could be misleading.

Consider dropping this item.

Is it obvious that Datagram frame can be aggregated in the same QUIC packet

I find no discussion about aggregating DATAGRAM frames in the same QUIC packets, with other DATAGRAM frames or other types. Is that so obvious that it is possible that it doesn't need mentioning?

I would be slightly worried that a QUIC stack that aggregates may cause fate sharing between different DATAGRAM frames, which the application wasn't expecting. In a classical UDP application, two send calls clearly create two different packets. Two calls to transmit DATAGRAM frames may not cause multiple QUIC packets to be sent, which is usually for the good. However, the draft should perhaps make clear that this may occur, and there is a question of whether the API needs a way to indicate if aggregation is acceptable.

Document interaction of datagrams with pacing

Early in our VPN implementation, we sent QUIC datagrams without any queueing, assuming that if a datagram gets dropped, then we exceeded the cwnd (and thus the estimated channel capacity), so it would likely not reach the peer anyways. In practice, this turned out to be a terrible idea that led to underutilization of the channel; our implementation always does pacing, so if you don't queue the datagram, it will get dropped unless it just happens to fit into the current pacing quantum.

Because pacing is RECOMMENDED for QUIC, I believe some form of queueing should be RECOMMENDED for datagrams, otherwise the users risk running into the same hard-to-debug problem that we had.
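The effect can be reproduced with a toy model. This is not the VPN implementation described above; it assumes a pacer that releases one packet per fixed interval and an application that hands over a burst of datagrams at once, and the function name `simulate` is hypothetical.

```python
def simulate(arrivals_ms, interval_ms, queue_limit):
    """Return (sent, dropped) for datagrams arriving at the given times."""
    next_send = 0  # earliest time the pacer allows another packet
    queued = 0     # datagrams currently waiting for a pacing slot
    sent = dropped = 0
    for t in sorted(arrivals_ms):
        # Release queued datagrams whose pacing slots passed before this arrival.
        while queued and next_send <= t:
            queued -= 1
            sent += 1
            next_send += interval_ms
        if queued == 0 and next_send <= t:
            sent += 1                    # a pacing slot is free right now
            next_send = t + interval_ms
        elif queued < queue_limit:
            queued += 1                  # wait for a later slot
        else:
            dropped += 1                 # no slot and no queue space
    return sent + queued, dropped        # queued datagrams go out in later slots


burst = [0] * 10  # ten datagrams handed over at the same instant
print(simulate(burst, interval_ms=1, queue_limit=0))
print(simulate(burst, interval_ms=1, queue_limit=16))
```

Without a queue, only the first datagram of the burst is sent; with a modest queue, all ten go out over the following pacing slots.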

It would be nicer if we explained why we are recommending that pattern. A few lines capturing the discussion on this would be valuable.