mikebishop / dns-alt-svc
Draft for listing Alt-Svc records in the DNS
License: Other
We need a careful implementor review on Section 4, particularly regarding what to put in the Additional section for negative responses (with and without DNSSEC).
I also wonder how this would interact with the CHAIN query.
After looking at the examples, several readers came away with the impression that an alias is required to use SVCB. In fact, aliasing is bad for performance and should be avoided when possible, but we've included it in all of our examples to show how all the pieces of the system work.
We should include an optimized example, and make it clear that many common cases can be handled without adding any indirection.
We should clarify how alpn=h2 and alpn=http/1.1 interact with ALPN negotiation over TLS with an origin, including cases where only one is specified in the DNS but only the other can be negotiated via TLS.
One option would be to call this out and recommend that "h2" can negotiate to http/1.1 and/or h2 if only one is specified in HTTPSSVC records.
It would be good to include an example for Alt-Used (or mention it in a previous example).
The {client-behavior} reference is also broken.
We should reconsider the default port for:
_8443._foo.api.example.com. 7200 IN HTTPS 0 svc4.example.net.
svc4.example.net. 7200 IN HTTPS 3 . transport="tls"
i.e., where no port is specified. In particular, should the port be 8443 (specified for the scheme) or 443 (the default for https+transport)?
Using the port from the scheme (8443) when no port is specified may be less confusing.
Clients that do HTTPSSVC should be modern enough to do SNI.
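A minimal sketch of the "use the port from the scheme" option. It assumes the port arrives as a leading "_8443"-style label on the query name, as in the example above, and that an explicit port parameter (modelled here as a hypothetical "port" entry in a dict) always takes precedence; effective_port and its arguments are illustrative names, not part of the draft.

```python
def effective_port(query_name: str, svc_params: dict, default_port: int = 443) -> int:
    """Pick the port for a ServiceForm record under the 'inherit from scheme' option.

    Assumptions (illustrative only): an explicit "port" SvcParam always wins;
    otherwise a leading "_<digits>" label on the original query name, such as
    "_8443._foo.api.example.com.", supplies the port; otherwise use the default.
    """
    if "port" in svc_params:
        return int(svc_params["port"])
    first_label = query_name.split(".", 1)[0]
    if first_label.startswith("_") and first_label[1:].isdigit():
        return int(first_label[1:])
    return default_port
```

With this rule the _8443._foo.api.example.com. example connects to 8443 rather than 443, keeping the port that appeared in the origin.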
Can we nail down the semantics of the different SvcParamKey ranges?
To start with:
I can't see any reason why we should accept records that contain reserved keys.
The draft currently says that recursives must ignore additional AliasForm records, because only one is allowed. (Following all aliases in parallel would allow some terrifying exponential blowup.) We should clarify that recursives should nevertheless forward the entire RRset intact, in order to preserve DNSSEC validity.
It seems the only parameters that allow this are ipv4hint and ipv6hint, and even there it is not necessary, because their value serialization already supports multiple addresses (#94).
Why do we need this? I think we should only allow one value per parameter.
This means duplicate values can be rejected generically in the parser. Right now the spec says individual parameters silently ignore all but the first value. Silently ignoring things isn't great. Other protocols like Roughtime further require keys be sorted, which makes it easy for the parser to reject duplicates generically.
This also avoids questions when designing a new parameter as to whether to use multiple parameter values or to encode multiple structures in the value. We should have one way to do things, so let's use the latter.
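A sketch of the generic rejection described above, assuming a wire layout of repeated [2-octet key][2-octet length][value] fields and a rule that keys appear in strictly increasing order. Both are assumptions for illustration; parse_svc_params is a hypothetical helper.

```python
import struct


def parse_svc_params(wire: bytes) -> dict[int, bytes]:
    """Parse SvcParams, rejecting duplicate and out-of-order keys generically.

    Assumed layout (illustrative): repeated [key:u16][length:u16][value],
    big-endian, with keys required to be strictly increasing.
    """
    params: dict[int, bytes] = {}
    offset = 0
    last_key = -1
    while offset < len(wire):
        if offset + 4 > len(wire):
            raise ValueError("truncated SvcParam header")
        key, length = struct.unpack_from("!HH", wire, offset)
        offset += 4
        if key <= last_key:
            # A repeated key and an unsorted key fail the same comparison.
            raise ValueError(f"SvcParamKey {key} is duplicated or out of order")
        if offset + length > len(wire):
            raise ValueError("truncated SvcParamValue")
        params[key] = wire[offset:offset + length]
        offset += length
        last_key = key
    return params
```

Because a repeated key can never be strictly greater than the previous key, one comparison rejects both duplicates and unsorted input, with no per-parameter logic.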
The HSTS section currently says
If the HTTPSSVC query results in a SERVFAIL error, and the connection between the client and the recursive resolver is cryptographically protected (e.g. using TLS {{!RFC7858}} or HTTPS {{!RFC8484}}), the client SHOULD abandon the connection attempt and display an error message.
This is meant to defend against DoS on the recursive-authoritative leg, when the zone is DNSSEC-signed and the recursive is validating.
We should generalize this defense to include timeouts and transport errors, not just explicit SERVFAIL, and move it out of the HSTS section, because (1) this DoS also applies to ESNI, not just HTTPS-upgrade, and (2) an attacker might be able to mount this attack on the client-recursive leg by selectively dropping DNS query or response packets (e.g. only dropping the largest responses, so the client gets the A record but not the HTTPSSVC record).
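The generalized rule could look roughly like the following sketch. The LookupOutcome names and should_abandon are hypothetical; the point is that timeouts and transport errors are treated the same as SERVFAIL, while a "no such record" answer is not.

```python
from enum import Enum, auto


class LookupOutcome(Enum):
    """Hypothetical summary of an HTTPSSVC lookup result."""
    ANSWER = auto()           # records returned
    NO_RECORDS = auto()       # NODATA / NXDOMAIN: the name has no HTTPSSVC records
    SERVFAIL = auto()
    TIMEOUT = auto()
    TRANSPORT_ERROR = auto()  # e.g. connection reset, truncated or dropped response


# Outcomes where the client cannot tell whether an HTTPSSVC record exists.
FAILURES = {LookupOutcome.SERVFAIL, LookupOutcome.TIMEOUT, LookupOutcome.TRANSPORT_ERROR}


def should_abandon(outcome: LookupOutcome, resolver_channel_secure: bool) -> bool:
    """Return True if the client should abandon the connection attempt.

    Any failure to learn whether a record exists is treated as a possible
    downgrade, but only when the client-to-resolver channel is
    cryptographically protected (e.g. DoT or DoH).
    """
    return resolver_channel_secure and outcome in FAILURES
```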
Based on some feedback from @martinthomson we should consider switching back to ALPN rather than introducing "transport". As part of this we'd want to clarify that:
We'd then switch from having a default transport protocol to a default alpn (e.g., "http/1.1" for HTTPSSVC) and have no-default-alpn as a param.
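One way to express the proposed default-alpn rule, as a sketch: the effective protocol set for a record is its alpn list plus the scheme default (http/1.1 for HTTPSSVC) unless no-default-alpn is present. effective_alpn is a hypothetical helper, and parameters are shown already decoded.

```python
def effective_alpn(alpn: list[str], no_default_alpn: bool, default: str = "http/1.1") -> list[str]:
    """Compute the ALPN protocols a client may use for one record.

    The record's "alpn" values are kept as given; the scheme default is
    appended unless "no-default-alpn" is present or it is already listed.
    """
    result = list(alpn)
    if not no_default_alpn and default not in result:
        result.append(default)
    return result
```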
The order of the SvcFieldParams is not meaningful. Should we require (and enforce?) that they appear in key order? This makes serialization more annoying to implement, but also removes a source of entropy that could be used to fingerprint authoritative implementations.
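If key order were required, the serializer only needs to sort before emitting. This sketch assumes the same illustrative [2-octet key][2-octet length][value] layout; serialize_svc_params is a hypothetical name.

```python
import struct


def serialize_svc_params(params: dict[int, bytes]) -> bytes:
    """Serialize SvcParams in ascending key order.

    Sorting makes the output independent of insertion order, so two
    implementations given the same parameters emit identical bytes.
    """
    out = bytearray()
    for key in sorted(params):
        value = params[key]
        if not 0 <= key <= 0xFFFF or len(value) > 0xFFFF:
            raise ValueError("key or value length out of range")
        out += struct.pack("!HH", key, len(value)) + value
    return bytes(out)
```

Sorting is the only extra step, and it removes insertion order as a fingerprinting signal.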
Currently, in-band Alt-Svc always has precedence over HTTPSSVC, and the presence of a cached Alt-Svc is supposed to disable the HTTPSSVC query altogether. However, this could be a problem for Chrome.
Chrome currently implements only "local Alt-Svc", i.e. Alt-Svc entries whose alternative hostname is empty (so the alternative is on the origin's own host). All other Alt-Svc headers are ignored.
Consider a site hosted in a multi-CDN configuration, using QUIC on both CDNs. Currently, that site could send a header like h3=":443", and Chrome would upgrade to QUIC.
Now suppose that both CDNs support ESNI. The site could send additional Alt-Svc headers like h3="cdn1.example:443"; esnikeys=ABC... and h3="cdn2.example:443"; esnikeys=123..., but Chrome would ignore both of these, so Chrome users would get QUIC but not ESNI.
If some Chrome users can query HTTPSSVC, those users could get the full package for ESNI through the DNS. However, the presence of h3=":443" in the Alt-Svc cache would prevent them from even doing the HTTPSSVC query (in the current draft).
A straightforward solution here would be to declare that the precedence of HTTPSSVC vs. Alt-Svc is implementation-defined. This might reduce the ability of origins to fine-tune suboptimal HTTPSSVC, but it gives clients more flexibility to do what makes sense for them.
Inspired by the conversation in issue #112: if we're concerned about the ability to add and experiment with new transports via SvcParamKey, then SvcParamKey may have a more general problem, because I wouldn't expect transports to change much more often than anything else.
The problem is that 1 octet's worth of code points for "private use" may not be sufficient for experimentation with new fields (or new transports, if we make transports into top-level fields). There is too much chance of conflict with other uncontrolled use of the private-use block. It seems we wouldn't really be able to use the block for experimental features at all unless we constantly grease with random values from that range, and the spec would end up encouraging wonky practices like always checking for a checksum in the value to ensure it's actually the param type the client is looking for. And 2 octets is probably not enough space overall to let anybody claim large chunks of the registered blocks for multiple experimental code points.
I suggest we increase to 4 octets, make the unregistered block much larger to reduce the chances of conflicts, and encourage temporary registration of experimental parameters. Feels to me that 4 octets would be worth it for the main extensibility mechanism of SVCB. We'd also be able to save a bit of those octets back when params like the EsniConfig could more reasonably create new parameters for new versions instead of spending octets to do versioning within the parameter value as is the current plan.
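To put a rough number on the conflict risk, the birthday-problem calculation below assumes each experimenter independently picks one code point uniformly at random from the private-use block. The 256-value block and the count of 20 experimenters are illustrative assumptions, not measurements.

```python
def collision_probability(block_size: int, experimenters: int) -> float:
    """Probability that at least two of `experimenters` independent parties,
    each picking one code point uniformly at random from a private-use block
    of `block_size` values, pick the same one (the birthday problem)."""
    if experimenters > block_size:
        return 1.0  # pigeonhole: a collision is certain
    p_distinct = 1.0
    for i in range(experimenters):
        p_distinct *= (block_size - i) / block_size
    return 1.0 - p_distinct
```

collision_probability(256, 20) is roughly 0.53, while for a private-use block of 2**24 values (possible with 4-octet keys) collision_probability(2**24, 20) is about 1.1e-5.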
Similar to how HSTS supercookies are used to track users, I think the Alt-Svc lifetime instructions could be used to create implicit long-term identifiers for users.
Because of local DNS caching, it's unlikely that popular protections for HSTS cookies (e.g. Apple's) could be applied here (since the context of the DNS request would be lost, there'd be no sense of first-party / third-party requests).
I'm not sure how this issue could be addressed given the proposal, but would be happy to think through a solution. This is less of a problem with the existing markup / header Alt-Svc options, since the browser can control the lifetime of those.
One suggestion in DNSOP was to make the SvcRecordType implicit.
For example, remove the field and distinguish between the two forms by whether the SvcRecordValue is empty or not.
Currently, the format is [SvcFieldPriority] [SvcDomainName] [SvcFieldValue]. I think it should probably be [SvcDomainName] [SvcFieldPriority] [SvcFieldValue]. Then in AliasForm, the redundant 0 priority can be omitted without making the parser more complex.
We could maintain the current wire format, or reverse the order there too.
It seems like it would be useful for DNS clients to be able to process the priority field directly, without having to gain Alt-Svc parsing capability. Conversely, "pri" is not needed in Alt-Svc, where values are considered to be ordered already. Therefore, we should consider a fourth value (SvcFieldPriority?) in the RR, present only when SvcRecordType is 1.
A question from the DNSOP WG discussion is whether to make this record somewhat more generic, including whether to change the record name. Some options:
Leave as-is and clarify that HTTPS is not just about web browsers.
Rename to ALTSVC. Keep default behavior for https:// and http:// but clarify that _$label use-cases can be used more generically for other protocols.
Make even more generic (SRVBIS? SRV2?). Define that format of the SvcFieldValue is specific to the protocol/scheme. (ie, might be Alt-Svc for HTTPS.)
Define a generic format with SvcFieldValue being specific to protocol/scheme. But then define a set of RRTYPEs that are specific instantiations of this generic format. For example: HTTPSSVC as an instantiation for HTTPS with Alt-Svc in SvcFieldValue. SRV2 as something purely generic. NS2 for handling secure delegations to DoH/DoT authorities with specification of protocols, ESNIKEYS, ports, etc., in some format of SvcFieldValue.
Elsewhere, HTTPSSVC's redirect was referred to as HSTS, and it occurred to me that we're missing one of HSTS's properties: besides redirecting, it directs the browser to suppress the certificate click-through button.
We could do something similar and say that HTTPS connections made off an HTTPSSVC record are assumed to have a competent TLS config and don't get a bypass button.
Examples like this one (from section 1.1) use a notation where the SvcFieldValue is surrounded by parentheses.
; ServiceForm
svc.example.net. 7200 IN HTTPSSVC 2 svc3.example.net. ( alpn=h3
port=8003 esniconfig="..." )
It looks like the top-level production rule is this:
pair = display-key "=" value
Shouldn't there be a rule just one level higher? Something like this?
svc-field-value = [ "(" 1*pair ")" ]
I believe that either this draft or a revision of rfc7838 should clarify the client behaviour or provide stronger guidance for the scenarios:
RFC 7838 was not explicit here either, which has probably led to fallback being rarely implemented. Defined, predictable client failover behaviour is a big win.
Happy to help with words if you think there's a place for this...
The current text recommends shortening the TTL to compensate for misbehaving clients. This is not a good recommendation, since some clients will just ignore the shorter TTL. Instead, we should just mention that servers cannot rely on prompt expiration.
Credit: @puneetsood
Currently, if a server publishes a QUIC-only HTTPSSVC RRSet with ESNI, there is no way for a client to fall back to a non-QUIC connection, because doing so would reveal the SNI. This could increase the likelihood of partial outages for server admins who haven't considered the small fraction of users whose network path does not support QUIC.
Reviewers have reported concerns that this creates an undesirable level of fragility. We should consider whether there is an alternative design that would be less likely to result in accidental breakage.
One minor issue is that wherever CNAME is referenced, you probably want to also include a reference to DNAME, including any implied or explicit chaining of CNAMEs (which could be sequences of CNAME and/or DNAME modulo their respective behavior.)
You might also want to explain the motivation for keeping the FQDN separate from the alt-svc parameters (to make it trivial to parse, and thus to do DNS substitutions like CNAME/DNAME). It is there, just not as up-front as it could be.
It might be a little clearer if the list of alt-svc values (h2, h3, etc) that can occur were to be listed in the document. In particular, the association between h3 and QUIC is inferred but not explicitly called out (at least not that I noticed.)
We should finalize the record names with community input (along with pronunciation for SVCB).
It's possible that "SVCB" is just fine, but "HTTPSSVC" is a constant pain due to the double-S.
"SVCHTTPS" might be slightly better. "HTTPS" is great but makes text less readable.
From Mark Andrews:
================================
Introductory Example: The example record
example.com. 2H IN HTTPSSVC 0 0 svc.example.net.
does not match the description of the record (missing last field). It should be:
example.com. 7200 IN HTTPSSVC 0 0 svc.example.net. ""
Similarly in 2.4. HTTPSSVC records: alias form
Also don’t use 2H for the TTL. While some servers will accept it, it is not RFC compliant.
Unless there is a real reason for the record to be class-agnostic, please specify that it is specific to class IN.
We should clarify whether the HSTS implication applies to the fallback IPs, and also clarify that it does apply to AliasForm.
We should add a note to Security Considerations indicating that HTTPSSVC is unauthenticated in many cases (ie, unless DNSSEC is present and verified) and thus care should be taken around Alt-Svc parameters that imply trust.
I wonder if we should be explicit to say that Alt-Svc parameters must opt-in to indicate that they can be used in HTTPSSVC? (As even with "ma" we've had to define constraints.)
It is ambiguous how a client selects which transport to use if multiple are provided in a set, all of which the client supports. For example with:
alpn=h3,h2
or:
alpn=h2,h3
(with the implicit http/1.1).
Since we don't want to make this a list, the best way to handle this may be to specify that clients should apply their own heuristics for selecting which underlying transport (e.g., TLS or QUIC) to try connecting with first, as well as whether to race both.
Operators wishing to force an ordering should then use different SVCB records at different priority levels.
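A sketch of that split of responsibility: the operator's SvcFieldPriority decides the order between records, and a client-local table (hypothetical, here preferring h3) decides the order within one record. SvcRecord, CLIENT_TRANSPORT_RANK and connection_plan are illustrative names; racing and the implicit http/1.1 default are left out.

```python
from dataclasses import dataclass


@dataclass
class SvcRecord:
    priority: int    # SvcFieldPriority; lower values are preferred
    target: str      # SvcDomainName
    alpn: list[str]  # protocols offered by this record


# Hypothetical client-local preference; a lower rank is tried first.
CLIENT_TRANSPORT_RANK = {"h3": 0, "h2": 1, "http/1.1": 2}


def connection_plan(records: list[SvcRecord]) -> list[tuple[str, str]]:
    """Order (target, protocol) attempts: operator priority first, then the
    client's own transport preference within a single record. Protocols the
    client does not support are skipped."""
    plan = []
    for rec in sorted(records, key=lambda r: r.priority):
        supported = [p for p in rec.alpn if p in CLIENT_TRANSPORT_RANK]
        for proto in sorted(supported, key=CLIENT_TRANSPORT_RANK.__getitem__):
            plan.append((rec.target, proto))
    return plan
```

An operator who wants h2 before h3 would publish them as two records with different priorities; the client's table then no longer reorders them.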
If decoupling from Alt-Svc, having an alternative to Alt-Used would be valuable. This should take lessons from challenges with Alt-Used adoption and should minimize the privacy impact.
Some options include:
I'm leaning towards (3) above as this bounds the amount of additional entropy to be significantly less than what could be done already by using an alternate port number or IP(v4/v6) address but still allows some level of signalling without requiring servers to have to go through the complexity of needing to use distinct ports/IPs (or ESNI key IDs), all of which are possible but which leak more to passive adversaries.
Clients using a proxy want special handling. Rather than simply doing a CONNECT through the proxy to the origin hostname, clients should attempt to resolve HTTPSSVC and then issue the CONNECT to the terminal SvcDomainName (i.e., still the same name that a CONNECT for an Alt-Svc received via a header would have used).
For background rfc7838 says:
A client configured to use a proxy for a given request SHOULD NOT
directly connect to an alternative service for this request, but
instead route it through that proxy.
One caveat here is that environments requiring a proxy may not allow clients to do DNS resolution. (Although clients doing HTTPSSVC resolutions via DoH through the proxy may not have this issue.)
The proxy case is another good reason/case to NOT inline the address records into the HTTPSSVC record. (or into an ESNI record)
Right now SvcDomainName appears to be in presentation format (i.e. dot-delimited), which is inconvenient for recursive operators. We should switch it to the usual DNS name encoding (but still uncompressed).
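For reference, the usual uncompressed encoding is a sequence of length-prefixed labels terminated by a zero octet (RFC 1035, Section 3.1). A minimal sketch; encode_dns_name is a hypothetical helper, and presentation-format escapes such as \. are deliberately not handled.

```python
def encode_dns_name(name: str) -> bytes:
    """Encode a dot-delimited domain name as uncompressed DNS wire format:
    length-prefixed labels ending in a zero octet. Escaped characters in the
    presentation format are not handled in this sketch."""
    if name in (".", ""):
        return b"\x00"  # the root name
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label length in {name!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # terminating root label
    if len(out) > 255:
        raise ValueError("name exceeds 255 octets")
    return bytes(out)
```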
Some of the CamelCase names could be improved. For example, SvcFieldPriority is now used in AliasForm, so maybe it should just be SvcPriority or SvcIndex.
Define a way to include/inline a list of A and AAAA values to optimize for when SvcDomainName has not yet been resolved.
The current proposal is to create a separate draft for this purpose defining an "ips" SvcParamKey that can be used as a hint while waiting on SvcDomainName.
This is likely needed to retain parity with optimizations in the ESNI draft.
Given that conversion between HTTPSSVC and Alt-Svc is now potentially quite lossy (any unmapped or unrecognized keys are dropped) and has unclear security properties, we should probably de-emphasize the idea of converting between them, and focus on their parallel structure instead.
I'm pulling esnikeysref out of the existing version.
This also allows DNS servers to treat the Params as opaque, which I've added a SHOULD about.
The removed text is here in-case we wish to re-insert it:
For ServiceForm, recursive DNS servers
MAY also include names referenced by SvcParamValue (such
as "esnikeysref") when those records are available and fit
within the response.
For ServiceForm, authoritative DNS
servers MAY also include in-bailiwick names referenced by
SvcParamValue (such as "esnikeysref").
A SVCB parameter "esnikeysref" is also defined for specifying a
reference to ESNI keys. This allows for both separation
of ESNI keys operational management as well as allows
ESNI keys to be cached with a longer TTL than the
SVCB record.
The value is a domain name which references a TXT RRSet containing
exactly one RR with a base64-encoded ESNIKeys structure.
The presentation format of the SvcParamValue is a fully qualified
domain name. The wire format of the SvcParamValue is the domain name
represented as a sequence of length-prefixed labels as in Section 3.1
of {{!RFC1035}}.
To translate this parameter to Alt-Svc, an "esnikeys"
parameter should be generated with the contents of the
TXT record pointed to by the domain name in the SvcParamValue.
If both "esnikeys" and "esnikeysref" parameters are specified in a
SVCB RR, the "esnikeysref" parameter MUST be ignored.
TODO: what happens if the TTL of the esnikeysref target is shorter
than that of the SVCB record? Requiring replacement adds lots
of complexity. Perhaps a SHOULD on relative TTLs with a warning
that clients may not reconstruct the Alt-Svc?
TODO: add logic on failure handling, perhaps also on when
to wait, as well as on preferring entries with literal "esnikeys"
when no "esnikeysref" value is in the DNS cache.
TODO: does this add too much complexity? This is in this draft to
explore the viability of external references since some people seemed
interested.
We've gotten a few questions about why these aren't a single RR type. The text could be more explicit on the design rationale for this.
basic-visible = %x21 / %x23-5B / %x5D-7E ; VCHAR minus DQUOTE and "\"
escaped-char = "\" (VCHAR / WSP)
contiguous = *(basic-visible / escaped-char)
quoted-string = DQUOTE *(contiguous / WSP) DQUOTE
value = quoted-string / contiguous
pair = display-key "=" value
This grammar fails to account for ';', which starts a comment in master files.
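A sketch of the check a zone-file parser would effectively apply: find the first ';' that is neither inside double quotes nor backslash-escaped, since that is where a master-file comment would begin. The function name is hypothetical and the quoting rules follow the ABNF above.

```python
def unquoted_semicolon_index(value: str) -> int:
    """Return the index of the first ';' a master-file parser would treat as
    the start of a comment (outside double quotes and not backslash-escaped),
    or -1 if there is none."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(value):
        if escaped:
            escaped = False        # this character is taken literally
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            return i
    return -1
```

For example, an unquoted pair such as alpn=h2;x would be cut at the ';', whereas "a;b" or a\;b would survive.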
We changed the ESNI configuration from a single ESNIConfig to a list of ESNIConfigs in tlswg/draft-ietf-tls-esni#200. HTTPSSVC/SVCB needs to be updated to match.
Despite having their own scheme (ws://, wss://), WebSockets should actually use HTTPSSVC. This has a firm basis on the Fetch spec, which says that each WebSocket has an equivalent http(s) URL, but it has come up as a point of confusion so I think it deserves a sentence in the draft.
The draft originally (prior to #66) said that the Alt-Svc cache overrides HTTPSSVC. This had problems (issues #58 and #60), so #66 downgraded it to a MAY. It seems this still has problems.
HTTPSSVC vs Alt-Svc
Consider a server which uses both QUIC and ESNI. It configures both in HTTPSSVC. It also cares about HTTPSSVC-less clients (older client or legacy DNS resolver), so it configures QUIC in Alt-Svc. Is it required to configure ESNI in Alt-Svc, or can it leave things alone (with the understanding that ESNI will be limited to clients that support HTTPSSVC)?
The spec currently says a client (that would otherwise support HTTPSSVC) MAY skip an HTTPSSVC lookup given an Alt-Svc cache entry. That means, for ESNI to work, the server MUST configure it in both. This is not obvious and should be written down. More importantly, it has deployment consequences.
HTTPSSVC records apply to the current HTTP request. If the client has no cached DNS record, it still queries DNS and gets HTTPSSVC. That means HTTPSSVC TTLs may be set more-or-less freely depending on the site's performance vs. flexibility needs. Let's say it's O(1 hour).
Alt-Svc headers apply to subsequent HTTP requests. If the client has no Alt-Svc entry cached, it will send the HTTP request without Alt-Svc. That means Alt-Svc TTLs must cover the time to the next HTTP request for Alt-Svc to be used at all. For reference, I see google.com currently uses 30 days.
Commitments
For the duration of the HTTPSSVC or Alt-Svc lifetime, the server operator has made a commitment to the client. ESNI is a soft commitment that the server understands this ESNI key and a hard commitment that the server is colocated with the public name. The first lower-bounds key lifetime and rotation on the server. There is a recovery mechanism, but it is expensive, so this is a soft commitment. The second is roughly a commitment to use a particular hosting provider. ESNI's retry mechanism requires the public name, so this is a hard commitment. Breaking this will knock out your site.
(Note Alt-Svc without ESNI was not a hosting provider commitment. A provider-specific Alt-Svc may fail if the site changes providers, but the client could still connect without Alt-Svc. ESNI must take this fallback away to prevent network downgrade.)
HTTPSSVC and Alt-Svc commitment timescales are qualitatively different. Saying ESNI servers must advertise in both, as implied by the spec today, means servers must incur a long-lived hosting provider commitment to deploy ESNI at all. (Or send no Alt-Svc headers and lose QUIC on non-HTTPSSVC clients.) It also means ESNI keys must be long-lived, which makes them more sensitive.
Proposed fix
Given the above, I don't see how allowing Alt-Svc to override HTTPSSVC is tenable. That suggests changing the spec so a client that makes HTTPSSVC queries makes them even if Alt-Svc is available. If it gets an HTTPSSVC record, it ignores Alt-Svc and uses those instead. Otherwise, it may freely use Alt-Svc.
This is fussy because Alt-Svc itself allows replacing the origin hostname with an alternate name. Clients would likely want to query the alternate's A/AAAA records, the origin's A/AAAA records, and the origin's HTTPSSVC records in parallel. However, the alternate may leak ESNI, so the alternate connection must wait for whether HTTPSSVC aborts it before proceeding past that query.
That adds even more complexity to the prospect of actually implementing remote Alt-Svc. Personally, I think all these name indirections are seeming more and more like a mistake and questionably worthwhile.
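The proposed precedence, reduced to a sketch. It assumes httpssvc is None while the origin's HTTPSSVC query is outstanding and otherwise holds the (possibly empty) list of records; select_alternatives and its string-typed inputs are hypothetical simplifications that ignore the parallel A/AAAA queries.

```python
from typing import Optional


def select_alternatives(httpssvc: Optional[list[str]], alt_svc: list[str]) -> Optional[list[str]]:
    """Apply the proposed precedence between HTTPSSVC and cached Alt-Svc.

    Returns None when the client must keep waiting before connecting.
    """
    if httpssvc is None:
        return None       # wait: connecting to an Alt-Svc alternate now could leak the SNI
    if httpssvc:
        return httpssvc   # HTTPSSVC records present: ignore cached Alt-Svc
    return alt_svc        # no HTTPSSVC records: Alt-Svc may be used freely
```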
Whither ESNI in Alt-Svc?
With the above, it is no longer strictly necessary to allocate a way to spell ESNI in Alt-Svc. I don't know whether we still want to. This issue means ESNI in Alt-Svc is very different from ESNI in HTTPSSVC. At minimum, we must clearly call out the implications of the longer lifetime in the spec. We could decide this is not worth the trouble. On the other hand, it's likely a number of clients won't make HTTPSSVC queries for some time, and perhaps those clients getting ESNI for the subset of servers willing to make a longer-term public name commitment is worthwhile.
Parting thought
I think a lesson here is we cannot completely abstract ESNI from its delivery mechanism. Pulling ESNI into HTTPSSVC is reasonable so we only have one record to query, but HTTPSSVC's decisions still have implications for ESNI. (@chris-wood, I dunno if you watch this repo, so CC'ing you in here explicitly.)
@miekg has asked for two related clarifications:
Concrete changes to make based on today's discussion with Mike, Erik, Tommy, Chris, Ben, and Tim:
Have one draft defining two rrtypes, making it easier to keep it all in DNSOP.
Define a separate HTTPSSVC RRTYPE in the draft
Add a separate draft that adds an optional A/AAAA stapling parameter (and altsvc parameter).
Keep using expert review for the key type registry. (No changes needed.)
Clarify that recursives should be able to treat key types as opaque.
Remove the "esnikeysref". (But perhaps move it back to an issue.) Helpful for separating operational management, but pull this out for now.
Remove general parameter type specifications.
Ben: remove the "can ignore" optimization (which Ben had added).
Bikeshed: what to name the records. Keep the current names for now, and get consensus later on what to rename them to.
For delimiters between parameters and their encoding:
Instead of an "alpn" or "transports" list, perhaps each transport should be its own SvcParamKey. Then the transport is considered supported for a record if the parameter appears in the SvcFieldValue parameter list for the record. Maybe h1/TLS is still considered supported by default unless there's a parameter saying to not use defaults.
Advantages:
From Ilari Liusvaara
It's not clear that we have a use case for SVCB chains containing more than one AliasForm record. The purpose of AliasForm is mostly for aliasing the apex; everything else can pretty much be handled with CNAME (e.g. AliasForm -> CNAME -> CNAME -> SvcForm)
At a minimum, limiting the chain length to 1 would seem to reduce the likelihood of a performance footgun.
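A sketch of a resolver loop that enforces a chain length of 1. lookup is a hypothetical callable returning ("alias", target) or ("service", records), and CNAME/DNAME chasing is assumed to happen inside it; resolve_service is an illustrative name.

```python
def resolve_service(name: str, lookup, max_aliases: int = 1):
    """Follow AliasForm records to a ServiceForm RRset, allowing at most
    `max_aliases` AliasForm hops. The limit also bounds alias loops."""
    hops = 0
    while True:
        kind, value = lookup(name)
        if kind == "service":
            return value
        if hops >= max_aliases:
            raise RuntimeError(f"AliasForm chain longer than {max_aliases} at {name}")
        hops += 1
        name = value  # follow the single permitted alias
```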
In digging into production use cases, there may be scenarios where "HTTPSSVC means HTTPS-only" is problematic. For example, it may prevent a CDN from turning it on by default (as this would mean forcing everything to HTTPS by default, which may not be possible for some customer content), which in turn prevents turning on Encrypted SNI by default.
Leaving the HSTS-like behavior as the default makes sense, but this raises the question of whether an optional "allow-insecure" parameter should be included (with the default still being secure). This opens up cans of worms from a security perspective but could ease deployment in some cases.