
Comments (11)

jedisct1 avatar jedisct1 commented on May 28, 2024 4

There has never been a formal definition of a non-logging resolver, but this is a very important topic, and something that we should define all together.

Logging the client IP address, even temporarily, should probably clear the 'non-logging' bit immediately.

Now, what about logging queries and responses?

Even without client IP addresses, this can leak sensitive information.

While a unique sequence of queries does not reveal the client IP, it reveals when that device is online.

More importantly, DNS queries, even to nonexistent names, reveal information about the network, what software is being used and more.

For example, queries for testing-secret-internal-project.bankofamerica.com could reveal the address of something that was originally not supposed to be public.

Another issue is that when a query for a nonexistent name is made, operating systems can be configured to retry using the "default" domain (or even a set of domains, e.g. with the search property in resolv.conf). So, a Bank of America employee trying to access hardcorefishrubbingfetish.com would send a query for that name first, and fall back to a second query for hardcorefishrubbingfetish.com.bankofamerica.com.

While the first query doesn't reveal much information about the identity of the client, the second does.
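The search-suffix fallback described above can be sketched roughly as follows. This is only an illustration: the function name `candidate_queries` and the placeholder names `some-site.example` and `corp.example` are made up for the example, and the order shown (bare name first, then each suffix) is an assumption taken from the description above. Real stub resolvers differ in ordering and in options such as `ndots`.

```python
def candidate_queries(name: str, search_domains: list[str]) -> list[str]:
    """Return the names a stub resolver might try, in order (simplified)."""
    # A trailing dot marks the name as fully qualified: no suffix is appended.
    if name.endswith("."):
        return [name]
    # Assumption: the bare name is tried first, then each search suffix.
    candidates = [name + "."]
    candidates += [f"{name}.{suffix.strip('.')}." for suffix in search_domains]
    return candidates


# Hypothetical example: a client configured with "search corp.example".
queries = candidate_queries("some-site.example", ["corp.example"])
for q in queries:
    print(q)
```

The second candidate is the one that ties the lookup to the organization, since its suffix names the client's network.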

A third issue, similar to the previous one, is browser autocompletion, which can also trigger the default suffix, so that search queries can end up as queries for <search query>.bankofamerica.com.

Unfortunately, this information is already public. Sensors recording queries and responses sent to authoritative servers are everywhere. Companies such as Cisco and Farsight log everything they see and sell access to their database. This data is stored forever. There are also many free services doing the same. This is very useful for security and marketing purposes.

Even data sent to a resolver that doesn't log may end up in these databases, because the sensors are placed between the resolvers and the authoritative servers, not between the client and the authoritative servers.

So, the consensus in the DNS community, maybe as a way to downplay the fact that DNSSEC doesn't provide any confidentiality, or that names can be brute-forced, has always been that "DNS data should be considered public".

If we agree with that, maybe the definition of "doesn't log" can just be "doesn't log the client IP, even temporarily".

from dnscrypt-resolvers.

irtefa avatar irtefa commented on May 28, 2024 2

Hi,

I am the product manager for the 1.1.1.1 team. I can see why this can be confusing. We don't store anything that can actually tell us how many unique users we have for the public DNS resolver. We do internally sometimes make rough estimates based on the number of queries.

Here's what we actually log:

  • Timestamp
  • IP Version (IPv4 vs IPv6)
  • Cloudflare Resolver IP address + Destination Port
  • Protocol (TCP, UDP, TLS or HTTPS)
  • Query Name
  • Query Type
  • Query Class
  • Query Rd bit set
  • Query Do bit set
  • Query Size
  • Query EDNS enabled
  • EDNS Version
  • EDNS Requested Max Buffer Size
  • EDNS Nsid
  • Response Type (normal, timeout, blocked)
  • Response Code
  • Response Size
  • Records in Response
  • Response Time in Milliseconds
  • Response served from Cache
  • DNSSEC Validation State (secure, insecure, bogus, indeterminate)
  • PoP ID
  • Server ID
  • Autonomous System Number
    (source: https://developers.cloudflare.com/1.1.1.1/commitment-to-privacy/privacy-policy/privacy-policy/)

We will work on making this clearer in our privacy policy.


publicarray avatar publicarray commented on May 28, 2024 1

Good question!
I actually never read their policy, but from this I'm not sure. I suppose it depends on an individual's threat model. To play it safe, I agree we could remove the non-logging label.

Just as a reference here is what I'm doing on my server: https://dns.seby.io/stats.html All this really shows is how the server and the clients are behaving. I'm pretty sure that it's impossible to identify someone from these graphs. This is the only data I have. I use it to see how popular the service is and if I need to take manual action (e.g. when the graphs go down and stay at 0 or sky-rocket and someone is abusing the service)

From my graphs I could get aggregate data on the following:

  • Timestamp (in a few minute increments)
  • Query Type
  • Query Class
  • Query Rd bit set
  • Query Do bit set
  • Query EDNS enabled
  • Response Code
  • Response Time in Milliseconds
  • Response served from Cache
  • DNSSEC Validation State (secure, bogus)
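As a rough illustration of what aggregate-only statistics like these can look like, here is a minimal sketch that keeps counters per time bucket and never stores individual queries. It is not how dns.seby.io is implemented; the five-minute bucket size and the names `AggregateStats` and `bucket_start` are assumptions made for the example.

```python
from collections import Counter

BUCKET_SECONDS = 300  # assumed bucket size: five minutes


def bucket_start(timestamp: float) -> int:
    """Round a Unix timestamp down to the start of its bucket."""
    return int(timestamp // BUCKET_SECONDS) * BUCKET_SECONDS


class AggregateStats:
    """Counters keyed by (bucket, query type, response code)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, timestamp: float, query_type: str, response_code: str) -> None:
        # The query name and the client address are never passed in,
        # so they cannot end up in the counters.
        self.counts[(bucket_start(timestamp), query_type, response_code)] += 1


stats = AggregateStats()
stats.record(1000.0, "A", "NOERROR")
stats.record(1199.9, "A", "NOERROR")
stats.record(1200.0, "A", "NOERROR")
```

With this shape of data, two queries in the same bucket with the same type and response code are indistinguishable from each other.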

I don't consider this logging, but I'm technically logging some information, so maybe I should remove the no-logging label too? I don't know. It depends on an individual's threat model.

Maybe we should define logging such that if it's possible to identify a unique user or query from the logs, it's logging; otherwise it's non-logging? That definition still doesn't help much though.

For Cloudflare I think they may use unique identifiers to determine unique users in the 24 hour period. Then after 24h they just increment the "Number of unique users" counter. I don't know, I'm just speculating. I do think they are pushing the no-logging envelope a bit though.

@jedisct1 What do you think


jedisct1 avatar jedisct1 commented on May 28, 2024 1

The information Cloudflare logs doesn't seem to be enough to passively link queries to users, so the Number of unique users mention in their privacy policy is a bit concerning.

Maybe they make a rough estimate based on the number of queries, and the fact that on average, a user makes x queries per day.

Or maybe they temporarily use client IP addresses, independently from the payloads they send and receive, for throttling and DoS mitigation. That can be implemented at any layer, but a firewall rule that prevents a single client IP from sending tons of queries in a short time fits in this category. Using client IP addresses that way is probably fine and should not void the "non-logging" flag.

Number of unique users in their policy may refer to this.
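To make the idea of using client IPs only for throttling more concrete, here is a minimal sketch of an in-memory, fixed-window rate limiter. It says nothing about how Cloudflare or any other operator does this; the class name `RateLimiter`, its parameters and the fixed-window approach are all assumptions chosen for brevity. The point is that the limiter only ever sees the address, never the query, and forgets every address when the window ends.

```python
import time
from collections import defaultdict


class RateLimiter:
    """Fixed-window query counter per client IP, held in memory only."""

    def __init__(self, max_queries: int, window_seconds: float, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock
        self._window_start = clock()
        self._counts = defaultdict(int)

    def allow(self, client_ip: str) -> bool:
        """Return True if this client may send another query in the current window."""
        now = self.clock()
        if now - self._window_start >= self.window_seconds:
            # Forget every address once the window ends; nothing is persisted.
            self._counts.clear()
            self._window_start = now
        self._counts[client_ip] += 1
        return self._counts[client_ip] <= self.max_queries


# Usage: the query name is deliberately not an argument.
limiter = RateLimiter(max_queries=100, window_seconds=1.0)
if limiter.allow("192.0.2.1"):
    pass  # hand the query to the resolver
```

Because `allow` takes no query data, this code path cannot correlate an address with what was asked.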

Rather than speculating, maybe @vavrusa can clarify what exactly gets logged and what Number of unique users refers to?


jedisct1 avatar jedisct1 commented on May 28, 2024 1

Thanks a lot for chiming in and for the clarification, Mohd!

So, shall we define "non-logging" as "doesn't log or use the client IP address, except for rate limiting, and without correlation with DNS queries"?

What do you think?

The "non-logging" bit is important, if only because by default, dnscrypt-proxy ignores resolvers that don't have that bit set (and we probably shouldn't change that).
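For readers unfamiliar with how such a flag gets used, here is a small sketch of filtering a resolver list on it. It assumes the filter keeps resolvers that declare themselves non-logging and skips the rest, which is what the `require_nolog` option in the dnscrypt-proxy configuration is for. The `Resolver` type, the `usable_resolvers` function and the resolver names are hypothetical and are not dnscrypt-proxy's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolver:
    name: str
    nolog: bool  # True when the operator declares that the resolver does not log


def usable_resolvers(resolvers, require_nolog=True):
    """Keep only resolvers that satisfy the client's logging requirement."""
    if not require_nolog:
        return list(resolvers)
    return [r for r in resolvers if r.nolog]


# Hypothetical resolver list.
candidates = [
    Resolver("resolver-a", nolog=True),
    Resolver("resolver-b", nolog=False),
]
selected = usable_resolvers(candidates)
```

This is why the definition matters in practice: clearing the flag removes a resolver from the default selection of every client that requires it.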


publicarray avatar publicarray commented on May 28, 2024 1

Yes, I'm happy with that 👍


irtefa avatar irtefa commented on May 28, 2024 1

That's correct. We may use the IP address for rate limiting but we don't log them. Furthermore, they are not associated with DNS queries.


brainscar avatar brainscar commented on May 28, 2024

Thank you both so much for your responses, I really appreciate the open discussion we're having.

I think this topic goes beyond just Cloudflare, and it was not my intention to single them out.

In terms of what is considered logging I think there are at least 3 instances that we're dealing with:

  • no logging.
  • no logging, except for some un-identifying information.
  • not logging IP, but taking the rest to an extreme (in my opinion, Cloudflare does this).

Which raises the question: at what point does it become too much?

I agree with @jedisct1 about this:

Unfortunately, this information is already public.

For example, testing-secret-internal-project.bankofamerica.com could also be found by things like:

  • bruteforcing subdomains
  • checking for issued ssl certs, for example, github has ssl certs for these subdomains: https://crt.sh/?q=%25.github.com. As you can see, some of them are not for end-users.

(sorry @jedisct1 no fish rubbing at github yet.)

So in that sense I would agree with "ip logging is considered logging".

However, when we look at the list of what Cloudflare logs, I do believe there is more to worry about than just queries and responses.

And that's where I would love to get your input, @jedisct1 and @publicarray.

You see if the query is public data but the ip address isn't, one could argue:

anyone could have made that request.

However if we look at that list, I don't think that statement applies anymore.
After all, if you narrow it down, that list is essentially a unique fingerprint, which then becomes attached to the query. And that's my concern: being able to put query and person together.
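One way to check this concern against real data is to count how many log records share the same combination of non-IP fields. The sketch below is only an illustration: the field names `asn`, `protocol` and `edns_buffer` are loosely based on the list above, the sample records are invented, and `anonymity_set_sizes` is a hypothetical helper. A record whose combination appears only once is, for practical purposes, a fingerprint.

```python
from collections import Counter


def anonymity_set_sizes(records, fields):
    """For each record, count how many records share its values for `fields`."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


# Invented sample log records without client IP addresses.
records = [
    {"asn": 64500, "protocol": "UDP", "edns_buffer": 1232, "qname": "a.example."},
    {"asn": 64500, "protocol": "UDP", "edns_buffer": 1232, "qname": "b.example."},
    {"asn": 64501, "protocol": "HTTPS", "edns_buffer": 4096, "qname": "c.example."},
]
sizes = anonymity_set_sizes(records, ["asn", "protocol", "edns_buffer"])
unique = [r["qname"] for r, n in zip(records, sizes) if n == 1]
```

The more fields are logged per query, the smaller these groups tend to get, which is the "narrowing down" described above.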

Thank you guys again, I hope we can continue this conversation.


irtefa avatar irtefa commented on May 28, 2024

"non-logging" as "doesn't log or use the client IP address, except for rate limiting

Yes. IMO, that's fair.


brainscar avatar brainscar commented on May 28, 2024

@irtefa could you please confirm the end of the sentence applies to cloudflare too?

doesn't log or use the client IP address, except for rate limiting, and without correlation with DNS queries.

Then as far as my opinion goes, I'm good with it too, as my only concern left was the one @jedisct1 mentioned here: #128 (comment)


captn3m0 avatar captn3m0 commented on May 28, 2024

How about changing "log" to retain?

"doesn't retain the client IP address, except for rate limiting, and without correlation with DNS queries"?

For DoH resolvers, even things like User-Agent + ASN might be enough to identify users, so changing "client IP address" to "user identifiable information" might be better.

The Mozilla DoH resolver policy takes it up nicely: https://wiki.mozilla.org/Security/DOH-resolver-policy

