beryju / gravity

Fully-replicated DNS and DHCP Server with ad-blocking powered by etcd

Home Page: https://gravity.beryju.io

License: GNU General Public License v3.0

Go 51.66% Dockerfile 0.25% HTML 0.24% JavaScript 0.29% TypeScript 44.28% CSS 2.33% Makefile 0.67% Shell 0.28%

dhcp dhcp-server dns-server api etcd replicated webui blocky dns dns-cache

gravity's Introduction

'ello

I build software around Identity and SSO, and also other things sometimes.

I mainly work on authentik, an IDP focused on being easy to use and flexible, and also make a couple of tools to test Identity protocols.

Also for some reason I decided to make my own DHCP and DNS Server, Gravity.

I also like to use a lot of IaC workflows for my lab, like infrastructure with Ansible/Puppet/Terraform and k8s with Flux.

gravity's Issues

pull dns list from git repo periodically

first off, project looks really cool!

I want to give this a run, but I have a requirement: I need to be able to periodically pull one or more lists of DNS entries (for local DNS routing) from one or more private git repos. Would this be possible with Gravity?

DHCP client roaming across scopes doesn't work correctly

First, thanks for your work. I rely on this for an internal DNS resolver and it's great.

I tried migrating my MS DHCP server and ran into trouble.

I have two wireless SSIDs, which tag clients on VLAN 20 and VLAN 40 respectively. A Ruckus/Brocade switch has a VE on each VLAN, and the "helper-address" is set to relay DHCP traffic to the Gravity instance(s). In Gravity, I have two scopes, 10.0.20.x and 10.0.40.x.

Using the hardware MAC:
If I connect a client to VLAN 20, it (successfully) obtains a lease like 10.0.20.100/24 (with router/gateway 10.0.20.1). If I then connect this client to VLAN 40, it fails to get a lease on 10.0.40.x.

If I get the client to use a random MAC, this appears to work successfully.

Is Gravity enforcing that the MAC address is unique across the two scopes? Is there a way to not enforce this, if so?

Can't read config files with night light filter on.

Uhh... Yeah I guess I'll just take some photos of my screen lol

Support for running as recursive resolver

I typically run my top-level resolver using Unbound and root hints only, rather than relying on a public DNS resolver like 1.1.1.1 or 8.8.8.8. This can be done at the moment by running Unbound separately and using forward_blocky or forward_ip to direct queries, but it seems like this could be done within CoreDNS using some existing plugins. I'm not sure what the most appropriate way to do this is, but it does appear there is a plugin maintained by the CoreDNS team for Unbound.

DEBUG: true causes web assets to 404

With a brand new install of Gravity in compose.
I set DEBUG: true, then start the container.
Upon loading the web UI, all assets fail to load with an invalid MIME type error, because the response for each asset is really a 404 page.

internal error with Webserver

Hi
I can't access the web interface with the compose file.
I edited it to match my infrastructure.

My compose file is attached below.
I changed the network_mode to assign a static IP in my Docker network, which I manage behind a firewall.
I also tried configuring it with network_mode: host.

---
version: "3.4"

services:
  gravity:
    hostname: gravity1
    image: ghcr.io/beryju/gravity:stable
    restart: unless-stopped
    networks:
      frontend:
         ipv4_address: 10.20.2.20
    volumes:
      - gravity1:/data
    environment:
      INSTANCE_IDENTIFIER: gravity1
      LOG_LEVEL: debug
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"


volumes:
  gravity1:
    driver_opts:
      type: "nfs4"
      o: "addr=10.10.1.210,rw,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs_docker/gravity1"
networks:
    frontend:
      name: vlan-2002
      external: true

Error I get when I start the compose file

2023-09-11T15:23:06.721554771Z DBG | ts=1694445786.7212613 msg=failed to get IPs from interface instance=gravity1 version=0.6.12-7241ad96 error=interface is loopback if=lo 
2023-09-11T15:23:06.721652120Z DBG | ts=1694445786.7214658 msg=Detected IP of instance instance=gravity1 version=0.6.12-7241ad96 ip=10.20.2.20 
2023-09-11T15:23:06.721796468Z DBG | ts=1694445786.721569 msg=failed to get IPs from interface instance=gravity1 version=0.6.12-7241ad96 error=interface is loopback if=lo

When I access the UI

2023-09-11T15:28:10.720966492Z ERR | ts=1694446090.7204232 logger=role.api msg=recover in API handler instance=gravity1 version=0.6.12-7241ad96 error=runtime error: invalid memory address or nil pointer dereference stacktrace=beryju.io/gravity/pkg/roles/api.New.NewRecoverMiddleware.func2.1.1
	/workspace/pkg/roles/api/middleware_recover.go:19
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:914
runtime.panicmem
	/usr/local/go/src/runtime/panic.go:261
runtime.sigpanic
	/usr/local/go/src/runtime/signal_unix.go:861
github.com/getsentry/sentry-go.(*Client).SetSDKIdentifier
	/go/pkg/mod/github.com/getsentry/[email protected]/client.go:573
github.com/getsentry/sentry-go/http.(*Handler).Handle-fm.(*Handler).Handle.(*Handler).handle.func1
	/go/pkg/mod/github.com/getsentry/[email protected]/http/sentryhttp.go:93
net/http.HandlerFunc.ServeHTTP
	/usr/local/go/src/net/http/server.go:2136
beryju.io/gravity/pkg/roles/api.New.NewRecoverMiddleware.func2.1
	/workspace/pkg/roles/api/middleware_recover.go:35
net/http.HandlerFunc.ServeHTTP
	/usr/local/go/src/net/http/server.go:2136
github.com/gorilla/mux.(*Router).ServeHTTP
	/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210
net/http.serverHandler.ServeHTTP
	/usr/local/go/src/net/http/server.go:2938
net/http.(*conn).serve
	/usr/local/go/src/net/http/server.go:2009 
2023-09-11T15:28:10.721068206Z WRN | ts=1694446090.7207634 logger=role.api msg=failed to write error message instance=gravity1 version=0.6.12-7241ad96

Maybe someone has a clue and can help.

Docs labels background color

It looks like <pre> tags have been given a background color by accident.

[Enhancement] Verbose DNS Metrics

Hi,

I've been using gravity as my primary DNS server running on k8s with multiple replicas for a while now and it seems to work great so far.

However, it would be nice to have more verbose metrics for DNS queries. I know that the DNS queries are logged with the client IP and the request:

{"level":"info","ts":1700391001.1684859,"logger":"role.dns","msg":"DNS Query","instance":"gravity-dns-1","version":"0.6.17-2219b6b2","runtime":9,"client":"10.2.0.16","response":"NOERROR","queryNames":["mobile.events.data.microsoft.com."],"queryTypes":["A"],"answerRecords":["mobile.events.data.trafficmanager.net.","onedscolprdcus09.centralus.cloudapp.azure.com.","13.89.179.9"],"answerTypes":["CNAME","CNAME","A"]}

but having it in prometheus would be great to visualize DNS queries made per client or DNS queries made per Host etc.
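As a rough illustration of the aggregation this request describes, the sketch below parses "DNS Query" log lines in the format shown above and keeps per-client and per-name counts. The `queryCounter` type is a hypothetical in-memory stand-in for labelled Prometheus counters, not part of Gravity; it only assumes the `msg`, `client`, and `queryNames` fields from the log line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryLog mirrors the fields of the "DNS Query" log line that this sketch needs.
type queryLog struct {
	Msg        string   `json:"msg"`
	Client     string   `json:"client"`
	QueryNames []string `json:"queryNames"`
}

// queryCounter is a hypothetical in-memory stand-in for labelled counters.
type queryCounter struct {
	perClient map[string]int
	perName   map[string]int
}

func newQueryCounter() *queryCounter {
	return &queryCounter{perClient: map[string]int{}, perName: map[string]int{}}
}

// observe parses one JSON log line and increments the counters.
// Lines that are not "DNS Query" entries are ignored.
func (c *queryCounter) observe(line []byte) error {
	var q queryLog
	if err := json.Unmarshal(line, &q); err != nil {
		return err
	}
	if q.Msg != "DNS Query" {
		return nil
	}
	c.perClient[q.Client]++
	for _, name := range q.QueryNames {
		c.perName[name]++
	}
	return nil
}

func main() {
	line := []byte(`{"level":"info","logger":"role.dns","msg":"DNS Query","client":"10.2.0.16","response":"NOERROR","queryNames":["mobile.events.data.microsoft.com."],"queryTypes":["A"]}`)
	c := newQueryCounter()
	if err := c.observe(line); err != nil {
		panic(err)
	}
	fmt.Println("queries by client:", c.perClient)
	fmt.Println("queries by name:", c.perName)
}
```

In a real exporter the two maps would likely become counters labelled by client and query name; label cardinality would need some thought, since every distinct query name becomes its own series.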

URLs loaded into blocklist rather than URL content after blocky bump

After 7e3b66d
Example: https://pastebin.com/raw/kZScZrgR

[2023-12-19 23:42:45]  WARN list_cache: parse error: line 1: 3 errors occurred:
 * invalid domain name: https://big.oisd.nl/domainswild
 * invalid ip: https://big.oisd.nl/domainswild
 * unsupported wildcard 'https://big.oisd.nl/domainswild': must start with '*.' and contain no other '*'
, trying to continue count=0 source=item #0 of group block: https://big....

Adding "*.testdomain.com" as the blocklists value: https://pastebin.com/raw/ze2JJE11

[2023-12-19 23:41:28]  INFO list_cache: import succeeded count=1 source=item #0 of group block: *.testdomain...
[2023-12-19 23:41:28]  INFO list_cache: group import finished group=block total_count=1

[Feature request] Description fields for DHCP reservations

Some devices (most often IoT devices) provide an empty hostname or some form of random/unique device ID during the DHCP handshake.

For example, my NVIDIA Shield TV provides an empty string and my Marantz amplifier provides a null-separated form of its MAC address.

This can make such devices pretty difficult to identify without additional documentation/IPAM.

It would be really useful to have a description field which is configurable for each reservation and is displayed as a column (optionally, perhaps?) in the leases list for each scope.

Remove/update all related DNS records when DHCP lease is updated/removed

This could be considered an extension of the functionality we spoke about a while ago, implemented in e809e84.

When a DHCP lease is updated or removed, it would be good if the DNS record is also updated or removed accordingly. This seems to work as I'd hope in some cases, but not others.

Does work:

Client updates lease, A record is updated
Lease expires, A & PTR records also expire (as added in e809e84, tbh I haven't monitored this closely, but I'm pretty sure it works as expected)

Doesn't work:

Client updates lease, PTR record is not updated
User deletes lease, A and PTR records remain

To me, the lack of PTR updates is the priority. It's easy enough to clean up records after manual removal of a lease.
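A minimal sketch of the requested behaviour, assuming a hypothetical in-memory `recordStore` rather than Gravity's actual etcd-backed storage: the A and PTR records are always written and removed as a pair, so a lease update or manual delete cannot leave a stale PTR behind. Only IPv4 is handled, and all names here are illustrative.

```go
package main

import (
	"fmt"
	"net/netip"
)

// reverseName returns the in-addr.arpa name for an IPv4 address string.
func reverseName(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	if !addr.Is4() {
		return "", fmt.Errorf("only IPv4 is handled in this sketch: %s", ip)
	}
	b := addr.As4()
	return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa.", b[3], b[2], b[1], b[0]), nil
}

// recordStore is a hypothetical record set keyed by "TYPE name".
type recordStore map[string]string

// putLease writes the A and PTR records for a lease together.
func (s recordStore) putLease(fqdn, ip string) error {
	ptr, err := reverseName(ip)
	if err != nil {
		return err
	}
	s["A "+fqdn] = ip
	s["PTR "+ptr] = fqdn
	return nil
}

// deleteLease removes both records, covering the manual-delete case.
func (s recordStore) deleteLease(fqdn, ip string) error {
	ptr, err := reverseName(ip)
	if err != nil {
		return err
	}
	delete(s, "A "+fqdn)
	delete(s, "PTR "+ptr)
	return nil
}

// updateLease drops the records for the old address before writing the new
// ones, so the old PTR does not linger until its expiry time.
func (s recordStore) updateLease(fqdn, oldIP, newIP string) error {
	if err := s.deleteLease(fqdn, oldIP); err != nil {
		return err
	}
	return s.putLease(fqdn, newIP)
}

func main() {
	s := recordStore{}
	if err := s.putLease("host1.example.lan.", "10.0.20.100"); err != nil {
		panic(err)
	}
	if err := s.updateLease("host1.example.lan.", "10.0.20.100", "10.0.20.101"); err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```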

Since the removal of PTRs seems to rely exclusively on the expiry time, reverse lookup zones end up looking like this:

Clustering Failed, main instance refuses to start

I have deployed Gravity through docker compose and had it working perfectly for a week :) I thought I would switch my second server to Gravity as well and tried to set it up using the compose file generated from the web UI.

The initial clustering ended with both instances erroring out, and they had to be restarted. After the restart, login to the second instance failed with 'unauthorized' and the first instance is stuck in a boot loop.


WRN ts=1707061064.6685095 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_SNAPSHOT remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused
WRN ts=1707061064.6685143 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused
WRN ts=1707061068.5682836 logger=role.etcd msg=failed to publish local member to cluster through raft instance=gravity.ts version=0.8.1-b50a27b6 local-member-id=16f01e79800886bf local-member-attributes={Name:gravity.ts ClientURLs:[http://localhost:2379]} request-path=/0/members/16f01e79800886bf/attributes publish-timeout=7000 error=etcdserver: request timed out
WRN ts=1707061069.6695778 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused
WRN ts=1707061069.669593 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_SNAPSHOT remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused
WRN ts=1707061074.6700313 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_SNAPSHOT remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused
WRN ts=1707061074.6700528 logger=role.etcd msg=prober detected unhealthy status instance=gravity.ts version=0.8.1-b50a27b6 round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE remote-peer-id=27d616d355ea12ce rtt=0 error=dial tcp 10.95.26.23:2380: connect: connection refused

Any help would be appreciated. Btw thank you for building this amazing piece of software.

Resolving mixed or upper case names does not match internal zones of differing case

Demonstration

Gravity logs

{"level":"info","ts":1700050679.4566321,"logger":"role.dns","msg":"DNS Query","instance":"gravity-lon","version":"0.6.16-f5c953a1","runtime":0,"client":"10.52.20.33","response":"NOERROR","queryNames":["www.google.com."],"queryTypes":["A"],"answerRecords":["1.2.3.4"],"answerTypes":["A"]}
{"level":"info","ts":1700050684.4664123,"logger":"role.dns","msg":"DNS Query","instance":"gravity-lon","version":"0.6.16-f5c953a1","runtime":137,"client":"10.52.20.33","response":"NOERROR","queryNames":["www.Google.com."],"queryTypes":["A"],"answerRecords":["216.58.201.100"],"answerTypes":["A"]}

Let me know if there's any other logs/info that'd help.
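DNS names compare case-insensitively, so a zone lookup would normally fold the query name to lower case before matching. The sketch below shows one way to do that with a longest-suffix match; `findZone` is a hypothetical helper for illustration, not Gravity's actual lookup code.

```go
package main

import (
	"fmt"
	"strings"
)

// findZone returns the longest configured zone that is a suffix of name,
// comparing case-insensitively. Zones are FQDNs with a trailing dot, and
// "." stands for the root (catch-all) zone.
func findZone(zones []string, name string) (string, bool) {
	n := strings.ToLower(name)
	if !strings.HasSuffix(n, ".") {
		n += "."
	}
	best, found := "", false
	for _, z := range zones {
		zl := strings.ToLower(z)
		// Match on label boundaries so "notgoogle.com." does not hit "google.com.".
		match := zl == "." || n == zl || strings.HasSuffix(n, "."+zl)
		if match && (!found || len(z) > len(best)) {
			best, found = z, true
		}
	}
	return best, found
}

func main() {
	zones := []string{".", "google.com."}
	for _, q := range []string{"www.google.com.", "www.Google.com.", "example.org."} {
		z, _ := findZone(zones, q)
		fmt.Printf("%s -> zone %q\n", q, z)
	}
}
```

With this folding, both queries from the logs above would land in the same internal zone instead of the mixed-case one falling through to the forwarder.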

runtime.errorString: runtime error: invalid memory address or nil pointer dereference

Sentry Issue: GRAVITY-6

runtime.errorString: runtime error: invalid memory address or nil pointer dereference
  File "/workspace/pkg/roles/dns/middleware.go", line 22, in (*Role).Start.(*Role).recoverMiddleware.func4.1
  File "/workspace/pkg/roles/dns/zone.go", line 99, in (*Zone).resolve
  File "/workspace/pkg/roles/dns/dns_handler.go", line 85, in (*Role).Handler
  File "/workspace/pkg/roles/dns/middleware.go", line 72, in (*Role).Start.(*Role).loggingMiddleware.func3
  File "/workspace/pkg/roles/dns/middleware.go", line 36, in (*Role).Start.(*Role).recoverMiddleware.func4
...
(4 additional frame(s) were not displayed)

Conditional DHCP options

Currently DHCP options can only be set statically per scope, which makes it hard to implement things such as PXE boot (different bootfile options depending on client architecture) and other vendor-specific things (UniFi access points need option 43 when the vendor is ubnt).
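One possible shape for this, sketched under assumptions: each scope keeps its static options, and a list of rules overlays extra options when a request matches on client architecture (option 93) or vendor class (option 60). The `rule` type, the bootfile names, and the option 43 value are all hypothetical placeholders; the architecture values 0 and 7 are the ones commonly sent by BIOS and x64 UEFI PXE clients.

```go
package main

import (
	"fmt"
	"strings"
)

// clientInfo holds the request attributes that rules can match on.
type clientInfo struct {
	Arch        uint16 // option 93, client system architecture
	VendorClass string // option 60, vendor class identifier
}

// rule is a hypothetical conditional option set. A nil MatchArch or an
// empty MatchVendor means "match any".
type rule struct {
	MatchArch   *uint16
	MatchVendor string // prefix match on the vendor class
	Options     map[uint8]string
}

// u16 is a small helper to take the address of a literal.
func u16(v uint16) *uint16 { return &v }

func (r rule) matches(c clientInfo) bool {
	if r.MatchArch != nil && *r.MatchArch != c.Arch {
		return false
	}
	return strings.HasPrefix(c.VendorClass, r.MatchVendor)
}

// optionsFor copies the scope's static options and overlays every matching
// rule in order, so later rules win on conflicts. The base map is not modified.
func optionsFor(base map[uint8]string, rules []rule, c clientInfo) map[uint8]string {
	out := make(map[uint8]string, len(base))
	for k, v := range base {
		out[k] = v
	}
	for _, r := range rules {
		if !r.matches(c) {
			continue
		}
		for k, v := range r.Options {
			out[k] = v
		}
	}
	return out
}

func main() {
	base := map[uint8]string{3: "10.0.20.1"} // option 3: router
	rules := []rule{
		{MatchArch: u16(0), Options: map[uint8]string{67: "undionly.kpxe"}}, // BIOS bootfile (placeholder)
		{MatchArch: u16(7), Options: map[uint8]string{67: "ipxe.efi"}},      // UEFI bootfile (placeholder)
		{MatchVendor: "ubnt", Options: map[uint8]string{43: "controller-address-placeholder"}},
	}
	fmt.Println(optionsFor(base, rules, clientInfo{Arch: 7}))
	fmt.Println(optionsFor(base, rules, clientInfo{VendorClass: "ubnt"}))
}
```

Note that a client which sends no option 93 is indistinguishable from architecture 0 in this simplified struct; a real implementation would track whether the option was present.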

Clearer documentation for configuration

Hello, just spun up this project to use as an alternative to AdGuard since I want to try something different and would like SSO on top!

I did notice, however, that the DNS zones are configured manually, and I would like some example configs for them. For example, the Matrix Synapse documentation lists the options in a similar manner to yours, but also includes a text block with most of the config options, with the defaults changed to non-default values, so you can copy, paste, and edit it and understand the formatting of the config files more easily.

I'd put in an example like

- cache_ttl: "3600"
  to: 2620:fe::fe;2620:fe::9;
  type: forward_blocky
  blocklists: https://raw.githubusercontent.com/notracking/hosts-blocklists/master/hostnames.txt;https://raw.githubusercontent.com/lassekongo83/Frellwits-filter-lists/master/Frellwits-Swedish-Hosts-File.txt;
- cache_ttl: "3600"
  to: 2620:fe::fe;2620:fe::9;
  type: forward_ip

I did notice that there was a nice config example for the DHCP config, but it could also be expanded a tad.

DNS not always resolving

So I've run into an issue and I'm not sure if it's something I've not configured correctly or if it's a bug.
This particular time (I've had the same issue with other DNS names), when I try and go to www.reddit.com I get a "Hmm. We’re having trouble finding that site." in the browser and from the command line (ping) it says (ping: www.reddit.com: Name or service not known)

The log has entries like this
{"level":"info","ts":1702770267.1026013,"logger":"role.dns","msg":"DNS Query","instance":"dns01","version":"0.7.0-a864f302","runtime":6,"client":"192.168.53.200","response":"NOERROR","queryNames":["www.reddit.com."],"queryTypes":["AAAA"],"answerRecords":["reddit.map.fastly.net."],"answerTypes":["CNAME"]}

So it's getting the request and knows that it's a CNAME, but doesn't resolve it any further. If I manually ping the CNAME target "reddit.map.fastly.net", I get a response, and from the logs it finds the A records

{"level":"info","ts":1702770297.6442485,"logger":"role.dns","msg":"DNS Query","instance":"dns01","version":"0.7.0-a864f302","runtime":10,"client":"192.168.53.200","response":"NOERROR","queryNames":["reddit.map.fastly.net."],"queryTypes":["A"],"answerRecords":["151.101.1.140","151.101.65.140","151.101.129.140","151.101.193.140"],"answerTypes":["A","A","A","A"]}

After that, if I try and ping www.reddit.com, it resolves correctly and I see this in the logs:
{"level":"info","ts":1702770299.9097157,"logger":"role.dns","msg":"DNS Query","instance":"dns01","version":"0.7.0-a864f302","runtime":4,"client":"192.168.53.200","response":"NOERROR","queryNames":["www.reddit.com."],"queryTypes":["A"],"answerRecords":["reddit.map.fastly.net.","151.101.1.140","151.101.65.140","151.101.129.140","151.101.193.140"],"answerTypes":["CNAME","A","A","A","A"]}

Any idea if this is a bug or do I have some misconfiguration?

DNS resolving A records from upstream servers

Just wondering if there was a way of doing something similar to AGH in that you can use an "A" or "AAAA" as a special value in DNS rewrite to retrieve an address from the upstream server rather than using the local DNS value?

DHCP not working on Windows

The DHCP on Windows does not work for me.

Every other device gets a valid lease, but all three of my Windows devices do not.

I have tried everything:

new network driver install
restart network adapter
troubleshoot helper
restart pc
try commands : "netsh winsock reset" and "netsh int ip reset"

I also tried manually acquiring a new IP address with "ipconfig /release" and "ipconfig /renew" from the Windows command line,

but every time I tried that I got this error (unfortunately not useful): "An error occurred while renewing interface Ethernet : The data is invalid."

I have installed gravity with docker-compose:

version: "3.8"

services:
  gravity:
    container_name: gravity
    hostname: gravity
    image: ghcr.io/beryju/gravity:stable
    restart: unless-stopped
    volumes:
      - gravity:/data
    network_mode: host

    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

volumes:
  gravity:
    external: true

Is this problem known and are there workarounds?

DHCP Relay does not appear to work

It doesn't appear that Gravity works with DHCP relays.

gravity_1  | {"level":"info","ts":1692744765.2445304,"logger":"role.dhcp","msg":"DHCP packet","app":"gravity.beryju.io","instance":"dns01","version":"0.6.10-6ae1a926","request":"aaa6048c-7187-4fbb-b9de-efcf40035c3d-0xa92485f2","deviceIdentifier":"00:50:56:ae:63:fb","opCode":"BootRequest","hopCount":1,"transactionID":"0xa92485f2","flagsToString":"Unicast","clientIPAddr":"0.0.0.0","yourIPAddr":"0.0.0.0","serverIPAddr":"0.0.0.0","gatewayIPAddr":"192.168.202.1","hostname":"","clientIdentifier":"01005056ae63fb","messageType":"DISCOVER","client":"192.168.200.1:68"}
gravity_1  | {"level":"info","ts":1692744765.2449088,"logger":"role.dhcp","msg":"no scope found","app":"gravity.beryju.io","instance":"dns01","version":"0.6.10-6ae1a926","request":"aaa6048c-7187-4fbb-b9de-efcf40035c3d-0xa92485f2"}

Here is my gravity configuration for this scope:

CNAME records acting unlike how I would expect

Hello! First of all, thanks for this project, it seems to be the closest solution yet to the problem I am trying to solve right now.

I have a question about CNAME records that very well could be my own fault, but I cannot figure out: how do I actually use them? I have a zone that Gravity is authoritative for, and have set up some A records that point to various hosts in my network. I would expect to be able to set CNAME records to reference various services, however the CNAMEs never resolve properly. nslookup returns no answer, same with dig. Only way I can get an answer from Gravity is by specifically requesting a CNAME using dig CNAME addr.ess.here, which isn't great since that means the name doesn't work at all in any of my applications.

Example:

Zone: sub.domain.com

(Type -> Name -> Value)
A -> host1 -> 192.168.1.1
CNAME -> gravity -> host1.sub.domain.com

nslookup for gravity.sub.domain.com should result in 192.168.1.1 (canonical name = host1.sub.domain.com)
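The expected behaviour for an A query can be sketched as follows: when the queried name only has a CNAME, the server follows it within the zone and returns the CNAME together with the target's A records. The `zone` map and `resolveA` function are hypothetical and only illustrate the lookup order, with a depth limit to guard against CNAME loops.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a simplified resource record.
type record struct {
	Type  string // "A" or "CNAME"
	Value string
}

// zone maps lower-case FQDNs to their records (hypothetical in-memory store).
type zone map[string][]record

const maxCNAMEDepth = 8

// resolveA answers an A query for name. If the name has no A record but has
// a CNAME, the CNAME is added to the answer and the target is looked up next.
func resolveA(z zone, name string) []record {
	var answers []record
	cur := strings.ToLower(name)
	for depth := 0; depth < maxCNAMEDepth; depth++ {
		next := ""
		foundA := false
		for _, r := range z[cur] {
			switch r.Type {
			case "A":
				answers = append(answers, r)
				foundA = true
			case "CNAME":
				next = strings.ToLower(r.Value)
			}
		}
		if foundA || next == "" {
			return answers
		}
		answers = append(answers, record{Type: "CNAME", Value: next})
		cur = next
	}
	return answers // depth limit reached, likely a CNAME loop
}

func main() {
	z := zone{
		"host1.sub.domain.com.":   {{Type: "A", Value: "192.168.1.1"}},
		"gravity.sub.domain.com.": {{Type: "CNAME", Value: "host1.sub.domain.com."}},
	}
	for _, r := range resolveA(z, "gravity.sub.domain.com.") {
		fmt.Println(r.Type, r.Value)
	}
}
```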

If more detail is needed please let me know

Reserved IP allocations do not always persist

I have not yet been able to identify steps to reproduce, but it has occurred more than once on different scopes.

A device with an identifier allocated to a reserved IP can be given a different IP from within the scope's allocation range. When this happens, the lease previously marked as "Reserved" disappears from the list.

The workaround is to edit the lease to the desired IP, mark it as Reserved, and then renew the lease on the device; it is then issued the reserved IP.

Working on latest release, docker via docker-compose. 3 nodes.

Happy to help with steps to troubleshoot.

Wrong scope used with unbound and multiple interfaces

The wrong scope may be used when the host has multiple interfaces and the container is set to unbound (INSTANCE_IP: 0.0.0.0 and INSTANCE_LISTEN: 0.0.0.0).

This is because the Instance.IP is used to determine the scope rather than using the IP of the interface that received the request (see the following patch). This bug may be present in other functions, but I did not review any others.

PATCH

--- scopes.go.bug       2024-01-30 05:49:52.000000000 -0500
+++ scopes.go   2024-01-30 18:59:15.000000000 -0500
@@ -99,6 +99,21 @@
        // To prioritise requests from a DHCP relay being matched correctly, give their subnet
        // match a 1 bit more priority
        const dhcpRelayBias = 1
+       // Use the instance ip unless the interface is not bound
+       ip := extconfig.Get().Instance.IP
+       if req.oob != nil {
+               if ief, err := net.InterfaceByIndex(req.oob.IfIndex); err == nil {
+                       if addrs, err := ief.Addrs(); err == nil {
+                               for _, addr := range addrs {
+                                       if ipv4Addr := addr.(*net.IPNet).IP.To4(); ipv4Addr != nil {
+                                               ip = ipv4Addr.String()
+                                               req.log.Debug("Unbound interface found", zap.String("ifname",  ief.Name), zap.String("ip",  ip))
+                                               break
+                                       }
+                               }
+                       }
+               }
+       }
        for _, scope := range r.scopes {
                // Check based on gateway IP (highest priority)
                gatewayMatchBits := scope.match(req.GatewayIPAddr)
@@ -106,12 +121,12 @@
                        req.log.Debug("selected scope based on cidr match (gateway IP)", zap.String("scope", scope.Name))
                        match = scope
                        longestBits = gatewayMatchBits + dhcpRelayBias
                // Handle local broadcast, check with the instance's listening IP
                // Only consider local scopes if we don't have a match already
-               localMatchBits := scope.match(net.ParseIP(extconfig.Get().Instance.IP))
+               localMatchBits := scope.match(net.ParseIP(ip))
                if localMatchBits > -1 && localMatchBits > longestBits {
-                       req.log.Debug("selected scope based on cidr match (instance IP)", zap.String("scope", scope.Name))
+                       req.log.Debug("selected scope based on cidr match (instance/interface IP)", zap.String("scope", scope.Name))
                        match = scope
                        longestBits = localMatchBits
                }
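
The selection logic the patch touches can be illustrated in isolation. The sketch below is a simplified, self-contained approximation, not the code from scopes.go: it picks the scope with the longest matching prefix, gives a relay (gateway) match one extra bit of priority, and uses the IP of the receiving interface for the local match. The `scope` type, `selectScope`, and the string-based parameters are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// scope is a simplified DHCP scope: a name and the subnet it serves.
type scope struct {
	Name string
	CIDR netip.Prefix
}

func newScope(name, cidr string) scope {
	return scope{Name: name, CIDR: netip.MustParsePrefix(cidr)}
}

// parseOptional parses an address and treats "", invalid input, and
// 0.0.0.0 as "not set".
func parseOptional(s string) (netip.Addr, bool) {
	ip, err := netip.ParseAddr(s)
	if err != nil || ip.IsUnspecified() {
		return netip.Addr{}, false
	}
	return ip, true
}

// selectScope returns the name of the best matching scope. giaddr is the
// relay's gateway address from the request; ifaceIP is the address of the
// interface that received the packet.
func selectScope(scopes []scope, giaddr, ifaceIP string) (string, bool) {
	const relayBias = 1 // relay matches win over equally long local matches
	gw, hasGW := parseOptional(giaddr)
	local, hasLocal := parseOptional(ifaceIP)
	best, name := -1, ""
	for _, s := range scopes {
		if hasGW && s.CIDR.Contains(gw) {
			if bits := s.CIDR.Bits() + relayBias; bits > best {
				best, name = bits, s.Name
			}
		}
		if hasLocal && s.CIDR.Contains(local) {
			if bits := s.CIDR.Bits(); bits > best {
				best, name = bits, s.Name
			}
		}
	}
	return name, best >= 0
}

func main() {
	scopes := []scope{newScope("vlan20", "10.0.20.0/24"), newScope("vlan40", "10.0.40.0/24")}
	fmt.Println(selectScope(scopes, "10.0.40.1", "10.0.20.5")) // relayed request
	fmt.Println(selectScope(scopes, "0.0.0.0", "10.0.20.5"))   // local broadcast
}
```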

DHCP Reservations not honored

This issue appears to be the same as #871 and #872, but I wanted to provide my own detailed write up as a new issue.

I was running 2 Windows server VMs with DHCP and DNS services only. I'm a network security engineer, so my lab was mainly focused on my enterprise firewall cluster with multiple protected VLANs. While I could turn on DHCP server on the firewall, it isn't really meant for managing DHCP scopes, so it is configured with DHCP Relay (ip helper) to forward requests to the 2 Windows servers. This was all working fine, but I didn't like having to run 2 VMs just to provide these services. Dynamic DNS was also problematic because of the weird IoT device hostnames, so I had to turn that off. It just wasn't a good fit for a homelab, but it was functional.

I recently found a reddit post about Gravity and it seemed to be exactly what I was looking for. I haven't learned much about Kubernetes yet, but I have become familiarized with docker enough to get it to do what I wanted. I initially started out with 1 docker container running gravity with the default configuration from the docs page. Initial DNS testing was perfect. I then configured a DHCP scope for my guest VLAN which was empty at the time. This worked well also. I kept following the docs to export my Windows config, figured out how to get the file copied into docker, then ran the export. I had a few issues with some of the IoT reservation names being blank or having special characters, but I was able to clean those up in the json file and the import was successful. Everything was great up until this point.

Although all my reservations imported successfully, as I rebooted devices to pull a new IP, they did not honor the reservation and pulled a new different IP from Gravity. To make it more confusing, it was still listed as a reservation with the same hostname, but the IP was wrong. I would correct the reservation IP, reboot the device again and then the IP would stick. So I continued with this process on my entire network and had everything correct.

I woke up the next morning to find all my devices having wrong IP addresses again. Some of them had the wrong IP but were given an address from the correct scope, but some were given an IP on the VLAN gravity was on. Some are also given the same IP. This is a huge problem for me because every device on my network has a reservation. My IoT VLAN specifically needs reserved IPs so Home Assistant can turn each device on or off.

I probably should have stopped here, but I really like everything about Gravity, so I fixed everything again. I also wanted to figure out how to give the Gravity container its own IP address; I got this working by creating a macvlan network in docker and assigning a static IP to the container. Since this worked, I also deployed 2 more Gravity containers and clustered them.

The DHCP reservation issue still persists. Because it was a problem before the cluster, I don't think the problem is cluster related. I have my DHCP relays pointing to all 3 gravity container IP addresses and devices always grab an IP. When I manually fix the reservation and have them pull an IP again, it always works. It's just that they randomly grab a new IP address some time later and everything breaks again. I have DHCP turned on for the local network gravity is on and I read that could be a problem with the DHCP master, but I can easily disable DHCP for that VLAN as it is just a fall back for servers that don't use a static IP by default. Although the scope is active, there are no active DHCP clients. If anyone thinks there is a problem with this particular configuration, please let me know and I can change it. I am willing to troubleshoot.

I really want to use Gravity, but at this point I need to switch back to my Windows servers. I need DHCP reservations to be reliable and that is not the case right now. I plan to leave the servers up for my Guest VLAN only. It's the only VLAN where I don't need reservations.

Any help is appreciated. This is a great product with some really awesome features. I am also impressed that there is a terraform provider. I'm planning on adding all of my reservations via terraform so I can more easily fix them, but I'm hoping the product becomes more usable for me.

HTTP request to blocklist URL defaults to AAAA record address, no fallback to A record

On d6148a7

{"level":"info","ts":1703031274.8847294,"logger":"role.dns","msg":"starting blocky async","instance":"gravity-0","version":"0.8.0-d6148a7f","zone":".","handler":"forward_blocky"}
[2023-12-20 00:14:34]  WARN list_cache: Can't download file: Get "https://big.oisd.nl/domainswild": dial tcp [2001:41d0:701:1100::5b10]:443: connect: network is unreachable attempt=1/3 link=https://big.oisd.nl/domainswild
[2023-12-20 00:14:34]  WARN list_cache: Can't download file: Get "https://big.oisd.nl/domainswild": dial tcp [2001:41d0:701:1100::5b10]:443: connect: network is unreachable attempt=2/3 link=https://big.oisd.nl/domainswild

There is no IPv6 connectivity on my instance and it does not seem to attempt the A record address.
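The fallback being asked for can be sketched as "try every resolved address in order and use the first that connects". The code below is an illustration only: `dialWithFallback` and `dialFunc` are hypothetical names, the connection attempt is injected so the example runs without a network, and 192.0.2.10 is a documentation placeholder rather than the real A record for the host in the logs.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/netip"
)

// errUnreachable simulates the "network is unreachable" failure from the log.
var errUnreachable = errors.New("network is unreachable")

// dialFunc abstracts the connection attempt. A real caller could wrap
// net.DialTimeout here.
type dialFunc func(network, address string) error

// dialWithFallback tries each resolved address in order and returns the
// first one that connects. Errors from failed attempts are joined.
func dialWithFallback(addrs []string, port string, dial dialFunc) (string, error) {
	if len(addrs) == 0 {
		return "", errors.New("no addresses to try")
	}
	var errs []error
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		network := "tcp4"
		if ip.Is6() {
			network = "tcp6"
		}
		target := net.JoinHostPort(a, port)
		if err := dial(network, target); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", target, err))
			continue
		}
		return a, nil
	}
	return "", errors.Join(errs...)
}

func main() {
	// Simulated host without IPv6 connectivity: every tcp6 dial fails.
	v4only := func(network, address string) error {
		if network == "tcp6" {
			return errUnreachable
		}
		return nil
	}
	addrs := []string{"2001:41d0:701:1100::5b10", "192.0.2.10"}
	fmt.Println(dialWithFallback(addrs, "443", v4only))
}
```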

DHCP Server not responding

The DHCP server does not look to be responding to any DHCP requests on any member.

Netstat -a does not show anything listening on 67 or 68

alsenior@gravityv2-1:~/gravity$ sudo netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 gravityv2-1.gravit:2380 0.0.0.0:*               LISTEN
tcp        0      0 localhost:2379          0.0.0.0:*               LISTEN
tcp        0      0 gravityv2-1.grav:domain 0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN
tcp        0      0 gravityv2-1.gravit:8009 0.0.0.0:*               LISTEN
tcp        0      0 gravityv2-1.gravit:8008 0.0.0.0:*               LISTEN
tcp        0      0 localhost:38172         localhost:2379          ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.30:35790     ESTABLISHED
tcp        0      0 localhost:46090         localhost:2379          ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.30:35788     ESTABLISHED
tcp        0      0 gravityv2-1.gravi:53542 192.168.22.30:2380      ESTABLISHED
tcp        0      0 localhost:2379          localhost:38170         ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.30:44560     ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.20:48902     ESTABLISHED
tcp        0      0 localhost:2379          localhost:38172         ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.20:48912     ESTABLISHED
tcp        0      0 gravityv2-1.gravi:39140 192.168.22.20:2380      ESTABLISHED
tcp        0      0 gravityv2-1.gravi:53504 192.168.22.30:2380      ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.30:52248     ESTABLISHED
tcp        0    172 gravityv2-1.gravity:ssh 192.168.1.12:18980      ESTABLISHED
tcp        0      0 gravityv2-1.gravi:39164 192.168.22.20:2380      ESTABLISHED
tcp        0      0 localhost:2379          localhost:46090         ESTABLISHED
tcp        0      0 gravityv2-1.gravit:2380 192.168.22.20:48906     ESTABLISHED
tcp        0      0 gravityv2-1.gravi:53514 192.168.22.30:2380      ESTABLISHED
tcp        0      0 localhost:38170         localhost:2379          ESTABLISHED
tcp        0      0 gravityv2-1.gravi:39134 192.168.22.20:2380      ESTABLISHED
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
udp        0      0 gravityv2-1.grav:domain 0.0.0.0:*

I can see the DHCP request come in on the primary node, but only a Port unreachable is sent in response to the relay.

tcpdump: https://pastebin.com/uvZus89m
gravity config: https://pastebin.com/sFfCJHm1

I have tried upgrading Gravity -> No fix
Rebooting all nodes -> No fix

Support HA DHCP with "leader"

Currently all gravity instances that have the DHCP role configured will listen and reply to DHCP requests. This is not ideal when using multiple gravity instances in a single L2 network, where ideally a single instance should be a "leader" using the etcd keep-alive feature, and then another node can take over if needed.

The somewhat difficult part is figuring out which nodes are in the same L2 network. It's not clear whether we can rely on finding the overlap based on INSTANCE_IP, the local interface and netmask, and what other nodes report; this is made more difficult since we're running in a container.
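A rough sketch of the subnet-overlap idea, under stated assumptions: instances that report an address in the same masked subnet are grouped together, and one leader is chosen per group. Picking the lowest identifier is only a deterministic stand-in for a real etcd-based election, and sharing a subnet does not prove two nodes are on the same L2 segment. The `instance` type and `leaders` function are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// instance is what each node would publish about itself.
type instance struct {
	ID   string // e.g. the instance identifier
	CIDR string // instance IP with its netmask, e.g. "192.168.22.10/24"
}

// leaders groups instances by their masked subnet and returns one leader ID
// per subnet. The lowest ID wins, as a placeholder for a real election.
func leaders(instances []instance) (map[string]string, error) {
	out := make(map[string]string)
	for _, in := range instances {
		p, err := netip.ParsePrefix(in.CIDR)
		if err != nil {
			return nil, fmt.Errorf("instance %s: %w", in.ID, err)
		}
		segment := p.Masked().String()
		if cur, ok := out[segment]; !ok || in.ID < cur {
			out[segment] = in.ID
		}
	}
	return out, nil
}

func main() {
	l, err := leaders([]instance{
		{ID: "gravity2", CIDR: "192.168.22.20/24"},
		{ID: "gravity1", CIDR: "192.168.22.10/24"},
		{ID: "gravity3", CIDR: "10.0.0.5/24"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(l)
}
```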

DHCP. TTL of leases

v0.6.10

I set the TTL to 1440 seconds. The client receives a lease with the following parameters:

And in a second I have on the interface

Sometimes the lease period is 30 years, but this is very rare.

Edit DHCP Reservation doesn't

Using 0.6.15-30eaa09f
I created a DHCP reservation using the F0-2F-74-21-58-22 format, which the UI seemed to accept (it recognized the vendor and so on). When that NIC requested a DHCP lease, however, it was identified as f0:2f:74:21:58:22 and was not given its reservation. When I try to edit the existing reservation, the window that opens is titled 'Update Zone' and all of its fields are blank, so I have no way to edit existing records.
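For reference, the two notations describe the same address, and Go's standard library can bring them to one canonical form. This is a minimal sketch, not Gravity's actual code; `normalizeMAC` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net"
)

// normalizeMAC parses a MAC address in any notation accepted by
// net.ParseMAC and returns it in lower-case, colon-separated form.
func normalizeMAC(s string) (string, error) {
	hw, err := net.ParseMAC(s)
	if err != nil {
		return "", err
	}
	return hw.String(), nil
}

func main() {
	a, _ := normalizeMAC("F0-2F-74-21-58-22")
	b, _ := normalizeMAC("f0:2f:74:21:58:22")
	fmt.Println(a, b, a == b) // f0:2f:74:21:58:22 f0:2f:74:21:58:22 true
}
```

Normalizing the identifier both when a reservation is saved and when a request is matched would make the notation the user typed irrelevant.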

Running in k8s?

I want to run this in k8s, as that's all the infrastructure I have at home. I'd need to load-balance the correct ports, but I'm not sure which ones they are. Are they static or dynamic? Does the image really require host networking? Thank you

Blocky version / wildcard lists

The integrated Blocky looks to be on commit 6e69d46c6abc from January: 0xERR0R/blocky@6e69d46.

The mainline now supports wildcard lists, as of this PR: 0xERR0R/blocky#1233

It would be great to have this feature. OISD will no longer support domains and hosts-style lists in 2024. I tried regexp, but the memory usage made it untenable.

StatefulSet clustering in Kube

I have been trying to create a Gravity cluster with a 3 replica StatefulSet, but the methods documented for joining the cluster are not possible in this configuration. Various methods I have tried to manually create the cluster have all failed. I was wondering how difficult it would be to have some environment variable, for instance, that could be set that would bootstrap the cluster.

One thing I have done is set the INSTANCE_IDENTIFIER for each pod dynamically to its DNS name from a headless service (manifest below), which works. I then tried clustering manually via etcdctl, but I haven't been able to get that part working.

gravity-statefulset.yml: https://pastebin.com/raw/2AgK5xGV

Cannot run docker compose on raspberry pi

You mention on reddit this should work with a pi but when I run the docker compose stack, as is, I get an error that there is "no matching manifest for linux/arm/v7 in the manifest list entries."

Lease update may result in the client in one scope but assigned an address from another scope

When a client with an existing lease changes networks resulting in a different scope that should apply, the lease will remain in the existing scope, but be assigned an address from the new scope.

This appears to be because the code in leases.go does not update the key for the scope (see the following patch). This bug may relate to existing issues, but I did not review any others.

PATCH

--- leases.go.bug       2024-01-27 17:54:29.312552703 -0500
+++ leases.go   2024-01-27 17:13:13.000000000 -0500
@@ -53,6 +53,7 @@
        if expectedScope != nil && lease.scope != expectedScope {
                // We have a specific scope to handle this request but it doesn't match the lease
                lease.scope = expectedScope
+               lease.ScopeKey = expectedScope.Name
                lease.setLeaseIP(req)
                lease.log.Info("Re-assigning address for lease due to changed request scope", zap.String("newIP", lease.Address))
                go func() {

Add support for wildcard DNS records

Due to the current way of looking up keys from etcd, first- and second-level wildcard records are not supported.

Things that should be supported:

Record "*" in zone foo.bar for query baz.foo.bar
Record "*.foo" in zone bar for query baz.foo.bar
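One way to support both cases is to try a short list of candidate record names per query, from most to least specific. The sketch below is only an illustration under that assumption: `relativeName` and `wildcardCandidates` are hypothetical helpers, the actual etcd key layout is not shown, and the closest-encloser rules of RFC 4592 are not implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// relativeName strips the zone suffix from a query name,
// e.g. ("baz.foo.bar", "foo.bar") -> "baz".
func relativeName(query, zone string) string {
	return strings.TrimSuffix(query, "."+zone)
}

// wildcardCandidates returns the record names to try, in order:
// the exact name first, then wildcards from most to least specific.
func wildcardCandidates(relative string) []string {
	labels := strings.Split(relative, ".")
	out := []string{relative}
	for i := 1; i <= len(labels); i++ {
		// Replace the first i labels with a single "*".
		rest := labels[i:]
		out = append(out, strings.Join(append([]string{"*"}, rest...), "."))
	}
	return out
}

func main() {
	// Zone foo.bar, query baz.foo.bar
	fmt.Println(wildcardCandidates(relativeName("baz.foo.bar", "foo.bar"))) // [baz *]
	// Zone bar, query baz.foo.bar
	fmt.Println(wildcardCandidates(relativeName("baz.foo.bar", "bar"))) // [baz.foo *.foo *]
}
```

Each candidate would then be looked up as an exact key, so the number of lookups grows only with the number of labels in the query.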

Apparent bug in Overview query graph

It appears that sometimes there is an additional data point added in the graph which causes a line to be added back to the bottom left corner of the graph. See the memory and etcd lines below.

Bind to more than 1 ipv4 address

Is there or will there be a way to bind to more than one address? Example is binding to the machine’s static ip as well as a Tailscale ip.

Blocklists do not appear to load after blocky bump

On latest commit 5e76c90 blocklists do not appear to load.

Log: https://pastebin.com/raw/KfWJfzGh

Test config used to produce log: https://pastebin.com/raw/ZV4JVsEr

Ability to remove a node

There doesn't appear to be a way (in the GUI at least) to remove an old node from the cluster.

How to change name server IP in DHCP scope?

How can I change the name server IP for a DHCP scope, use a different one, or use multiple IPs?
I tried this in DHCP options:

- tagName: router
  value: 10.78.0.1
- tagName: name_server
  value: 1.1.1.1

but it looks like it's ignored. Also, is there a way to add other DHCP options, such as the UniFi controller address? Or only these? https://gravity.beryju.io/docs/dhcp/scopes/

thanks

Please give us a dark mode..

The yellow face, it burns us!
It's 2023 and most "dark readers" browser plugins still butcher websites.

Is there any way to get some sort of dark mode for this most excellent software?

slices.Delete

Hello, while researching for this Go proposal I noticed this code:

if role == "backup" {
	slices.Delete(roles, idx, idx+1)
}

which probably doesn't behave as intended because the returned value of slices.Delete is ignored.

Deleting elements inside the range roles loop is tricky to reason about, so I would suggest using slices.DeleteFunc.
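A minimal sketch of the suggested replacement, assuming `roles` is a `[]string` (the surrounding code is not shown in this report, and `removeRole` is a hypothetical helper name). It needs Go 1.21 or newer for the standard `slices` package.

```go
package main

import (
	"fmt"
	"slices"
)

// removeRole returns roles without any entry equal to name.
// slices.DeleteFunc returns the shortened slice, so the result must be kept.
func removeRole(roles []string, name string) []string {
	return slices.DeleteFunc(roles, func(r string) bool {
		return r == name
	})
}

func main() {
	roles := []string{"dns", "dhcp", "backup", "api"}
	roles = removeRole(roles, "backup")
	fmt.Println(roles) // [dns dhcp api]
}
```

This avoids both problems at once: there is no index arithmetic inside a range loop, and the returned slice is assigned back.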

Sort discovered devices by IP

The 'Discovered Devices' section would be a lot easier to use if the results were sorted by IP.

DNS blocklists

If the "to:" field is the same for forward_ip and forward_blocky, forward_blocky seems to be skipped.

When I tested with a different DNS server for each handler, forward_blocky started working again.

This is the default configuration for the "root zone", with the name field set to match everything (.).

zone configuration preset
Forwarder (Blocky)

Etc...

Not working

blocklists: https://adaway.org/hosts.txt;https://dbl.oisd.nl/
cache_ttl: "3600"
to: 1.1.1.1:53;1.0.0.1:53
type: forward_blocky
cache_ttl: "3600"
to: 1.1.1.1:53;1.0.0.1:53
type: forward_ip

Working

blocklists: https://adaway.org/hosts.txt;https://dbl.oisd.nl/
cache_ttl: "3600"
to: 1.1.1.1
type: forward_blocky
cache_ttl: "3600"
to: 1.0.0.1
type: forward_ip

DHCP scope option 6 (`name_server`) breaks when using `value64` attribute

When applying the name_server option as shown in the example below, it appears to break the data served to DHCP clients.

- tagName: name_server
  value64:
    - "MTAuNTEuMTAuMjA="
    - "MTAuNTIuMTAuMjA="

For example, dhclient successfully retrieves a lease, but warns: domain-name-servers 2 extra bytes at end of array

The result is that dhclient appears to parse this as 49.48.46.53,49.46.49.48,46.50.48.49,48.46.53.50,46.49.48.46 rather than the configured base64 representation of [10.51.10.20,10.52.10.20].

The following configuration works fine, but means I can only pass a single DNS server to the client (unless I'm unaware of some syntax which makes this possible?).

- tagName: name_server
  value: 10.51.10.20

Note: the issue appears to be present regardless of the number of servers specified (tested with 1-3). The only difference is the warning from dhclient notes a differing number of extra bytes at the end of the array.

dns over tls ?

Great work!
Will we be able to use Gravity for DNS over TLS resolution?
thanks

Create a node without etcd

Hi,

I don't know if this is possible, but if I create a Gravity cluster inside a Kubernetes cluster where etcd port 2380 is already occupied on each physical node, Gravity doesn't start.

My idea is to create the Gravity cluster inside the Kubernetes cluster without exposing the etcd ports on each node, exposing them only internally. That way I can create the nodes without problems by bootstrapping just the following services:

gravity1: api;etcd;backup;monitoring;debug;tsdb(ingress)
gravity2:etcd
gravity3:etcd

creating three nodes with the etcd service to form the cluster.

Now I want to create a fourth node that has the following services:

gravity4: dns;dhcp;discovery

and run it with hostNetwork so that it exposes all of its services on the physical node.

But when I assign those services to it, it never finishes booting. Is this possible? If not, is there any solution for the etcd port already being occupied?

DHCP server hands out same IP when multiple clients connect simultaneously

I have seen this happen a few times now, on more than one scope.

It appears to happen if more than one device is, for instance, restarted or added to the network at the same time.

The devices are listed separately under the scope, with distinct identifiers/ MAC addresses, different host names, but the same IPs.

Have seen this with up to 3 devices being given the same IP.

Workaround is to delete the 2nd, 3rd etc leases and prompt those devices to renew their DHCP lease. Then they are allocated a unique IP each.

I have three nodes set up.

Latest releases, running in docker using docker-compose.

Happy to supply any further info to help reproduce.

DNS bind to IPv6

I noticed that there is no way to bind to IPv6. Can that be specified in the docker-compose file, or is it not supported at the moment?

Better blocklist management

The DNS blocking support is currently very rudimentary. As far as I can tell there is no (easy) way to:

Add individual domains to a custom blacklist
Whitelist domains
Determine which list is blocking a particular request

It's also rather difficult to manage lists, all being in a single line in the configuration. My suggestion would be to rework the configuration as such:

Blocklists are an array rather than a single line separated by semicolons
Two more arrays, perhaps domains_allowed and domains_blocked, are available for adding individual domains

It appears blocky already supports this, so it should (mostly) be a matter of just passing these parameters on appropriately.
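The proposed shape could be modelled roughly as follows. This is a sketch only: the struct, its field names (taken from the suggestion above) and the `fromLegacy` helper are hypothetical and do not reflect Gravity's or blocky's actual configuration types.

```go
package main

import (
	"fmt"
	"strings"
)

// blockConfig is a hypothetical shape for the proposed configuration.
type blockConfig struct {
	Blocklists     []string // list URLs, one per entry
	DomainsAllowed []string // individual domains that are never blocked
	DomainsBlocked []string // individual domains that are always blocked
}

// fromLegacy converts the current semicolon-separated string into the
// array form, dropping empty entries.
func fromLegacy(blocklists string) blockConfig {
	var cfg blockConfig
	for _, u := range strings.Split(blocklists, ";") {
		if u = strings.TrimSpace(u); u != "" {
			cfg.Blocklists = append(cfg.Blocklists, u)
		}
	}
	return cfg
}

func main() {
	cfg := fromLegacy("https://adaway.org/hosts.txt;https://dbl.oisd.nl/")
	cfg.DomainsAllowed = append(cfg.DomainsAllowed, "example.org")
	fmt.Printf("%+v\n", cfg)
}
```

A converter like this would let existing single-line configurations keep working while the new array fields are introduced.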