
tunneldigger's Introduction

Tunneldigger

L2TPv3 VPN tunneling solution

About

Tunneldigger is one of the projects of wlan slovenija open wireless network. It is a simple VPN tunneling solution based on the Linux kernel support for L2TPv3 tunnels over UDP.

Tunneldigger consists of a client and a server portion.

The client is written in C for minimal binary size and optimized to run on embedded devices such as wireless routers running OpenWrt.

The server portion, referred to as the broker, is written in Python.

Installation and Use

Information on setting up and using Tunneldigger can be found in the documentation:

https://tunneldigger.readthedocs.org/

Source Code and Issue Tracker

Development happens on GitHub and issues can be filed in the Issue tracker.

License

Tunneldigger is licensed under AGPLv3.

Contributions

We welcome code and documentation contributions to Tunneldigger in the form of Pull Requests on GitHub where they can be reviewed and discussed by the community. We encourage everyone to check out any pending pull requests and offer comments or ideas as well.

Tunneldigger is developed by a community of developers from many different backgrounds.

You can visualize all code contributions using GitHub Insights.

tunneldigger's People

Contributors

fungur69, holymoly, kaechele, kostko, lynxis, max-b, mitar, niccokunzmann, papazoga, pmelange, ralfjung, robwei, rohammer, valentt

tunneldigger's Issues

The client can get stuck in a high-frequency retry loop despite working brokers

Sometimes a client seems to be stuck in a high-frequency retry loop, establishing a new connection every 2-5s and immediately abandoning it. #143 is resolved, which means the servers do not have heavy load from such a loop any more, but the question remains what is going on with those clients.

Unfortunately, so far I have not been able to acquire a log from one of the affected nodes. @kaechele you said you also experienced this problem; were/are you able to get a logfile from the problematic node?

Frequent reconnection of clients

I have the tunneldigger-broker running on OpenWrt (x86-64 virtual machine) as a test of the latest commit to tunneldigger-broker, and so far we have 2 routers connecting to the broker. The first router is doing fine, with a stable connection. The second, unfortunately, is reconnecting constantly.

root@b-bbb-bvpn:~# logread | grep Closing
Thu Aug 10 19:37:36 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 101 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 18 seconds (reason=0x5)
Thu Aug 10 19:38:48 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 102 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 12 seconds (reason=0x5)
Thu Aug 10 19:40:09 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 103 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 19 seconds (reason=0x5)
Thu Aug 10 19:41:21 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 104 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 8 seconds (reason=0x5)
Thu Aug 10 19:42:39 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 105 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 15 seconds (reason=0x5)
Thu Aug 10 19:44:03 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 106 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 22 seconds (reason=0x5)
Thu Aug 10 19:45:22 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 107 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 20 seconds (reason=0x5)
Thu Aug 10 19:46:30 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 108 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 6 seconds (reason=0x5)
Thu Aug 10 19:47:58 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 109 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 25 seconds (reason=0x5)
Thu Aug 10 19:49:08 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 110 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 8 seconds (reason=0x5)
Thu Aug 10 19:50:32 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 111 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 21 seconds (reason=0x5)
Thu Aug 10 19:51:50 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 112 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 17 seconds (reason=0x5)
Thu Aug 10 19:53:15 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 113 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 23 seconds (reason=0x5)
Thu Aug 10 19:54:35 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 114 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 20 seconds (reason=0x5)
Thu Aug 10 19:55:59 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 115 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 23 seconds (reason=0x5)
Thu Aug 10 19:57:25 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 116 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 25 seconds (reason=0x5)
Thu Aug 10 19:58:52 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 117 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 28 seconds (reason=0x5)
Thu Aug 10 20:00:23 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 118 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 28 seconds (reason=0x5)
Thu Aug 10 20:01:33 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 119 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 11 seconds (reason=0x5)
Thu Aug 10 20:02:51 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 120 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 26 seconds (reason=0x5)
Thu Aug 10 20:03:57 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 121 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 6 seconds (reason=0x5)
Thu Aug 10 20:05:11 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 122 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 17 seconds (reason=0x5)
Thu Aug 10 20:06:33 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 123 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 19 seconds (reason=0x5)
Thu Aug 10 20:07:59 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 124 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 24 seconds (reason=0x5)
Thu Aug 10 20:09:22 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 125 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 21 seconds (reason=0x5)
Thu Aug 10 20:10:34 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 126 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 12 seconds (reason=0x5)
Thu Aug 10 20:11:51 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 127 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 15 seconds (reason=0x5)

And here is just one failed connection with all associated log entries:

Thu Aug 10 20:11:35 2023 daemon.err python[3148]: [INFO/tunneldigger.broker] Creating tunnel (b6:e5:b7:75:f8:b4:6d:3d:9f:41) with id 127.
Thu Aug 10 20:11:35 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Set tunnel 127 MTU to 1446.
Thu Aug 10 20:11:35 2023 daemon.err python[3148]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
Thu Aug 10 20:11:35 2023 daemon.notice netifd: bridge 'br-digger1446' link is up
Thu Aug 10 20:11:35 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity
Thu Aug 10 20:11:35 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/send_redirects
Thu Aug 10 20:11:35 2023 daemon.info olsrd[2611]: Writing '0' (was 0) to /proc/sys/net/ipv4/conf/br-digger1446/rp_filter
Thu Aug 10 20:11:35 2023 daemon.info olsrd[2611]: Adding interface br-digger1446
Thu Aug 10 20:11:36 2023 kern.info kernel: [ 9945.787796] br-digger1446: port 1(l2tp127-127) entered blocking state
Thu Aug 10 20:11:36 2023 kern.info kernel: [ 9945.787799] br-digger1446: port 1(l2tp127-127) entered disabled state
Thu Aug 10 20:11:36 2023 kern.info kernel: [ 9945.787834] device l2tp127-127 entered promiscuous mode
Thu Aug 10 20:11:36 2023 kern.info kernel: [ 9945.787863] br-digger1446: port 1(l2tp127-127) entered blocking state
Thu Aug 10 20:11:36 2023 kern.info kernel: [ 9945.787864] br-digger1446: port 1(l2tp127-127) entered forwarding state
Thu Aug 10 20:11:51 2023 daemon.err python[3148]: [INFO/tunneldigger.tunnel] Closing tunnel 127 (b6:e5:b7:75:f8:b4:6d:3d:9f:41) after 15 seconds (reason=0x5)
Thu Aug 10 20:11:51 2023 daemon.err python[3148]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Thu Aug 10 20:11:51 2023 kern.info kernel: [ 9960.798393] device l2tp127-127 left promiscuous mode
Thu Aug 10 20:11:51 2023 kern.info kernel: [ 9960.798428] br-digger1446: port 1(l2tp127-127) entered disabled state
Thu Aug 10 20:11:51 2023 daemon.notice netifd: bridge 'br-digger1446' link is down
Thu Aug 10 20:11:51 2023 daemon.notice netifd: Interface 'digger1446' has link connectivity loss
Thu Aug 10 20:11:51 2023 daemon.info olsrd[2611]: Removing interface br-digger1446

Unfortunately I do not have physical access to the failing router (belongs to @Noki). What can we do to get more information to help debug this?

This issue is what I was mentioning in #126 (comment) and @kaechele suggested opening a new issue.

tunneldigger-broker: connection fails with `Error: Invalid handle.`

Hi, I'm just trying to upstream the packages to openwrt/packages:
openwrt/packages#21308

It seems to work fine until I connect to the broker on localhost (I currently use just one router), and then the broker just throws the error "Error: Invalid handle.":

daemon.err python[13789]: [INFO/tunneldigger.tunnel] Set tunnel 103 MTU to 1280.
daemon.err python[13789]: [INFO/tunneldigger.hooks] Running hook 'session.up' via script '/usr/lib/tunneldigger-broker/hooks/setup'.
daemon.err python[13789]: [INFO/tunneldigger.limits] Setting downstream bandwidth limit to 1024 kbps on tunnel 103.
daemon.err python[13789]: Error: Invalid handle.

Any idea what the reason is or how to fix this? I cannot even find where the error gets printed in the code.

Failed to create tunnel while processing prepare request

After successfully establishing a single tunnel (sometimes none), I get these errors and hundreds of dead tunnel interfaces:

May 30 13:54:26 gateway01.hw.freifunk.net python2[975]: [INFO/tunneldigger.broker] Creating tunnel (30b5c2382c36) with id 129.
May 30 13:54:26 gateway01.hw.freifunk.net systemd-udevd[2118]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]: [ERROR/tunneldigger.broker] Unhandled exception while creating tunnel 129:
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]: [ERROR/tunneldigger.broker] Traceback (most recent call last):
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]:   File "/usr/lib64/python2.7/site-packages/tunneldigger_broker/broker.py", line 115, in create_tunnel
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]:     tunnel.setup_tunnel()
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]:   File "/usr/lib64/python2.7/site-packages/tunneldigger_broker/tunnel.py", line 156, in setup_tunnel
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]:     raise TunnelSetupFailed
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]: TunnelSetupFailed
May 30 13:54:26 gateway01.hw.freifunk.net python2[975]: [WARNING/tunneldigger.protocol] Failed to create tunnel (30b5c2382c36) while processing prepare request.

Understanding tunneldigger vs ip l2tp

I'm playing around with the ip l2tp tool, trying to understand how this stuff works. I thought I would do a simple test:

sudo ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 encap udp local 1.2.3.4 remote 5.6.7.8 udp_sport 5000 udp_dport 6000

And I see:

RTNETLINK answers: Cannot assign requested address

What does this RTNETLINK error mean? It seems like it shows up in a lot of cases (I sometimes see it when using tunneldigger too), but I don't have a good understanding of what it really /means/. In this case, I'm thinking my local and remote IPs might be unacceptable?

Looking at how tunneldigger creates a tunnel, it passes a socket into create_tunnel, and does not explicitly pass in any local and remote addresses and ports. I haven't dug in any deeper yet, but my hunch is that this socket already describes those addresses and ports. Is that right?
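
For what it's worth, here is a minimal Python sketch of that idea (not Tunneldigger's actual code; addresses and ports are placeholders): a UDP socket that has been bound and connected already carries the local and remote address/port, so nothing more needs to be passed when a tunnel is created over it. The RTNETLINK error above most likely just means that the local address you passed is not assigned to any interface on your machine.

import socket

# Sketch only: a bound and connected UDP socket already describes the 4-tuple.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 5000))       # local address/port (placeholder values)
sock.connect(("127.0.0.1", 6000))    # remote address/port (placeholder values)
print(sock.getsockname(), sock.getpeername())
# Creating a tunnel over this socket needs no explicit local/remote arguments.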

If there's a better place to post questions like this, please let me know! Thanks.

Newer Kernels log error "recv short packet" for every broker packet

This is due to the fact that the control packets we use to communicate with the broker are shorter than 14 bytes (the maximum header size of an L2TP packet).
Recently (I upgraded from 5.9.16 to 5.10.11; I haven't had time to pinpoint exactly when this changed) the kernel started throwing a warning when such a short packet is received, leading to lots of spam in dmesg: https://github.com/torvalds/linux/blob/master/net/l2tp/l2tp_core.c#L811 and torvalds/linux@5ee759c

Functionality is not impacted because the error case for this scenario is to forward the packet to user space where the broker is able to pick it up and process it as usual.
The only thing that changed is that the warning is now visible in dmesg.

My first idea for a fix would be to pad our control packets to 14 bytes. Thoughts?
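
To make the padding idea concrete, a rough Python sketch (the packet contents are only illustrative, not Tunneldigger's actual wire format):

MIN_CONTROL_PACKET_SIZE = 14  # maximum L2TP header size, per the description above

def pad_control_packet(packet: bytes) -> bytes:
    # Append zero bytes until the packet is at least 14 bytes long, so the
    # kernel no longer logs it as a "short packet".
    if len(packet) < MIN_CONTROL_PACKET_SIZE:
        packet += b"\x00" * (MIN_CONTROL_PACKET_SIZE - len(packet))
    return packet

assert len(pad_control_packet(b"\x80\x73\x05")) == 14  # hypothetical 3-byte control packet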

Sample error:

[  630.674133] l2tp_udp_recv_core: 20 callbacks suppressed
[  630.683240] l2tp_core: tunl 100: recv short packet (len=12)

Some Kernels have broken SO_REUSEPORT handling

This bug is meant for future reference.

Newer versions of the Tunneldigger broker use SO_REUSEPORT to process multiple tunnels on one single port. In the past Tunneldigger used a NAT-based workaround to make this work. To simplify the code and remove unnecessary dependencies this workaround was removed.
Unfortunately there are several kernel bugs that prevent SO_REUSEPORT for UDP sockets from working properly, which are only fixed in fairly recent kernels.
This means that this change, in conjunction with the bug, has some peculiar implications for which kernel versions can be used for brokers. (Tunneldigger clients are unaffected by all of this.)
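
For reference, the SO_REUSEPORT pattern in question looks roughly like this in Python (a sketch under the assumption that the per-tunnel socket is also connect()ed to the client; it is not the broker's exact code, and the port and peer address are placeholders):

import socket

def make_reuseport_socket(local_addr, local_port, peer=None):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((local_addr, local_port))
    if peer is not None:
        # Connecting narrows this socket to a 4-tuple; the kernel is expected
        # to deliver matching packets here rather than to the listening socket,
        # which is exactly what the buggy kernels get wrong.
        s.connect(peer)
    return s

listener = make_reuseport_socket("0.0.0.0", 8942)
tunnel = make_reuseport_socket("0.0.0.0", 8942, peer=("192.0.2.10", 40000))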

Kernel versions 5.10.152 and newer exhibit the correct behaviour and should work.

You have probably landed here because you still use an older Linux distribution or haven't updated to a working Kernel version. If you are experiencing this issue you have two options:

  1. Update your kernel to a supported version or upgrade your distribution to one that has a supported kernel version. In particular, Fedora 35 and newer as well as Debian 11 (Bullseye) and newer with the latest updates applied should work.
  2. If you cannot upgrade the kernel, switch to the legacy branch that still carries the NAT hacks.

Kernel fixes

For the curious among you, the two fixes that are needed are:

TC/Traffic Control does not always work

A user is reporting that traffic control is not working for them. I found this in the logfile:

Dez 23 19:20:44 gw4.saar.freifunk.net tunneldigger[3089]: [INFO/tunneldigger.limits] Setting downstream bandwidth limit to 40000 kbps on tunnel 331.
Dez 23 19:20:44 gw4.saar.freifunk.net tunneldigger[3089]: Error: Invalid handle.
Dez 23 19:20:44 gw4.saar.freifunk.net kernel: HTB: quantum of class 10001 is big. Consider r2q change.

The "invalid handle" is harmless, as far as I know (see #154). However, that other error about "quantum of class" is new.

Unfortunately I know basically nothing about TC on Linux -- I don't have any clue what "quantum of class" or "r2q change" mean, and maybe they have nothing to do with the problem.

GSoC: Should tunneldigger participate

If you would like Tunneldigger to participate with some ideas in GSoC this year, consider adding those ideas to this list of projects. You could list it under the wlan slovenija network or some other one, whatever you prefer.

So the main question is whether there is anyone willing to mentor somebody, and what ideas they could work on. Maybe better instrumentation, better debugging/status printing, things like that?

If you already know of anyone with student status who would be interested in contributing and getting paid a bit for that, they can also apply, and this is valid as well.

Proposal: Broker usage check on reconnect

Right now it seems that the usage/selection process on reconnect only gets triggered after 15-20s. I understand that this was probably introduced to improve reconnect time after dialup/connection loss.

This behaviour leads to a problem in case you have at least two brokers running on different machines. If you do maintenance on one of them and shut down the broker for a while, the clients will move to the remaining broker. When you finish maintenance and start the broker again, you will have to shut down all other brokers for at least 30-40s to trigger selection on all clients, otherwise the clients won't move to the other brokers.

Would it be possible to have broker selection on every reconnect, or would it be better to have the broker send a "restarted" signal to trigger selection on the client?

Kind regards

Documentation about netfilter

The documentation states:

In addition the kernel must support network address translation via netfilter, otherwise the tunnels will not work as Tunneldigger uses translation to achieve that all tunnels operate over the same external port.

Here, I read that this has been included in the kernel since the 2.4.x series, and that iptables is the software commonly associated with it.

netfilter.org is home to the software of the packet filtering framework inside the Linux 2.4.x and later kernel series. Software commonly associated with netfilter.org is iptables.

As I am new to such internals of the project, I am a bit unsure: if I have iptables on my computer, will it work? Will it work if I just have a kernel newer than 2.4.x?

Recent kernels enforce system-wide unique L2TPv3 session IDs

When running tunneldigger on the current Debian stable kernel (4.9.51), only one client can connect. The second client fails because the l2tp tunnel interface does not appear. After fixing a bug in the netlink interface (#50), one can see that the kernel sends an EEXIST in reply to the session_create.

A lot of digging through the linux kernel sources uncovered the source of the issue: L2TPv3 session IDs have to be unique system-wide. Tunneldigger hard-codes a session ID of 1 for every connection. That used to work due to a bug in the kernel, which meant that the kernel failed to actually ensure uniqueness of the session ID. That bug got fixed by https://github.com/linux-stable/linux-stable/commit/dbdbc73b44782e22b3b4b6e8b51e7a3d245f3086, which was backported to a few stable series, in particular, to 4.9.36.

Proposed fix

Fixing this in a compatible way will require protocol changes: Both ends of the tunnel have to know each other's session ID, so they have to negotiate whether they use 1 or something more unique. I started working on a fix at https://github.com/freifunk-saar/tunneldigger/tree/wlanslovenija. The approach is summarized in the commit message over there, copied here for reference:

This patch adds unique session IDs to tunneldigger in a backwards-compatible
way.  If both ends of the tunnel agree to use a unique session ID, they both
will use the tunnel ID as the session ID.  To manage this mutual agreement, two
messages in the protocol are changed:

CONTROL_TYPE_PREPARE gains a new optional byte at the end that clients use to
indicate to the server whether they want to use a unique session ID.  Old
servers will just ignore this additional byte.  New servers now know they are
talking with a modern client, and use unique session IDs for this connection.
New servers talking with old clients will notice the absence of this request and
use 1 as the session ID.

Furthermore, CONTROL_TYPE_TUNNEL gains a new optional byte at the end that
servers use to tell clients that they acknowledge using unique session IDs.  Old
clients will never see this additional byte, as the server only sends it if
unique session IDs were requested in CONTROL_TYPE_PREPARE.  New clients know,
upon seeing this byte, that they are talking to a new server, and will hence use
unique session IDs.  If a new client talks to an old server, it will receive an
old-style CONTROL_TYPE_TUNNEL and hence know that it has to use session ID 1.

So, both old and new clients can talk with both old and new servers.  However, of
course, if the server has a recent enough kernel, even though it can communicate
with old clients, it can still only support one old client at a time.
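
In broker pseudo-code, the negotiation boils down to something like the following sketch (the function name and arguments are made up; only the presence/absence logic comes from the description above):

def choose_session_id(extra_prepare_bytes: bytes, tunnel_id: int) -> int:
    # A new client appends an optional byte to CONTROL_TYPE_PREPARE; its mere
    # presence tells the server to use the (unique) tunnel ID as the session ID.
    # Old clients omit it and keep the legacy session ID of 1.
    return tunnel_id if extra_prepare_bytes else 1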

I am running two of our four servers with this fix, so compatibility with old clients is tested already. However, due to #55, I can't say anything about long-time stability yet. I also couldn't yet test new clients as I am still fighting my firmware build system. (The client uses such an ancient version of libnl that I can't build it on the host.)

Open problems

As the last paragraph in the commit description says, there still is a potential problem: Once we upgrade one of our servers to a kernel including the problematic bugfix, only one old client will be able to talk to it at a time. There is nothing we can do about this, but what we want to avoid is a client trying to connect to a new server and failing, while there are old servers (with higher usage) that could still support this client. I first tried to (ab)use CONTROL_TYPE_USAGE to let the client indicate whether it supports unique session IDs, so that the server could report "I am full" to old clients and steer them elsewhere. However, clients actually seem to send some rather arbitrary data alongside that message (UUUUUUUU, to be precise -- wtf?!?), so I am worried that attaching meaningful bytes here will not work very well. We could introduce a CONTROL_TYPE_USAGE2, but I think I have a better idea.

Clients already have a retry loop to connect again if the connection to the broker failed. I think clients should remember which broker failed, and exclude that one in the next round. Only once all brokers have been excluded that way will they be enabled again. This will, I think, improve client behavior in general, not just for this particular issue. It will also solve this issue, as (after #50) brokers will send an error to clients when the session ID is already used, making the client try some other server. So, as long as one of the available brokers still has an old kernel, old clients will reliably be able to connect. Furthermore, even if all servers are on a new kernel, there can still be N old clients connected at the same time (and hopefully, they will fetch an auto-update and then become new clients).

I started implementing this, but got stuck yesterday due to the aforementioned build system issues.

How to route packets once tunnel is setup

Hey, there. I'm trying to test tunneldigger by tunneling from one linux box to another (no mesh network involved). The tunnel itself seems to establish properly, but packets aren't routing as I expect them to.

The l2tp0 interface gets created on the client, and a corresponding l2tpxxx-xxx interface gets created on the broker. Once I bring both interfaces up (ip link set <if-name> up), I can send pings into one interface and see ARPs come out the other:

# on the client
ping -I l2tp0 8.8.8.8
# on the broker
tcpdump -i l2tp101-101
00:36:29.925663 ARP, Request who-has google-public-dns-a.google.com tell <client-ip>, length 28

These ARPs don't seem to go anywhere else though. I must be missing some routing config to get these packets out to the internet. I guess I'm wondering, what's the minimum config (presumably in session hooks) I need to be able to test tunneldigger in this way?

I guess this is more of a basic networking question than a tunneldigger question.
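
For the record, a minimal sketch of what a setup hook could do on the broker side to get such a test working, assuming the uplink interface is eth0 and using placeholder addresses (this is plain Linux forwarding/NAT, nothing Tunneldigger-specific):

import subprocess

# Give the tunnel interface an address, enable forwarding, and NAT outgoing traffic.
# The client then needs a matching address on l2tp0 and a route (or default
# gateway) pointing at 10.0.0.1.
subprocess.run(["ip", "addr", "add", "10.0.0.1/24", "dev", "l2tp101-101"], check=True)
subprocess.run(["sysctl", "-w", "net.ipv4.ip_forward=1"], check=True)
subprocess.run(["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", "eth0", "-j", "MASQUERADE"], check=True)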

tunneldigger dies sometimes

Original bug report from @Sunz3r under freifunk-gluon/gluon#1188


If the internet connection is lost (e.g. a reboot of the router or unplugging the LAN cable), then tunneldigger sometimes dies. After stracing tunneldigger I found that an assert is triggered: https://github.com/wlanslovenija/tunneldigger/blob/master/client/asyncns.c#L884 https://pastebin.com/R6CQmGXr


I was not able to report this bug on their site ...

This bug happens when DNS resolution has failed and the client nevertheless tries to send data through a socket:

 td-client: Failed to send() control packet (errno=89, type=1)!

When DNS resolution suddenly starts working again in this situation, tunneldigger will crash.

I think there is a bug in the broker selection. The selector should wait for successful address resolution.

To work around this bug you can use IPs instead of hostnames.

tunneldigger broker Puppet module (install and config)

I have created a Puppet module for installing and configuring the broker on the servers used by freifunk-berlin. I tried to make the config as true as possible to the default values in this repo.

Maybe this is useful for wlanslovenija.

https://github.com/freifunk-berlin/puppet-tunneldigger

And an example of how we are using this module can be seen at https://github.com/freifunk-berlin/puppet-communitytunnel and https://github.com/freifunk-berlin/puppet-bbbdigger

[Docs] max_tunnels

Heiho,

For configuration, users choose an upper limit for max_tunnels:
https://github.com/wlanslovenija/tunneldigger/blob/master/broker/l2tp_broker.cfg.example#L9

From a UX perspective, presenting a configuration option requires the user to choose a valid value. Nevertheless, neither the comment in the sample config nor https://tunneldigger.readthedocs.org/ helps them to do so.

IMHO the docs should explain in what way the platform (CPU cores, memory, etc.) and kernel parameters (i.e. sysctl.conf) influence reasonable configurations.

If 1024 is a sane configuration in almost all situations -- I have no experience here -- it would be helpful to document this.

MTU auto-detection does not work properly

Ever since we upgraded two of our machines from the old (pre-rewrite) tunneldigger to current master, we have MTU issues on those servers. With the old setup, we used to disable the automatic MTU discovery, manually setting the MTU to a pretty conservative (low) value of 1406.

I have no experience debugging MTU issues, so for now, I am going to re-implement the option to disable MTU discovery. But that's not solving this issue.


In case that matters, here are some more details of our setup: We are adding all the tunnels to a single bridge. We set that bridge's MTU to 1406 after it is created; however, as far as I understand, that value does not have any effect once a device is added to the bridge.

I am aware that to use the MTU properly, we would need one bridge per MTU. However, putting them all into the same bridge will use the minimum MTU, so things should still work, right?

Get it working with IPv6

It would be really great to be able to use native IPv6 on the WAN side, since more and more broadband consumers get "IPv6 + DS-Lite". This means that IPv4 is only available via Carrier-Grade NAT (CGN), where we very often observe strange effects like dying connections and/or packet loss, while IPv6 works fine.

As far as I understand, it's not as simple a job as for IPv4, since the packet header sizes are not that static in v6, but it would be really helpful, even if it ends up as a total rewrite.

Change rate limiting to be per-UUID

@RobWei recently contributed per-IP rate limiting. I was excited to try and see if this helps mitigate #143, but unfortunately that is not the case: the bad node reconnects with a rate that is too low for reasonable rate limiting, in particular when considering that people could run multiple nodes under the same IP.

Hence I propose to change that rate-limiting to be per-UUID instead of per-IP. The motivating log in #138 shows the same UUID over and over, so this approach should still help @RobWei, but at the same time it means we can set the rate limit much lower because we do not have to worry about the same IP being used by multiple clients.
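
As a sketch of what per-UUID rate limiting could look like (the threshold and names are made up; this is not the existing broker code):

import time

MIN_RECONNECT_INTERVAL = 60.0   # seconds; illustrative value only
last_attempt = {}               # UUID -> monotonic timestamp of the last attempt

def connection_allowed(uuid):
    # Reject a UUID that reconnects faster than the allowed interval.
    now = time.monotonic()
    last = last_attempt.get(uuid)
    last_attempt[uuid] = now
    return last is None or (now - last) >= MIN_RECONNECT_INTERVAL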

TC/Traffic Control: Error: Invalid handle.

OS: CentOS 8
TC: iproute-tc 5.3.0-1.el8
Tunneldigger: Master (installed 1 week ago)

Problem:
If a node with traffic limits connects to the tunnel broker, I see this inside the logs:

Okt 08 09:14:45 broker2.ff-en.de python[1699]: [INFO/tunneldigger.limits] Setting downstream bandwidth limit to 5000 kbps on tunnel 61000.
Okt 08 09:14:45 broker2.ff-en.de python[1699]: Error: Invalid handle.

The tc output also looks different.

CentOS 8 / New Tunneldigger

qdisc fq_codel 0: dev l2tp58025-58025 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58026-58026 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58027-58027 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp81062-81062 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58030-58030 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58031-58031 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58033-58033 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp22068-22068 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58042-58042 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp58043-58043 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75004-75004 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75011-75011 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75017-75017 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75024-75024 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75033-75033 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75035-75035 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75041-75041 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75050-75050 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75052-75052 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75056-75056 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75063-75063 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75064-75064 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75065-75065 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75068-75068 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75081-75081 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev l2tp75083-75083 root refcnt 2 limit 10240p flows 1024 quantum 1434 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

Ubuntu 16.04 / Old Tunneldigger (2017/18)

qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev ens160 root
qdisc pfifo_fast 0: dev ens160 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ens160 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc mq 0: dev ens192 root
qdisc pfifo_fast 0: dev ens192 parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ens192 parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev br-ha root refcnt 2
qdisc pfifo_fast 0: dev tun-map root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev bat-ha root refcnt 2
qdisc pfifo_fast 0: dev l2tp10781 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10611 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10861 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp11071 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10421 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10711 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10921 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10601 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10871 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10821 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10961 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10441 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10031 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10011 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10301 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev l2tp10841 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Is there a specific version of tc we have to install to get this working again?

Regards

EDIT: Old rules stay forever and won't get removed.

Client behavior on broker restart is suboptimal

Currently, if the broker is restarted (i.e., sent a SIGINT and then started again), the following happens on the client:

  • It starts out in STATE_KEEPALIVE, receives the error, switches to STATE_REINIT
  • In context_process, this leads to calling context_reinitialize and then context_start_connect. In particular, standby_only gets set to true and the state switches to STATE_GET_COOKIE.
  • The client sends a cookie request, and receives the response from the server that has come back up in the mean time.
  • Upon receiving the reply, the client notices that standby_only is set, and does nothing.
  • 30 seconds later, the main loop notices that the client has not been connected for 30 seconds, and restarts full broker selection, which eventually selects a sensible broker again.

This is probably not the intended behavior. However, I have a hard time figuring out the intended behavior, because there are two things in the main loop for detecting an error, and two things the other code does in case of error: The main loop checks for STATE_REINIT, and it starts a timer if the state is not STATE_KEEPALIVE. The other code sometimes goes to STATE_GET_COOKIE, sometimes to STATE_REINIT -- but in the latter case, context_process gets called before the main loop notices this state.

My inclination is to: (a) remove the STATE_REINIT check in the main loop; it doesn't work anyway; instead maybe the main connection timer should be lowered so that we get away from a bad broker faster. (b) make it so that when we receive an error in a running tunnel, we stick with this broker (like we do now), but do not set standby_only (which is clearly wrong). I think this requires another parameter to context_reinitialize saying whether the new context should be standby-only or not.

Failed to send() control packet

I have just noticed this message for the first time in tunneldigger client logs:

Mon May 25 20:04:21 2020 daemon.warn td-client: Failed to send() control packet (errno=1, type=5)!

errno 1 seems to be EPERM. The error is evidently printed here:

syslog(LOG_WARNING, "[%s:%s] Failed to send() control packet (errno=%d, type=%x)!",

Type 5 is CONTROL_TYPE_KEEPALIVE. The message appeared twice, then not any more... but still, this seems odd? @kaechele have you ever seen this?

teardown script crashes tunneldigger-broker

Sometimes I am able to crash tunneldigger-broker with the teardown script:

Sun Jun 11 09:02:54 2023 daemon.err python[4535]: [INFO/tunneldigger.hooks] Running hook 'session.pre-down' via script '/usr/lib/tunneldigger-broker/hooks/teardown'.
Sun Jun 11 09:02:57 2023 daemon.err python[4535]: [INFO/tunneldigger.broker] Creating tunnel (abcd) with id 104.
Sun Jun 11 09:02:57 2023 kern.info kernel: [317719.482982] do_page_fault(): sending SIGSEGV to python for invalid read access from 00000004
Sun Jun 11 09:02:57 2023 kern.info kernel: [317719.500022] epc = 77c2b455 in libpython3.11.so.1.0[77aba000+259000]
Sun Jun 11 09:02:57 2023 kern.info kernel: [317719.512813] ra  = 77c2c16b in libpython3.11.so.1.0[77aba000+259000]

Broker crashes, running out of file descriptors

After 5-6h of uptime, the tunneldigger broker quits with the following error:

Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: Traceback (most recent call last):
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: "__main__", fname, loader, pkg_name)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: exec code in run_globals
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/main.py", line 113, in <module>
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: event_loop.start()
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/eventloop.py", line 59, in start
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: pollable.read(file_object)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/network.py", line 98, in read
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: callback()
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/tunnel.py", line 230, in pmtu_di
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: self.create_timer(self.pmtu_discovery, timeout=random.randrange(2, 5))
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/network.py", line 83, in create_
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: timer = timerfd.create(timerfd.CLOCK_MONOTONIC)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/timerfd.py", line 117, in create
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: ret = libc.timerfd_create(clock_id, flags)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/timerfd.py", line 103, in errche
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: raise OSError(errno, os.strerror(errno))
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: OSError: [Errno 24] Too many open files

This has now happened twice. The first time, there were 150 tunnels connected constantly. The second time, the number of tunnels slowly went up from 0 to 70 before the crash.

setup.py is deprecated

Seems like what we are doing with setup.py is deprecated. I see warnings like this when running setup.py install:

Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: /opt/tunneldigger/lib/python3.7/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: !!
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         ********************************************************************************
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         Please avoid running ``setup.py`` directly.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         Instead, use pypa/build, pypa/installer or other
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         standards-based tools.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         ********************************************************************************
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: !!
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:   self.initialize_options()
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: /opt/tunneldigger/lib/python3.7/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: !!
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         ********************************************************************************
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         Please avoid running ``setup.py`` and ``easy_install``.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         Instead, use pypa/build, pypa/installer or other
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         standards-based tools.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         See https://github.com/pypa/setuptools/issues/917 for details.
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:         ********************************************************************************
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]: !!
Aug 15 20:55:51 gw1.saar.freifunk.net tunneldigger[98775]:   self.initialize_options()

Strangely I only see them on the server with Python 3.7. The server with Python 3.9 shows no warning...

Also the deprecation warning doesn't tell me what to do instead. :/ python setup.py install got replaced by... what?

Hotplug style waiting for an interface to go up.

I am using the tunneldigger client on OpenWrt (freifunk-berlin firmware). We expect situations where people install the firmware while meshing islands through a tunnel. It is therefore important that tunnels are not made over the mesh network, but only through the WAN interface. Thus we want to bind to the interface.

An issue arises when the router is not connected to a WAN and only meshes. The WAN is not connected to anything, yet tunneldigger tries to make a connection anyhow. There were 80 log entries in 10 seconds, which repeat five seconds later.

Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.info td-client: Performing broker selection...
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:59 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:59 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:59 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:02 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:02 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:02 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:02 2018 daemon.err td-client: No suitable brokers found. Retrying in 5 seconds

I see two possible solutions.

  1. Change l2tp_client.c at line 434 to wait for a timeout when the bind fails and try again, without treating it as an error.
  2. Change the init script to only start clients that do not bind to an interface, and create a hotplug script to handle those that do.

I am willing to make the changes, but I want to know what the development group thinks before I start.

Silence `tc` output when `ignore_fails` is true

The traffic_control code uses the tc command to interact with the Linux kernel traffic shaping infrastructure. The way we run it leads to harmless but confusing errors being printed in the log (see e.g. #167). It would be better to suppress tc output when errors are harmless (as indicated by the ignore_fails flag).
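
One way to do that would be along these lines (a sketch, not the current traffic_control code):

import subprocess

def run_tc(args, ignore_fails=False):
    # When failures are expected and harmless, swallow tc's output entirely;
    # otherwise let it reach the log and raise on a non-zero exit code.
    output = subprocess.DEVNULL if ignore_fails else None
    result = subprocess.run(["tc"] + list(args), stdout=output, stderr=output)
    if result.returncode != 0 and not ignore_fails:
        raise RuntimeError("tc %s failed with code %d" % (" ".join(args), result.returncode))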

Broker: Wait for interface to have an IP before listening

If the configured interface gets its IP address via DHCP, then it may not be configured by the time the broker starts.

Here is a log excerpt

Sat Aug 19 12:31:24 2023 daemon.err python[3096]: [WARNING/tunneldigger.broker] Failed to listen on :8943, skipping.

No client connections are possible in this state.
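
A possible mitigation on the broker side would be to retry the bind instead of skipping the port, roughly like this (sketch only, with made-up names and retry interval):

import socket
import time

def bind_when_address_ready(address, port, retry_interval=5.0):
    # Keep retrying until the address exists on some interface; bind() fails
    # with EADDRNOTAVAIL as long as DHCP has not assigned it yet.
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((address, port))
            return sock
        except OSError:
            sock.close()
            time.sleep(retry_interval)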

session.pre-down is run after the interface is removed

We have the following line in our pre-down hook:

/sbin/brctl delif saarVPN $INTERFACE

With the old version of the broker (before the rewrite, see e.g. https://github.com/ffrl/tunneldigger/) on old kernels (3.16.0), that worked just fine. However, on kernel 4.9.30 and with the latest broker, we now have errors in the log:

(session.pre-down/26205) interface l2tp2221 does not exist!

I am not seeing these errors with the new broker on an old kernel. So it seems the kernel update is the reason here, not the broker update.

That is strange, isn't it? How would the kernel even know already that the tunnel is dead? Does L2TP have in-band signaling for tearing down the tunnel?

ImportError: No module named configparser

I got the following error when starting tunneldigger:

root@voreifel1:/etc/tunneldigger# /srv/tunneldigger/env_tunneldigger/bin/python -m tunneldigger_broker.main /etc/tunneldigger/hood-rheinbach.cfg
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/srv/tunneldigger/env_tunneldigger/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/main.py", line 1, in <module>
    import configparser
ImportError: No module named configparser

Incoming messages get incorrectly dispatched to broker

We re-landed @kaechele's NAT removal, but things are still going very wrong: once I have more than just 1 or 2 clients (I am not sure if there is a fixed limit, but with 2 clients things still seemed to work fine), connections start to drop. The server thinks that the client timed out.

I added some extra logging, and from what I can see, incoming messages from the client often arrive at the (2-tuple) broker socket, not at the (4-tuple) tunnel socket. So despite what @kaechele said here, there does not seem to be a guarantee that messages with a matching 4-tuple socket do indeed get delivered to that socket.

This is on Debian buster:

$ uname -a
Linux gw1.saar.freifunk.net 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

Broker blacklisting can lead to endless loop if some brokers are fully offline

The tunneldigger client tries to avoid selecting the same broker again and again if it gets a connection error. It blacklists a broker upon error and tries the others, until all have failed and they all get un-blacklisted again.

Unfortunately, I just observed that this can lead to an endless loop: Say two brokers are configured, of which one is offline and one errors. The erroring one will get blacklisted, but the one that is offline will not. Now the client just keeps trying the offline broker, so even if the one that is online stops erroring, it does not get used.
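
One possible fix, sketched below with assumed names, would be to blacklist a broker on any kind of failure (error or timeout) and clear the blacklist only once every broker has failed:

import random

def pick_broker(brokers, blacklist):
    candidates = [b for b in brokers if b not in blacklist]
    if not candidates:
        # Every broker has failed at least once; start over with a clean slate.
        blacklist.clear()
        candidates = list(brokers)
    return random.choice(candidates)

The client would then add a broker to the blacklist whenever a connection attempt errors or times out, before calling pick_broker() again.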

broker throwing OSError on creation of timers

I have the broker running on OpenWrt and am seeing the following in the log file:

Sat Jul 29 16:06:10 2023 daemon.err python[32619]: [ERROR/tunneldigger.broker] Unhandled exception while creating tunnel 100:
Sat Jul 29 16:06:10 2023 daemon.err python[32619]: [ERROR/tunneldigger.broker] Traceback (most recent call last):
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:   File "/usr/lib/python3.11/site-packages/tunneldigger_broker/broker.py", line 144, in create_tunnel
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:     tunnel.setup_tunnel()
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:   File "/usr/lib/python3.11/site-packages/tunneldigger_broker/tunnel.py", line 138, in setup_tunnel
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:     self.create_timer(self.keepalive, timeout=random.randrange(3, 15), interval=5)
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:   File "/usr/lib/python3.11/site-packages/tunneldigger_broker/network.py", line 85, in create_timer
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:     timerfd.settime(timer, 0, timerfd.itimerspec(value=timeout, interval=interval))
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:   File "/usr/lib/python3.11/site-packages/tunneldigger_broker/timerfd.py", line 124, in settime
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:     ret = libc.timerfd_settime(ufd, flags, ctypes.pointer(new_value), ctypes.pointer(old_value))
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:   File "/usr/lib/python3.11/site-packages/tunneldigger_broker/timerfd.py", line 103, in errcheck
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:     raise OSError(errno, os.strerror(errno))
Sat Jul 29 16:06:10 2023 daemon.err python[32619]: OSError: [Errno 22] Invalid argument
Sat Jul 29 16:06:10 2023 daemon.err python[32619]:
Sat Jul 29 16:06:10 2023 daemon.err python[32619]: [WARNING/tunneldigger.protocol] Failed to create tunnel (b6:86:a3:a9:cd:b3:fa:8e:07:c9) while processing prepare request.

and with strace I see this

timerfd_settime64(9, 0, {it_interval={tv_sec=21474836480, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=18446744072498689248}}, 0xb7288d28) = -1 EINVAL (Invalid argument) 
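
One possible explanation (unverified): 21474836480 is exactly 5 * 2^32, i.e. the configured 5-second interval shifted into the upper half of a 64-bit field. That would point to a struct layout mismatch between the ctypes itimerspec (32-bit time fields) and the 64-bit time interface that the C library apparently uses here, since timerfd_settime() ends up as timerfd_settime64(). A layout matching the kernel's 64-bit interface might look like the following sketch (untested):

import ctypes

class timespec64(ctypes.Structure):
    # Mirrors the kernel's __kernel_timespec used by the *_time64 syscalls:
    # both fields are 64 bits wide, even on 32-bit userland.
    _fields_ = [
        ("tv_sec", ctypes.c_int64),
        ("tv_nsec", ctypes.c_int64),
    ]

class itimerspec64(ctypes.Structure):
    _fields_ = [
        ("it_interval", timespec64),
        ("it_value", timespec64),
    ]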

Any help getting this working would be appreciated.

client: suspicious assertion in asyncns.c

Building the client locally shows a warning that seems legitimate at first glance:

In file included from /home/r/src/freifunk/tunneldigger/client/libasyncns/asyncns.c:23:
/home/r/src/freifunk/tunneldigger/client/libasyncns/asyncns.c: In function ‘asyncns_setuserdata’:
/home/r/src/freifunk/tunneldigger/client/libasyncns/asyncns.c:1484:12: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     assert(q->asyncns = asyncns);
            ^
/home/r/src/freifunk/tunneldigger/client/libasyncns/asyncns.c: In function ‘asyncns_getuserdata’:
/home/r/src/freifunk/tunneldigger/client/libasyncns/asyncns.c:1492:12: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     assert(q->asyncns = asyncns);
            ^

This needs further investigation to determine whether the = should be ==. As written, the assert assigns asyncns to q->asyncns rather than comparing them, so (as long as asyncns is non-NULL) it always passes; presumably it was meant to verify that the query belongs to the given context, i.e. q->asyncns == asyncns.

High CPU load due to a single misbehaving client

We occasionally see a client misbehave and establish multiple connections at the same time to all our servers. For some reason, even when there are just around 20 connections per 10 minutes, this causes 100% CPU load by tunneldigger. Python is not the most efficient language, but this seems excessive; I'd like to better understand where in the broker all that CPU time is spent. Unfortunately, so far I have found no good way to do such an analysis for Python (what I am looking for is something like callgrind).
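
In the absence of a callgrind equivalent, the standard library's cProfile module gives at least a per-function breakdown, and a sampling profiler such as py-spy can attach to an already-running process. A minimal cProfile sketch (run_broker() below is a placeholder, not the broker's actual entry point):

import cProfile
import pstats

def run_broker():
    pass  # placeholder for the broker's event loop

profiler = cProfile.Profile()
profiler.enable()
try:
    run_broker()
finally:
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)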

Binding to an interface which is not up yet causes many log entries

I am using the tunneldigger client on OpenWrt (freifunk-berlin firmware). We expect situations where people install the firmware while meshing islands through a tunnel. It is therefore important that tunnels are only made through the wan interface and never over the mesh network, so we want to bind to that interface.

An issue arises when the router is not connected to a WAN and only meshes. The wan interface is not connected to anything, yet tunneldigger tries to make a connection anyway. There were 80 log entries in 10 seconds, and they repeat five seconds later.

Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:52 2018 daemon.info td-client: Performing broker selection...
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:53 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:53 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:53 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:56 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:56 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:56 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:58 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:58 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:58 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:04:59 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:04:59 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:04:59 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:01 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:01 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:01 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:02 2018 daemon.info td-client: Reinitializing tunnel context.
Mon Oct 22 01:05:02 2018 daemon.err td-client: Failed to bind to device!
Mon Oct 22 01:05:02 2018 daemon.err td-client: Unable to reinitialize the context!
Mon Oct 22 01:05:02 2018 daemon.err td-client: No suitable brokers found. Retrying in 5 seconds

I see two possible solutions.

  1. Change l2tp_client.c at line 434 to wait for a timeout when the bind fails and then try again, without logging it as an error (a sketch of this behaviour follows below).
  2. Change the init script to only start the instances that do not bind to an interface, and add a hotplug script to handle those that do.

I am willing to make the changes, but I want to know what the development group thinks before I start.
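
For solution 1, the intended behaviour would be roughly the following, sketched in Python for brevity (the actual change would be in the C client; the function and parameters here are made up):

import socket
import time

def bind_with_retry(sock, interface, max_delay=30):
    # Retry the device bind with a capped backoff instead of giving up
    # immediately; while waiting, this should be logged as info, not error.
    delay = 1
    while True:
        try:
            sock.setsockopt(socket.SOL_SOCKET,
                            getattr(socket, "SO_BINDTODEVICE", 25),
                            interface.encode())
            return
        except OSError:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)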
