
weave's Introduction

Weave Net - Weaving Containers into Applications


About Weaveworks

Weaveworks is the company that delivers the most productive way for developers to connect, observe and control Docker containers.

This repository contains Weave Net, the first product developed by Weaveworks, with over 8 million downloads to date. Weave Net enables you to get started with Docker clusters and portable apps in a fraction of the time required by other solutions.

Weave Net

Weave Net creates a virtual network that connects Docker containers across multiple hosts and enables their automatic discovery. With Weave Net, portable microservices-based applications consisting of multiple containers can run anywhere: on one host, multiple hosts or even across cloud providers and data centers. Applications use the network just as if the containers were all plugged into the same network switch, without having to configure port mappings, ambassadors or links.

Services provided by application containers on the Weave network can be exposed to the outside world, regardless of where they are running. Similarly, existing internal systems can be opened to accept connections from application containers irrespective of their location.

Getting help

If you have any questions about, feedback for, or a problem with Weave Net, please get in touch via the Weaveworks community channels.

We follow the CNCF Code of Conduct.

Your feedback is always welcome!

weave's People

Contributors

abligh, abuehrle, alok87, awh, bboreham, berlic, bjhaid, brb, davkal, deitch, dimitropoulos, dlespiau, dlmiddlecote, dpw, enekofb, errordeveloper, fintanr, foot, inercia, luxas, marccarre, mikebryant, murali-reddy, paulbellamy, peterbourgon, pidster, rade, squaremo, tomwilkie, viirya


weave's Issues

don't lie about source of injected icmp 3.4

In order to fix issue #1, we are now "lying" about the origin of icmp 3.4 packets. This is somewhat distasteful and could also cause real issues, since a sender may now get multiple icmp 3.4 packets purporting to come from the same ip, successively lowering the mtu, when in fact they were issued by different hops.

As @msackman commented in issue #1, we could equip weave with an ip in each sub-net, and then pick the ip based on the sub-net of the dst. The obvious downside here is that we need to add an ip whenever a sub-net is added, so sub-net addition suddenly becomes more involved.

So we really want to come up with a different solution. One starting point might be to investigate further why the packets got dropped in the first place. After all, their dst ip was in the correct sub-net, and the src ip was in the weave sub-net.

sha256 session key formation may be insecure

Currently formSessionKey just mixes the password into the shared secret with sha256. That might be ok, or it might not, but it's safer just to change it to something like bcrypt (https://godoc.org/code.google.com/p/go.crypto/bcrypt).

However, bcrypt.GenerateFromPassword doesn't make clear whether the output is the same length as the input. Also, both the input and output are slices, whereas we need a [32]byte array. So some extra conversion will need to take place.

I think the "right" thing to do here is to continue to use the sha256 to mix in the password and get things to the right length. Then pass the result of that through to bcrypt.GenerateFromPassword (with the necessary conversions to/from slice). I think this is "right" because I can't see how otherwise to combine the shared secret and the password via bcrypt, and end up with the right length array/slice.
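For reference, a minimal sketch of the sha256-based mixing described above (names are illustrative; this is the shape of the current approach, not the actual formSessionKey code):

package router // illustrative package name

import "crypto/sha256"

// mixSessionKey is an illustrative sketch of the current approach: mix the
// shared secret and the password with SHA-256 to obtain the fixed-length
// [32]byte session key. Any bcrypt-based strengthening would have to be
// inserted between this mixing step and the final [32]byte conversion.
func mixSessionKey(sharedSecret, password []byte) [32]byte {
	h := sha256.New()
	h.Write(sharedSecret)
	h.Write(password)
	var key [32]byte
	copy(key[:], h.Sum(nil))
	return key
}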

set up app container networking before containers start

Currently 'weave run' sets up the app container's interface into the weave network after the container has been launched with 'docker run -d'. That means the network may not be available to the container process straight away. Depending on what the container is doing, that can be benign, annoying, or disastrous.

Containers can themselves ensure that the interface is available, by running something like https://github.com/jpetazzo/pipework/blob/master/pipework#L30, i.e.

while ! grep -q ^1$ /sys/class/net/ethwe/carrier 2>/dev/null
do sleep 1
done

before starting the actual container process, but this of course requires containers to have been constructed with weave in mind, which is limiting.

There is no way around this issue w/o some changes to docker.

Cannot ping containers on another host

/ # ping 10.0.1.1
PING 10.0.1.1 (10.0.1.1): 56 data bytes
64 bytes from 10.0.1.1: seq=0 ttl=64 time=0.164 ms
64 bytes from 10.0.1.1: seq=1 ttl=64 time=0.087 ms
64 bytes from 10.0.1.1: seq=2 ttl=64 time=0.088 ms
^C
--- 10.0.1.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.087/0.113/0.164 ms
/ # ping 10.0.1.2
PING 10.0.1.2 (10.0.1.2): 56 data bytes
^C
--- 10.0.1.2 ping statistics ---
8 packets transmitted, 0 packets received, 100% packet loss

document weave crypto

Some folks are curious how weave crypto works. And the code isn't quite self-explanatory, at least not at first glance. So we should jot down some notes.

eliminate static compilation warnings

static compilation throws up some scary warnings...

make -k weave
go build -ldflags '-extldflags "-static"'
# github.com/zettio/weave/weave
/var/tmp/go-link-6g6P7K/000000.o: In function `_cgo_14616c423265_Cfunc_freeaddrinfo':
/tmp/makerelease197226928/go/src/pkg/net/cgo_unix.go:97: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libpcap.a(nametoaddr.o): In function `pcap_nametoaddr':
(.text+0x5): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libpcap.a(nametoaddr.o): In function `pcap_nametonetaddr':
(.text+0xa5): warning: Using 'getnetbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libpcap.a(nametoaddr.o): In function `pcap_nametoproto':
(.text+0x2c5): warning: Using 'getprotobyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libpcap.a(nametoaddr.o): In function `pcap_nametoport':
(.text+0xe9): warning: Using 'getservbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

I am pretty sure these warnings are harmless since we don't invoke the functions in pcap that require dynamically loaded components of glibc. However, we really want to get rid of them if we can.

One option might be to compile libpcap ourselves with musl, and then weave too. That would probably also shrink the container image further.

'weave' script displays blank lines instead of docker progress output

If I discard the cached weave image with

# docker rmi zettio/weave

then 'weave launch ...', I see:

Unable to find image 'zettio/weave' locally
Pulling repository zettio/weave






54face33e6d7582e803ee5fdffabd9bd4c88667946658054cc93a1de6ee3c3e3

Those blank lines are where I would normally see docker's progress output as it downloads the image. I get the same thing with 'weave run' too.

rolling password update

If, for whatever reason, one wants to change the password given to weave in order to set up encrypted communication, one needs to restart all weave routers. That in itself isn't too bad, but routers with the new password will not be able to talk to routers with the old one and vice versa. So we end up with a sort of rolling partition, until every router has been bounced.

We could allow two passwords to be valid simultaneously. Weave routers would attempt to connect with one and if that fails due to a mismatch then try the other.
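A rough sketch of that fallback logic follows; the 'connect' and 'isMismatch' callbacks are hypothetical, not existing weave functions:

package router // illustrative package name

// trySessionPasswords is a rough sketch of the proposed fallback. The
// 'connect' callback dials a peer and runs the handshake with the given
// password; 'isMismatch' reports whether the failure looked like a password
// mismatch rather than, say, a network error (which should not trigger a
// retry with the other password).
func trySessionPasswords(
	connect func(addr string, password []byte) error,
	isMismatch func(error) bool,
	peerAddr string,
	newPw, oldPw []byte,
) error {
	err := connect(peerAddr, newPw)
	if err == nil || !isMismatch(err) {
		return err
	}
	return connect(peerAddr, oldPw) // fall back to the old password
}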

reduce latency

Weave latency is perhaps not as low as it could be, and it varies quite a bit.

I have set up a simple 2-weaver weave network running all on a single host, i.e. without containers or VMs, in order to reduce factors which might affect performance and allow good performance analysis tools to be applied.

The latency, as reported by ping, still varies a lot. Here is a distribution I get for "ping -c 100 -i 0.5":

min: 0.289ms
25%ile: 0.742ms
50%ile: 0.864ms
75%ile: 0.970ms
max: 1.16ms

Typically when measuring the latency of a system, one sees a cluster of low values close to a baseline, with some high outliers. Weave currently manages the opposite: Mostly high values, with the occasional low outlier. This suggests room for improvement.

More results of my investigations to follow.

concurrent connection attempts to same address

This behaviour can give rise to logs like this:

weave 2014/09/04 16:03:07.319208 Local identity is 7a:0e:28:0c:dc:9e
[...]
weave 2014/09/04 16:05:19.571106 dial tcp4 54.76.115.81:6783: connection timed out
weave 2014/09/04 16:05:24.566093 dial tcp4 54.76.115.81:6783: connection timed out
weave 2014/09/04 16:05:29.683107 dial tcp4 54.76.115.81:6783: connection timed out
weave 2014/09/04 16:05:39.672227 dial tcp4 54.76.115.81:6783: connection timed out
weave 2014/09/04 16:05:49.653700 dial tcp4 54.76.115.81:6783: connection timed out
weave 2014/09/04 16:06:04.627102 dial tcp4 54.76.115.81:6783: connection timed out

The TCP connection timeout takes about 2 minutes, but these reports are coming in faster than that because multiple connections were started in parallel.

In the main loop in ConnectionMaker.queryLoop, as the next 'tryAfter' time is reached for a failedConnection, it fires off a set of goroutines to attempt connection, then schedules the next attempt.
But no account is taken of any previous attempts which are still running, e.g. because they are waiting 2 minutes for a TCP connection timeout.
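One possible shape for a fix, sketched with assumed names (the actual ConnectionMaker state and its loop structure differ): record an in-flight flag per address and skip scheduling while it is set.

package router // illustrative package name

import "sync"

// attemptTracker is a rough sketch with assumed names (not the actual
// ConnectionMaker state): remember which addresses already have a connection
// attempt in flight, so that when 'tryAfter' fires again we don't start a
// second dial that then spends its own two minutes waiting for a TCP timeout.
type attemptTracker struct {
	sync.Mutex
	inFlight map[string]bool
}

func newAttemptTracker() *attemptTracker {
	return &attemptTracker{inFlight: make(map[string]bool)}
}

// begin reports whether a new attempt to addr may be started.
func (t *attemptTracker) begin(addr string) bool {
	t.Lock()
	defer t.Unlock()
	if t.inFlight[addr] {
		return false // a previous attempt is still running
	}
	t.inFlight[addr] = true
	return true
}

func (t *attemptTracker) end(addr string) {
	t.Lock()
	defer t.Unlock()
	delete(t.inFlight, addr)
}

// attempt would be called from the query loop when an address becomes due.
func attempt(t *attemptTracker, addr string, dial func(string) error) {
	if !t.begin(addr) {
		return
	}
	go func() {
		defer t.end(addr)
		_ = dial(addr) // failure feeds back into the normal retry scheduling
	}()
}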

abysmal throughput on exposed IPs - PMTU ignored

In issue #37, @adieu observed abysmal throughput when running a performance test involving exposed IPs, i.e. having the sender, or receiver, or both sitting on the host and sending to / listening on an IP exposed with 'weave expose'.

I have reproduced the same results when running two VMs on my laptop:

$HOST1:

# weave launch 10.0.0.1/16
# weave expose 10.0.1.1/24
# qperf

$HOST2

# weave launch 10.0.0.2/16 $HOST1
# weave expose 10.0.1.2/24
# qperf 10.0.1.1 tcp_bw
tcp_bw:
  bw  =  32.8 KB/sec

Inspecting the weave logs, on the sending side I see a deluge of

weave 2014/09/13 16:49:15.633482 Sending ICMP 3,4 (10.0.1.1 -> 10.0.1.2): PMTU= 1438

So this suggests that the PMTU is being ignored by the sender, which is odd, since the kernel does appear to have the correct information:

# ip route get 10.0.1.1
10.0.1.1 dev weave  src 10.0.1.2 
    cache  expires 389sec mtu 1438

support systems with older versions of `iptables` and `ip`

I got the error RTNETLINK answers: Operation not supported while running weave launch 10.0.0.1/16. The weave script uses the ip and iptables commands to create the bridge and NAT rule. It seems the ip command doesn't support the bridge type on CentOS 6.5, and iptables doesn't support the -C flag either.

Maybe we can use brctl as a fallback method to create the bridge, and check the NAT rule by deleting and then appending it, to mitigate the issue.

failure to re-establish connections after bouncing weave container

Say I have two hosts set up pretty much as in the example in the readme, i.e. $HOST2 initially connecting to $HOST1. When I bounce the weave container on $HOST1 with

host1# docker kill weave; docker rm weave
host1# weave launch 10.0.0.1/16

I would expect the weave on $HOST2 to eventually re-establish the connection to the re-launched weave. But sometimes that doesn't happen.

multi-hop routing breaks pmtu discovery

I haven't actually tested this, but...

When a weave router forwards a packet to another peer, it uses the same logic for dealing with too big frames as if the packet had been captured from the local interface. In particular, an ICMP 3.4 packet will be injected on the interface. Now, if weave subsequently actually captured that same packet then things might sort of work, but a) it probably won't since we tell pcap to capture inbound packets only, and b) this is a rather round-about and inefficient way of handling pmtu discovery in the forwarding scenario.

Instead we should directly send the generated ICMP 3.4 packet to the appropriate peer(s).

determine what version of the code is in published images

We need an easy way to determine what version of the weave code users are running, at least for the container images we publish on the docker hub.

Perhaps add a 'publish' Makefile target that tags the source and publishes the image with the same tag?

Adding a 'weave -v' to display such version information would be great too.

iptables v1.4.19.1: unknown option "-w"

Just tested the latest weave script on my fully updated Fedora 20 machine, and:

weave launch 10.0.0.1/16

iptables v1.4.19.1: unknown option "-w"
Try `iptables -h' or 'iptables --help' for more information.

off-by-8 error in pmtu verification?

weave 2014/08/27 10:39:54.588585 Peer 7a:42:7d:36:27:a1 established active connection to remote peer 7a:c1:9c:25:be:c4 at 31.52.138.103:35482
weave 2014/08/27 10:39:54.588986 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
weave 2014/08/27 10:39:54.589039 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 8939
weave 2014/08/27 10:39:54.863824 Discovered remote MAC b6:bc:ee:58:06:a8 at 7a:c1:9c:25:be:c4
weave 2014/08/27 10:39:59.589446 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 8931
weave 2014/08/27 10:39:59.589638 EMSGSIZE on send, expecting PMTU update (IP packet was 8973 bytes, payload was 8965 bytes)
weave 2014/08/27 10:39:59.590031 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 1438
weave 2014/08/27 10:40:04.590380 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 1430
weave 2014/08/27 10:40:04.623980 ->[7a:c1:9c:25:be:c4]: Client PMTU verified at 1430

So weave fails to verify the pmtu returned by getsockopt(fd, IPPROTO_IP, IP_MTU). Verification only succeeds at that pmtu minus 8 (and yes, it really is 8; I've changed the verification code to drop the pmtu by 1 at a time instead of 8, and success still only occurs at original_pmtu - 8).

Now, it is of course possible that the returned value is indeed wrong. After all, one reason we have pmtu verification at all is to cope exactly with that possibility.

However, it is also possible that our calculations of overheads are wrong, and we actually produce packets that exceed the pmtu by 8.

deal more gracefully with pre-existing container named weave

This problem isn't helped by Docker having a recurrent regression where containers aren't restarted when a host is rebooted.

If I've run weave, and the container called 'weave' is stopped (by a host reboot or for any other reason) then next time I run weave launch I'll get an error message Conflict, The name weave is already assigned to 123456789abc

How this could be better:

  1. check for a container called weave and just restart it if it's there (might cause issues if different launch parameters are passed)

  2. check for a container called weave and remove it before attempting to start a new one

pmtu verification can take ages, for no good reason

Seen this when connecting from a weave at home to one running on EC2...

weave 2014/08/26 18:51:47.415385 Peer 7a:42:7d:36:27:a1 established active connection to remote peer 7a:c1:9c:25:be:c4 at 31.52.138.103:35085
weave 2014/08/26 18:51:47.415727 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
weave 2014/08/26 18:51:47.415772 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 8939
weave 2014/08/26 18:51:48.097208 Discovered remote MAC 82:54:bc:97:2b:81 at 7a:c1:9c:25:be:c4
weave 2014/08/26 18:51:52.416220 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 8929
weave 2014/08/26 18:51:52.416438 EMSGSIZE on send, expecting PMTU update (IP packet was 8971 bytes, payload was 8963 bytes)
[repeated 10 times]
weave 2014/08/26 18:51:57.416839 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 8919
weave 2014/08/26 18:51:57.417022 EMSGSIZE on send, expecting PMTU update (IP packet was 8961 bytes, payload was 8953 bytes)
[repeated 10 times]

and so on, with the pmtu dropping by 10 every five seconds.

But when I decide to run a 'ping -s 9000' from one of my containers on EC2 to a container at home then this happens

weave 2014/08/26 18:52:12.608854 Discovered local MAC c6:1b:07:f0:67:11
weave 2014/08/26 18:52:12.609110 Sending ICMP 3,4 (10.0.1.3 -> 10.0.1.4): PMTU= 8889
weave 2014/08/26 18:52:13.609391 EMSGSIZE on send, expecting PMTU update (IP packet was 8910 bytes, payload was 8902 bytes)
weave 2014/08/26 18:52:13.609528 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 1438
weave 2014/08/26 18:52:17.639305 Discovered remote MAC 76:11:4c:e7:97:b3 at 7a:c1:9c:25:be:c4
weave 2014/08/26 18:52:18.610059 ->[7a:c1:9c:25:be:c4]: Client PMTU set to 1428
weave 2014/08/26 18:52:18.630817 ->[7a:c1:9c:25:be:c4]: Client PMTU verified at 1428

So all of a sudden weave has figured out the correct pmtu, and the ping does in fact succeed.

explicit MAC address assignment

Right now the OS generates random MAC addresses for the container interfaces. To prevent clashes, we should provide optional explicit MAC address assignment, e.g. with a "--mac ..." option to 'weave run'.
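For context, a minimal sketch of how a clash-resistant MAC could be generated on the Go side when no explicit one is supplied (the helper name, and generating it in Go rather than letting the OS pick, are assumptions for illustration):

package router // illustrative package name

import (
	"crypto/rand"
	"net"
)

// randomLocalMAC is an illustrative sketch (assumed helper, not necessarily
// how weave does it): generate a random MAC with the locally-administered bit
// set and the multicast bit cleared, so it cannot clash with vendor-assigned
// hardware addresses. An explicit "--mac" value would simply bypass this step.
func randomLocalMAC() (net.HardwareAddr, error) {
	mac := make([]byte, 6)
	if _, err := rand.Read(mac); err != nil {
		return nil, err
	}
	mac[0] = (mac[0] | 0x02) &^ 0x01 // locally administered, unicast
	return net.HardwareAddr(mac), nil
}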

'weave status'

Running

kill -USR1 `docker inspect --format='{{ .State.Pid }}' $WEAVE`

followed by inspecting the container logs is a tad inconvenient.

Would be good to have a 'weave status'.

re-launching weave on different network breaks container communication

Symptoms are: ping gets 'stuck' and after a while it says 'destination unreachable'

Transcript from 'host2' (using boot2docker):

docker@boot2docker:~$ sudo ./weave launch 10.2.0.2/16 54.165.176.121
80a6daf4ae31ae54fe94262faf1136d06cf946923fe260f7b48574405347dbea
Cannot find ethtool; please install it. Continuing without it.
docker@boot2docker:~$ sudo ./weave status
Local identity is 7a:fc:47:1a:f5:f3
Sniffing traffic on &{70 65535 ethwe 4a:51:56:a6:cc:0a up|broadcast|multicast}
MACs:
4a:51:56:a6:cc:0a -> 7a:fc:47:1a:f5:f3 (2014-09-11 05:14:37.61617449 +0000 UTC)
de:71:28:af:56:f3 -> 7a:fc:47:1a:f5:f3 (2014-09-11 05:14:37.99452193 +0000 UTC)
Peers:
Peer 7a:fc:47:1a:f5:f3 (v1) (UID 4065814941315835731)
   -> 7a:0b:ff:aa:4a:99
Peer 7a:0b:ff:aa:4a:99 (v1) (UID 11685789748452829478)
   -> 7a:fc:47:1a:f5:f3
Topology:
unicast:
7a:fc:47:1a:f5:f3 -> 00:00:00:00:00:00
7a:0b:ff:aa:4a:99 -> 7a:0b:ff:aa:4a:99
broadcast:
7a:fc:47:1a:f5:f3 -> [7a:0b:ff:aa:4a:99]
7a:0b:ff:aa:4a:99 -> []
Reconnects:
docker@boot2docker:~$ C=$(sudo ./weave run 10.2.1.2/24 -t -i ubuntu)
Cannot find ethtool; please install it. Continuing without it.
docker@boot2docker:~$ sudo docker attach $C
root@287d1d147efa:/# 
root@287d1d147efa:/# ping 10.2.1.1
PING 10.2.1.1 (10.2.1.1) 56(84) bytes of data.

docker@boot2docker:~$ docker logs weave
weave 2014/09/11 05:14:36.460746 Waiting for interface ethwe to come up
weave 2014/09/11 05:14:37.463429 Interface ethwe is up
weave 2014/09/11 05:14:37.464378 Local identity is 7a:fc:47:1a:f5:f3
weave 2014/09/11 05:14:37.616058 Sniffing traffic on &{70 65535 ethwe 4a:51:56:a6:cc:0a up|broadcast|multicast}
weave 2014/09/11 05:14:37.616188 Discovered our MAC 4a:51:56:a6:cc:0a
weave 2014/09/11 05:14:37.994542 Discovered local MAC de:71:28:af:56:f3
weave 2014/09/11 05:14:38.631532 Peer 7a:fc:47:1a:f5:f3 established active connection to remote peer 7a:0b:ff:aa:4a:99 at 54.165.176.121:6783
weave 2014/09/11 05:14:38.631922 EMSGSIZE on send, expecting PMTU update (IP packet was 60042 bytes, payload was 60034 bytes)
weave 2014/09/11 05:14:38.632015 ->[7a:0b:ff:aa:4a:99]: Effective PMTU set to 1438
weave 2014/09/11 05:14:38.796625 ->[7a:0b:ff:aa:4a:99]: Effective PMTU verified at 1438
weave 2014/09/11 05:15:05.134650 Discovered local MAC 46:22:fc:54:e3:90
weave 2014/09/11 05:15:09.911981 Discovered remote MAC 7a:0b:ff:aa:4a:99 at 7a:0b:ff:aa:4a:99
weave 2014/09/11 05:15:13.477022 Discovered remote MAC f2:9f:a4:77:e5:92 at 7a:0b:ff:aa:4a:99

Transcript from 'host1':

core@ip-10-0-4-14 /opt/weave $ sudo ./weave launch 10.2.0.1/16
a5c55839464f4c4a69b0dee2f02215e8a71bce5c3d416966ed01b049c7942f64
core@ip-10-0-4-14 /opt/weave $ sudo ./weave status
Local identity is 7a:0b:ff:aa:4a:99
Sniffing traffic on &{55 65535 ethwe fa:1f:2e:17:fb:26 up|broadcast|multicast}
MACs:
7a:0b:ff:aa:4a:99 -> 7a:0b:ff:aa:4a:99 (2014-09-11 16:14:12.787083833 +0000 UTC)
de:71:28:af:56:f3 -> 7a:fc:47:1a:f5:f3 (2014-09-11 16:14:43.000595201 +0000 UTC)
fa:1f:2e:17:fb:26 -> 7a:0b:ff:aa:4a:99 (2014-09-11 16:14:10.274634931 +0000 UTC)
6a:be:85:71:ef:03 -> 7a:0b:ff:aa:4a:99 (2014-09-11 16:14:11.03357848 +0000 UTC)
Peers:
Peer 7a:0b:ff:aa:4a:99 (v1) (UID 11685789748452829478)
   -> 7a:fc:47:1a:f5:f3
Peer 7a:fc:47:1a:f5:f3 (v1) (UID 4065814941315835731)
   -> 7a:0b:ff:aa:4a:99
Topology:
unicast:
7a:0b:ff:aa:4a:99 -> 00:00:00:00:00:00
7a:fc:47:1a:f5:f3 -> 7a:fc:47:1a:f5:f3
broadcast:
7a:0b:ff:aa:4a:99 -> [7a:fc:47:1a:f5:f3]
7a:fc:47:1a:f5:f3 -> []
Reconnects:
core@ip-10-0-4-14 /opt/weave $  C=$(sudo ./weave run 10.2.1.1/24 -t -i ubuntu)
core@ip-10-0-4-14 /opt/weave $ sudo docker attach $C
root@3c910300ed69:/# 
root@3c910300ed69:/# ping 10.2.1.2
PING 10.2.1.2 (10.2.1.2) 56(84) bytes of data.

core@ip-10-0-4-14 /opt/weave $ docker logs weave
weave 2014/09/11 16:14:10.259305 Local identity is 7a:0b:ff:aa:4a:99
weave 2014/09/11 16:14:10.274524 Sniffing traffic on &{55 65535 ethwe fa:1f:2e:17:fb:26 up|broadcast|multicast}
weave 2014/09/11 16:14:10.274640 Discovered our MAC fa:1f:2e:17:fb:26
weave 2014/09/11 16:14:11.033620 Discovered local MAC 6a:be:85:71:ef:03
weave 2014/09/11 16:14:12.787102 Discovered local MAC 7a:0b:ff:aa:4a:99
weave 2014/09/11 16:14:42.755327 Peer 7a:0b:ff:aa:4a:99 established active connection to remote peer 7a:fc:47:1a:f5:f3 at 37.142.186.21:49429
weave 2014/09/11 16:14:42.755896 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
weave 2014/09/11 16:14:42.756009 ->[7a:fc:47:1a:f5:f3]: Effective PMTU set to 8939
weave 2014/09/11 16:14:42.766294 EMSGSIZE on send, expecting PMTU update (IP packet was 8981 bytes, payload was 8973 bytes)
weave 2014/09/11 16:14:42.766365 ->[7a:fc:47:1a:f5:f3]: Effective PMTU set to 1438
weave 2014/09/11 16:14:43.000608 Discovered remote MAC de:71:28:af:56:f3 at 7a:fc:47:1a:f5:f3
weave 2014/09/11 16:14:43.097358 ->[7a:fc:47:1a:f5:f3]: Effective PMTU verified at 1438
weave 2014/09/11 16:15:09.420420 Discovered remote MAC 46:22:fc:54:e3:90 at 7a:fc:47:1a:f5:f3
weave 2014/09/11 16:15:17.600425 Discovered local MAC f2:9f:a4:77:e5:92

'weave reset' to remove traces of weave

Weave creates a bunch of networking related stuff - namely the weave bridge, and an iptable masquerading rule - that deliberately do not get removed by 'weave stop' since we want them to survive bouncing of the weave container.

We probably do want to give users a convenient way to remove all traces of weave, with something like a 'weave reset' command.

Deal with reboot of the underlying machine

If you have the weave container running and reboot the underlying machine, then Docker restarts the container but it sits there saying 'Waiting for interface ethwe to come up'.

'weave status' says nothing at all.

'weave launch' says "Weave is already running."

'weave stop' does stop the container, at which point you can run 'weave launch' again successfully.

Obviously if the network bridge isn't up then the weave router can't do very much, but it seems wrong that the only way to make progress is to stop it and start it again.

weave not working on GCE

Try as I may, I cannot get weave to work on GCE with --image container-vm-v20140826. I think maybe GCE cannot use UDP on its internal networks.

connection attempts to new peers delayed by a few seconds

The first time a peer notifies us of another peer, there is a delay of around 5 seconds before we create a connection. For example:

weave 2014/09/05 17:21:12.851820 Interface ethwe is up
weave 2014/09/05 17:21:12.853510 Local identity is 7a:c9:8a:4b:e9:04
weave 2014/09/05 17:21:12.913305 Sniffing traffic on &{56 65535 ethwe 3e:67:56:4c:dd:62 up|broadcast|multicast}
weave 2014/09/05 17:21:12.913352 Discovered our MAC 3e:67:56:4c:dd:62
weave 2014/09/05 17:21:13.410682 Peer 7a:c9:8a:4b:e9:04 established active connection to remote peer 7a:41:c7:a4:e2:bb at 54.213.246.123:6783
weave 2014/09/05 17:21:13.412840 Reference to unknown peers
weave 2014/09/05 17:21:13.613054 Discovered local MAC 8a:f2:5b:f0:ae:0f
weave 2014/09/05 17:21:17.829608 Peer 7a:c9:8a:4b:e9:04 established active connection to remote peer 7a:17:ff:ed:a4:48 at 192.168.15.152:48818

pmtu verification takes ages on broken networks

Following on from issue #12, in setups where the network is, say, swallowing icmp 3.4 packets, it can take a very long time to discover a working pmtu. Take the issue I saw with EC2 in #13 (comment). We started with a pmtu of 8939. The working pmtu is 1430. Dropping the pmtu by 8 every 5 seconds, as we currently do, would take 78 minutes to discover that. And if the initial pmtu had been near the max, then 10 hours.

Firstly, I think reducing the verification interval to 2 seconds would be reasonable - networks on which people will want to run weave are unlikely to have roundtrip times greater than that.

And perhaps we should employ binary search, which would have a worst case duration of 16*2s = 32s.
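To illustrate, a minimal sketch of the binary-search idea; the probe callback is hypothetical and stands for sending a verification packet of the given size and waiting up to the shortened interval for confirmation:

package router // illustrative package name

// searchPMTU is a sketch of the binary-search proposal: find the largest MTU
// in [lo, hi] for which a verification probe succeeds. With a 2s interval and
// an MTU range of up to ~65535 this needs at most about 16 probes, i.e.
// roughly 32s in the worst case.
func searchPMTU(lo, hi int, probe func(mtu int) bool) int {
	for lo < hi {
		mid := (lo + hi + 1) / 2
		if probe(mid) {
			lo = mid // a packet of size mid got through; try larger
		} else {
			hi = mid - 1 // too big; try smaller
		}
	}
	return lo
}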

'weave version'

As of the resolution to issue #10, we tag published images with the git revision. 'weave version' should display that information, specifically the revision of the image of a running 'weave' container, and, failing that, the revision of the zettio/weave image.

eliminate tx offload disabling in application container

As of 6eb7e1a we disable tx offload on the 'weave' bridge. It is possible that as a result we no longer need to disable it on the container's ethwe interfaces, since these are connected to that bridge. All depends on when the kernel decides to do offloading.

"bind: permission denied" error on selinux

The default selinux policy in Fedora 20 (and probably RHEL, CentOS and other derivatives also) prevents docker containers from binding raw sockets. This leads to dialIP failing:

weave 2014/08/22 21:10:05.985122 ->[7a:98:61:f3:57:35]: encountered fatal error dial ip4:UDP 192.168.121.2: bind: permission denied

When this happens, a corresponding record appears in '/var/log/audit/audit.log':

type=AVC msg=audit(1408742307.053:22561): avc:  denied  { node_bind } for  pid=7616 comm="weaver" saddr=172.17.0.14 scontext=system_u:system_r:svirt_lxc_net_t:s0:c209,c668 tcontext=system_u:object_r:node_t:s0 tclass=rawip_socket

This can be worked around by making the weaver container privileged, or by setting selinux to permissive mode:

# setenforce 0

It is possible to load a policy module that will allow the bind to succeed, but it seems this can only be done for all docker containers. I can't see a way to grant the permission to the weaver container only.

Cannot launch weave after stopping weave

root@ubuntu:# weave stop
root@ubuntu:# weave launch 10.0.0.2/16 $172.16.1.86
2014/09/10 13:03:37 Error response from daemon: Cannot start container 2b5cd9c4aa552c0acc11c13091693ce517818668e2fbcd574703fadc1dd12a38: Bind for 0.0.0.0:6783 failed: port is already allocated

weave doesn't run on Mac

Docker "runs on Mac (OSX)" in the following sense, if you follow http://docs.docker.com/installation/mac/: you install VirtualBox, then run a very small (23MB) Linux installation in a VM, then run the 'docker' command-line binary (built as a native OSX binary) which communicates with the Linux Docker running inside the VM.

So, what is the analogous thing to do with Weave? We don't want to be running 'ip' commands to set up network links on the OSX side, because Docker is running in a VM that is totally isolated from OSX.

Maybe have an OSX-specific 'weave' command that communicates with things running inside the VirtualBox VM?

'tap' instead of 'pcap'

We currently capture and inject packets with pcap. 'tap' devices are an alternative that might offer better performance.
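To make the comparison concrete, here is a minimal Linux-only sketch of opening a tap device from Go (the TUNSETIFF ioctl number and struct layout are assumptions for x86-64; weave would still need to attach the device to the bridge and pump frames to and from it, which is where the performance difference against pcap would show up):

package router // illustrative package name

import (
	"os"
	"syscall"
	"unsafe"
)

// TUNSETIFF is the Linux ioctl to configure a tun/tap device; the value below
// is the one commonly used on x86-64 and is an assumption for this sketch.
const TUNSETIFF = 0x400454ca

// ifReq mirrors the kernel's struct ifreq closely enough for TUNSETIFF.
type ifReq struct {
	Name  [16]byte
	Flags uint16
	_     [22]byte // padding up to sizeof(struct ifreq)
}

// openTap opens (creating if necessary) a tap device with the given name and
// returns a file from which ethernet frames can be read and written directly,
// instead of capturing and injecting them via pcap.
func openTap(name string) (*os.File, error) {
	f, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		return nil, err
	}
	var req ifReq
	copy(req.Name[:], name)
	req.Flags = syscall.IFF_TAP | syscall.IFF_NO_PI // ethernet frames, no packet-info header
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), uintptr(TUNSETIFF), uintptr(unsafe.Pointer(&req))); errno != 0 {
		f.Close()
		return nil, errno
	}
	return f, nil
}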

run non-privileged container

Following on from issue #7, it would be desirable to be able to run weave in a non-privileged container. The sole reason we are doing so is because selinux (et al?) prevents binding of raw ip sockets otherwise. So the burning question is whether we could somehow avoid doing that...

Why do we need raw IP sockets?

A peer that connects to us may sit behind a firewall that does not permit "unsolicited" inbound udp packets. So in order to communicate with that peer we must send udp packets with a source ip address & port that matches the destination the peer connected to. For ordinary udp packets we accomplish that by simply sending them on the listening socket. But we also need to be able to send udp packets with 'DF' ("do not fragment") set, which requires setting the IPPROTO_IP.IP_MTU_DISCOVER=IP_PMTUDISC_DO socket option. So we need a separate socket for that. Furthermore, we need to be able to catch the EMSGSIZE errors that may be encountered when sending on such a socket, and retrieve the pmtu from the IPPROTO_IP.IP_MTU socket option. So we really need one socket per peer, so we can associate the error & pmtu with the correct peer.

We could create those per-peer DF udp sockets with net.DialUDP. However, we need to specify the same source ip address & port as our main udp listener. DialUDP attempts to bind to the source address & port, which fails since it is already bound by the listener.

That's why we create a raw IP socket instead. Ports don't feature in IP, so there is no issue with binding.

What else could we do?

  1. Use the listener socket to send all udp packets, setting the appropriate socket options per packet. That requires a global lock in order to make the set_option, send_packet, check_error, perhaps_retrieve_pmtu sequence atomic, thus effectively serialising all outbound peer communication and also introducing one or more possibly expensive syscalls into the critical path. However, note that a) inbound peer communication is already serialised, b) outbound non-df communication gets serialised (since writing on socket takes a lock). So we should measure the performance impact before discarding this solution.
  2. somehow set SO_REUSEADDR when creating the extra udp sockets. Unfortunately there appears to be no way to do this in the go networking API, so we'd have to roll our own (a rough sketch follows below). It is also not clear whether this is safe. Would it interfere with the listening socket? Would the extra sockets somehow interfere with each other, especially when it comes to handling the EMSGSIZE errors?
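To illustrate what "rolling our own" for option 2 might look like, a Linux-only sketch using the syscall package directly (the helper name and exact option handling are assumptions; whether such a socket coexists safely with the listener is precisely the open question above):

package router // illustrative package name

import "syscall"

// dialDFUDP is a sketch of option 2: create a UDP socket, allow it to share
// the listener's local address via SO_REUSEADDR, enable strict path-MTU
// discovery so oversize sends fail with EMSGSIZE, then bind and connect it
// to the peer. Linux-only; laddr must be the listener's address and port.
func dialDFUDP(laddr, raddr *syscall.SockaddrInet4) (int, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_DGRAM, 0)
	if err != nil {
		return -1, err
	}
	fail := func(err error) (int, error) { syscall.Close(fd); return -1, err }
	if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil {
		return fail(err)
	}
	if err := syscall.SetsockoptInt(fd, syscall.IPPROTO_IP, syscall.IP_MTU_DISCOVER, syscall.IP_PMTUDISC_DO); err != nil {
		return fail(err)
	}
	if err := syscall.Bind(fd, laddr); err != nil {
		return fail(err)
	}
	if err := syscall.Connect(fd, raddr); err != nil {
		return fail(err)
	}
	return fd, nil
}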

sub-nets break pmtu discovery

When we place application containers in a sub-net, say 10.0.1.[1..n]/24, with weave on 10.0.0.[1,2]/16, then the icmp 3.4 packets injected by weave don't make it to the containers since we set their src IP to that of weave, which is not in the app container sub-net. [it's actually not 100% clear why the packets don't make it through; after all, their dst IP is in the app container sub-net]

Setting the src IP to the original packet's dst IP (ditto for MACs) fixes that.

I've looked at the RFCs and cannot find anything that says what IP should be used as the source IP. I'm guessing normally, when packets are routed via gateways etc, it would indeed be the IP of the entity that creates the icmp 3.4 packet. But in case of weave, that entity is really hidden; it's not part of the application network and doesn't route to the application network; instead it captures and injects.

So IMO setting the src IP of the icmp 3.4 packet to the dst IP of the original packet is actually the closest to being right.

support IPv6

Weave currently only works over IPv4. Main areas that need attention in order to support IPv6 are

  • PMTU discovery, which currently relies on icmp 3.4 detection & injection, which is IPv4 specific.
  • Fragmentation, i.e. when weave performs fragmentation because it cannot trust the stack to do it. This is IPv4 specific. Though we may not need it at all for IPv6, since in IPv6 all fragmentation is supposed to happen at the source, based on the PMTU.
  • Various overhead calculations, e.g. udp header size. These are IPv4 specific.
  • Peer connections. These currently use udp4/tcp4 addressing.
  • Bridge, interface, firewall and docker port forwarding configuration, i.e. all the stuff we do in the 'weave' script. These might just work when given IPv6 addresses, but we can't be certain at this stage.

automatic IP assignment

It is a chore to have to specify a CIDR whenever starting a container with weave. DHCP to the rescue. Or something like that. Arguably DHCP is the wrong way round. We want to go from a name/domain to an IP in a particular subnet.

Note that as well as for application containers, we also need IP addresses for the weave container and weavedns, though we could defer that to a separate issue.

make connection address selection more promiscuous

Following on from pull #44, weave only attempts to establish connections to addresses it was given on start, and addresses that other peers are currently (well, as per the latest topology information) connected to.

This is potentially too restrictive...

  • we may have had a connection to an address that we learnt via the topology, i.e. because some other peer was connected to it, then that connection drops; if no other peer has a connection to that address at the time, we do not retry.
  • peers may have been given a whole bunch of addresses on their command line that they aren't currently connected to, e.g. because the corresponding peer established the connection the other way. Currently other peers do not include those addresses in their selection, thus potentially missing out on establishing connections to some peers.
  • a peer may recently have had a connection to a peer on some address but then that connection dropped. If, for whatever reason, it doesn't get re-established, and no other peer refers to that address, then the address is never re-tried by any peer.

weave script and container version compatibility check

Issues #10 and #33 deal with weave container and script versioning. There is some coupling here, i.e. certain weave script versions only work with certain container versions. Would be nice if we checked that, e.g. the script could check whether the weave container is of the expected version.

post some benchmarks

We should post some performance benchmarks, comparing communication over the weave network with alternative ways containers on multiple hosts can be made to talk to each other, e.g. over exposed ports that have been made accessible by the respective firewalls.
