
netlink's Introduction


Package netlink provides low-level access to Linux netlink sockets (AF_NETLINK). MIT Licensed.

For more information about how netlink works, check out my blog series on Linux, Netlink, and Go.

If you have any questions or you'd like some guidance, please join us on Gophers Slack in the #networking channel!

Stability

See the CHANGELOG file for a description of changes between releases.

This package has a stable v1 API and any future breaking changes will prompt the release of a new major version. Features and bug fixes will continue to occur in the v1.x.x series.

This package only supports the two most recent major versions of Go, mirroring Go's own release policy. Older versions of Go may lack critical features and bug fixes which are necessary for this package to function correctly.

Design

A number of netlink packages are already available for Go, but I wasn't able to find one that aligned with what I wanted in a netlink package:

  • Straightforward, idiomatic API
  • Well tested
  • Well documented
  • Doesn't use package/global variables or state
  • Doesn't necessarily need root to work

My goal for this package is to use it as a building block for the creation of other netlink family packages.
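
For a sense of the API, here is a minimal sketch of dialing a family and executing a request. The empty header type is a placeholder; a real caller would use a message type defined by the chosen netlink family.

package main

import (
	"log"

	"github.com/mdlayher/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	// Dial the generic netlink family; a nil Config uses sane defaults.
	conn, err := netlink.Dial(unix.NETLINK_GENERIC, nil)
	if err != nil {
		log.Fatalf("failed to dial netlink: %v", err)
	}
	defer conn.Close()

	// Ask the kernel to acknowledge the request so a reply always arrives.
	// Header.Type is left zero here only as a placeholder.
	req := netlink.Message{
		Header: netlink.Header{
			Flags: netlink.Request | netlink.Acknowledge,
		},
	}

	// Execute sends the request and validates the replies' sequence and PID.
	msgs, err := conn.Execute(req)
	if err != nil {
		log.Fatalf("failed to execute request: %v", err)
	}
	log.Printf("received %d message(s)", len(msgs))
}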

Ecosystem

Over time, an ecosystem of Go packages has developed around package netlink. Many of these packages provide building blocks for further interactions with various netlink families, such as NETLINK_GENERIC or NETLINK_ROUTE.

To have your package included in this diagram, please send a pull request!

flowchart LR
    netlink["github.com/mdlayher/netlink"]
    click netlink "https://github.com/mdlayher/netlink"

    subgraph "NETLINK_CONNECTOR"
        direction LR

        garlic["github.com/fearful-symmetry/garlic"]
        click garlic "https://github.com/fearful-symmetry/garlic"
    end

    subgraph "NETLINK_CRYPTO"
        direction LR

        cryptonl["github.com/mdlayher/cryptonl"]
        click cryptonl "https://github.com/mdlayher/cryptonl"
    end

    subgraph "NETLINK_GENERIC"
        direction LR

        genetlink["github.com/mdlayher/genetlink"]
        click genetlink "https://github.com/mdlayher/genetlink"

        devlink["github.com/mdlayher/devlink"]
        click devlink "https://github.com/mdlayher/devlink"

        ethtool["github.com/mdlayher/ethtool"]
        click ethtool "https://github.com/mdlayher/ethtool"

        go-openvswitch["github.com/digitalocean/go-openvswitch"]
        click go-openvswitch "https://github.com/digitalocean/go-openvswitch"

        ipvs["github.com/cloudflare/ipvs"]
        click ipvs "https://github.com/cloudflare/ipvs"

        l2tp["github.com/axatrax/l2tp"]
        click l2tp "https://github.com/axatrax/l2tp"

        nbd["github.com/Merovius/nbd"]
        click nbd "https://github.com/Merovius/nbd"

        quota["github.com/mdlayher/quota"]
        click quota "https://github.com/mdlayher/quota"

        router7["github.com/rtr7/router7"]
        click router7 "https://github.com/rtr7/router7"

        taskstats["github.com/mdlayher/taskstats"]
        click taskstats "https://github.com/mdlayher/taskstats"

        u-bmc["github.com/u-root/u-bmc"]
        click u-bmc "https://github.com/u-root/u-bmc"

        wgctrl["golang.zx2c4.com/wireguard/wgctrl"]
        click wgctrl "https://golang.zx2c4.com/wireguard/wgctrl"

        wifi["github.com/mdlayher/wifi"]
        click wifi "https://github.com/mdlayher/wifi"

        devlink & ethtool & go-openvswitch & ipvs --> genetlink
        l2tp & nbd & quota & router7 & taskstats --> genetlink
        u-bmc & wgctrl & wifi --> genetlink
    end

    subgraph "NETLINK_KOBJECT_UEVENT"
        direction LR

        kobject["github.com/mdlayher/kobject"]
        click kobject "https://github.com/mdlayher/kobject"
    end

    subgraph "NETLINK_NETFILTER"
        direction LR

        go-conntrack["github.com/florianl/go-conntrack"]
        click go-conntrack "https://github.com/florianl/go-conntrack"

        go-nflog["github.com/florianl/go-nflog"]
        click go-nflog "https://github.com/florianl/go-nflog"

        go-nfqueue["github.com/florianl/go-nfqueue"]
        click go-nfqueue "https://github.com/florianl/go-nfqueue"

        netfilter["github.com/ti-mo/netfilter"]
        click netfilter "https://github.com/ti-mo/netfilter"

        nftables["github.com/google/nftables"]
        click nftables "https://github.com/google/nftables"

        conntrack["github.com/ti-mo/conntrack"]
        click conntrack "https://github.com/ti-mo/conntrack"

        conntrack --> netfilter
    end

    subgraph "NETLINK_ROUTE"
        direction LR

        go-tc["github.com/florianl/go-tc"]
        click go-tc "https://github.com/florianl/go-tc"

        qdisc["github.com/ema/qdisc"]
        click qdisc "https://github.com/ema/qdisc"

        rtnetlink["github.com/jsimonetti/rtnetlink"]
        click rtnetlink "https://github.com/jsimonetti/rtnetlink"

        rtnl["gitlab.com/mergetb/tech/rtnl"]
        click rtnl "https://gitlab.com/mergetb/tech/rtnl"
    end

    subgraph "NETLINK_W1"
        direction LR

        go-onewire["github.com/SpComb/go-onewire"]
        click go-onewire "https://github.com/SpComb/go-onewire"
    end

    subgraph "NETLINK_SOCK_DIAG"
        direction LR

        go-diag["github.com/florianl/go-diag"]
        click go-diag "https://github.com/florianl/go-diag"
    end

    NETLINK_CONNECTOR --> netlink
    NETLINK_CRYPTO --> netlink
    NETLINK_GENERIC --> netlink
    NETLINK_KOBJECT_UEVENT --> netlink
    NETLINK_NETFILTER --> netlink
    NETLINK_ROUTE --> netlink
    NETLINK_SOCK_DIAG --> netlink
    NETLINK_W1 --> netlink

netlink's People

Contributors

acln0, bdrung, brlbil, dstiliadis, fbegyn, florianl, hechuanxupt, jsimonetti, markusbauer, mdlayher, nak3, pgier, rwhelan, selahaddinh, stapelberg, terinjokes, ti-mo, tklauser


netlink's Issues

Could it be used for NFLOG?

I have a package which currently uses libnetfilter_log via cgo: https://github.com/ncw/go-nflog-acctd

I'd really like to lose the cgo out of that package.

Could this package be used for NFLOG accounting too? If so what would need to be done?

I guess this might fall under this from the front page

My goal for this package is to use it as a building block for the creation of other netlink family packages.

Apologies if this is a stupid question - I'm only really familiar with libnetfilter_log and I'm a bit hazy how it all fits together with netlink!

Is locking of go-routine to thread mandatory?

@mdlayher first, I want to thank you for the great work you have done with this library.

While doing some performance tests, I identified some performance issues
with the thread lock.

runtime.LockOSThread()

If you do a pprof and look at the flame graph, you will notice that these threads consume
a relatively high, constant amount of CPU, and also create caching issues.

My theory is that this is indeed needed when you have multiple clients that
connect in different namespaces (or, for that matter, even a single client that
connects to a namespace other than the parent process namespace). But it should
not be needed if all clients are in the same namespace as the parent. The issue
with Go and namespaces is that the thread must be locked inside a namespace,
but if all clients are in the main process namespace there should be no need to
lock the thread.

In tests that we did, we did not notice the problem if all clients are in the parent
namespace.

I was wondering if you have a specific test-case scenario where the bug was showing up.

If not, would it make sense to send a patch, where the thread lock is only activated for
any remote namespace call?

Thanks,

netlink: consider walking attributes slice directly in AttributeDecoder

As a first step right now, the NewAttributeDecoder constructor unpacks the attributes from the byte slice so it can walk them internally. It'd be interesting to try to directly walk the byte slice instead, to avoid additional internal allocations and copies.

This could also mean that the constructor could stop returning an error and an invalid attributes slice could be caught by the Err method instead.
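
For reference, this is roughly how the decoder is walked today; the attribute constants and fields in this sketch are hypothetical:

// Hypothetical attribute constants for illustration only.
const (
	attrName uint16 = 1
	attrMTU  uint16 = 2
)

func parse(b []byte) (string, uint32, error) {
	// Today, the constructor unpacks all attributes up front and may fail.
	ad, err := netlink.NewAttributeDecoder(b)
	if err != nil {
		return "", 0, err
	}

	var name string
	var mtu uint32
	for ad.Next() {
		switch ad.Type() {
		case attrName:
			name = ad.String()
		case attrMTU:
			mtu = ad.Uint32()
		}
	}

	// With the proposed change, malformed input would surface here instead.
	if err := ad.Err(); err != nil {
		return "", 0, err
	}
	return name, mtu, nil
}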

netlink: network namespaces can be disabled in kernel configuration

When I try to use the library in order to open a generic netlink socket I get the error message:

panic: failed to dial: open /proc/self/task/2281/ns/net: no such file or directory

It seems that this library expects a feature to be enabled in my kernel which is not. Do you happen to know which kernel build configuration/feature is responsible for this subdirectory? It might be handy to document this requirement as well.

netlink: receive hangs indefinitely on invalid message

When using conn.Execute to send/receive an invalid (unknown) netlink message, receive will hang, waiting for a reply.
There probably needs to be some kind of timeout to happen in receive.

For example, send this message:

netlink.Message{
  Header: netlink.Header{
    Length: 0x10,
    Type: 0x12,
    Flags: 0x301,
    Sequence: 0x1,
    PID: 0x214e,
  },
  Data: []uint8(nil),
}
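
Until receive itself grows a timeout, one caller-side workaround on newer versions of the package (assuming the deadline support that arrived with the Go 1.12 runtime poller work) is to set a read deadline before the blocking call:

// Give up on the pending reply after two seconds instead of hanging.
if err := conn.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
	return err
}

msgs, err := conn.Execute(msg)
if err != nil {
	// A timed-out receive surfaces as an error rather than blocking forever.
	return err
}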

Test failures on s390x

I tried building and running the tests on Fedora/s390x and got a lot of test failures. From the test output, a lot of the bytes appear reversed, maybe this is an issue of big vs. little endianess?
Full build info is in koji:
https://koji.fedoraproject.org/koji/taskinfo?taskID=26880670
https://kojipkgs.fedoraproject.org//work/tasks/670/26880670/build.log

--- FAIL: TestMarshalAttributes (0.00s)
    --- FAIL: TestMarshalAttributes/one_attribute,_no_data (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x04 0x00 0x01 0x00]
    		-  got: [0x00 0x04 0x00 0x01]
    --- FAIL: TestMarshalAttributes/one_attribute,_no_data,_length_calculated (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x04 0x00 0x01 0x00]
    		-  got: [0x00 0x04 0x00 0x01]
    --- FAIL: TestMarshalAttributes/one_attribute,_padded (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x05 0x00 0x01 0x00 0xff 0x00 0x00 0x00]
    		-  got: [0x00 0x05 0x00 0x01 0xff 0x00 0x00 0x00]
    --- FAIL: TestMarshalAttributes/one_attribute,_padded,_length_calculated (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x05 0x00 0x01 0x00 0xff 0x00 0x00 0x00]
    		-  got: [0x00 0x05 0x00 0x01 0xff 0x00 0x00 0x00]
    --- FAIL: TestMarshalAttributes/one_attribute,_aligned (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x08 0x00 0x02 0x00 0xaa 0xbb 0xcc 0xdd]
    		-  got: [0x00 0x08 0x00 0x02 0xaa 0xbb 0xcc 0xdd]
    --- FAIL: TestMarshalAttributes/one_attribute,_aligned,_length_calculated (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x08 0x00 0x02 0x00 0xaa 0xbb 0xcc 0xdd]
    		-  got: [0x00 0x08 0x00 0x02 0xaa 0xbb 0xcc 0xdd]
    --- FAIL: TestMarshalAttributes/multiple_attributes (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x05 0x00 0x01 0x00 0xff 0x00 0x00 0x00 0x08 0x00 0x02 0x00 0xaa 0xbb 0xcc 0xdd 0x04 0x00 0x03 0x00 0x10 0x00 0x04 0x00 0x11 0x11 0x11 0x11 0x22 0x22 0x22 0x22 0x33 0x33 0x33 0x33]
    		-  got: [0x00 0x05 0x00 0x01 0xff 0x00 0x00 0x00 0x00 0x08 0x00 0x02 0xaa 0xbb 0xcc 0xdd 0x00 0x04 0x00 0x03 0x00 0x10 0x00 0x04 0x11 0x11 0x11 0x11 0x22 0x22 0x22 0x22 0x33 0x33 0x33 0x33]
    --- FAIL: TestMarshalAttributes/multiple_attributes,_length_calculated (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x05 0x00 0x01 0x00 0xff 0x00 0x00 0x00 0x08 0x00 0x02 0x00 0xaa 0xbb 0xcc 0xdd 0x04 0x00 0x03 0x00 0x10 0x00 0x04 0x00 0x11 0x11 0x11 0x11 0x22 0x22 0x22 0x22 0x33 0x33 0x33 0x33]
    		-  got: [0x00 0x05 0x00 0x01 0xff 0x00 0x00 0x00 0x00 0x08 0x00 0x02 0xaa 0xbb 0xcc 0xdd 0x00 0x04 0x00 0x03 0x00 0x10 0x00 0x04 0x11 0x11 0x11 0x11 0x22 0x22 0x22 0x22 0x33 0x33 0x33 0x33]
    --- FAIL: TestMarshalAttributes/nested_bit,_type_1,_length_0 (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x04 0x00 0x01 0x80]
    		-  got: [0x00 0x04 0x80 0x01]
    --- FAIL: TestMarshalAttributes/endianness_bit,_type_1,_length_0 (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x04 0x00 0x01 0x40]
    		-  got: [0x00 0x04 0x40 0x01]
    --- FAIL: TestMarshalAttributes/max_type_space,_length_0 (0.00s)
    	attribute_test.go:263: unexpected bytes:
    		- want: [0x04 0x00 0xff 0x3f]
    		-  got: [0x00 0x04 0x3f 0xff]
--- FAIL: TestUnmarshalAttributes (0.00s)
    --- FAIL: TestUnmarshalAttributes/one_attribute,_no_data (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/one_attribute,_padded (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/one_attribute,_aligned (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/multiple_attributes (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/nested_and_endianness_bits (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: invalid attribute; type cannot have both nested and net byte order flags
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/nested_bit,_type_1,_length_0 (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/endianness_bit,_type_1,_length_0 (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
    --- FAIL: TestUnmarshalAttributes/max_type_space,_length_0 (0.00s)
    	attribute_test.go:469: unexpected error:
    		- want: <nil>
    		-  got: invalid attribute; length too short or too large
--- FAIL: TestMessageMarshal (0.00s)
    --- FAIL: TestMessageMarshal/OK_no_data (0.00s)
    	message_test.go:228: unexpected Message bytes:
    		- want: [0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00]
    		-  got: [0x00 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00]
    --- FAIL: TestMessageMarshal/OK_unaligned_data (0.00s)
    	message_test.go:228: unexpected Message bytes:
    		- want: [0x14 0x00 0x00 0x00 0x00 0x00 0x01 0x00 0x01 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x61 0x62 0x63 0x00]
    		-  got: [0x00 0x00 0x00 0x14 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x0a 0x61 0x62 0x63 0x00]
    --- FAIL: TestMessageMarshal/OK_aligned_data (0.00s)
    	message_test.go:228: unexpected Message bytes:
    		- want: [0x14 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x14 0x00 0x00 0x00 0x61 0x62 0x63 0x64]
    		-  got: [0x00 0x00 0x00 0x14 0x00 0x02 0x00 0x00 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x14 0x61 0x62 0x63 0x64]
--- FAIL: TestMessageUnmarshal (0.00s)
    --- FAIL: TestMessageUnmarshal/OK_no_data (0.00s)
    	message_test.go:309: unexpected error:
    		- want: <nil>
    		-  got: not enough data to create a netlink message
    --- FAIL: TestMessageUnmarshal/OK_data (0.00s)
    	message_test.go:309: unexpected error:
    		- want: <nil>
    		-  got: not enough data to create a netlink message
--- FAIL: TestConnReceiveErrorLinux (0.00s)
    --- FAIL: TestConnReceiveErrorLinux/ENOENT (0.00s)
    	conn_linux_error_test.go:87: unexpected error:
    		- want: no such file or directory
    		-  got: errno 16777217
    --- FAIL: TestConnReceiveErrorLinux/multipart_done_with_error_attached (0.00s)
    	conn_linux_error_test.go:87: unexpected error:
    		- want: interrupted system call
    		-  got: errno 50331649
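
The failures are consistent with a byte-order assumption: package nlenc encodes values in the host's native endianness, so test data written out as little-endian byte sequences will not match on big-endian s390x. A minimal illustration:

package main

import (
	"fmt"

	"github.com/mdlayher/netlink/nlenc"
)

func main() {
	// nlenc uses the host's native byte order.
	b := make([]byte, 2)
	nlenc.PutUint16(b, 4)
	fmt.Println(b) // [4 0] on little-endian; [0 4] on big-endian s390x
}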

Switch from syscall to x/sys/unix

Since new developments and improvements can only take place in unix, this package should be using unix instead of the deprecated and frozen syscall.

Error success not handled, passed with other messages.

When receiving messages, checkMessage(m) checks for error messages.
If the error message carries the success code (0x0), it returns nil; this causes the error message to be propagated to the upper-level caller.

HeaderTypeDone messages are suppressed, so I was wondering: is this behavior intentional, or should we fix it?

// receive is the internal implementation of Conn.Receive, which can be called
// recursively to handle multi-part messages.
func (c *Conn) receive() ([]Message, error) {
	// [.......]
	if err := checkMessage(m); err != nil {
		return nil, err
	}
}

nltest: testing package for netlink interactions

It would be great to provide a testing package akin to net/http/httptest that spins up another userspace netlink socket and then accepts connections.

The test server would accept a closure indicating what to do with received data, and what to send back in reply. This would eliminate the need to stub out sockets calls using an interface for some tests.

First step is probably adding a new parameter called PID to Config that dials the kernel as usual when set to zero. Then it should be easy enough to wire up another socket and use it from a Conn.

The same package could be the base for other packages like a genltest or rtnltest.
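
A sketch of what using such a package might look like, modeled loosely on net/http/httptest; the nltest.Dial name and its closure signature are assumptions for the sake of discussion:

// Hypothetical test server: the closure plays the role of the kernel.
conn := nltest.Dial(func(req []netlink.Message) ([]netlink.Message, error) {
	// Echo each request back to the caller as its reply.
	return req, nil
})
defer conn.Close()

msgs, err := conn.Execute(netlink.Message{
	Data: []byte("hello"),
})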

AttributeDecoder.Bytes

On PR #95, we discussed and added a Bytes function to the AttributeEncoder type, stating:

It's nice and concise with a Bytes method, and general enough to make sense for many callers

If it's common to encode arbitrary bytes, it's likely common to decode them as well. A user of this API can do so with the Do method and a helper, but they have to be especially careful to heed the warning:

The function fn should not retain any reference to the data b outside of the scope of the function.

If there's no objection, for API symmetry, we should add a Bytes method to AttributeDecoder.
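
A sketch of the difference: today's Do-based copy versus the proposed Bytes method.

// Today: copy the data out via Do, being careful not to retain b itself.
var raw []byte
ad.Do(func(b []byte) error {
	raw = append(raw, b...) // must copy; b must not escape the closure
	return nil
})

// Proposed: a symmetric Bytes method returning a copy that is safe to keep.
raw = ad.Bytes()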

netlink: blocking Receive prevents Send

I encountered this while using the nfqueue subsystem. Given a var nlconn *netlink.Conn:

One goroutine (henceforth "the producer") does approximately this:

for {
	msgs, err := nlconn.Receive()
	if err != nil {
		// handle	
	}
	ch <- msgs
}

It sends the messages to a channel to fan them out to workers. The workers do approximately this:

for msgs := range ch {
	for _, msg := range msgs {
		reply := process(msg)
		nlconn.Send(reply)
	}
}

To set the stage for the pseudo-deadlock, assume the producer calls Receive. The request travels down to the lockedNetNSGoroutine associated with the connection, it is sent on g.funcC.

So far so good. The lockedNetNSGoroutine receives the request, runs it, and blocks here.

Meanwhile, a worker has finished processing a message it was handed previously, and wants to send a reply. It calls Send, which blocks here because the lockedNetNSGoroutine is busy waiting for the blocking call to recvmsg to return, and is not servicing the channel.

I have refactored my code such that all the processing happens synchronously for now, so instead of a producer and a number of workers, I have only this:

for {
	msgs, err := nlconn.Receive()
	if err != nil {
		// handle
	}

	for msgs := range ch {
		for _, msg := range msgs {
			reply := process(msg)
			nlconn.Send(reply)
		}
	}
}

This works, and I don't think it is prohibitively expensive in terms of performance. It may even be faster than fanning out. I haven't measured anything yet, since my project is in a prototype stage. There may also be the possibility for me to use two different *netlink.Conns bound to the same netfilter queue, one for receiving and one for sending. I don't know if this works: I have not tried it yet.

I thought about potential solutions for a little while, but I couldn't figure out any good ones. Therefore, I have filed this bug for discussion. Can something be done here? Should we at least document this? While debugging the issue, I was stumped for a while, since Receive and Send both acquire a read lock (!), and it seemed absurd to me that they'd be mutually exclusive in a deeper sense, until I saw the stack traces of the goroutines involved.

Thanks,
Andrei

netlink: integration tests make for slow development cycles

$ go test -count 5 .
ok      github.com/mdlayher/netlink     14.789s

It'd be nice to speed this up a bit through some combination of t.Parallel and observing go test -short. We do check os.Getpid() in one of the integration tests; this will not work when tests run in parallel, since multiple open sockets can have different netlink header PIDs.

netlink: should use SOCK_CLOEXEC when creating file descriptors

Currently, when a program using package netlink creates a child process, the child process inherits netlink socket file descriptors, because we don't create them with SOCK_CLOEXEC, and we don't mark them as close-on-exec in other ways either. This probably doesn't matter very much in practice, but I think we should still try to get it right regardless.

I propose that we mirror the standard library net package behavior when creating sockets: see net.sysSocket and syscall.ForkLock.

I'm going to send a PR shortly.
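
A sketch of the proposed creation path, assuming a small helper around unix.Socket; passing the flag at socket(2) time closes the race window between socket creation and a later fcntl(FD_CLOEXEC) call. The syscall.ForkLock fallback for old kernels that net.sysSocket uses is omitted here.

// newSocket creates a netlink socket that child processes will not inherit.
func newSocket(family int) (int, error) {
	return unix.Socket(
		unix.AF_NETLINK,
		unix.SOCK_RAW|unix.SOCK_CLOEXEC,
		family,
	)
}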

netlink: Close does not unblock concurrent Receive operation

Similar to #136, though the use-case is different.
I want to listen for any IP route changes, so I use a blocking Receive() for multicast messages. I cannot use a timeout on it, because then there is a chance of missing events while restarting the Receive cycle.
In order to break the Receive (i.e. on graceful exit), I close the Conn on another goroutine, which appears to be the only way to do it.
The problem is that Receive, in its core read function, acquires the R side of the RWMutex:

s.mu.RLock()

Close, however wants to acquire the W side here:
s.mu.Lock()

which is not possible until the R(eader) side is released.
This results in a deadlock until something is received on the socket, which is not deterministic.
I don't have a solution in mind, but maybe the close function should lock the W side of the mutex only after calling the close syscall, ensuring all blocking operations are woken up (and so release the lock). If other parallel operations handle the errors appropriately, it might not cause problems that s.closed temporarily appears to be false while the socket is already closed.

netlink: expose raw io.ReadWriter from Conn

I received an email about someone trying to use this package to deal with NETLINK_KOBJECT_UEVENT, and it appears that this family sends and receives raw strings instead of netlink messages.

We should try to expose a method to access the socket directly as an io.ReadWriter for these advanced use cases. Even if a family doesn't use messages, it can still be useful to have the multicast group and BPF functionality.

I'm not sure what the API should be, or how this would fit with nltest yet.

Network namespace support

While writing an integration test for nfconntrack, I tried using https://github.com/vishvananda/netns to create a temporary network namespace to run the test in. Unfortunately, I found out it's currently not possible to do this, because netlink spawns its own internal goroutine outside of the control of the user.

Setting the main goroutine to a netns is ineffective, since the internal goroutine calls runtime.LockOSThread(), which creates (or uses) another pthread to pin itself to.

I suggest we:

  • finally deprecate NoLockThread bool, it's been the default for a while now
  • implement a minimal subset of the syscall functionality required to make the goroutine get and set its own netns
  • pass *Config to newSysSocket(), and give it the NetNS int member

This way, the user can use whatever library they want to create a netns, obtain a handle, etc.
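
With those pieces in place, usage could look roughly like this; the NetNS field name follows the proposal above, and the namespace path is only an example:

// Obtain a namespace handle with any library, or plain os.Open.
f, err := os.Open("/var/run/netns/mytest") // example path
if err != nil {
	log.Fatal(err)
}
defer f.Close()

// Ask the package's internal goroutine to enter that namespace.
conn, err := netlink.Dial(unix.NETLINK_NETFILTER, &netlink.Config{
	NetNS: int(f.Fd()),
})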

Improve behaviour of SetReadBuffer / SetWriteBuffer

Hi,
after dealing with netlink socket's buffer size I compared mdlayher/netlink's implementation of SetReadBuffer with libnfnetlink's nfnl_rcvbufsiz and found an improvement:

SetReadBuffer uses a single syscall with SO_RCVBUF to set the buffer size.
In contrast libnfnetlink's nfnl_rcvbufsiz uses SO_RCVBUFFORCE with a fallback to SO_RCVBUF if that fails.

This difference is important if the user requests a buffer size that is larger than the system's limit on buffer sizes. SO_RCVBUFFORCE ignores this limit if the user is root (or has CAP_NET_ADMIN set), while SO_RCVBUF fails (see man socket for details). In my use case, this difference would save me from tampering with global limits if I need a large netlink buffer.

Implementing SetReadBuffer with SO_RCVBUFFORCE will allow users to fully use their privileges without limiting the abilities of unprivileged users. Given that buffer sizes are typically changed only once performance impact should be negligible.

If you approve this change I'll try to implement this and send you a PR.
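
A sketch of the proposed logic, mirroring libnfnetlink's nfnl_rcvbufsiz:

// Try the privileged variant first: SO_RCVBUFFORCE ignores rmem_max for
// callers with CAP_NET_ADMIN.
err := unix.SetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_RCVBUFFORCE, n)
if err != nil {
	// Unprivileged callers fall back to the rmem_max-capped option.
	err = unix.SetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_RCVBUF, n)
}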

forward messages besides NLM_F_MULTI

Hi,
I've tried to receive statistics from the conntrack subsystem.
Unfortunately, the first returned message has the NLM_F_MULTI flag set. Because of this, there are only blocking recvmsg calls and the actual message is never returned.

Below, I've written a piece of code, which reproduces this issue:

package main

import (
	"fmt"

	"github.com/mdlayher/netlink"
	"github.com/mdlayher/netlink/nlenc"

	"golang.org/x/sys/unix"
)

func putExtraHeader(family, version uint8, resid uint16) []byte {
	buf := make([]byte, 2)
	nlenc.PutUint16(buf, resid)
	return append([]byte{family, version}, buf...)
}

func main() {
	con, err := netlink.Dial(unix.NETLINK_NETFILTER, nil)
	if err != nil {
		fmt.Println("Could not open socket to NETLINK_NETFILTER")
		return
	}

	data := putExtraHeader(unix.AF_UNSPEC, unix.NFNETLINK_V0, 0)

	req := netlink.Message{
		Header: netlink.Header{
			Type:  netlink.HeaderType(0x1<<8 | 5),
			Flags: netlink.HeaderFlagsRequest | netlink.HeaderFlagsDump,
		},
		Data: data,
	}
	con.Execute(req)
}

If you run this piece of code with strace, you can see that the request is sent to the kernel, followed by a blocking recvmsg call.

PING/PONG example not working

Hello, I'm trying to get a simple PING/PONG request/response working but I get an error type returned. Is there anything obvious that I'm missing?

package main

import (
	"fmt"
	"os"

	"github.com/mdlayher/netlink"
	"golang.org/x/sys/unix"
)

func main() {
	// message to send
	ping := []byte("PING")

	con, err := netlink.Dial(unix.NETLINK_ROUTE, &netlink.Config{})
	if err != nil {
		fmt.Printf("Dial failed: %v\n", err)
		os.Exit(1)
	}

	msg := netlink.Message{
		Header: netlink.Header{
			Type:  0,
			Flags: netlink.HeaderFlagsRequest | netlink.HeaderFlagsAcknowledge,
		},
		Data: ping,
	}

	// Send the message
	sm, err := con.Send(msg)
	if err != nil {
		fmt.Printf("Error on send: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("> %+v\n", sm)

	// Receive messages
	rm, err := con.Receive()
	if err != nil {
		fmt.Printf("Error on recv: %v\n", err)
		os.Exit(1)
	}
	for i, mes := range rm {
		fmt.Printf("< %v: %+v\n", i, mes)
	}
}

Running it I get:

> {Header:{Length:20 Type:unknown(0) Flags:request|acknowledge Sequence:2596996163 PID:502} Data:[80 73 78 71]}
< 0: {Header:{Length:36 Type:error Flags:0 Sequence:2596996163 PID:502} Data:[0 0 0 0 20 0 0 0 0 0 5 0 67 4 203 154 246 1 0 0]}

I thought the first 4 bytes of the data payload were to contain the error code in the event of an error, but here they are all zeros.

Invalid sequence number for dump messages

Hi,

I'm using genetlink and netlink, but my feeling is that the issue is more generic than genetlink, so I'm putting the issue here.

I have this code:

	c, err := genetlink.Dial(nil)
	if err != nil {
		log.Fatalf("dial generic netlink: %v", err)
	}
	defer c.Close()

	family, err := c.GetFamily("NCSI")
	if err != nil {
		log.Fatalf("get NCSI netlink family: %v", err)
	}
	// [ .. ]
	for {
		req := genetlink.Message{
				Header: genetlink.Header{
					Command: NCSI_CMD_PKG_INFO,
					Version: family.Version,
				},
			Data: ed,
		}

		msgs, err := c.Execute(req, family.ID, netlink.HeaderFlagsRequest |  netlink.HeaderFlagsDump)
		if err != nil {
			log.Fatalf("execute NCSI dump: %v", err)
		}

		log.Printf("got %v replies", len(msgs))

		for _, m := range msgs {
			ad, err := netlink.NewAttributeDecoder(m.Data)
			if err != nil {
				log.Fatalf("failed to create attribute decoder: %v", err)
			}
			for ad.Next() {
				if ad.Type() == NCSI_ATTR_PACKAGE_LIST {
					handleNcsiPackageList(ad.Bytes())
				}
			}
		}
		time.Sleep(5 * time.Second)
	}

And that results in the first round being processed OK, but the second crashing with:

1970/01/01 00:00:13 execute NCSI dump: mismatched sequence in netlink reply

Adding some traces in the library and the kernel results in:

1970/01/01 00:00:13 nextSequence called, a4d861bd is the new value
1970/01/01 00:00:13 MarshalBinary working on sequence a4d861bd
[   13.147538] DEBUG: netlink_rcv_skb sequence is a4d861bd
[   13.153799] DEBUG: ncsi_pkg_info_all_nl sequence is a4d861bc
[   13.159327] DEBUG: ncsi_pkg_info_all_nl sequence is a4d861bc
1970/01/01 00:00:13 request has sequence a4d861bd
1970/01/01 00:00:13 reply has sequence a4d861bc
1970/01/01 00:00:13 expected a4d861bd, got a4d861bc
1970/01/01 00:00:13 execute NCSI dump: mismatched sequence in netlink reply

These are the traces I added in the kernel:

diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 82e6edf9c5d9..f0eb130631d5 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -216,6 +216,8 @@ static int ncsi_pkg_info_all_nl(struct sk_buff *skb,
 	void *hdr;
 	int rc;
 
+  printk("DEBUG: ncsi_pkg_info_all_nl sequence is %08x\n", cb->nlh->nlmsg_seq);
+
 	rc = genlmsg_parse(cb->nlh, &ncsi_genl_family, attrs, NCSI_ATTR_MAX,
 			   ncsi_genl_policy, NULL);
 	if (rc)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 13a203157dbe..7d82aa274b45 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2448,6 +2448,8 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 		if (!(nlh->nlmsg_flags & NLM_F_REQUEST))
 			goto ack;
 
+    printk(KERN_ERR "DEBUG: netlink_rcv_skb sequence is %08x\n", nlh->nlmsg_seq);
+
 		/* Skip control messages */
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
 			goto ack;

These are in the netlink library:


diff --git a/conn.go b/conn.go
index d2dffb3..b7d987f 100644
--- a/conn.go
+++ b/conn.go
@@ -3,6 +3,7 @@ package netlink
 import (
        "errors"
        "io"
+       "log"
        "math/rand"
        "os"
        "sync/atomic"
@@ -500,17 +501,22 @@ func (rc *rawConn) Write(_ func(fd uintptr) (done bool)) error { return errSysca
 // nextSequence atomically increments Conn's sequence number and returns
 // the incremented value.
 func (c *Conn) nextSequence() uint32 {
-       return atomic.AddUint32(c.seq, 1)
+       n := atomic.AddUint32(c.seq, 1)
+       log.Printf("nextSequence called, %08x is the new value", n)
+       return n
 }
 
 // Validate validates one or more reply Messages against a request Message,
 // ensuring that they contain matching sequence numbers and PIDs.
 func Validate(request Message, replies []Message) error {
+       log.Printf("request has sequence %08x", request.Header.Sequence)
        for _, m := range replies {
                // Check for mismatched sequence, unless:
                //   - request had no sequence, meaning we are probably validating
                //     a multicast reply
+               log.Printf("reply has sequence %08x", m.Header.Sequence)
                if m.Header.Sequence != request.Header.Sequence && request.Header.Sequence != 0 {
+                       log.Printf("expected %08x, got %08x", request.Header.Sequence, m.Header.Sequence)
                        return errMismatchedSequence
                }
 
diff --git a/message.go b/message.go
index 8cd514d..3197ffd 100644
--- a/message.go
+++ b/message.go
@@ -3,6 +3,7 @@ package netlink
 import (
        "errors"
        "fmt"
+       "log"
 
        "github.com/mdlayher/netlink/nlenc"
 )
@@ -199,6 +200,7 @@ func (m Message) MarshalBinary() ([]byte, error) {
        nlenc.PutUint16(b[4:6], uint16(m.Header.Type))
        nlenc.PutUint16(b[6:8], uint16(m.Header.Flags))
        nlenc.PutUint32(b[8:12], m.Header.Sequence)
+       log.Printf("MarshalBinary working on sequence %08x\n", m.Header.Sequence)
        nlenc.PutUint32(b[12:16], m.Header.PID)
        copy(b[16:], m.Data)

It looks like the kernel passes the old sequence number from my previous request to the dump function when called the second time.

So this looks like either in order of probability:

  • My code is doing something I'm not allowed to, or I'm not handling the dump correctly
  • There is a bug in the netlink library when doing multiple dumps on a single connection
  • The NCSI netlink family in the kernel is bugged

Do you have any idea? I've been scratching my head at this for a few hours now.

Implementation question: netlink_netfilter

I'm trying to pull conntrack stats (basically replicating conntrack -S) from netfilter using this library, and have some questions about how to best go about it. Right now I'm just using netlink, not genetlink, is that right? Main problem right now is how to handle the returned data.

(The C implementation is found here, and that is what I've based my work so far on)

Basically, after removing trial/error and debugging stuff, the code boils down to this:

	conn, err := netlink.Dial(12, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	req := netlink.Message{
		Header: netlink.Header{
			Flags: netlink.HeaderFlagsRequest | netlink.HeaderFlagsDump,
			Type:  (1 << 8) | 4, // From conntrack.c, nfct_mnl_nlmsghdr_put, nlmsg_type = (subsys << 8) | type
		},
		Data: []byte{0, 0, 0, 0}, // Not sure why I have to do this, but conntrack sends it like this, and if omitted, Receive hangs
	}

	msgs, err := conn.Execute(req)
	if err != nil {
		log.Fatalf("failed to execute request: %v", err)
	}

	for _, m := range msgs {
		log.Infof("%+v", m.Data)
	}

This works, and I get back the correct data (from inspecting the byte array returned). However, doing UnmarshalBinary on the messages fails. Not sure if that is to be expected?

Running strace conntrack -S, I can see the basic structure of the messages:

{
	msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 
	msg_namelen=12,
	msg_iov=[
		{
			iov_base=[
				{
					{len=92, type=NFNL_SUBSYS_CTNETLINK<<8|IPCTNL_MSG_CT_GET_STATS_CPU, flags=NLM_F_MULTI, seq=1538077407, pid=13359},
					{
						nfgen_family=AF_UNSPEC, version=NFNETLINK_V0, res_id=htons(0), [
							{{nla_len=8, nla_type=0x2}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0x4}, "\x00\x00\x00\x31"},
							{{nla_len=8, nla_type=0x5}, "\x00\x03\x4c\x0c"},
							{{nla_len=8, nla_type=0x8}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0x9}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xa}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xb}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xc}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xd}, "\x00\x00\x5a\xd2"}
						]
				},
				{
					{len=92, type=NFNL_SUBSYS_CTNETLINK<<8|IPCTNL_MSG_CT_GET_STATS_CPU, flags=NLM_F_MULTI, seq=1538077407, pid=13359},
					{
						nfgen_family=AF_UNSPEC, version=NFNETLINK_V0, res_id=htons(1), [
							{{nla_len=8, nla_type=0x2}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0x4}, "\x00\x00\x00\x1c"},
							{{nla_len=8, nla_type=0x5}, "\x00\x03\x46\xda"},
							{{nla_len=8, nla_type=0x8}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0x9}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xa}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xb}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xc}, "\x00\x00\x00\x00"},
							{{nla_len=8, nla_type=0xd}, "\x00\x00\x59\xec"}
						]
				},
				// ... more elements
			],
			iov_len=4096
		}
	],
	msg_iovlen=1,
	msg_controllen=0,
	msg_flags=0
}

So each message carries an nfgenmsg header followed by an array of attributes. Is there something built-in to handle this, or should I just write some manual deserialization (not too hard, if required)?

Also, even though this seems to be a read-only operation (get stats), I get permission denied if I don't run as root. Is this expected, or am I doing something wrong?
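
Based on the strace output above, each reply payload begins with the 4-byte nfgenmsg header (family, version, res_id); the otherwise mysterious 4 zero bytes in the request are the same header. What follows is ordinary netlink attributes, so a sketch of manual deserialization could skip the header and use the attribute decoder, noting that the counters appear to be in network byte order:

for _, m := range msgs {
	if len(m.Data) < 4 {
		continue
	}

	// Skip the 4-byte nfgenmsg header, then walk the attributes.
	ad, err := netlink.NewAttributeDecoder(m.Data[4:])
	if err != nil {
		log.Fatalf("failed to create attribute decoder: %v", err)
	}
	// The stats counters above are big-endian on the wire.
	ad.ByteOrder = binary.BigEndian

	for ad.Next() {
		log.Printf("type: %#x, value: %d", ad.Type(), ad.Uint32())
	}
	if err := ad.Err(); err != nil {
		log.Fatalf("failed to decode attributes: %v", err)
	}
}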

netlink: consider addition of a proper Error type

@acln0 left this excellent comment on #123:

Throughout the standard library, errors from system calls are decorated with additional details, e.g. os.SyscallError, net.OpError. Package netlink doesn't do any of this at the moment, as far as I can tell. It seems to return syscall.Errno values up to the callers.

Perhaps this is something worth thinking about. For example, when does sendmsg(2) fail on a netlink socket? If and when it does, is the syscall.Errno value enough for the caller to be able to diagnose the cause of the error?

This circles back to the netlink.Error naming issue I mentioned above. Is there any point in having a type that captures operational errors and decorates them with additional context? If there is, what would the type be named? Would it be netlink.Error? Or perhaps netlink.OpError?

Personally, all the errors I've gotten while using netlink have been in the form of netlink error messages, rather than system call errors from operations such as calling conn.Send, so I haven't had this problem. That being said, I have not used netlink in very advanced ways, so perhaps I have missed something. But having noticed this detail about error handling while I was working on the code, I thought I would at least bring it up for discussion.

Thus far I've been pretty lax about which errors this package should expose. I believe we could do a lot better with the addition of a proper Error type, which not only exposes the underlying system call error, but perhaps some trace points for better diagnosis of problems than a plain EINVAL or similar. It'd also be worth considering the error values proposal if that work ends up landing in Go 1.13, so that we can support nicely formatted, wrap/unwrap-able errors.
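
One possible shape for such a type, mirroring net.OpError; the name and fields here are for discussion only, not an existing API:

// OpError is a hypothetical error type wrapping failed netlink operations.
type OpError struct {
	// Op is the operation that failed, such as "send" or "receive".
	Op string

	// Err is the underlying error: a syscall.Errno, an error decoded from
	// a netlink error message, and so on.
	Err error
}

func (e *OpError) Error() string {
	return "netlink: " + e.Op + ": " + e.Err.Error()
}

// Unwrap would support errors.Is/As if the Go 1.13 error values work lands.
func (e *OpError) Unwrap() error { return e.Err }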

Receive() returning errno -1 for proc connector messages

So 225b7dc seems to have broken my client code that reads from the linux proc connector interface. Specifically, the switch statement that routes NLMSG_DONE packets to the following conditional:

	if c := nlenc.Int32(m.Data[0:4]); c != success {
		// Error code is a negative integer, convert it into
		// an OS-specific system call error
		return newError(-1 * int(c))
	}

I'm prefacing this by saying that I am not a netlink expert by any means. The first part of the payload for a proc connector message is a cb_id struct. Based on my (limited) understanding, for a cn_proc message, that value should be CN_IDX_PROC, which is 0x1. Hence m.Data[0:4] is 0x1 and the function returns -1.

I'm not sure what the reasoning is for parsing the first 32 bits of an NLMSG_DONE packet, and like I said, I'm not a netlink expert, so I could just be doing something wrong too.

The netlink(7) man page says that NLMSG_DONE terminates a multipart message, however, from what I can see, the proc connector interface sends payload data with it:

The first message is an ack from connector, second is a fork() event.

header: netlink.Header{Length:0x4c, Type:0x3, Flags:0x0, Sequence:0x45ad, PID:0x0}, data: [1 0 0 0 1 0 0 0 173 69 0 0 1 0 0 0 40 0 0 0 0 0 0 0 4 0 0 0 223 182 179 251 190 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
header: netlink.Header{Length:0x4c, Type:0x3, Flags:0x0, Sequence:0x5f19, PID:0x0}, data: [1 0 0 0 1 0 0 0 25 95 0 0 0 0 0 0 40 0 0 0 1 0 0 0 5 0 0 0 94 140 11 51 191 66 0 0 77 88 0 0 77 88 0 0 53 98 0 0 53 98 0 0 0 0 0 0 0 0 0 0]

incorrect length for attributes with NLA_F_NESTED flag set

	if e.ToData > 0 {
		attrs = append(attrs, netlink.Attribute{Type: unix.NLA_F_NESTED | unix.NFTA_RANGE_TO_DATA, Data: []byte{}})
		attrs = append(attrs, netlink.Attribute{Type: unix.NFTA_DATA_VALUE, Data: binaryutil.BigEndian.PutUint32(e.ToData)})
	}
	data, err := netlink.MarshalAttributes(attrs)
	if err != nil {
		return nil, err
	}

Generates this sequence of bytes...

 0x04 0x00 0x04 0x80 0x08 0x00 0x01 0x00 0x07 0xee 0x00 0x00 

But expected:

 0x0c 0x00 0x04 0x80 0x06 0x00 0x01 0x00 0x07 0xee 0x00 0x00

Since it is a nested attribute, the total length should be 12 (0x0c), and the inner attribute's length should be 6 (0x06).
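
One way to produce the expected layout with the current API (a sketch reusing the snippet's variables) is to marshal the inner attribute first, so the nested attribute's length covers its children:

// Marshal the inner attribute first...
inner, err := netlink.MarshalAttributes([]netlink.Attribute{
	{Type: unix.NFTA_DATA_VALUE, Data: binaryutil.BigEndian.PutUint32(e.ToData)},
})
if err != nil {
	return nil, err
}

// ...then carry it as the payload of the nested attribute, so the outer
// length (4-byte header + len(inner)) is computed correctly.
attrs = append(attrs, netlink.Attribute{
	Type: unix.NLA_F_NESTED | unix.NFTA_RANGE_TO_DATA,
	Data: inner,
})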

netlink: flappy test TestIntegrationConnTimeout

--- FAIL: TestIntegrationConnTimeout (0.00s)
    conn_linux_gteq_1.12_integration_test.go:43: timeout did not fire

Should be easy enough to investigate, but this is a nuisance and should be fixed.

netlink: failed to unmarshal response

Hi,
I'm trying to get TCP socket status, so I changed some code from example_test.go: I made a new netlink.Message whose Data is a struct inet_diag_req, which I define in Go to match the C struct.
The ss command sends its request as an inet_diag_req:

	req.nlh.nlmsg_len = sizeof(req);
	req.nlh.nlmsg_type = socktype;
	req.nlh.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
	req.nlh.nlmsg_pid = 0;
	req.nlh.nlmsg_seq = 123456;
	memset(&req.r, 0, sizeof(req.r));
	req.r.idiag_family = AF_INET;
	req.r.idiag_states = f->states;
	iov[0] = (struct iovec){
		.iov_base = &req,
		.iov_len = sizeof(req)
	};
	msg = (struct msghdr) {
		.msg_name = (void*)&nladdr,
		.msg_namelen = sizeof(nladdr),
		.msg_iov = iov,
		.msg_iovlen = f->f ? 3 : 1,
	};
	if (sendmsg(fd, &msg, 0) < 0)
		return -1;

But it fails with:

failed to unmarshal response: not enough data to create a netlink message

and I get an error in dmesg:

SELinux: unrecognized netlink message: protocol=4 nlmsg_type=0 sclass=32

Do you have any suggestions?

netlink: TestLinuxConnIntegrationConcurrent occasionally fails

After running this command approximately 5 times:

[zsh|matt@nerr-2]:~/src/github.com/mdlayher/netlink 0 *(bpf) ± go test -tags=integration ./...
panic: failed to execute request: mismatched sequence in netlink reply

goroutine 75 [running]:
github.com/mdlayher/netlink.TestLinuxConnIntegrationConcurrent.func2(0xc4200e44e0, 0x2710, 0xc4200b6710)
        /home/matt/src/github.com/mdlayher/netlink/conn_linux_test.go:317 +0x2a2
created by github.com/mdlayher/netlink.TestLinuxConnIntegrationConcurrent
        /home/matt/src/github.com/mdlayher/netlink/conn_linux_test.go:351 +0x2f4
FAIL    github.com/mdlayher/netlink     0.115s
?       github.com/mdlayher/netlink/cmd/genllist        [no test files]
?       github.com/mdlayher/netlink/cmd/nl80211demo     [no test files]
?       github.com/mdlayher/netlink/cmd/nlecho  [no test files]
?       github.com/mdlayher/netlink/cmd/nlmcast [no test files]
ok      github.com/mdlayher/netlink/genetlink   0.003s
ok      github.com/mdlayher/netlink/nlenc       0.001s

I've seen this in CI a couple times as well. Will try to get to the bottom of it.

netlink: investigate Go 1.11 non-blocking file runtime poller support

I hacked up a local version of this package that is able to use the runtime network poller support in Go 1.11, but I don't see any reasonable way to be able to do the lower level system call flags and out-of-band data passing that we may require in this package.

I will leave this issue open for tracking, but we may have to implement epoll support ourselves to get the required level of flexibility.

netlink: towards using the network poller in 1.12

Hello.

This CL has landed, and will be included in Go 1.12. I think this opens an avenue for package netlink to make use of the poller.

I've investigated the code, and I believe the change can be made in a backwards-compatible manner, such that package netlink keeps working gracefully for < 1.12 users, and includes the new features for 1.12 and later.

In terms of changes to the public API of the package, no backwards incompatible changes would be made, but the set of patches would add SetDeadline, SetReadDeadline and SetWriteDeadline methods to Conn.

In terms of the internals of the package, making this change requires a pretty major refactor of the bits in conn_linux.go: sysSocket needs to change from using an integer file descriptor to an *os.File, and operations need to use syscall.RawConn when making system calls. A compatibility shim for pre-1.12 would also have to be written.

I am willing to commit to doing this work, provided that @mdlayher is open to the idea. I filed this bug in order to gauge interest, ask for a green light, and track progress.

Thanks,
Andrei

Closing a Netlink socket with ongoing Receive() call

Hi Matt

I'm trying to cleanly terminate a Netlink socket listening on a multicast group.

func main() {
	conn, err := netlink.Dial(0xc, nil) // NETLINK_NETFILTER
	if err != nil {
		log.Fatal(err)
	}

	err = conn.JoinGroup(10) // Non-existent multicast group (Receive() will never return)
	if err != nil {
		log.Fatal(err)
	}

	recvChan := make(chan []netlink.Message)

	// Start a blocking receive
	go func() {
		recv, err := conn.Receive()
		if err != nil {
			log.Fatal(err)
		}
		recvChan <- recv
	}()

	// Make sure goroutine enters Receive()
	// time.Sleep(50 * time.Millisecond)

	err = conn.Close()
	if err != nil {
		log.Fatal(err)
	}

	log.Println(<-recvChan)
}

This uncovers two issues:

  1. A race condition: running this program multiple times in sequence yields different results. One is a panic:
panic: send on closed channel
goroutine 4 [running]:
github.com/mdlayher/netlink.(*sysSocket).do(0xc4200bc0a0, 0xc4200f6000)
    /home/timo/go/src/github.com/mdlayher/netlink/conn_linux.go:356 +0xa8

...that can be avoided by closing the channel after the WaitGroup's Wait returns. This causes a deadlock in the test suite, so this might invalidate some assumptions you've made in other places.

func (s *sysSocket) Close() error {
	...
	s.wg.Wait()
	close(s.funcC)
  2. Receive() is non-interruptible. (uncomment the Sleep() in the example)

If there's no valid Netlink message coming in (like in the example above), Receive() will hang forever. I've tried closing the fd from the main thread by executing unix.Close() directly instead of queueing it with s.do(), but this (expectedly) doesn't interrupt the syscall either. It will only yield when a message is read from the socket.

I know this is not directly an issue with the library itself, but it would be pleasant to have a solution or workaround. Is there anything you could think of? A quick search hinted that syscalls can be interrupted by signals, not sure if this is something we could implement.

Have Conn.Execute() wait for NLM_F_ACK when the request asked for one

When making a netlink request, the caller may ask for an ACK by setting netlink.Acknowledge in the message header.

Typically this is done for requests which don't return data, for example commands. In this case the acknowledge usually contains an error code indicating the status of the command.

For information requests (e.g. retrieving state), an ACK is generally not asked for. This is because the information request will (almost) always result in a response from the kernel.

If an ACK is asked for with an information request, the netlink.Conn instance will lose track of sequence numbers, as the ACK arrives as a separate message.

This paste of strace output demonstrates what happens:

https://pastebin.com/QmFudvru

The message with sequence number 241523028 is an information request which sets NLM_F_ACK. Two responses are received for this request: the first on line 15 of the paste, a 192 byte message; and the second on line 18, the ACK message.

Note that there is a further sendmsg call interleaved between these messages: this is because Conn.Execute() treats the 192 byte message as the response for request 241523028, and returns to the caller. A subsequent request with sequence number 241523029 then receives the ACK for 241523028, and Validate fails. This state persists for the remainder of the lifetime of the Conn instance.

Since Conn.Execute has visibility of the request flags, it could be extended to wait for the ACK (scratch code for the purposes of demonstration only!):

func (c *Conn) Execute(message Message) ([]Message, error) {
    c.mu.Lock()    
    defer c.mu.Unlock()

    req, err := c.lockedSend(message)    
    if err != nil {    
        return nil, err    
    } 

    replies := []Message{}

again:
    rx, err := c.lockedReceive()
    if err != nil {
        return nil, err
    }

    if err := Validate(req, rx); err != nil {
        return nil, err
    }

    replies = append(replies, rx...)

    if (message.Header.Flags & Acknowledge) != 0 {
        var gotAck bool
        for _, m := range rx {
            if (m.Header.Flags & Acknowledge) != 0 {
                gotAck = true
            }
        }
        // Only loop back for more messages once the whole batch has been
        // scanned and no ACK was found.
        if !gotAck {
            goto again
        }
    }
    return replies, nil
}

netlink: performance bottleneck due to single goroutine in high performance applications

In scenarios where a locked OS thread isn't necessary, is the goroutine in sysSocket.write/read still required? Perhaps I'm missing a subtlety of M:N with the use of Recvmsg and netlink, but my expectation was that with the proper descriptor any goroutine could read correctly from that socket if the netns is consistent. Is that not the case?

I ask because in high-traffic systems I'm seeing a fair amount of Go runtime scheduler time in profiles when GOMAXPROCS>1 in Go 1.12, due to the forced goroutine change, and in testing lightly loaded setups with a rate of 100-200 netlink multicast events per second, I saw a 5-10% CPU improvement when the goroutine is skipped (some from not needing to close the channel or defer). I am not a socket expert, of course, so I figured it was easier to ask.

Feature request: batch mode

Some nftables commands (e.g. NFT_MSG_NEWTABLE, see https://github.com/torvalds/linux/blob/a048a07d7f4535baa4cbad6bc024f175317ab938/net/netfilter/nf_tables_api.c#L5667) need to receive a batch of messages, i.e. message(s) which are enclosed in begin/end messages with extra_header.res_id NFNL_MSG_BATCH_BEGIN and NFNL_MSG_BATCH_END, respectively. These messages must then be sent in a single sendmsg syscall.

However, the netlink package currently only offers a Send method on type Conn, which serializes a single message.

To implement batching in my nftables package, I’d need a SendMessages (or similar) method, which would serialize multiple messages at once.

Assuming the feature request makes sense to you, would you like to add this yourself, or would you prefer a pull request?
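
A sketch of what the requested method might look like from the caller's side; SendMessages is the hypothetical new API, and the three messages stand in for an already-marshaled batch:

// Send a begin/commands/end batch in one sendmsg(2) call.
batch := []netlink.Message{
	batchBegin, // NFNL_MSG_BATCH_BEGIN
	newTable,   // NFT_MSG_NEWTABLE
	batchEnd,   // NFNL_MSG_BATCH_END
}

if _, err := conn.SendMessages(batch); err != nil {
	return err
}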

nlenc: support uint8 encoding and decoding in nlenc

When we create a Data in Attribute, we usually need to encode a uintN
into a byte slice. However, as nlenc does not support uint8 encoding and
decoding, we have to create similar function for only uint8 in the
code. We would like to make our code consistent, so will be happy if
nlenc supports this.
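
A sketch of the requested helpers, following the panic-on-bad-length style of the existing nlenc functions:

// PutUint8 encodes a uint8 into b; b must be exactly 1 byte in length.
func PutUint8(b []byte, v uint8) {
	if l := len(b); l != 1 {
		panic(fmt.Sprintf("PutUint8: unexpected byte slice length: %d", l))
	}
	b[0] = v
}

// Uint8 decodes a uint8 from b; b must be exactly 1 byte in length.
func Uint8(b []byte) uint8 {
	if l := len(b); l != 1 {
		panic(fmt.Sprintf("Uint8: unexpected byte slice length: %d", l))
	}
	return b[0]
}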

netlink: failed to build the code

Hi, I tried to compile my Go program, but something went wrong with netlink. Can you help me see why?

go test *.go

github.com/mdlayher/netlink

../../gopath/src/github.com/mdlayher/netlink/conn_linux.go:538:13: s.fd.SetDeadline undefined (type *os.File has no field or method SetDeadline)
../../gopath/src/github.com/mdlayher/netlink/conn_linux.go:542:13: s.fd.SetReadDeadline undefined (type *os.File has no field or method SetReadDeadline)
../../gopath/src/github.com/mdlayher/netlink/conn_linux.go:546:13: s.fd.SetWriteDeadline undefined (type *os.File has no field or method SetWriteDeadline)

netlink: release v1.0.0

Once 1.12 hits and we're able to land runtime network poller support with proper timeouts, I think it's finally time to embrace semver and Go modules, and tag v1.0.0.

Before we do that, I'm debating if it's worth revising some previous interface decisions, such as:

  • changing signature of Config.Groups and Conn.Join/LeaveGroup to pass group int rather than group uint32

This is passed as a 32-bit C int to the kernel, so Go's 64-bit int on 64-bit systems would need a sanity check. int is generally a good default and the most common integer type in Go by far; uint32 requires an explicit cast unless you're using an untyped constant (which is generally the case).

  • unexporting or removing Message.Marshal/UnmarshalBinary

If you're importing this package, you're almost certainly using Conn which takes care of these internally.
We could also do faster and more clever things if we didn't have to worry about allocating for individual messages, or making a copy of byte slices during unmarshaling.

  • DONE: consider removing HeaderType and HeaderFlags prefixes from typed constants

I'm admittedly a little iffy on this, but in recent projects I tend to prefer more succinct names like netlink.Request, even if the type remains Request HeaderFlags = 1. Is there a possibility of conflict in header types/flags with other current or future exported identifiers? Would this be too confusing in calling code?

  • finding another way to hide/unexport the Socket type and NewConn constructors

These are really only meant for nltest; I had considered doing unsafe //go:linkname at one point but can't remember why I didn't go with it. It's not great, but it might be better than adding generally useless stuff to the exported 1.0.0 API.

  • DONE: consider dropping Conn.ReadWriteCloser

This is only used in github.com/mdlayher/kobject as far as I know, and I think that its use there could be superseded by Conn.SyscallConn and methods there.


I think that about covers my thoughts. I know it's a lot, and I don't expect others to have strong opinions on all or any of these, but I would appreciate your feedback. In addition, if there are other things you think are worth revisiting for 1.0.0, please do let me know.

I'm going to tag some of the regular contributors and users of this package in hopes that folks can help me work things out, although I encourage feedback from all who come across this issue.

/cc @florianl @ti-mo @terinjokes @jsimonetti @stapelberg @acln0

functionality for setting socket opts

There should be a function for setting socket opts that is more general than, for example, JoinGroup. The use case for me is that I need to set options from the socket options list in the netlink man(7) page. Please tell me if there is an alternative that already exists in the library.
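
One general-purpose escape hatch, assuming the Conn.SyscallConn method from the v1 API: reach the raw file descriptor and apply any option from the netlink(7) list directly.

// setNoENOBUFS sets NETLINK_NO_ENOBUFS, one of the netlink(7) options,
// directly on the connection's file descriptor.
func setNoENOBUFS(conn *netlink.Conn) error {
	rc, err := conn.SyscallConn()
	if err != nil {
		return err
	}

	var serr error
	if err := rc.Control(func(fd uintptr) {
		serr = unix.SetsockoptInt(int(fd), unix.SOL_NETLINK, unix.NETLINK_NO_ENOBUFS, 1)
	}); err != nil {
		return err
	}
	return serr
}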
