Comments (7)

CodeFetch commented on August 14, 2024

I remember having looked at this with lemoer. There was still a performance disadvantage compared to an in-kernel tunnel like WireGuard, because packets have to be copied between userland and the kernel.

Is there a possibility to introduce something comparable to MSG_ZEROCOPY with io_uring? I've always wondered why Jason Donenfeld, the OpenVPN or tinc team didn't work on exposing the virtualization TAP sockets...
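For comparison, this is roughly what the existing MSG_ZEROCOPY path looks like on a plain UDP socket today, without io_uring (a minimal sketch, not fastd code; the fallback defines are only for older libc headers, and error handling is omitted):

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60            /* value from the Linux UAPI headers */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000    /* value from the Linux UAPI headers */
#endif

/* Sketch: send one datagram without copying the payload into the kernel. */
static void send_zerocopy(int fd, const void *buf, size_t len,
                          const struct sockaddr_in *dst)
{
    int one = 1;

    /* Opt in once per socket... */
    setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

    /* ...then request zerocopy per send. The kernel pins the pages, so
     * buf must stay untouched until the completion notification arrives. */
    sendto(fd, buf, len, MSG_ZEROCOPY,
           (const struct sockaddr *)dst, sizeof(*dst));

    /* Completions are delivered on the error queue; reads from it never
     * block. Real code would wait for POLLERR instead of spinning. */
    char ctrl[128];
    struct msghdr msg = { .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
    while (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0 && errno == EAGAIN)
        ;
}
```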

CodeFetch commented on August 14, 2024

Hm... I just had a look at the current state of the code. SOCK_ZEROCOPY has already been implemented.
https://elixir.bootlin.com/linux/v5.8-rc3/source/drivers/net/tap.c#L693
So the only big performance difference between an in-kernel tunnel and a userland one is the additional copying of an skb to an iov. We can't get rid of that one, because skbs can't be forced to be in the user memory region. Actually I expected the impact of the copying to be much lower... By the way @lemoer, we've already talked to NeoRaider about io_uring last year on IRC...

CodeFetch commented on August 14, 2024

Here's a proof-of-concept patched fastd branch which lemoer and I created:
https://github.com/CodeFetch/fastd/tree/final

Indeed, the syscall overhead can be reduced with io_uring. Unfortunately, a kernel version >5.7 is required to allow poll-retry/fastpoll, which is crucial for the performance gain.
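For illustration, this is the basic shape of a liburing receive loop where fast poll matters (a sketch with invented names and sizes, not code from the branch):

```c
#include <liburing.h>
#include <stdio.h>

/* Sketch: receive packets on one socket through an io_uring instance. */
int run_recv_loop(int sockfd)
{
    struct io_uring ring;
    static char buf[2048];

    if (io_uring_queue_init(64, &ring, 0) < 0)
        return -1;

    for (;;) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_recv(sqe, sockfd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);

        /* On >= 5.7 an empty socket is handled by the internal poll-retry
         * ("fast poll") path; on older kernels the request is punted to a
         * kernel worker thread, which eats most of the gain. */
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        if (cqe->res > 0)
            printf("received %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    return 0;
}
```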

Furthermore, besides a number of minor bugs, some race conditions seem to occur unless the operations on an individual socket are hard-linked. The patch works around this issue, which introduces a slight performance penalty. It might have been fixed upstream already and needs further testing. I'll open a pull request when NeoRaider has reworked the buffer management to reduce the allocation overhead.
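The hard-linking workaround looks roughly like this; IOSQE_IO_HARDLINK is the real flag (available since 5.5), while the surrounding code is only a sketch:

```c
#include <liburing.h>

/* Sketch: queue several receives on one socket and force the kernel to
 * complete them strictly in order by hard-linking the SQEs. */
static void queue_ordered_recvs(struct io_uring *ring, int fd,
                                char bufs[][2048], int n)
{
    for (int i = 0; i < n; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_recv(sqe, fd, bufs[i], 2048, 0);
        /* Unlike IOSQE_IO_LINK, a hard link keeps the chain going even if
         * one request completes short or with an error. The last SQE ends
         * the chain by not setting the flag. */
        if (i < n - 1)
            io_uring_sqe_set_flags(sqe, IOSQE_IO_HARDLINK);
    }
    io_uring_submit(ring);
}
```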

CodeFetch commented on August 14, 2024

@NeoRaider are you done with the buffer pool? I've got a commit somewhere where I started to implement a dynamic buffer pool (one that grows under high demand and shrinks when the buffers are no longer needed). It looks like your changes are compatible. A dynamic buffer pool is needed to get good performance with io_uring while keeping a low memory footprint.
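Roughly what I mean by a dynamic pool, as a sketch (invented names and sizes, nothing from the actual commit):

```c
#include <stdlib.h>

#define POOL_MIN    16      /* keep at least this many free buffers around */
#define BUFFER_SIZE 2048

struct pool {
    void **free;            /* stack of free buffers */
    size_t n_free;
    size_t capacity;        /* size of the free[] array */
};

static void *pool_get(struct pool *p)
{
    if (p->n_free)
        return p->free[--p->n_free];
    /* Demand exceeds the pool: allocate a fresh buffer; the pool grows
     * lazily because the buffer comes back through pool_put() later. */
    return malloc(BUFFER_SIZE);
}

static void pool_put(struct pool *p, void *buf)
{
    if (p->n_free == p->capacity) {
        size_t cap = p->capacity ? p->capacity * 2 : POOL_MIN;
        void **grown = realloc(p->free, cap * sizeof(*grown));
        if (!grown) {       /* cannot track it, so just release the buffer */
            free(buf);
            return;
        }
        p->free = grown;
        p->capacity = cap;
    }
    p->free[p->n_free++] = buf;
}

/* Called periodically: drop free buffers above the low-water mark. */
static void pool_shrink(struct pool *p)
{
    while (p->n_free > POOL_MIN)
        free(p->free[--p->n_free]);
}
```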

BTW... What about introducing shared memory to implement threading support? Have you given it a thought already? I guess with io_uring the crypto performance will become the bottleneck. Is it possible to do the crypto on packets out of order? Otherwise I'd at least hope to make use of more cores on the servers with multiple slave processes.

neocturne commented on August 14, 2024

The new buffer implementation is finished.

I don't understand the question about shared memory - threads always share their memory? Doing packet processing in threads should be fine on multi-core systems (but it will require some careful locking and/or barriers to ensure that no state is changed when the worker threads do not expect it).

I think packet processing for each peer should be serialized to avoid introducing additional reordering (fastd can handle packets reordered by up to 64 sequence numbers, but the transported network protocols may not), but as multi-core systems usually play a central role in a network and are connected to many peers, this could still provide some speedup.
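One simple way to get that per-peer serialization while still using several cores is to pin each peer to a fixed worker thread, so its packets are always handled in submission order (a sketch with invented names, not fastd code; worker setup and the consumer loop are omitted):

```c
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 4

struct packet {
    struct packet *next;
    /* ... payload ... */
};

struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct packet *head, *tail;     /* FIFO queue, protected by lock */
};

static struct worker workers[N_WORKERS];

/* All packets of one peer land in the same queue, so they stay ordered
 * relative to each other while different peers run in parallel. */
static void dispatch(unsigned peer_id, struct packet *pkt)
{
    struct worker *w = &workers[peer_id % N_WORKERS];

    pthread_mutex_lock(&w->lock);
    pkt->next = NULL;
    if (w->tail)
        w->tail->next = pkt;
    else
        w->head = pkt;
    w->tail = pkt;
    pthread_cond_signal(&w->wake);
    pthread_mutex_unlock(&w->lock);
}
```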

CodeFetch commented on August 14, 2024

@NeoRaider Sorry, I meant subprocesses, not pthreading - shared memory between processes. Pthreads wouldn't bring a performance increase, I guess, would they? Indeed, I aim to make use of multiple cores, which isn't possible with pthreads alone, is it?

neocturne commented on August 14, 2024

Using multiple cores is the main use case of threads. In fact, the Linux kernel does not really distinguish between processes and threads - a thread is just a process that shares its PID, memory, file descriptors, and a few other things with its parent.

Using multiple processes as workers only makes sense when you need to isolate them from each other, for example to contain crashes or security issues. For fastd, multithreading is the way to go: It should be easier to implement for our use case and uses fewer resources (as almost all memory is shared).
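As a minimal illustration of why threads need no extra shared-memory setup (not fastd code): every worker sees the same globals and heap, so only locking is required.

```c
#include <pthread.h>
#include <stdio.h>

static unsigned long packets_handled;   /* visible to all workers as-is */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    /* ... receive, decrypt, forward ... */
    pthread_mutex_lock(&lock);
    packets_handled++;                  /* no shm_open()/mmap() needed */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("%lu packets handled\n", packets_handled);
    return 0;
}
```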
