r2's People

Contributors

gopakumarce, kbknapp

r2's Issues

Make packet "safer"

This was in response to the comments at https://www.reddit.com/r/rust/comments/fr50vl/r2_a_router_in_rust/ by the user https://www.reddit.com/user/BigHandLittleSlap. The primary concern there was handing out the raw mut particle data addresses in the Packet structure as slices to clients. There was another concern about having a shared pool, with a suggestion to make the pools per thread. In case the link becomes inaccessible at some point, the entire discussion is pasted here.


I don't want to rain on your parade, but really: mutable access to a packet buffer created with "const u8*"? That's a flagrant violation of memory safety and general Rust API design. I don't think even Actix was this brazen with unsafety!

In more general terms, the handling of 9K jumbo packets is disappointing: I've noticed you're splitting these up into multiple little 2K buffers. This is a mistake. The entire point of jumbo packets is to reduce the number of function calls, particularly kernel network I/O calls. Many small buffers will force you to increase the number of system calls, completely negating the benefits of jumbo frames. Conversely, memory requirements for routers are tiny almost by definition, because large buffers increase latency and are paradoxically worse for performance. Typically you don't want to exceed 512KB per port except in special circumstances. Hence, allocating all packet buffers to be big enough to handle 9K jumbo packets isn't that much overhead and is well worth it.

Speaking of performance, I don't see much in the low-level design of your API to enable modern networking such as user-mode networking. Ideally, rewriting something like this in Rust should at least have a goal of having similar or better performance than, say, the Data Plane Development Kit.

A performance-first design is not optional. A naive design will have less than 1% of the rated throughput of 10 Gbps networking, let alone 25 Gbps or even 100 Gbps, which is already common in the cloud. Even in a home setting this isn't esoteric any more! Notably, WiFi 6 is much faster than 1 Gbps, forcing many relatively cheap routers to have either 5 Gbps or 10 Gbps ports.

Your API layer based on Apache Thrift is... unique. There's a whole field of Software Defined Networking (SDN) that is designed for this very purpose. Ignoring and rolling your own custom thing means that you're missing out on a potentially very valuable integration point that could make this toy project into something useful.


I would be more than happy to rework it with the right approach - what is the right approach towards mutating packet buffers in Rust? Please educate me.

Mutable means mutable. Never try to magically convert anything even vaguely "const" or not "mut" into mut. This is asking for trouble. More accurately, "mut" means unique ownership. If you have a method that provides mutation, then during that time it must have (temporary) unique ownership and/or the equivalent guarantee. Some sort of lease-release, lock-unlock, or other mechanism to guarantee that no other unique copy of the reference is kept. Typically this is implemented with RAII or something similar, but it's got to be enforced by the Rust type system, not doc comments.

The linked method doesn't do this. It just gleefully hands out a mutable buffer reference it got from a const reference using an unsafe block.

Elsewhere in your code you use Arc, which provides this kind of guarantee at runtime, and does so in a thread-safe way as well. However, you mention in several places that ideally your packet processing code runs as a single thread per core. In that case you can use Rc, which isn't thread safe but has lower overhead than Arc. However, again, you've marked the various packet buffer structs as Send and Sync, which is a dubious decision at best, because I don't believe these are thread-safe as-is.

Ideally, if you really want to embrace the Rust model, have thread-local buffer pools that are not thread safe. That is, not Send or Sync. This would let consumers safely use your APIs without being able to accidentally send packet buffers across threads. Then instead of Arc, you can use Rc, which is efficient enough to use even in high performance network code to implement safe mutable buffer access using dynamic ownership checks. No unsafe code needed.
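
To illustrate, here is a minimal sketch of that approach (type and field names are assumptions for illustration, not r2's actual API): a per-thread pool hands out Rc<RefCell<...>> buffers, RefCell::borrow_mut() enforces unique mutable access at runtime, and because Rc is neither Send nor Sync the compiler rejects any attempt to move a buffer to another thread.

use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical buffer type; r2's actual Particle/Packet types differ.
struct Buffer {
    data: Vec<u8>,
}

// Built on Rc, the pool is automatically !Send and !Sync: buffers
// cannot accidentally cross threads, the compiler forbids it.
struct Pool {
    free: Vec<Rc<RefCell<Buffer>>>,
}

impl Pool {
    fn new(count: usize, size: usize) -> Pool {
        let free = (0..count)
            .map(|_| Rc::new(RefCell::new(Buffer { data: vec![0u8; size] })))
            .collect();
        Pool { free }
    }

    fn alloc(&mut self) -> Option<Rc<RefCell<Buffer>>> {
        self.free.pop()
    }

    fn release(&mut self, buf: Rc<RefCell<Buffer>>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = Pool::new(64, 9216); // e.g. 9K jumbo-capable buffers
    let buf = pool.alloc().unwrap();
    {
        // borrow_mut() is the runtime "lease": it panics if any other
        // borrow is live, so mutable access is provably unique.
        buf.borrow_mut().data[0] = 0x45;
    }
    pool.release(buf);
}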

Encode your design in the type system, and then utilise the guarantees this affords to optimise it by eliminating the need for locks or interlocked data access. Use the guarantees to allow safety without overhead. Don't cheat by adding unsafe utility methods.


megabits or megabytes?

Note that 512MB is a crazy huge buffer size for a packet router. At a typical 512KB-2MB per port, that's between 256 and 1024 ports! If used for fewer ports, then it would just drive up latency.

A 512MB buffer would make more sense for layer 3 and above, but still... 4GB is nothing on a modern server if you care about performance at all. I just installed a low-end NetScaler appliance with 64GB of memory!

This isn't even touching on subjects like NUMA or non-uniform cache access between cores even on a single socket. For top performance, you really want threads to stay within the available L2 cache, which is typically 512KB to 2MB per core. So in an 8-core system you would want to have about 4-16MB of buffer memory, which is entirely reasonable.

Look at it this way: Why do you want to hold on to the packets? The job of a router is to send them on their way as fast as possible!


Dude.

That's still 3720 buffers per port! Are you seriously saying that it makes sense for a router to hold on to 3720 packets... for nearly 27 milliseconds @ 10 Gbps... before eventually sending them on!?
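
For reference, the arithmetic behind that figure, assuming the 9K jumbo-sized buffers discussed above:

3720 buffers × 9 KB ≈ 33.5 MB per port
33.5 MB ÷ 1.25 GB/s (i.e. 10 Gbps) ≈ 26.8 ms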

That's the round-trip time from the east coast to the west coast!

You're literally adding a continent of latency in the name of quality!


27 milliseconds is an eternity, even for file transfers. Saying that's not a big deal is just Wrong with a capital W. No router that adds anywhere near this amount of latency can ever be useful as a general purpose router. Not a home router, not an enterprise router, not a cloud router, not any kind.

Essentially, unless the QoS policy is perfect, any time such a router has both bulk transfers and latency-sensitive traffic going on concurrently, the latency-sensitive traffic will be queued up along with the bulk traffic and have very poor performance. I've never seen a perfect QoS policy, or even a good one. It's wishful thinking to assume that one is ever implemented in real networks.

Read up about the bandwidth-delay product and router bufferbloat.
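
For concreteness, the bandwidth-delay product is the amount of data "in flight" on a path:

BDP = bandwidth × RTT = 1.25 GB/s (10 Gbps) × 1 ms = 1.25 MB

A router whose per-port queues hold tens of megabytes is buffering many multiples of the path's BDP - which is exactly what bufferbloat means.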

I've literally made myself a little career fixing this up for my customers, and my gimmick is tripling application performance mostly by reducing the latency of their software routers or load-balancers by a few hundred microseconds. I do this with parameter tweaks of their network adapters, host power management settings, and hypervisor tuning. It's like magic. Everything just runs faster. The few times I've had the chance to take this to the next level and turn on SR-IOV and jumbo frames, the apps ran like greased lightning.

A rule of thumb is that anything over 1ms rapidly starts degrading the throughput of 10 Gbps networking, but unfortunately it's common to see about 1.3ms even within a data centre. I tend to aim for about 150-300 microseconds; the ideal goal is sub-100, but that's hard to achieve without being in control of the hardware choices.

PS: The one time I saw a piece of network equipment that advertised big buffers as a feature was these HPE fabric switches designed to be used only for a dedicated "backup and replication" network. No other use. They had 4GB of buffer memory total across 48 ports, which is crazy huge.

Back to your threading design: You should look at things like LMAX Disruptor. Roughly speaking, an architecture based on this would look like this:

  • Each CPU core has a permanently pinned, dedicated I/O worker thread.
  • Each worker thread uses a Disruptor ring buffer for data it receives and processes.
  • All packet writes ("mutation") are local to each worker thread.
  • Other threads only send data down the wire to their own NIC, never again mutating packets.
  • You can have a separate "receive" ring buffer and a "send" ring buffer per core to keep things nice and efficient.
This is a common multi-threading design: Objects/Messages start their life off as mutable and then are "frozen" before being sent to other threads. Windows Presentation Foundation uses this extensively.

It's also easy to implement in Rust; you can use the type system to encode this. E.g.: a "MutablePacket" can have a "fn freeze(packet: MutablePacket) -> Packet" function that moves the non-Sync, non-Send MutablePacket into a frozen Send + Sync Packet.
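
A minimal sketch of that freeze pattern, with hypothetical type names (not r2's actual API):

use std::marker::PhantomData;
use std::rc::Rc;

// Writable on one thread only: PhantomData<Rc<()>> removes the
// auto-derived Send/Sync, so the compiler forbids crossing threads.
pub struct MutablePacket {
    data: Vec<u8>,
    _not_send_sync: PhantomData<Rc<()>>,
}

// Immutable after freezing: plain owned data is Send + Sync again,
// so frozen packets can be handed to the transmit thread safely.
pub struct Packet {
    data: Vec<u8>,
}

impl MutablePacket {
    pub fn new(size: usize) -> MutablePacket {
        MutablePacket { data: vec![0; size], _not_send_sync: PhantomData }
    }

    pub fn data_mut(&mut self) -> &mut [u8] {
        &mut self.data
    }

    // Consumes (moves) the mutable packet; no alias can remain,
    // so the frozen Packet is immutable by construction.
    pub fn freeze(self) -> Packet {
        Packet { data: self.data }
    }
}

fn main() {
    let mut p = MutablePacket::new(1500);
    p.data_mut()[0] = 0x45;
    let frozen = p.freeze();
    // `p` is moved; further mutation is a compile error.
    std::thread::spawn(move || println!("sent {} bytes", frozen.data.len()))
        .join()
        .unwrap();
}

Because freeze() takes self by value, the type system itself guarantees no mutable alias survives the hand-off; no doc-comment discipline required.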


Add VRF support

The route tables and other structures are all in one default global VRF. Add support for multiple VRFs. This has to be done sooner rather than later - before the code base grows, at which point it will get hard(er).

Thrift autogenerated APIs need a fmt() formatter

The thrift Exception type should have a "formatter" that prints the error string instead of the default description(), like below:
use std::error::Error;
use std::fmt::{self, Display, Formatter};

impl Display for InterfaceErr {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        // Print the error string if one was provided, otherwise fall
        // back to the default description().
        if let Some(ref why) = self.why {
            write!(f, "{:?}", why)
        } else {
            self.description().fmt(f)
        }
    }
}

Provide unique counter names in unit tests

Cargo test can spin up multiple threads for unit tests, and many unit tests use the Counters module to create counter shared memory. Unless all of them provide a unique name, tests that run in parallel can trash another test's counter memory - or worse, unmap that shared memory and cause a SIGBUS while tests run. So if you see a SIGBUS when running your unit tests and you have not actually added unsafe code, it's most likely that two tests are sharing the same name in Counters::new(). Provide a way for tests to not have to worry about passing unique names to Counters::new().
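
One possible fix, as a sketch (unique_counter_name is a hypothetical helper; Counters::new() taking a name is as described above): derive the name from the process id plus a per-process atomic counter so parallel tests can never collide.

use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

// Hypothetical helper: append pid + a monotonically increasing id so
// no two tests (even across test binaries) ever reuse a counter name.
fn unique_counter_name(base: &str) -> String {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{}_{}_{}", base, process::id(), id)
}

// In a test: let counters = Counters::new(&unique_counter_name("arp_test"));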

Add perf counter and related utilities to R2

R2 needs per-node performance counters by default, plus a simple mechanism for a developer to add performance tracking to their code and to dump the counters, start/stop them, etc. from utilities outside R2.
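
As a sketch of what the developer-facing side could look like (all names hypothetical, and ignoring the shared-memory/external-dump plumbing): a per-node counter that accumulates hit counts and elapsed time behind a closure.

use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

// Hypothetical per-node perf counter: invocation count + nanoseconds spent.
pub struct PerfCounter {
    hits: AtomicU64,
    nanos: AtomicU64,
}

impl PerfCounter {
    pub const fn new() -> PerfCounter {
        PerfCounter { hits: AtomicU64::new(0), nanos: AtomicU64::new(0) }
    }

    // Wrap a block of node code; accumulate how often and how long it ran.
    pub fn measure<R>(&self, f: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let result = f();
        self.hits.fetch_add(1, Ordering::Relaxed);
        self.nanos
            .fetch_add(start.elapsed().as_nanos() as u64, Ordering::Relaxed);
        result
    }
}

// In a node: static FWD: PerfCounter = PerfCounter::new();
//            FWD.measure(|| forward_packet(pkt));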

Dump logs from external utility

Today the external log dumper utils/r2log just makes an API call to R2, and R2 dumps the logs. Since the logs are in shared memory, the external utility itself could dump them - we should move to that model. But this will need the data in Logger::hash to also be available to the external utility, so that it can interpret the log entries.

Need Punt/Inject architecture

Right now R2 does not have an architecture for handing off packets to an external application or receiving packets from an external application. It will obviously end up being an R2 node that does the packet handoff/accept, but we need a good architecture and design around that.

Set proper scheduler wakeup time in vectors.wakeup()

Today we set vectors.wakeup() to indicate that the graph has work pending right away. But the scheduler (right now HFSC) can provide an indication of the exact time at which the work needs to be done (not necessarily right away), so use that in vectors.wakeup().
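
A rough sketch of the suggested change (struct and method shapes are assumptions, not r2's actual API): wakeup() carries the scheduler's deadline, and the poll loop sleeps until the earliest one instead of assuming work is due immediately.

use std::time::{Duration, Instant};

struct Vectors {
    next_work: Option<Instant>,
}

impl Vectors {
    // The scheduler (e.g. HFSC) passes the time work actually becomes due.
    fn wakeup(&mut self, deadline: Instant) {
        self.next_work = Some(match self.next_work {
            Some(t) if t < deadline => t, // keep the earliest deadline
            _ => deadline,
        });
    }

    // The poll loop sleeps for this long instead of spinning right away.
    fn time_to_sleep(&self, now: Instant) -> Duration {
        match self.next_work {
            Some(t) if t > now => t - now,
            Some(_) => Duration::from_millis(0), // work already due
            None => Duration::from_millis(1),    // idle poll interval
        }
    }
}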

Not all control messages should go to all nodes

Today when the control plane does a broadcast(), the graph calls control_msg() on every single node. Nodes that have no interest in the message just ignore it, but it's probably better to not call control_msg() on every node, and instead keep a dictionary of which nodes are interested in which messages.
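
A sketch of the suggested dictionary (all names hypothetical): nodes subscribe to the message kinds they care about, and broadcast() walks only those subscribers.

use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
enum MsgKind {
    RouteAdd,
    MacLearn,
    IfaceUpdate,
}

// Hypothetical registry: message kind -> ids of interested nodes.
struct Registry {
    interested: HashMap<MsgKind, Vec<usize>>,
}

impl Registry {
    fn subscribe(&mut self, kind: MsgKind, node: usize) {
        self.interested.entry(kind).or_default().push(node);
    }

    // Deliver only to subscribers instead of calling control_msg()
    // on every node in the graph.
    fn broadcast(&self, kind: MsgKind, mut deliver: impl FnMut(usize)) {
        if let Some(nodes) = self.interested.get(&kind) {
            for &node in nodes {
                deliver(node);
            }
        }
    }
}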

ARP/mac address learning enhancements

Today we just learn by doing ARP; we don't have mac aging or protection against mac address flooding, etc. Also, we don't learn the mac from the source packet (if the source is in the connected subnet). ARP/mac learning needs a significant amount of enhancement.

At least for the purpose of the Encap node learning the mac from Decap, it should be possible to combine the Encap and Decap nodes into one single node. This needs to be thought through carefully if we want to scale mac address learning.

Also, the Encap nodes on all threads don't need to learn macs and keep them in hashtables, etc. - only the Encap node on the thread owning the interface needs to learn the mac.
