Comments (12)

jamilbk avatar jamilbk commented on June 13, 2024

@AndrewDryga discovered that a simple speedtest from one client floods the metrics even at log level info:

https://cloudlogging.app.goo.gl/bTavBctEz1iXb6VT8

from firezone.

jamilbk avatar jamilbk commented on June 13, 2024

I think we might be instrumenting each client message?

    /// Process the bytes received from a client.
    ///
    /// After calling this method, you should call [`Server::next_command`] until it returns `None`.
    #[tracing::instrument(skip_all, fields(transaction_id, %sender, allocation, channel, recipient, peer), level = "error")]
    pub fn handle_client_input(&mut self, bytes: &[u8], sender: ClientSocket, now: SystemTime) {
        if tracing::enabled!(target: "wire", tracing::Level::TRACE) {
            let hex_bytes = hex::encode(bytes);
            tracing::trace!(target: "wire", %hex_bytes, "receiving bytes");
        }

        match self.decoder.decode(bytes) {
            Ok(Ok(message)) => {
                if let Some(id) = message.transaction_id() {
                    Span::current().record("transaction_id", hex::encode(id.as_bytes()));
                }

                self.handle_client_message(message, sender, now);
            }
            // Could parse the bytes but message was semantically invalid (like missing attribute).
            Ok(Err(error_code)) => {
                self.queue_error_response(sender, error_code);
            }
            // Parsing the bytes failed.
            Err(client_message::Error::BadChannelData(ref error)) => {
                tracing::debug!(%error, "failed to decode channel data")
            }
            Err(client_message::Error::DecodeStun(ref error)) => {
                tracing::debug!(%error, "failed to decode stun packet")
            }
            Err(client_message::Error::UnknownMessageType(t)) => {
                tracing::debug!(r#type = %t, "unknown STUN message type")
            }
            Err(client_message::Error::Eof) => {
                tracing::debug!("unexpected EOF while parsing message")
            }
        };
    }

AndrewDryga avatar AndrewDryga commented on June 13, 2024

Disabled OTLP export on staging. Now I can hit 160 Mbps (instead of 10) until the f1-micro CPU is throttled on the relay.

thomaseizinger avatar thomaseizinger commented on June 13, 2024

I think we might be instrumenting each client message?

The span should def be level "debug" I think, not error!
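For context, that is a one-word change on the attribute shown above (a sketch of the idea, not the actual patch). With `tracing`, a span's level decides whether it is created at all under the active filter: at the default `info` filter, a `debug`-level span is disabled and costs almost nothing, while an `error`-level span is allocated for every packet.

```rust
#[tracing::instrument(skip_all, fields(transaction_id, %sender, allocation, channel, recipient, peer), level = "debug")]
```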

thomaseizinger avatar thomaseizinger commented on June 13, 2024

I'll spend some cycles on this next week :)

thomaseizinger avatar thomaseizinger commented on June 13, 2024

Some benchmarking revealed that we were indeed allocating a lot of unnecessary spans.

The next improvement I can see is avoiding allocations for the actual relayed data. Based on the current design, those are expected. There are a number of things we can change here:

  1. If we want to stay in user-space, #4095 would be a first attempt, maybe paired with using mio directly (it is used by tokio under the hood) to dynamically listen on multiple ports.
  2. Move away from user-space and implement the relaying as an eBPF program.
  3. Build something using io-uring to avoid copying between user space and kernel space.
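Option (1) can be approximated with just the standard library before reaching for `sendmmsg` or `mio`: drain every queued datagram from a nonblocking socket in one wakeup, reusing a single receive buffer. A sketch under those assumptions (not firezone's actual code):

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;

/// Drain all currently-queued datagrams from a nonblocking socket in one
/// wakeup, reusing a single buffer. This approximates in std what
/// sendmmsg/recvmmsg (option 1) achieves with a single syscall.
fn drain(
    socket: &UdpSocket,
    buf: &mut [u8],
    mut handle: impl FnMut(&[u8]),
) -> std::io::Result<usize> {
    let mut n = 0;
    loop {
        match socket.recv_from(buf) {
            Ok((len, _from)) => {
                handle(&buf[..len]);
                n += 1;
            }
            // No more queued packets: return to the event loop (e.g. a mio poll).
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(n),
            Err(e) => return Err(e),
        }
    }
}

fn main() -> std::io::Result<()> {
    let rx = UdpSocket::bind("127.0.0.1:0")?;
    rx.set_nonblocking(true)?;
    let tx = UdpSocket::bind("127.0.0.1:0")?;
    for _ in 0..3 {
        tx.send_to(b"ping", rx.local_addr()?)?;
    }
    std::thread::sleep(std::time::Duration::from_millis(50));

    let mut buf = [0u8; 65535]; // one long-lived buffer for all packets
    let mut total = 0usize;
    let received = drain(&rx, &mut buf, |pkt| total += pkt.len())?;
    println!("received {received} datagrams, {total} bytes");
    Ok(())
}
```

With `mio`, the `drain` call would sit in the handler for each readable socket, so one wakeup services every pending packet on every port.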

jamilbk avatar jamilbk commented on June 13, 2024

I would say (1) is probably the least risky and might yield learnings or results we can apply to clients and gateways as well which are limited to user-space.

Have we verified where the major bottlenecks are? Surely we can copy packets from user space to kernel space faster than 150 Mbps

thomaseizinger avatar thomaseizinger commented on June 13, 2024

I would say (1) is probably the least risky and might yield learnings or results we can apply to clients and gateways as well which are limited to user-space.

(1) are already the learnings from clients & gateways that I'd like to apply back to the relay 😃

Have we verified where the major bottlenecks are? Surely we can copy packets from user space to kernel space faster than 150 Mbps

As far as I can tell, it is allocations. Also, we currently have queues, which isn't ideal I think.

I wasn't able to fully saturate my CPU yesterday and it topped out at about 1GBps. Not sure what the bottleneck is there?
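A sketch of the kind of per-packet allocation that adds up: copying each datagram into a fresh `Vec` versus parsing it in place and forwarding a borrowed slice of a long-lived receive buffer. The names and the 4-byte TURN channel-data header here are illustrative, not firezone's actual code:

```rust
/// Allocating version: one heap allocation plus a memmove per packet.
fn copy_out(datagram: &[u8]) -> Vec<u8> {
    datagram.to_vec()
}

/// Borrowing version: strip a 4-byte channel-data header and hand back
/// a subslice of the receive buffer. Zero allocations on the hot path.
fn payload(datagram: &[u8]) -> Option<&[u8]> {
    datagram.get(4..)
}

fn main() {
    let datagram = [0x40, 0x00, 0x00, 0x04, b'd', b'a', b't', b'a'];
    assert_eq!(copy_out(&datagram).len(), 8);
    assert_eq!(payload(&datagram), Some(&b"data"[..]));
    println!("payload: {:?}", payload(&datagram).unwrap());
}
```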

jamilbk avatar jamilbk commented on June 13, 2024

Not sure what the bottleneck is there?

My first hunch would be RAM speed, but you're on a fast system; I would expect more than 1 Gbps.

I think with the right profiling approach we can get much faster. @conectado has done some work in this area if you want to pick his brain when he's back from PTO.

topped out at about 1GBps

I'd be curious to know what the CPU wall time was on this vs waiting on IO. That should give us a rough indication of how much of the bottleneck is "our fault". We could also be copying between RAM and CPU multiple times.

So yeah, I guess the consensus is to start optimizing in user-space. We'll need a good grasp of the data patterns to make a good kernel-space implementation anyhow.
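The CPU-wall-time-vs-IO question can also be answered roughly from inside the process: time the blocking receive separately from the processing. A sketch with stand-ins for the socket calls (names hypothetical):

```rust
use std::time::{Duration, Instant};

/// Hypothetical accounting for one event-loop iteration: time spent
/// blocked waiting for data vs. time spent processing it.
struct LoopStats {
    waiting: Duration,
    working: Duration,
}

/// Run `f`, adding its elapsed wall time to `total`.
fn timed<T>(total: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *total += start.elapsed();
    out
}

fn main() {
    let mut stats = LoopStats { waiting: Duration::ZERO, working: Duration::ZERO };

    // Stand-ins for socket.recv() and packet processing:
    let pkt = timed(&mut stats.waiting, || {
        std::thread::sleep(Duration::from_millis(5)); // "blocked on IO"
        vec![0u8; 1500]
    });
    timed(&mut stats.working, || pkt.iter().map(|b| *b as u64).sum::<u64>());

    println!("waiting: {:?}, working: {:?}", stats.waiting, stats.working);
}
```

If `waiting` dominates, the bottleneck is not "our fault"; if `working` dominates, a flamegraph of that span is the next step.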

thomaseizinger avatar thomaseizinger commented on June 13, 2024

Did some more benchmarking and was able to reach ~ 7GBps locally:

    target/release/firezone-relay  46.97s user 75.49s system 325% cpu 37.636 total

Memory usage was at a constant 70MB.

thomaseizinger avatar thomaseizinger commented on June 13, 2024

Did some more benchmarking and was able to reach ~ 7GBps locally:

I am not sure how much I can actually trust these numbers because the benchmarking client I am using (https://github.com/vi/turnhammer) is able to generate the packets but somehow fails to receive them.

thomaseizinger avatar thomaseizinger commented on June 13, 2024

However, I can still generate a flamegraph from this usage, and ~50% of our time is spent allocating / moving memory in `__memmove_avx512_unaligned_erms`.
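A counting global allocator is a cheap way to confirm the "allocating stuff" half of that observation without a profiler (a sketch of the technique, not how the flamegraph above was produced):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every heap allocation, so the
/// allocation count of a hot loop can be asserted in a test.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    let mut buf = vec![0u8; 65535]; // one allocation, reused below
    for i in 0..1000 {
        buf[i % 65535] = i as u8; // hot loop: no allocations
    }
    let after = ALLOCS.load(Ordering::Relaxed);
    assert_eq!(after - before, 1); // only the buffer itself was allocated
    println!("allocations in loop: {}", after - before);
}
```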
