Comments (12)
@AndrewDryga discovered that a simple speedtest from one client floods the metrics even at log level info:
https://cloudlogging.app.goo.gl/bTavBctEz1iXb6VT8
from firezone.
I think we might be instrumenting each client message?
/// Process the bytes received from a client.
///
/// After calling this method, you should call [`Server::next_command`] until it returns `None`.
#[tracing::instrument(skip_all, fields(transaction_id, %sender, allocation, channel, recipient, peer), level = "error")]
pub fn handle_client_input(&mut self, bytes: &[u8], sender: ClientSocket, now: SystemTime) {
    if tracing::enabled!(target: "wire", tracing::Level::TRACE) {
        let hex_bytes = hex::encode(bytes);
        tracing::trace!(target: "wire", %hex_bytes, "receiving bytes");
    }

    match self.decoder.decode(bytes) {
        Ok(Ok(message)) => {
            if let Some(id) = message.transaction_id() {
                Span::current().record("transaction_id", hex::encode(id.as_bytes()));
            }

            self.handle_client_message(message, sender, now);
        }
        // Could parse the bytes but message was semantically invalid (like missing attribute).
        Ok(Err(error_code)) => {
            self.queue_error_response(sender, error_code);
        }
        // Parsing the bytes failed.
        Err(client_message::Error::BadChannelData(ref error)) => {
            tracing::debug!(%error, "failed to decode channel data")
        }
        Err(client_message::Error::DecodeStun(ref error)) => {
            tracing::debug!(%error, "failed to decode stun packet")
        }
        Err(client_message::Error::UnknownMessageType(t)) => {
            tracing::debug!(r#type = %t, "unknown STUN message type")
        }
        Err(client_message::Error::Eof) => {
            tracing::debug!("unexpected EOF while parsing message")
        }
    };
}
from firezone.
Disabled OTLP on staging. Now I can hit 160 Mbps (instead of 10) until the f1-micro CPU is throttled on the relay.
from firezone.
The span should definitely be level "debug" I think, not "error"!
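For context on why the declared level matters so much here: a subscriber filtering at "info" still enables a span declared at `level = "error"`, so the span and its fields get created on every single packet. Below is a stdlib-only sketch of that filtering rule (this is not the real `tracing` crate API; `Subscriber` and its `enabled` method are simplified stand-ins for illustration):

```rust
// Stdlib-only sketch of level filtering (not the real tracing API):
// a span is only created when its declared level passes the subscriber's
// max level, so `level = "error"` spans survive an info-level filter.
#[derive(PartialEq, PartialOrd)]
enum Level {
    Error, // least verbose
    Warn,
    Info,
    Debug,
    Trace, // most verbose
}

struct Subscriber {
    max_level: Level,
}

impl Subscriber {
    fn enabled(&self, level: &Level) -> bool {
        *level <= self.max_level
    }
}

fn main() {
    // A deployment filtering at "info", as in the speedtest above.
    let subscriber = Subscriber { max_level: Level::Info };

    // Declared at level = "error": passes the filter, so the span (and its
    // fields) would be allocated on every handle_client_input call.
    println!("error-level span created: {}", subscriber.enabled(&Level::Error));

    // Declared at level = "debug": filtered out, no per-packet span work.
    println!("debug-level span created: {}", subscriber.enabled(&Level::Debug));
}
```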
from firezone.
I'll spend some cycles on this next week :)
from firezone.
Some benchmarking revealed that we were indeed allocating a lot of unnecessary spans.
The next improvement I can see is reducing allocations for the actual relayed data. Based on the current design, those are expected. There are a number of things we can change here:
- If we want to stay in user-space, #4095 would be a first attempt, maybe paired with using mio directly (it is used by tokio under the hood) to dynamically listen on multiple ports.
- Move away from user-space and implement the relaying as an eBPF program.
- Build something using io-uring to avoid copying between user space and kernel space.
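To make the "fewer allocations in user-space" idea concrete, here is a minimal stdlib sketch of a relay loop that reuses one long-lived buffer per datagram instead of allocating per packet. The loopback sockets and the echo-back behaviour are made up for illustration; the real relay forwards to a peer, not back to the sender:

```rust
use std::net::UdpSocket;

// Hypothetical hot path for the user-space option: one long-lived buffer is
// reused for every datagram, so recv/forward performs no per-packet heap
// allocation (only the unavoidable kernel<->user copies remain).
fn relay_once(sock: &UdpSocket, buf: &mut [u8]) -> std::io::Result<usize> {
    let (n, src) = sock.recv_from(buf)?; // kernel -> user copy
    sock.send_to(&buf[..n], src)?;       // user -> kernel copy (echo for demo)
    Ok(n)
}

fn main() -> std::io::Result<()> {
    let relay = UdpSocket::bind("127.0.0.1:0")?;
    let client = UdpSocket::bind("127.0.0.1:0")?;

    client.send_to(b"ping", relay.local_addr()?)?;

    let mut buf = vec![0u8; 65536]; // allocated once, outside the loop
    let n = relay_once(&relay, &mut buf)?;
    println!("relayed {} bytes", n);

    let mut reply = [0u8; 16];
    let (m, _) = client.recv_from(&mut reply)?;
    println!("client got {} bytes back", m);
    Ok(())
}
```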
from firezone.
I would say (1) is probably the least risky and might yield learnings or results we can apply to clients and gateways as well which are limited to user-space.
Have we verified where the major bottlenecks are? Surely we can copy packets from user space to kernel space faster than 150 Mbps
from firezone.
> I would say (1) is probably the least risky and might yield learnings or results we can apply to clients and gateways as well which are limited to user-space.

(1) captures the learnings from clients & gateways that I'd like to apply back to the relay 😃

> Have we verified where the major bottlenecks are? Surely we can copy packets from user space to kernel space faster than 150 Mbps

As far as I can tell, it is allocations. Also, we currently have queues, which isn't ideal I think.
I wasn't able to fully saturate my CPU yesterday and it topped out at about 1GBps. Not sure what the bottleneck is there?
from firezone.
> Not sure what the bottleneck is there?

My first hunch would be RAM speed, but you're on a fast system; I would expect more than 1Gbps.
I think with the right profiling approach we can get much faster. @conectado has done some work in this area if you want to pick his brain when he's back from PTO.

> topped out at about 1GBps

I'd be curious to know what the CPU wall time was on this vs waiting on IO. That should give us a rough indication of how much of the bottleneck is "our fault". We could also be copying between RAM and CPU multiple times.
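One cheap way to get that CPU-vs-IO split without a full profiler is to time the blocking receive separately from the processing. A stdlib sketch, with an mpsc channel and a sleeping producer standing in for the socket and the network (the 5 ms inter-packet gap and checksum "work" are made up for illustration):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Producer stands in for the network: one 1500-byte "packet" every 5 ms.
    thread::spawn(move || {
        for _ in 0..20 {
            thread::sleep(Duration::from_millis(5));
            tx.send(vec![0u8; 1500]).unwrap();
        }
    });

    let mut io_wait = Duration::ZERO;  // time blocked waiting for input
    let mut cpu_work = Duration::ZERO; // time spent processing it

    loop {
        let start = Instant::now();
        let Ok(packet) = rx.recv() else { break }; // sender dropped -> done
        io_wait += start.elapsed();

        let start = Instant::now();
        let _checksum: u32 = packet.iter().map(|&b| b as u32).sum(); // stand-in work
        cpu_work += start.elapsed();
    }

    println!("io wait: {:?}, cpu work: {:?}", io_wait, cpu_work);
    // Here the wait dominates: the bottleneck would not be "our fault".
    assert!(io_wait > cpu_work);
}
```

If `cpu_work` dominates instead, the allocations and copies on our side are the ceiling.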
So yeah, I guess the consensus is to start optimizing in user-space. We'll need a good grasp of the data patterns to make a good kernel-space implementation anyhow.
from firezone.
Did some more benchmarking and was able to reach ~ 7GBps locally:
target/release/firezone-relay 46.97s user 75.49s system 325% cpu 37.636 total
Memory usage was at a constant 70MB.
from firezone.
> Did some more benchmarking and was able to reach ~ 7GBps locally:

I am not sure how much I can actually trust these numbers because the benchmarking client I am using (https://github.com/vi/turnhammer) is able to generate the packets but somehow fails to receive them.
from firezone.
However, I can still generate a flamegraph from this usage, and ~50% of our time is spent allocating stuff / moving memory using `__memmove_avx512_unaligned_erms`.
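That symbol is glibc's vectorized memcpy/memmove implementation, so roughly half the time is copies and (re)allocations rather than relaying logic. A hedged sketch of the kind of per-packet copy that feeds it, contrasting a cloning path with a borrowing one (the checksum functions are hypothetical stand-ins, not relay code):

```rust
// Hypothetical contrast: cloning each packet vs. borrowing it in place.
// Every `to_vec` is a heap allocation plus a memcpy -- the sort of thing
// that shows up as __memmove_avx512_unaligned_erms in a flamegraph --
// while the borrowing version reads the bytes where they already are.
fn checksum_owned(packet: &[u8]) -> u32 {
    let copy = packet.to_vec(); // alloc + memcpy per packet
    copy.iter().map(|&b| b as u32).sum()
}

fn checksum_borrowed(packet: &[u8]) -> u32 {
    packet.iter().map(|&b| b as u32).sum() // zero-copy
}

fn main() {
    let packet = vec![1u8; 1500];
    assert_eq!(checksum_owned(&packet), checksum_borrowed(&packet));
    println!("checksum: {}", checksum_borrowed(&packet)); // prints 1500
}
```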
from firezone.