
Comments (11)

francescolavra avatar francescolavra commented on June 2, 2024

Yes, using NAT (Qemu usermode) networking can negatively affect performance of network-intensive workloads. You should use TAP networking instead, please refer to the documentation, in the "Bridged Network" section, for info on how to set up TAP networking.
Other possible differences between the native application and the unikernel might be:

  • the Nginx configuration: are you using the same nginx.conf file (except for any file path changes that might be necessary) for your Linux Nginx server as used in the unikernel?
  • if your physical machine has more than one CPU, are you pinning your Linux Nginx server process to a single CPU?
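For the CPU-pinning point above, a minimal sketch (assuming nginx is started directly rather than via systemd, and that CPU 0 is the chosen core) might look like:

```shell
# Pin the nginx master process to CPU 0 so it matches the single-vCPU
# unikernel; worker processes inherit the affinity of the master.
taskset -c 0 nginx -c /etc/nginx/nginx.conf

# Verify the effective CPU affinity of the oldest nginx process:
taskset -cp "$(pgrep -o -x nginx)"
```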

from nanos.

eyberg avatar eyberg commented on June 2, 2024

we should add some benchmarking documentation, but just to re-echo @francescolavra: user-mode (the local default networking) is horrible for performance. it works fine for normal dev/test work, but for anything performance-related we'd suggest testing on a cloud of your choice (such as GCP or AWS), where one vm is your unikernel, another is whatever stock linux you're using (debian, etc.), and a third runs your wrk program, all on the same network. there is a lot of other tuning that can be done here, but that is the minimum baseline we'd suggest to start with if you want to do your own benchmarking. also, when comparing performance, both the native application and the unikernel should be in a vm, which is why we suggest just deploying to a cloud for the test: it makes it easy, reproducible, and realistic.
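As a sketch of that three-VM baseline, the load-generator side could look like the following (hostnames and port are placeholders for your own setup):

```shell
# From the third VM, drive both targets with identical wrk parameters
# (4 threads, 64 connections, 30 seconds) so the runs are comparable.
wrk -t4 -c64 -d30s http://unikernel-vm:8080/
wrk -t4 -c64 -d30s http://linux-vm:8080/
```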

from nanos.

Jackson-Greene-Curtin avatar Jackson-Greene-Curtin commented on June 2, 2024

Thank you for the responses @francescolavra @eyberg - I believe the configuration is the same between both but I will check again to confirm, and I have used taskset to pin the server process to a single CPU. I also tried as a runC container with limited resources and got very similar performance to native.

It seems like the QEMU NAT networking may be the issue here so I will try using bridged mode as suggested. I did attempt to use it earlier for a different test but was running into issues where there was no connection between the unikernels.

From what I can tell using bridged mode will create a bridge (br0 default) and then attach the tap devices (1 per unikernel) to the bridge. How can I then access the unikernel externally in the same network (for example: from another machine running the client)? Wouldn't the bridge also need an ethernet interface attached to it? Maybe I am misunderstanding.
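For what it's worth, external access does require an uplink on the bridge. A rough sketch, with interface names and addresses as assumptions (note that running this over SSH on that interface will drop your session):

```shell
# Attach the physical NIC to the bridge and move its IP address onto
# the bridge, so external hosts can reach the tap-attached unikernels.
ip link add name br0 type bridge 2>/dev/null || true
ip link set eth0 master br0
ip addr flush dev eth0              # the IP now belongs on the bridge
ip addr add 192.168.1.10/24 dev br0
ip link set br0 up
```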

The documentation also states:
"Bridged networking should only be used when you want full control over the networking. Generally speaking this means you are on real hardware, running linux. If you are deploying to the public cloud you will want to use ops normally and NOT setup this as it'l be slow." - maybe this isn't the case anymore?

I understand how using a cloud provider for testing would be ideal, although I am looking to optimise network performance for locally hosted applications.

I'll post any updates on how bridged mode goes if I can get it working.

Thanks for the help

from nanos.

eyberg avatar eyberg commented on June 2, 2024

we suggest using the cloud because the cloud effectively sets up the bridge and the taps for you without any work on your part. when you do an 'ops image create' it'll create an AMI, and 'instance create' creates an EC2 instance out of that AMI, so it's not the case that you need to set up networking on top: you get it for free, with no setup, just by doing that instead.
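That workflow could be sketched roughly as follows (the program name, config file, and exact flags are assumptions; check `ops image create --help` for your version):

```shell
# Build a cloud image from your program and its config, then boot an
# instance from it; the cloud provides the networking, not you.
ops image create myserver -c config.json -t aws
ops instance create myserver -t aws
```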

your understanding of the bridge is correct. if you had multiple physical machines, you would either need to bridge the outgoing interface (which will drop the connection at first) or forward traffic in via iptables and bridge via a dummy interface; both of these options would need to be taken into account for your benchmarking.
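The iptables alternative mentioned here could be sketched as (addresses, port, and interface names are hypothetical):

```shell
# Keep eth0 untouched and DNAT incoming traffic on port 8080 to the
# unikernel's address on the internal bridge network.
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 \
         -j DNAT --to-destination 192.168.66.2:8080
iptables -A FORWARD -d 192.168.66.2 -p tcp --dport 8080 -j ACCEPT
```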

if your only goal is to benchmark, i'd still recommend you try the cloud option, as you won't have to deal with the networking config, and cloud networks can be as good as or better than what you might have locally. if that's not what you're going for, there might be other options depending on what you are actually trying to do. fyi, if you happen to be on an arm mac, you can also use the auto-bridging support that is available by building ops via:

go build -ldflags "-X github.com/nanovms/ops/qemu.OPSD=true -X github.com/nanovms/ops/qemu.MACPKGD=true"

from nanos.

Jackson-Greene-Curtin avatar Jackson-Greene-Curtin commented on June 2, 2024

Thanks for your reply @eyberg
We are interested in testing local performance, but we might make an attempt on public cloud to test the difference.

I was able to get bridged mode working. We discovered that the br0 bridge is recreated on every launch, which was OK, but it reassigns the bridge IP on br0 to .66.

Network throughput testing initially shows (in MB/s) for Nanos 68/217/235 for 1/5/15 workers respectively. Compared to native Linux with single core 160/966/1006 for 1/5/15 workers respectively.

In your opinion is there anything that we might not be factoring for which could explain this difference? Or is this simply a performance difference between Nanos using lwIP compared to the Linux TCP/IP stack? Has any other network throughput testing been completed other than the Nginx requests per second?

In our testing the Nginx requests per second is superior to Linux when the payload size of the response is small, but eventually the throughput limitation will outweigh that performance advantage (as response size increases). Keen to be shown otherwise though as this may be user error.

from nanos.

eyberg avatar eyberg commented on June 2, 2024

the default bridge support in ops is poor, as most users deploy to the cloud; it can definitely be improved: nanovms/ops#1171

as for your test, it isn't quite clear what you are comparing against: is nginx running on a physical debian/ubuntu system, or in a vm? how many threads? using kvm or vsphere or ?

it should also be noted that our lwIP is not stock lwIP. you might also look into go and rust webservers, but again, it needs to be apples to apples.

from nanos.

eyberg avatar eyberg commented on June 2, 2024

"I have used taskset to pin the server process to a single CPU. I also tried as a runC container with limited resources and got very similar performance to native." <-- this seems to indicate that you are comparing nginx running outside a vm against one that is in a vm (the unikernel), which would easily explain the results you are seeing. for a more realistic/fair comparison, you'll need to put the non-virtualized one inside a linux distro of some kind and run it as a vm.

it doesn't make a ton of sense to compare a physical deployment to a virtualized one, as nanos can only be run as a vm, and the workloads it can replace are virtualized ones (e.g. cloud).

from nanos.

Jackson-Greene-Curtin avatar Jackson-Greene-Curtin commented on June 2, 2024

I can understand your reasoning for it not being a realistic comparison in most cases as services are generally deployed to a cloud service and will be running in a virtual environment - although my testing / use case is related to on-premise servers that can run services natively or virtualised.

Understandably there will be some overhead from the extra network complexity of virtualisation, but the idea was to get performance as close as possible to (or better in some ways than) a runc container on a physical machine with the same resource limitations.

I think my goal is essentially to use the best nanos / qemu networking configuration possible to see if the performance impact is fine considering the increased security.

I am happy to try some testing on cloud environments but it wouldn't be directly related to what I am looking for.

from nanos.

francescolavra avatar francescolavra commented on June 2, 2024

The performance difference between the unikernel and the native server is most likely due to the fact that, with the unikernel, the network packets transferred between the guest (Nanos) and the host (Linux) go through the hypervisor (Qemu), which is a user process running in the host.
If your host runs a kernel configured with CONFIG_VHOST_NET=y, or if the vhost_net kernel module can be loaded with modprobe vhost_net, then you should be able to get performance close to native by using vhost-net, which bypasses the Qemu user process in the network data transfer between guest and host. Using vhost-net means adding vhost=on to the -netdev tap option in the Qemu command line.
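A minimal sketch of checking for vhost-net and enabling it on a hand-rolled Qemu invocation (the tap name, image file, and memory size are assumptions; ops normally generates this command line for you):

```shell
# Load the vhost_net module if it isn't built into the kernel.
modprobe vhost_net && lsmod | grep -q vhost_net

# Boot the unikernel image with vhost-net enabled on the tap netdev.
qemu-system-x86_64 -machine q35,accel=kvm -m 2G \
  -drive file=nanos-image.img,format=raw,if=virtio \
  -netdev tap,id=n0,ifname=tap0,script=no,downscript=no,vhost=on \
  -device virtio-net-pci,netdev=n0
```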

from nanos.

eyberg avatar eyberg commented on June 2, 2024

also, I didn't see it mentioned in this thread, but the 1.18 pkg you are using is set to 'worker_processes 1;' (i.e. no multiple workers). I don't know what kind of config you are using for your non-virtualized comparison, but if that's not 1 then you'll definitely see a (possibly large) difference from that as well. you can compare other types of workloads (such as go or rust webservers), or, if you're set on nginx, we recently cut a multi-threaded version of it at https://repo.ops.city/v2/packages/eyberg/nginx-mt/1.25.2/x86_64/show
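For reference, matching the package's setting on the Linux side would mean something like this in nginx.conf (the events values here are illustrative, not taken from the package):

```
# Single worker, matching the nanos nginx 1.18 package
worker_processes 1;

events {
    worker_connections 1024;
}
```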

... at the end of the day though the host will always trump the guest

if your goal is just a straight-up perf comparison, then there are a ton of variables that need to be tightly controlled in a repeatable fashion; otherwise you are going to get wildly different results and it's going to be very hard to draw any meaningful conclusions

if on the other hand there is a particular use-case it might be better to drill in on that particular app itself

from nanos.

Jackson-Greene-Curtin avatar Jackson-Greene-Curtin commented on June 2, 2024

Thanks for the replies @eyberg @francescolavra. After some more testing it seems that using vhost=on significantly improved the performance compared to the default. When testing against localhost it was still a lot slower (2-3x) but that is to be expected, so no major issue there. I will be looking into SR-IOV soon because it seems like the best solution in terms of performance and not relying on the host (at a software level) to deal with networking.
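For the SR-IOV direction, a rough sketch of carving out a virtual function and passing it to Qemu (the NIC name and PCI addresses are placeholders; this assumes the NIC supports SR-IOV and the IOMMU is enabled in firmware and on the kernel command line):

```shell
# Create one virtual function on the physical NIC.
echo 1 > /sys/class/net/enp3s0f0/device/sriov_numvfs

# Bind the new VF to vfio-pci instead of its default driver.
modprobe vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000:03:10.0/driver_override
echo 0000:03:10.0 > /sys/bus/pci/drivers_probe

# Pass the VF through to the guest, bypassing the host network stack.
qemu-system-x86_64 -machine q35,accel=kvm -m 2G \
  -drive file=nanos-image.img,format=raw,if=virtio \
  -device vfio-pci,host=03:10.0
```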

from nanos.
