Comments (6)
I've done further testing in response to your assertions.
First, I added a binary mode to the websocketpp, uwebsocket, and Go (golang.org/x/net/websocket) servers and updated the benchmark client to support it. Performance of all servers increased roughly proportionally to the message size reduction (the same message encoded in binary is about 75% the size of the message encoded in JSON), but not more. As you mention, a single broadcast from the server will cause thousands of JSON decodes in the clients, but this test indicates that, because multiple client machines were run in parallel, decoding was not a limiting factor.
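(For concreteness, here is a minimal sketch of the kind of binary framing this implies, assuming the one-command-byte layout visible in the uWS server code later in this thread; the helper name and exact layout are illustrative, not the shootout's actual code.)

#include <string>

// Hypothetical binary encoding: a one-byte command tag ('b' = broadcast,
// 'e' = echo) followed by the raw payload, instead of a JSON envelope such
// as {"type":"broadcast","payload":"..."}. Dropping the JSON text is where
// the roughly 25% size reduction comes from.
std::string encodeBinary(char command, const std::string &payload) {
    std::string out;
    out.reserve(1 + payload.size());
    out.push_back(command);
    out.append(payload);
    return out;
}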
Next, I tested using C++ instead of Go for the benchmark client. In addition, the C++ benchmark does not use a full websocket implementation; it works directly with libuv. I used your throughput benchmark as the starting point and updated it to run the same test as the Go tool's binary broadcast test. It was indeed able to get higher results than a single Go client, but running the Go tool on multiple machines in parallel produced higher results still.
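(For readers unfamiliar with the approach, this is a minimal sketch of the connect-and-read skeleton a bare libuv client is built on; the address is hypothetical, and the WebSocket handshake, framing, and RTT bookkeeping of the real benchmark are omitted.)

#include <uv.h>
#include <cstdlib>

// Allocate a buffer for each read.
static void allocCb(uv_handle_t *, size_t suggested, uv_buf_t *buf) {
    buf->base = (char *) malloc(suggested);
    buf->len = suggested;
}

// Incoming bytes land here; the real client would parse websocket frames
// and record round-trip times.
static void onRead(uv_stream_t *, ssize_t, const uv_buf_t *buf) {
    free(buf->base);
}

// Once connected, the real client would write the HTTP upgrade request
// before starting to read.
static void onConnect(uv_connect_t *req, int status) {
    if (status == 0) {
        uv_read_start(req->handle, allocCb, onRead);
    }
}

int main() {
    uv_loop_t *loop = uv_default_loop();
    uv_tcp_t socket;
    uv_tcp_init(loop, &socket);

    struct sockaddr_in dest;
    uv_ip4_addr("192.0.2.1", 3000, &dest); // hypothetical server address

    uv_connect_t req;
    uv_tcp_connect(&req, &socket, (const struct sockaddr *) &dest, onConnect);
    return uv_run(loop, UV_RUN_DEFAULT);
}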
None of the above tests found the dramatic differences your assertions would lead me to expect. However, I think I found something that explains the substantially different results we observe. I noticed the benchmarks for uwebsockets are hard-coded to connect to 127.0.0.1. This could confound the results in two ways. First, the client and server are running on the same machine, so any resources taken by the benchmark client have a direct negative effect on the server. This explains why a very low-overhead C++ client and a more heavyweight Go client produce substantially different results. Second, using the loopback interface instead of an actual network incurs far less overhead, which allows much higher numbers than are possible on a real network.
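(A minimal sketch of the fix, assuming nothing about the benchmark beyond the hard-coded address: take the target from argv so the client can run on a separate machine; the names here are illustrative, not the shootout's actual code.)

#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char *argv[]) {
    // Default keeps the old loopback behavior; a real network test
    // overrides it with the server machine's address.
    std::string host = argc > 1 ? argv[1] : "127.0.0.1";
    int port = argc > 2 ? std::atoi(argv[2]) : 3000;
    std::cout << "benchmarking against " << host << ":" << port << std::endl;
    // ... connect the benchmark clients to host:port from here on ...
    return 0;
}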
You say the point of a good benchmark is to maximize the result difference between the test subjects. I do not see the fact that most implementations are within 50% of each other as a flaw; I see it as a valid data point that, for this particular workload, the choice of language and library should probably not be decided on throughput alone. For other workloads, results may be substantially different.
The raw results are here: https://github.com/hashrocket/websocket-shootout/blob/master/results/round-02-binary.md. The C++ benchmark is here: https://github.com/hashrocket/websocket-shootout/tree/master/cpp/bench.
To validate Chapter 6 of my first post, and to really show you how flawed your "benchmark of websocket libraries" is, I made my own server with uWS, and it performs several hundred percent better than the one you wrote (using the very same uWS):
clients: 1000 95per-rtt: 7ms min-rtt: 4ms median-rtt: 7ms max-rtt: 7ms
clients: 2000 95per-rtt: 15ms min-rtt: 8ms median-rtt: 11ms max-rtt: 18ms
clients: 3000 95per-rtt: 19ms min-rtt: 12ms median-rtt: 14ms max-rtt: 25ms
clients: 4000 95per-rtt: 22ms min-rtt: 16ms median-rtt: 19ms max-rtt: 27ms
clients: 5000 95per-rtt: 31ms min-rtt: 20ms median-rtt: 23ms max-rtt: 36ms
clients: 6000 95per-rtt: 37ms min-rtt: 23ms median-rtt: 27ms max-rtt: 39ms
clients: 7000 95per-rtt: 36ms min-rtt: 26ms median-rtt: 29ms max-rtt: 40ms
clients: 8000 95per-rtt: 41ms min-rtt: 30ms median-rtt: 33ms max-rtt: 45ms
clients: 9000 95per-rtt: 44ms min-rtt: 34ms median-rtt: 37ms max-rtt: 49ms
clients: 10000 95per-rtt: 50ms min-rtt: 38ms median-rtt: 42ms max-rtt: 50ms
clients: 11000 95per-rtt: 54ms min-rtt: 42ms median-rtt: 45ms max-rtt: 59ms
clients: 12000 95per-rtt: 59ms min-rtt: 46ms median-rtt: 49ms max-rtt: 61ms
clients: 13000 95per-rtt: 63ms min-rtt: 50ms median-rtt: 53ms max-rtt: 64ms
clients: 14000 95per-rtt: 65ms min-rtt: 55ms median-rtt: 57ms max-rtt: 68ms
clients: 15000 95per-rtt: 73ms min-rtt: 58ms median-rtt: 61ms max-rtt: 75ms
clients: 16000 95per-rtt: 78ms min-rtt: 62ms median-rtt: 65ms max-rtt: 83ms
clients: 17000 95per-rtt: 89ms min-rtt: 66ms median-rtt: 69ms max-rtt: 145ms
clients: 18000 95per-rtt: 91ms min-rtt: 69ms median-rtt: 73ms max-rtt: 95ms
clients: 19000 95per-rtt: 90ms min-rtt: 73ms median-rtt: 77ms max-rtt: 93ms
clients: 20000 95per-rtt: 94ms min-rtt: 77ms median-rtt: 80ms max-rtt: 95ms
clients: 21000 95per-rtt: 98ms min-rtt: 81ms median-rtt: 86ms max-rtt: 103ms
clients: 22000 95per-rtt: 101ms min-rtt: 86ms median-rtt: 89ms max-rtt: 103ms
clients: 23000 95per-rtt: 105ms min-rtt: 89ms median-rtt: 93ms max-rtt: 105ms
clients: 24000 95per-rtt: 105ms min-rtt: 94ms median-rtt: 97ms max-rtt: 109ms
clients: 25000 95per-rtt: 130ms min-rtt: 97ms median-rtt: 103ms max-rtt: 202ms
clients: 26000 95per-rtt: 115ms min-rtt: 102ms median-rtt: 106ms max-rtt: 116ms
clients: 27000 95per-rtt: 123ms min-rtt: 104ms median-rtt: 112ms max-rtt: 125ms
clients: 28000 95per-rtt: 131ms min-rtt: 110ms median-rtt: 115ms max-rtt: 134ms
Just like Chapter 6 states, a broadcast is ultimately going to end up being a loop of syscalls (which is a constant workload for all servers). That's why it is important to know what you are doing when implementing things like pub/sub (and this very benchmark of yours). You cannot use your grandmother as a test subject when testing how fast a sports car is and then conclude, based on the fact that your grandmother didn't go any faster, that "all cars are the same speed". What you benchmark in that case is your grandmother, not the car.
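(To make the "loop of syscalls" point concrete, here is a sketch using the same uWS 0.x API as the server code later in this thread: framing the broadcast once via a prepared message reduces the per-client work to roughly one write on an already-framed buffer. prepareMessage's exact signature is assumed to mirror the prepareMessageBatch call in that code.)

#include <uWS/uWS.h>

// Naive version: ws.send() re-frames and copies the message per client.
void broadcastNaive(uWS::Group<uWS::SERVER> &group, char *data, size_t length) {
    group.forEach([data, length](uWS::WebSocket<uWS::SERVER> ws) {
        ws.send(data, length, uWS::OpCode::BINARY);
    });
}

// Prepared version: frame once, then the loop is (roughly) one syscall
// per socket on the shared, already-framed buffer.
void broadcastPrepared(uWS::Group<uWS::SERVER> &group, char *data, size_t length) {
    uWS::WebSocket<uWS::SERVER>::PreparedMessage *prepared =
        uWS::WebSocket<uWS::SERVER>::prepareMessage(data, length, uWS::OpCode::BINARY, false);
    group.forEach([prepared](uWS::WebSocket<uWS::SERVER> ws) {
        ws.sendPrepared(prepared, nullptr);
    });
    uWS::WebSocket<uWS::SERVER>::finalizeMessage(prepared);
}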
By implementing a very simple server based on my own recommendations from this repo: https://github.com/alexhultman/High-performance-pub-sub, I was able to produce results on your own benchmark close to 5x better than those you came up with.
You need to stop tainting the benchmark with your own shortcomings. You cannot conclude that uWS is "about the same" as other low-perf implementations when the issue is what you put on top of the library. A server will not just magically be fast because you swapped to uWS; it requires that you know how to use it and the surrounding low-level matters.
Stick with the echo tests; they are standard in this industry: they benchmark receiving performance (parsing + memory management) as well as sending performance (framing + memory management). Everything else is up to the user; it's not part of the websocket library. Node.js, Apache, h2o, NGINX and all those HTTP servers measure performance in requests per second, i.e. echo, simply because that is the only way to show the performance of the server and only the server, without tainting it with user code.
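(For reference, an echo server in uWS is tiny; this is a minimal sketch using the same 0.x value-semantics API as the server code later in this thread.)

#include <uWS/uWS.h>

int main() {
    uWS::Hub hub;

    // Echo: send every received message straight back with the same opcode.
    hub.onMessage([](uWS::WebSocket<uWS::SERVER> ws, char *message, size_t length, uWS::OpCode opCode) {
        ws.send(message, length, opCode);
    });

    hub.listen(3000);
    hub.run();
}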
For reference, this is the result I get with the server you wrote in uWS:
clients: 1000 95per-rtt: 25ms min-rtt: 7ms median-rtt: 15ms max-rtt: 26ms
clients: 2000 95per-rtt: 41ms min-rtt: 10ms median-rtt: 32ms max-rtt: 44ms
clients: 3000 95per-rtt: 56ms min-rtt: 14ms median-rtt: 47ms max-rtt: 59ms
clients: 4000 95per-rtt: 72ms min-rtt: 19ms median-rtt: 62ms max-rtt: 76ms
clients: 5000 95per-rtt: 87ms min-rtt: 22ms median-rtt: 80ms max-rtt: 99ms
clients: 6000 95per-rtt: 106ms min-rtt: 25ms median-rtt: 96ms max-rtt: 111ms
clients: 7000 95per-rtt: 125ms min-rtt: 29ms median-rtt: 113ms max-rtt: 132ms
clients: 8000 95per-rtt: 139ms min-rtt: 33ms median-rtt: 129ms max-rtt: 144ms
clients: 9000 95per-rtt: 158ms min-rtt: 37ms median-rtt: 145ms max-rtt: 176ms
clients: 10000 95per-rtt: 182ms min-rtt: 48ms median-rtt: 164ms max-rtt: 189ms
clients: 11000 95per-rtt: 203ms min-rtt: 49ms median-rtt: 185ms max-rtt: 214ms
clients: 12000 95per-rtt: 217ms min-rtt: 49ms median-rtt: 200ms max-rtt: 225ms
clients: 13000 95per-rtt: 240ms min-rtt: 53ms median-rtt: 217ms max-rtt: 252ms
clients: 14000 95per-rtt: 257ms min-rtt: 57ms median-rtt: 234ms max-rtt: 263ms
clients: 15000 95per-rtt: 266ms min-rtt: 74ms median-rtt: 253ms max-rtt: 271ms
clients: 16000 95per-rtt: 282ms min-rtt: 69ms median-rtt: 269ms max-rtt: 285ms
clients: 17000 95per-rtt: 300ms min-rtt: 72ms median-rtt: 288ms max-rtt: 361ms
clients: 18000 95per-rtt: 316ms min-rtt: 88ms median-rtt: 306ms max-rtt: 323ms
clients: 19000 95per-rtt: 331ms min-rtt: 84ms median-rtt: 323ms max-rtt: 336ms
clients: 20000 95per-rtt: 349ms min-rtt: 80ms median-rtt: 341ms max-rtt: 353ms
clients: 21000 95per-rtt: 366ms min-rtt: 91ms median-rtt: 357ms max-rtt: 369ms
clients: 22000 95per-rtt: 386ms min-rtt: 93ms median-rtt: 375ms max-rtt: 388ms
clients: 23000 95per-rtt: 396ms min-rtt: 111ms median-rtt: 391ms max-rtt: 406ms
clients: 24000 95per-rtt: 416ms min-rtt: 98ms median-rtt: 408ms max-rtt: 429ms
clients: 25000 95per-rtt: 436ms min-rtt: 104ms median-rtt: 428ms max-rtt: 537ms
clients: 26000 95per-rtt: 453ms min-rtt: 107ms median-rtt: 446ms max-rtt: 454ms
clients: 27000 95per-rtt: 473ms min-rtt: 112ms median-rtt: 465ms max-rtt: 479ms
clients: 28000 95per-rtt: 487ms min-rtt: 117ms median-rtt: 480ms max-rtt: 492ms
As you can see, the difference is major. Yet the very same websocket library has been utilized. I hope this will get you to realize how flawed this benchmark is.
This yet again validates Chapter 6 of my very first post.
Yes, I can post it, but it would be very unfair if you used it, since the other servers would be using a different broadcasting algorithm.
This is what I have currently. It depends on a new function that is not fully decided on yet but should land some time soon (I have discussed this function for a while with other people doing pub/sub):
#include <uWS/uWS.h>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// A pending broadcast: the message and the socket that sent it.
struct Sender {
    std::string data;
    uWS::WebSocket<uWS::SERVER> ws;
};

std::vector<Sender> senders;
uWS::Hub hub;
bool newThisIteration, inBatch;

int main(int argc, char *argv[]) {
    uv_timer_t timer;
    uv_timer_init(hub.getLoop(), &timer);

    // The prepare handle runs just before the loop would block on I/O.
    // While a batch is open, arm a 1 ms timer so the loop keeps iterating
    // instead of sleeping, and clear the "new this iteration" flag.
    uv_prepare_t prepare;
    prepare.data = &timer;
    uv_prepare_init(hub.getLoop(), &prepare);
    uv_prepare_start(&prepare, [](uv_prepare_t *prepare) {
        if (inBatch) {
            uv_timer_start((uv_timer_t *) prepare->data, [](uv_timer_t *t) {}, 1, 0);
            newThisIteration = false;
        }
    });

    // The check handle runs after I/O. If a batch is open and no new
    // broadcast arrived during this iteration, flush the whole batch.
    uv_check_t checker;
    uv_check_init(hub.getLoop(), &checker);
    uv_check_start(&checker, [](uv_check_t *checker) {
        if (inBatch && !newThisIteration) {
            std::vector<std::string> messages;
            std::vector<int> excludes;
            for (Sender &s : senders) {
                messages.push_back(s.data);
            }
            if (messages.size()) {
                // Frame all batched messages once, send the prepared batch
                // to every connected socket, then release it.
                uWS::WebSocket<uWS::SERVER>::PreparedMessage *prepared = uWS::WebSocket<uWS::SERVER>::prepareMessageBatch(messages, excludes, uWS::OpCode::BINARY, false, nullptr);
                hub.getDefaultGroup<uWS::SERVER>().forEach([&prepared](uWS::WebSocket<uWS::SERVER> ws) {
                    ws.sendPrepared(prepared, nullptr);
                });
                uWS::WebSocket<uWS::SERVER>::finalizeMessage(prepared);
            }
            // Acknowledge each sender with an 'r' (result) message.
            for (Sender &s : senders) {
                s.data[0] = 'r';
                s.ws.send(s.data.data(), s.data.length(), uWS::OpCode::BINARY);
            }
            senders.clear();
            inBatch = false;
        }
    });

    // The first payload byte selects the action: 'b' queues a broadcast
    // for the next batch flush, 'e' echoes the message straight back.
    hub.onMessage([](uWS::WebSocket<uWS::SERVER> ws, char *message, size_t length, uWS::OpCode opCode) {
        switch (message[0]) {
        case 'b':
            senders.push_back({std::string(message, length), ws});
            newThisIteration = true;
            inBatch = true;
            break;
        case 'e':
            ws.send(message, length, opCode);
        }
    });

    hub.listen(3000);
    hub.run();
}
I landed the initial commit here: uNetworking/uWebSockets@e4b7584
Can you share the code for this?
I love the fact that you've put together a nice set of socket implementations in various languages (especially Elixir!).
I would very much like to see a more optimized version of the Node implementation, though. If it took advantage of inline caching and V8 Crankshaft's optimizer, I think it could do dramatically better.
Most.js does an amazing job at that: https://github.com/cujojs/most/tree/master/test/perf
Good write-up. I also wonder why the ws websocket library was used instead of uWS, when uWS is far better in performance. That's not fair to Node.js or the author, and I think the blog chart should be updated with uWS numbers instead.