Thanks for this. I'm a little surprised to see the msg/ms so low, especially compared to running a similar benchmark using Go and its channels. My understanding is Go channels are not lock-free, so I would at least expect results to be within an order of magnitude but they appear to be several orders apart.
chan_t (C)
chan_t: 1*1000000 send/recv time in ms: 3149 (317 nr_of_msg/msec)
chan_t: 2*1000000 send/recv time in ms: 14558 (137 nr_of_msg/msec)
chan_t: 4*1000000 send/recv time in ms: 33221 (120 nr_of_msg/msec)
chan_t: 8*1000000 send/recv time in ms: 70295 (113 nr_of_msg/msec)
chan_t: 16*1000000 send/recv time in ms: 141924 (112 nr_of_msg/msec)
chan (Go)
chan: 1*1000000 send/recv time in ms: 54.913379 (18210.498392 nr_of_msg/msec)
chan: 2*1000000 send/recv time in ms: 109.967315 (18187.222267 nr_of_msg/msec)
chan: 4*1000000 send/recv time in ms: 232.021836 (17239.756693 nr_of_msg/msec)
chan: 8*1000000 send/recv time in ms: 447.934368 (17859.759312 nr_of_msg/msec)
chan: 16*1000000 send/recv time in ms: 885.989277 (18058.909307 nr_of_msg/msec)
Obviously it's not a completely equivalent comparison (e.g. goroutines vs pthreads). The Golang team has spent a lot of effort optimizing channels, but I was expecting chan_t to hold up better.
from chan.
Sorry, but I do not know Go; I've only read about it. My understanding is that goroutines are high-speed user threads (a.k.a. fibers or coroutines). If Go channels synchronize by default, only one slot per client is needed, and if all goroutines run in a single thread, no locks are needed at all.
So it is possible (if my understanding is right) that running the benchmark comes down to a simple function call: a client's send (channel <- id) calls directly into a server stored in a waiting list (lock-free because of the single thread).
I've rewritten the benchmark to call a simple function instead of going through the queue, and the result is
500000 nr_of_msg/msec for a single thread. 2 threads transfer 1000000 nr_of_msg/msec,
and 8 threads transfer 2000000 (because of the quad core it does not scale beyond 4 threads).
And as you can see, the Go benchmark does not scale up with more threads, which suggests that
only one system thread executes all 16 goroutines.
Yeah, I think you're right. You can tell the Go scheduler to utilize more cores with:
runtime.GOMAXPROCS(runtime.NumCPU())
Doing that with a quad core system yields slightly higher latency, likely because it's no longer on a single thread.
chan: 1*1000000 send/recv time in ms: 88.191041 (11339.020253 nr_of_msg/msec)
chan: 2*1000000 send/recv time in ms: 204.120321 (9798.142538 nr_of_msg/msec)
chan: 4*1000000 send/recv time in ms: 561.135630 (7128.401381 nr_of_msg/msec)
chan: 8*1000000 send/recv time in ms: 1117.243417 (7160.480767 nr_of_msg/msec)
chan: 16*1000000 send/recv time in ms: 2235.774605 (7156.356443 nr_of_msg/msec)
I've written test code to see whether goroutines could be implemented the way we've speculated.
See https://github.com/je-so/testcode/blob/master/gochan.c
I've implemented only the single thread case and got more than 11000 msg/msec.
The implementation uses a gcc extension: taking the address of a goto label with &&LABEL and jumping to it with goto *addr.
Now the test code (gochan.c) supports system threads. It scales very well:
gochan: 1_30000 send/recv time in ms: 1 (30000 msg/msec)
gochan: 32_30000 send/recv time in ms: 46 (20869 msg/msec)
gochan: 64_30000 send/recv time in ms: 84 (22857 msg/msec)
gochan: 128_30000 send/recv time in ms: 153 (25098 msg/msec)
There is a much better test driver in the directory
https://github.com/je-so/testcode/tree/master/iperf
which scales linearly with the number of cores.
Try it with chan if you want.
With the variables padded to the size of one cache line, performance is much better!! See https://github.com/je-so/iqueue/blob/master/README.md for some numbers.
Impressive, the padding has a pretty remarkable impact.