BTW, I would be interested in the results of your comparison! If you have anything to share, go ahead.
from dear-imgui-app-framework.
Sure @ZenSepiol, though for now I can only provide some apples-to-oranges comparisons. My thread pool compromises on a few things to shave off overhead:
- Tasks can only return void and take a single parameter: a pointer to a context.
- Tasks may not start in the same order they were enqueued.
- Load balancing is not implemented yet (WIP). It will probably add a small overhead back for perfectly balanced workloads, while improving performance for unbalanced ones.
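To make the first trade-off concrete, here is a minimal sketch of a void-returning, single-context-pointer task next to the `std::function` type that more general pools typically store. It is not taken from either pool; `RawTask`, `GenericTask`, `SumContext` and `sum_task` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical minimal task: returns nothing and takes one opaque context
// pointer. Storing it costs two plain pointers; running it is one indirect call.
struct RawTask {
    void (*fn)(void*);
    void* ctx;
    void run() const {
        assert(fn != nullptr);
        fn(ctx);
    }
};

// Generic alternative: accepts any callable, but goes through type erasure
// and may heap-allocate when the captured state is large.
using GenericTask = std::function<void()>;

// With RawTask, inputs and outputs travel through a caller-owned context.
struct SumContext {
    const std::vector<int>* data;  // input
    long long result = 0;          // output, written by the task
};

void sum_task(void* p) {
    auto* ctx = static_cast<SumContext*>(p);
    long long sum = 0;
    for (int v : *ctx->data) sum += v;
    ctx->result = sum;
}
```

With `RawTask`, the caller has to keep the context alive until the task has run and read results back out of it by hand; that bookkeeping is part of the generality `std::function` and friends take care of, at some per-task cost.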
Here are the numbers for a few small benchmarks (always with a single writer, although both pools support multiple writers):
Raw function call overhead (both pools modified to start processing with a single thread only once everything was enqueued):
- Empty tasks: 10,000,000
  - My thread pool: 222ms
  - Your thread pool: 4715ms
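The "enqueue everything first, then drain on a single thread" setup can be sketched as below. This isolates only one ingredient, the task representation (function pointer plus context vs `std::function`), and leaves out the queue and locking costs that are also part of the pool numbers above. `drain_raw`, `drain_generic`, `count_task` and `DrainResult` are hypothetical helpers; the tasks bump a counter rather than being completely empty so that each call has an observable effect.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical task form: function pointer plus context pointer.
struct RawTask {
    void (*fn)(void*);
    void* ctx;
};

struct DrainResult {
    std::size_t executed;              // how many tasks actually ran
    std::chrono::nanoseconds elapsed;  // time spent draining only
};

// Near-empty task body: bump a counter so each call has an observable effect.
void count_task(void* ctx) { ++*static_cast<std::size_t*>(ctx); }

// Enqueue all n tasks first, then drain on one thread and time only the drain.
DrainResult drain_raw(std::size_t n) {
    std::size_t counter = 0;
    std::vector<RawTask> queue(n, RawTask{count_task, &counter});
    auto start = std::chrono::steady_clock::now();
    for (const RawTask& t : queue) t.fn(t.ctx);
    auto stop = std::chrono::steady_clock::now();
    assert(counter == n);
    return {counter, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Same measurement with type-erased std::function tasks.
DrainResult drain_generic(std::size_t n) {
    std::size_t counter = 0;
    std::vector<std::function<void()>> queue;
    queue.reserve(n);
    for (std::size_t i = 0; i < n; ++i) queue.emplace_back([&counter] { ++counter; });
    auto start = std::chrono::steady_clock::now();
    for (const auto& task : queue) task();
    auto stop = std::chrono::steady_clock::now();
    assert(counter == n);
    return {counter, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Calling both with the same `n` and comparing `elapsed` gives a rough per-call cost of each representation on a given machine and compiler.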
Concurrency scaling:
- Empty tasks: 10,000,000
  - Sequential: 0ms
  - Thread count: 1
    - My thread pool: 283ms
    - Your thread pool: 3209ms
  - Thread count: 4
    - My thread pool: 419ms
    - Your thread pool: 5158ms
  - Thread count: 16
    - My thread pool: 336ms
    - Your thread pool: 17887ms
- Small tasks: 10,000,000
  - Sequential: 262ms
  - Thread count: 1
    - My thread pool: 329ms
    - Your thread pool: 3335ms
  - Thread count: 4
    - My thread pool: 469ms
    - Your thread pool: 5671ms
  - Thread count: 16
    - My thread pool: 324ms
    - Your thread pool: 18718ms
- Medium tasks: 10,000,000
  - Sequential: 47168ms
  - Thread count: 1
    - My thread pool: 47251ms
    - Your thread pool: 50123ms
  - Thread count: 4
    - My thread pool: 11911ms
    - Your thread pool: 13701ms
  - Thread count: 16
    - My thread pool: 3497ms
    - Your thread pool: 18850ms
- Medium+ tasks: 10,000,000 (note: the workload of a single task was tripled compared to the medium tasks, yet the performance of your thread pool improved)
  - Thread count: 16
    - My thread pool: 10284ms
    - Your thread pool: 10890ms
- Large tasks: 100,000
  - Sequential: 47409ms
  - Thread count: 1
    - My thread pool: 47393ms
    - Your thread pool: 47445ms
  - Thread count: 4
    - My thread pool: 11948ms
    - Your thread pool: 11943ms
  - Thread count: 16
    - My thread pool: 3253ms
    - Your thread pool: 3370ms
As you can see, the overhead of all the generality provided by the standard library is pretty high. As expected, both pools converge towards the same limit as tasks get bigger, but for small tasks in particular your implementation really suffers when there is high contention between threads. My implementation, on the other hand, scales somewhat with thread count even for empty tasks, because by design the contention between the writing thread and the reading threads decreases the more readers there are.
These numbers were generated on a system with a Ryzen 5800X3D running Windows 11 with minor background activity, using a C++23-enabled compiler at optimization level 2. (I haven't found time to run Linux tests yet, but from earlier experiments you should find that on Linux your thread pool does comparatively better against mine than on Windows.)
If you want, I can follow up with more numbers once I find time to work on my pool a little more, set up some real-world workloads as benchmarks, and maybe compare it against some state-of-the-art implementations.
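The comment attributes the first pool's scaling to reduced contention between the writing thread and the reading threads, but doesn't say how that is achieved. One way to get that property is to give every worker its own queue, so a reader only ever contends with the writer on its own queue and never with the other readers. The sketch below is an assumption for illustration, not the author's design; `ShardedPool`, `WorkerQueue`, the round-robin policy and the 64-byte cache-line size are hypothetical choices (C++17 is assumed for the over-aligned `std::vector` element). It also shows why tasks may start out of enqueue order: consecutive tasks land in different queues.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Task form from the trade-offs above: void return, one context pointer.
struct RawTask {
    void (*fn)(void*);
    void* ctx;
};

// One queue per worker. alignas(64) starts each queue on its own cache line
// so neighbouring queues do not share one (64 bytes is an assumed line size).
struct alignas(64) WorkerQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<RawTask> tasks;
    bool stop = false;
};

class ShardedPool {
public:
    explicit ShardedPool(std::size_t n) : queues_(n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this, i] { worker_loop(queues_[i]); });
    }

    ~ShardedPool() {
        for (WorkerQueue& q : queues_) {
            { std::lock_guard<std::mutex> lk(q.m); q.stop = true; }
            q.cv.notify_one();
        }
        for (std::thread& t : workers_) t.join();
    }

    // Single writer only: round-robin over the queues. Each queue's lock is
    // shared by the writer and one worker, and the writer touches it only
    // once every n submissions.
    void submit(RawTask task) {
        WorkerQueue& q = queues_[next_];
        next_ = (next_ + 1) % queues_.size();
        { std::lock_guard<std::mutex> lk(q.m); q.tasks.push_back(task); }
        q.cv.notify_one();
    }

private:
    static void worker_loop(WorkerQueue& q) {
        std::unique_lock<std::mutex> lk(q.m);
        for (;;) {
            q.cv.wait(lk, [&q] { return q.stop || !q.tasks.empty(); });
            if (q.tasks.empty()) return;  // stop requested and nothing left
            RawTask task = q.tasks.front();
            q.tasks.pop_front();
            lk.unlock();                  // run the task without holding the lock
            task.fn(task.ctx);
            lk.lock();
        }
    }

    std::vector<WorkerQueue> queues_;  // constructed before any worker starts
    std::vector<std::thread> workers_;
    std::size_t next_ = 0;
};
```

Plain round-robin gives no load balancing: a slow task delays everything queued behind it on the same worker, which is the kind of gap the WIP load balancing mentioned above would address.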
Thank you very much for sharing!
As you said, there is no thread pool that ticks all the boxes; each problem can be optimized individually by adapting the pool.
I expected some of the overhead. Some of the 16-thread results are somewhat weird, and I suspect some sort of caching issue is going on. But overall, the results seem plausible.
I assume you mean the 16-thread results of your pool. Technically it is a caching issue, but you wouldn't normally call it that. Essentially, you only have a single synchronisation point: your condition variable/mutex (apparently this mutex does not internally track threads and spread out the synchronisation points). So every time any of your threads comes into contact with the condition variable/mutex, it has to modify it (sort of telling the other threads that it is there). In this process, thread A first has to fetch the condition variable/mutex, change it and then commit it to its cache again. Afterwards, another thread B on a different core has to do the same thing when it tries to acquire the lock. Now if A and B are rapidly trying to acquire the lock (as with the empty or small tasks), this keeps happening in alternation. As a result, the condition variable bounces back and forth between the caches of the cores A and B run on. This is expensive. And as you can imagine, the more threads you have, the more the variable bounces around.
#Rant :D
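For comparison, here is a minimal sketch of the generic single-mutex, single-condition-variable pool pattern described above. It is a textbook pattern for illustration, not the actual code of ZenSepiol's pool, and `SharedQueuePool` is a hypothetical name. The writer's `submit` and every worker's `worker_loop` lock the same `m_`, so with many threads and tiny tasks the cache line holding it keeps moving between cores.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Textbook pool: one queue, one mutex and one condition variable shared by
// the writer and all workers, i.e. a single synchronisation point.
class SharedQueuePool {
public:
    explicit SharedQueuePool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~SharedQueuePool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    void submit(std::function<void()> task) {
        // The writer modifies m_ on every submission.
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                // Every worker modifies the same m_ here, so with tiny tasks
                // its cache line bounces between the cores of all threads.
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Larger tasks lower how often each thread hits `m_`, which is one plausible reading of why the Medium+ numbers improved for this kind of pool despite the heavier workload.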