ZenSepiol commented on June 15, 2024

BTW, I would be interested in the results of your comparison! If you have anything to share, go ahead.

from dear-imgui-app-framework.

Atzubi commented on June 15, 2024

Sure @ZenSepiol, though for now I can only provide some apples-to-oranges comparisons. My thread pool compromises on a few things to shave off overhead:

  • Tasks can only return void and may only take a single parameter: a pointer to a context.
  • Tasks may not start in the same order in which they were enqueued.
  • Load balancing is not implemented yet (work in progress). It will probably add a small overhead again for perfectly balanced workloads, while improving performance for unbalanced ones.
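To make the first trade-off concrete, here is a minimal sketch of what such a restricted task could look like, assuming a plain function pointer plus an opaque context pointer. The names `Task`, `increment`, `run_all` and `demo` are hypothetical and not taken from either pool. Dispatching such a task is a single indirect call, whereas `std::function<void()>` accepts any callable but pays for type erasure and, for captures too large for its small-buffer optimization, a heap allocation per task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical restricted task: returns void and takes exactly one
// argument, an opaque pointer to a caller-owned context.
struct Task {
    void (*fn)(void* ctx);
    void* ctx;
};

// Example task body: the context is an int counter.
void increment(void* ctx) {
    ++*static_cast<int*>(ctx);
}

// Restricted version: each task is one indirect function call.
void run_all(const std::vector<Task>& tasks) {
    for (const Task& t : tasks) {
        t.fn(t.ctx);
    }
}

// General version: std::function can wrap lambdas with captures, bound
// arguments, etc., at the cost of type erasure (and possibly allocation).
void run_all(const std::vector<std::function<void()>>& tasks) {
    for (const auto& f : tasks) {
        f();
    }
}

// Builds n counting tasks of each kind, runs them, returns both counts.
std::pair<int, int> demo(std::size_t n) {
    int restricted = 0;
    int general = 0;
    std::vector<Task> plain(n, Task{&increment, &restricted});
    std::vector<std::function<void()>> erased;
    erased.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        erased.emplace_back([&general] { ++general; });
    }
    run_all(plain);
    run_all(erased);
    return {restricted, general};
}
```

Anything a task needs (inputs, output slots) has to live in the context object, which is why return values are not supported directly.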

Here are the numbers for a few small benchmarks (always with a single writer, although both pools support multiple writers):

  • Raw function call overhead (both pools modified so that a single thread starts processing only once everything has been enqueued):

    • Empty Tasks: 10.000.000
      • My thread pool: 222ms
      • Your thread pool: 4715ms
  • Concurrency scaling:

    • Empty Tasks: 10.000.000
      • Sequential: 0ms
      • Thread count: 1
        • My thread pool: 283ms
        • Your thread pool: 3209ms
      • Thread count: 4
        • My thread pool: 419ms
        • Your thread pool: 5158ms
      • Thread count: 16
        • My thread pool: 336ms
        • Your thread pool: 17887ms
    • Small Tasks: 10.000.000
      • Sequential: 262ms
      • Thread count: 1
        • My thread pool: 329ms
        • Your thread pool: 3335ms
      • Thread count: 4
        • My thread pool: 469ms
        • Your thread pool: 5671ms
      • Thread count: 16
        • My thread pool: 324ms
        • Your thread pool: 18718ms
    • Medium Tasks: 10.000.000
      • Sequential: 47168ms
      • Thread count: 1
        • My thread pool: 47251ms
        • Your thread pool: 50123ms
      • Thread count: 4
        • My thread pool: 11911ms
        • Your thread pool: 13701ms
      • Thread count: 16
        • My thread pool: 3497ms
        • Your thread pool: 18850ms
    • Medium+ Tasks: 10.000.000 (Note: the workload of a single task was tripled compared to the medium tasks, yet your thread pool finished faster with 16 threads: 10890ms vs 18850ms)
      • Thread count: 16
        • My thread pool: 10284ms
        • Your thread pool: 10890ms
    • Large Tasks: 100.000
      • Sequential: 47409ms
      • Thread count: 1
        • My thread pool: 47393ms
        • Your thread pool: 47445ms
      • Thread count: 4
        • My thread pool: 11948ms
        • Your thread pool: 11943ms
      • Thread count: 16
        • My thread pool: 3253ms
        • Your thread pool: 3370ms

As you can see, the overhead of all the generality provided by the standard library is pretty high. As expected, the performance of both pools converges towards the same limit for increasingly large tasks, but your implementation really suffers with small tasks, where contention between threads is high. My implementation, on the other hand, scales somewhat with thread count even for empty tasks, because by design the contention between the writing thread and the reading threads decreases the more readers there are.
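One common way to get that property is to give every worker its own queue, so the writer spreads its traffic across N locks instead of all threads meeting at one. The sketch below is an assumption for illustration, not the implementation described here (which may well be lock-free); `ShardedPool` and its members are hypothetical. It uses the restricted `void(void*)` task shape, assumes a single writer, and, like the trade-offs listed above, does not preserve global FIFO order.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: one queue (own mutex + condvar) per worker.
// Each shard's lock is shared only by the writer and one reader, and the
// writer touches each shard for roughly 1/N of its submissions.
class ShardedPool {
public:
    using Fn = void (*)(void*);

    explicit ShardedPool(std::size_t workers) : shards_(workers) {
        assert(workers > 0);
        for (auto& s : shards_) s = std::make_unique<Shard>();
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this, i] { work(*shards_[i]); });
        }
    }

    // Signals every worker to stop; each drains its queue before exiting.
    ~ShardedPool() {
        for (auto& s : shards_) {
            { std::lock_guard lock(s->m); s->stop = true; }
            s->cv.notify_one();
        }
        for (auto& t : threads_) t.join();
    }

    // Round-robin onto the next shard. Not thread-safe: single writer only.
    void submit(Fn fn, void* ctx) {
        Shard& s = *shards_[next_++ % shards_.size()];
        { std::lock_guard lock(s.m); s.q.push_back({fn, ctx}); }
        s.cv.notify_one();
    }

private:
    struct Task { Fn fn; void* ctx; };
    struct Shard {
        std::mutex m;
        std::condition_variable cv;
        std::deque<Task> q;
        bool stop = false;
    };

    static void work(Shard& s) {
        for (;;) {
            std::unique_lock lock(s.m);
            s.cv.wait(lock, [&] { return s.stop || !s.q.empty(); });
            if (s.q.empty()) return;  // stop requested and queue drained
            Task t = s.q.front();
            s.q.pop_front();
            lock.unlock();            // run the task without holding the lock
            t.fn(t.ctx);
        }
    }

    std::vector<std::unique_ptr<Shard>> shards_;
    std::vector<std::thread> threads_;
    std::size_t next_ = 0;
};
```

Because each worker only looks at its own queue, a slow task delays everything queued behind it on that shard. That is the load-balancing gap mentioned above; work stealing between shards is the usual remedy.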

These numbers were generated on a system with a Ryzen 5800X3D running Windows 11 with minor background activity, using a C++23-enabled compiler at optimization level 2. (I have not found time to run Linux tests yet, but from earlier experiments you should find that your thread pool does comparatively better against mine on Linux than on Windows.)

If you want, I can follow up with more numbers once I find time to work on my pool again, set up some real-world workloads as benchmarks, and maybe compare it against some state-of-the-art implementations.

ZenSepiol commented on June 15, 2024

Thank you very much for sharing!

As you said, there is no thread pool that ticks all the boxes. Each problem can be optimized individually by adapting the pool.
I expected some of the overhead. Some of the 16-thread results are somewhat weird; I suspect some sort of caching issue is going on. But overall the results seem plausible.

Atzubi commented on June 15, 2024

I assume you mean the 16-thread results of your pool. Technically it is a caching issue, but you normally don't refer to it as such. Essentially, your pool has only a single synchronisation point: the condition variable/mutex (apparently this mutex does not internally track threads and spread out the synchronisation points). So every time any of your threads comes into contact with the condition variable/mutex, it has to modify it (sort of telling the other threads that it is there). In this process, thread A first has to fetch the condition variable/mutex into its core's cache, modify it, and commit the change. Afterwards, thread B on a different core has to do the same thing when it tries to acquire the lock. If A and B are rapidly trying to acquire the lock (as with the empty or small tasks), this keeps happening in alternation, so the condition variable/mutex bounces between the caches of the cores A and B run on. This is expensive, and as you can imagine, the more threads you have, the more it bounces.
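The same bouncing can be reproduced without any mutex. In this minimal sketch (function names are hypothetical; the 64-byte cache line is an assumption that holds for current x86-64 CPUs), `count_shared` has every thread increment one shared atomic, so its cache line migrates between cores on every increment, while `count_padded` gives each thread its own atomic on its own cache line. Both return the same total; timing them with `std::chrono` on a multi-core machine will typically show `count_shared` slowing down as threads are added while `count_padded` scales.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size. C++17's std::hardware_destructive_interference_size
// (in <new>) can be used instead where the compiler provides it.
constexpr std::size_t kCacheLine = 64;

// Contended: every increment from every thread writes the same cache line.
long count_shared(int threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// One counter per thread, each aligned to its own cache line so that
// neighbouring counters do not share a line either (no false sharing).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Uncontended: same atomic increments, but no line is shared between
// threads; the counters are only read after all threads have joined.
long count_padded(int threads, long per_thread) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (const auto& c : counters) total += c.value.load();
    return total;
}
```

Both versions perform identical atomic operations, so any timing difference between them comes from cache-line traffic rather than from the instructions themselves.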

#Rant :D
