Comments (3)
FWIW, I'd love to expose functionality along these lines, as well as instrumentation for tracking GC, etc. Crossbeam is still quite experimental, so I'm open to landing early-stage APIs (perhaps under a feature flag) to gain experience.
The current heuristics around when to do a GC and so on are pretty arbitrary, though I did some basic testing with benchmarks to find the smallest local threshold that didn't impose a performance penalty. I suspect there's a lot more room for experimentation there, as well.
I've never dug deep into GC/allocator customization myself, so maybe a good starting place is just to collect some of the useful ideas that have cropped up elsewhere, and see what makes sense to offer in crossbeam?
from crossbeam.
Now that I think about it more, I really don't like the API I put in that PR. Adding new features would probably require more, or at least more complicated, macros, which are kinda opaque and weird compared to normal code. I think something more along the lines of the scope API would be better and safer while avoiding macro shenanigans.
I think that the smallest thresholds possible without negatively affecting performance are good - being able to get simple concurrent memory management while avoiding GC-like latency spikes is a huge advantage. Also, if these thresholds are adjustable, then users can change them when the defaults don't suit their use case.
I've looked at some literature (JVM options, the RTSJ, the Azul JVM, library/software sources) and have some ideas for what could go in (not that all of it needs to) and what the upsides/downsides of such features are. Below, I use "application threads" to mean threads that are doing useful work for the program and operating on data structures with crossbeam, "real-time application threads" to mean application threads with strict latency requirements, and "GC threads" to mean threads that purely do work for the garbage collector.
In a given scope, one might be able to control:
- GC prevention: prevent (and re-enable) the GC. This can take many forms - one could simply skip all GC-related activity, or migrate garbage to global lists. There would need to be an option to force GC disabling (but not force enabling!) so that a poorly behaved third-party library couldn't re-enable it and cause big latency spikes.
- Dedicated GC threads: Crossbeam would create GC threads which take work directly from application threads and the global bag. This would lower overall throughput due to worse cache locality and hitting the multithreaded part of the allocator hard, but would allow real-time application threads to never have to run the GC. While needing more than one GC thread seems extreme, in this day and age of 40+ core x86 servers and 100+ core Arm/PowerPC servers, it's entirely plausible.
- GC limits: Limit the amount of time spent in the garbage collector, or the number of items collected per pass. Timing restrictions, as opposed to completely disabling the GC, would allow some of the work to be distributed into application threads as long as it didn't break the time/usage limits.
- Skipping the allocator: For a limited subset of cases, it might make sense to allow the GC to send freed memory chunks directly back to certain writer threads, skipping the allocator and getting better lock-freedom properties. There are ways of doing this with lock-free freelists/queues that are wait-free for consumers and involve few to zero atomics. There could also be GC threads dedicated to this.
- Multiple epochs/GCs: There could be separate epochs and global GCs that a data structure may register with (as opposed to the default one). This could be useful for separating real-time threads from the rest of the application, whose threads may block epoch advancement. If a freelist scheme is in place, this may be really useful for ensuring that a set of real-time threads gets enough GC worker time to keep the freelists populated.
Also, in a given scope, users could optionally collect stats (global and local) on:
- local GC frequency
- number of GC calls
- statistics on number of operations per call
- total time spent in GC
- portion of Crossbeam time (time between participant enter/exit calls) spent in GC
- latency statistics
And if we really want to get fancy, we could enable some forms of logging, although that seems maybe a bit out of the scope of this project.
This is more of a brain dump than a claim that Crossbeam needs these features, and some of them are fairly involved. Still, I have a lot of reason to believe they would be very, very attractive to people who want the benefits of a GC when writing high-performance multicore data structures, but don't want to shell out tremendous amounts of money for specialized JVMs and highly specialized hardware (plus the questionable throughput of most real-time JVMs), only to still have to bend over backwards to get the GC to cooperate.
I'm going to break this out into a few separate issues, since it's kinda hard to meaningfully discuss such a giant blob of different ideas.