Comments (3)
FWIW, I'd love to expose functionality along these lines, as well as instrumentation for tracking GC, etc. Crossbeam is still quite experimental, so I'm open to landing early-stage APIs (perhaps under a feature flag) to gain experience.
The current heuristics around when to do a GC and so on are pretty arbitrary, though I did some basic testing with benchmarks to find the smallest local threshold that didn't impose a performance penalty. I suspect there's a lot more room for experimentation there, as well.
I've never dug deep into GC/allocator customization myself, so maybe a good starting place is just to collect some of the useful ideas that have cropped up elsewhere, and see what makes sense to offer in crossbeam?
from crossbeam.
Now that I think about it more, I really don't like the API I put in that PR. Adding new features would probably require more, or at least more complicated, macros, which are kinda opaque and weird compared to normal code. I think something more along the lines of the scope API would be better and safer while avoiding macro shenanigans.
I think that the smallest thresholds possible without negatively affecting performance are good - being able to get simple concurrent memory management while avoiding GC-like latency spikes is a huge advantage. Also, if these thresholds are adjustable, then users can change them when the defaults don't suit their use case.
I've looked at some literature (JVM options, the RTSJ, the Azul JVM, library/software sources) and have some ideas for what could go in (not that all of it needs to) and what the upsides/downsides of such features are. Below, I use "application threads" to mean threads that are doing useful work for the program and operating on data structures with crossbeam, "real-time application threads" to mean application threads with strict latency requirements, and "GC threads" to mean threads that purely do work for the garbage collector.
In a given scope, one might be able to control:
- GC prevention: prevent (and re-enable) the GC. This can take many forms - one could simply skip all GC-related activity, or migrate garbage to global lists. There would need to be an option to force GC disabling (but not force enabling!) so that a poorly behaved third-party library couldn't re-enable it and cause big latency spikes.
- Dedicated GC threads: Crossbeam would create GC threads which take work directly from application threads and the global bag. This would lower overall throughput due to worse cache locality and hitting the multithreaded part of the allocator hard, but would allow real-time application threads to never have to run the GC. While needing more than one GC thread seems extreme, in this day and age of 40+ core x86 servers and 100+ core Arm/PowerPC servers, it's entirely plausible.
- GC limits: Limit the amount of time spent in the garbage collector, or the number of items collected per pass. Timing restrictions, as opposed to completely disabling the GC, would allow some of the work to be distributed into application threads as long as it didn't break the time/usage limits.
- Skipping the allocator: For a limited subset of cases, it might make sense to allow the GC to send freed memory chunks directly back to certain writer threads, skipping the allocator and getting better lock-freedom properties. There are ways of doing this with lock-free freelists/queues that are wait-free for consumers and involve few to zero atomics. There could also be GC threads dedicated to this.
- Multiple epochs/GCs: There could be separate epochs and global GCs that a data structure may register with (as opposed to the default one). This could be useful for separating real-time threads from the rest of the application, whose threads may block epoch advancement. If a freelist scheme is in place, this may be really useful for ensuring that a set of real-time threads gets enough GC worker time to keep the freelists populated.
Also, in a given scope, users could optionally collect stats (global and local) on:
- local GC frequency
- number of GC calls
- statistics on number of operations per call
- total time spent in GC
- portion of Crossbeam time (time between participant enter/exit calls) spent in GC
- latency statistics
And if we really want to get fancy, we could enable some forms of logging, although that seems maybe a bit out of the scope of this project.
This is more of a brain dump than a claim that Crossbeam needs these features, and some of them are fairly involved. Still, I have a lot of reason to believe they would be very, very attractive to people who want the benefits of a GC when writing high-performance multicore data structures, but don't want to shell out tremendous amounts of money for specialized JVMs and highly specialized hardware (plus the questionable throughput of most real-time JVMs), only to still have to bend over backwards to get the GC to cooperate.
I'm going to break this out into a few separate issues, since it's kinda hard to meaningfully discuss such a giant blob of different ideas.