ducklogic / zerogc
Zero-overhead tracing garbage collection for Rust (WIP)
License: MIT License
This should work cleanly for `CollectorId`s of different types. Mixing multiple collector ids of the same type is unlikely to be supported right now.
I already have a doctest demonstrating the expected behavior.
All that remains is to get the procedural derive to support it, then test mixing the 'simple' collector with the 'epsilon' collector.
The main difficulty is breaking the derive macro's current assumption that there is a single `'gc` lifetime. With multiple collector ids, there are likely to be multiple GC lifetimes.
These are both single-threaded collectors that use a `u64` as their ID. They have the same shadow-stack implementation.
I'm thinking about adding a new `simple` module to the new crate. This will share common code for both collectors. We'll move `GcAllocContext` there since it's really only applicable to those implementations.
Complex, external collectors (like DuckLogic) will have the option to use some other interface, just as they did before.
This is needed for collectors that use concurrent marking or forwarding pointers, like Java's ZGC and Shenandoah.
I used to think this was impossible, but now I think it is not.
I think we could insert them in the `Deref` implementation of `Gc`. We could guarantee the pointers are updated at least once after each safepoint. Is this good enough?
TODO: Consult The GC Handbook
Some allocators may require the stdlib, as an implementation detail. However, the core API should not in order to remain lightweight.
Some allocators may also require `liballoc` or `malloc` as a backing allocator.
Users should be able to explicitly ignore lifetimes (that are not `'gc`) when procedurally deriving. This should be verified.
The attribute should go on the lifetime itself. We should also probably change `#[zerogc(ignore_params(A, B, ...))]` to match this behavior.
For example:

```rust
struct Foo<'gc, #[zerogc(ignore)] 'a> {
    foo: Gc<'gc, Foo<'gc, 'a>>,
    s: String
}
```

Right now the `'a` lifetime causes an error.
ZeroGC should support dynamically tracking roots without tying them to a specific `GcContext`. The collector would treat the `ObjectHandle` as a root. See Mono's documentation on their (lock-free) GC handles.
In order to access the object, the collector must be paused at a safepoint (to support moving collectors). We could implement this either by temporarily associating the handle with an existing context, or by tying usage to some sort of temporary closure.
We need this to support PyObject -> DuckLogic references. Allocation and access need to be extremely efficient in order for our CAPI support to be even reasonably performant.
It'd also be nice if we could use (and maybe allocate) these in the JIT threads. Currently, working with constant objects is kind of awkward.
Right now the simple collector is "thread-safe" in that everything is `!Send`. It's not possible to send garbage collected pointers across threads, so no synchronization is ever needed.
As it stands, the API is pretty agnostic to whether or not the underlying implementation is thread-safe. It expects thread safety to be marked by the appropriate `Send` and `Sync` implementations.
The main issue with allowing multiple threads to use the collector is that a collection would have to pause all threads. Once we decide it's time to collect, we'd have to block until all other collectors reach a `safepoint!`. This is done cooperatively (like all of zerogc's safepoints), meaning that a single thread could hold up collection by waiting too long to invoke a safepoint.
Other VMs work the same way internally; they just don't have an issue with uncooperative threads, since the VM automatically inserts safepoints. The JVM's JIT inserts a safepoint on every single loop iteration (implemented as a test against a guard page).
This is almost analogous to the tradeoff between cooperative and preemptive multitasking. ZeroGC is designed as a low-cost abstraction. We already rely on the user to insert regular safepoints in the single-threaded case, so it's reasonable to expect them to be responsible in the multithreaded case as well.
Currently, ZeroGC safepoints are expected to return very quickly if there's not a collection (usually just checking memory usage).
The ZeroGC API should be updated to allow a user to "freeze" a `GcContext` at a safepoint. This would prevent future allocations (until unfrozen), ensuring that the context's set of roots is known. This is essentially a long-term safepoint that would allow other threads to collect while the first thread is still working.
The simple collector will probably need a little bit of internal synchronization for this (we'll add a feature-flag to control threading support). We'll need to support thread-local caches for the small-object-arenas. A good experiment would be attempting to reimplement the binary-trees example using multiple threads.
This is basically a requirement for DuckLogic. Although Python is essentially single-threaded (due to the GIL), it's still technically possible for python code to be invoked from different threads.
Other systems (like DuckLogic) may want to use a custom type for their garbage-collected pointers. We should support this.
This would also allow us to remove `CollectorId`, since that would be an implementation detail of the pointer. We could move the old "simple" `Gc` pointers to the simple interface described in #4.
This is actually not uncommon among garbage collected programs (Eclipse),
but our collector should strive to do better.
Potentially we could allocate a bunch of objects into an arena, requiring `bumpalo` to create a new chunk of memory.
Hypothetically a program could stop using the old objects, then never allocate any again (or at least not as many).
Right now, the sweep phase will simply add free objects to the free lists and never attempt to return memory to the OS. Even if a chunk of memory is completely empty, we'll still never return it.
In a way, this is related to our dependency on `bumpalo`, since it doesn't provide an interface to free a single chunk at a time.
Fragmentation of the arenas could also leave a ton of unused memory, resulting in a similar situation. That situation would require a compaction phase, which is out of scope for the simple allocator (just disable the arenas if this is a problem).
This is related to #15 because `[T]` is unsized. Ideally, we would be able to have `Gc<'gc, [i32]>` without any indirection through a `Box<[T]>`.
The big problem here is that the simple collector expects `Gc` to have a statically-known size.
This is very important for supporting any real-world use. In theory, arrays could just wrap `Vec` right now, but that's not very elegant.
Implement a heavy-duty generational garbage collector. This should significantly improve performance for applications where allocations are frequent. It is intended primarily for use in language VMs and JITs (#12).
I have a simple runtime, Waffle, with a baseline JIT. Currently I use a simple mark & sweep collector and reference counting for roots, which causes runtime overhead. My question: is it possible to use zerogc with a JIT? Will safepoints work in JITed code? I do not use LLVM or Cranelift; instead I have a macro assembler, and if safepoints need some kind of additional code to work, I believe I can emit that code just fine.
The current `Trace` trait is not object-safe, so it can't be used with trait objects:
```rust
trait Foo: Trace {
    fn foo(&self);
}

#[derive(Trace)]
struct AnyFoo(dyn Foo);
```
Could it be made object-safe by turning the associated constants into functions and moving the `V: GcVisitor` type parameter from `Trace::visit<V>` up to `Trace<V>::visit`?
The simple collector is thread safe, so handles should be too. GcHandle is effectively a shared reference, so T must be Sync in order for GcHandle to be Send (just like the requirements in Arc and Gc).
This is already true in practice, we just haven't added the unsafe impl declarations.
All writes `value -> dest` need to inform the GC. By default, the GC returns immutable references, and you need `GcCell` to mutate fields.
Write barriers are essential for generational and incremental garbage collectors. Although it's possible to use hardware barriers (by marking memory read-only), most modern GC implementations opt for software barriers in practice.
Unfortunately, `GcCell` doesn't have a reference to the owning object, so the current implementation can only inform the collector about `value` (and not `dest`).
I plan to fix this by adding an unsafe `write_barrier` method that takes both `value` and `dest` as arguments and then triggers a write barrier. We can automatically generate safe wrappers for these, alongside our new procedural derive in #2.
The readme alone has several typos.
I think it might be nice to write a small booklet using MdBook in order to describe the usage and design of the API.
I have some serious API cleanup I want to do, so this documentation will probably come after the 0.2.0 release.
I want to use zerogc to implement the main GC algorithm in starlight, but I have a question: is it safe to panic and then catch the panic while the zerogc shadow stack is in use? As I understand it, when a panic happens zerogc makes no attempt to remove shadow stack entries, which leads to UB.
I'm trying to use this library, but there are a lot of things I don't understand.
for version 0.2.0-alpha.1
I want to implement the following types, are there any design problems?
```rust
type GcCollector = zerogc::dummy_impl::DummyCollectorId;
type GcValue<'gc> = Gc<'gc, Value<'gc>, GcCollector>;
type GcString<'gc> = Gc<'gc, String, GcCollector>;

#[derive(Debug, Clone, Trace)]
pub enum Value<'gc> {
    Null,
    Boolean(bool),
    String(GcString<'gc>),
    Array(Vec<GcValue<'gc>>),
    Object(HashMap<GcString<'gc>, GcValue<'gc>>),
}
```
How to deal with missing `TraceImmutable`?

```
error[E0277]: the trait bound `Gc<'gc, String, DummyCollectorId>: TraceImmutable` is not satisfied
  --> projects\nyar-interpreter\src\value\mod.rs:13:24
   |
13 | #[derive(Debug, Clone, Trace)]
   |                        ^^^^^ the trait `TraceImmutable` is not implemented for `Gc<'gc, String, DummyCollectorId>`
   |
   = note: required because of the requirements on the impl of `Trace` for `HashMap<Gc<'gc, String, DummyCollectorId>, Gc<'gc, value::Value<'gc>, DummyCollectorId>>`
   = note: required by `NEEDS_TRACE`
   = note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
```
How to deal with missing `GcRebrand`?

```rust
#[derive(Debug, Clone, Trace)]
pub enum Value<'gc> {
    Null,
    Boolean(bool),
    String(GcString<'gc>),
    Array(Vec<GcValue<'gc>>),
    Object(HashMap<String, GcValue<'gc>>),
}
```

```
error[E0277]: the trait bound `Gc<'gc, String, DummyCollectorId>: GcRebrand<'new_gc, Id>` is not satisfied
  --> projects\nyar-interpreter\src\value\mod.rs:17:12
   |
17 |     String(GcString<'gc>),
   |            ^^^^^^^^^^^^^ the trait `GcRebrand<'new_gc, Id>` is not implemented for `Gc<'gc, String, DummyCollectorId>`
   |
   = note: required by `assert_rebrand`
help: consider introducing a `where` bound, but there might be an alternative better way to express this requirement
   |
14 | pub enum Value<'gc> where Gc<'gc, String, DummyCollectorId>: GcRebrand<'new_gc, Id> {
   |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
How to deal with missing types such as `VecDeque`? Which trait bounds need to be implemented manually?

```rust
#[derive(Debug, Clone, Trace)]
pub enum Value<'gc> {
    Null,
    Boolean(bool),
    String(GcString<'gc>),
    Array(VecDeque<GcValue<'gc>>),
    Object(HashMap<String, GcValue<'gc>>),
}
```

```
error[E0277]: the trait bound `VecDeque<Gc<'gc, value::Value<'gc>, DummyCollectorId>>: Trace` is not satisfied
  --> projects\nyar-interpreter\src\value\mod.rs:13:24
   |
13 | #[derive(Debug, Clone, Trace)]
   |                        ^^^^^ the trait `Trace` is not implemented for `VecDeque<Gc<'gc, value::Value<'gc>, DummyCollectorId>>`
   |
   = note: required by `NEEDS_TRACE`
   = note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
```
This would be super useful for testing.
See the details of Java's Epsilon GC for how it might be used in production.
The nice thing about using this with Rust's ownership model is that collectors could safely be disposed of.
Once all the references are gone, you can clear the arena and reuse it for other tasks.
Essentially this would make it an arena allocator that conforms to the `zerogc` interface. That would be super cool!
Let's say one day you decide the overhead of GC isn't worth it. If your tasks are short running enough (1-2 ms) and only allocate a couple of MB of memory, it'd be silly to collect them. Simply throw away the collector and reuse the memory for the next task.
I have a temporary implementation (for unit tests) in 772ffbd
However, when we finally get the real one, I think this should be a separate crate even though it would complicate testing.
Right now allocation requires atomic operations. We should use a thread-local buffer so this isn't required in the common case.
This would be somewhat difficult to do since we can have multiple running instances. Would we have to use `thread_local`? How does the performance overhead of that compare to using atomics?
Maybe we could make `SmallObjectCache` a static variable shared between instances. However, if we do that, we'd have to differentiate between allocations from different collectors (in `for_each` and the linked list). This could also result in worse performance if all the different collectors end up messing with each other's stuff.
This was much easier when it was just a comment in a config file.
This would remove the sweep phase of the collection and implicitly mark the arenas as dead instead of adding them to the free list. Instead of searching the free list, allocations would search for a free slot in the dead memory. This would amortize the sweeping cost across future allocations.
As of 9a9634d, the binary_trees benchmark spent 30% of total time compacting (I accidentally said tracing). This would actually make the mark/sweep collector noticeably faster than the old copying collector.
This actually reduces the theoretical complexity of the algorithm. Normally, collection time is proportional to the total number of objects (live + dead), while now it's only proportional to the live objects.
This is discussed in Section 2.5 of The Garbage Collection Handbook (Jones, 2012). In addition to the obvious reduction in collection time, it improves the locality of sweeping. They state that it offers good throughput, especially when there are many dead objects (the same workloads copying collectors are best at).
This threatens to increase complexity, so I'm not quite sure about whether it's a great idea... It should probably be optional, although I'd put it on by default (eager sweeping could return memory to the operating system: #6).
If issues ever arise with the simple collector's pause times, this is definitely the most obvious optimization opportunity...
However, this is currently not a very high priority for me.
The pointer to a frozen collector's stack is only guaranteed to be valid for the lifetime of the FrozenContext. We store it in a hashtable.
Ideally, the user would always `unfreeze!` through the standard API, which causes it to be removed.
However, if the user forgets to unfreeze the frozen context (or runs `mem::forget`), the old pointer will remain in the set of `PersistentRoots` even if the underlying memory is freed (or moved).
Right now I actually even have this reflected in the API. If a `FrozenContext` is ever dropped without unfreezing first, it will panic with a "TODO message".
The code is undergoing temporary insanity.
See also c9d70d8 for the unsafe implementation of freezing simple collectors that we currently use.
When we finally fix this issue, we need to remember to remove all the TODOs in the code (and the FrozenContext destructor that panics).
I suspect this is due to the fact we reallocate the entire heap every collection, then copy everything into the new memory.
Normally implementations of Cheney's copying algorithm divide the heap into two equal halves. While the mutator is running only one of these heaps is active. During collection, the live objects are copied from the old heap to the new heap (implicitly compacting them). The old heap is then considered discarded (but not freed) and the mutator uses the new heap.
Both implementations need 2x the used memory during any given collection cycle (which is a major disadvantage of this algorithm). However, the standard implementation effectively caches the old heap, while we reallocate it every time. In theory, this could result in less memory use during mutation (because we free the old memory instead of caching it); however, allocating large chunks of memory generally requires an `mmap` syscall to the operating system. Accessing the new memory then seems to trigger (minor) page faults because it's not in the TLB or any of the caches.
We're wasting most of our time page-faulting, losing any performance gains we would otherwise win from bump-pointer allocation. We should switch to caching the old heap, like the standard implementation does.
Of course, right now the copying collector still has some SEGFAULTs, which are obviously higher-priority.
This is tedious and unsafe to perform by hand. This is essential if I'm going to be using this in production.
This basically boils down to iterator invalidation. If we have two references to a vector, a `GcVec::push` on one may mutate the vector's length while the user has coerced the other reference into a slice (or iterator).
To put it another way, it's impossible for a garbage-collected vector/list to implement all of the following properties:
1. Direct access as a slice (or raw pointer)
2. Multiple references to the same vector (i.e. being `Copy`)
3. Memory in `[len, capacity)` left uninitialized (as opposed to initialized with `null`)

This is a problem common to all garbage-collected languages, not just zerogc (as I describe below).
Most languages sidestep the problem by forbidding option 1 and refusing to give direct pointer access to garbage-collected arrays.
Usually, forbidding option 1 is not a problem, since arrays in Java and lists in Python are built into the language and replace most uses of pointers.
There are some exceptions though. Java's JNI allows raw access to an array's memory with `GetPrimitiveArrayCritical`. However, there is no need to worry about mutating length, because Java arrays have a fixed size (although it does temporarily prevent garbage collection). Python's native CAPI exposes access to a list's raw memory with `PyListObject->ob_items`, although native code must be careful of any user code changing the list's length.
This problem can also come up in unexpected places. For example, the C implementation of Python's `list.sort` coerces the list into a raw pointer, then sorts the memory directly through the pointers. The original code continued to use the original pointer even if a user comparison mutated the list, leading to bugs. The current implementation temporarily sets the length to zero and carefully guards against any mutations.
If we took the Java/Python approach of refusing to give out raw pointers, it would also prevent access as a slice.
In idiomatic rust, this is unrealistic, since slices are so pervasive.
As a temporary solution, I ended up forbidding option 2.
However, this doesn't work well if you're trying to use `GcVec` as the underlying implementation of Java arrays or Python lists. These languages allow users to create multiple references to an array or a list. In our current system, this would require double-indirection: `Gc<GcVec<T>>`.
SIDENOTE: Right now GcVec
Instead, I propose a new system of four types and two traits (which will incidentally solve the write-barrier problem):

- `GcRawVec` - Allows multiple references, but has unsafe slice or pointer access.
- `GcVec` - Effectively a `RefCell<GcRawVec<T>>`. Includes safe slice access.
- `GcVecBuffer` - A `GcRawVec` that is `!Copy`, but gains safe slice access.
- `trait GcVecAccess` - Abstracts over different forms of `GcVec`. Calls to `push` require the user to pass a `GcContext`.
- `trait GcArrayAccess` - Allows read/write access to arrays, without mutating their length. Implemented by `GcArray` in addition to `GcVec`.

TODO: Better naming for `GcVec` and friends?