I'm leaving this here more as a note to myself and also to maybe get some early feedba

Try to avoid copying when using strings

flo-l commented on June 1, 2024

Try to avoid copying when using strings

from helix.

Comments (7)

chancancode commented on June 1, 2024

I think one thing to keep in mind is safety. Ruby doesn't have a way to track ownership natively. If we took a Ruby string without copying it and kept a borrow to it while calling back into Ruby code, its contents could be mutated, causing safety issues on the Rust side. One possible solution is to freeze the string while there is an outstanding (read-only) borrow to it, and then "unfreeze" it when we are done.

However, Ruby does not actually expose the functionality to "unfreeze" a frozen object. Of course, it's implemented as a marker bit under the hood, and since we are operating at the C level, we can just "unflip" the bit. But I'm not sure we would feel comfortable messing with the internals like that, given that Ruby might perform other optimizations, and who knows what other semantics there might be in the future.
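The freeze-while-borrowed idea can be modelled without touching the Ruby internals. The following is only a sketch against a stand-in type: `MockRubyString`, `FrozenBorrow`, `freeze_for_borrow`, and `try_append` are hypothetical names, and the frozen flag here is an ordinary Rust `Cell`, not Ruby's real marker bit.

```rust
use std::cell::{Cell, RefCell};

/// Stand-in for a Ruby String. A real binding would wrap sys::VALUE instead.
pub struct MockRubyString {
    contents: RefCell<String>,
    frozen: Cell<bool>,
}

impl MockRubyString {
    pub fn new(s: &str) -> Self {
        MockRubyString {
            contents: RefCell::new(s.to_string()),
            frozen: Cell::new(false),
        }
    }

    pub fn is_frozen(&self) -> bool {
        self.frozen.get()
    }

    /// Models a mutation coming from Ruby code, which fails on a frozen string.
    pub fn try_append(&self, extra: &str) -> Result<(), &'static str> {
        if self.frozen.get() {
            return Err("FrozenError");
        }
        self.contents.borrow_mut().push_str(extra);
        Ok(())
    }

    /// Freezes the string for as long as the returned guard is alive.
    pub fn freeze_for_borrow(&self) -> FrozenBorrow<'_> {
        let was_frozen = self.frozen.replace(true);
        FrozenBorrow { string: self, was_frozen }
    }
}

/// Guard representing an outstanding read-only borrow.
pub struct FrozenBorrow<'a> {
    string: &'a MockRubyString,
    was_frozen: bool,
}

impl<'a> FrozenBorrow<'a> {
    /// Read-only, zero-copy access while the string is frozen.
    pub fn with_str<R>(&self, f: impl FnOnce(&str) -> R) -> R {
        f(self.string.contents.borrow().as_str())
    }
}

impl<'a> Drop for FrozenBorrow<'a> {
    fn drop(&mut self) {
        // Restore the previous state, so a string that was already frozen stays frozen.
        self.string.frozen.set(self.was_frozen);
    }
}
```

The `Drop` impl is the part that has no public equivalent in Ruby, which is exactly the concern raised above.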

Another solution is to perhaps implement that as a global "call back into the VM" lock.

flo-l commented on June 1, 2024

That is indeed a good point I have not considered.

I don't know how the "call back into the VM" lock is best implemented. One option could be to wrap the internal data of Ruby objects (like the data pointer of a String) in a custom type whose lifetime is tied to the VALUE it belongs to, which in turn has a lifetime tied to some "VM token" that needs to be moved into each call into the Ruby VM. Constructing these VM tokens should be unsafe, so that users can't cheat the system by creating a new one for each call. Each call into Ruby would take a VM token and return a VM token, and the returned token can be used to make the next call into Ruby. That would ensure that programmers can't use data obtained from a previous Ruby VM call. If we make the VM token a zero-sized type, this wouldn't add any runtime overhead, just code verbosity.
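A minimal sketch of the token idea, using a stand-in string type rather than the real Ruby C API. `VmToken`, `RubyString`, `borrow_str`, `call_ruby`, and `demo` are hypothetical names for illustration, not Helix APIs.

```rust
/// Zero-sized proof that no call into the Ruby VM is in progress.
/// Hypothetical type, not part of Helix.
pub struct VmToken {
    _private: (),
}

impl VmToken {
    /// Unsafe so that callers cannot mint a fresh token for every call.
    /// Safety: at most one token may exist at a time.
    pub unsafe fn new() -> VmToken {
        VmToken { _private: () }
    }
}

/// Stand-in for a Ruby String; a real binding would wrap sys::VALUE.
pub struct RubyString {
    contents: String,
}

impl RubyString {
    pub fn new(s: &str) -> Self {
        RubyString { contents: s.to_string() }
    }

    /// Zero-copy view whose lifetime is tied to both the string and the token.
    pub fn borrow_str<'a>(&'a self, _token: &'a VmToken) -> &'a str {
        &self.contents
    }
}

/// Every call into Ruby consumes the token and hands it back afterwards,
/// so any borrow tied to the token must have ended before the call.
pub fn call_ruby<R>(token: VmToken, f: impl FnOnce() -> R) -> (VmToken, R) {
    let result = f(); // a real version would call into the VM here
    (token, result)
}

pub fn demo() -> usize {
    let token = unsafe { VmToken::new() };
    let s = RubyString::new("hello");
    let len = s.borrow_str(&token).len(); // the borrow of `token` ends here
    // Holding the &str across this call would not compile, because
    // `token` cannot be moved while it is still borrowed.
    let (token, doubled) = call_ruby(token, || len * 2);
    s.borrow_str(&token).len() + doubled
}
```

Because the token has no fields that occupy space, passing it around compiles away; the cost is only in the signatures.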

Or maybe accessing a Ruby Object should only be possible via either an unsafe method that does not copy, or a safe method that does. The docs could describe the invariants one has to uphold for the code to be safe if one uses the unsafe fn.
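As a rough sketch of that split, assuming a stand-in `RubyString` that holds its bytes directly (a real binding would read the pointer and length from a `sys::VALUE`). The method names `to_string_copy` and `as_str_no_copy` are made up for this example.

```rust
use std::str::Utf8Error;

/// Stand-in for a Ruby String; hypothetical, not the Helix type.
pub struct RubyString {
    bytes: Vec<u8>,
}

impl RubyString {
    pub fn from_bytes(bytes: &[u8]) -> Self {
        RubyString { bytes: bytes.to_vec() }
    }

    /// Safe accessor: validates UTF-8 and copies, so the result stays valid
    /// no matter what Ruby code does to the original string afterwards.
    pub fn to_string_copy(&self) -> Result<String, Utf8Error> {
        std::str::from_utf8(&self.bytes).map(|s| s.to_owned())
    }

    /// Zero-copy accessor: validates UTF-8 but does not copy.
    ///
    /// Safety: the caller must not call back into Ruby (or otherwise let the
    /// string be mutated or collected) while the returned &str is alive.
    /// In this stand-in the borrow checker already enforces that; with a real
    /// Ruby string it cannot, which is why the method is marked unsafe.
    pub unsafe fn as_str_no_copy(&self) -> Result<&str, Utf8Error> {
        std::str::from_utf8(&self.bytes)
    }
}
```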

The two proposals could also be combined, so people either have to copy all Ruby data, or avoid the copies and deal with VM tokens.

For me it seems the most common use case is implementing one Ruby method in pure Rust, with the goal of improving performance. So calling into Ruby from Rust seems orthogonal, as Ruby tends to be slow. But of course there are always reasons to do it anyway...

However, before attempting to tackle the problem I'm waiting for the repo owners to merge master with the original branch, see #9. They have diverged massively, with original being way ahead in terms of features.

wagenet commented on June 1, 2024

@flo-l main development is back on master now. Is this something you're still interested in?

flo-l commented on June 1, 2024

I'd still be interested!

What do you think of the VM token idea I outlined above?

wagenet commented on June 1, 2024

@chancancode @wycats ^

flo-l commented on June 1, 2024

@chancancode @wycats ping :)

wycats commented on June 1, 2024

Direction

Right now, Helix classes have an extra field in their struct that points back at the Ruby object. The Ruby object is created when the object crosses into Ruby, which makes it possible to cheaply create Helix classes in Rust without allocating a Ruby object (useful for creating intermediate objects for internal computation).

That field is a helix::Metadata, which today is just a simple alias to VALUE.

Ultimately, I think we should enrich that field to include ownership information. Straw man:

enum Ownership {
    // The struct is owned by Rust and is not wrapped in any Ruby object.
    // This is the starting state for a new Helix class, and can also be used
    // to model Helix methods that take Helix objects by value.
    Rust,

    // When a Helix object crosses into a Helix method that takes it using `&`,
    // its state is changed to Shared.
    Shared,

    // When a Helix object crosses into a Helix method that takes it using `&mut`,
    // its state is changed to Unique.
    Unique,

    // Once a Helix object crosses into Ruby, its ownership state is "Ruby" until
    // it has crossed back into Rust.
    Ruby
}

// Note that if the state of a Helix object is already Unique, it cannot be passed into
// another Helix method. If the state of a Helix object is already Shared, it cannot
// be passed into another Helix method that takes it via `&mut`.

struct Metadata {
    // `value` is None until it crosses into Ruby for the first time
    value: Option<sys::VALUE>,
    ownership: Ownership
}

What this means is that when a Helix object crosses into Rust, we will either discover that the kind of ownership that the method requests is impossible or flip its ownership.

This is similar to the dynamic approach used by RefCell.
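The "discover it is impossible or flip the state" step could look roughly like this. It is a sketch of the straw man only: `Request` and `enter_rust` are hypothetical names, and the sketch ignores how many shared borrows are outstanding, which a real implementation would need to count (as `RefCell` does) in order to know when to flip back.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ownership {
    Rust,
    Shared,
    Unique,
    Ruby,
}

/// How a Helix method asks for its argument. Hypothetical helper type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Request {
    ByRef,    // `&`
    ByMutRef, // `&mut`
    ByValue,
}

/// Computes the new ownership state when an object crosses from Ruby into a
/// Helix method, or reports that the requested kind of access is impossible.
pub fn enter_rust(current: Ownership, request: Request) -> Result<Ownership, &'static str> {
    match (current, request) {
        // An object held only by Ruby can be handed out in any mode.
        (Ownership::Ruby, Request::ByRef) => Ok(Ownership::Shared),
        (Ownership::Ruby, Request::ByMutRef) => Ok(Ownership::Unique),
        (Ownership::Ruby, Request::ByValue) => Ok(Ownership::Rust),
        // Further shared borrows are fine while the object is already shared.
        (Ownership::Shared, Request::ByRef) => Ok(Ownership::Shared),
        (Ownership::Shared, _) => Err("already borrowed as shared"),
        // A uniquely borrowed object cannot be passed into another method.
        (Ownership::Unique, _) => Err("already borrowed as unique"),
        // A Rust-owned object is not wrapped in a Ruby object, so Ruby
        // cannot pass it anywhere.
        (Ownership::Rust, _) => Err("object is owned by Rust"),
    }
}
```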

Note that this doesn't address being allowed to take &str from a Ruby String, since we can't put a Rust struct into an existing Ruby string. We could support taking a &str from a frozen string, and I think we should see whether that's sufficient for zero-copy use cases.
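One way to picture the frozen-string case is a `Cow<str>` that borrows when the string is frozen and falls back to a copy otherwise. This is a sketch against a stand-in type, with the frozen flag as a plain field; `MockRubyString` and `as_cow` are hypothetical names.

```rust
use std::borrow::Cow;

/// Stand-in for a Ruby String; a real binding would query the frozen flag
/// and the byte pointer through the C API.
pub struct MockRubyString {
    contents: String,
    frozen: bool,
}

impl MockRubyString {
    pub fn new(contents: &str, frozen: bool) -> Self {
        MockRubyString { contents: contents.to_string(), frozen }
    }

    /// Zero-copy for frozen strings, copying fallback for mutable ones.
    pub fn as_cow(&self) -> Cow<'_, str> {
        if self.frozen {
            // Frozen contents cannot change, so handing out a borrow is sound.
            Cow::Borrowed(self.contents.as_str())
        } else {
            // Ruby code could still mutate this string, so take a copy.
            Cow::Owned(self.contents.clone())
        }
    }
}
```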

The original post here was also correct that we need to use rb_gc_register_address if we ever take ownership of a Ruby object and put it into a heap location (because the conservative GC will fail to mark it 😱). I think we should have a RootedValue struct that is implemented thusly:

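struct RootedValue {
    // Boxed so that the registered address stays stable when the
    // RootedValue itself is moved.
    inner: Box<sys::VALUE>
}

impl RootedValue {
    fn new(value: sys::VALUE) -> RootedValue {
        let mut inner = Box::new(value);
        // The GC is given the address of the VALUE slot, not the VALUE itself.
        unsafe { sys::rb_gc_register_address(&mut *inner) };
        RootedValue { inner }
    }
}

impl Drop for RootedValue {
    fn drop(&mut self) {
        // Unregister the same address before the Box is freed.
        unsafe { sys::rb_gc_unregister_address(&mut *self.inner) };
    }
}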

This will allow us to easily root Ruby objects that we have ownership of and want to temporarily move onto the heap. We probably want to automatically root any values passed into Rust by-value (if they get Rust ownership) to avoid safety footguns here.
