Comments (7)

chancancode commented on June 1, 2024

One thing to keep in mind is safety. Ruby doesn't have a way to track ownership natively. If we took a Ruby string without copying it and kept a borrow to the string while calling back into Ruby code, its content could be mutated, causing safety issues on the Rust side. One possible solution is to freeze the string while there is an outstanding (read-only) borrow to the Ruby string and then "unfreeze" it when we are done.

However, Ruby does not actually expose the functionality to "unfreeze" a frozen object. Of course, it's implemented as a marker bit under-the-hood, and since we are operating at the C level, we can just "unflip" the bit. But I'm not sure if we would feel comfortable messing with the internals like that, given Ruby might perform other optimizations and who knows what other semantics there might be in the future.

Another solution is to perhaps implement that as a global "call back into the VM" lock.

from helix.

flo-l commented on June 1, 2024

That is indeed a good point I have not considered.

I don't know how the "call back into the VM" lock is best implemented. An option could be to wrap internal data of Ruby objects (like the data pointer of String) in a custom type, which has a lifetime tied to the VALUE it belongs to, which in turn has a lifetime tied to some "VM token" that needs to be moved into each call into the Ruby VM. The construction of these VM tokens should be unsafe, so that users don't cheat the system by creating a new one for each call. So each call into Ruby would need a VM token and return a VM token, where the latter can be used to make the next call into Ruby. That would ensure that programmers can't use data obtained before a previous Ruby VM call. If we make the VM token a zero-sized type, this wouldn't add any runtime overhead, just code verbosity.
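A minimal sketch of the VM token idea, assuming hypothetical names throughout (`VmToken`, `TokenRubyString`, `StrView`, and `call_ruby` are not part of Helix, and the string is a plain `Vec<u8>` standing in for a real VALUE). As a simplification, the view borrows the token directly instead of going through a VALUE lifetime:

```rust
/// Zero-sized proof that the caller may talk to the Ruby VM.
/// Hypothetical type for illustration only.
pub struct VmToken {
    _private: (),
}

impl VmToken {
    /// Unsafe on purpose: callers must not mint a fresh token per call,
    /// otherwise the borrow tracking below is meaningless.
    pub unsafe fn new() -> VmToken {
        VmToken { _private: () }
    }
}

/// Stand-in for a Ruby String; real code would wrap a VALUE.
pub struct TokenRubyString {
    bytes: Vec<u8>,
}

/// Zero-copy view whose lifetime is tied to a borrow of the token.
pub struct StrView<'vm> {
    data: &'vm [u8],
}

impl TokenRubyString {
    pub fn new(s: &str) -> Self {
        TokenRubyString { bytes: s.as_bytes().to_vec() }
    }

    /// Borrowing the token here means it cannot be moved into
    /// `call_ruby` while the returned view is still alive.
    pub fn view<'vm>(&'vm self, _token: &'vm VmToken) -> StrView<'vm> {
        StrView { data: &self.bytes }
    }
}

impl<'vm> StrView<'vm> {
    pub fn as_bytes(&self) -> &'vm [u8] {
        self.data
    }
}

/// Every call into Ruby consumes the token and hands it back afterwards.
/// Real code would invoke the VM here (for example via rb_funcall).
pub fn call_ruby(token: VmToken) -> VmToken {
    token
}
```

With this shape, holding a `StrView` across `call_ruby(token)` is rejected by the borrow checker, because the token is still borrowed when the call tries to move it.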

Or maybe accessing a Ruby Object should only be possible via either an unsafe method that does not copy, or a safe method that does. The docs could describe the invariants one has to uphold for the code to be safe if one uses the unsafe fn.
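The copying/non-copying split could look roughly like the sketch below. `RawRubyString`, `from_raw`, `to_vec`, and `as_bytes_unchecked` are hypothetical names; real code would obtain the pointer and length from the Ruby string rather than from a caller-supplied buffer.

```rust
use std::slice;

/// Hypothetical wrapper around the data pointer of a Ruby String.
pub struct RawRubyString {
    ptr: *const u8,
    len: usize,
}

impl RawRubyString {
    /// # Safety
    /// `ptr` must be non-null and point to `len` readable bytes that stay
    /// valid for as long as this wrapper exists.
    pub unsafe fn from_raw(ptr: *const u8, len: usize) -> Self {
        RawRubyString { ptr, len }
    }

    /// Safe accessor: copies the bytes, so later mutation on the Ruby side
    /// cannot affect the returned Vec.
    pub fn to_vec(&self) -> Vec<u8> {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }.to_vec()
    }

    /// Zero-copy accessor.
    ///
    /// # Safety
    /// The caller must not call back into Ruby (or otherwise let the string
    /// be mutated or freed) while the returned slice is in use.
    pub unsafe fn as_bytes_unchecked(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

The doc comment on the unsafe method is where the invariants would be spelled out for users who opt out of the copy.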

The two proposals could also be combined. So people either have to copy all Ruby data or avoid the copies and have to deal with VM tokens.

For me it seems the most common use case is implementing one Ruby method in pure Rust, with the goal of improving performance. So calling into Ruby from Rust seems orthogonal, as Ruby tends to be slow. But of course there are always reasons to do it anyway...

However, before attempting to tackle the problem I'm waiting for the repo owners to merge the master with the original branch, see #9. They have diverged massively, with original being way ahead in terms of features.

wagenet commented on June 1, 2024

@flo-l main development is back on master now. Is this something you're still interested in?

flo-l commented on June 1, 2024

I'd still be interested!

What do you think of the VM token idea I outlined above?

wagenet commented on June 1, 2024

@chancancode @wycats ^

flo-l commented on June 1, 2024

@chancancode @wycats ping :)

wycats commented on June 1, 2024

Direction

Right now, Helix classes have an extra field in their struct that points back at the Ruby object. The Ruby object is created when the object crosses into Ruby, which makes it possible to cheaply create Helix classes in Rust without allocating a Ruby object (useful for creating intermediate objects for internal computation).

That field is a helix::Metadata, which today is just a simple alias to VALUE.

Ultimately, I think we should enrich that field to include ownership information. Straw man:

enum Ownership {
    // The struct is owned by Rust and is not wrapped in any Ruby object.
    // This is the starting state for a new Helix class, and can also be used
    // to model Helix methods that take Helix objects by value.
    Rust,

    // When a Helix object crosses into a Helix method that takes it using `&`,
    // its state is changed to Shared.
    Shared,

    // When a Helix object crosses into a Helix method that takes it using `&mut`,
    // its state is changed to Unique.
    Unique,

    // Once a Helix object crosses into Ruby, its ownership state is "Ruby" until
    // it has crossed back into Rust.
    Ruby
}

// Note that if the state of a Helix object is already Unique, it cannot be passed into
// another Helix method. If the state of a Helix object is already Shared, it cannot
// be passed into another Helix method that takes it via `&mut`.

struct Metadata {
    // `value` is None until it crosses into Ruby for the first time
    value: Option<sys::VALUE>,
    ownership: Ownership
}

What this means is that when a Helix object crosses into Rust, we will either discover that the kind of ownership that the method requests is impossible or flip its ownership.

This is similar to the dynamic approach used by RefCell.
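A rough sketch of that check-and-flip step, under stated assumptions: `OwnershipState` mirrors the straw-man enum, while `BorrowError`, `Tracked`, `borrow_shared`, and `borrow_unique` are hypothetical names. The `value` field is left out, and releasing a borrow is not modelled (supporting several simultaneous shared borrows would need a count, as `RefCell` keeps internally).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OwnershipState {
    Rust,
    Shared,
    Unique,
    Ruby,
}

/// Hypothetical error type for an ownership request that cannot be granted.
#[derive(Debug, PartialEq)]
pub enum BorrowError {
    AlreadyUnique,
    AlreadyShared,
}

/// Hypothetical stand-in for Metadata, tracking only the ownership state.
pub struct Tracked {
    pub ownership: OwnershipState,
}

impl Tracked {
    /// Called when a method wants the object via `&`.
    pub fn borrow_shared(&mut self) -> Result<(), BorrowError> {
        match self.ownership {
            // A unique borrow excludes every other borrow.
            OwnershipState::Unique => Err(BorrowError::AlreadyUnique),
            _ => {
                self.ownership = OwnershipState::Shared;
                Ok(())
            }
        }
    }

    /// Called when a method wants the object via `&mut`.
    pub fn borrow_unique(&mut self) -> Result<(), BorrowError> {
        match self.ownership {
            OwnershipState::Unique => Err(BorrowError::AlreadyUnique),
            OwnershipState::Shared => Err(BorrowError::AlreadyShared),
            _ => {
                self.ownership = OwnershipState::Unique;
                Ok(())
            }
        }
    }
}
```

In a binding layer, an `Err` here would presumably surface as a Ruby exception rather than a Rust panic, but that choice is outside this sketch.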

Note that this doesn't address being allowed to take &str from a Ruby String, since we can't put a Rust struct into an existing Ruby string. We could support taking a &str from a frozen string, and I think we should see whether that's sufficient for zero-copy use cases.
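To illustrate the frozen-string case, here is a small sketch with a mock type. `MockRubyString`, `try_as_str`, and `to_owned_string` are hypothetical; real code would read the frozen flag and the bytes from the Ruby object instead of from struct fields. It also assumes the zero-copy path should be refused when the bytes are not valid UTF-8, since `&str` requires that.

```rust
/// Mock stand-in for a Ruby String (hypothetical, for illustration).
pub struct MockRubyString {
    bytes: Vec<u8>,
    frozen: bool,
}

impl MockRubyString {
    pub fn new(bytes: &[u8], frozen: bool) -> Self {
        MockRubyString { bytes: bytes.to_vec(), frozen }
    }

    /// Zero-copy access, granted only if the string is frozen (so Ruby code
    /// cannot mutate it behind our back) and the bytes are valid UTF-8.
    pub fn try_as_str(&self) -> Option<&str> {
        if !self.frozen {
            return None;
        }
        std::str::from_utf8(&self.bytes).ok()
    }

    /// Fallback that always works: copy the bytes, replacing invalid UTF-8.
    pub fn to_owned_string(&self) -> String {
        String::from_utf8_lossy(&self.bytes).into_owned()
    }
}
```

Callers would try `try_as_str` first and fall back to the copying path for unfrozen strings.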

The original post here was also correct that we need to use rb_gc_register_address if we ever take ownership of a Ruby object and put it into a heap location (because the conservative GC will fail to mark it 😱). I think we should have a RootedValue struct that is implemented thusly:

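struct RootedValue {
    // Boxed so that the address registered with the GC stays stable
    // even when the RootedValue itself is moved.
    inner: Box<sys::VALUE>
}

impl RootedValue {
    fn new(value: sys::VALUE) -> RootedValue {
        let mut inner = Box::new(value);
        // Register the heap slot (not a stack temporary) as a GC root.
        unsafe { sys::rb_gc_register_address(&mut *inner) };
        RootedValue { inner }
    }
}

impl Drop for RootedValue {
    fn drop(&mut self) {
        // Unregister the same address before the Box is freed.
        unsafe { sys::rb_gc_unregister_address(&mut *self.inner) };
    }
}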

This will allow us to easily root Ruby objects that we have ownership of and want to temporarily move onto the heap. We probably want to automatically root any values passed into Rust by-value (if they get Rust ownership) to avoid safety footguns here.
