
y-crdt's Introduction

Y CRDT

A collection of Rust libraries oriented around implementing the Yjs algorithm and protocol with cross-language and cross-platform support in mind. It aims to maintain behavioral and binary-protocol compatibility with Yjs, so projects using Yjs and Yrs should be able to interoperate with each other.

Project organization:

  • lib0 is a serialization library used for efficient (and fairly fast) data exchange.
  • yrs (read: wires) is a core Rust library, a foundation stone for other projects.
  • yffi (read: wifi) is a wrapper around yrs used to provide a native C foreign function interface. See also: C header file.
  • ywasm is a wrapper around yrs that targets WebAssembly and exposes a JavaScript API.

Other projects using yrs:

  • ypy - Python bindings.
  • yrb - Ruby bindings.

Feature parity among projects

Compared versions: yjs 13.6, yrs 0.18, ywasm 0.18, yffi 0.18, y-rb 0.5, y-py 0.6, ydotnet 0.4, yswift 0.2.

  • YText: insert/delete
  • YText: formatting attributes and deltas
  • YText: embedded elements
  • YMap: update/delete
  • YMap: weak links (weak-links branch)
  • YArray: insert/delete
  • YArray & YText quotations (weak-links branch)
  • YArray: move (move branch)
  • XML Element, Fragment and Text
  • Sub-documents
  • Shared collections: observers (incompatible with yjs)
  • Shared collections: recursive nesting
  • Document observers (incompatible with yjs)
  • Transaction: origins
  • Snapshots
  • Sticky indexes
  • Undo Manager
  • Awareness
  • Network provider: WebSockets (y-websocket, yrs-warp, y-rb_actioncable, ypy-websocket)
  • Network provider: WebRTC (y-webrtc, yrs-webrtc)

Maintainers

Sponsors

NLNET

Ably

y-crdt's People

Contributors

appflowy, arsnealx, cactter, davidbrochart, dmonad, eliias, garettcooper, gitter-badger, hamflx, heckj, himself65, horusiath, jakeswenson, lsviana, ngasull, nugmanoff, pierrotsmnrd, s-panferov, stefanw, stellit, waidhoferj, y21, zaynetro


y-crdt's Issues

Decoding panics with slice range error

I have encountered a panic when decoding an update:

   6: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/eaadb8947b850a025404082f6297766c4680a42a/library/core/src/slice/index.rs:304:13
   7: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/eaadb8947b850a025404082f6297766c4680a42a/library/core/src/slice/index.rs:18:9
   8: yrs::updates::decoder::DecoderV2::read_buf
             at /var/home/roman/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.2/src/updates/decoder.rs:268:22
   9: yrs::updates::decoder::DecoderV2::new
             at /var/home/roman/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.2/src/updates/decoder.rs:220:29
  10: <yrs::updates::decoder::DecoderV2 as core::convert::From<&[u8]>>::from
             at /var/home/roman/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.2/src/updates/decoder.rs:282:9
  11: yrs::updates::decoder::Decode::decode_v2
             at /var/home/roman/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.2/src/updates/decoder.rs:21:27

Test case:

let data = vec![5, 188, 105, 167, 66, 82, 152, 165, 88, 130, 21, 227, 45, 207, 51, 164, 187, 138, 141, 203, 253, 226, 254, 210, 203, 107, 144, 128, 1, 178, 184, 152, 248, 58, 29, 39, 108, 238, 253, 249, 114, 19, 10, 154, 21, 7, 121, 114, 238, 41, 127, 186, 75, 179, 251, 217, 98, 110, 225, 156, 219, 246, 108];
let update = Update::decode_v2(data.as_slice()).unwrap();

I have tried Yjs and it also throws an error: Uncaught RangeError: Invalid typed array length: 13500.

I encode my document with: doc.encode_state_as_update_v2(&StateVector::default())

There is a chance that my update data got corrupted, or that I am using too big a client ID. I am still reporting this issue since it would be nice to catch these panics on the Rust side.
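A bounds check before slicing turns such a panic into a recoverable error. A minimal sketch of that idea (names and shapes are illustrative only, not the actual yrs DecoderV2 internals):

```rust
// Hypothetical sketch: a bounds-checked buffer reader that returns a
// Result instead of panicking with a slice-range error when a corrupted
// update claims more bytes than the buffer contains.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    EndOfBuffer,
}

pub struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Cursor { buf, pos: 0 }
    }

    /// Read `len` bytes, failing gracefully if the (possibly corrupted)
    /// input runs out before `len` bytes are available.
    pub fn read_buf(&mut self, len: usize) -> Result<&'a [u8], DecodeError> {
        let end = self.pos.checked_add(len).ok_or(DecodeError::EndOfBuffer)?;
        if end > self.buf.len() {
            return Err(DecodeError::EndOfBuffer);
        }
        let slice = &self.buf[self.pos..end];
        self.pos = end;
        Ok(slice)
    }
}
```

With this shape, feeding the truncated payload above would surface as `Err(DecodeError::EndOfBuffer)` instead of a panic.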

Performance track of yjs vs yrs

I'm posting this here to provide some data for future reference between yjs vs yrs insert times over the comparable scenario:

let start = performance.now()
const doc = new Y.Doc()
const ytext = doc.getText('type')
doc.transact(function () {
    for (let i = 0; i < 1000000; i++) {
        ytext.insert(0, (i % 10).toString())
    }
})

console.log('time to apply prepends: ', performance.now() - start)
console.log('text length: ', ytext.toString().length)

const encodedState = Y.encodeStateAsUpdate(doc)
console.log('update message size: ', encodedState.length)

const doc2 = new Y.Doc()
const ytext2 = doc2.getText('type')

start = performance.now()
Y.applyUpdate(doc2, encodedState)

console.log('time to apply update message: ', performance.now() - start)
console.log('text length: ', ytext2.toString().length)

Based on old ywasm repo.

Opera (Win10 64bit)

time to apply prepends:  1838.8900000136346
text length:  1000000
update message size:  10983497
time to apply update message:  1536.2449998501688
text length:  1000000

Node.js 16.0

time to apply prepends: 1958.1280000209808
text length:  1000000
update message size:  10983497
time to apply update message:  2839.4483001232147
text length:  1000000

Compatibility issue with yjs when integrating an item with no parent

I've noticed that there is a slight difference between yjs and yrs during block integrate() when no parent is found for the current Item:
yjs: recognizes the item as a GC item and continues the integration:
https://github.com/yjs/yjs/blob/main/src/structs/Item.js#L525
yrs: raises the case as a panic:
https://github.com/y-crdt/y-crdt/blob/main/yrs/src/block.rs#L422

here is a test doc produced by yjs:
origin.base64url.txt

I've tried to treat it as a GC item in yrs but no luck.

`find_pivot`'s mid is out of index

I'm facing a bug where find_pivot's mid is out of index.

I tried to save a doc in binary form, so I made a new doc and encoded updates against a new, empty doc's state_vector.
Randomly, decoding that update into another new, empty doc produces an error.

yrs version: 0.11.1

Reproducing bug

  • repo: https://github.com/NamseEnt/namseent
  • head: f84883519fa1afdccc7a137ed4b958719a457386
  • directory: luda-editor/crdt
  • run cargo test
  • I captured the encoded updates and hard-coded them to reproduce the bug.

values

  • clock: 2
  • left: 0
  • right: 1
  • div: 1
  • current_clock: 1

Encoded update (v2)

[
                0, 0, 11, 176, 133, 128, 149, 31, 205, 190, 199, 196, 21, 7, 3, 0, 3, 5, 0, 17,
                168, 1, 8, 0, 40, 0, 8, 0, 40, 0, 8, 0, 40, 0, 33, 1, 39, 110, 91, 49, 49, 49, 114,
                111, 111, 116, 105, 51, 50, 114, 111, 111, 116, 115, 116, 114, 105, 110, 103, 114,
                111, 111, 116, 97, 95, 108, 105, 115, 116, 114, 111, 111, 116, 97, 95, 109, 97,
                112, 114, 111, 111, 116, 105, 51, 50, 95, 108, 105, 115, 116, 114, 111, 111, 116,
                105, 51, 50, 95, 109, 97, 112, 114, 111, 111, 116, 115, 116, 114, 105, 110, 103,
                95, 108, 105, 115, 116, 114, 111, 111, 116, 115, 116, 114, 105, 110, 103, 95, 109,
                97, 112, 65, 1, 4, 3, 4, 6, 4, 6, 4, 5, 4, 8, 4, 7, 4, 11, 4, 10, 3, 0, 5, 1, 6, 0,
                1, 0, 1, 0, 1, 2, 65, 8, 2, 8, 0, 125, 2, 119, 5, 119, 111, 114, 108, 100, 118, 2,
                1, 98, 119, 1, 97, 1, 97, 125, 1, 118, 2, 1, 98, 119, 1, 98, 1, 97, 125, 2, 125, 1,
                125, 2, 119, 1, 97, 119, 1, 98, 8, 0, 1, 141, 223, 163, 226, 10, 1, 0, 1,
            ]

Backtrace

0: rust_begin_unwind
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/panicking.rs:84:5
   3: <usize as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:242:10
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/slice/index.rs:18:9
   5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/alloc/src/vec/mod.rs:2591:9
   6: yrs::block_store::ClientBlockList::get
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/block_store.rs:214:26
   7: yrs::block_store::ClientBlockList::find_pivot
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/block_store.rs:258:25
   8: yrs::block_store::BlockStore::get_block
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/block_store.rs:416:21
   9: yrs::block::Item::repair
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/block.rs:1073:27
  10: yrs::update::Update::integrate
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/update.rs:216:33
  11: yrs::transaction::Transaction::apply_update
             at /home/namse/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.11.1/src/transaction.rs:397:41
  12: crdt::history_system::HistorySystem<State>::merge
             at ./src/history_system.rs:66:9
  13: crdt::tests::it_works
             at ./src/lib.rs:372:9
  14: crdt::tests::it_works::{{closure}}
             at ./src/lib.rs:19:5
  15: core::ops::function::FnOnce::call_once
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
  16: core::ops::function::FnOnce::call_once
             at /rustc/a8314ef7d0ec7b75c336af2c9857bfaf43002bfc/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test tests::it_works ... FAILED
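For reference, the kind of guard that avoids an out-of-bounds mid is a binary search that returns None when the requested clock falls outside the stored block range. A hypothetical sketch (the starts/lens arrays and function shape are illustrative, not the actual ClientBlockList layout; blocks are assumed sorted by start clock):

```rust
// Illustrative sketch (not the actual yrs code): locate the index of
// the block containing a given clock, returning None instead of
// indexing out of bounds when the clock falls outside the stored range
// (e.g. after applying a corrupted or partial update).
fn find_pivot(starts: &[u32], lens: &[u32], clock: u32) -> Option<usize> {
    if starts.is_empty() {
        return None;
    }
    let (mut lo, mut hi) = (0usize, starts.len() - 1);
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        let start = starts[mid];
        let end = start + lens[mid];
        if clock < start {
            if mid == 0 {
                return None; // clock precedes the first block
            }
            hi = mid - 1;
        } else if clock >= end {
            lo = mid + 1;
            if lo >= starts.len() {
                return None; // clock beyond the last block
            }
        } else {
            return Some(mid); // start <= clock < end
        }
    }
    None
}
```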

Unicode failures

Hi, just wanted to report another issue I recently found:

#[cfg(test)]
mod tests {
	use yrs::{OffsetKind, Options};

	#[test]
	fn yrs_unicode() {
		let doc = yrs::Doc::with_options(Options {
			offset_kind: OffsetKind::Utf32,
			..Default::default()
		});

		let text1 = r#"“”"#;
		let text2 = r#"test"#;

		{
			let mut txn = doc.transact();
			let text = txn.get_text("content");
			text.insert(&mut txn, 0, text1);
			txn.commit();
		}

		{
			let mut txn = doc.transact();
			let text = txn.get_text("content");
			text.insert(&mut txn, 1, text2);
			eprintln!("{}", text.to_string());
			txn.commit();
		}
	}
}

Fails with:

failures:

---- crdt::tests::yrs_unicode stdout ----
thread 'crdt::tests::yrs_unicode' panicked at 'byte index 2 is not a char boundary; it is inside '“' (bytes 0..3) of `“”`', library/core/src/str/mod.rs:127:5
stack backtrace:

As I specify OffsetKind::Utf32, I expect that an insert at index 1 should land right after the '“'. It does work as expected if I replace text1 with 2-byte-long unicode characters. The '“' is 3 bytes long.
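For context, the panic happens when a code point index is used directly as a byte index into a Rust &str. A small stdlib-only helper showing the conversion that has to happen first (hypothetical helper, not yrs API):

```rust
// Convert a code point (UTF-32) index into a UTF-8 byte offset suitable
// for slicing a &str. Returns None if the index is past the end.
fn code_point_to_byte_offset(s: &str, index: usize) -> Option<usize> {
    if index == 0 {
        return Some(0);
    }
    s.char_indices()
        .nth(index)
        .map(|(byte_idx, _)| byte_idx)
        // index == char count means "offset at the very end of the string"
        .or_else(|| if s.chars().count() == index { Some(s.len()) } else { None })
}
```

For the string above, code point index 1 maps to byte offset 3 (right after the 3-byte '“'), which is where the insert has to be performed.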

Meaning of polyfills

I'm looking at https://github.com/yjs/y-crdt/blob/main/yrs-api-wrapper/src/lib.rs and trying to figure out how to implement these polyfill methods:

// By merge updates we mean merge full docs, updates or updates + delete sets?
fn merge_updates(updates: &[&[u8]]) -> Vec<u8>;

// Updates don't have state vectors, documents do.
fn encode_state_vector_from_update(_update: &[u8]) -> Vec<u8>;

// Again updates don't have state vectors.
fn diff_updates(update: &[u8], _state_vector: &[u8]) -> Vec<u8>;

The names are not self-explanatory, as they seem to conflate the concepts of updates, documents and delete sets.

Implement public PrelimText

Is there a reason why PrelimText is not implemented publicly? It would be nice to embed it inside a map.

#38 implemented only PrelimArray and PrelimMap. I can see that there is a PrelimText in block.rs but it returns a String type not a Text.

In my local experiments I implemented a PrelimText2 that returns a Text type and it kind of works.

pub struct PrelimText2<'a>(pub &'a str);

impl Prelim for PrelimText2<'_> {
    fn into_content(self, _txn: &mut Transaction) -> (ItemContent, Option<Self>) {
        let inner = Branch::new(TYPE_REFS_TEXT, None);
        (ItemContent::Type(inner), Some(self))
    }

    fn integrate(self, txn: &mut Transaction, inner_ref: BranchPtr) {
        let text = Text::from(inner_ref);
        text.push(txn, self.0);
    }
}

I was wondering whether there are any limitations of not supporting embedding a Text type into a map.

Roadmap

Planned for Nov/Dec 2021 - these are the highest priority right now:

  • Implement observers API on shared types, both root level and nested ones.
  • Start counting text indexes and offsets using unicode code points instead of language-arbitrary string length representation.

Future goals - order is yet to be prioritized:

  • Port UndoManager (realised in v0.15)
  • Support for sub documents (realised in v0.14)
  • Help in providing language bindings for:
    • Python
    • JVM (Jan 2023 update: in progress)
    • Swift (Jan 2023 update: in progress)
  • Support for lib0 v2 encoding.
  • Implementation of "move" operation to change the position of existing contents in a conflict-free way. (realised ~v0.12)

And as always - ongoing work on optimization of the library and fixing bugs found along the way.

Delete multi-byte character from text will cause panic

Hi,
I'm unsure if I misused the API.
Should the len value passed to text::remove_range() correspond to the offset_kind of the doc?

To Reproduce

// yrs = "0.10.2"
// lib0 = "0.10.2"

use yrs::{
    types::text::*,
    Doc
};

#[test]
fn reproduce() {
    let doc = Doc::new();
    let mut txn = doc.transact();
    let txt = txn.get_text("test");

    txt.insert(&mut txn, 0, "😭😊");
    txt.remove_range(&mut txn, 0, "😭".len() as u32);

    assert_eq!(txt.to_string().as_str(), "😊");
}
thread 'reproduce' panicked at 'byte index 2 is not a char boundary; it is inside '😭' (bytes 0..4) of `😭😊`'
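The underlying confusion is that the same string has a different length depending on the unit of measurement; passing a byte length where the document counts code points (or UTF-16 units) lands inside a character and panics. A stdlib-only illustration (hypothetical helper name):

```rust
// Report the length of a string in the three units that matter here:
// UTF-8 bytes (what str::len returns), UTF-16 code units (what Yjs
// counts by default), and Unicode code points.
fn lengths(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // UTF-8 bytes
        s.encode_utf16().count(), // UTF-16 code units
        s.chars().count(),        // Unicode code points
    )
}
```

So "😭".len() is 4 (bytes), while the emoji is a single code point; remove_range needs the length in whatever unit the doc's offset_kind specifies, not the byte length.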

`y_text.format` edge case

Testing with ywasm I found some interesting behavior with format(). Seems like the delta returned from the observer only picks up changes when the last character is included in the formatting block. For example:

const x = d1.getText('test')
let target = null
let delta = null
let observer = x.observe(e => {
    target = e.target
    delta = e.delta
})

transact(d1,txn => x.insert(txn, 0, 'abcd'))

let start = 1
let len = 3
transact(d1,txn => x.format(txn, start, len, { bold: true }))
console.log(delta)

When len = 3 I get the expected delta: [ { retain: 1 }, { retain: 3, attributes: { bold: true } } ]
But when len = 2 the delta is an empty array: []
I'd expect something like [ { retain: 1 }, { retain: 2, attributes: { bold: true } } ]

Is this correct behavior?

Using lib0 in a server app

Hello and thanks for a great library! I am trying to use it in my app, and I noticed that for the most part lib0 assumes that its input is safe. This makes it not ideal for server applications since malicious users can easily crash your server by sending an invalid lib0 payload.

What are your thoughts on using more Result types in code?
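As an illustration of the direction the question suggests, here is a sketch of a fallible variable-length integer reader that validates its input and returns a Result; this is illustrative code, not the actual lib0 implementation:

```rust
// Defensive decoding sketch: a varuint reader that fails cleanly on
// truncated input or an integer that would overflow u64, instead of
// panicking or looping on malicious payloads.
#[derive(Debug, PartialEq)]
enum Error {
    UnexpectedEndOfBuffer,
    IntegerOverflow,
}

fn read_var_u64(buf: &[u8], pos: &mut usize) -> Result<u64, Error> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        // `get` instead of indexing: a truncated buffer is an Err, not a panic.
        let byte = *buf.get(*pos).ok_or(Error::UnexpectedEndOfBuffer)?;
        *pos += 1;
        if shift >= 64 {
            return Err(Error::IntegerOverflow);
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(value);
        }
        shift += 7;
    }
}
```

A server can then reject a malformed payload with a normal error path rather than crashing the process.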

Release tags

I'm trying to package y-py for conda-forge in conda-forge/staged-recipes#17076. The package is built from source. In order to trigger the build for new versions, a tag must be applied to the GitHub repository.
Would it be possible to tag 0.1.3?

Implementing Send Trait for the Doc Type

Hi, I've been experimenting with the use of Yrs for a personal project as it meets all of my CRDT requirements. However, the Doc type being !Send is a significant barrier to adoption, as I would need to significantly restructure the multi-threaded asynchronous context of my application to work around it. I'm exploring alternatives like reconstructing the Doc from a shared encoded State Vector and the alternative update API, but it would be much easier to wrap an Arc<Mutex<_>> around the Doc.

I haven't delved into the code of Yrs too deeply, but the use of the Rc<UnsafeCell<_>> wrapper around Store seems like the most significant barrier to thread-safety. It looks like this abstraction has yet to be set in stone based on #4, so I'm wondering if it could be adapted to be Send at the same time.
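For context, if Doc were Send, the standard pattern would apply directly. A stdlib-only sketch with a stand-in document type (FakeDoc is hypothetical; the point is only that Arc<Mutex<_>> works once the inner type is Send, which an Rc<UnsafeCell<_>>-based type is not):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for a Send-able document; the real yrs Doc at the version
// discussed above is !Send because of its internal Rc<UnsafeCell<_>>.
struct FakeDoc {
    content: String,
}

fn edit_across_threads() -> String {
    let doc = Arc::new(Mutex::new(FakeDoc { content: String::new() }));
    let worker = {
        let doc = Arc::clone(&doc);
        // This spawn compiles only because FakeDoc is Send.
        thread::spawn(move || {
            doc.lock().unwrap().content.push_str("edited on another thread");
        })
    };
    worker.join().unwrap();
    let content = doc.lock().unwrap().content.clone();
    content
}
```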

Unexpected diff result

Hi,
I found an issue: if I insert text whose attributes contain a key-value pair that is already present in the attributes just before the insertion index, that key-value pair is missing from the pub fn diff(&self, txn: &mut Transaction) -> Vec<Diff> result.

To Reproduce

// yrs = "0.10.0"
// lib0 = "0.10.0"

use std::rc::Rc;
use lib0::any::Any;
use yrs::{
    types::text::*,
    types::{Attrs, Value},
    Doc,
};

#[test]
fn reproduce() {
    let doc = Doc::new();
    let mut txn = doc.transact();
    let text = txn.get_text("text");

    // insert "abc" at index 0 with { "a": "a"}
    text.insert_with_attributes(
        &mut txn,
        0,
        "abc",
        Attrs::from([(Rc::from("a"), Any::String(Box::from("a")))]),
    );

    // insert "def" at index 3 with { "a": "a", "b": "b"}
    text.insert_with_attributes(
        &mut txn,
        3,
        "def",
        Attrs::from([
            (Rc::from("a"), Any::String(Box::from("a"))),
            (Rc::from("b"), Any::String(Box::from("b"))),
        ]),
    );

    /*
    expect:
    [
        {
            insert: "abc",
            attributes: {
                "a": "a"
            }
        },
        {
            insert: "def",
            attributes: {
                "a": "a",
                "b": "b"
            }
        }
    ]
    */
    let expect = vec![
        Diff::Insert(
            Value::Any(Any::String(Box::from("abc"))),
            Some(Box::from(Attrs::from([(
                Rc::from("a"),
                Any::String(Box::from("a")),
            )]))),
        ),
        Diff::Insert(
            Value::Any(Any::String(Box::from("def"))),
            Some(Box::from(Attrs::from([
                (Rc::from("a"), Any::String(Box::from("a"))), // lost
                (Rc::from("b"), Any::String(Box::from("b"))),
            ]))),
        ),
    ];

    /*
    unexpected:
    [
        {
            insert: "abc",
            attributes: {
                "a": "a"
            }
        },
        {
            insert: "def",
            attributes: {
                "b": "b"
            }
        }
    ]
    */
    assert!(text.diff(&mut txn).eq(&expect));
}

Question about usage with a remote server

Hey! I'm trying to use ycrdt in a Rust project and I can't seem to wrap my head around how I would use it in a client server scenario.
The canonical example looks like this:

let doc = Doc::new();
let mut txn = doc.transact(); // all Yrs operations happen in scope of a transaction
let root = txn.get_text("root-type-name");
root.push(&mut txn, "hello world"); // append text to our collaborative document

    // in order to exchange data with other documents we first need to create a state vector
    let remote_doc = Doc::new();
    let mut remote_txn = remote_doc.transact();
    let state_vector = remote_txn.state_vector().encode_v1();

    // now compute a differential update based on remote document's state vector
    let update = txn.encode_diff_v1(&StateVector::decode_v1(&state_vector));

    // both update and state vector are serializable, we can pass them over the wire
    // now apply update to a remote document
    remote_txn.apply_update(Update::decode_v1(update.as_slice()));

This is easy to understand, but it only works because the client part (doc) seems to have access to the remote document (remote_doc). I'm trying to replicate this behaviour using web sockets and I'm running into the following issue:

Generating the update requires access both to the local txn and to the remote_txn. But there's no side where I have both of these: on the client I only have the txn, and on the server only the remote_txn. So I can't generate the serialised data to send over the wire in order to update a remote document. I'd love to have a simple example of how this would work.

I can think of two approaches to make this work, but it feels like I am overthinking this and there ought to be a simpler solution.

First, a two-round-trip approach, where the client asks the server for its current state_vector, then generates the update, and then sends that back to the server; the same happens in the other direction. This round trip is quite difficult to implement, though, as other messages can sneak in between this sequence of messages and have to be handled.

The other approach is to have N+1 document instances on the server: One root document, and one document replicating each connected client. That allows me to generate the update payloads on the server, however I have the feeling that that's not how it is intended to be used.

I think a brief explanation of what a typical client / server setup should look like would be great.

Thanks!
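For what it's worth, the first approach described above is essentially what the Yjs sync protocol does. A toy, stdlib-only model of that exchange (plain maps and tuples stand in for real state vectors and updates; this is a conceptual sketch, not the wire protocol):

```rust
use std::collections::HashMap;

// Per-client operation counters, mimicking a state vector: the value is
// the number of operations seen from that client id.
type StateVector = HashMap<u64, u32>;
// A "document" as a flat op log: (client id, per-client op counter).
type Ops = Vec<(u64, u32)>;

fn state_vector(ops: &Ops) -> StateVector {
    let mut sv = StateVector::new();
    for &(client, clock) in ops {
        let e = sv.entry(client).or_insert(0);
        if clock + 1 > *e {
            *e = clock + 1;
        }
    }
    sv
}

// Everything the remote peer has not seen yet, given its state vector.
fn diff(local: &Ops, remote_sv: &StateVector) -> Ops {
    local
        .iter()
        .filter(|(client, clock)| *clock >= *remote_sv.get(client).unwrap_or(&0))
        .cloned()
        .collect()
}
```

The protocol flow is then: the client sends state_vector(client_ops) (step 1); the server replies with diff(server_ops, client_sv) plus its own state vector (step 2); the client applies the diff and answers with diff(client_ops, server_sv). Neither side ever needs the other's transaction, only its serialized state vector.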

YArray's iterator hides elements after calling move_to

  1. Build an array
  2. Move last element to be first
  3. Access element directly --> order is correct
  4. Iterate over array --> all elements after first are missing
  5. Push to array --> see all elements again

fn read_chars(txn: &mut yrs::Transaction) -> Vec<String> {
    txn.get_array("chars")
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
}

let doc = yrs::Doc::new();
let txn = &mut doc.transact();

// Create an array
let ychars = txn.get_array("chars");
ychars.push_back(txn, "A");
ychars.push_back(txn, "B");

assert_eq!(vec!["A".to_string(), "B".to_string()], read_chars(txn));

// Move last item to be the first
ychars.move_to(txn, 1, 0);

// Accessing element directly works
assert_eq!(
    Some("A".to_string()),
    txn.get_array("chars").get(1).map(|v| v.to_string())
);

// But this fails with:
// 'test_yarray_move' panicked at 'assertion failed: `(left == right)`
// left: `["B", "A"]`,
// right: `["B"]`
assert_eq!(vec!["B".to_string(), "A".to_string()], read_chars(txn));

// But after we push to the same yarray then things recover.
ychars.push_back(txn, "C");
assert_eq!(
    vec!["B".to_string(), "A".to_string(), "C".to_string()],
    read_chars(txn)
);

Milestone #1

Regarding M1:

Y.applyUpdate(Y.Doc, update:Uint8Array, [transactionOrigin:any])

Ideally I'd try to deserialize update in first step into Rust object, then apply that rust object as an update.

let updates: Vec<Update> = Vec::decode(&mut decoder);
doc.apply_updates(updates);

But I don't know how feasible is doing that upfront. If it's not then proposed API:

impl Doc {
  /// D is implemented by DecoderV1 or DecoderV2, eg.:
  /// ```rust
  /// let mut decoder = DecoderV1::new(binary);
  /// let mut doc = Doc::new();
  /// doc.apply_update(&mut decoder)?;
  /// ```
  pub fn apply_update<D: Decoder>(&mut self, decoder: &mut D) -> Result<()>;
}

Rationale: in Rust there's no sense in making static methods that operate over specific local variables, e.g. doc.apply_update(decoder) is the same thing as Doc::apply_update(doc, decoder). I also proposed using generics, as Rust will recognize them at the call site and devirtualize. The downside of that approach is that generics cannot be exposed in WASM and C FFI. For that, we could work around different decoder versions with:

pub enum Decoder { 
  V1(DecoderV1), 
  V2(DecoderV2) 
}

impl Doc {
  pub fn apply_update(&mut self, decoder: &mut Decoder) -> Result<()>;
}

I don't know about all possible use cases of transactionOrigin, so I didn't include it in the API.
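The trade-off between generics and an enum can be illustrated with a toy decoder (all types below are hypothetical stand-ins, not the proposed yrs API):

```rust
// A generic function is monomorphized and devirtualized per decoder
// type, while an enum keeps one concrete, FFI-friendly entry point that
// dispatches internally.
trait Decoder {
    fn read_u8(&mut self) -> u8;
}

struct DecoderV1 { data: Vec<u8>, pos: usize }
struct DecoderV2 { data: Vec<u8>, pos: usize }

impl Decoder for DecoderV1 {
    fn read_u8(&mut self) -> u8 { let b = self.data[self.pos]; self.pos += 1; b }
}
impl Decoder for DecoderV2 {
    fn read_u8(&mut self) -> u8 { let b = self.data[self.pos]; self.pos += 1; b }
}

// Generic shape: fast, but not expressible over C FFI / WASM.
fn apply_update_generic<D: Decoder>(decoder: &mut D) -> u8 {
    decoder.read_u8()
}

// Enum shape: one concrete signature that FFI can bind to.
enum AnyDecoder { V1(DecoderV1), V2(DecoderV2) }

fn apply_update(decoder: &mut AnyDecoder) -> u8 {
    match decoder {
        AnyDecoder::V1(d) => apply_update_generic(d),
        AnyDecoder::V2(d) => apply_update_generic(d),
    }
}
```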

Y.encodeStateAsUpdate(Y.Doc, [encodedTargetStateVector:Uint8Array]): Uint8Array

Would it be plausible to separate principles like update generation from serialization/deserialization?

impl Doc {
  pub fn get_updates_since(&self, version: &StateVector) -> Vec<Update>;
}

// Update implements Encode
pub struct Update {
  client_blocks: Vec<Block>,
  delete_set: DeleteSet,
}

let version = StateVector::decode(&mut decoderV1);
let doc = Doc::new();
let updates = doc.get_updates_since(version);
let mut output = Vec::new();
let mut encoder = EncoderV1::new(&mut output);
updates.encode(encoder);

Y.encodeStateVector(Y.Doc): Uint8Array

Similar as above?

impl Doc {
  pub fn state_vector(&self) -> &StateVector;
}

let doc = Doc::new();
let version = doc.state_vector();
let mut output = Vec::new();
let mut encoder = EncoderV1::new(&mut output);
version.encode(encoder);

I admit it's way easier to test them this way.

ydoc.on('update', eventHandler: function(update: Uint8Array, origin: any, doc: Y.Doc))

I must admit that I have the most problems thinking about this one. Rust is well known for taking a safe approach to pub/sub mechanics via the introduction of Streams. However, streams are pull-based, while Observables/EventHandlers represent a push-based paradigm. Given C FFI and WASM support, we probably should just go with callback support. Therefore the proposed API:

impl Doc {
  pub fn on_update<F: Fn(&[u8], &Doc)>(&mut self, f: F) -> Subscription
}

Subscription here implements the Drop trait → dropping the subscription object works as an unsubscribe. Since I'm not sure what origin could be, I left it unrepresented.
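The drop-to-unsubscribe idea can be sketched with stdlib types only (all names below are hypothetical, not the actual yrs types):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

// Callbacks live in a shared registry keyed by id; dropping the
// Subscription removes its entry, i.e. unsubscribes.
type Callback = Box<dyn Fn(&[u8])>;

struct Subscription {
    id: u32,
    registry: Weak<RefCell<HashMap<u32, Callback>>>,
}

impl Drop for Subscription {
    fn drop(&mut self) {
        if let Some(registry) = self.registry.upgrade() {
            registry.borrow_mut().remove(&self.id);
        }
    }
}

struct Emitter {
    registry: Rc<RefCell<HashMap<u32, Callback>>>,
    next_id: u32,
}

impl Emitter {
    fn new() -> Self {
        Emitter { registry: Rc::new(RefCell::new(HashMap::new())), next_id: 0 }
    }

    fn on_update<F: Fn(&[u8]) + 'static>(&mut self, f: F) -> Subscription {
        let id = self.next_id;
        self.next_id += 1;
        self.registry.borrow_mut().insert(id, Box::new(f));
        Subscription { id, registry: Rc::downgrade(&self.registry) }
    }

    fn emit(&self, update: &[u8]) {
        for cb in self.registry.borrow().values() {
            cb(update);
        }
    }
}
```

The Weak back-reference keeps the subscription from extending the emitter's lifetime, which matters if subscriptions outlive the document.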

Y.logUpdate(Uint8Array)

Originally posted by @Horusiath in #2 (comment)

TextEvent.delta is not reporting correct deletions

Hi, I played with a library for some time and I think I found a bug with how delta for deletions is reported.

This is a repro:

#[cfg(test)]
mod tests {
	use yrs::{OffsetKind, Options};

	#[test]
	fn yrs_delete() {
		let doc = yrs::Doc::with_options(Options {
			offset_kind: OffsetKind::Utf32,
			..Default::default()
		});

		let text1 = r#"
		Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Eleifend mi in nulla posuere sollicitudin. Lorem mollis aliquam ut porttitor. Enim ut sem viverra aliquet eget sit amet. Sed turpis tincidunt id aliquet risus feugiat in ante metus. Accumsan lacus vel facilisis volutpat. Non consectetur a erat nam at lectus urna. Enim diam vulputate ut pharetra sit amet. In dictum non consectetur a erat. Bibendum at varius vel pharetra vel turpis nunc eget lorem. Blandit cursus risus at ultrices. Sed lectus vestibulum mattis ullamcorper velit sed ullamcorper. Sagittis nisl rhoncus mattis rhoncus.

		Sed vulputate odio ut enim. Erat pellentesque adipiscing commodo elit at imperdiet dui. Ultricies tristique nulla aliquet enim tortor at auctor urna nunc. Tincidunt eget nullam non nisi est sit amet. Sed adipiscing diam donec adipiscing tristique risus nec. Risus commodo viverra maecenas accumsan lacus vel facilisis volutpat est. Donec enim diam vulputate ut pharetra sit amet aliquam id. Netus et malesuada fames ac turpis egestas sed tempus urna. Augue mauris augue neque gravida. Tellus orci ac auctor augue mauris augue. Ante metus dictum at tempor. Feugiat in ante metus dictum at. Vitae elementum curabitur vitae nunc sed velit dignissim. Non arcu risus quis varius quam quisque id diam vel. Fermentum leo vel orci porta non. Donec adipiscing tristique risus nec feugiat in fermentum posuere. Duis convallis convallis tellus id interdum velit laoreet id. Vel eros donec ac odio tempor orci dapibus ultrices in. At varius vel pharetra vel turpis nunc eget lorem. Blandit aliquam etiam erat velit scelerisque in.
		"#;

		let text2 = r#"test"#;

		{
			let mut txn = doc.transact();
			let text = txn.get_text("content");
			text.insert(&mut txn, 0, text1);
			txn.commit();
		}

		{
			let mut txn = doc.transact();
			let text = txn.get_text("content");
			text.insert(&mut txn, 100, text2);
			txn.commit();
		}

		{
			let mut txn = doc.transact();
			let mut text = txn.get_text("content");

			let count = text1.chars().count() as u32 + text2.chars().count() as u32;

			let observer = text.observe(move |txn, edit| {
                                 // THIS ASSERT FAILS!
				assert_eq!(edit.delta(txn)[0], yrs::types::Delta::Deleted(count))
			});

			text.remove_range(&mut txn, 0, count);
			txn.commit();
		}

		{
			let mut txn = doc.transact();
			let text = txn.get_text("content");
			assert_eq!(text.to_string(), "");
		}
	}
}

I expect the Delta::Deleted to report the exact count of characters I requested, but it fails with:

failures:

---- crdt::tests::yrs_delete stdout ----
thread 'crdt::tests::yrs_delete' panicked at 'assertion failed: `(left == right)`
  left: `Deleted(104)`,
 right: `Deleted(1691)`', crdt.rs:235:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So apparently it fails to report all the characters starting after the "test" I inserted in a second transaction.

Please let me know if that's just me and I'm using the API incorrectly.

Todos

Todo items we collected in the last PR review:

  • Document Block::integrate method (bool)
  • Prevent double-commit of transactions

Milestones

Milestone 1

  • #6 Implementation of the Yjs update format + differential updates which was not in the initial work description.
  • Implement the API described in https://docs.yjs.dev/api/document-updates
  • Testing: Create a script that generates Yjs documents based on a seed number. Read these encoded Yjs documents with Yrs' update API and encode them again. The documents should contain the same content.

Milestone 2

  • Implementation of the shared types API (read & write API). We get as far as we get. We won't be able to implement all features (the delta format & event system is quite complicated). Additionally to the original work description, we will plan to calculate the differences after a transaction is finished.

Milestone 3

  • Port to WASM and create language bindings for Python. The Python bindings weren't in the original work description.
  • Performance analysis compared to Yjs and Automerge

ywasm/www needs a 2 minute update?

First, thanks for all the awesome work on this project. It's pretty darned cool.

I am struggling a bit to get all the pieces working, however.

I think the examples in ywasm/www are perhaps outdated and could use some love.

index.js seems to be using an old wasm.Doc() class instead of the perhaps newer wasm.YDoc() class?

The "insert"s there... maybe need to be in a transaction in the newer way of doing things?

I suspect someone on top of the evolution of the project could update the index.js there in about 2 minutes to properly print out some statistics in a browser the way it was meant to, but I have yet to get it to go.

Thanks for any leads and updates!

Also, the README.md seems to be a generic readme for create-wasm-app. Could someone perhaps do a quick cut-and-paste from their terminal after they make this go and commit it? Seems like maybe it's supposed to be something like:

cd y-crdt/ywasm/www
npm run build
npm start

Perhaps also a note that mentions you need to build the pkg in y-crdt/ywasm first? (And the preferred way to do that?)

Thanks!

Error decoding update v2 in a specific case

yrs version: 0.11.2

I failed to decode update_v2 with this buffer:

#[test]
fn yrs_decode_test() {
    use yrs::updates::decoder::Decode;

    let buffer = [
        0x00, 0x00, 0x06, 0xd0, 0x87, 0xf7, 0xf9, 0x0d, 0x02, 0x03, 0x02, 0x01, 0x00, 0x00, 0x07,
        0x28, 0x00, 0x27, 0x00, 0x28, 0x01, 0x27, 0x52, 0x47, 0x72, 0x6f, 0x6f, 0x74, 0x5f, 0x5f,
        0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x5f, 0x5f, 0x72, 0x6f, 0x6f, 0x74, 0x73, 0x65,
        0x71, 0x75, 0x65, 0x6e, 0x63, 0x65, 0x69, 0x64, 0x6e, 0x61, 0x6d, 0x65, 0x63, 0x75, 0x74,
        0x73, 0x72, 0x6f, 0x6f, 0x74, 0x63, 0x68, 0x61, 0x72, 0x61, 0x63, 0x74, 0x65, 0x72, 0x73,
        0x72, 0x6f, 0x6f, 0x74, 0x66, 0x61, 0x63, 0x65, 0x5f, 0x65, 0x78, 0x70, 0x72, 0x65, 0x73,
        0x73, 0x69, 0x6f, 0x6e, 0x73, 0x04, 0x0b, 0x04, 0x08, 0x02, 0x44, 0x01, 0x0a, 0x04, 0x10,
        0x05, 0x01, 0x01, 0x00, 0x02, 0x01, 0x03, 0x01, 0x00, 0x01, 0x02, 0x41, 0x01, 0x01, 0x07,
        0x00, 0x7d, 0x01, 0x77, 0x15, 0x56, 0x39, 0x55, 0x6b, 0x39, 0x70, 0x78, 0x55, 0x4b, 0x5a,
        0x49, 0x72, 0x57, 0x36, 0x63, 0x4f, 0x6b, 0x43, 0x30, 0x52, 0x67, 0x77, 0x0c, 0x6e, 0x65,
        0x77, 0x20, 0x73, 0x65, 0x71, 0x75, 0x65, 0x6e, 0x63, 0x65, 0x00,
    ];

    yrs::Update::decode_v2(&buffer).unwrap();
}

Error message was EndOfBuffer on decode_v2's unwrap().

So I tried to create a reproduction:

#[test]
fn yrs_decode_test_2() {
    use lib0::any::Any;
    use yrs::updates::decoder::Decode;

    let doc = yrs::Doc::new();
    let mut txn = doc.transact();
    let root = txn.get_map("root");

    root.insert(&mut txn, "sequence", yrs::PrelimMap::<bool>::new()); // NOTE: this is how I insert a nested map.
    let sequence = root.get("sequence").unwrap().to_ymap().unwrap();
    sequence.insert(&mut txn, "id", "V9Uk9pxUKZIrW6cOkC0Rg".to_string());
    sequence.insert(&mut txn, "cuts", yrs::PrelimArray::<_, Any>::from([]));
    sequence.insert(&mut txn, "name", "new sequence".to_string());

    root.insert(&mut txn, "__version__", 1);
    root.insert(
        &mut txn,
        "face_expressions",
        yrs::PrelimArray::<_, Any>::from([]),
    );
    root.insert(&mut txn, "characters", yrs::PrelimArray::<_, Any>::from([]));

    let buffer = doc.encode_state_as_update_v2(&yrs::StateVector::default());

    yrs::Update::decode_v2(&buffer).unwrap();
}

And it failed like the first test.

I experimented a bit: when I commented out some of the insertion code and ran it again, it didn't fail.

Non-failing cases (with comments):
#[test]
fn no_error_1() {
    use lib0::any::Any;
    use yrs::updates::decoder::Decode;

    let doc = yrs::Doc::new();
    let mut txn = doc.transact();
    let root = txn.get_map("root");

    root.insert(&mut txn, "sequence", yrs::PrelimMap::<bool>::new());
    let sequence = root.get("sequence").unwrap().to_ymap().unwrap();
    sequence.insert(&mut txn, "id", "V9Uk9pxUKZIrW6cOkC0Rg".to_string());
    sequence.insert(&mut txn, "cuts", yrs::PrelimArray::<_, Any>::from([]));
    sequence.insert(&mut txn, "name", "new sequence".to_string());

    root.insert(&mut txn, "__version__", 1);
    root.insert(
        &mut txn,
        "face_expressions",
        yrs::PrelimArray::<_, Any>::from([]),
    );
    // root.insert(&mut txn, "characters", yrs::PrelimArray::<_, Any>::from([]));

    let buffer = doc.encode_state_as_update_v2(&yrs::StateVector::default());

    yrs::Update::decode_v2(&buffer).unwrap();
}

#[test]
fn no_error_2() {
    use lib0::any::Any;
    use yrs::updates::decoder::Decode;

    let doc = yrs::Doc::new();
    let mut txn = doc.transact();
    let root = txn.get_map("root");

    // root.insert(&mut txn, "sequence", yrs::PrelimMap::<bool>::new());
    // let sequence = root.get("sequence").unwrap().to_ymap().unwrap();
    // sequence.insert(&mut txn, "id", "V9Uk9pxUKZIrW6cOkC0Rg".to_string());
    // sequence.insert(&mut txn, "cuts", yrs::PrelimArray::<_, Any>::from([]));
    // sequence.insert(&mut txn, "name", "new sequence".to_string());

    // root.insert(&mut txn, "__version__", 1);
    // root.insert(
    //     &mut txn,
    //     "face_expressions",
    //     yrs::PrelimArray::<_, Any>::from([]),
    // );
    root.insert(&mut txn, "characters", yrs::PrelimArray::<_, Any>::from([]));

    let buffer = doc.encode_state_as_update_v2(&yrs::StateVector::default());

    yrs::Update::decode_v2(&buffer).unwrap();
}

#[test]
fn yrs_decode_test_3() {
    use lib0::any::Any;
    use yrs::updates::decoder::Decode;

    let doc = yrs::Doc::new();
    let mut txn = doc.transact();
    let root = txn.get_map("root");

    root.insert(&mut txn, "sequence", yrs::PrelimMap::<bool>::new());
    let sequence = root.get("sequence").unwrap().to_ymap().unwrap();
    sequence.insert(&mut txn, "id", "V9Uk9pxUKZIrW6cOkC0Rg".to_string());
    sequence.insert(&mut txn, "cuts", yrs::PrelimArray::<_, Any>::from([]));
    // sequence.insert(&mut txn, "name", "new sequence".to_string());

    root.insert(&mut txn, "__version__", 1);
    root.insert(
        &mut txn,
        "face_expressions",
        yrs::PrelimArray::<_, Any>::from([]),
    );
    root.insert(&mut txn, "characters", yrs::PrelimArray::<_, Any>::from([]));

    let buffer = doc.encode_state_as_update_v2(&yrs::StateVector::default());

    yrs::Update::decode_v2(&buffer).unwrap();
}

Make safe read/write transactions

The idea here is that we could potentially introduce some degree of thread safety into the stack by using RefCell and splitting transactions into read-only ones (which can be fairly lightweight, as they don't need to commit or carry a list of changes made during the transaction) and read-write ones (similar to what transactions are right now):

pub struct Doc(RefCell<Store>);

impl Doc {
  fn transact_mut<'a>(&'a self) -> MutTxn<'a> { MutTxn(self.0.borrow_mut()) }
  fn transact<'a>(&'a self) -> Txn<'a> { Txn(self.0.borrow()) }
}

// trait used for read-only Yrs operations
pub trait ReadTxn {
    fn store(&self) -> &Store;
}

// trait used for read-write Yrs operations
pub trait WriteTxn: ReadTxn {
    fn store_mut(&mut self) -> &mut Store;
}

// Read-write transaction. Only one can be active at the time.
// It implements both [ReadTxn] and [WriteTxn].
pub struct MutTxn<'txn>(RefMut<'txn, Store>, /* other fields */);
impl<'txn> ReadTxn for MutTxn<'txn> { /* todo*/ }
impl<'txn> WriteTxn for MutTxn<'txn> { /* todo*/ }

// Read-only transaction. Possibly multiple active at the time.
// It basically doesn't need any other fields rather than store reference.
// It only implements [ReadTxn].
pub struct Txn<'txn>(Ref<'txn, Store>);
impl<'txn> ReadTxn for Txn<'txn> { /* todo */ }

This would require us to introduce some generics into the method signatures, e.g. operations which only use the read side of transactions would follow a signature like fn to_string<T: ReadTxn>(&self, txn: &T), while the read-write ones would become fn remove_range<T: WriteTxn>(&self, txn: &mut T, index: u32, len: u32). This way Txn would work only for read operations, while MutTxn would work for both reads and writes.
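A minimal, stdlib-only sketch of the borrow discipline this design relies on (the `Store` struct here is a placeholder, not the real yrs block store):

```rust
use std::cell::{Ref, RefCell, RefMut};

// Placeholder for the real block store.
struct Store { len: u32 }

struct Doc(RefCell<Store>);

// Read-only transaction: many may coexist.
struct Txn<'a>(Ref<'a, Store>);
// Read-write transaction: exclusive.
struct MutTxn<'a>(RefMut<'a, Store>);

impl Doc {
    fn transact(&self) -> Txn<'_> { Txn(self.0.borrow()) }
    fn transact_mut(&self) -> MutTxn<'_> { MutTxn(self.0.borrow_mut()) }
}

fn main() {
    let doc = Doc(RefCell::new(Store { len: 0 }));
    {
        // Multiple read-only transactions can be alive at once.
        let t1 = doc.transact();
        let t2 = doc.transact();
        assert_eq!(t1.0.len + t2.0.len, 0);
    } // read borrows released here
    {
        // Exactly one read-write transaction at a time; RefCell
        // enforces this dynamically (a conflicting borrow panics).
        let mut t = doc.transact_mut();
        t.0.len += 1;
    }
    assert_eq!(doc.transact().0.len, 1);
}
```

Note that RefCell moves the exclusivity check to runtime: taking a MutTxn while any Txn is still alive panics rather than failing to compile.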

This is not too far from the original idea. However, while there were several issues with exposing structs with lifetime params in the interop code, the biggest one at the time was trying to expose a transaction as a parameter of subscriber callbacks. I believe we could possibly work around this by universally casting to C's (void*) like:

// C API - this way we can get rid of the lifetime parameter,
// which is a no-go in C FFI and WASM.
#[repr(C)]
struct YTxn(*const c_void);

impl<'a, 'txn> From<&'a Txn<'txn>> for YTxn {
    fn from(txn: &'a Txn<'txn>) -> Self {
        unsafe {
            let ptr = txn as *const Txn<'txn> as *const c_void;
            YTxn(ptr)
        }
    }
}

impl Deref for YTxn {
    // lifetime params don't matter in C code
    type Target = Txn<'static>;
    
    fn deref(&self) -> &Self::Target {
        unsafe {
            let txn = (self.0 as *const Txn<'static>).as_ref().unwrap();
            txn
        }
    }
}

Building tests-ffi fails on windows machine and HEAP CORRUPTION during test

Howdy,

CMakeList issue with .a lib copy on windows machine

 Error copying file "C:/y-crdt/tests-ffi/../target/debug/libyrs.a" to "C:/y-crdt/tests-ffi/lib"

This is due to the copy command looking for libyrs.a explicitly:

add_custom_target(yrs-deps
  # DEBUG
  COMMAND ${CMAKE_COMMAND} -E copy "${PROJECT_SOURCE_DIR}/../target/debug/libyrs.a" "${PROJECT_SOURCE_DIR}/lib"
)

The fix is pretty easy for that, I can submit a simple pull request to switch based on WIN32 presence.

if(WIN32)
  add_custom_target(yrs-deps

    # DEBUG
    COMMAND ${CMAKE_COMMAND} -E copy "${PROJECT_SOURCE_DIR}/../target/debug/yrs.lib" "${PROJECT_SOURCE_DIR}/lib"
  )
else()
  add_custom_target(yrs-deps

    # DEBUG
    COMMAND ${CMAKE_COMMAND} -E copy "${PROJECT_SOURCE_DIR}/../target/debug/libyrs.a" "${PROJECT_SOURCE_DIR}/lib"
  )
endif()

Once compiled, running tests-ffi leads to HEAP CORRUPTION:
[screenshot]

CMake generated with MSVC info here:
[screenshot]

Any thoughts on what the issue might be? I'm trying to integrate into Unreal Engine, and Windows is unfortunately the preferred platform.

Support `Clone` for Doc

I found that yrs's Doc doesn't support Clone.

In my case, encode_state_as_update_v2 takes about 44ms, which blocks the client UI thread, so I wanted to clone the document into another context and encode it there. That doesn't seem possible for now because Doc doesn't support Clone.

Is there any plan for supporting Clone for Doc?

Expose RelativePosition

Hello :)

I'm trying to implement remote cursor presence indication, highlight ranges, and other features (using the Text data type) which require RelativePosition be exposed, serializable, and convertible to absolute positions – presumably similar to the functionality of Y.RelativePosition.

It would appear that efforts to this effect have already been started, but are as yet unfinished.

I would be happy to work on a pull request for this, but I figured I should submit an issue first to ascertain whether existing branches may be out there, if such a PR would be welcomed, and if there are any gotchas or policies I might need to be aware of.

Any thoughts?

Thank you!

yffi example has invalid code

I tried the yffi example code on https://github.com/y-crdt/y-crdt/tree/main/yffi, but I found two mistakes.

  1. There are no txt1 and t1. I assume they are txt and txn.
    YTransaction* txn = ytransaction_new(doc);
    YText* txt = ytext(txn, "name");
    
    // append text to our collaborative document
    ytext_insert(txt1, t1, 0, "hello world");
  2. ytext_destroy is not available.
 ytext_destroy(txt);

More efficient encoding

This check effectively tests whether an integer can be encoded in 53 bits (effectively 8 bytes if we use float encoding). However, the same 53-bit number would also take up 8 bytes using variable-length encoding (7 bits per byte), plus there is the cost of computing this value.

y-crdt/lib0/src/any.rs

Lines 121 to 123 in 6f9166c

if num_truncated == *num
&& num_truncated <= crate::number::F64_MAX_SAFE_INTEGER
&& num_truncated >= crate::number::F64_MIN_SAFE_INTEGER

In the JavaScript lib0 package I check whether a number can be encoded in 31 bits. Then we can be certain that the encoded size is better than the 8-byte float encoding.

I chose 31 bits because at the time we didn't want to encode anything larger than 31 bits. There is probably a middle ground here. Neither 31 bits nor 53 bits is optimal.
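The trade-off can be checked with a quick stdlib-only sketch (this is not lib0's implementation, just the byte-count arithmetic of 7-bit-per-byte varints):

```rust
// Number of bytes a value takes in 7-bit-per-byte variable-length encoding.
fn varint_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn main() {
    // Any value up to 2^31 - 1 fits in at most 5 varint bytes,
    // always beating an 8-byte float encoding.
    assert_eq!(varint_len((1u64 << 31) - 1), 5);
    // A full 53-bit value takes 8 varint bytes: no better than a float.
    assert_eq!(varint_len((1u64 << 53) - 1), 8);
    // Values below 2^49 still win (7 bytes or fewer).
    assert_eq!(varint_len((1u64 << 49) - 1), 7);
}
```

So the middle ground mentioned above sits somewhere between 31 and 49 bits: anything below 2^49 is strictly smaller as a varint, and the question is only whether the size check itself is worth the cost.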

[Packaging] include option to not vendor yrs in ypy

The policy in conda-forge and for most linux distributions is to forbid Python packages to vendor binary dependencies that could be packaged independently.

A good example of how to do that is pyzmq, which does bundle zeromq on PyPI but whose conda-forge package depends on the zeromq conda-forge package. At build time, the building and vendoring of zeromq is disabled by specifying the installation prefix of zeromq through the ZMQ_PREFIX environment variable.

We will need to do the same for the case of pyrs.

Docstring on `encode_update_v1` (etc) gives the wrong impression

Encodes the document state to a binary format.

this sounds like it will encode the entire document, not just the changes. But in my testing, it seems to only encode the changes that were applied in that transaction.
(This is very fair, but I was looking for an easy 'encode document as bytes' option and this one seemed the most obvious.)

What I tried (and I think I was wrong to try):

doc.transact().encode_update_v1()

What seems to actually work:

doc.encode_state_as_update_v1(&StateVector::default())

Should it say 'Encodes the document update to a binary format' instead?

Ideas about future enhancements

  • Create a struct for clientID with a custom hashing function and custom ordering.
  • Change type.map to Option<Box<HashMap>>
  • Create a custom Map with a custom Hasher.
  • Create a default HashMap & HashSet type that uses a standard hasher that we use across y-crdt and lib0
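A sketch of the custom-hasher bullets above, assuming client IDs are u64 values that are already well distributed (the pass-through hasher below is illustrative, not a vetted design):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Client IDs are random, well-distributed integers, so a pass-through
// hasher can skip the cost of the default SipHash.
#[derive(Default)]
struct ClientHasher(u64);

impl Hasher for ClientHasher {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, bytes: &[u8]) {
        // Fallback path: fold bytes into the state.
        for &b in bytes {
            self.0 = (self.0 << 8) | b as u64;
        }
    }
    // Fast path used when hashing u64 keys.
    fn write_u64(&mut self, n: u64) { self.0 = n; }
}

// A map type alias that could be shared across the codebase.
type ClientMap<V> = HashMap<u64, V, BuildHasherDefault<ClientHasher>>;

fn main() {
    let mut m: ClientMap<&str> = ClientMap::default();
    m.insert(42, "peer A");
    m.insert(7, "peer B");
    assert_eq!(m.get(&42), Some(&"peer A"));
}
```

The same BuildHasherDefault can parameterize a HashSet, giving the shared default set type from the last bullet.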

[lib0] Decoding Compatibility with UpdateDecoderV2.readJSON in yjs

The readJSON in UpdateDecoderV2 in yjs does a further decoder lookup while reading a typeRef with value "6":

<anonymous> (..../node_modules/lib0/dist/buffer-ac2cdedf.cjs:1408) // the last array element in lib0/decoding.readAnyLookupTable
readAny (..../node_modules/lib0/dist/buffer-ac2cdedf.cjs:1429) // it reads the first byte to allocate the real decoder
readJSON (..../node_modules/yjs/dist/yjs.cjs:913) 
readContentFormat (..../node_modules/yjs/dist/yjs.cjs:8321)
readItemContent (..../node_modules/yjs/dist/yjs.cjs:9475)
readClientsStructRefs (..../node_modules/yjs/dist/yjs.cjs:1346)

However, the decoding logic in yrs just passes the value to the json_parser without any further lookup:

            BLOCK_ITEM_FORMAT_REF_NUMBER => {
                ItemContent::Format(decoder.read_key(), decoder.read_json().into())
            }
---->>
impl<'a> Decoder for DecoderV2<'a> {
....
    fn read_json(&mut self) -> Any {
        Any::from_json(self.cursor.read_string())
    }
....
---->>
impl Any {
....
    #[cfg(not(feature = "lib0-serde"))]
    pub fn from_json(src: &str) -> Self {
        use crate::json_parser::JsonParser;
        let mut parser = JsonParser::new(src.chars());
        parser.parse().unwrap() // Crashes within the parse() function
    }
....
}

Here is a serialized ydoc in base64 format to reproduce this behaviour:

AAMAAwIL1/jIzxcMhsWi6gMJAAJCAQEEEAJCAwQAFA8HAQQAKAOEAEYAhgDGASi5AaMBcHJvc2VtaXJyb3JwdGluZGVudHRhZ05hbWVsaW5lSGVpZ2h0Yl9pZGVlc3Rib2xkYm9sZGZvbnRzaXplZm9udHNpemVub3RlLmd1aWRub3RlR3VpZG5vdGUub3duZXJvd25lcm5vdGUudHlwZW5vdGVUeXBlbm90ZS5wcml2YXRlaXNQcml2YXRlbm90ZS5jcmVhdGVUaW1lY3JlYXRlVGltZQtBAAYHCkQCSAAJCAoFCQgMCQ8KBQEAAAUBAgMGAkEHAgwAfQB3A2RpdncAdwtBYy1WU0drd2RhcnYAfnYCBXZhbHVlfRQEdW5pdHcCcHh+BQB3FmhybF9WX3BSU2RhWjRfUUZYa0VYQlF9gvKPAX0AeXcNMTY1MzY0NDM1MjEwMgA=
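The discrepancy boils down to dispatch: lib0's readAny reads a tag byte and picks the matching decoder, whereas yrs's read_json assumes the payload is always a JSON string. A toy illustration of tag-based dispatch (tag values and wire format here are invented for the example, not lib0's real lookup table):

```rust
// Hypothetical tag values, NOT lib0's real lookup table.
const TAG_TRUE: u8 = 0;
const TAG_FALSE: u8 = 1;
const TAG_STRING: u8 = 2;

#[derive(Debug, PartialEq)]
enum Any {
    Bool(bool),
    String(String),
}

// Read one tag byte, then dispatch to the matching decoder,
// instead of assuming the payload is a JSON string.
fn read_any(buf: &[u8]) -> Option<Any> {
    let (&tag, rest) = buf.split_first()?;
    match tag {
        TAG_TRUE => Some(Any::Bool(true)),
        TAG_FALSE => Some(Any::Bool(false)),
        TAG_STRING => {
            // Length-prefixed UTF-8 payload.
            let (&len, rest) = rest.split_first()?;
            let bytes = rest.get(..len as usize)?;
            Some(Any::String(String::from_utf8(bytes.to_vec()).ok()?))
        }
        _ => None, // unknown tag: report an error instead of crashing
    }
}

fn main() {
    assert_eq!(read_any(&[0]), Some(Any::Bool(true)));
    assert_eq!(read_any(&[2, 2, b'h', b'i']), Some(Any::String("hi".into())));
    assert_eq!(read_any(&[9]), None);
}
```

Returning None (or a Result in real code) on an unknown tag is the key difference from the current yrs path, which unwraps inside from_json and panics.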

Cursor API

As part of introducing the move operation (#121), we introduced BlockIter, which gives us a handy way to iterate over blocks while accounting for deleted blocks, moved ones, etc. All of the array methods (get, iter, insert, delete) are performed in the context of a BlockIter. IMO the name is a bit misleading - hence Cursor in the title - as the way BlockIter works is closely similar to file system and editor APIs. For this reason I'll be using the cursor nomenclature below.

The proposal in this issue is about exposing the cursor capabilities to API users. This would allow us to insert subsequent elements (items or characters) without looking up the index→block ID mapping every time.

API

impl Text {
  /// Return new cursor set up at given position.
  pub fn seek(&self, index: u32) -> TextCursor;
}

impl TextCursor {
  /// Change the position of a current cursor to point at arbitrary index.
  pub fn seek(&mut self, index: u32);

  /// Move current cursor by given number of elements to the right (if delta is positive) 
  /// or left (if delta is negative).
  pub fn advance(&mut self, delta: i32);

  /// Inserts given string at current cursor position and advances cursor to the end of inserted chunk.
  pub fn insert(&mut self, txn: &mut Transaction, chunk: &str);

  /// Removes a range of characters starting at the current cursor position and advances the cursor to 
  /// the end of removed chunk.
  pub fn remove_range(&mut self, txn: &mut Transaction, len: u32);

  /// Performs move operation of a given range of elements into current cursor position.
  /// Move range structure describes range of start..end indexes to be moved as well as associativity 
  /// (sticking moved blocks to left/right). 
  pub fn move_from(&mut self, txn: &mut Transaction, range: CursorRange);
  
  /// Not sure about this one, but we could create a move range descriptor directly in terms of the cursor itself.
  pub fn range(&mut self, len: u32, assoc_start: Assoc, assoc_end: Assoc) -> CursorRange;

  /// Reads up to `buf.len()` elements and puts them in a buffer. Returns the number of elements read.
  /// Advances cursor by the number of elements read.
  pub fn read(&mut self, buf: &mut [Value]) -> usize;
}

/// Cursor implements iterator pattern, so it can be used in for loops.
impl Iterator for TextCursor {
  type Item = Value;
  fn next(&mut self) -> Option<Self::Item> {
    let mut buf = [Value::default();1]; // this is stack-allocated buffer for a single element
    if self.read(&mut buf) == 0 {
      None // we didn't read any elements, so we're at the end of the cursor
    } else {
      Some(buf[0])
    }
  }
}

/// Cursor implements backward iterator pattern, so it can be traversed in reverse order.
impl DoubleEndedIterator for TextCursor {
  fn next_back(&mut self) -> Option<Self::Item>;
}

/// As we know how to track length of underlying text and can track the current position,
/// we could possibly implement this trait for quick direct remaining length computation.
impl ExactSizeIterator for TextCursor {
  fn len(&self) -> usize;
}

pub struct CursorRange {
  start: Option<ID>,
  assoc_start: Assoc,
  end: Option<ID>,
  assoc_end: Assoc,

  // If we obtained ID for start/end, it means, we probably visited blocks already, so we can cache
  // their pointers. Just remember to do double check when MoveRange is used in case if blocks
  // were merged or split.
  start_ptr: Option<BlockPtr>,
  end_ptr: Option<BlockPtr>,
}

pub enum Assoc {
  Left,
  Right
}

Similar API could be done for Arrays and XmlElements.
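To make the intended seek/insert/read contract concrete, here is a toy, stdlib-only cursor over a plain character buffer (no CRDT machinery, no transactions, character-indexed for simplicity):

```rust
// Toy cursor over a character buffer, illustrating the proposed contract.
struct ToyCursor {
    chars: Vec<char>,
    pos: usize,
}

impl ToyCursor {
    fn new(s: &str) -> Self {
        ToyCursor { chars: s.chars().collect(), pos: 0 }
    }

    /// Reposition the cursor at an arbitrary index (clamped to length).
    fn seek(&mut self, index: usize) {
        self.pos = index.min(self.chars.len());
    }

    /// Insert at the cursor and advance past the inserted chunk.
    fn insert(&mut self, chunk: &str) {
        for c in chunk.chars() {
            self.chars.insert(self.pos, c);
            self.pos += 1;
        }
    }

    /// Read up to `buf.len()` elements, advancing the cursor;
    /// returns the number of elements actually read.
    fn read(&mut self, buf: &mut [char]) -> usize {
        let n = buf.len().min(self.chars.len() - self.pos);
        buf[..n].copy_from_slice(&self.chars[self.pos..self.pos + n]);
        self.pos += n;
        n
    }

    fn contents(&self) -> String {
        self.chars.iter().collect()
    }
}

fn main() {
    let mut cur = ToyCursor::new("hello world");
    cur.seek(5);
    cur.insert(","); // cursor now sits just after the comma
    let mut buf = [' '; 6];
    let n = cur.read(&mut buf);
    assert_eq!(n, 6);
    assert_eq!(buf.iter().collect::<String>(), " world");
    assert_eq!(cur.contents(), "hello, world");
}
```

The real TextCursor would additionally track a BlockPtr so that consecutive inserts append to the same block without re-walking the block list, which is the whole point of the proposal.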

Expose XmlFragment and XmlText(Text) blocks

I'm trying to build a server for tiptap editor with yrs.
I used yrs, forked it and exposed XmlFragment, this worked without too much of a problem.

But tiptap's styles are not stored in the XML tags, but rather in the linked list of text blocks.
I was able to print out the blocks by adding some functions to XmlText.

There's no public api yet to access these blocks. Do you plan on creating/exposing such an api?

Would you accept a merge request, and if yes, what would be your requirements for this API?

`Option::unwrap()` on a `None` value in `transaction.rs` whilst committing/dropping transaction

After applying many V1 updates (all of which were generated by Y.js) to a transaction and then committing, Yrs panics.
(This seems to be specific to these updates, since other documents seem to work fine.)

            // doc_updates: Vec<Vec<u8>>
            let ydoc = Doc::new();
            {
                let mut txn = ydoc.transact();
                debug!("{} updates to apply.", doc_updates.len());
                for update_bytes in doc_updates {
                    let update = Update::decode_v1(&update_bytes)?;
                    txn.apply_update(update);
                }
            } // <---- backtrace points to this line
thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', /home/quail/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.10.0/src/transaction.rs:562:59
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:48:5
   3: core::option::Option<T>::unwrap
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/option.rs:755:21
   4: yrs::transaction::Transaction::commit
             at /home/quail/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.10.0/src/transaction.rs:562:26
   5: <yrs::transaction::Transaction as core::ops::drop::Drop>::drop
             at /home/quail/.cargo/registry/src/github.com-1ecc6299db9ec823/yrs-0.10.0/src/transaction.rs:698:9
   6: core::ptr::drop_in_place<yrs::transaction::Transaction>
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448:1

yrs version 0.10.0

If it would be useful, I can give you some reproduction code along with the updates that I'm applying — there are 423 of them but most of them are tiny — I can probably tar them up and write some code that loads them with std::fs::read or so?
Let me know — I don't know if this is 'obvious' to someone who knows the code or whether you would actually appreciate that.

Test data

Hey @Horusiath,

So I generated a few test cases for you. I compiled the tests using lib0/encoding to a base64-encoded text file. Sorry, but that was the easiest way from JavaScript. Let me know if you want me to compile the files to binary.

I basically took all randomly-generated tests by Yjs (map, text, array) and compiled them all into a single benchmarking suite.

You can use the below node script to execute the tests. I'm attaching a few datasets in different sizes. Let me know if you want me to generate a dataset for a specific use case (e.g. text-only with max 10 generated updates).

Small dataset: https://drive.protonmail.com/urls/MEQ6KB5DMG#ZLpiAo78xZaf
Medium dataset: https://drive.protonmail.com/urls/KX0XGTYHVG#vh2BYYPkQbiN
Large dataset: https://drive.protonmail.com/urls/45E804NJQW#dbBiUwPgDYiX (405MB!)


If you can run these datasets, then it's pretty safe to say that Yrs can process all Yjs updates.


import * as decoding from 'lib0/decoding'
import * as buffer from 'lib0/buffer'
import * as t from 'lib0/testing'
import * as Y from 'yjs'
import fs from 'fs'

const data = fs.readFileSync('./test-data.txt', 'utf8').split('\n')[0]
const binaryData = buffer.fromBase64(data)

const decoder = decoding.createDecoder(binaryData)

let testnumber = 0
while (decoding.hasContent(decoder)) {
  console.log('running test #' + ++testnumber)
  const updatesLen = decoding.readVarUint(decoder)
  const ydoc = new Y.Doc()
  for (let i = 0; i < updatesLen; i++) {
    const update = decoding.readVarUint8Array(decoder)
    Y.applyUpdate(ydoc, update)
  }

  const expectedText = decoding.readVarString(decoder)
  const expectedMap = decoding.readAny(decoder)
  const expectedArray = decoding.readAny(decoder)

  t.compare(expectedText, ydoc.getText('text').toString())
  t.compare(expectedMap, ydoc.getMap('map').toJSON())
  t.compare(expectedArray, ydoc.getArray('array').toJSON())
}

Move compatibility tests

As discussed, here are a few compatibility tests for the move feature. We expect that after applying certain updates the result between Yjs and Y CRDT matches.

The tests are lib0 encoded and can be executed as follows:

  const decoder = decoding.createDecoder(buffer.fromBase64(compatTests))
  let iteration = 0
  while (decoding.hasContent(decoder)) {
    const ydoc = new Y.Doc()
    ydoc.getArray('array')
    const len = decoding.readVarUint(decoder)
    for (let i = 0; i < len; i++) {
      const update = decoding.readVarUint8Array(decoder)
      Y.applyUpdate(ydoc, update)
    }
    const expectedResult = decoding.readAny(decoder)
    const result = ydoc.getArray('array').toJSON()
    t.compare(expectedResult, result)
    console.log('iteration #' + iteration++)
  }

Small dataset - 1.1mb

compat-tests-small.txt

Larger dataset - 8.7mb

compat-tests-larger.txt

Large dataset - 85mb

compat-tests-large.txt

Infinite Loop in Update::merge_updates

Hi, I've found a relatively simple to reproduce issue with the loop in the merge_updates function. I am able to consistently reproduce the issue by repeatedly appending single characters to the end of a text document, where each character is added in its own transaction and that transaction is encoded as an update. When rebuilding the document from the encoded updates, Update::merge_updates gets stuck in an infinite loop until running out of memory.

Here's a CSV with hex encoded updates illustrating the issue:
text_updates.csv

Please let me know if any more details are required. I might take some time to try and figure this out myself but I figured it was better to share it first.

Measure the impact of using BlockPtr

I still have some doubts about the concept behind BlockPtr. While I get the idea behind the initial design, I'm wondering whether all of the operations were taken into account.

As @dmonad mentioned, the original test was about populating a BlockStore of known size with blocks at initialization time. The inline approach is faster here: we just make Vec::<Block>::with_capacity(known_size) and we have the entire chunk ready right away, with blocks inside already initialized. Compared to that, Vec::<Rc<RefMut<Block>>>::with_capacity(known_size) seems subpar for two reasons:

  • We need to create every single block manually as a separate heap object.
  • When additional blocks are added over the lifetime of an application, they will be created on the heap in potentially different parts of memory. Memory locality is in favor of Vec<Block> here.

However, the inline approach also comes with several other issues I'm not sure we talked about:

When new blocks are added to a vector and its capacity is reached, before appending the next element the vector has to allocate a new internal array (also on the heap) of sizeof(Block) * vec.capacity() * 2 (the Vec growth factor is around 2.0), then copy all of the existing blocks over to the new space. Block size atm is around ~300B. For a block store with 1024 elements, this means copying ~300kB of memory before adding a new element. If we're talking about a block store as Vec<Rc<RefMut<Block>>>, 1024 elements take only 8kB, making resizing significantly easier.
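The copy-size arithmetic can be sanity-checked with std::mem::size_of (the 300-byte Block below is a stand-in, not the real struct):

```rust
use std::cell::RefCell;
use std::mem::size_of;
use std::rc::Rc;

// Stand-in for the real ~300-byte Block struct.
struct Block([u8; 300]);

fn main() {
    // Inline storage: each vector slot holds the whole block, so a
    // resize copies ~300 bytes per element...
    assert_eq!(size_of::<Block>(), 300);

    // ...while boxed storage only copies one pointer per element
    // (Rc to a sized type is a thin pointer).
    assert_eq!(size_of::<Rc<RefCell<Block>>>(), size_of::<usize>());

    let inline_kb = size_of::<Block>() * 1024 / 1024;
    let boxed_kb = size_of::<Rc<RefCell<Block>>>() * 1024 / 1024;
    println!("resize copy for 1024 elements: inline {inline_kb} KiB vs boxed {boxed_kb} KiB");
}
```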

Another issue is the access time through BlockPtr - since it's used as an Item's left and right block pointer, every access to the left/right block involves going through the block store. Compared to access via Rc this is really expensive.


I don't yet have enough intuition to say how often this will happen.

Here's the assembly of BlockStore::get_item_mut compiled with release options (-C opt-level=2). Keep in mind that this code will be executed every single time we retrieve an item instance from the block store.

BlockStore::get_item_mut:
        push    rbx
        mov     r8, qword ptr [rsi]
        mov     rax, r8
        shr     rax, 57
        mov     r10, qword ptr [rdi]
        mov     r9, qword ptr [rdi + 8]
        mov     rdx, r10
        and     rdx, r8
        lea     rdi, [rdx + 16]
        and     rdi, r10
        movdqu  xmm1, xmmword ptr [r9 + rdx]
        movd    xmm0, eax
        punpcklbw       xmm0, xmm0
        pshuflw xmm0, xmm0, 224
        pshufd  xmm0, xmm0, 0
        movdqa  xmm2, xmm0
        pcmpeqb xmm2, xmm1
        pmovmskb        eax, xmm2
        mov     ecx, 16
        pcmpeqd xmm2, xmm2
.LBB2_1:
        test    ax, ax
        jne     .LBB2_4
        pcmpeqb xmm1, xmm2
        pmovmskb        eax, xmm1
        test    ax, ax
        jne     .LBB2_7
        mov     rdx, rdi
        add     rdi, rcx
        add     rdi, 16
        add     rcx, 16
        and     rdi, r10
        movdqu  xmm1, xmmword ptr [r9 + rdx]
        movdqa  xmm3, xmm0
        pcmpeqb xmm3, xmm1
        pmovmskb        eax, xmm3
        jmp     .LBB2_1
.LBB2_4:
        lea     ebx, [rax - 1]
        and     ebx, eax
        bsf     ax, ax
        movzx   eax, ax
        add     rax, rdx
        and     rax, r10
        neg     rax
        lea     r11, [rax + 4*rax]
        mov     eax, ebx
        cmp     r8, qword ptr [r9 + 8*r11 - 40]
        jne     .LBB2_1
        mov     rax, qword ptr [r9 + 8*r11 - 32]
        mov     ecx, dword ptr [rsi + 16]
        lea     rcx, [rcx + 8*rcx]
        shl     rcx, 5
        cmp     qword ptr [rax + rcx], 0
        jne     .LBB2_6
        add     rax, rcx
        add     rax, 8
        pop     rbx
        ret
.LBB2_7:
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
.LBB2_8:
        mov     esi, 43
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
.LBB2_6:
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_3]
        jmp     .LBB2_8

It would be good to have some performance tests for Rcs and BlockPtr in many different scenarios. Potentially we could also create something that works "like" Rc (e.g. using unsafe pointer arithmetic) and lets us leverage the knowledge about the structures we know we're going to use, which a generic Rc mechanism cannot do.


UB when bypassing immutability rules through `ptr::write`

Some parts of yrs bypass immutability through std::ptr::write:

https://github.com/yjs/y-crdt/blob/7cca0291ad1b743ff97478d8d9ea75d923724597/yrs/src/block.rs#L62-L65

https://github.com/yjs/y-crdt/blob/7cca0291ad1b743ff97478d8d9ea75d923724597/yrs/src/block.rs#L320-L325

(There may be more occurrences of this, but I haven't checked.)

If you have a reference &T, then normally the Rust compiler performs optimizations based on the knowledge that &T points to immutable data. Mutating that data, like in the functions above, is considered undefined behavior.

You can opt out of the immutability guarantee of &T through UnsafeCell.
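A minimal illustration of the sanctioned alternative: wrap the mutable field in UnsafeCell so the compiler knows the data behind &T may change (this is a toy demonstrating the pattern, not a patch for the yrs code):

```rust
use std::cell::UnsafeCell;

// Interior mutability declared via UnsafeCell: the compiler no longer
// assumes data behind &Slot is immutable, so mutation through a shared
// reference is sound, provided we uphold the aliasing rules ourselves.
struct Slot(UnsafeCell<u32>);

impl Slot {
    fn new(v: u32) -> Self { Slot(UnsafeCell::new(v)) }

    // Sound here only because nothing else reads or writes concurrently.
    fn set(&self, v: u32) {
        unsafe { *self.0.get() = v; }
    }

    fn get(&self) -> u32 {
        unsafe { *self.0.get() }
    }
}

fn main() {
    let slot = Slot::new(1);
    slot.set(42); // mutation through &self, without UB
    assert_eq!(slot.get(), 42);
}
```

Unlike ptr::write through a plain &T, this carries no "immutable data" assumption that the optimizer could exploit; it is the primitive that Cell and RefCell are built on.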
