loro-dev / loro

Reimagine state management with CRDTs. Make your app collaborative effortlessly.

Home Page: https://loro.dev

License: MIT License

Rust 96.22% TypeScript 3.62% Shell 0.01% Java 0.06% C++ 0.02% Go 0.02% Python 0.01% JavaScript 0.05%

collaborative-editing crdt local-first offline-first p2p privacy-first rich-text

loro's Issues

[LORO-284] Find a way to avoid users calling get_map with an invalid id

We should be able to detect the misuse of the API and panic.

_LORO-284

Standardize encoding

Add regression tests for encodings

The current RleTree uses too much memory to represent the state of the text

Because bumpalo cannot deallocate, we may waste a lot of space if we use bumpalo to represent the state of text/list.

For example, after applying the Automerge paper dataset, the RleTree has

len=104852
InternalNodes=264
LeafNodes=1124
Elements=5755
Bytes(bump.allocated_bytes())=2095872

It takes about 2MB to represent the state. However, each node only has 80 bytes, and each element only takes 8 bytes.
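
The gap can be made concrete with a quick calculation. The sketch below only redoes the arithmetic from the figures quoted above; it assumes nodes and elements are the only live data, which may undercount if nodes carry extra buffers. The function names are illustrative.

```rust
/// Figures reported in the issue above.
const INTERNAL_NODES: usize = 264;
const LEAF_NODES: usize = 1124;
const ELEMENTS: usize = 5755;
const BUMP_ALLOCATED_BYTES: usize = 2_095_872;

/// Sizes quoted in the issue: 80 bytes per node, 8 bytes per element.
const NODE_SIZE: usize = 80;
const ELEMENT_SIZE: usize = 8;

/// Bytes needed if only the live nodes and elements were stored.
pub fn live_bytes() -> usize {
    (INTERNAL_NODES + LEAF_NODES) * NODE_SIZE + ELEMENTS * ELEMENT_SIZE
}

/// How many times larger the bump arena is than the live data.
pub fn overhead_ratio() -> f64 {
    BUMP_ALLOCATED_BYTES as f64 / live_bytes() as f64
}
```

Under these assumptions the live data is 157,080 bytes, so the arena holds roughly 13 times more memory than the state needs.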

Solutions

Option 1: Make Bump a generic allocator in RleTreeTrait
Option 2: Use another data structure to represent the state

Option 1 may be dirty and inefficient because we also need to wrap the type for the allocated element.

[LORO-310] Access elem by path

_LORO-310

[LORO-140] [LORO-141] Support pending changes

_LORO-141

[LORO-117] Refactor event: lazy loading diff

We can calculate the actual diff only when users query the diff. It can speed up things a lot, especially when there is a deep observer on the root. But the downside is it requires much more time to load the diff afterward if we need to use the tracker to recalculate the effects.

A good tradeoff would be to keep a pointer to the diff content inside the event without allocating new space to store it. For example, we could store the ranges of the string slices instead of allocating a new String to hold the data.
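
A minimal sketch of the range-based idea, assuming the inserted text already lives in some shared buffer that outlives the event. `LazyTextInsert` and its fields are hypothetical names, not types from the codebase.

```rust
use std::ops::Range;

/// Hypothetical event payload: records where the inserted text lives
/// instead of owning a copy of it.
pub struct LazyTextInsert {
    /// Position in the document where the text was inserted.
    pub pos: usize,
    /// Byte range of the inserted text inside the shared buffer.
    pub source_range: Range<usize>,
}

impl LazyTextInsert {
    /// Resolves the text on demand. This borrows from `buffer`
    /// and does not allocate.
    pub fn text<'a>(&self, buffer: &'a str) -> &'a str {
        &buffer[self.source_range.clone()]
    }
}
```

An observer that never reads the text pays only for the range; one that does read it gets a borrowed slice.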

_{From SyncLinear.com | LORO-117}

[LORO-139] Peritext

Depends on inkandswitch/peritext#31 and #9

_LORO-139

CI: benchmark report

bench: native benchmark for general CRDT framework

feat: transaction

[LORO-298] Speed up event compose

This may produce a large number of updates, and updates become slow when a listener is enabled.

_{From SyncLinear.com | LORO-298}

[LORO-285] Load changes from pending changes when they are ready

_LORO-285

[LORO-294] GC in state snapshot

A container's state can be ignored if any of its ancestors is deleted. This may be tricky to implement because of time travel.
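
The reachability check itself could look like the sketch below. It assumes a simple child-to-parent map with no cycles and a set of deleted containers; it deliberately ignores time travel, which is the hard part noted above. All names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

type ContainerId = u32;

/// Returns true if `id` or any of its ancestors is in `deleted`.
/// Assumes `parent` maps each child to its parent and has no cycles.
pub fn is_unreachable(
    id: ContainerId,
    parent: &HashMap<ContainerId, ContainerId>,
    deleted: &HashSet<ContainerId>,
) -> bool {
    let mut cur = Some(id);
    while let Some(c) = cur {
        if deleted.contains(&c) {
            return true;
        }
        cur = parent.get(&c).copied();
    }
    false
}

/// Filters out the containers whose state a snapshot could skip.
pub fn states_to_keep(
    all: &[ContainerId],
    parent: &HashMap<ContainerId, ContainerId>,
    deleted: &HashSet<ContainerId>,
) -> Vec<ContainerId> {
    all.iter()
        .copied()
        .filter(|&id| !is_unreachable(id, parent, deleted))
        .collect()
}
```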

_LORO-294

[LORO-297] Snapshot encoding improve - style anchor

We can share start & end anchors between OpLog and DocState

_LORO-297

[LORO-300] Fractional index

This enables users to move elements easily. We can integrate it into the tree node.
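
For readers unfamiliar with the technique, here is one common way to generate a fractional index: keys are byte strings compared lexicographically, and a new key is always created strictly between two neighbors. This is a generic sketch, not the design this issue settled on; `key_between` is a hypothetical name, and it assumes keys never end in a zero byte (which holds for every key it produces).

```rust
/// Returns a key strictly between `lo` and `hi` in lexicographic byte order.
/// `hi = None` means "no upper bound"; an empty `lo` means "no lower bound".
/// Assumes `lo < hi` and that neither key ends in a zero byte.
pub fn key_between(lo: &[u8], hi: Option<&[u8]>) -> Vec<u8> {
    let mut out = Vec::new();
    // Becomes None once the output prefix is already strictly below `hi`.
    let mut hi = hi;
    let mut i = 0;
    loop {
        // Missing digits of `lo` count as 0; a missing bound counts as 256.
        let l = lo.get(i).copied().unwrap_or(0) as u16;
        let h = match hi {
            Some(bound) => *bound.get(i).expect("precondition violated: lo must be < hi") as u16,
            None => 256,
        };
        assert!(h >= l, "precondition violated: lo must be < hi");
        if h > l + 1 {
            // There is room at this digit: take the midpoint and stop.
            out.push(((l + h) / 2) as u8);
            return out;
        }
        // No room here: copy the lower digit and look one digit deeper.
        out.push(l as u8);
        if h == l + 1 {
            hi = None;
        }
        i += 1;
    }
}
```

Moving an element then amounts to assigning it a fresh key between its new neighbors, without touching any other element.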

_LORO-300

[LORO-301] Mark and delete unmergeable styles like comment

The deletion of such styles is different from the mergeable ones.

_LORO-301

[LORO-302] Getting a nonexistent container should return Err

Currently, nothing prevents this when using getText, getMap...

_LORO-302

[LORO-282] [LORO-283] Add fuzzing tests for version checkout

_LORO-283

Need to emit events involving child container re-creation

[LORO-303] Allow users to control the timestamp interval to merge change automatically

_LORO-303

[LORO-291] Extract `InnerDiff` from `Diff`

Diff plays two roles in the current implementation:

- Message from DiffCalculator to State
- Message to users

The former can be more compact than the latter. The current implementation mingles them together.

_LORO-291

Refactor: move cache to parent node

[LORO-288] [LORO-289] Use `override` linked list to store the linear cache history within a register

In the current implementation, we must go through all the history to calculate map diff if we are going backward #106

It's slow and the cache takes lots of memory. It can be more efficient with a linked list.
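
A rough sketch of the linked-list idea, assuming a strictly linear history per register: each write stores a link to the write it overrode, so stepping backward follows one link instead of replaying history. The type and method names are hypothetical.

```rust
/// One write to a register. `overrides` points at the write it replaced.
pub struct Entry<V> {
    pub value: V,
    pub overrides: Option<usize>,
}

/// Linear history of a single register, stored as an index-linked list.
pub struct RegisterHistory<V> {
    entries: Vec<Entry<V>>,
    head: Option<usize>,
}

impl<V> RegisterHistory<V> {
    pub fn new() -> Self {
        RegisterHistory { entries: Vec::new(), head: None }
    }

    /// Records a new write that overrides the current head.
    pub fn set(&mut self, value: V) {
        let idx = self.entries.len();
        self.entries.push(Entry { value, overrides: self.head });
        self.head = Some(idx);
    }

    /// The value currently visible, if any.
    pub fn current(&self) -> Option<&V> {
        self.head.map(|h| &self.entries[h].value)
    }

    /// Moves one write backward by following a single link.
    pub fn step_back(&mut self) -> Option<&V> {
        if let Some(h) = self.head {
            self.head = self.entries[h].overrides;
        }
        self.current()
    }
}
```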

_LORO-289

Integrate miri test to be part of CI

[LORO-312] Fuzzing tests on chars that take more than 1 byte

_LORO-312

[LORO-306] Extract utf16 feature

It is currently the wasm feature in the code, but utf16 is a more descriptive name. When this feature is enabled, users address the string and receive events using UTF-16 indices.
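
To illustrate why the index unit matters, the sketch below converts a UTF-16 code-unit index into a UTF-8 byte offset using only the standard library. `utf16_to_byte_offset` is a hypothetical helper, not a function from the codebase.

```rust
/// Converts a UTF-16 code-unit index into a UTF-8 byte offset in `s`.
/// Returns None if the index is past the end or falls inside a
/// surrogate pair.
pub fn utf16_to_byte_offset(s: &str, utf16_index: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_offset, ch) in s.char_indices() {
        if units == utf16_index {
            return Some(byte_offset);
        }
        if units > utf16_index {
            // The index pointed into the middle of the previous char.
            return None;
        }
        units += ch.len_utf16();
    }
    if units == utf16_index {
        Some(s.len())
    } else {
        None
    }
}
```

For a string such as "a😀b", the emoji takes two UTF-16 code units but four UTF-8 bytes, so the two index spaces diverge right after it.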

_LORO-306

Use dag node id as version when encoding updates

Won't do it now because this method comes at a price: it requires that the other site has already seen the dag node, which is not the case most of the time. So it should be provided as a version mark that requires additional documentation.

[LORO-292] Extract all str utils to a crate

_LORO-292

[LORO-290] Remove old encode mode

_LORO-290

[LORO-295] Use simpler list event

_LORO-295

test(core): add a check function to check the Hierarchy correctness

[LORO-307] Provide iterator, keys, values for map & list

_LORO-307

[LORO-305] (richtext)Don't need to create new marks if the styles are the same

_LORO-305

refactor: Move hierarchy to LoroCore

[LORO-304] Allow users to configure the default encode option

_LORO-304

[LORO-311] Mark whether event is triggered by checkout

_LORO-311

[LORO-138] How changes merge should be consistent on every site

_{From SyncLinear.com | LORO-138}

Recursive type

Maps and lists should be able to contain other containers.

It's different from the JS interface design because it also has to consider the ownership problem and what happens after the document is dropped.

- Add Context trait
- Change existing map and text interface
  - Remove the weak field that points to LogStore / ContainerManager
- Get value deep
- To JSON string
- List Container
- Generalize insert and insert_obj parameters

References

Automerge-Rust

In Automerge-Rust, users can only get the id of a container (map/text/list), and they use that id to query the data inside the container or to mutate its state.

For example, to put an object inside another object:

    fn put_object<O: AsRef<ExId>, P: Into<Prop>>(
        &mut self,
        obj: O,
        prop: P,
        value: ObjType,
    ) -> Result<ExId, AutomergeError> {
        self.ensure_transaction_open();
        let tx = self.transaction.as_mut().unwrap();
        tx.put_object(&mut self.doc, obj.as_ref(), prop, value)
    }

So when the doc is dropped, users can no longer query or mutate the data inside containers.
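
The ownership consequence of the id-based style can be sketched as follows. This is a simplified illustration, not Automerge's API: `Doc`, `MapId`, `create_map`, `put`, and `get` are hypothetical names. Because the id is plain data, every access has to go through the document, and an id the document does not know yields an error rather than a dangling handle.

```rust
use std::collections::HashMap;

/// Opaque handle to a map container. It carries no reference to the doc.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct MapId(u32);

/// The document owns every container; users only hold ids.
#[derive(Default)]
pub struct Doc {
    maps: HashMap<MapId, HashMap<String, String>>,
    next_id: u32,
}

impl Doc {
    pub fn create_map(&mut self) -> MapId {
        let id = MapId(self.next_id);
        self.next_id += 1;
        self.maps.insert(id, HashMap::new());
        id
    }

    /// Mutation goes through the doc; an unknown id is reported as Err.
    pub fn put(&mut self, id: MapId, key: &str, value: &str) -> Result<(), String> {
        let map = self
            .maps
            .get_mut(&id)
            .ok_or_else(|| format!("unknown container {:?}", id))?;
        map.insert(key.to_string(), value.to_string());
        Ok(())
    }

    pub fn get(&self, id: MapId, key: &str) -> Option<&str> {
        self.maps.get(&id)?.get(key).map(|s| s.as_str())
    }
}
```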

Y-CRDT

It has prelim types, which can be created without interacting with the doc first. So every container in y-crdt has two states: prelim and internal.

The implementation of YMap::set in y-wasm is

    #[wasm_bindgen(js_name = set)]
    pub fn set(&self, txn: &mut YTransaction, key: &str, value: JsValue) {
        match &mut *self.0.borrow_mut() {
            SharedType::Integrated(v) => {
                v.insert(txn, key.to_string(), JsValueWrapper(value));
            }
            SharedType::Prelim(v) => {
                v.insert(key.to_string(), value);
            }
        }
    }

Under the hood, the value is a Prelim type, which can be created as a Container.

    /// Inserts a new `value` under given `key` into current map. Returns a value stored previously
    /// under the same key (if any existed).
    pub fn insert<K: Into<Rc<str>>, V: Prelim>(
        &self,
        txn: &mut Transaction,
        key: K,
        value: V,
    ) -> Option<Value> {
        let key = key.into();
        let previous = self.get(&key);
        let pos = {
            let inner = self.0;
            let left = inner.map.get(&key);
            ItemPosition {
                parent: inner.into(),
                left: left.cloned(),
                right: None,
                index: 0,
                current_attrs: None,
            }
        };

        txn.create_item(&pos, value, Some(key));
        previous
    }

To prevent the document from being dropped before the container, the mutation functions must take a YTransaction parameter, and struct YTransaction(Transaction) holds an Rc to the internal store.

    pub struct Transaction {
        /// Store containing the state of the document.
        store: Rc<UnsafeCell<Store>>,
        /// State vector of a current transaction at the moment of its creation.
        pub before_state: StateVector,
        /// Current state vector of a transaction, which includes all performed updates.
        pub after_state: StateVector,
        /// ID's of the blocks to be merged.
        pub(crate) merge_blocks: Vec<BlockPtr>,
        /// Describes the set of deleted items by ids.
        pub delete_set: DeleteSet,
        /// All types that were directly modified (property added or child inserted/deleted).
        /// New types are not included in this Set.
        changed: HashMap<TypePtr, HashSet<Option<Rc<str>>>>,
        committed: bool,
    }
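
The effect of holding an Rc in the transaction can be shown with a reduced model. This sketch swaps `UnsafeCell` for `RefCell` to stay in safe Rust, and `Doc`, `Store`, `transact`, `set`, and `get` are simplified stand-ins rather than y-crdt's real types.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for the document's internal store.
#[derive(Default)]
pub struct Store {
    pub map: HashMap<String, String>,
}

pub struct Doc {
    store: Rc<RefCell<Store>>,
}

/// The transaction shares ownership of the store with the doc.
pub struct Transaction {
    store: Rc<RefCell<Store>>,
}

impl Doc {
    pub fn new() -> Self {
        Doc { store: Rc::new(RefCell::new(Store::default())) }
    }

    /// Hands out a transaction that keeps the store alive on its own.
    pub fn transact(&self) -> Transaction {
        Transaction { store: Rc::clone(&self.store) }
    }
}

impl Transaction {
    pub fn set(&mut self, key: &str, value: &str) {
        self.store
            .borrow_mut()
            .map
            .insert(key.to_string(), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.store.borrow().map.get(key).cloned()
    }
}
```

In this model a transaction created before the doc is dropped can still read and write afterward, because the store is freed only when the last Rc goes away.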

refactor: reduce rle unsafe scope

Provide a way for map to initialize its schema

Commonly, a map container has a schema. There is no way to initialize the schema in the current version. It's fine if the values of the entries are JSON values. But it may be problematic if the value is a Container.

Consider the following example.

    struct TodoList {
        todos: List
    }

Two users may concurrently initialize the todos field and each insert a to-do item. However, in the current implementation, one user's edits will override the other's, so the list ends up with only one to-do item.

The current workaround in the application code is to initialize all fields immediately after creating the map. But with this approach, data can still be lost when the schema changes.
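
The data loss can be reproduced in a toy model. The sketch below assumes the map entry behaves like a last-writer-wins register ordered by (lamport, peer); that ordering is an assumption for illustration, not a statement of the actual conflict rule. `Entry`, `merge`, and `visible_todos` are hypothetical names.

```rust
use std::collections::HashMap;

/// A map entry that points at a child list container.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub list_id: String,
    pub lamport: u32,
    pub peer: u32,
}

/// Assumed last-writer-wins rule: the larger (lamport, peer) pair wins.
pub fn merge(a: &Entry, b: &Entry) -> Entry {
    if (a.lamport, a.peer) >= (b.lamport, b.peer) {
        a.clone()
    } else {
        b.clone()
    }
}

/// Only the items of the winning list are reachable from the map.
pub fn visible_todos(lists: &HashMap<String, Vec<String>>, winner: &Entry) -> Vec<String> {
    lists.get(winner.list_id.as_str()).cloned().unwrap_or_default()
}
```

If two peers each create their own list for `todos` and add one item, the merged map points at only one of the two lists, so the other peer's item is no longer visible even though its operations still exist.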