Giter Club home page Giter Club logo

loro's People

Contributors

eltociear avatar gentle avatar imsingee avatar leeeon233 avatar zxch3n avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

loro's Issues

The current RleTree uses too much memory to represent the state of the text

Because bumpalo cannot deallocate, we may waste a lot of space if we use bumpalo to represent the state of text/list.

For example, after applying the Automerge paper dataset, the RleTree has

  • len=104852
  • InternalNodes=264
  • LeafNodes=1124
  • Elements=5755
  • Bytes(bump.allocated_bytes())=2095872

It takes about 2MB to represent the state. However, each node only has 80 bytes, and each element only takes 8 bytes.

Solutions

  • Option 1: Make Bump a generic allocator in RleTreeTrait
  • Option 2: Use another data structure to represent the state

Option 1 may be dirty and inefficient because we also need to wrap the type for the allocated element.

[LORO-117] Refactor event: lazy loading diff

We can calculate the actual diff only when users query the diff. It can speed up things a lot, especially when there is a deep observer on the root. But the downside is it requires much more time to load the diff afterward if we need to use the tracker to recalculate the effects.

A nice tradeoff would be that we keep the pointer to the diffing content inside the event without actually allocating new spaces to store the information. For example, we could store the string slices' ranges, instead of allocating a new String to store the data.

From SyncLinear.com | LORO-117

[LORO-291] Extract `InnerDiff` from `Diff`

Diff plays two roles in the current implementation:

  • Message from DiffCalculator to State
  • Message to users

The former can be more compact than the latter. The current implementation mingles them together.

LORO-291

[LORO-306] Extract utf16 feature

It's wasm feature in the code now. But utf16 is more descriptive. When this feature is enabled, users control the string and receive the events in utf16 index.

LORO-306

Use dag node id as version when encoding updates

Won't do it now because this method comes with a price. It requires that the other site has already seen the dag node. But it's not the case most of the time. So it should be provided as a version mark that requires additional documentation.

Recursive type

Maps and lists should be able to contain other containers.

It's different from the JS interface design. Because it also has to consider the ownership problem and what happens after the
document is dropped.

  • Add Context trait
  • Change existing map and text interface
    • Remove the weak field that points to LogStore / ContainerManager
  • Get value deep
  • To JSON string
  • List Container
  • Generalize insert and insert_obj parameters

References

Automerge-Rust

In Automerge-Rust, users can only get the id to the container (map/text/list). And users should use that id to query data inside or mutate the state of the container.

For example, to put an object inside another object:

    fn put_object<O: AsRef<ExId>, P: Into<Prop>>(
        &mut self,
        obj: O,
        prop: P,
        value: ObjType,
    ) -> Result<ExId, AutomergeError> {
        self.ensure_transaction_open();
        let tx = self.transaction.as_mut().unwrap();
        tx.put_object(&mut self.doc, obj.as_ref(), prop, value)
    }

So when the doc is dropped, users can no longer query or mutate the data inside containers.

Y-CRDT

It has prelim types, which can be created without interacting with the doc first. So every container in y-crdt has two states: prelim and internal.

The implementation of YMap::set in y-wasm is

    #[wasm_bindgen(js_name = set)]
    pub fn set(&self, txn: &mut YTransaction, key: &str, value: JsValue) {
        match &mut *self.0.borrow_mut() {
            SharedType::Integrated(v) => {
                v.insert(txn, key.to_string(), JsValueWrapper(value));
            }
            SharedType::Prelim(v) => {
                v.insert(key.to_string(), value);
            }
        }
    }

Under the hood, the value is a Perlim type, which can be created as a Container.

    /// Inserts a new `value` under given `key` into current map. Returns a value stored previously
    /// under the same key (if any existed).
    pub fn insert<K: Into<Rc<str>>, V: Prelim>(
        &self,
        txn: &mut Transaction,
        key: K,
        value: V,
    ) -> Option<Value> {
        let key = key.into();
        let previous = self.get(&key);
        let pos = {
            let inner = self.0;
            let left = inner.map.get(&key);
            ItemPosition {
                parent: inner.into(),
                left: left.cloned(),
                right: None,
                index: 0,
                current_attrs: None,
            }
        };

        txn.create_item(&pos, value, Some(key));
        previous
    }

To avoid document dropping before the container, the mutation functions must have a param of YTransaction. And struct YTransaction(Transaction) holds an Rc to the internal store.

pub struct Transaction {
    /// Store containing the state of the document.
    store: Rc<UnsafeCell<Store>>,
    /// State vector of a current transaction at the moment of its creation.
    pub before_state: StateVector,
    /// Current state vector of a transaction, which includes all performed updates.
    pub after_state: StateVector,
    /// ID's of the blocks to be merged.
    pub(crate) merge_blocks: Vec<BlockPtr>,
    /// Describes the set of deleted items by ids.
    pub delete_set: DeleteSet,
    /// All types that were directly modified (property added or child inserted/deleted).
    /// New types are not included in this Set.
    changed: HashMap<TypePtr, HashSet<Option<Rc<str>>>>,
    committed: bool,
}

Provide a way for map to initialize its schema

Commonly, a map container has a schema. There is no way to initialize the schema in the current version. It's fine if the values of the entries are JSON values. But it may be problematic if the value is a Container.

Consider the following example.

struct TodoList {
    todos: List
}

Users may initialize the todos field concurrently and insert a to-do item concurrently. However, in the current implementation, one user's edits will override the other's and thus only have a to-do item in the list.

The current workaround in the application code is to initialize all fields immediately after creating the map. But by using this approach, there is still potential data loss when the schema is changed.

Solution

We may need alias of container to support this feature. i.e., there may exist two different ids that point to the same container.

Make Bumpalo optional in RleTree

The current RleTree implementation is heavily using bumpalo. But it doesn't suit the cases where RleTree is kept in the memory for a long time. Because bumpalo never deallocates or reuses allocated memory. See #8

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.