
automerge's Introduction

Automerge



Automerge is a library which provides fast implementations of several different CRDTs, a compact compression format for these CRDTs, and a sync protocol for efficiently transmitting those changes over the network. The objective of the project is to support local-first applications in the same way that relational databases support server applications - by providing mechanisms for persistence which allow application developers to avoid thinking about hard distributed computing problems. Automerge aims to be PostgreSQL for your local-first app.

If you're looking for documentation on the JavaScript implementation, take a look at https://automerge.org/docs/hello/. There are other implementations in both Rust and C, but they are at an earlier stage and don't have documentation yet. You can find them in rust/automerge and rust/automerge-c if you are comfortable reading the code and tests to figure out how to use them.

If you're familiar with CRDTs and interested in the design of Automerge in particular take a look at https://automerge.org/automerge-binary-format-spec.

Finally, if you want to talk to us about this project please join our Discord server!

Status

This project is formed of a core Rust implementation which is exposed via FFI in JavaScript+WASM, C, and soon other languages. Alex (@alexjg) is working full time on maintaining Automerge; other members of Ink & Switch are also contributing time, and there are several other maintainers. The focus is currently on shipping the new JS package. We expect to be iterating the API and adding new features over the next six months, so there will likely be several major version bumps in all packages in that time.

In general we try to respect semver.

JavaScript

A stable release of the JavaScript package is currently available as @automerge/automerge. Pre-release versions of 2.0.1 are available as 2.0.1-alpha.n, and 2.0.1* packages are also available for Deno at https://deno.land/x/automerge

Rust

The Rust codebase is currently oriented around producing a performant backend for the JavaScript wrapper, and as such the API for Rust code is low level and not well documented. We will be returning to this over the next few months, but for now you will need to be comfortable reading the tests and asking questions to figure out how to use it. If you are looking to build Rust applications which use Automerge, you may want to look into autosurgeon.

Repository Organisation

  • ./rust - the Rust implementation and the Rust components of platform-specific wrappers (e.g. automerge-wasm for the WASM API or automerge-c for the C FFI bindings)
  • ./javascript - the JavaScript library, which uses automerge-wasm internally but presents a more idiomatic JavaScript interface
  • ./scripts - scripts which are useful for maintenance of the repository, including the scripts which are run in CI
  • ./img - static assets for use in .md files

Building

To build this codebase you will need:

  • rust
  • node
  • yarn
  • cmake
  • cmocka

You will also need to install the following with cargo install:

  • wasm-bindgen-cli
  • wasm-opt
  • cargo-deny

And ensure you have added the wasm32-unknown-unknown target for rust cross-compilation.

The various subprojects (the rust code, the wrapper projects) have their own build instructions, but to run the tests that will be run in CI you can run ./scripts/ci/run.

For macOS

These instructions worked to build locally on macOS 13.1 (arm64) as of Nov 29th 2022.

# clone the repo
git clone https://github.com/automerge/automerge
cd automerge

# install rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# install cmake, node, cmocka
brew install cmake node cmocka

# install yarn
npm install --global yarn

# install javascript dependencies
yarn --cwd ./javascript

# install rust dependencies
cargo install wasm-bindgen-cli wasm-opt cargo-deny

# get nightly rust to produce optimized automerge-c builds
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly

# add wasm target in addition to current architecture
rustup target add wasm32-unknown-unknown

# Run ci script
./scripts/ci/run

If your build fails to find cmocka.h you may need to teach it about homebrew's installation location:

export CPATH=/opt/homebrew/include
export LIBRARY_PATH=/opt/homebrew/lib
./scripts/ci/run

Contributing

Please try to split your changes up into relatively independent commits which change one subsystem at a time, and add good commit messages which describe what the change is and why you're making it (err on the side of longer commit messages). git blame should give future maintainers a good idea of why something is the way it is.

Releasing

There are four artefacts in this repository which need releasing:

  • The @automerge/automerge NPM package
  • The @automerge/automerge-wasm NPM package
  • The automerge Deno package
  • The automerge rust crate

JS Packages

The NPM and Deno packages are all released automatically by CI tooling whenever the version number in the respective package.json changes. This means that the process for releasing a new JS version is:

  1. Bump the version in the rust/automerge-wasm/package.json (skip this if there are no new changes to the WASM)
  2. Bump the version of @automerge/automerge-wasm we depend on in javascript/package.json
  3. Bump the version in @automerge/automerge also in javascript/package.json

Put all of these bumps in a PR and wait for a clean CI run, then merge the PR. The CI tooling will pick up a push to main with a new version and publish it to NPM. This depends on an access token available as NPM_TOKEN in the actions environment; this token is generated with a 30-day expiry date, so it needs refreshing manually every so often.

Rust Package

This is much easier, but less automatic. The steps to release are:

  1. Bump the version in automerge/Cargo.toml
  2. Push a PR and merge once clean
  3. Tag the release as rust/automerge@<version>
  4. Push the tag to the repository
  5. Publish the release with cargo publish

automerge's People

Contributors

acurrieclark, alexjg, bluebear94, chrstnb, conradirwin, dependabot[bot], ept, heckj, herbcaudill, jeffa5, jeromegn, jkankiewicz, kid-icarus, liranuna, marijns95, okdistribute, orionz, pajowu, patrykryczko, philschatz, pvh, rf-, scotttrinh, stephen, svnhub, teohhanhui, threepointone, tombh, tosti007, wjt


automerge's Issues

Should the frontend return a users value?

The frontend change function takes a closure: https://github.com/automerge/automerge-rs/blob/d3d1b48c4889e374b58e723c16b3c13a5a5d30bb/automerge-frontend/src/lib.rs#L382-L389

This type forces the user to return a unit value in the case of success, but they may want to do something with a value from inside this closure. This forces them into a rather ugly style:

let mut my_val = None; // or default
frontend.change(None, |doc| {
  my_val = doc.get(some_path);
  Ok(())
});

Which isn't particularly idiomatic.

I propose changing the signature of the change function to something like:

pub fn change<F, O, E>( 
     &mut self, 
     message: Option<String>, 
     change_closure: F, 
 ) -> Result<(O, Option<UncompressedChange>), E> 
 where 
     E: Error, 
     F: FnOnce(&mut dyn MutableDocument) -> Result<O, E>,

allowing more idiomatic code such as:

let (my_val, change) = frontend.change(None, |doc| {
  Ok(doc.get(some_path))
}).unwrap();

What do you think?

InlineArray can't satisfy alignment of (OpId, Option<MultiValue>)

Using the wasm32-unknown-unknown target I'm getting the following error:

---- automerge::tests::make_change output ----
    error output:
        panicked at 'InlineArray can't satisfy alignment of (automerge_protocol::OpId, core::option::Option<automerge_frontend::state_tree::multivalue::MultiValue>)', /home/andrew/.cargo/registry/src/github.com-1ecc6299db9ec823/sized-chunks-0.6.4/src/inline_array/mod.rs:252:9

        Stack:

        Error
            at /home/andrew/projects/automerge-rs/target/wasm32-unknown-unknown/wbg-tmp/wasm-bindgen-test.js:486:15
            at /home/andrew/projects/automerge-rs/target/wasm32-unknown-unknown/wbg-tmp/wasm-bindgen-test.js:156:22
            at console_error_panic_hook::Error::new::h91749f1d03d15f28 (<anonymous>:wasm-function[5478]:0x16d845)
            at console_error_panic_hook::hook_impl::h9cd40e14baf7f3fc (<anonymous>:wasm-function[688]:0xca6aa)
            at console_error_panic_hook::hook::h5dfdb959295c624c (<anonymous>:wasm-function[5947]:0x174076)
            at core::ops::function::Fn::call::h31c60337fc622960 (<anonymous>:wasm-function[4975]:0x165b82)
            at std::panicking::rust_panic_with_hook::h4e1267e42c34e062 (<anonymous>:wasm-function[1511]:0x1052e8)
            at std::panicking::begin_panic_handler::{{closure}}::h55e6bb589840bdac (<anonymous>:wasm-function[2221]:0x1242e2)
            at std::sys_common::backtrace::__rust_end_short_backtrace::h404ad66b7b407ddd (<anonymous>:wasm-function[6386]:0x1797a4)
            at rust_begin_unwind (<anonymous>:wasm-function[5777]:0x171c35)



    JS exception that was thrown:
        RuntimeError: unreachable
            at __rust_start_panic (<anonymous>:wasm-function[7458]:0x1819fa)
            at rust_panic (<anonymous>:wasm-function[6656]:0x17c576)
            at std::panicking::rust_panic_with_hook::h4e1267e42c34e062 (<anonymous>:wasm-function[1511]:0x10530c)
            at std::panicking::begin_panic_handler::{{closure}}::h55e6bb589840bdac (<anonymous>:wasm-function[2221]:0x1242e2)
            at std::sys_common::backtrace::__rust_end_short_backtrace::h404ad66b7b407ddd (<anonymous>:wasm-function[6386]:0x1797a4)
            at rust_begin_unwind (<anonymous>:wasm-function[5777]:0x171c35)
            at std::panicking::begin_panic_fmt::h22dee889bdf62353 (<anonymous>:wasm-function[6345]:0x17903d)
            at sized_chunks::inline_array::InlineArray<A,T>::new::hdf3e4768efe13f61 (<anonymous>:wasm-function[111]:0x56d7f)
            at im_rc::vector::Vector<A>::new::hd87172de133af21d (<anonymous>:wasm-function[835]:0xd8140)
            at <im_rc::vector::Vector<A> as core::iter::traits::collect::FromIterator<A>>::from_iter::h269fa57c077ba18c (<anonymous>:wasm-function[505]:0xb4722)

This is reproducible using my branch (and specifically commit) here.

I'd be interested to see other people reproduce it, and if anyone has an idea as to why it is occurring or how to fix it, that would be great.

It seems to stem from https://github.com/jeffa5/automerge-rs/blob/5a311e507b110ca5759aeb8c5bc0cad70cbda8e5/automerge-frontend/src/state_tree/diffable_sequence.rs#L120 as changing this to be a normal Vec fixes it.

Browser/Node.js packaging story

I've been thinking about the best way to go about packaging automerge-rs such that it can become the One True Automerge Backend, used by default in mainline automerge, and installable with no more fuss than an npm install. Here's what I've learned so far, and what I think the path forward could be.

The majority of the groundwork for this is already laid by the automerge-backend-wasm crate, using wasm-pack. The tricky part is getting the wasm bundle that wasm-pack generates to function seamlessly in the various JS environments that automerge currently supports (i.e. Node.js and web browsers via a bundler such as webpack). The tooling in this area seems to be relatively immature (see e.g. rustwasm/wasm-bindgen#2265), so sadly this is cutting-edge rust+wasm+js technology. I expect this story to improve significantly in the coming years.

wasm-pack is able to generate output for a variety of different targets, most importantly, "bundler", "nodejs" and "web". Notably, none of these output formats work for both Node.js and the browser. See rustwasm/wasm-pack#705 for a long discussion and a variety of workarounds. In particular, I think this workaround is probably the best way forward for automerge-rs: rustwasm/wasm-pack#705 (comment)

e.g.

package.json:

{
  "name": "automerge-backend-wasm",
  "files": [
    "pkg-bundler/index_bg.wasm",
    "pkg-bundler/index.js",
    "pkg-bundler/index.d.ts",
    "pkg-node/index_bg.wasm",
    "pkg-node/index.js",
    "pkg-node/index_bg.js",
    "pkg-node/index.d.ts"
  ],
  "main": "pkg-node/index.js",
  "module": "pkg-bundler/index.js",
  "types": "pkg-bundler/index.d.ts",
  "sideEffects": false
}

and then building with:

wasm-pack build --target bundler --out-dir pkg-bundler
wasm-pack build --target nodejs --out-dir pkg-node

The downside of this approach is that it builds two distinct wasm bundles. The bundles are nearly identical but differ in a few minor details. This is a fairly minor downside I think, as in a production app, in Node.js install time is not a big concern, and in a web app, the bundler will only pull the web version of the wasm bundle to the client.

With this in place, the following should work:

import wasmBackend from 'automerge-backend-wasm'
import Automerge from 'automerge'

wasmBackend.initCodecFunctions(Automerge)
Automerge.setDefaultBackend(wasmBackend)

// ... use Automerge, but it's fast now

With node.js it works out of the box, but with webpack you need to enable the experimental asyncWebAssembly feature.

Ideally all this would be under the hood in automerge and simply importing stock automerge would get you wasm-backed goodness. Unfortunately, the ecosystem is just not there yet. However, I think we can still support automerge-rs as the primary backend for automerge via asm.js. The setup we'd need to accomplish this is a little arcane in order to avoid needing complicated "tree-shaking" / dead-code-elimination strategies to ship the right code to clients, but I think during the transitional period, the package structure of automerge could look like this:

automerge
└── automerge-backend-asmjs

automerge-wasm
└── automerge-backend-wasm

i.e. two automerge main packages, the "default" one backed by automerge-rs compiled to asmjs, and a separate one backed by the wasm version, for those who want to make the tradeoff of dealing with wasm bundling issues in exchange for performance improvements.

Another possibility could be to arrange the packages like this:

automerge
├── automerge-backend-asmjs
└── automerge-backend-wasm

and have require('automerge') return an asmjs-backed API. To pull in the wasm version, you could require('automerge/wasm').

The API I'd really prefer here is something like the following:

const Automerge = require('automerge')
Automerge.setBackend(require('automerge-backend-wasm'))

However, I don't see a path to having a "pluggable" backend that doesn't do one of the following undesirable things:

  1. Increase boilerplate for all users of automerge (i.e. require "registering" a backend)
  2. Always ship the default backend to clients, even if you never use it

Hence the various package arrangement strategies above.

FFI interface and reuse from higher level languages. Design question.

I would like to use this from not just JS but other languages as well.

Our project currently uses OT, not CRDT, so we are stuck with a central server in order to maintain ordering.

But we are now going P2P with no central server, and so are moving to CRDTs.

The clients are Dart and the backend SaaS layer can be anything, but tends to be Go. So in this multi-language world, I was wondering if Protobuf would be a possibility?

Protobufs can "embed" other protobufs these days by using the Any first-class type, which is useful when you want a primary data plane using protobufs that can carry other people's protobufs.

I know this is really an issue talking about architecture, and I am proposing a possible solution to the reuse problem.

Rust is a really good choice and so is WASM. Others are also embracing it.
https://github.com/envoyproxy/envoy-wasm

  • So you can build massive systems where the dataplane itself can do CRDT for example.
  • We use this currently for various things we need to do like Privacy protection

New encoding/decoding breaks downstream libraries.

The encoding::Error and decoding::Error are not added to AutomergeError so they cannot be caught by consuming libraries in general.

Here was my quick hack to fix this issue locally in (automerge-backend/src/lib.rs)

pub use encoding::Error as EncodingError;
pub use decoding::Error as DecodingError;

I don't know much about the structure of automerge-rs or Rust in general, but it might make sense to take these up in the errors.rs file. Right now only the encoding::Error and decoding::Error from the external crates are being considered; we need to be able to respond to the errors being thrown from the encoding and decoding modules.

Change Value::Map to just be a map and add Value::Table

Currently the frontend's Value enum has one Map variant which represents both maps and tables. While these are both map types, we don't do the same for the two sequence types (sequence and text). I also find it awkward to always have to add the MapType::Map when I'm only working with plain maps.

Adding a new variant and updating the existing map one shouldn't be too invasive while making things a bit clearer too.

Thoughts?

Should the backend be Send + Sync?

Given the ideas proposing to have a single backend with multiple frontends I think it would make sense for the backend to be Send + Sync. This means we could send a backend across threads as well as use it in shared memory settings with, for example, a Mutex. I can see this being particularly useful for server implementations in Rust where clients for a document may not always land on the same thread.

Basically this comes down to replacing instances of Rc with either Arc or removing them if they aren't necessary. Also we'd need to switch from im_rc to im which has a minor performance penalty I believe.

I'm not sure of the impact of this on wasm but think it should still work the same. I've got an example branch of this here

Stub out WASM interface

Create a WASM interface with all the calls that the javascript frontend requires of the backend in the automerge-wasm crate.

CI for ensuring `no_std` compatibility

I've been using this method to make sure that automerge-backend and automerge-protocol build in a no_std environment. This is a little tricky to get working with cargo workspaces so I haven't automated it yet but we should. I think what we would need to do for this to work in CI is to create a new project at test time which sits outside of the workspace root and references automerge-backend and automerge-protocol by path.

Arbitrary structs with custom serialization

Would it be possible to provide a frontend that works with arbitrary Rust structs? Maybe treediff could be helpful for this.

We would like to use Automerge for a project storing mostly coordinates and binary data. JSON isn't very suitable for us, hence we would like to use ordinary structs. These structs will be serializable to Cap'nProto, which is structurally similar to JSON. Instead of providing an Automerge frontend for Cap'nProto, it should be possible to specify a trait that has to be implemented, comparing and serializing its data. Also, conflicts would have to be managed through a callback, I presume.

I should also mention that this project will run natively in Rust and is not associated with a web browser.

C API missing some functions

The following Backend API is not exposed.

function getChanges(state: BackendState, haveDeps: Hash[]): Uint8Array[]
function getMissingDeps(state: BackendState): Hash[]

Is this based on the new document format?

I have been following the automerge project for quite a while. I am still waiting for a stable document format. It looks like the work on the performance branch is related to this new format. Are you already implementing the new format in Rust, or still the old one?

Nevertheless, great idea bringing Automerge to Rust. I am looking forward to using it in an iOS app.

Should handle concurrent deletion of the same element

I use the C API inside the Swift implementation. The following test crashes.

// should handle concurrent deletion of the same element
    func testConcurrentUse14() {
        struct Scheme: Codable, Equatable {
            var birds: [String]
        }
        var s1 = Document(Scheme(birds: ["albatross", "buzzard", "cormorant"]))
        var s2 = Document<Scheme>(changes: s1.allChanges())
        s1.change {
            $0.birds.remove(at: 1)
        }
        s2.change {
            $0.birds.remove(at: 1)
        }
        var s3 = s1
        s3.merge(s2)
        XCTAssertEqual(s3.content, Scheme(birds: ["albatross", "cormorant"]))
    }

The merge produces the following patch.

{
  "clock": {
    "1433c6a367c746bfa443678cec6ad8a2": 2,
    "7bcd9c17b285417998593a04bd316e48": 1
  },
  "deps": [
    "25af9eaf85f1a124930d4ffce78bede03ab6094e03dad83b93c583cdf1214ac6",
    "d0731270a188b21371509628c6c5ca868ab311c47b5a54ab77e0bcef585fae81"
  ],
  "canUndo": true,
  "canRedo": false,
  "version": 3,
  "diffs": {}
}

It looks like the diffs property is invalid.

[C-Interface] getMissingDeps return type

When doing the following

let length = automerge_get_missing_deps(automerge)
var buffer = Array<Int8>(repeating: 0, count: length)
automerge_read_json(automerge, &buffer)
let newString = String(cString: buffer)

I would expect newString to contain a Clock (key-value pairs, [Int: String]), but a String array is returned.

Port `diffs`

The JavaScript backend produces diffs, which are used by the frontend to update its cache. We need an enum to represent all the different forms of these diffs.

quickcheck-based testing

Hello :)

I'm excited to see automerge coming to the Rust ecosystem!

One of the most effective time-saving engineering techniques I used while writing the first version of https://github.com/rust-crdt/rust-crdt (and most of my projects since then) was to use quickcheck to generate randomized operations, then permute that sequence of operations, then assert that all permutations of applications converged to an identical final state after all operations were applied. You can also clone them to assert idempotency. This is orders of magnitude more effective than most example/table-driven testing. Personally, I hate writing tests, but I take a huge amount of satisfaction when the machine spits back a failing example that illuminates a bias I encoded in the system.

The really nice thing is that quickcheck will then shrink the failing set of operations down to the minimum set that still causes the bug to jump out, effectively telling a story for you to more easily debug what happened compared to staring at a chain of 1000 random operations and not knowing what could have gone wrong. It basically is a way to write like 50 lines of test code that then generates millions of interesting tests that keep mining for bugs in your implementation, avoiding the "pesticide paradox" where your code just becomes immune to your static tests.

I initially just cloned this repo to implement something similar, but I realized that some of the functionality is a bit scattered across various sub-crates, which adds friction to testing.

I see that Backend is doing a bit of the merging work, scattering the logic of the CRDT across the protocol and backend subcrates. This adds friction because it is quite convenient to conditionally derive the Arbitrary trait, which allows random instances of a type to be synthesized during testing, but if you want to do this for multiple crates in a way that is only done at testing time, you have to start adding compile-time features (the #[cfg(test)] attribute is not transitive to dependencies during testing).

So, there are 2 options to still accomplish this, if you are interested:

  1. introduce a testing conditional compilation feature that then triggers the Arbitrary trait to be derived for various types (relatively unintrusive)
  2. merge backend and protocol, or at least push the least upper bound function for the automerge CRDT into protocol (much more intrusive, but maybe in useful ways?)

would you be interested in either of these, to facilitate making automerge-rs far more reliable through the use of generative testing?

Rebuilding a frontend from a backend fails when using UTF-8 strings

Hi,

For the project I work on, we often rebuild an automerge frontend from an automerge backend.

I noticed a crash when using UTF-8 characters inside a string. Here are the steps to reproduce:

Reproduction

// This is an example of test string that fails. More examples below.
let test_str = String::from("🌍🌎🌏");

// Create a backend and frontend
let mut backend = automerge_backend::Backend::init();
let mut frontend = automerge_frontend::Frontend::new();

// Main data structure
let mut hashmap: HashMap<String, automerge_frontend::Value> = HashMap::new();
hashmap.insert(
    "key1".to_string(),
    automerge_frontend::Value::Text(test_str.chars().collect()),
);

// Create a "change" action, that sets the hashmap as root of my automerge document.
let change = automerge_frontend::LocalChange::set(
    automerge_frontend::Path::root(),
    automerge_frontend::Value::Map(hashmap, automerge_protocol::MapType::Map),
);

// Apply this change on the frontend
let change_request = frontend
    .change::<_, automerge_frontend::InvalidChangeRequest>(
        Some("set root object".into()),
        |frontend| {
            frontend.add_change(change)?;
            Ok(())
        },
    )
    .unwrap();

// We can notice that the state of the frontend is OK
println!("frontend state : {:?}", frontend.state());

// Trying to retrieve the root dict - this works with the original frontend
let root_dict: automerge_frontend::Value = frontend
    .get_value(&automerge_frontend::Path::root())
    .unwrap();

println!("root_dict {:?}", root_dict);

// Apply it on the backend
backend
    .apply_local_change(change_request.unwrap())
    .unwrap()
    .0;

// Create a new frontend
frontend = automerge_frontend::Frontend::new();

// Rebuild the frontend from the backend
frontend.apply_patch(backend.get_patch().unwrap());

// We can notice that the state of the frontend is NOT OK. 
println!("frontend state : {:?}", frontend.state());

// Trying to retrieve the root dict - this crashes, as the frontend has not been rebuilt properly.
let root_dict: automerge_frontend::Value = frontend
    .get_value(&automerge_frontend::Path::root())
    .unwrap();

Test cases

Here is the list of all the strings I have tested and their status running the code above:

Test             Status
"Hello"          Works
"السلام عليكم"    Fails
"Dobrý den"      Fails
"שָׁלוֹם"           Fails
"नमस्ते"           Fails
"こんにちは"      Fails
"안녕하세요"      Fails
"你好"            Fails
"Olá"            Fails
"Здравствуйте"   Fails
"🌍🌎🌏"          Fails

Stacktrace

I'm not sure it is very relevant, but I add it anyway. Notice that we use PyO3 to bind to python.

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/hashmap2.rs:283:14
stack backtrace:
   0:        0x1103905e4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hcfc48256a5ab8835
   1:        0x1103ab9b0 - core::fmt::write::haf3903118f694c48
   2:        0x11038e0b6 - std::io::Write::write_fmt::h7385463ac87804ed
   3:        0x110391cef - std::panicking::default_hook::{{closure}}::h91bd4c58cf71392b
   4:        0x1103919bd - std::panicking::default_hook::h7bd29c87df967048
   5:        0x1103922bb - std::panicking::rust_panic_with_hook::hae2b05f08a320721
   6:        0x110391e3b - std::panicking::begin_panic_handler::{{closure}}::h72d68d3a77e0b718
   7:        0x110390a58 - std::sys_common::backtrace::__rust_end_short_backtrace::h7c5e286792f94edb
   8:        0x110391dfa - _rust_begin_unwind
   9:        0x1103c012f - core::panicking::panic_fmt::h1b194bb80d76fb10
  10:        0x1103c0087 - core::panicking::panic::hdb9dddaff64fd68b
  11:        0x11029f133 - jupyter_rtc_automerge::hashmap2::HashmapDocument::test_utf8::h193fd5e9fe887717
  12:        0x110296977 - jupyter_rtc_automerge::hashmap2::__init8698952135223600566::__wrap::{{closure}}::h79f502163f7d59bc
  13:        0x1102a20fc - jupyter_rtc_automerge::hashmap2::__init8698952135223600566::__wrap::h287fa5e7c76fc49c
  14:        0x10fe6e4e8 - __PyMethodDef_RawFastCallKeywords
  15:        0x10fe7ae24 - __PyMethodDescr_FastCallKeywords
  16:        0x10ffaad65 - _call_function
  17:        0x10ffa79ed - __PyEval_EvalFrameDefault
  18:        0x10ff9c34a - __PyEval_EvalCodeWithName
  19:        0x10ffffb80 - _PyRun_FileExFlags
  20:        0x10fffeff7 - _PyRun_SimpleFileExFlags
  21:        0x11002cccf - _pymain_main
  22:        0x10fe408dd - _main

If I can provide any other information to help you fix this, please let me know.

Best regards.

Unbounded memory usage - a small number of changes bumps the memory used by a factor of 10

An example can be found here.

Using the JS backend by changing the automerge package we use like so

- } from "@livingspec/automerge-wasm";
+ } from "@livingspec/automerge";

fixes the issue, so I assume it's to do with the backend in some way.

Going further with my investigation, when I remove creating a document from scratch the issue is gone as well; that is, memory usage is stable.

Since the wasm is built for node and webpack, it might be relevant to say that I have the issue on both.

Please help.

getAllChanges returns incorrect data when DEFLATE is used

I have created a failing test here: https://github.com/livingspec/automerge-upstream/blob/apply-changes-bug/test/test.js#L1315

Using the latest automerge and automerge-backend-wasm.

Based on my observations, the first 11 bytes are clipped from the change.

AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:
+ actual - expected ... Lines skipped

  [
+   Uint8Array(323) [
-   Uint8Array(334) [
-     133,
-     111,
-     74,
-     131,
-     252,
-     38,
-     106,
-     255,
-     2,
-     195,
-     2,
      117,
...
      0
    ]
  ]

The gist of the change is that I have created a long text object containing:

Donec mollis hendrerit risus. Praesent venenatis metus at tortor pulvinar varius. Donec vitae orci sed dolor rutrum auctor. Nullam cursus lacinia erat. Cras ultricies mi eu turpis hendrerit fringilla.

I'm happy to have a stab at fixing the issue, but I'm hoping someone quicker than me will :)

Speed Comparisons Between Automerge Implementations

Firstly, thanks for making this rust port!

What follows below is not meant to be critical, more just an FYI to folks who are considering using the wasm backend right now.

I recently ran into an issue in an Automerge project where it was taking a long time to apply changes to larger Automerge documents (4+ seconds per change, for a doc of ~2MB size).

This project was using the current 'main' branch of Automerge (not the performance branch).

I wondered if switching to the Automerge performance branch + wasm backend might help speed up the updates.

Before doing the full switch, I made a small spike script to compare the speed of updates with different automerge versions/backends. My spike script is here: https://gist.github.com/adorsk/3011ceb830a73d138fcc48f2e57775ad . It's a bit hacky and requires manually installing the different versions of automerge.

The good news is that I found the wasm backend was usually faster. The bad news is that it wasn't always significantly so: depending on the type of change operations, the speed-ups I saw were at most 2x over the current 'main' Automerge branch.

This certainly shouldn't be taken as an authoritative result, since it's just using my hacky little benchmarking script.

But just a caveat to let folks know that speed improvements may be modest. I hope that this is helpful! And again, thanks for making this implementation, so that it's even possible to do comparisons like this.


Note to future readers who find this later: there are a number of things in progress which may improve speed (#300), in both the Rust and JavaScript Automerge implementations. So do check back later.

Does not work in a webpack 5 worker

When importing the backend from a WebWorker in Webpack 5, all we get is an empty module. Upon some inspection, it appears that webpack treats the index_bg.js file specially, i.e. it removes all exports besides the ones needed for WASM, which means it removes all backend-related exports.

We have managed to get around this problem by changing the index.js files as follows:

import "../build/mjs/index_bg.wasm";
import * as backend from "../build/mjs/index_bg.js";

export const init = backend.init;
export const applyChanges = backend.applyChanges;
export const applyLocalChange = backend.applyLocalChange;
export const getChanges = backend.getChanges;
export const getAllChanges = backend.getAllChanges;
export const getMissingDeps = backend.getMissingDeps;
export const getPatch = backend.getPatch;
export const load = backend.load;
export const clone = backend.clone;
export const free = backend.free;
export const getHeads = backend.getHeads;
export const loadChanges = backend.loadChanges;
export const save = backend.save;
export const decodeSyncMessage = backend.decodeSyncMessage;
export const decodeSyncState = backend.decodeSyncState;
export const encodeSyncMessage = backend.encodeSyncMessage;
export const encodeSyncState = backend.encodeSyncState;
export const generateSyncMessage = backend.generateSyncMessage;
export const initSyncState = backend.initSyncState;
export const receiveSyncMessage = backend.receiveSyncMessage;

Instead of

import * as wasm from "./index_bg.wasm";
export * from "./index_bg.js";

Note that automerge-backend-wasm re-exports from index_bg.js which webpack apparently treats separately.

I have tried to set up an automated test for this, but had no luck in setting up a test environment for webpack; I might do that another day. Until then, a test repo can be found here: https://github.com/livingspec/automerge

Steps to reproduce:

  1. yarn
  2. yarn dev
  3. go to https://localhost:3000/worker_bug

Expected: the page should show all the backend functions
Actual: the page shows an empty object, meaning import * as backend from 'automerge-backend-wasm' came back empty

Create automerge-wasm and automerge-backend crates

Ultimately we will want a rusty interface to automerge, which should probably live under the automerge crate. We will also need a WASM interface. Both of these interfaces will make use of a backend. I suggest we turn this repository into a cargo workspace with three crates:

  • automerge providing the Rust interface to Automerge, i.e. what is currently in the document.rs module
  • automerge-wasm providing the WASM bindings which will be used to create the JavaScript package
  • automerge-backend where the logic of the backend will live; mostly what is currently in the OpSet.rs and protocol.rs modules
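
For concreteness, the top-level manifest could look something like this (a sketch only; the crate names follow the list above, and the exact member paths are an assumption):

```toml
# Top-level Cargo.toml turning the repository into a workspace
[workspace]
members = [
    "automerge",         # Rust interface (currently document.rs)
    "automerge-wasm",    # WASM bindings used to build the JS package
    "automerge-backend", # backend logic (currently OpSet.rs and protocol.rs)
]
```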

Unwrapping None in group_doc_ops

I'm seeing the following error when trying to use automerge-rs in a project:

panicked at 'called `Option::unwrap()` on a `None` value', /home/andrew/.cargo/git/checkouts/automerge-rs-4d74710a78567f28/ca5ddff/automerge-backend/src/change.rs:695:22

and this is the section:
https://github.com/automerge/automerge-rs/blob/2cc7b60ccb604756bb06fafa38221781e4df2621/automerge-backend/src/change.rs#L688-L698

I'm not sure what should be done if the value is None at this point, but I'm happy to contribute if it is something simple to describe, e.g. just skipping it.
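
For illustration, the "just skip it" option in a self-contained sketch; the names `group_ops` and `index` here are hypothetical stand-ins, not the actual group_doc_ops internals:

```rust
// Sketch of the "skip None" strategy: instead of calling `unwrap()` on each
// lookup (which panics on a missing entry), use `filter_map` to drop
// entries whose lookup fails.
use std::collections::HashMap;

fn group_ops(ops: &[&str], index: &HashMap<&str, u32>) -> Vec<u32> {
    ops.iter()
        // `index.get(op)` returns Option<&u32>; filter_map silently skips
        // the Nones where `unwrap()` would panic.
        .filter_map(|op| index.get(op).copied())
        .collect()
}

fn main() {
    let mut index = HashMap::new();
    index.insert("a", 1);
    index.insert("c", 3);
    // "b" is missing from the index; it is skipped rather than panicking.
    let grouped = group_ops(&["a", "b", "c"], &index);
    println!("{:?}", grouped); // [1, 3]
}
```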

Create a book for online documentation

The Rust code documentation will be good to have, and useful when implementing things in Rust, but I think there is still a need for wider Automerge documentation (with Rust-specific code examples).

I'd propose using mdbook, a typical choice for documenting Rust projects more thoroughly.

Flaky behaviour on save then load backend

I've come across a situation where the backend loading returns a MissingObjectError only sometimes. This seems rather strange behaviour to observe.

I've reproduced the flakiness in a test case here; running cargo test broken_save_load a few times should give the error.

I found this via quickcheck so that might explain the randomness of the values.

The changes observed seem to be these:

# for the initial creation of a value (from Null to this)
changes: [LocalChange { path: Path([]), operation: Set(Map({"\u{2}": Sequence([Primitive(Null), Primitive(Uint(0)), Primitive(Str("")), Primitive(Counter(0)), Primitive(Str(""))]), "\u{0}": Sequence([Primitive(Counter(0)), Primitive(Str("")), Primitive(Uint(0)), Primitive(Timestamp(0)), Primitive(Int(0)), Primitive(Uint(0)), Primitive(Null), Primitive(Null), Primitive(F32(0.0)), Primitive(Null), Primitive(Counter(0)), Primitive(Uint(0)), Primitive(Null), Primitive(Uint(0)), Primitive(Str("")), Primitive(Null), Primitive(Timestamp(0)), Primitive(Timestamp(0)), Primitive(Uint(0)), Primitive(Counter(0)), Primitive(Uint(0)), Primitive(F32(0.0)), Primitive(Str(""))]), "\u{0}\u{0}": Sequence([Primitive(Str("")), Primitive(Counter(0)), Primitive(Str("")), Primitive(Boolean(false)), Primitive(Timestamp(0)), Primitive(Int(0)), Primitive(F64(0.0)), Primitive(Timestamp(0)), Primitive(Null), Primitive(Uint(0)), Primitive(F64(0.0)), Primitive(Boolean(false)), Primitive(F64(0.0)), Primitive(F64(0.0)), Primitive(Null), Primitive(F64(0.0)), Primitive(F64(0.0))]), "": Sequence([Primitive(Null), Primitive(Uint(0)), Primitive(Int(0)), Primitive(Null), Primitive(F32(0.0)), Primitive(F64(0.0)), Primitive(Uint(0)), Primitive(F64(0.0)), Primitive(Timestamp(0)), Primitive(Str("")), Primitive(Boolean(false)), Primitive(Counter(0)), Primitive(Int(0)), Primitive(Null), Primitive(F64(0.0)), Primitive(Null), Primitive(F64(0.0)), Primitive(Counter(0)), Primitive(Boolean(false))]), "\u{1}": Map({"": Primitive(F64(0.0))}, Table)}, Map)) }]

# for the change from previous to this value
changes: [LocalChange { path: Path([Key("\u{2}")]), operation: Delete }, LocalChange { path: Path([Key("\u{0}")]), operation: Delete }, LocalChange { path: Path([Key("\u{0}\u{0}")]), operation: Delete }, LocalChange { path: Path([Key("")]), operation: Delete }, LocalChange { path: Path([Key("\u{1}")]), operation: Delete }]

Strange reordering when applying patch from backend

With the code below I'm seeing some strange behaviour where changing elements in a sequence seems to eventually lead to a reordering.

#[test]
fn broken() {
    // setup
    let mut hm = std::collections::HashMap::new();
    hm.insert(
        "".to_owned(),
        automerge::Value::Sequence(vec![automerge::Value::Primitive(Primitive::Null)]),
    );
    let mut b = automerge::Backend::init();

    // new frontend with initial state
    let (mut f, c) =
        automerge::Frontend::new_with_initial_state(Value::Map(hm, automerge::MapType::Map))
            .unwrap();

    // get patch and apply
    let (p, _) = b.apply_local_change(c).unwrap();
    f.apply_patch(p).unwrap();

    // change first value and insert into the sequence
    let c = f
        .change::<_, automerge::InvalidChangeRequest>(None, |d| {
            d.add_change(automerge::LocalChange::set(
                automerge::Path::root().key("").index(0),
                automerge::Value::Primitive(automerge::Primitive::Int(0)),
            ))
            .unwrap();
            d.add_change(automerge::LocalChange::insert(
                automerge::Path::root().key("").index(1),
                automerge::Value::Primitive(automerge::Primitive::Boolean(false)),
            ))
            .unwrap();
            Ok(())
        })
        .unwrap();

    // setup first expected
    let mut ehm = HashMap::new();
    ehm.insert(
        "".to_owned(),
        automerge::Value::Sequence(vec![
            automerge::Value::Primitive(automerge::Primitive::Int(0)),
            automerge::Value::Primitive(automerge::Primitive::Boolean(false)),
        ]),
    );
    let expected = automerge::Value::Map(ehm.clone(), automerge::MapType::Map);

    // ok, sequence has int then bool
    assert_eq!(expected, f.get_value(&Path::root()).unwrap());

    // now apply the change to the backend and bring the patch back to the frontend
    if let Some(c) = c {
        let (p, _) = b.apply_local_change(c).unwrap();
        f.apply_patch(p).unwrap();
    }
    let v = f.get_value(&Path::root()).unwrap();

    let expected = automerge::Value::Map(ehm, automerge::MapType::Map);
    // not ok! sequence has bool then int
    assert_eq!(expected, v);
}

Possible bug in WASM backend

Hi,

I'm a student working with @ept, and my project is about measuring CRDT performance. As far as I understand, there are two ways of generating an encoded operation with Automerge:

  1. Method 1: this was suggested in the Automerge README:
    let new_doc = Automerge.change(doc, () => { /* ... */ })
    Automerge.getChanges(doc, new_doc)
  2. Method 2: digging into the source code a bit, we see that Automerge.change is just sugar for:
    let [new_doc, change] = Automerge.Frontend.change(doc, () => { /* ... */ })
    Automerge.encodeChange(change)

I wrote a little script (https://gist.github.com/eugene-eeo/cbe85d83e9a0bf3c5e7ddda3481803c0) that measures the performance difference between those two methods. Note that I do not just measure time required to apply and encode changes (local), but also the time required to apply those changes on another document (remote). On my machine, without the WASM backend there is quite a big difference:

Method 1:
local: 2068.88 ms
remote: 307.84 ms
local: 1823.04 ms
remote: 252.80 ms
local: 1623.03 ms
remote: 218.70 ms
local: 1620.21 ms
remote: 242.04 ms
local: 1619.00 ms
remote: 220.40 ms

Method 2:
local: 801.65 ms
remote: 307.96 ms
local: 590.28 ms
remote: 243.93 ms
local: 442.30 ms
remote: 228.21 ms
local: 413.66 ms
remote: 200.77 ms
local: 412.88 ms
remote: 202.48 ms

Notice that remote performance stays largely the same.
However when the WASM backend is used, remote performance suffers:

Method 1:
local: 438.06 ms
remote: 70.19 ms
local: 367.26 ms
remote: 68.41 ms
local: 329.03 ms
remote: 63.91 ms
local: 313.60 ms
remote: 63.49 ms
local: 306.28 ms
remote: 61.29 ms

Method 2:
local: 313.25 ms
remote: 966.28 ms
local: 221.59 ms
remote: 939.10 ms
local: 189.73 ms
remote: 938.31 ms
local: 173.63 ms
remote: 981.85 ms
local: 165.58 ms
remote: 930.15 ms

Increasing the length to something like 5k makes my computer run out of memory (using the WASM backend).

I'm using commit a27dd61e2406e9047f68d4e3209f80b78d8d1451 for the automerge repo, and 83145b82c49809aaccf7e6463e164de59225045d for automerge-rs.

patch.edits is missing from the backend

I'm trying to give automerge-rs a crack, and I'm running into this error:

TypeError: patch.edits is not iterable
    at updateTextObject (/home/seph/3rdparty/automerge/frontend/apply_patch.js:231:28)
    at interpretPatch (/home/seph/3rdparty/automerge/frontend/apply_patch.js:278:12)
    at getValue (/home/seph/3rdparty/automerge/frontend/apply_patch.js:17:12)
    at applyProperties (/home/seph/3rdparty/automerge/frontend/apply_patch.js:69:24)
    at updateMapObject (/home/seph/3rdparty/automerge/frontend/apply_patch.js:107:3)
    at interpretPatch (/home/seph/3rdparty/automerge/frontend/apply_patch.js:272:12)
    at applyPatchToDoc (/home/seph/3rdparty/automerge/frontend/index.js:148:3)
    at makeChange (/home/seph/3rdparty/automerge/frontend/index.js:105:20)
    at Object.change (/home/seph/3rdparty/automerge/frontend/index.js:251:12)
    at change (/home/seph/3rdparty/automerge/src/automerge.js:34:29)

It looks like the patch object returned by applyLocalChange is weirdly missing some fields:

(screenshot of the patch object omitted)

I'm running my code by compiling the cjs package (using the package.json script) then linking, and running it with this code:

const automerge = require('automerge')

const backend = require('automerge-backend-wasm')
automerge.setDefaultBackend(backend)

let state = automerge.from({text: new automerge.Text("")}) // this is as far as it gets, then it crashes

Getting system_time on wasm32 target

Currently the system_time is obtained through this function.

https://github.com/automerge/automerge-rs/blob/c103b0638e24076e6cd755f102f3c75926c25d51/automerge-frontend/src/lib.rs#L417-L426

To be able to use this in a web project I'm working on (using seed), I've had to change this to use chrono:

https://github.com/jeffa5/automerge-rs/blob/b290f9117aced3e3ed1535ed0ccde3a819d88232/automerge-frontend/src/lib.rs#L420-L427

As there was already a comment there, I wondered whether you'd had ideas about how best to solve the issue, and whether chrono is a possible option for this project?
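
For what it's worth, a common shape for this kind of fix is to cfg-gate the clock source. This is only a sketch: the wasm32 branch assumes the js-sys crate is available (chrono's wasm support wraps the same Date.now() call underneath):

```rust
// On native targets std::time::SystemTime works as expected.
#[cfg(not(target_arch = "wasm32"))]
fn system_time_secs() -> Option<i64> {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .ok()
        .map(|d| d.as_secs() as i64)
}

// On wasm32 in the browser SystemTime::now() panics, so delegate to the
// host clock instead (assumes the js-sys crate is a dependency).
#[cfg(target_arch = "wasm32")]
fn system_time_secs() -> Option<i64> {
    Some((js_sys::Date::now() / 1000.0) as i64)
}

fn main() {
    println!("seconds since the epoch: {:?}", system_time_secs());
}
```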

Host main's docs on github pages

Once we do a release the docs will be up on docs.rs, but I think it will be useful to have a hosted version of the main branch's docs too, as things change.

`pred` should be sorted in encoded change

This issue was found by @endeavour42. The following two changes differ only in the order of the preds on the operation:

{"actor": "f4d13a8a", "seq": 67, "startOp": 140, "time": 6207000, "message": "",
  "deps": ["3a1ebe64fdb4b442317e518e0a03238cbde72291394217e0954061588ca8c2e5"],
  "hash": "1fe5702c0d516dba1bf04847d4bcf64faedc22d6b00a389cb08ddf1da8d42fca",
  "ops": [
    {"obj": "_root", "key": "commonVar", "action": "set", "insert": false, "value": "4091494582",
     "pred": ["139@f4d13a8a", "139@37ea42f2"]}
  ]
}

{"actor": "f4d13a8a", "seq": 67, "startOp": 140, "time": 6207000, "message": "",
  "deps": ["3a1ebe64fdb4b442317e518e0a03238cbde72291394217e0954061588ca8c2e5"],
  "hash": "2c42aea2581b92b157a83c250e7f7a1cd3d528f651774555a996bb7eb6ed3339",
  "ops": [
    {"obj": "_root", "key": "commonVar", "action": "set", "insert": false, "value": "4091494582",
     "pred": ["139@37ea42f2", "139@f4d13a8a"]}
  ]
}

The binary format requires that preds appear in sorted order (in ascending Lamport timestamp order). The JS implementation does this sorting here, but it seems like the Rust implementation is not doing this sorting step. The correct encoded hash for both changes above should be 2c42aea2581b92b157a83c250e7f7a1cd3d528f651774555a996bb7eb6ed3339.
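
As a standalone illustration of the required ordering (the helper names here are hypothetical; the real encoder compares internal OpId values rather than strings):

```rust
// Sort `pred` op IDs of the form "counter@actor" into ascending Lamport
// order: by counter first, with the actor ID as the tie-breaker.

fn parse_op_id(s: &str) -> (u64, String) {
    let mut parts = s.splitn(2, '@');
    let counter = parts.next().unwrap().parse::<u64>().unwrap();
    let actor = parts.next().unwrap().to_string();
    (counter, actor)
}

fn sort_preds(mut preds: Vec<String>) -> Vec<String> {
    preds.sort_by_key(|p| parse_op_id(p));
    preds
}

fn main() {
    // Both ops have counter 139, so the actor IDs break the tie:
    // "37ea42f2" sorts before "f4d13a8a".
    let sorted = sort_preds(vec!["139@f4d13a8a".into(), "139@37ea42f2".into()]);
    println!("{:?}", sorted); // ["139@37ea42f2", "139@f4d13a8a"]
}
```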

Can't find any method to apply Undo / Redo?

pub fn undo_stack(&self) -> Vec<Vec<UndoOperation>> {
    self.internal_undo_stack
        .iter()
        .map(|ops| ops.iter().map(|op| self.actors.export_undo(op)).collect())
        .collect()
}

pub fn redo_stack(&self) -> Vec<Vec<UndoOperation>> {
    self.internal_redo_stack
        .iter()
        .map(|ops| ops.iter().map(|op| self.actors.export_undo(op)).collect())
        .collect()
}

what should I do with the UndoOperation?
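
For illustration, here is a generic sketch of how an undo stack of inverse operations is typically consumed: record the inverse of each edit, and "undo" by popping a batch and applying it like any other change. The types below are hypothetical, not the automerge-rs API:

```rust
// Minimal undo-stack pattern: each stack entry is a batch of inverse ops.
#[derive(Clone, Debug)]
enum Op {
    Set(String, i64),
    Delete(String),
}

struct Doc {
    state: std::collections::HashMap<String, i64>,
    undo_stack: Vec<Vec<Op>>,
}

impl Doc {
    fn new() -> Self {
        Doc { state: Default::default(), undo_stack: Vec::new() }
    }

    // Record the inverse of the edit alongside applying it.
    fn set(&mut self, key: &str, value: i64) {
        let inverse = match self.state.get(key) {
            Some(old) => Op::Set(key.to_string(), *old),
            None => Op::Delete(key.to_string()),
        };
        self.undo_stack.push(vec![inverse]);
        self.state.insert(key.to_string(), value);
    }

    // Undo = pop a batch of inverse ops and apply it as a normal change.
    fn undo(&mut self) {
        if let Some(batch) = self.undo_stack.pop() {
            for op in batch {
                match op {
                    Op::Set(k, v) => { self.state.insert(k, v); }
                    Op::Delete(k) => { self.state.remove(&k); }
                }
            }
        }
    }
}

fn main() {
    let mut d = Doc::new();
    d.set("x", 1);
    d.set("x", 2);
    d.undo();
    println!("{:?}", d.state.get("x")); // Some(1)
}
```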

Persistent backend

I've had a little play around with creating a persistent backend in Rust here and it seems to be pretty simple, thanks mainly to this discussion.

I've created an initial sled implementation of this backend for persistence but aim to add some more as we go.

Any thoughts/reviews on this would be appreciated!

(feel free to close this whenever too, this is more of a notification than an issue)

Does not work when using ECMAScript modules in nodejs

The backend is built for two targets: node (wasm-pack build --target nodejs) and bundler (i.e. webpack) (wasm-pack build --target bundler).

The published version of the backend has a subpath exports field in the package.json with 2 conditional exports like so

"exports": {
    ".": { 
      "require": "./cjs/index.js",
      "default": "./mjs/index.js"
    }
  },

Node, starting from version 13 or so, supports ESM. If we were to use ESM and import automerge-backend-wasm, it would resolve to the "default" export, which is meant for a bundler. This fails.

The exports should be changed, at the bare minimum, into:

"exports": {
    ".": { 
      "node": "./cjs/index.js",
      "default": "./mjs/index.js"
    }
  },

Example repo: https://github.com/livingspec/automerge

Steps to reproduce:

  1. yarn
  2. yarn test:esm

Expected result: does not fail
