sqlsync's Introduction

SQLSync

SQLSync is a collaborative offline-first wrapper around SQLite designed to synchronize web application state between users, devices, and the edge.

Example use cases

  • A web app with a structured, file-oriented data model like Figma. Each file could be a SQLSync database, enabling real-time, local-first collaboration and presence
  • Running SQLSync on the edge with high tolerance for unreliable network conditions
  • Enabling optimistic mutations on SQLite read replicas

SQLSync Demo

The best way to get a feel for how SQLSync behaves is to play with the Todo list demo. Clicking that link creates a new to-do list and redirects you to its unique URL. You can then share that URL with friends or open it on multiple devices (or browsers) to see the power of offline-first collaborative SQLite.

You can also learn more about SQLSync and its goals by watching Carl's WasmCon 2023 talk. The recording can be found here.

Features

  • Eventually consistent SQLite
  • Optimistic reads and writes
  • Reactive query subscriptions
  • Real-time collaboration
  • Offline-first
  • Cross-tab sync
  • React library

If you are interested in using or contributing to SQLSync, please join the Discord community and let us know what you want to build. We are excited to collaborate with you!

Installation & Getting started

Please refer to the guide to learn how to add SQLSync to your application.

Tips & Tricks

How to debug SQLSync in the browser

By default SQLSync runs in a shared web worker. This allows the database to be shared automatically between tabs, but it also makes SQLSync a bit harder to debug.

The easiest way is to use Google Chrome and go to the special URL chrome://inspect/#workers. That page lists all shared workers running in other tabs. Assuming another tab is running SQLSync, you'll see its shared worker listed; click inspect to open dev tools for the worker.

My table is missing, or multiple statements aren't executing

SQLSync uses rusqlite under the hood to run and query SQLite. Unfortunately, the execute method only supports single statements and silently ignores trailing statements. Thus, if you are using execute!(...) in your reducer, make sure that each call only runs a single SQL statement.

For example:

// DON'T DO THIS:
execute!("create table foo (id int); create table bar (id int);").await?;

// DO THIS:
execute!("create table foo (id int)").await?;
execute!("create table bar (id int)").await?;

Community & Contributing

If you are interested in contributing to SQLSync, please join the Discord community and let us know what you want to build. All contributions will be held to a high standard, and are more likely to be accepted if they are tied to an existing task and agreed upon specification.

sqlsync's Issues

Add sync state API

The client app needs a way to query the current synchronization state.

  • Which databases are subscribed?
  • How much of our timeline has been sent to the server?
  • How far behind is the server from applying our changes?

Ideally this ties into the mutation API so the client can also use it to await a specific mutation. This would allow the developer to opt into synchronous request/response if needed.
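
A rough sketch of what this could look like from the client's perspective. All of these names are hypothetical and not part of SQLSync today; JournalId stands in for whatever journal identifier SQLSync actually uses:

// Hypothetical sketch only: SyncState and JournalId are illustrative stand-ins,
// not types that exist in SQLSync today.
type JournalId = [u8; 16];

pub struct SyncState {
    /// databases this client is currently subscribed to
    pub subscribed: Vec<JournalId>,
    /// highest local-timeline LSN that has been sent to the server
    pub sent_lsn: u64,
    /// highest local-timeline LSN the server has acknowledged applying
    pub server_applied_lsn: u64,
}

impl SyncState {
    /// How far the server lags behind the local timeline.
    pub fn server_lag(&self) -> u64 {
        self.sent_lsn.saturating_sub(self.server_applied_lsn)
    }
}

Awaiting a specific mutation would then amount to waiting until server_applied_lsn reaches the LSN recorded when that mutation was enqueued.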

storage snapshots

With the rebase sync architecture, compacting the storage log is very easy and safe. At any point the coordinator can snapshot and start a new log from the snapshot.

The snapshot algorithm can work like this to improve efficiency on the clients:

  • coordinator snapshots the log at a particular LSN
  • appends an "end log" message to the previous log, which contains the new log id and a hash of the snapshot
  • upon receiving this marker, a client can run the snapshot process locally and check that it gets the same snapshot
  • if it does, the client knows it can create a new local storage log from that snapshot and continue delta replicating from the server
  • if anything goes wrong, the client can drop their storage log and download the full snapshot from the server

This algorithm depends on the server continuing to serve the old log for some period of time - this time window determines how many clients will be able to efficiently switch over.

Since the coordinator also wants to be durable in a serverless setting, it's likely that the old logs will already be backed up to storage. Thus the coordinator can always serve from the last log lazily without keeping it in memory. (at least until GC runs and kills old logs, but that should never hit the immediately previous log)
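
A rough sketch of the client-side switch-over, assuming a hypothetical EndLog message and snapshot helper (not SQLSync's real log format), and using blake3 as an example hash:

// Hypothetical sketch: the EndLog shape and the snapshot_at/Client methods are
// illustrative, not SQLSync's actual types.
struct EndLog {
    new_log_id: u64,
    snapshot_lsn: u64,
    snapshot_hash: [u8; 32],
}

struct Client;

impl Client {
    /// Compact the local storage log up to `lsn` and return the snapshot bytes.
    fn snapshot_at(&self, _lsn: u64) -> Vec<u8> {
        Vec::new() // placeholder
    }

    fn handle_end_log(&mut self, msg: &EndLog) {
        let local = self.snapshot_at(msg.snapshot_lsn);
        let local_hash = *blake3::hash(&local).as_bytes();
        if local_hash == msg.snapshot_hash {
            // hashes match: start a new local storage log from our own snapshot
            // and keep delta-replicating from the server on msg.new_log_id
        } else {
            // something diverged: drop the local storage log and download the
            // full snapshot for msg.new_log_id from the server instead
        }
    }
}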

pluggable reducer

Currently only Wasm-based reducers are supported. Making the reducer pluggable (at least on the coordinator to start) would make it easier to integrate SQLSync into more systems.

Feature request from @matthewgapp
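
One possible shape for a host-side pluggable reducer is a trait the coordinator accepts instead of a Wasm module. This is a sketch only; the trait name and signature are illustrative, not SQLSync's current interface:

use anyhow::Result;

// Sketch of a host-side reducer trait; `conn` stands in for whatever SQLite
// handle the coordinator actually exposes to reducers.
pub trait Reducer {
    /// Apply one serialized mutation to the database inside the current
    /// storage transaction.
    fn apply(&mut self, conn: &mut rusqlite::Connection, mutation: &[u8]) -> Result<()>;
}

The existing Wasm reducer would then just be one implementation of that trait.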

Update rusqlite once the f128 issues are resolved

Currently SQLSync depends on a rusqlite patch maintained by @trevyn which has been recently updated to use a more recent version of SQLite. The main PR is here: rusqlite/rusqlite#1010

Unfortunately, recent SQLite versions are now pulling in various long double (f128) routines:

  (import "env" "__extenddftf2" (func $__extenddftf2 (type 99)))
  (import "env" "__getf2" (func $__getf2 (type 123)))
  (import "env" "__multf3" (func $__multf3 (type 94)))
  (import "env" "__trunctfdf2" (func $__trunctfdf2 (type 118)))
  (import "env" "__lttf2" (func $__lttf2 (type 123)))
  (import "env" "__gttf2" (func $__gttf2 (type 123)))

Until they are either shimmed or SQLite gains compile-time flags to disable this behavior, we are stuck on SQLite version 3.37.2. (Note: if we need a slightly higher version of SQLite, we can manually re-apply @trevyn's patch to an intermediate rusqlite version, assuming we figure out when the f128 deps were added to SQLite.)

reducer error handling

Currently, if a query fails in the reducer it can leave the reactor in a half-open state. Solutions:

  1. Make error handling the responsibility of the reducer - this is probably the right choice
    • requires updating the guest/host ffi to support returning errors (ideally using result semantics)
    • fbm already supports serializing/deserializing results - so this should be pretty easy
  2. reset the reactor (or entire reducer) on error if we want to handle errors in the host
    • the danger here is that user state could also be half-open, so the only safe thing to do is a full reset

Option 1 is probably better as it gives the user more flexibility and allows reducers to handle errors. If a reducer doesn't want to do so, it can forward the error to SQLSync, which also clears the half-open state.
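
A rough sketch of what result semantics over the FFI could look like; the response type below is hypothetical, not the current guest/host protocol:

use serde::{Deserialize, Serialize};

// Hypothetical serialized response from the guest reducer back to the host.
#[derive(Serialize, Deserialize)]
pub enum ReducerResponse {
    /// the mutation was applied and the transaction can be committed
    Ok,
    /// the reducer surfaced an error; the host rolls back the transaction
    /// (clearing the half-open state) and reports the message
    Err { message: String },
}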

query subscriptions

Need to plumb query subscriptions into SQLSync core rather than hacking it at the top level using events.

content addressed replication

Supersedes #45

In this replication variant, pages are replicated based on their content hash. Thus the current state of the file can be computed from the content store (provides access to pages by their content hash) and the current snapshot (list of content hashes).

Hash algorithms

  • see experiments repo
  • blake3 is the safest choice; it outputs cryptographic 32-byte hashes
  • polymur is a faster choice; it outputs 8-byte hashes at a collision rate of n * 2^-60.2, where n is the input size in bytes

snapshot size

assuming a page size of 4 KiB and a 32-byte hash output, the snapshot of a 1 GiB database consumes 8 MiB (262,144 page hashes)
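
For illustration, a content-addressed snapshot with the numbers above could be computed roughly like this (sketch only; the real page size, content store, and snapshot format may differ):

use std::collections::HashMap;

const PAGE_SIZE: usize = 4096;

// Hash each page with blake3 and record the ordered list of hashes; pages are
// stored in the content store keyed by hash, so identical pages deduplicate.
fn snapshot(db: &[u8], store: &mut HashMap<[u8; 32], Vec<u8>>) -> Vec<[u8; 32]> {
    db.chunks(PAGE_SIZE)
        .map(|page| {
            let hash = *blake3::hash(page).as_bytes();
            store.entry(hash).or_insert_with(|| page.to_vec());
            hash
        })
        .collect()
}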

notes on efficient replication

  • clients always want to update to the latest snapshot
  • ideally we are able to only store changes to the snapshot rather than re-storing the full snapshot
  • a tree-based structure can allow for efficient structural sharing (conceptually similar to a Merkle tree)
  • blake3 is a tree-based hash function that internally computes a tree of hashes over 1 KiB chunks, and bao is an application of blake3 for verifiable streaming/seeking without needing full access to the data
  • one of the authors of Blake3 is currently factoring out the needed pieces of the blake3 crate for me to sit my replication algorithm directly on top of it (I think)

timeline durability

Timelines should be durably stored on the client using #15 and also durably stored on the coordinator.

Need to think about what happens if the client loses its timeline state but still knows its identity. Should it pull down pending mutations from the server, or should it create a new timeline id and act as a new client? Since the timeline is durable on the coordinator, the latter would be correct - although to the client it may appear that data has been temporarily lost until the server incorporates it. On the other hand, this is a pretty rare edge case, and it makes sense that local durability should be treated as all or nothing.

[BUG] JournalId type error

Running into a type incompatibility error in the SQLSync npm packages. The issue is related to the folder remapping I'm doing during the build between src and dist.

Eventual Consistency vs Authority Check (mainly a discussion)

Hi there, first of all, applause for the great work.

I made a collaborative docs platform before and now am working on enterprise SaaS. For the former, optimistic read and write ensured local-first functionality. For the latter, strong access control is a necessity.

I am always wondering how to combine the two. In the spirit of the "Stop building databases" post and this project, automatic sync between frontend and backend databases with eventual consistency makes the mental model of building an app much simpler and the user experience better. But it conflicts with strict access control in an enterprise SaaS environment.

For example, suppose a client was able to read and write customer records 1 through 10, but an admin later revokes access to records 7 through 9. While that access-control change is syncing from the backend to the frontend, the client can still operate on records 7 through 9, which may violate the access control.

Maybe optimistic reads but pessimistic writes? The client can still read freely from its local copy before it learns that access to records 7 through 9 has been revoked, but any change to any record, including 7 through 9, has to go through the backend first.

Looking forward to some ideas. Thanks.

table level query subscriptions

Detect which tables are changed by each mutation, as well as which tables each query subscription depends on, and only invalidate queries that reference changed tables.
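
One possible way to detect which tables a mutation changes is SQLite's update hook (rusqlite exposes it behind its "hooks" feature). This is a sketch only; figuring out which tables a subscribed query depends on would still need separate analysis:

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Collect the set of tables touched while a mutation is applied, using the
// SQLite update hook. Requires rusqlite's "hooks" feature.
fn changed_tables(
    conn: &rusqlite::Connection,
    apply_mutation: impl FnOnce(&rusqlite::Connection),
) -> HashSet<String> {
    let touched = Arc::new(Mutex::new(HashSet::new()));
    let sink = Arc::clone(&touched);
    conn.update_hook(Some(
        move |_action: rusqlite::hooks::Action, _db: &str, table: &str, _rowid: i64| {
            sink.lock().unwrap().insert(table.to_string());
        },
    ));
    apply_mutation(conn); // run the mutation through the reducer as usual
    conn.update_hook(None::<fn(rusqlite::hooks::Action, &str, &str, i64)>);
    let tables = touched.lock().unwrap().clone();
    tables
}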

rebase performance

Currently we are rebasing the timeline every time we receive a frame from the server. This is needlessly expensive and can be improved.

Two perf holes:

  1. the server is sending us more frames soon
  2. we have a lot of pending mutations that haven't been seen by the server

Case 2 can sometimes imply case 1; however, case 1 can also happen when other clients are sending lots of changes.

To optimize for case 1, the server can send a hint to the client with the number of outstanding frames. This allows the client to make better decisions about when to rebase.

To optimize for case 2, we can take the following facts into account:

  • how many pending mutations have yet to be acknowledged by the server
  • how fast other timelines are changing (we can determine this by looking at the timelines table in the db)

We can come up with some heuristics regarding these facts to balance user experience and rebases.
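
As a rough illustration only (the outstanding-frames hint and the backlog threshold below are hypothetical), such a heuristic might look like:

// Illustrative heuristic; not part of SQLSync today.
fn should_rebase_now(outstanding_server_frames: u32, unacked_local_mutations: u32) -> bool {
    // case 1: the server hinted that more frames are arriving shortly;
    // wait for them and rebase once instead of once per frame
    if outstanding_server_frames > 0 {
        return false;
    }
    // case 2: every rebase replays all pending mutations, so with a large
    // backlog it may be better to rebase on a timer than after every frame
    unacked_local_mutations < 32
}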


Note: we can also improve rebase performance through mutation batching. Currently we pass a single mutation to the reducer at a time, which involves many round trips through wasmi and is not very fast. It would be much faster to pass batches of mutations to the reducer, allowing reducers to optimize internally, for example by merging many mutations into a single insert/update statement.
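
For example, with a hypothetical batched FFI, a reducer could coalesce a run of "add todo" mutations into one prepared statement instead of re-parsing SQL per mutation. A sketch, shown with rusqlite on the host for simplicity (the table and mutation shape are made up):

use anyhow::Result;
use rusqlite::params;

// Sketch of a reducer-internal optimization once mutations arrive in batches:
// prepare the statement once and execute it per mutation.
fn apply_add_todo_batch(conn: &rusqlite::Connection, descriptions: &[String]) -> Result<()> {
    let mut stmt = conn.prepare("insert into todos (description) values (?1)")?;
    for description in descriptions {
        stmt.execute(params![description])?;
    }
    Ok(())
}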

Server-initiated mutations

Plan

Expose a new API on the CoordinatorDocument type called mutate_direct which takes a callback of the form (tx: &mut Transaction) -> Result<()>. Internally it grabs a txn from sqlite, runs the callback, runs storage.commit, and then returns. If the callback fails, the txn will be rolled back (and possibly storage reverted).

Note: this API is specifically designed to discourage long-lived txns; SQLSync is single-threaded, so no other mutations can execute while this txn runs.
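
A sketch of the proposed shape. The method does not exist yet, and CoordinatorDocument's fields are simplified stand-ins here:

use anyhow::Result;

// Simplified stand-ins for the sketch; the real CoordinatorDocument and
// storage layer look different.
struct Storage;
impl Storage {
    fn commit(&mut self) -> Result<()> {
        Ok(())
    }
}

struct CoordinatorDocument {
    conn: rusqlite::Connection,
    storage: Storage,
}

impl CoordinatorDocument {
    /// Proposed API: run a server-initiated mutation inside a sqlite txn,
    /// then commit storage. Keep the callback short; SQLSync is single
    /// threaded, so nothing else runs while this txn is open.
    pub fn mutate_direct<F>(&mut self, f: F) -> Result<()>
    where
        F: FnOnce(&mut rusqlite::Transaction) -> Result<()>,
    {
        let mut tx = self.conn.transaction()?; // grab a txn from sqlite
        f(&mut tx)?; // if the callback fails, returning early drops `tx`, rolling it back
        tx.commit()?; // commit the sqlite txn
        self.storage.commit() // then commit storage
    }
}

A caller would pass a short closure that issues its SQL against tx and returns a Result, and the change would then replicate to clients like any other frame.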

Discussed in #41

Originally posted by matthewgapp January 4, 2024
I have an instance of the coordinator running on our server and I am using the pluggable reducer I wrote (#40). But it's not quite enough, because I need to incorporate server-initiated mutations that are then synced to all clients. Is there an existing API I can use? Perhaps the write_lsn method?

Are we missing a roadmap?

This project is amazing! Really!

But I think we are missing a roadmap.

Can we open a pinned issue to collect ideas about the immediate and long-term future?

Thank you again for your work.

native sqlite extension

Currently SQLSync wraps SQLite in order to carefully control how SQLite behaves and improve the user experience of using SQLSync. But this makes SQLSync harder to port to new SQLite environments.

This feature request is to build a version of SQLSync that can run entirely within SQLite as a native extension.

Things to figure out:

  • What do the mutation/reducer layers look like? If the reducer continues to be Wasm, it can run internally; alternatively, SQLSync could call out to the host to run mutations when needed.
  • How easy is it for SQLSync to have a private timelines table if it's entirely inside the SQLite engine?
  • Will need to add some pragmas/custom functions to inspect and control SQLSync from the host (see the sketch below)
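
For the last point, a sketch of what a host-registered inspection function could look like; the function name and payload are made up, and this uses rusqlite's "functions" feature rather than a true loadable extension:

use rusqlite::functions::FunctionFlags;
use rusqlite::{Connection, Result};

// Hypothetical: register a sqlsync_status() SQL function so the host (or a
// debugging session) can run `SELECT sqlsync_status();` against the database.
fn register_status_fn(conn: &Connection) -> Result<()> {
    conn.create_scalar_function("sqlsync_status", 0, FunctionFlags::SQLITE_UTF8, |_ctx| {
        // in a real extension this would report replication/timeline state
        Ok("idle".to_string())
    })
}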

coordinating reducer version upgrades

Upgrading a reducer requires careful coordination between the server and the client.

Need to look at this from the perspective of client-server reducer version mismatch. When this occurs, the server is in a unique position to coordinate the upgrade.

The other problem is when the client submits mutations that were created by an older reducer version. This may require extending the reducer API to support "mutation migration" or lensing.

snapshot replication system

Possibly makes #8 and #9 unneeded.

A more efficient replication system that only tracks page indexes rather than page contents in the log. The insight is that clients only need to fast-forward to the latest snapshot. This can be computed by comparing the client and server versions and unioning all page changes in between them, then serving pages from the current snapshot.

Note that if concurrent writes are desired in this model, it does require some kind of actual snapshot functionality on the send side. To this extent, I've been researching copy-on-write systems to understand various ways to build this in a performant way.
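
As a small illustration of the fast-forward computation (sketch only; version bookkeeping is simplified to a per-version set of changed page indexes):

use std::collections::BTreeSet;

// `changed_pages_per_version[i]` holds the page indexes changed by version i.
// The client needs the union of all pages changed after its version; assumes
// client_version is older than the latest version.
fn pages_to_send(changed_pages_per_version: &[BTreeSet<u64>], client_version: usize) -> BTreeSet<u64> {
    changed_pages_per_version[client_version + 1..]
        .iter()
        .flatten()
        .copied()
        .collect()
}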

Some rambling thoughts:

  • use the sqlite wal or build a similar wal of my own to buffer writes
  • consider that multiple clients can be requesting changes starting from different server versions, thus each needs their own changeset
    • although, one observation is that if you ignore truncation, changesets always either get smaller or stay the same size; they never grow
    • perhaps this means you could just build one snapshot from the oldest known client version and then send it out to all the clients with a tiny bit of filtering for more up to date clients
    • although, it's also desirable to stream out snapshots rather than building them in memory
  • currently it's nice that both the mutation timeline and the storage log share the same journal and replication abstractions, but this may be preventing me from doing things like easily truncating the timeline
  • need to be careful to maintain recovery after failure, currently since all writes go into a pending SparsePages object, any failure will result in the database safely rolling back to the last saved snapshot

publish sqlsync crate

Figure out how to publish the core sqlsync crate to make it easier for it to be used by other projects. Currently this seems blocked due to the dependencies on specific git shas.
