
Comments (117)

joe-getcouragenow commented on June 2, 2024

The first thing I want to say is: know what your objective is. You can build very different systems with CRDTs.

For example, an oldie but a goodie is Roshi ( https://github.com/soundcloud/roshi ), which uses CRDTs server-side as a WAL (Write-Ahead Log).

A recent one based on IPFS is https://github.com/StreamSpace/ants-db.
See here: https://github.com/StreamSpace/ants-db/blob/master/go.mod#L10
It also attempts to create a WAL, so that you can have many of these stores that synchronise and are each eventually consistent using CRDTs.

Once you have a DB with a WAL, you can do CRDT in general with Hive, for example.
The mutations go into the DB WAL, then get transacted against your local DB to build a KV store, from which you can run queries and easily push change events to the Flutter GUI. You need this kind of Kappa-style architecture because a single CRDT mutation can end up mutating many buckets in your KV store once you build real-world apps.
Then you broadcast the data in your WAL out to all peers (or a server), and vice versa. The vice-versa bit is often the crux. Often it is better to have a relay server, a store-and-forward node on a public IP, where the CRDT WAL mutations can be shared between clients. For group chats or group work on a document you need a relay point. Signal, for example, does the same for group chat; for one-to-one chat it uses peer-to-peer sync of the CRDT mutations.
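To make the WAL-then-materialize idea concrete, here is a rough Python sketch (all names hypothetical, not taken from any of the projects linked here). One mutation event is appended to the log first, then fanned out into multiple KV buckets:

```python
class Store:
    """Append-only WAL of mutations, materialized into a KV store."""

    def __init__(self):
        self.wal = []   # append-only log of mutation events (source of truth)
        self.kv = {}    # materialized view, rebuildable by replaying the WAL

    def apply(self, mutation):
        # 1. Append to the WAL first.
        self.wal.append(mutation)
        # 2. Transact against the local KV store. A single mutation may
        #    touch several buckets, hence the list of (bucket, key, value).
        for bucket, key, value in mutation["writes"]:
            self.kv.setdefault(bucket, {})[key] = value
        # 3. In a real system you would now broadcast `mutation` to peers
        #    (or the relay server) and push a change event to the GUI.

store = Store()
store.apply({"writes": [("users", "u1", {"name": "Ann"}),
                        ("index_by_name", "Ann", "u1")]})
```

Note how one logical mutation updated both the `users` bucket and a secondary index bucket, which is exactly why the WAL, not the KV store, has to be the unit of replication.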

So from the above you can maybe see that there are many ways of building a CRDT-based system.

Martin Kleppmann is the main person behind Automerge (https://github.com/automerge/automerge-rs). He is at Cambridge and has been working on it for maybe 15 years.

The JS version that works well is called yjs. https://github.com/yjs/yjs
https://github.com/dmonad

  • He is in Berlin
  • The examples work very well and show off how powerful CRDT can be in the right scenarios.

There is also Yorkie: https://github.com/yorkie-team/yorkie

  • Golang and MongoDB with CRDT-based synchronisation
  • It has a server that acts as a relay point and a source of truth backed by MongoDB
  • The client is written in JS. It's easy to make that work for Flutter by porting the actual logic, which is here: https://github.com/yorkie-team/yorkie/tree/master/pkg/document
  • It is, however, far better I feel to just cross-compile that Document code to WASM and C and then use FFI or Protocol Buffers (aka Protobufs) to reuse it in Flutter. Why? Because you then have one code base that is your CRDT engine.

Then you have to decide on a transport if you want to do client/server sync and client/client sync.
For client/client most use WebRTC, partly because you also get the STUN and TURN aspects to find the peers, and so you need some servers anyway. So many use https://github.com/pion/ion, which has Flutter support :)

I have been working on this and ended up going down the path of not using Dart. Why?

  1. CRDT is hard; the maths and logic are very complex and difficult to get right.
  2. You are going to need the CRDT logic to exist on clients as well as servers.
  3. You need network transport libs that do not really exist in the Dart world.

So, I have been working on taking Golang code and cross-compiling it to work with Flutter Web, Desktop and Mobile.
Many others also take this approach.
This is a great example that works well, and is something you need anyway if you want to do peer-to-peer CRDT, because it gives you a local identity system: https://github.com/jerson/flutter-openpgp

  • It compiles the Golang to WASM for Flutter Web
  • It compiles the Golang to C for Flutter Mobile and Desktop.

For the client and server I use gRPC and gRPC-Web, and so does Yorkie. This works very well with Hive too, since Hive also has Protobufs schema evolution due to the fields having numbers. You need this for a real-world app because you will have clients that are 1 or 2 versions behind the latest code from the server or another client.

There is a CRDT implementation that uses Protobufs called Cloud State, where the CRDT aspects are formalised into the Protocol Buffers, which is really the way to go IMHO.
https://cloudstate.io/docs/go/current/index.html
https://cloudstate.io/docs/go/current/gettingstarted.html

https://github.com/cloudstateio
DART: https://github.com/cloudstateio/dart-support
GOLANG: https://github.com/cloudstateio/go-support

What's really going on is that they have an event-sourcing ( https://github.com/cloudstateio/go-support/blob/master/cloudstate/eventsourced.go ) Kappa-based architecture, and the server is CRDT-aware, so mutations can build up on the server for the clients to use. It is basically the same as Yorkie in this way.
Here is the Server Entry point: https://github.com/cloudstateio/cloudstate

However, you need a big server based on Akka and Java.
I think that can easily be replaced with Golang, however, because what the server does is actually not that complex.

This is where the real juice of it is: https://github.com/cloudstateio/cloudstate#common-intermediate-representation

So what you could do is use Hive as the client KV DB, with the event stream going over gRPC to the server, which is itself a CRDT-aware DB.

Of course, all systems need both data and image blobs. You still need to handle images too, but that's pretty easy because the event stream can just hold the mutation event describing the image CRUD and transmit the actual image over gRPC as Base64 or similar.

Here is the CRDT proto, which is the important bit:

Proto:
https://github.com/cloudstateio/cloudstate/blob/master/protocols/protocol/cloudstate/crdt.proto

Proxy Proto: https://github.com/cloudstateio/cloudstate/blob/master/proxy/core/src/main/proto/cloudstate/proxy/crdt_protobufs.proto

Golang: https://github.com/cloudstateio/go-support/blob/master/protobuf/protocol/cloudstate/crdt.proto

Dart: https://github.com/cloudstateio/dart-support/blob/master/lib/src/generated/protocol/cloudstate/crdt.pbgrpc.dart

In the end, using gRPC and formalising CRDT into the gRPC Protobufs allows people to build any project with CRDT, rather than it being an afterthought.

Note how they use the `Any` proto keyword. This is really what allows any other proto that developers produce to be encapsulated by it.
https://github.com/cloudstateio/cloudstate/blob/master/protocols/protocol/cloudstate/crdt.proto#L21

And where they finally get to the heart of how the thing really works :)

https://github.com/cloudstateio/cloudstate/blob/master/protocols/protocol/cloudstate/crdt.proto#L29

You can still do peer-to-peer sync with this too, but then you need the CRDT sync engine on the client as well.
I do not think they have this.


Summary

Cloud State's approach of modelling the event stream and CRDT using Protobufs is really smart because:

  • you get strong typing, code generation and all that.
  • People can build Protobufs on top for their projects.
  • You get schema evolution, which is huge in the real world because you are going to have one client at v1 and another at v2; that's just life. Without schema evolution there is almost no point building a system, IMHO.

People in general build two types of systems and CRDT needs to accommodate that:

  1. A KV store that you want to run predicates over, for which a standard CRDT-based WAL is perfect. Then you just need to model the CRUD. The CR (Create, Read) operations do not need to be CRDT, but the UD (Update, Delete) mutations do.
  2. A document store, like Google Docs, etc., for which you want a CRDT-based WAL and then higher-order code that handles the OT (Operational Transforms) you want to apply to your document model.
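A minimal sketch of what making the UD mutations CRDT can look like in case 1, using last-writer-wins registers with tombstones (Python, illustrative only; timestamps would be HLCs in practice):

```python
def merge(local, remote):
    """Merge two {key: (timestamp, value_or_None)} maps.

    Each update or delete carries a timestamp; a delete is a tombstone
    (value None), so it merges exactly like an update. The higher
    timestamp wins, which makes the merge commutative and idempotent.
    """
    out = dict(local)
    for key, (ts, val) in remote.items():
        if key not in out or ts > out[key][0]:
            out[key] = (ts, val)
    return out

a = {"title": (5, "Draft"), "body": (3, "hello")}
b = {"title": (4, "Old"),  "body": (7, None)}   # body deleted later on b
merged = merge(a, b)
# "title" keeps the newer "Draft"; "body" becomes a tombstone.
```

Because the merge only depends on per-key timestamps, both replicas converge to the same state no matter which order the WAL mutations arrive in.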

I think the last thing you want to do is hook your domain model to the event stream. Many in the CQRS/Kappa worlds do this, and it works, but it is very, very hard to get your domain model really correct. It's better to have a generic event stream modelled in Protobufs that represents your KV store, and then let higher-order code work on that. Many teams end up refactoring their domain model many times to get it right, and the data migration they have to do once they are past v1 with live users and live data is "production hell", as Elon would call it. A good way to kill a project. The Cloud State and Akka devs are experts in domain model design; 99.9% of devs are not!

Hive then fits in perfectly with this as a store for Case 1 and Case 2. It is also Protobufs-aware, and you can run basic predicate-like queries on it, such as "get me all items where firstname is X", with pagination.

Then you can build systems where the database can be on a server (even one that can be blown away and reconstructed from the event stream). This is important because for all these serverless approaches you have to be able to lose your database. On mobile, as with Signal: when you get a new device and scan your QR code, where do you think all the data to reconstruct the database on your mobile comes from? It comes either from your other device (where the server is just a relay proxy) or from the secured-at-rest WAL stored on the Signal servers. The same goes for Telegram. That WAL is stored forever, and only you can access it because it is encrypted against the private key from your device. When you get a new device and exchange the private key via a QR code, the WAL being streamed onto your new device is decrypted on your mobile and loaded into your DB (a Hive store, for example).
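The "rebuild the DB from the stored WAL" step can be sketched like this (Python; the `decrypt` callable stands in for decryption with the private key moved over via the QR-code exchange, and the event shape is hypothetical):

```python
def rebuild(stored_wal, decrypt):
    """Replay a stored (possibly encrypted) WAL into a fresh KV store.

    `decrypt` is a placeholder for decrypting each log entry with the
    device's private key; here it is injected so the replay logic
    stays independent of the crypto.
    """
    kv = {}
    for blob in stored_wal:
        event = decrypt(blob)
        if event["op"] == "put":
            kv[event["key"]] = event["value"]
        elif event["op"] == "del":
            kv.pop(event["key"], None)
    return kv

# Toy "encryption": identity, just to show the replay flow.
wal = [{"op": "put", "key": "msg1", "value": "hi"},
       {"op": "put", "key": "msg2", "value": "yo"},
       {"op": "del", "key": "msg1"}]
db = rebuild(wal, decrypt=lambda blob: blob)
```

The point is that the WAL is the durable artifact; the database on any device is just a cache of its replay.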

So then the last choice is which CRDT engine to use, and where.

Automerge is the industry standard that has actually worked out well. It has been 15 years of effort by some very smart people, and a lot of pain along the way. A ton of CRDT implementations have been attempted and did not make it to production, because it is very hard; garbage collection is one of the big problems.
Yorkie is a copy of it where the dev has gone through the code and rewritten it.

I think that Yorkie is the way to go, but with the Protobufs used by Cloud State!!! Why? Because then you can build any product on top of those Protobufs.
Because you're using Protobufs, both Cloud State and Yorkie use Envoy ( https://github.com/yorkie-team/yorkie-js-sdk/blob/master/docker/docker-compose.yml#L7 ). Envoy allows gRPC and gRPC-Web to just work and provides a semi-firewall because it closes network entry points up tight. Envoy is the thing inside Kubernetes, btw.
So your Protobufs get a nice proxy provided by Envoy, and Envoy does all the insane magic to make sure it works with web and native clients. There is a hell of a lot of complex logic needed to make Dart gRPC work across all routers (HTTP 1.1, HTTP 2, etc.). This is why people use Envoy.

Then there are the Yorkie Protobufs, which are here: https://github.com/yorkie-team/yorkie-js-sdk/blob/master/src/api/yorkie.proto
As you can see, they are modelling documents, and rightly so, since they are building a document editing system.

So what I am saying is that you could add the Cloud State Protobufs to Yorkie, and then get it working with Hive.
The logic that runs on top of the Document Protobufs is already there in the Golang.
Then you wonder if you should just cross-compile the Golang or port it to Dart.
I would cross-compile it. Why? Because the Protobufs are already there, so you can use the Protobufs as the communication link between the Flutter world and the WASM / native cross-compiled world!!!! You do not have to use Dart FFI, and neither should you, because there are many things in Dart FFI that do not work so well.
Moor is the best example of how far you can go with FFI: https://github.com/simolus3/moor/tree/master/moor_ffi
But those bindings are not autogenerated for him.
With a Protobufs CRDT you are going to have new public interfaces for each project, and so you need perfect code generation for them. So use the Protobufs for the comms link from the Flutter code to the WASM or native code.

So then look at https://github.com/jerson/flutter-openpgp.
He does not use FFI or Protobufs but instead hand-coded the bindings, because he did not start with Protobufs.

Golang --> Web WASM: https://github.com/jerson/openpgp-mobile/blob/master/Makefile#L24
Golang --> Mobile: https://github.com/jerson/openpgp-mobile/blob/master/Makefile#L29

Then the Flutter bindings to use that:
Flutter Method Channel binding: https://github.com/jerson/flutter-openpgp/blob/master/lib/openpgp.dart
Flutter WASM binding: https://github.com/jerson/flutter-openpgp/blob/master/lib/web/js/wasm.dart

My point is that this binding code can all be code-generated, because you have Protobufs and so can generate the bindings at compile time.

from isar.

bgervan commented on June 2, 2024

I didn't know it already supports inheritance. It didn't a year ago. Let me check.

If Isar aims to support sync, then these are the bricks that need to be implemented first.

Yes, similar to what WatermelonDB or Couchbase does.

By CRDT support, I mean the object could have built-in fields for time sync, and on the Flutter side they could be updated as needed. If we save an object, the save could update those fields as required by the definition.
The fields I mean are: object ID, HybridLogicalClock (timestamp, device ID, counter). But of course, with inheritance we can define a base class to have these fields, and can easily have save overrides and push/pull functions. It all depends on your vision of how built-in the support should be.
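For reference, a hybrid logical clock over those three fields (timestamp, counter, device ID) can be sketched as follows (Python; the class and method names are made up for illustration):

```python
class HLC:
    """Hybrid logical clock issuing (wall_ms, counter, node_id) tuples."""

    def __init__(self, node_id):
        self.wall, self.count, self.node = 0, 0, node_id

    def send(self, now_ms):
        # Local event / message send: move past both wall time and
        # the last timestamp we issued.
        if now_ms > self.wall:
            self.wall, self.count = now_ms, 0
        else:
            self.count += 1
        return (self.wall, self.count, self.node)

    def recv(self, remote, now_ms):
        # Merge a remote timestamp so causality is preserved even
        # when physical clocks disagree.
        r_wall, r_count, _ = remote
        wall = max(self.wall, r_wall, now_ms)
        if wall == self.wall == r_wall:
            self.count = max(self.count, r_count) + 1
        elif wall == self.wall:
            self.count += 1
        elif wall == r_wall:
            self.count = r_count + 1
        else:
            self.count = 0
        self.wall = wall
        return (self.wall, self.count, self.node)

clock = HLC("dev-a")
t1 = clock.send(100)
t2 = clock.send(100)                     # same wall ms: counter bumps
t3 = clock.recv((250, 4, "dev-b"), 100)  # remote clock is ahead
```

Tuple comparison on (wall, counter, node) then gives a total order for every saved object, which is exactly what the sync layer needs.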

From the backend side, there are many ways it can be done. One possible way is REST APIs with WebSocket support to listen for changes in real time. If you have plans to implement sync in Isar, I can create a backend in Django with socket support. The DB on the backend could store the objects as JSON in Postgres, with the above fields stored separately for efficient filtering.

thipokch commented on June 2, 2024

Should we put together a draft document?

richard457 commented on June 2, 2024

I would like to suggest that sync have some sort of data routing, similar to https://docs.couchbase.com/sync-gateway/current/channels.html. This is important for the following reasons:

  1. Managing access to data to multiple users
  2. Control who can access what
  3. Enable users to access just the data they need
  4. Minimize the amount of data synced to mobile devices @leisim
    Example code:
 await Isar.open(
   directory: dbName,
   schemas: [],
   // channels restrict which data this client may access across sync
   channels: [
     "1",
     "2",
   ],
   inspector: false,
 );

CodingSoot commented on June 2, 2024

I'm not sure, but I think remote schema changes should be fully handled by the backend. The client should just pass the local lastPulledAt timestamp and the local schemaVersion with each pull/push call, and let the endpoint adapt the response based on those.

peter-mghendi commented on June 2, 2024

@ojm-it I'm not fully caught up with this entire thread, but the original comment talks about peer to peer connections being possible. What happens in this case?

jonaswre commented on June 2, 2024

Hi, I have a pretty low tech proposal.

It would be cool to be able to get the transaction log for certain collections.

This would allow transmitting the transactions as a JSON patch to a server, which can then figure out which modifications should be applied and which should be discarded.
This might result in some data loss, but it makes it easy to transmit modifications to the server.
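Turning a record-level transaction log into JSON-Patch-style operations could look roughly like this (Python sketch; the function name and record shape are hypothetical):

```python
def to_patch(old, new, path=""):
    """Diff two flat dict records into JSON-Patch-style operations."""
    ops = []
    for key in old.keys() | new.keys():
        p = f"{path}/{key}"
        if key not in new:
            ops.append({"op": "remove", "path": p})
        elif key not in old:
            ops.append({"op": "add", "path": p, "value": new[key]})
        elif old[key] != new[key]:
            ops.append({"op": "replace", "path": p, "value": new[key]})
    return ops

# Only the changed field is transmitted, not the whole record.
patch = to_patch({"name": "Ann", "age": 30}, {"name": "Ann", "age": 31})
```

The server can then decide per operation whether to apply or discard it, rather than accepting or rejecting whole records.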

richard457 commented on June 2, 2024

Should we put together a draft document? Like a summary of what we have discussed here?

jonaswre commented on June 2, 2024

Hi, so I've read the discussion, and I'm gonna be honest, I didn't understand everything, but a lot of it sounded like over-engineering to me. I personally think the way WatermelonDB manages it is completely fine and easy to implement. If you agree with me on that: I'm working on offline support for my Appwrite backend, and it works, but help would be much appreciated, so if somebody is interested please join :)
https://github.com/ThalesVonMilet/offline_support_for_db

I agree there are some very complex solutions here.

I think for most use cases they are too complex. But I also think your proposal is too simple. If I understand correctly, you are using a flag to detect whether a record was changed, but for my use case, for example, this is not enough. That's why I proposed logging the transactions: that way you can transmit either the whole record or just the parts that changed. In your solution, modifications on two clients would lead to a conflict even if they changed different parts of the record.
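The difference is easy to see in a sketch (Python, hypothetical interface): with per-field change sets, two clients editing different fields merge cleanly, and only same-field edits surface as conflicts.

```python
def merge_changes(base, *change_sets):
    """Apply per-field change sets from several clients onto a base record.

    A conflict is reported only when two clients changed the *same*
    field to different values; a record-level dirty flag would flag
    every concurrent edit as a conflict instead.
    """
    result, conflicts, seen = dict(base), [], {}
    for changes in change_sets:
        for field, value in changes.items():
            if field in seen and seen[field] != value:
                conflicts.append(field)
            seen[field] = value
            result[field] = value
    return result, conflicts

base = {"title": "Doc", "body": "hello"}
# Client A edits the title, client B edits the body: no conflict.
merged, conflicts = merge_changes(base, {"title": "Doc v2"}, {"body": "world"})
```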

AKushWarrior commented on June 2, 2024

Sled has CRDT support. Might be useful to see it working in a database environment.

There's also a whole bunch of crates for working with CRDT classes.

Manuelbaun commented on June 2, 2024

Hey guys,

Recently I was exploring how CRDTs can be used to build a sync layer, so I did a university project written in Dart. The main purpose was to get some insight into how CRDTs work, and I still have a lot to learn :) If you are interested in what I did, here is the project: https://github.com/Manuelbaun/sync_layer_crdt_playground

I used a form of delta-state CRDT with HLCs and basically used them as events whenever updates occurred. Updates looked like this in pseudocode:

class Atom {
    ID, //  siteID, hybrid logical clock
    type, // like the table or object 
    ObjectID, 
    data, // any form of data usually a kv map
}

To emulate tables I used a grow-only "CRDT map", where the key represents the column. Note that my implementation does not support removing keys from a CRDT map.

I watched the talk from James Long and designed my CRDT sync layer after it. I used a Merkle trie to diff two nodes, then sent all events that happened after the divergence time to the other node, so they sync.
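A toy version of that Merkle-style diffing idea (Python; function names hypothetical, and real implementations hash a trie of HLC timestamps rather than flat time buckets):

```python
import hashlib

def bucket_hashes(events, window=60):
    """Hash events per time window.

    Comparing these per-window hashes between two nodes tells you the
    first window in which their event histories diverge.
    """
    buckets = {}
    for ts, payload in sorted(events):
        buckets.setdefault(ts // window, []).append(f"{ts}:{payload}")
    return {b: hashlib.sha256("|".join(items).encode()).hexdigest()
            for b, items in buckets.items()}

def diverged_since(mine, theirs, window=60):
    """Earliest window start where the two histories differ, else None."""
    for b in sorted(set(mine) | set(theirs)):
        if mine.get(b) != theirs.get(b):
            return b * window
    return None

a = bucket_hashes([(10, "x"), (70, "y")])
b = bucket_hashes([(10, "x"), (70, "y"), (80, "z")])
since = diverged_since(a, b)
```

Once the divergence time is known, each node only resends the events after it, instead of the whole history.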

What I did not look into is how garbage collection could work. Still on my list, though. And there are a lot of things left to do to make a sync layer actually work well.

Great talk on the topic.

The talk from Martin Kleppmann was very informative on how Automerge achieved such great compression for text editing.

For another project I used yjs with WebRTC (because at the time it had the best performance) to make a simple P2P drawing game. While WebRTC is great in theory, it does not work everywhere. On my university campus I couldn't make it run, so peers would not find each other, but it worked perfectly at home.

omidraha commented on June 2, 2024

Hi,
I read this topic,
I have syncing experience from a mobile app that I developed as offline-first.
The server-side DB is PostgreSQL and the client-side DB is Realm.
Realm actually supports syncing itself via realm-object-server,
but I wanted PostgreSQL as the backend on the server, so I implemented my own approach.

Rules:

  1. There are only two commands on the client for syncing data: pull and push.

  2. There are only two APIs on the server for syncing data: /pull/ and /push/.

  3. Models on the client and server have only three extra fields, inherited for convenience:

abstract class SyncModel
   DateTime lastUpdate;
   Boolean synced;
   Boolean deleted;

class User extends SyncModel
   String uid
   String name
   String email
   String mobile
  4. It always runs pull first and then push.
  5. The pull always gets records after lastUpdate.
  6. If the pull fails, it stops and doesn't continue to push.
  7. The pull only applies changes to records that are currently synced on the client.
  8. The push only pushes unsynced records.
  9. The push always needs confirmation before marking records as synced.
  10. The client deletes a record only when it is both deleted and synced.
  11. Conflicts are resolved on the server simply by preferring newer records by lastUpdate.
  12. Pull and push are atomic; if either ultimately fails, the transaction rolls back.
In my opinion, it would be great if this feature were pluggable, so that the server side could be written in any language and database, for example Python/Django with PostgreSQL, and we would have plugins for each of them, for example:

isar-flutter-sync
isar-django-sync

bgervan commented on June 2, 2024

(quoting omidraha's offline-first pull/push sync approach from the comment above)

Funny, that's exactly what I plan. The DB will probably be Isar, because it has the date filter I require; the sync will be done over WebSocket, so the other clients can get the new data in real time.

Is there anyone who would like to join? The backend is Django Channels, with Redis or Google Pub/Sub. An experienced Flutter developer for the frontend would be welcome; since I am a backend-heavy developer, I am not sure I can implement the frontend code with best practices by myself.

The frontend (which could be named isar-flutter-sync) will be based on the https://github.com/cachapa/crdt package, but it's not perfect for my needs, so it needs a bit of a rethink.

richard457 commented on June 2, 2024

https://www.youtube.com/watch?v=B5NULPSiOGw (A very good explanation of CRDTs)

ansarizafar commented on June 2, 2024

Check this https://condensation.io/

devon commented on June 2, 2024

https://wiki.nikitavoloboev.xyz/distributed-systems/crdt

https://github.com/alangibson/awesome-crdt

More information about CRDT.

CodingSoot commented on June 2, 2024

I like the sync mechanism of WatermelonDB (which is a React/React Native database): https://nozbe.github.io/WatermelonDB/Advanced/Sync.html

Here are the implementation details: https://nozbe.github.io/WatermelonDB/Implementation/SyncImpl.html#sync-procedure

richard457 commented on June 2, 2024

@ojm-it I'm not fully caught up with this entire thread, but the original comment talks about peer to peer connections being possible. What happens in this case?

From the "General design" section:

  • master/replica - server is the source of truth, client has a full copy and syncs back to server (no peer-to-peer syncs)
  • two phase sync: first pull remote changes to local app, then push local changes to server
  • client resolves conflicts
  • content-based, not time-based conflict resolution
  • conflicts are resolved using per-column client-wins strategy: in conflict, server version is taken except for any column that was changed locally since last sync.
  • local app tracks its changes using a _status (synced/created/updated/deleted) field and _changes field (which specifies columns changed since last sync)
  • server only tracks timestamps (or version numbers) of every record, not specific changes
  • sync is performed for the entire database at once, not per-collection
  • eventual consistency (client and server are consistent at the moment of successful pull if no local changes need to be pushed)
  • non-blocking: local database writes (but not reads) are only momentarily locked when writing data but user can safely make new changes throughout the process
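The per-column client-wins rule from that list is simple to express (Python sketch; the `changed_columns` set corresponds to WatermelonDB's `_changes` field, names otherwise hypothetical):

```python
def resolve(server_rec, local_rec, changed_columns):
    """Per-column client-wins conflict resolution.

    Take the server version of the record, except for any column the
    client changed locally since the last sync, which the client keeps.
    """
    merged = dict(server_rec)
    for col in changed_columns:
        if col in local_rec:
            merged[col] = local_rec[col]
    return merged

server = {"title": "Server title", "body": "Server body", "tags": "a"}
local  = {"title": "My title",     "body": "Server body", "tags": "a"}
merged = resolve(server, local, changed_columns={"title"})
```

The server never needs to know which columns changed; only the client tracks that, which matches the "server only tracks timestamps" point above.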

Peer-to-peer, I would say, is important if it is possible.

CodingSoot commented on June 2, 2024

Drift

can you elaborate more on Drift?

Sorry I meant Isar

richard457 commented on June 2, 2024

https://wiki.nikitavoloboev.xyz/distributed-systems/crdt

https://github.com/alangibson/awesome-crdt

More information about CRDT.

This is really awesome!

jeff9315 commented on June 2, 2024

Is the intent to allow peer-to-peer decentralized databases without a central server? Here is my use case...

I am in a group called CERT (Community Emergency Response Team). There are groups nationwide (in the US) with no relationship to each other. I would like to develop an open-source project for any group to use, but each group's members would only see data within their group.

I'm trying to find a way to have an open-source project without incurring database costs and without having users configure databases outside of the app.

Does it sound like this would be doable within the framework you are thinking about?

Thanks ... Jeff

richard457 commented on June 2, 2024

https://github.com/ljwagerfield/crdt @leisim

simc commented on June 2, 2024

Also a very interesting approach is the Gun database.

AKushWarrior commented on June 2, 2024

Speaking as a (prospective) user, optionally allowing users to handle sync conflicts is probably a good idea.

The debate here is whether automatic conflict resolution and CRDTs (essentially for database sync) are worth the pain relative to manual conflict resolution.

I'll read that paper that you posted and come back with further thoughts.

AKushWarrior commented on June 2, 2024

It's also worth considering the options for future development that CRDTs would provide.

Given that a large percentage of Hive's current users use it as a persistent cache, and that is also the suggested usage of LMDB and IndexedDB, interoperability with foreign databases is high on the list of priorities.

We could also write integrations (or have third-party integrations) into compatible databases. Mongo, Redis, and more support the CRDT style architecture.

simc commented on June 2, 2024

@AKushWarrior Yes, I agree. Strong eventual consistency would allow us to provide "batteries included" integrations without user interference or manual conflict resolution.

AKushWarrior commented on June 2, 2024

On the other hand, this architecture may introduce problems if the database is not compatible with CRDTs. Given that this is kind of a weedy implementation detail that's not clearly evident to the end user, CRDT might hurt Isar's goal of being user and platform agnostic.

cachapa commented on June 2, 2024

I have a HLC + CRDT implementation in dart here: https://github.com/cachapa/crdt
Though I suspect that you probably want to implement it in Rust, it might still be interesting as a reference.

Please note that the current implementation does not include node IDs. I'm working on that in branch 2.0.0; they're needed in order to perform deterministic conflict resolution.
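For illustration, node IDs make conflict resolution deterministic by breaking timestamp ties, so every replica picks the same winner (Python sketch, names hypothetical):

```python
def lww_put(store, key, value, ts):
    """Last-writer-wins put keyed by a (millis, counter, node_id) tuple.

    Tuple comparison orders by millis, then counter, then node_id, so
    two writes with identical clocks are still resolved identically on
    every replica: the node ID is the deterministic tiebreaker.
    """
    current = store.get(key)
    if current is None or ts > current[0]:
        store[key] = (ts, value)

db = {}
lww_put(db, "k", "from A", (1000, 0, "node-aaa"))
lww_put(db, "k", "from B", (1000, 0, "node-bbb"))  # tie broken by node id
```

Without the node ID component, two replicas seeing the same pair of concurrent writes could each keep a different value and never converge.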

simc commented on June 2, 2024

this architecture may introduce problems if the database is not compatible with CRDT

I think we will have to implement the database integration anyway since there is no standardized protocol to distribute changes across the network. It should be possible to do that with almost any backend database.

I have a HLC + CRDT implementation in dart

Very cool! I'll definitely take a look πŸ‘

AKushWarrior commented on June 2, 2024

I think we will have to implement the database integration anyway since there is no standardized protocol to distribute changes across the network.

Unfortunately true.

On another note, we should look into rust implementations of CRDTs.

simc commented on June 2, 2024

Great talk on the topic.

AKushWarrior commented on June 2, 2024

Looks interesting. I'll watch it...

AKushWarrior commented on June 2, 2024

Finished the talk. I wonder if those papers in the references have pseudocode we can use?

simc commented on June 2, 2024

The paper I linked before has a reference implementation and a port to rust.

But we will probably need to come up with a mix of existing solutions. The Automerge algorithm, for example, keeps the entire change history, which is not suitable for mobile devices with limited resources.

joe-getcouragenow commented on June 2, 2024

This discussion is really good!!

joe-getcouragenow commented on June 2, 2024

(quoting Manuelbaun's sync-layer comment above)

For another project I used yjs with WebRTC (because at the time it had the best performance) to make a simple P2P drawing game. While WebRTC is great in theory, it is not working everywhere. On my university campus, I couldn't make it run. So peers would not find each other. But it worked perfectly at home.

This is normal. WebRTC will not work if either party is behind a symmetric NAT router. See: https://superuser.com/questions/1525664/how-to-fix-symmetric-nat-router-router

Most Governments runs Cisco stuff and its Symmetric. i see this all the time.
https://community.cisco.com/t5/other-network-architecture/difference-between-asymmertic-vs-symmetric-routing/td-p/50978

The solution used by Google and everyone is to have a Relay Proxy for when one party is behind a Symmetric Router.
This a standard part of a WebRTC Setup and called a SFU ( Selective Forwarding Unit )
This works well: https://github.com/pion/ion-sfu
https://webrtcglossary.com/sfu/


simc avatar simc commented on June 2, 2024

@joe-getcouragenow Thanks for all the resources and topics to think about. It'll take me some time to explore all of them and properly respond.

Once you have a DB with a WAL, then you can do CRDT in general

Why does the database need a WAL? Isn't any kind of transaction mechanism sufficient?

You need this type of Kappa style architecture because a CRDT mutation can result in many buckets in your KV store mutating once you build real work apps

LMDB is incredibly fast and should be able to handle the log.

Often it is better to have a relay Server

Yes, I agree, but I still hope that we can create a system that can rely on pure P2P if necessary. These networks will be very small, for example a local WiFi or a Bluetooth mesh network. If an internet connection is available, a server or cluster can be used.

I have been working on this and ended up going down the path of not using Dart

I agree, but the problem with WASM is that it can be VERY slow when heavy interaction with JavaScript (or, in our case, cross-compiled Dart) is required. WASM, for example, cannot directly access IndexedDB.

"The raw execution of an algorithm in WASM is almost always faster than in JavaScript. However, the cost of writing data into the WASM module’s memory can be so high that it removes the benefit of using WASM in the first place."

There are multiple proposals to solve this problem but it will take years until these solutions are implemented in the majority of browsers. Especially with Apple taking forever to implement new WASM features in Safari.

Garbage collection being one of the big ones.

This is exactly the problem I'm currently trying to find a solution that works with mobile devices and their limited resources.

You do not have to use Dart FFI, and neither should you, because there are many things in Dart FFI that do not work so well.

How else could we communicate with the Rust backend? I hope that most of the very bad problems like finalizers dart-lang/sdk#35770 and copy-free Strings dart-lang/sdk#39787 will be solved until Isar is released.

Flutter Method Channel Binding

They are SLOOOW :(


simc avatar simc commented on June 2, 2024

@Manuelbaun Thanks, this looks very interesting. We'll probably still end up implementing it in Rust and JS respectively.


joe-getcouragenow avatar joe-getcouragenow commented on June 2, 2024

@joe-getcouragenow Thanks for all the resources and topics to think about. It'll take me some time to explore all of them and properly respond.

Once you have a DB with a WAL, then you can do CRDT in general

Why does the database need a WAL? Isn't any kind of transaction mechanism sufficient?

Well, it depends what you're building. In the README there is no actual definitive statement of the thing you're building, so I had to guess a bit.

So sometimes, on the server, you want a WAL that takes the event stream and materializes your KV tables, ready for non-CRDT users to do an RPC query on them. It's CDC (Change Data Capture), but in reverse. So it's essentially Kappa.

On the client, in a P2P situation, you want the same, because with event streaming you want a place where all mutations come in before the CQRS code takes over to do something with each event.

This is what Signal and Cloudstate and others do. Not everyone does this, though; it really depends. I think that not doing it is asking for trouble. Look at any DB: the first thing it does is put all mutations into a WAL, so that if the power goes and you then restart the device, the WAL is there and the DB can replay the last few writes into the DB.
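
A minimal sketch of that append-then-apply pattern, assuming a trivial JSON-lines WAL and an in-memory KV state (the file format and all names are invented for illustration):

```python
import json
import os

class WalStore:
    """Every mutation is appended to a durable log first; the materialized
    KV state is rebuilt by replaying the log, so a crash between append
    and apply loses nothing that was acknowledged."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()

    def _replay(self):
        # Rebuild the materialized state from the log on startup.
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    op = json.loads(line)
                    self.state[op["key"]] = op["value"]

    def put(self, key, value):
        # 1. Durably append the mutation to the WAL...
        with open(self.path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ...then apply it to the materialized state.
        self.state[key] = value
```

A restarted process simply replays the log and reconstructs the same state; the log is also the natural hook for broadcasting appended entries to peers or a relay server.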

You need this type of Kappa style architecture because a CRDT mutation can result in many buckets in your KV store mutating once you build real work apps

LMDB is incredibly fast and should be able to handle the log.

Often it is better to have a relay Server

Yes, I agree, but I still hope that we can create a system that can rely on pure P2P if necessary. These networks will be very small, for example a local WiFi or a Bluetooth mesh network. If an internet connection is available, a server or cluster can be used.

Ah, so basically what Berty are doing: https://github.com/berty/go-ipfs-log
They use Bluetooth and local WiFi. Telegram also does this.

I have been working on this and ended up going down the path of not using Dart

I agree, but the problem with WASM is that it can be VERY slow when heavy interaction with JavaScript (or, in our case, cross-compiled Dart) is required. WASM, for example, cannot directly access IndexedDB.

For me WASM is screaming fast when you're not touching the DOM, which I don't. You can use a service worker and access IndexedDB.
See: https://developers.google.com/web/ilt/pwa/live-data-in-the-service-worker
There are numerous examples on the web.

"The raw execution of an algorithm in WASM is almost always faster than in JavaScript. However, the cost of writing data into the WASM module’s memory can be so high that it removes the benefit of using WASM in the first place."

Can you let me know where you got that quote from? Often it's a matter of context, and I am curious about theirs.

There are multiple proposals to solve this problem but it will take years until these solutions are implemented in the majority of browsers. Especially with Apple taking forever to implement new WASM features in Safari.

Yeah, Safari is the one holding it back a bit. Microsoft moving to Chromium is a godsend, and Firefox is pretty close to Chrome in supporting PWA and WASM. WASM does work with Safari on desktop and iOS now; I honestly have not tested it that much, though.
https://caniuse.com/#feat=wasm

Garbage collection being one of the big ones.

This is exactly the problem I'm currently trying to find a solution that works with mobile devices and their limited resources.

You do not have to use Dart FFI, and neither should you, because there are many things in Dart FFI that do not work so well.

How else could we communicate with the Rust backend? I hope that most of the very bad problems like finalizers dart-lang/sdk#35770 and copy-free Strings dart-lang/sdk#39787 will be solved until Isar is released.

Flutter Method Channel Binding

They are SLOOOW :(

Yeah, I agree Method Channel is slow. But when you do the tradeoffs, you have a few things to consider:

  1. How often are calls between the Flutter and native layers happening? Are you using Hive as a cache of materialized views? You could.
  2. Developer productivity.
  3. You will need to make network calls (P2P and client-server), and Protobufs will make that easy and fast.

https://github.com/google/note-maps/blob/main/note_maps/lib/mobileapi/repository.dart#L207
They have a 100% Golang-based embedded layer with a Golang DB and Protobufs.
They use Protobufs as their transport: Flutter <--> embedded, embedded <--> server.

Now when you are later doing P2P connections, you are going to want to be using Protobufs also, I posit.

So the Protobufs are used for the three reasons Protobufs are always used: code gen, speed (runtime and dev time), and evolution (not breaking across versions).

FFI is faster, and maybe you can use it with Protobufs? Link a PB <--> FFI marshaller.

Hard to say, as this is getting very specific. I bet someone has looked into it :)
https://github.com/ajrcarey/dart-rust-minimal-ffi-grpc-example


simc avatar simc commented on June 2, 2024

Can you let me know where you got that quote from? Often it's a matter of context, and I am curious about theirs.

https://blog.sqreen.com/webassembly-performance/

The context is communication with JavaScript or any Browser API. WASM cannot even do network calls without sending the data through JavaScript afaik.

You will need to make network calls ( p2P and Client Server) and Protobufs will make that easy and fast

But network calls will be made by Rust anyway. It is not important whether Dart communicates via Dart > Rust or Dart > Java > Rust.

FFI are faster and maybe you can use them with Protobufs

Protobuf / Cap'n Proto etc. can be used by Rust anyway.


Manuelbaun avatar Manuelbaun commented on June 2, 2024

@Manuelbaun Thanks, this looks very interesting. We'll probably still end up implementing it in Rust and JS respectively.

Yeah, I also would not recommend implementing it in Dart; it was OK for a prototype. Before I started to implement it, I was thinking of doing it in Rust, since there are already some CRDT implementations to use.

In regard to the garbage collection of CRDTs:

The author of yjs, Kevin Jahns, implemented a snapshot feature. I have not looked into how he did it, but you can see it in action on his website https://yjs.dev/ under demos. This might be a way to garbage-collect the log of mutations in the database.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

I think something that we should consider is forking automerge and seeing how integrated/easy to remove the change history is. As far as I can tell, it seems to be the foremost technology in the field of CRDTs.

I have no comment on the architectural (rust ffi vs WASM) stuff yet, I need to read up a bit.


simc avatar simc commented on June 2, 2024

the author of yjs, Kevin Jahns, did implement a snapshot feature.

Thanks, I'll check it out.

seeing how integrated/easy to remove the change history is

Unfortunately this is more of a conceptual problem than a coding problem. Automerge and most other CRDTs use the log to guarantee divergence in the case of network partition (e.g. a peer goes offline and commits concurrent changes)


AKushWarrior avatar AKushWarrior commented on June 2, 2024

Automerge and most other CRDTs use the log to guarantee divergence in the case of network partition (e.g. a peer goes offline and commits concurrent changes)

Maybe I'm missing something here. Why would we want to "guarantee divergence"? Isn't the whole point to reconverge when the peer goes back online?


simc avatar simc commented on June 2, 2024

Isn't the whole point to reconverge when the peer goes back online?

Sorry, avoid divergence, guarantee convergence :D


AKushWarrior avatar AKushWarrior commented on June 2, 2024

Sure, but there's no need for a permanent log. Just a log of every change since the last convergence event, right?


Manuelbaun avatar Manuelbaun commented on June 2, 2024

If your app only communicates with a server to sync, then I think it is OK to only persist the logs since the last sync; after a successful sync, you can delete those logs.

It gets a bit more tricky if you want to support a P2P setup. Say several nodes A, B, C mutate data offline, and then node A syncs with node B. Node A and B are in the same state now. If they now deleted the logs of every change, and node C comes and wants to merge: node C sends its log to A and B, and they can apply the changes from C, but how would node C sync? It can't get any logs from A or B since they deleted those logs already. The only way I can think of right now is to send the whole state from either A or B to C and basically override the local state. There are probably other ways to solve this too.

So the question is: when is it OK to delete the logs? If you can guarantee that all nodes in a system received the logs of changes, then those logs can be deleted.
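
That deletion rule fits in a few lines. Assuming each node reports the highest log clock it has confirmed receiving (all names here are hypothetical), everything at or below the minimum acknowledgement is safe to delete:

```python
def prunable_upto(acks):
    """acks: {node_id: highest clock value that node confirmed receiving}.
    Log entries with clock <= the returned cutoff are safe to delete,
    because every known node already has them."""
    return min(acks.values()) if acks else 0

log = [(1, "op1"), (2, "op2"), (3, "op3")]  # (clock, operation)
acks = {"A": 3, "B": 2, "C": 2}             # B and C have only seen up to 2
cutoff = prunable_upto(acks)
log = [(clock, op) for clock, op in log if clock > cutoff]
assert cutoff == 2 and log == [(3, "op3")]
```

The hard part, as noted above, is knowing the full membership: a node you have never heard of cannot appear in `acks`, which is why pure P2P systems struggle to pick a safe cutoff.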


simc avatar simc commented on June 2, 2024

So the question is: when is it OK to delete the logs? If you can guarantee that all nodes in a system received the logs of changes, then those logs can be deleted.

Maybe a good compromise would be to introduce a parameter Duration keepHistoryFor, which could be null for indefinite storage, 0 for immediate deletion after sync to a server, or something else for P2P. The developer probably knows the situation best, and if there is a good explanation in the docs, it could work well...


AKushWarrior avatar AKushWarrior commented on June 2, 2024

@Manuelbaun I hadn't thought about P2P at all. You're right, of course, but I think that Simon has found a happy medium.

@leisim As long as it's very clear that changes will be wiped after that duration, I agree.

It's important that the default is to keep the changes indefinitely, though, to avoid unpredictable behavior when syncing. I think that an optional parameter in Dart (defaulting to null, which represents indefinite storage) will take care of that for us.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

Okay, I think we have a (general) plan. @leisim could you review when you have a minute?

TODO:

  • Fork https://github.com/automerge/automerge/tree/performance to support history wiping
  • Create Rust library for automerge (no mature ones are available)
  • Provide hooks for commit/sync using the above libraries
    • We should do this in the Dart layer, since sync should be a unified interface


simc avatar simc commented on June 2, 2024

Almost complete collection of research on CRDTs


AKushWarrior avatar AKushWarrior commented on June 2, 2024

Almost complete collection of research on CRDTs

Very cool. I'm reading through the papers now...


AKushWarrior avatar AKushWarrior commented on June 2, 2024

@leisim done reading through them. I haven't really been enlightened or anything, but I might be able to parameterize this issue better now. As I see it, there are a few use cases:

  • Merge an Isar-Web db with an Isar-Native db
  • Merge an Isar-Web db with an Isar-Web db
  • Merge an Isar-Native db with an Isar-Native db
  • Merge an Isar-Web/Native db with arbitrary foreign db

The last one is by far the most code-heavy: we'll need to write drivers for any foreign db we want to sync with, or expose hooks for users to do it themselves. The other three are a matter of porting a CRDT library to the platforms we're using and editing it to our needs.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

@leisim thoughts here? I'd be happy to start writing a port of automerge, in either Dart or Rust.

EDIT: On further thought, we can't really effectively abstract this. It has to be done in Rust and TS.


AndryHTC avatar AndryHTC commented on June 2, 2024

Thanks to everybody putting effort into this thrilling new project.

@omidraha I've built a similar sync engine with a completely customized back-end and front-end. My solution is a little more comprehensive, but also more complicated and counterintuitive (because of the nature of the system), so I will continue the discussion based on your solution.

I have a few questions though:

  • How does the backend know what to transmit on pull? Does the backend transmit the entire DB on pull while the client applies only the synced data? That would be an enormous waste of syncing time/bandwidth/client space/client activity/server activity.
  • That said, what about historical data? Should the sync engine delete cached data from the client after some time during which it is not queried (neither read nor written)?
  • What about conflicts in the same row? It could be that multiple offline users change the same row in multiple different columns. So, when syncing, the changed columns of those multiple rows should be kept and merged together. Should the merge action be done on the server instead of on the client?

I know that the target is doing the synchronization on the client side, but maybe it is not possible to have reliable synchronization without an adaptor in the back-end. Would AWS AppSync work for the need?
Just for knowledge, an analysis of Firestore synchronization could be useful.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

I have a few questions though:

  • How does the backend know what to transmit on pull? Does the backend transmit the entire DB on pull while the client applies only the synced data? That would be an enormous waste of syncing time/bandwidth/client space/client activity/server activity.

We would record the changes to the database, and transmit the change history.

The change history, in our model, can optionally represent only the changes in the last (user-defined period of time). This helps limit the amount of data transferred over networks.

  • That said, what about historical data? Should the sync engine delete cached data from the client after some time during which it is not queried (neither read nor written)?

Not an architectural decision that should be made by a client side database. The user is free to implement this functionality if they wish.

  • What about conflicts in the same row? It could be that multiple offline users change the same row in multiple different columns. So, when syncing, the changed columns of those multiple rows should be kept and merged together. Should the merge action be done on the server instead of on the client?

This database will support direct P2P merging. There doesn't have to be a "central" database. For more, see operational transformation vs CRDTs. Every client will have an algorithm that can merge two conflicting rows in a symmetric fashion.

I know that the target is doing the synchronization on the client side, but maybe it is not possible to have reliable synchronization without an adaptor in the back-end. Would AWS AppSync work for the need?

We plan on writing adaptors for various backends.

Just for knowledge, an analysis of Firestore synchronization could be useful.

Agreed!


AndryHTC avatar AndryHTC commented on June 2, 2024

We would record the changes to the database, and transmit the change history.

The change history, in our model, can optionally represent only the changes in the last (user-defined period of time). This helps limit the amount of data transferred over networks.

OK, so I know you plan to transfer only the changes, but I mean another thing. Let me be more clear:
What about the data that is not useful for the clients? This could still be a problem for projects with very complicated business domains. Let's say there are tables with 10+ million records, with thousands of changes every minute: would the sync engine still sync ALL the changes to keep eventual consistency? Or would the sync engine sync only the changes related to the client's used data/queries instead (like Firestore does)?

If all the changes are synchronized, then even if you only apply the changes to the used data/queries, the entire 10+ million record table's changes are still transferred over the network (with possibly a security issue in constantly receiving sensitive data not related to the user, even if not used on the client side) and processed for applying the changes, with consequent battery drain, data usage, and performance issues.

Not an architectural decision that should be made by a client-side database. The user is free to implement this functionality if they wish.

Sure... but I think this could still be a "native feature" of the DB (again, like Firestore), as storage usage is still one of the main concerns in an offline-first mobile app.

This database will support direct P2P merging. There doesn't have to be a "central" database. For more, see operational transformation vs CRDTs. Every client will have an algorithm that can merge two conflicting rows in a symmetric fashion.

Wow! I haven't gone deep into the topic yet; I will have a more detailed look soon.

We plan on writing adaptors for various backends.

Great. Maybe something like an SDK to let someone build their own solution with a legacy back-end could also be useful.

I've used Firestore as an example too many times! πŸ˜„ But I think the team has done a great job optimizing synchronization, especially for low-end devices.

Thank you very much


simc avatar simc commented on June 2, 2024

@leisim thoughts here?

I'm still experimenting with different solutions... I have no definitive ideas yet.

In my opinion, it would be great if this feature were pluggable, so that the server side can be written in any language with any database, for example Python/Django with PostgreSQL, and we have plugins for each of them.

I completely agree.

How does the backend know what to transmit on pull? Does the backend transmit the entire DB on pull while the client applies only the synced data?

This could, for example, be solved using vector clocks, so the client can tell the backend its last state and the backend can send only the changes.
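
As a sketch of that idea (names and data shapes are mine, not an Isar API): the client sends its vector clock, i.e. the highest sequence number it has seen per site, and the backend replies with only the entries beyond it:

```python
def changes_since(server_log, client_clock):
    """server_log: list of (site, seq, change) entries in any order.
    client_clock: {site: highest seq the client has already seen}.
    Returns only the changes the client is missing."""
    return [(site, seq, change)
            for site, seq, change in server_log
            if seq > client_clock.get(site, 0)]

log = [("A", 1, "x"), ("A", 2, "y"), ("B", 1, "z")]
# Client has seen everything from A up to seq 1 and nothing from B:
assert changes_since(log, {"A": 1}) == [("A", 2, "y"), ("B", 1, "z")]
```

A site absent from the clock defaults to 0, so a brand-new client simply receives the full log.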

Should the sync engine delete from client the cached data after some time that is not queried

I don't think this should be part of the implementation but rather be handled by the user. Maybe we can add the option for a TTL for entries?

Should the merge action be done in the server instead that on the client?

With strong eventual consistency, there should be no need for a server that resolves the conflicts. Conflicts will be resolved on a property level. An interesting question is how to guarantee inter-property consistency.

What about the data that is not useful for the clients?

I suggest some kind of pub sub functionality. This would also allow easy access control.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

@leisim I agree that pub-sub is a cool (and probably necessary) addition here. I'd suggest another term: selective sync.

My reasoning is that pub-sub has stream-processing implications, which many people have tried (and failed) to implement efficiently on top of relational databases. We aren't really providing data streams to subscribers; instead, we are allowing CRDT merging on some attributes instead of the whole table. Thus, a client isn't really "subscribing" to anything, because the client still has to initiate one-off synchronization events.

Obviously, this is kind of semantic, but it might be important for future documentation purposes.


simc avatar simc commented on June 2, 2024

I'd suggest another term: selective sync.

Could you please briefly explain what you mean by this? Are there resources on this topic?


AKushWarrior avatar AKushWarrior commented on June 2, 2024

I'd suggest another term: selective sync.

Could you please briefly explain what you mean by this? Are there resources on this topic?

@leisim I totally made the term up, as an alternative to pub-sub. In my mind, what this looks like is:

  • The user passes a where clause.
  • The where clause is run on both the foreign and local database; the results of both are merged using CRDT.
  • The merge is then written over the corresponding rows in the local database (which have been temporarily earmarked by the original where clause).

This is a pretty simple approach and will probably fail in some more complex cases (relationships). It's also inefficient if we have a highly columnar database (where individual rows contain a lot of data).
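
A toy version of those three steps, reduced to row-level last-writer-wins instead of a real CRDT merge (the row shape, ids, and `updated_at` field are assumptions for illustration only):

```python
def selective_sync(local_rows, remote_rows, predicate):
    """Merge only the remote rows matching `predicate` into the local set,
    resolving per row by a last-writer-wins `updated_at` field.
    Both inputs are {row_id: row_dict}."""
    merged = dict(local_rows)
    for rid, row in remote_rows.items():
        if not predicate(row):
            continue  # outside the where clause: leave local data untouched
        mine = merged.get(rid)
        if mine is None or row["updated_at"] > mine["updated_at"]:
            merged[rid] = row
    return merged

local = {1: {"id": 1, "name": "old", "updated_at": 1}}
remote = {1: {"id": 1, "name": "new", "updated_at": 2},
          2: {"id": 2, "name": "other", "updated_at": 5}}
merged = selective_sync(local, remote, lambda r: r["id"] == 1)
assert merged[1]["name"] == "new" and 2 not in merged
```

Only the rows inside the predicate participate in the merge, which is exactly what keeps the 10-million-row table from travelling over the network.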

I don't know if there's literature on the topic; my argument was more semantic in that we should call whatever we do selective sync instead of pub-sub.


medmedmedic avatar medmedmedic commented on June 2, 2024

Hi, this is not the place for such a comment, but I guess you can delete this after reading it as unrelated. I am a medical teacher and a newbie to programming who started with Flutter. Excited to see the solution to some of my problems.

You are solving a genuine problem. So far I have not found a query-efficient database in Flutter. Medical students' use case is searching through a large textual database, and even with embedded indexing it sometimes takes too long to search through the medical database.

And it looks like you are also solving another problem, "connectivity", which is a big issue in our college. Slide sync, especially tools like sli.do, doesn't work well, maybe because of the dependency on a central server. P2P might be the solution, I hope.

Also, P2P sync helps much more, opening new scenarios for peer gamification of study content.

I think you are solving a genuine problem. Do you have a todo list page, so all actionable pieces are in one place and someone from outside can recognize the need at a point in time and try to be a part of it?

I wish I could contribute to this project. I will try my best; I hope with my non-technical background I can do something at least.


AKushWarrior avatar AKushWarrior commented on June 2, 2024

Hi, this is not the place for such a comment, but I guess you can delete this after reading it as unrelated. I am a medical teacher and a newbie to programming who started with Flutter. Excited to see the solution to some of my problems.

You are solving a genuine problem. So far I have not found a query-efficient database in Flutter. Medical students' use case is searching through a large textual database, and even with embedded indexing it sometimes takes too long to search through the medical database.

And it looks like you are also solving another problem, "connectivity", which is a big issue in our college. Slide sync, especially tools like sli.do, doesn't work well, maybe because of the dependency on a central server. P2P might be the solution, I hope.

Also, P2P sync helps much more, opening new scenarios for peer gamification of study content.

I think you are solving a genuine problem. Do you have a todo list page, so all actionable pieces are in one place and someone from outside can recognize the need at a point in time and try to be a part of it?

I wish I could contribute to this project. I will try my best; I hope with my non-technical background I can do something at least.

Thanks for the kind words! Until the underlying P2P architecture is solved, the database itself can't be written, so this project is on hold until a plan is decided regarding this issue. There is a TODO.md in this project, which you can look at.


listepo avatar listepo commented on June 2, 2024

@leisim I think having abstractions for synchronization would be wonderful: gRPC, GraphQL, WebSockets, Swagger, and more. That leaves the choice on the developer side.


simc avatar simc commented on June 2, 2024

@listepo I agree 100% but it's hard to find an API that's flexible enough and not too complicated. Do you have a suggestion?


listepo avatar listepo commented on June 2, 2024

@leisim I have a couple of ideas to think about, I hope there will be something to discuss


listepo avatar listepo commented on June 2, 2024

@leisim I would like to have isar relationships before going into sync detail.


listepo avatar listepo commented on June 2, 2024

@leisim what do you think about https://nozbe.github.io/WatermelonDB/Advanced/Sync.html ?


simc avatar simc commented on June 2, 2024

@listepo Sorry I must've missed your previous posts.

I have a couple of ideas to think about, I hope there will be something to discuss

Great let's definitely do that. I need input and ideas because I want to make sync as universal as possible.

I would like to have isar relationships before going into sync detail.

Done :) IsarLinks are implemented

what do you think about https://nozbe.github.io/WatermelonDB/Advanced/Sync.html ?

That looks very interesting and I like the fact that developers can implement the actual sync themselves. We could provide examples for common backends. The only issue I see here is that conflict resolution is hard.


listepo avatar listepo commented on June 2, 2024

@leisim we can detect conflicts and provide an opportunity to resolve them via a resolveConflicts method: accept a manually resolved result, or the current or incoming changes, or abort, just like merge conflicts :)


listepo avatar listepo commented on June 2, 2024

@leisim what do you think?


simc avatar simc commented on June 2, 2024

@listepo Could you maybe share a small code sample how that would work?


listepo avatar listepo commented on June 2, 2024

@leisim yes, I will try to find time this weekend


listepo avatar listepo commented on June 2, 2024

@leisim

// Part of Isar sync
class Resolver<T> {
  List<T> current;
  List<T> incoming;

  void acceptCurrent() {
    // Force push the current (local) changes to the backend
  }

  void acceptIncoming() {
    // Force push the incoming (remote) changes to the db
  }

  void accept(List<T> data) {
    // For manual resolution:
    // force push the given changes to both the db and the backend
  }

  void abort() {
    // Abort synchronization
  }
}

// Implemented by the user and called when conflicts are detected
void resolveConflicts(Resolver resolver) {
  resolver.acceptIncoming();
  // or show a UI for the user
}


listepo avatar listepo commented on June 2, 2024

Link it with models:

syncAdapter.addResolver([User, City], resolveConflicts);


omidraha avatar omidraha commented on June 2, 2024

Do you know what method Telegram uses for syncing data between clients and the cloud?
That knowledge may be useful.

Related package:
https://github.com/mobync/flutter-client


AndryHTC avatar AndryHTC commented on June 2, 2024

https://jepsen.io/consistency/models/causal
https://doc.replicache.dev/design

Replicache handles it in a server-client environment. I know you're trying to achieve peer-to-peer bidirectional sync, but having a look at the implementation (and at the APIs) could bring new ideas for how to handle this complexity ;)


bgervan avatar bgervan commented on June 2, 2024

Btw, would class inheritance work as in the above example?


bgervan avatar bgervan commented on June 2, 2024

@leisim Is there any workaround for class inheritance? Is there a plan for creating a BaseCollection decorator? Currently, an abstract class's fields do not appear in the generated file. Links could be a workaround, maybe, but that seems impractical.


AndryHTC avatar AndryHTC commented on June 2, 2024

Is this still a WIP?


richard457 avatar richard457 commented on June 2, 2024

I wanted to add my comments. I like this initiative. A few things: I wish to see a way of authenticating the data that lands on a user's device, something like https://docs.couchbase.com/sync-gateway/current/channels.html, kept in mind from the start. Also, it would be nice for the sync to be open-source too!


richard457 avatar richard457 commented on June 2, 2024

I wanted to add my comments. I like this initiative. A few things: I wish to see a way of authenticating the data that lands on a user's device, something like https://docs.couchbase.com/sync-gateway/current/channels.html, kept in mind from the start. Also, it would be nice for the sync to be open-source too!

That is to say, specific data will be pulled by specific users, not all users, etc...


richard457 avatar richard457 commented on June 2, 2024

Check this https://condensation.io/

Also is open source https://github.com/CondensationDS/Condensation


richard457 avatar richard457 commented on June 2, 2024

I like that it has peer-to-peer synchronisation, which I believe will add more value to Isar than the competitors offer.


jamieastley avatar jamieastley commented on June 2, 2024

There are multiple proposals to solve this problem but it will take years until these solutions are implemented in the majority of browsers. Especially with Apple taking forever to implement new WASM features in Safari.

Just noticed that Isar fails to load when opening my Flutter Web site in Safari with
Can't find variable: BroadcastChannel

... before opening a new issue: is this expected and related to the lacking WASM features mentioned above?

from isar.

richard457 avatar richard457 commented on June 2, 2024

Also, I think the relay Server can be built using https://condensation.io/

from isar.

richard457 avatar richard457 commented on June 2, 2024

https://wiki.nikitavoloboev.xyz/distributed-systems/crdt

https://github.com/alangibson/awesome-crdt

More information about CRDT.

I am really waiting for this; reading stories like https://www.figma.com/blog/rust-in-production-at-figma/ makes me believe that this will be dope.

from isar.

simc avatar simc commented on June 2, 2024

@ojm-it I also like it very much. The big question is how we handle schema changes. Do you have an idea?

For example, how should we handle remote changes that contain a field the local schema does not have yet πŸ€”
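One common answer to that question, sketched below in TypeScript: decode the columns the local schema knows about, and keep the unknown ones in a raw side-bag so they survive a round trip back to the server. All names here (`decodeTolerant`, `Decoded`) are hypothetical; Isar has no such API today.

```typescript
// Tolerant decoding sketch: split an incoming remote record into the
// fields the local schema knows and the ones it does not know yet.
// The unknown fields are preserved so they can be re-sent on push
// instead of being silently dropped.
interface Decoded {
  known: Record<string, unknown>;
  unknown: Record<string, unknown>; // preserved for the next push
}

function decodeTolerant(
  remote: Record<string, unknown>,
  schema: Set<string>, // column names the local schema version knows
): Decoded {
  const known: Record<string, unknown> = {};
  const unknown: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(remote)) {
    (schema.has(key) ? known : unknown)[key] = value;
  }
  return { known, unknown };
}
```

With this shape, an old client receiving a record with a new column keeps that column opaque instead of losing it, which makes rolling schema upgrades safer.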

from isar.

CodingSoot avatar CodingSoot commented on June 2, 2024

@ojm-it I'm not fully caught up with this entire thread, but the original comment talks about peer to peer connections being possible. What happens in this case?

From the "General design" section:

  • master/replica - server is the source of truth, client has a full copy and syncs back to server (no peer-to-peer syncs)

  • two phase sync: first pull remote changes to local app, then push local changes to server

  • client resolves conflicts

  • content-based, not time-based conflict resolution

  • conflicts are resolved using per-column client-wins strategy: in conflict, server version is taken except for any column that was changed locally since last sync.

  • local app tracks its changes using a _status (synced/created/updated/deleted) field and _changes field (which specifies columns changed since last sync)

  • server only tracks timestamps (or version numbers) of every record, not specific changes

  • sync is performed for the entire database at once, not per-collection

  • eventual consistency (client and server are consistent at the moment of successful pull if no local changes need to be pushed)

  • non-blocking: local database writes (but not reads) are only momentarily locked when writing data but user can safely make new changes throughout the process

from isar.

CodingSoot avatar CodingSoot commented on June 2, 2024

@richard457 The WatermelonDB sync implementation relies on the fact that there is a single source of truth, which is the server. This greatly simplifies things like conflict resolution, especially when different clients can have different versions of the database.

I have never used WatermelonDB before (I don't know React), but I dug into the implementation details of their syncing system, and it seems that Isar has everything needed to implement a similar system.

Peer-to-peer is a completely different sync mechanism with a different set of challenges, so I guess it could be implemented separately.

Edit: typo

from isar.

richard457 avatar richard457 commented on June 2, 2024

Drift

Can you elaborate more on Drift?

from isar.

richard457 avatar richard457 commented on June 2, 2024

I personally think that even though peer-to-peer is complex, it could give us an edge if it is something we start with. In that case we could beat both https://www.couchbase.com/ and https://objectbox.io/, so I don't see this as just a future feature; I see it as something that will give this database a competitive advantage.

from isar.

bgervan avatar bgervan commented on June 2, 2024

I personally think that even though peer-to-peer is complex, it could give us an edge if it is something we start with. In that case we could beat both https://www.couchbase.com/ and https://objectbox.io/, so I don't see this as just a future feature; I see it as something that will give this database a competitive advantage.

ObjectBox's sync is closed; I didn't get access in a year, maybe because it is not per-user sync (they didn't get back to me). P2P sync is not as important as backend sync, in my opinion, and honestly I'm not even sure the DB needs a feature like this. Sure, it's good to have, but for now the inheritance feature would at least allow plugins for Isar that provide it. CRDT requires extra fields, which currently cannot be defined in a base class.

For both P2P and backend solutions, the design needs to be flexible so everyone can implement the server side in their favorite (or already used) language or framework. So:

  • Inheritance
  • Push and pull functions - to get data to push and save data from pull
  • Migration helper functions
  • Built-in CRDT support - if we have this, inheritance is not mandatory

would be a good starting point; the community can help implement the backend side with REST API/GraphQL/WebSocket syncs. And with some framework, maybe P2P sync is possible too. (You need a backend to connect the two phones to each other anyway.)
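The "push and pull functions" bullet could look roughly like WatermelonDB's two-phase synchronize contract: pull remote changes first, apply them, then push local changes. Every name below is hypothetical, a sketch of the shape rather than a proposed Isar API:

```typescript
// Hypothetical sync adapter shape, loosely modelled on WatermelonDB's
// pullChanges/pushChanges contract. Not an existing Isar API.
interface Changes {
  created: object[];
  updated: object[];
  deleted: string[]; // ids of deleted records
}

interface SyncAdapter {
  // Fetch remote changes since `lastSyncedAt`; returns a new cursor.
  pullChanges(lastSyncedAt: number): Promise<{ changes: Changes; timestamp: number }>;
  // Send local changes accumulated since the last successful pull.
  pushChanges(changes: Changes): Promise<void>;
}

// Two-phase sync: pull first, resolve conflicts locally, then push.
async function sync(
  adapter: SyncAdapter,
  lastSyncedAt: number,
  applyRemote: (c: Changes) => void,  // write remote changes into the local DB
  collectLocal: () => Changes,        // gather local changes since last sync
): Promise<number> {
  const { changes, timestamp } = await adapter.pullChanges(lastSyncedAt);
  applyRemote(changes);
  await adapter.pushChanges(collectLocal());
  return timestamp; // persist this as the new sync cursor
}
```

Because the adapter is just two functions, the backend behind them can be REST, GraphQL, or a WebSocket stream, which is exactly the flexibility argued for above.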

from isar.

simc avatar simc commented on June 2, 2024

but for now, the inheritance feature would at least allow to come up plugins for Isar which does that

What do you mean by "inheritance feature"? Isar already supports inheritance of collections

Push and pull functions

Do you mean like watermelondb?

Migration helper functions

What exactly are you missing?

Built-in CRDT support

How do you envision this to work? What should Isar provide?

from isar.

richard457 avatar richard457 commented on June 2, 2024

I didn't know it already supports inheritance. It didn't a year ago. Let me check.

If Isar aims to support sync, then these are the bricks that need to be implemented first.

Yes, similar what watermelon or couchbase does.

By CRDT support, I mean the object could have built-in fields for time sync, and on the Flutter side they could be updated as needed. If we save an object, the save could update those fields as the definition requires. The fields I mean are: object ID and a hybrid logical clock (timestamp, device ID, counter). But of course, with inheritance we can define a base class with these fields and easily add save overrides and push/pull functions. It all depends on your vision of how built-in the support should be.
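The hybrid logical clock fields mentioned above (timestamp, device ID, counter) can be sketched as follows. `send` applies the standard HLC update rule for a local event, and `compare` gives the total order used to pick a winner; this is a generic sketch, not Isar code:

```typescript
// Minimal hybrid logical clock sketch: wall-clock millis, a logical
// counter for events within the same millisecond (or when the wall
// clock stalls/goes backwards), and a device id as a tie-breaker.
interface Hlc {
  millis: number;
  counter: number;
  deviceId: string;
}

// Advance the clock for a local event (e.g. just before saving an object).
function send(clock: Hlc, wallClockMillis: number): Hlc {
  if (wallClockMillis > clock.millis) {
    // physical time moved forward: adopt it and reset the counter
    return { millis: wallClockMillis, counter: 0, deviceId: clock.deviceId };
  }
  // physical clock stalled or went backwards: keep logical time, bump counter
  return { millis: clock.millis, counter: clock.counter + 1, deviceId: clock.deviceId };
}

// Total order: timestamp first, then counter, then device id.
function compare(a: Hlc, b: Hlc): number {
  if (a.millis !== b.millis) return a.millis - b.millis;
  if (a.counter !== b.counter) return a.counter - b.counter;
  return a.deviceId < b.deviceId ? -1 : a.deviceId > b.deviceId ? 1 : 0;
}
```

The device-id tie-breaker is what makes the order total: two devices that write in the same millisecond with the same counter still agree on a single winner.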

On the backend side, there are many ways it can be done. One possible way is REST APIs with WebSocket support to listen for changes in real time. If you have plans to implement the sync in Isar, I can create a backend in Django with socket support. The backend DB could store the objects as JSON in Postgres, with the above fields stored separately for efficient filtering.

Could this enable peer-to-peer sync, @bgervan?

from isar.

ThalesVonMilet avatar ThalesVonMilet commented on June 2, 2024

Hi, so I've read the discussion and, to be honest, I didn't understand everything, but a lot of it sounded like over-engineering to me. I personally think that the way WatermelonDB manages it is completely fine and easy to implement. If you agree with me on that: I'm working on offline support for my Appwrite backend, and it works, but help would be much appreciated, so if somebody is interested please join :)
https://github.com/ThalesVonMilet/offline_support_for_db

from isar.

ThalesVonMilet avatar ThalesVonMilet commented on June 2, 2024

Hi, so I've read the discussion and, to be honest, I didn't understand everything, but a lot of it sounded like over-engineering to me. I personally think that the way WatermelonDB manages it is completely fine and easy to implement. If you agree with me on that: I'm working on offline support for my Appwrite backend, and it works, but help would be much appreciated, so if somebody is interested please join :)
https://github.com/ThalesVonMilet/offline_support_for_db

I agree there are some very complex solutions here.

I think for most use cases they are too complex. But I also think your proposal is too simple. If I understand correctly, you are using a flag to detect whether a record was changed. For my use case, that is not enough. That's why I proposed logging the transactions: that way you can transmit either the whole record or just the parts that changed. In your solution, modifications on two clients would lead to a conflict even if they changed different parts of the record.

You're absolutely right about the last part. I wanted to solve that by adding a change log; the other way would be to add an extra last-updated field for each field, but I think the change log is much better.
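The change-log idea can be sketched like this: record which field changed and when, then merge logs field by field, so edits to different fields never conflict. All names are illustrative only:

```typescript
// Change-log sketch: each entry says which field of which record
// changed, to what value, and when (e.g. an HLC-derived timestamp).
interface Change {
  recordId: string;
  field: string;
  value: unknown;
  at: number; // logical or wall-clock timestamp
}

// Merge several devices' logs onto a base record: for each field,
// the latest change wins, so disjoint field edits never conflict.
function applyLogs(
  base: Record<string, unknown>,
  ...logs: Change[][]
): Record<string, unknown> {
  const latest = new Map<string, Change>();
  for (const change of logs.flat()) {
    const prev = latest.get(change.field);
    if (!prev || change.at > prev.at) latest.set(change.field, change);
  }
  const out = { ...base };
  for (const [field, change] of latest) out[field] = change.value;
  return out;
}
```

This is exactly why a change log beats a single "dirty" flag: two clients editing different fields of the same record both keep their edits.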

from isar.

jonaswre avatar jonaswre commented on June 2, 2024

Interesting idea.
Depending on your needs, have a look at ObjectBox; that might serve you better.
Or if you are not bound to Flutter, look at GunDB or OrbitDB; maybe they fit a P2P setup better.

from isar.

xVemu avatar xVemu commented on June 2, 2024

Hi, so I've read the discussion and, to be honest, I didn't understand everything, but a lot of it sounded like over-engineering to me. I personally think that the way WatermelonDB manages it is completely fine and easy to implement. If you agree with me on that: I'm working on offline support for my Appwrite backend, and it works, but help would be much appreciated, so if somebody is interested please join :) https://github.com/ThalesVonMilet/offline_support_for_db

What happened to the repo?

from isar.

richard457 avatar richard457 commented on June 2, 2024

This can also help, @simc: https://pocketbase.io/

from isar.
