replicache-old's Issues

upstream sync / client followups

  • 1. fix sync return value rocicorp/replicache-client#87 (review)
  • 2. add replay to the sync_integration_test
  • 3. limit batch push payloads to 4MB: we should not accept a mutation on the client that is > 4MB, and we should stop adding pending mutations to the push payload and send it once it reaches 4MB. we also need this in the docs. (see the sketch after this list)
  • 4. In rocicorp/replicache#30 (comment) i proposed some invariants that we should check during sync to reduce the chances that we have a bug / data corruption. i had planned to do that here https://github.com/rocicorp/replicache-client/blob/8defd6d1ad365de4de3e34b2c5bed2c014ff3408/db/sync.go#L130
  • 5. fix sandbox auth, which is hardcoded
  • 6. per discussion here rocicorp/replicache#8, separate the role of clientid into two separate things
  • 7. add timeouts to all outgoing network connections. super important! the http client default is no timeout. (this is one of those things that, had we chosen to lock during sync, could have killed us.) (see the sketch after this list)
  • 8. take a pass through the design doc and update anything that has changed
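to make items 3 and 7 concrete, here's a minimal Go sketch. PendingMutation, pushURL, and the batching helper are hypothetical names, not existing client code; the http.Client behavior (no timeout by default) is real.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

const maxPushBytes = 4 << 20 // the 4MB cap from item 3

// PendingMutation is a hypothetical stand-in for a queued local mutation.
type PendingMutation struct {
	ID   uint64          `json:"id"`
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// Item 7: never use the zero-value http.Client; its default is no timeout.
var httpClient = &http.Client{Timeout: 30 * time.Second}

// pushInBatches sends pending mutations in payloads of at most maxPushBytes.
// (Mutations > 4MB should already have been rejected at creation time.)
func pushInBatches(pushURL string, pending []PendingMutation) error {
	var batch []PendingMutation
	size := 0
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		body, err := json.Marshal(batch)
		if err != nil {
			return err
		}
		resp, err := httpClient.Post(pushURL, "application/json", bytes.NewReader(body))
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("push failed: %s", resp.Status)
		}
		batch, size = nil, 0
		return nil
	}
	for _, m := range pending {
		n := len(m.Args) // rough proxy for the mutation's wire size
		if size > 0 && size+n >= maxPushBytes {
			if err := flush(); err != nil {
				return err
			}
		}
		batch = append(batch, m)
		size += n
	}
	return flush()
}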

mutate() - write transactions

This bug builds on #27 to add mutations.

Overview

var rep = Replicache(...);

var createTodo = rep.register('create-todo',
  (WriteTransaction tx, dynamic args) async {
    var id = args['id'];
    var listId = args['listId'];
    var text = args['text'];
    var order = args['order'];
    var complete = args['complete'];
    // Create the list lazily if this is the first todo to reference it.
    var listKey = '/list/$listId';
    if (!await tx.has(listKey)) {
      await tx.put(listKey, {
        'title': 'Untitled List',
      });
    }
    await tx.put('/todo/$id', {'text': text, 'order': order, 'complete': complete});
  },
);

// Modifies the cache locally and queues the remote write for next sync.
await createTodo({
  'id': Uuid().v4(), // from package:uuid
  'listId': 42,
  'text': 'Take out the trash',
  'order': 0.5,
  'complete': false,
});

Concurrency

The local half of a mutation executes against the cache using optimistic locking of the entire cache, provided by the underlying Noms database. If the optimistic lock fails, the mutation is retried internally until the lock succeeds.

There should be some limit to the number of times a transaction will be retried, to prevent degenerate infinite-loop cases. Maybe 10? (Sketched below.)
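A minimal Go sketch of that bounded retry, assuming a hypothetical errOptimisticLockFailed returned by the underlying commit when the Noms head moves during execution:

package main

import (
	"errors"
	"fmt"
)

// errOptimisticLockFailed stands in for whatever error the underlying
// commit returns when the head moved while the transaction executed.
var errOptimisticLockFailed = errors.New("optimistic lock failed")

const maxTxRetries = 10 // the cap proposed above

// runWithRetry re-executes fn, which runs the mutation against the current
// head and commits it, until it succeeds, fails for a different reason, or
// exhausts the retry budget.
func runWithRetry(fn func() error) error {
	for i := 0; i < maxTxRetries; i++ {
		err := fn()
		if !errors.Is(err, errOptimisticLockFailed) {
			return err // nil on success, or a non-retryable error
		}
		// The head moved underneath us; loop and re-execute on the new head.
	}
	return fmt.Errorf("giving up after %d optimistic lock failures", maxTxRetries)
}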

Details

  • The result of executing a mutation is a new pending transaction in the underlying Noms database that records the mutation name and args
  • During sync, Replicache may re-execute registered transactions by name, including transactions originally executed in a previous app session. Users must therefore ensure that transactions are registered under the same names across application sessions.

Publicly Available Product v0

This bug describes the big picture and motivation behind Milestone 1.

(It's here because Milestones can't have rich text, grr)

I believe that our next major product milestone should be a public announcement of a working product that anyone can download and try.

Why is a public announcement important? The majority of attention from our marketing announcement came through social media (by far), and the vast majority of the sign-ups were people we didn't have any connection to personally. So individually reaching out to contacts to try Replicache is naturally extremely limited, especially given that the effort to try Replicache is relatively high and the probability of turning any individual contact into a sale is low. We need broad reach.

What is required for a public announcement? It's foggy, but it should intuitively be "a complete product". Or using the popular metaphor, it should be a scooter:

The Skateboard Philosophy of Product Design

What is missing for Replicache to be a scooter? This is a bit subjective, but to me, it feels like:

  • Accounts, so anyone can try it by themselves, in production, without having to set up infrastructure
  • License and pricing, so people know what they're getting into
  • Simple client-side GC, so Replicache is deployable
  • Bidirectional sync (write transactions and push), because that's the fundamental promise of Replicache

At that point, Flutter mobile app developers could try Replicache by themselves to get simple bidirectional offline sync. There would still be many, many things left to get to a car:

  • Push, to get responsiveness up higher
  • Delta support for client view
  • Larger payloads for client view
  • Bindings for other environments
  • Dramatically smaller client side binary
  • Indexes in the client
  • ... etc ...

The milestone associated with this bug will include these tasks and any others we identify as prerequisites for a public announcement.

Refresh the Replicache homepage

  • We no longer have bundles
  • CTA -> Github (and then Github can point people to the typeform as a fallback)
  • Consider emphasizing realtime more

Ensure we round-trip empty arrays correctly

@arv says he read about a bug/known-issue/by-design quirk of Golang where empty arrays don't round-trip through serialization (a nil slice serializes to null). This seems crazy to me, but if so we need to be super careful to return user data faithfully to the application. Translating an empty array to null would be infuriating as a user.
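For reference, the underlying Go behavior is easy to demonstrate with encoding/json: a nil slice marshals as null, while an empty non-nil slice marshals as []. Any user data that passes through a zero-valued slice field therefore comes back as null:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var nilSlice []int    // the zero value for a slice
	emptySlice := []int{} // empty, but non-nil

	a, _ := json.Marshal(nilSlice)
	b, _ := json.Marshal(emptySlice)
	fmt.Println(string(a)) // null  <- the round-trip hazard
	fmt.Println(string(b)) // []

	// Unmarshaling "null" leaves the slice nil, so the null propagates.
	var c []int
	_ = json.Unmarshal([]byte("null"), &c)
	fmt.Println(c == nil) // true
}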

Flesh out login in the sample app ever so slightly more

Right now the userID is hard-coded into the sample app. Let's push this one step further to completion. After this it would actually be possible to use the Todo app by multiple humans on phones.

  • On first-run the TODO app should ask the user for their email address. There will be no verification of the email address at all.
  • Use the email to login to the data layer. This returns a user ID.
  • The user ID itself will be sent as the auth token to the data layer.
  • The user ID and email address of the current user will be stored persistently in the client so that when it starts up offline it still knows it.
  • It will be possible to "log out" of the client by just clearing the stored email address
  • There is machinery in the Replicache client currently to create multiple named caches. Use this to create one cache per user, so that we can demonstrate switching users offline without redownloading everything.
  • Deal with badAuth

On the backend this implies:

  • Adding email address to User table
  • Adjusting the create-todo handler and a few other pieces

remove obvious/easy crashes in the client

this issue is for the part of #34 that is required for milestone 1: just make a pass through the client and the code it depends on in diffserver, and convert chk/etc to returning errors where possible and sensible. (see the sketch below)
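roughly the shape of the change, as a before/after sketch (DB, Commit, and loadHead are hypothetical stand-ins for real call sites):

// Before: an assert-style check takes the whole process down on bad input.
func getHead(db *DB) Commit {
	c, err := db.loadHead()
	chk.NoError(err) // panics
	return c
}

// After: return the error so the caller can handle or report it.
func getHead(db *DB) (Commit, error) {
	c, err := db.loadHead()
	if err != nil {
		return Commit{}, fmt.Errorf("loading head: %w", err)
	}
	return c, nil
}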

Unit test the Flutter bindings somehow

I can think of a few ways, increasingly ambitious:

  1. Mock out repm and just test the bindings. It seems like the replicache package doesn't rely on any Flutter machinery except MethodChannel so this should be easy.

  2. Include repm but fake MethodChannel and skip the ios/android code. This implies actually having a desktop build of repm, which would be additional but easyish work.

  3. Include repm and the ios/android code. This seems difficult -- we'd have to fake out MethodChannel in the Flutter code, ios, and android. But this would give us the most coverage.

  4. A completely different option could be to get the whole thing working on Flutter desktop. If it uses the same machinery (MethodChannel and friends), maybe that would make the most sense!

Do something about spamminess of log output

This is a problem from even as far back as Turtle trying it out. Right now it's just super hard to follow what's going on in your app because Replicache is syncing so frequently and outputs so much.

But simply turning down the sync rate makes the experience of trying it out unsatisfying, because you have to wait so long for changes to show up in the app.

In the long run, the obvious thing to do is push a notification to the device (or even a payload), but that's a much bigger discussion. Is there something we can do to make this less off-putting?

It's challenging because every time I turn this down I end up wanting the logs when debugging something.

I guess the right thing to do is probably to have a logVerbosely setting on Replicache. When it's false, we only print errors (and note that we shouldn't consider network problems to be errors).

See also #35

Read-only transactions: query() and subscribe()

This bug supersedes #6.

Query

Query lets you execute read-only "transactions" against the Replicache local cache.

They are "transactions" only in the sense that they will see a consistent snapshot of the cache while they are executing. The cache won't change state between reads within the same transaction function.

var result = await rep.query((ReadTransaction tx) async {
  var res = [];
  for (var item in await tx.scan("/idx/todo-by-owner/$ownerID")) {
    res.add(await tx.get(item.value));
  }
  return res;
});

print(result);

Concurrency

Multiple Replicache read transactions can execute concurrently against the same local cache. Each transaction executes in the context of a particular version of the cache, regardless of whether the cache changes while the transaction is running.

Re-entrancy

Ideally, nested transactions of any sort would be disallowed. However, due to Future and async/await, this is quite complicated to test for.

In Dart, I think we can use Zones to enforce this (by keeping a per-zone bit of state that we're in a transaction).

If that doesn't work, and we can't figure out how to catch this any other way, the retry catcher on mutate will catch the worst case: a nested, awaited mutate. See https://dart.dev/articles/archive/zones.

Transaction Parameter

The param received by the query function has the read-only API of the Replicache database on it.

Subscribe

subscribe() is just like query() except that it returns a Dart Stream that automatically fires an event whenever the query result changes.

var subscription = rep.subscribe((tx) async {
  var res = [];
  for (var item in await tx.scan("/idx/todo-by-owner/$ownerID")) {
    res.add(await tx.get(item.value));
  }
  return res;
});

subscription.listen((result) {
  setState(() {
    _result = result;
  });
});

Details

  • Replicache always generates one initial event for the stream with the first result
  • Cancelation is via the normal Stream machinery: https://api.dart.dev/stable/2.7.2/dart-async/StreamSubscription/cancel.html
  • Replicache re-runs the provided transaction function whenever it needs to re-check the query.
  • There is no guarantee that a stream event will be generated for each query re-run.
  • There is no guarantee that each subscription event will differ from the prior event.
  • As a first pass, the implementation can simply re-run subscriptions every time the db changes and use a hash of the result to know whether to generate a new event (sketched below). In the future this can get super fancy by tracking read records from the cache.
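A Go sketch of that first pass, with hypothetical names (the real subscription bookkeeping lives in the client/bindings):

package main

import (
	"crypto/sha256"
	"encoding/json"
)

// subscription pairs a query function with the hash of its last result.
type subscription struct {
	query    func() (interface{}, error) // the registered tx function
	emit     func(result interface{})    // delivers an event to the bindings
	lastHash [32]byte
}

// onDBChange re-runs every subscription after a db change and fires an
// event only when the serialized result differs from the previous one.
// (json.Marshal sorts map keys, so the hash is stable.)
func onDBChange(subs []*subscription) {
	for _, s := range subs {
		result, err := s.query()
		if err != nil {
			continue // a real implementation would surface this
		}
		b, err := json.Marshal(result)
		if err != nil {
			continue
		}
		if h := sha256.Sum256(b); h != s.lastHash {
			s.lastHash = h
			s.emit(result)
		}
	}
}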

Prevent reentrancy in Flutter SDK

A transaction should not be able to open another transaction. This can happen if the tx function references a Replicache object.

In a non-async world this could be detected by checking the call stack. In an async world this can be done using Zones. Zones were also ported to JavaScript for Angular (zone.js), but that requires a bunch of monkey patches. See https://blog.strongbrew.io/how-the-hell-do-zones-really-work/

Maybe we can achieve this by passing around enough context to all the transactions?

Auth

The design doc and getting started guides both gloss over the question of authentication.

The current codebase has an auth mechanism, but that authenticates clients to the Replicache server. It says nothing about interaction with the data layer. But our design requires that clients also authenticate to the data layer.

Specifically, our auth-related requirements seem to be:

  1. The data layer must be able to authenticate the user the client is posting on behalf of when the client calls the batch endpoint (just as it normally would when Replicache is not in the picture)
  2. The data layer must be able to authenticate the user the diff server is posting on behalf of, when the diff server calls the client view endpoint.
  3. The diff server must be able to authenticate the Replicache Account the client is posting on behalf of when the client fetches diffs of the client view.
  4. Eventually, the Replicache service will have a concept of users, which will be members of accounts, which have billing details and so-on. The Replicache service will need to authenticate the Replicache user that the CLI or UI is posting on behalf of, and to ensure the user is authorized to act on behalf of particular accounts.
  5. Auth-related work to integrate should be minimized.

Any other related requirements?

consider establishing a canonical communication channel for our users

we will need to be able to easily reach our users for announcements (electron is ready, try this new feature, there was a security issue, you should update) and for soliciting feedback. when we go public we should have a way to funnel users into this channel. slack is not good. maybe twitter or email.

Rationalize kv/db abstractions

Concrete problems:

  • In several places the entire map is read from Noms to compute the initial checksum
  • In the call stacks to store and read data, we go back and forth between Noms/JSON several times
  • kv doesn't support Has or Scan operations that client code needs
  • In the client, db has an editor and so does Map, and they do almost the same thing; the only reason the former is needed is kv's lack of Has support.

Subjective problems:

  • kv presents an abstraction of a persistent kv map that hides implementation details, e.g., Noms. However, Noms is not completely hidden from the caller (db), because db is still responsible for all the committing and other Noms machinery. So it seems to me that the main benefit of kv is not attained in its current form. An abstraction that was more complete would just look like db.
  • Abstracting away the implementation detail of Noms is not a requirement that we have right now. Although we think we might use something other than Noms on the client, (a) in that universe we'd likely not use Go either, obviating the need for the abstraction, and (b) if we did need the abstraction in the future, we could just add it then, when we had more information.

Sample app #1: Todo List

I'd like to take the existing flutter todo list app we have from Replicant and make it into a sample app for the new protocol.

At a high level this is a matter of building a web service that implements a simple authenticated REST/JSON protocol for todo lists. Then once we have that, we will offline-enable it with Replicache.

The service should be easy to run (we'll want to just leave it running for people to play with) and should be a reasonable model of real running services we want to support.

For these reasons I propose that the sample be written in Go (we use it elsewhere; lots of other folks use it), use simple REST/JSON APIs (commonly used today), run on Zeit Now (everything else we have currently runs there; it's super easy), and store data in Amazon RDS/Postgres (our customers will commonly use relational databases).

full launch scratch pad big list

this is a dumping ground for things we might want to do for full launch, kept here so i don't have to keep it on paper. feel free to add your own items if useful.

Major projects

  • web support
  • accounts
  • billing
  • client-side gc
  • support bigger client view responses (eep - get off lambda??)
  • ios support
  • android support
  • various projects to improve performance and cost of server integration
    • patch responses in client view
    • "hints" to client view
  • push (websocket / mobile push)
  • rearchitect the diffserver to better fit the problem

Smaller projects

  • alert on diffserver problems in production
  • stop log spamming (eg #35), stop storing response in noms
  • AWS key thing
  • get a sense of resource growth over time
  • logging cleanup/restructure
  • add timing information to logs

Stuff that has come up in talks w/potential customers:

  • better command line debugging tools
  • built in indexing
  • prioritized downstream sync
  • upping the 4MB limit (could be by pagination or could be by moving infra)
  • realtime

Few issues running the redo sample

  1. Scary looking error on every pull:

[Screenshot: Screen Shot 2020-04-28 at 1 25 04 AM]

     The error above is formatted badly, so the message is not visible.

  2. Crash when adding a new todo:

[Screenshot: Screen Shot 2020-04-28 at 1 25 36 AM]

     The error above is also formatted badly.

Ensure we disallow storing the empty key

Some kv stores (e.g. leveldb) allow this. For our case we would need to make our jspath syntax more complex (to distinguish between removing the root and removing the empty key).

Also there just doesn't seem to be a reason to allow this.
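The guard itself is tiny; a Go sketch of rejecting the empty key at the edge of the kv layer (names hypothetical):

package kv

import "errors"

var errEmptyKey = errors.New("the empty key is not allowed")

// validateKey would be called by put/get/has/del before touching storage,
// so no caller ever has to distinguish "remove the root" from "remove the
// empty key" in jspath syntax.
func validateKey(key string) error {
	if key == "" {
		return errEmptyKey
	}
	return nil
}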

ensure logging is consistent policy- and mechanism-wise

Background:

Problem:

  • when running, especially in production, we want a clear signal that something truly unexpected has gone wrong and a developer should look at the problem. often by convention this signal is an ERROR log line. we don't have a consistent policy or mechanism for making this signal. (note that what we do with an ERROR can be context dependent: maybe in a server in production we log it and have an alert notify us, while in development of the client we panic as a way to require that the issue not be ignored.)
  • similarly, since we haven't had the need for conventions around logging, there is no way to tune how spammy the logs are or what kind of log information is desired beyond the noms verbose setting, or even to know what is or is not appropriate to log. some conventions would be helpful here, as well as mechanisms. for example we may want to log verbosely in the client during development but only log errors in production so we don't fill up the user's phone. or we might want logs to normally be quiet to save money, but be able to enable spewing debugging information while we're tracking down a problem.
  • noms as a library takes an opinion about how logging should happen: it says logging should use golang's log package, and callers can't change that. if a consumer of noms wants to log differently, say using a more featureful logger, a structured logger, or adding some additional context to log lines, it can't, because there is no mechanism. concretely, if replicache wanted to annotate log lines with a requestid, it couldn't do that to noms lines; they're inaccessible to the caller. (libraries often solve this problem by defining the log interface they expect and enabling callers to pass in a thingy that matches.)

I think it's also important to look down the road at problems that are likely coming soon on the server side:

  • when using a cloud logging service like we must with zeit...
    • these services often work better if our server emits structured logs (eg, json). this makes searching the logs easier because the logging service doesn't have to be taught how to parse log lines and it can infer data types. for example a structured json log field emitted as an integer is typically immediately available for range queries without having to tell the log service to parse out /lastMutationId: (\d+)/ and treat $1 as an Integer. (note that IF we had structured logging this doesn't mean we would HAVE to use structured logs everywhere, eg when developing locally; it's easy enough to change log formats depending on environment)
    • we are likely to want to include meta information in log lines that enables us to aggregate across requests (eg the accountid or path requested), or to correlate across process boundaries (eg a requestid enabling us to locate the server log lines for a particular client request). it is annoying and error prone to include this information manually in each log line, so often a context logger is created for the duration of a request that automatically annotates log lines with it.
  • we will likely want to start tracking timing of operations and suboperations. this is often easiest to roll into something like a context logger, though there are many strategies.
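to make the context-logger idea concrete, a hypothetical Go sketch of a per-request logger that emits structured json with the request-scoped fields attached:

package main

import (
	"encoding/json"
	"log"
)

// ctxLogger annotates every line with request-scoped fields (eg requestid,
// accountid) so the logging service can aggregate and correlate them.
type ctxLogger struct {
	fields map[string]interface{}
}

func (l *ctxLogger) Info(msg string, kv map[string]interface{}) {
	line := map[string]interface{}{"level": "INFO", "msg": msg}
	for k, v := range l.fields {
		line[k] = v
	}
	for k, v := range kv {
		line[k] = v
	}
	b, _ := json.Marshal(line)
	log.Println(string(b))
}

func main() {
	// One logger per request, created when the request arrives.
	l := &ctxLogger{fields: map[string]interface{}{"requestid": "r123", "accountid": 42}}
	l.Info("pull complete", map[string]interface{}{"lastMutationId": 7})
	// logs a line whose json payload is:
	// {"accountid":42,"lastMutationId":7,"level":"INFO","msg":"pull complete","requestid":"r123"}
}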

High-level Proposal

Let's create a logging policy and just a little more structure to give us a place to add/tweak logging behavior that we know is coming. Let's do this in a way that:

  • is as simple as possible and easy to use
  • is easily satisfiable by the plain vanilla golang 'log'
  • introduces only small changes to the code
  • gives us a place to put logging logic should we need it

Sketch of a Proposal

  1. Establish three log levels: ERROR, INFO, and DEBUG.

Having the ERROR level makes it easy to do something with this important signal. Having INFO makes it easy to know what to show by default. Having DEBUG makes it possible to spew lots of information when you need it. (We should probably log at DEBUG level by default until things are stable.)

There are obviously some grey areas but I'm confident we can work them out.

  2. Expand https://github.com/rocicorp/diff-server/blob/master/util/log/log.go to have three new public functions that our code uses to log. These functions have the signature of Printf and are thus easily satisfiable by golang's log.Printf:
func Error(format string, v ...interface{}) {...}
func Info(format string, v ...interface{}) {...}
func Debug(format string, v ...interface{}) {...}

For now all they do is write to golang's log.Printf with a "ERROR"/"INFO"/"DEBUG" prefix.
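For illustration, those initial bodies could be one-liners over golang's log:

func Error(format string, v ...interface{}) { log.Printf("ERROR "+format, v...) }
func Info(format string, v ...interface{})  { log.Printf("INFO "+format, v...) }
func Debug(format string, v ...interface{}) { log.Printf("DEBUG "+format, v...) }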

  3. Convert replicache client and diffserver to use rlog.Error/Info/Debug consistent with the policy above.

  4. In order to get consistent log output (eg output that has the appropriate annotations), I think it is probably also worthwhile to convert noms to use these interfaces (and maybe for rlog to live there). If we don't want to change the output of noms for some reason, we can always do this transparently, eg by configuring noms's rlog to write all of error/info/debug directly to log.Printf without a prefix. Consumers of noms should be able to easily pass their own logger in if they want to control what it does (eg if they use a more featureful logging library, it should be easy to wrap it and pass it in for noms to use).

Additional Info

  • at some point we might want a way to get logs from clients in the field
  • this issue is somewhat related to rocicorp/replicache#34

Run tests continuously

We can just use Github for this, but only once the relevant repos are public (i.e., this is dependent upon #19).

Simple GC in the Client

Right now Replicache never deletes any data so it will just fill up flash forever.

The simplest thing we could probably do is make use of the natural pruning Noms performs during sync: sync just the commits we want to keep to a new repo, then switch over to that repo atomically. (Sketched below.)
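A sketch of that approach in the loose pseudocode style used elsewhere in this doc; commitsToKeep, syncCommit, and atomicallySwapRepos are hypothetical, and the exact Noms calls would need checking:

gc(old db) error {
   fresh := newEmptyRepo()
   keep := commitsToKeep(old)       // eg the latest Snapshot plus pending commits
   for each commit c in keep
      syncCommit(old, fresh, c)     // Noms sync copies only chunks reachable from c
   fresh.setHead(old.head())
   // Everything unreachable from the kept commits is left behind in the old repo.
   return atomicallySwapRepos(old, fresh)
}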

replicache-client: make failed http connections less dire

From log output from replicache-client:

2020-05-11 23:05:29.895279-1000 Runner[44693:1863873] [VERBOSE-2:ui_dart_state.cc(157)] Unhandled Exception: Exception: Error invoking "beginSync": PlatformException(UNAVAILABLE, sync failed: Post "http://localhost:7001/pull": dial tcp [::1]:7001: connect: connection refused, null)
#0      Replicache._invoke (package:replicache/replicache.dart:397:7)
<asynchronous suspension>
#1      Replicache._beginSync (package:replicache/replicache.dart:243:35)
<asynchronous suspension>
#2      Replicache._sync (package:replicache/replicache.dart:237:28)
<asynchronous suspension>
#3      Replicache.sync (package:replicache/replicache.dart:346:13)
<asynchronous suspension>
#4      _rootRun (dart:async/zone.dart:1122:38)
#5      _CustomZone.run (dart:async/zone.dart:1023:19)
#6      _CustomZone.runGuarded (dart:async/zone.dart:925:7)
#7      _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:965:23)
#8      _rootRun (dart:async/zone.dart:1126:13)
#9      _CustomZone.run (dart:async/zone.dart:1023:19)
#10     _CustomZone.bindCallback.<anonymous closure> (dart:async/zone.dart:949:23)
#11     Timer._createTimer.<anonymous closure> (dart:async-patch/timer_patch.dart:23:15)
#12     _Timer._runTimers (dart:isolate-patch/timer_impl.dart:384:19)
#13     _Timer._handleMessage (dart:isolate-patch/timer_impl.dart:418:5)
#14     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:174:12)

We are offline-first; a failed connection shouldn't look like such a big deal.

Polish docs, samples

I think the last bit of TODO here is:

  • Remove /serve prefix from diff-server (#26 )
  • Rename --clientview flag to --client-view (twitch)
  • The SDK is getting a folder structure with build/out prefixes. Remove those.
  • Ensure the TODO client still works against production after rocicorp/diff-server@74c34e0
  • Change the flutter app to something that shows todo data so it can use the backend part of the tutorial
  • Add something at the end of the two tutorials (the replicache one and the flutter one) tying the two together

Should we require batch endpoint to "synchronously" process all mutations?

Right now, in the client API sketch, we require users to "register" all transaction implementations by name. That is, users have to do:

replicache.register("plus", (tx, key, incr) {
  let val = tx.get(key) || 0;
  val += incr;
  tx.put(key, val);
});
replicache.exec("plus", "foo", 42);

Rather than:

replicache.exec((tx) {
  let val = tx.get("foo") || 0;
  val += 42;
  tx.put("foo", val);
});

Beyond the ergonomic improvement, the former also has a hidden additional complexity: we also require that the set of named transaction implementations be relatively stable over time, even across app restarts, because Replicache might need to replay a transaction that was initially executed in a previous session of the app. Users must ensure that when Replicache goes to find that tx impl, it can find it.

I think the only reason we need these named implementations in the first place is that the sync protocol allows the batch endpoint to accept a transaction without its effects subsequently being visible in the client view. As a result, the client needs to maintain the pending transaction, and be able to replay it, indefinitely, even across restarts.

If instead we required that the effects of a transaction sent to the batch endpoint be visible in the client view as soon as the batch endpoint returned success, then we could use the simpler anonymous API. In that case we would only ever need to replay transactions that were executed while a sync was in progress, which cannot happen while the app is not running.

Also, if we made that change, it seems like clients could optionally avoid implementing the ClientVersion storage map in their backend, so long as they already had some idempotency mechanism that they could use instead.
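For concreteness, a pseudocode sketch (in the style of the sync sketches elsewhere in this doc) of a batch endpoint that processes mutations synchronously and uses a per-user lastMutationID for idempotency; all names are hypothetical:

handleBatch(userID, mutations) error {
   return db.txn(func(tx) error {
      last := tx.getLastMutationID(userID)
      for each mutation m in mutations
         if m.ID <= last
            continue                 // already applied: idempotent replay
         apply(tx, m)                // effects land in the client view...
         last = m.ID
      tx.setLastMutationID(userID, last)
      return nil                     // ...as soon as this txn commits
   })
}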

Finally, it seems like this proposal matches what people must already be doing: right now, if an app makes a write via an API and then immediately makes a read, the backend already has to deal with the fact that the caller will probably expect the change to be reflected. At least, developers already have to think about this and make a choice. It doesn't seem like Replicache is changing anything.

So questions:

  1. Am I getting this right?
  2. Are there other tradeoffs I'm not considering?
  3. Should we do this?

list of noms sharp edges should we want to extract key functionality

feel free to directly edit this comment or add your own separate comment

  • Ref.Hash() frequently used instead of Ref.TargetHash()
  • types should implement Stringer, with a default limit on how much they dump that is high enough that test cases just work
  • canonicalization should be built in
  • zero values should be useful or at least not sharp (eg marshaling a zero ref panics)
  • thread safety should be explicit
  • something about omitempty: i feel like it's frequently a source of mistakes for me, eg rocicorp/replicache-client@08d9200
  • null, zero values should round trip consistently or at least behavior should be clearly articulated
  • nomsdl should maybe be json so less to hold in your head?
  • logging should be tunable
  • function calls that can't fail should not return an error
  • let's not have thingy.thingy eg client db has a dataset, which has a db

Upstream Sync

Status: 4/24/2020: general agreement to proceed

Commit

We update Commits https://github.com/rocicorp/replicache-client/blob/c07cacc2bc1de7564a2e823a833557de70554ebe/db/commit.go#L18 to have names that reflect the current design and to introduce MutationID:

  • Genesis => Snapshot.
  • Tx => Local. The set of Local commits after the latest Snapshot is the set of pending commits.
  • Commit.Original gets renamed Commit.NomsStruct, because the Original name is reused:
    • Reorder goes away and is replaced with an Original field on Local that is present in replayed Local commits and points to their originals, for the purposes of debugging/history
  • Remove lastMutationID from Snapshot and have a MutationID in Value so that every commit has it. When rebasing we copy the MutationID (and name and args) from the original commit being replayed to the new Local commit.
  • Simplify the parent relationship by not including the original (if any) in parents. Every commit now has exactly one parent, though we still have a strong reference to the original via that field.
  • We call the set of local transactions that are children of the latest snapshot on master "pending" transactions: they are the set of mutations that have executed locally that have yet to be confirmed by the server. (Local transactions that were replayed or that were included in a snapshot on master are not pending.)

This scheme is nice because:

  • it's simpler and models rebase directly
  • every commit self-contains all its information
  • selecting the next mutation id is easy: it is one more than the MutationID of the basis.

Sync Overview

Outside of sync, transactions work as they do today (with one small exception, see db.setHead below).

Here's how sync works at a high level. From a git point of view, we are rebasing onto new state from the data layer. There are three phases:

  1. Bindings call BeginSync which pushes mutations to the data layer and pulls the latest client view into a new Snapshot
  2. Bindings call MaybeEndSync which checks if there are pending mutations to replay. If not, it sets head of master to the new state and sync is complete. If so, it returns the set of mutations to replay.
  3. Bindings replay the mutations returned and then call MaybeEndSync again. New mutations might have landed on master while replay was happening, in which case the cycle repeats until all pending mutations have been replayed.

To illustrate, assume master is in the following state: S1 is a snapshot commit and L1-3 are pending local commits.

... - S1 - L1 - L2 - L3  <- head

When BeginSync is called it sends pending commits to the batch endpoint and pulls a new client view. It creates a new dangling Snapshot commit S2 with the new state and sets S1 as its basis. We call the commit in this chain with no children the sync head. Note that it is not an actual branch (dataset) in noms; it's just a name we have (and sometimes a ref we have) for the detached head of this chain. We call S1 the head snapshot and we call S2 the sync snapshot.

... - S1 - L1 - L2 - L3  <- head
         \        
          S2  (sync head)

BeginSync returns and the bindings call MaybeEndSync. Suppose S2 includes L1 but not L2 or L3 (that is, S2.MutationID >= L1.MutationID and S2.MutationID < L2/3.MutationID). MaybeEndSync notices that pending mutations L2 and L3 need to be replayed but have not been, so it returns L2,L3.

The bindings replay L2 and L3 serially. For each mutation to be replayed, the client Transaction needs to know the basis / sync head on top of which to execute the mutation. The original proposal rocicorp/replicache#30 (comment) suggested assigning each sync an id and using it to map to its sync head. However, we can avoid this bookkeeping because the bindings already have almost all the information we need: the new sync head is returned from each commit as commitTransactionResponse.Ref, so all that is missing is the first one. The bindings can therefore supply the sync head to client Transactions as follows:

  • return the initial sync head from BeginSync; bindings pass it to the first opentransaction call
  • in each successive opentransaction bindings pass the ref returned in the previous commitTransactionResponse
  • we pass the ref from the final commitTransactionResponse into MaybeEndSync so it knows where to point the head of master

This chaining strategy is elegant in that it introduces nothing sync-related into the database and in that it enforces serial execution of replayed transactions.

OK, so the bindings replay pending mutations serially, each of which creates a new local commit as a child of the previous sync head. Each of these replayed commits is a Local commit that has its Original set to the original local transaction:

... - S1 --- L1 - L2 -- L3  <- head
         \        ·     ·     
          S2 ---- L2' - L3'  (sync head)

The bindings now call MaybeEndSync again which would like to set head of master to the sync head. However it is possible that when it goes to do this it finds new pending commits have landed on master while replay was happening, in this case L4:

... - S1 --- L1 - L2 -- L3 -- L4  <- head
         \        ·     ·     
          S2 ---- L2' - L3'  (sync head)

The bindings need to replay this newly discovered mutation, so MaybeEndSync doesn't end the sync; it returns the newly pending commit L4 to the bindings for replay (hence the name MaybeEndSync). The bindings replay it:

... - S1 --- L1 - L2 -- L3 -- L4  <- head
         \        ·     ·     ·
          S2 ---- L2' - L3' - L4'  (sync head)

Bindings now call MaybeEndSync again, which discovers no new pending commits on master. It sets master head to the sync head and the sync is complete.

... - S1 --- L1 - L2 -- L3 -- L4 
         \        ·     ·     ·
          S2 ---- L2' - L3' - L4'  <- head

Yay.

Invariants

I would love to identify important constraints around sync and the commit history in general so that we can do debug checks on them. For example, we should probably check that the following are true of the sync head chain before we set it as head of master (see the sketch after this list).

  • No ancestor of a Commit has a greater MutationID than it does. This means at least two things:
    • we will refuse to go back in time by landing a snapshot with a smaller mutation id as a child of one with a larger mutation id
    • all pending commits have mutation id greater than the latest snapshot, as they should
  • A Local commit's MutationID is exactly one greater than that of its basis
  • Immediately after we set head of master to the new sync head one of two
    things must be true of each commit that was pending:
    1. it is no longer pending and it was not replayed because it has MutationID <= the snapshot
    2. it was replayed and its replayed local commit is pending
  • There are exactly zero Local commits among all the ancestors of the most recent Snapshot commit
  • Head parenthood: the sync head must be a child of head of master. A weaker version of this (the sync head must be a descendant of head of master) is enforced by fast forward, which is sufficient because we never have intervening commits in this proposal.
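A pseudocode sketch of those debug checks, walking parent pointers from the sync head back to the sync snapshot (the Commit accessors are hypothetical):

checkSyncInvariants(syncHead Commit) error {
   c := syncHead
   for c.Type() == Local
      basis := c.Basis()
      // A Local commit's MutationID is exactly one greater than its basis's.
      if c.MutationID() != basis.MutationID()+1
         return fmt.Errorf("commit %v: MutationID %d != basis %d + 1",
            c.Ref(), c.MutationID(), basis.MutationID())
      c = basis
   // The chain must bottom out at the sync snapshot, with no Local commits
   // among its ancestors.
   if c.Type() != Snapshot
      return errors.New("sync head chain does not end in a Snapshot")
   return nil
}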

Database

I don't think the above strictly requires many changes to db.DB. However, db.setHead likely needs a variant that can be called with the db lock already held (setHeadLocked in the sketch below).

Overlapping syncs

There was discussion above about overlapping syncs and what to do about them. We accommodate them; there is no shared state between syncs in this proposal. If two syncs are running concurrently, one or the other will land its sync head on master first. The second sync will then refuse to land its changes, because it will notice that a new Snapshot landed on master while it was running. (If things are totally crazy and head was set to a commit that is not an ancestor of the sync head, fast forward will fail.)

Some edge cases to consider to convince yourself of correctness: two syncs with exactly the same snapshot and pending commits; two syncs with different snapshots but the same pending commits; two syncs with different/same snapshots and more pending commits in one than the other.

Sketch of changes and implementations

// Formerly PullRequest https://github.com/rocicorp/replicache-client/blob/c07cacc2bc1de7564a2e823a833557de70554ebe/repm/types.go#L72
type BeginSyncRequest struct {
	DataLayerAuth string // Authorization header for pushing mutations and fetching client view
}
type BeginSyncResponse struct {
	SyncHead *jsnoms.Hash // Passed into MaybeEndSync so it can check if we are done.
	Error    *SyncError
}
// Previously PullResponseError: https://github.com/rocicorp/replicache-client/blob/c07cacc2bc1de7564a2e823a833557de70554ebe/repm/types.go#L77
type SyncError struct {
	BadAuth string // pre-existing
}

type MaybeEndSyncRequest struct {
	SyncHead *jsnoms.Hash // MaybeEndSync will attempt to set this commit as head of master
}
type MaybeEndSyncResponse struct {
	Ended            bool         // If true the sync is complete
	SyncHead         *jsnoms.Hash // Basis on which to play the first mutation
	PendingMutations []PendingMutation
}
type PendingMutation struct {
	Name     string          // Used to replay
	Args     json.RawMessage // Used to replay; is this type right?
	Original *jsnoms.Hash    // Passed into OpenTransaction to set on the new replayed commit
}

// Note: previously empty
type OpenTransactionRequest struct {
	RebaseOpts RebaseOpts
}
type RebaseOpts struct {
	Basis    *jsnoms.Hash // Basis commit on which to play this transaction
	Original *jsnoms.Hash // Original Commit we are replaying. When the current
	                      // transaction commits, the Original is used to set its
	                      // Original, Name, Args, and MutationID.
}
// OpenTransactionResponse stays the same


BeginSync(bsReq) {
   // Push
   head := db.Head()
   pendingMutations := getPendingMutations(head)
   err := http.Send(makeRequest(bsReq.DataLayerAuth, pendingMutations))
   if err != nil
      log(err)

   // Pull
   newServerState, err := db.Pull(bsReq.DataLayerAuth)
   if err != nil
      return err

   // Set up new sync head 
   syncHeadBasis := findMostRecentSnapshotCommit(head)
   syncHead := makeSnapshot(newServerState, basis=syncHeadBasis)
   db.noms.writeValue(syncHead)

   return BeginSyncResponse{SyncHead: syncHead}
}

MaybeEndSync(mesReq) {
   syncHead := db.loadValue(mesReq.syncHead)

   db.wlock()
   defer db.unlock()

   // Stop if someone else has landed a sync since we started.
   if latestSnapshot(db.head) != latestSnapshot(syncHead).Basis
       return MaybeEndSyncResponse{Ended: true, StatusMsg: "Canceled this sync bc another landed"}

   // Determine if there are any pending mutations that we need to replay.
   pendingMutations := getPendingMutations(db.head)
   mutationsToReplay := findMutationsWithIDsGreaterThan(pendingMutations, syncHead.MutationID)
   if len(mutationsToReplay) > 0
      mesResp := MaybeEndSyncResponse{Ended: false, SyncHead: ref(syncHead)}
      mesResp.PendingMutations = mutationsToReplay 
      return mesResp

   // Here check invariants for all commits from syncHead back to sync snapshot.
   ...

   err := db.setHeadLocked(syncHead)
   if err != nil
      return MaybeEndSyncResponse{Ended: true, StatusMsg: fmt("Unexpected error: %w", err)}

   return MaybeEndSyncResponse{Ended: true}
}


Bindings.Sync:
   // BeginSync
   bsResp, err := client.BeginSync(dataLayerAuth)
   if err != nil 
      return err
   basis := bsResp.SyncHead

   // MaybeEndSync
   mesResp := client.MaybeEndSync(basis)
   while !mesResp.Ended
      basis, err = Bindings.Replay(mesResp.SyncHead, mesResp.PendingMutations)
      if err != nil
         return err
      mesResp = client.MaybeEndSync(basis)

Bindings.Replay(basis, pendingMutations) (basis, error) {
   for each pending mutation pm in pendingMutations
      // basis and original below go into OpenTransaction.RebaseOpts
      commitTransactionResponse, err := execute(basis, pm.Name, pm.Args, pm.Original)
      if err != nil
          return nil, err
      basis = commitTransactionResponse.Ref

   return basis, nil
}

Misc details to not forget

  • batch endpoint sketch and impl
  • fix db.head access
  • fix same in diffserver!
  • sketch how we would do this if in production
  • axe execinternal
  • local -> master
  • remove remote
  • respect 4MB limit for push
  • remove old rebase when complete
