Comments (13)
I have run into this problem in the past too. I think you're right that some form of snapshotting is the only way to do this completely safely and efficiently. I'm certainly not opposed to allowing some state to be stored in snapshots.
The alternative is to provide the state machine with a way to commit entries to the Raft log. In this case, state machines can essentially periodically store their own small snapshots as a single commit. For instance, for DistributedAtomicLong, the state machine would periodically commit some tombstone command to the log and then clean any prior commits. This would work fine, but it's not very graceful. Ultimately I'd favor some way to specify how the log is compacted for state machine state.
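The "state machine commits its own snapshot as a log entry" pattern described above can be sketched roughly as follows. This is an illustrative model, not the Copycat API: a counter periodically consolidates its state into a single `set` entry, after which all earlier increments are redundant on replay.

```java
// Hypothetical sketch of consolidating counter state into one log entry.
// None of these names come from Copycat; this just models the idea.
import java.util.ArrayList;
import java.util.List;

class CounterLog {
    // Each entry is either an increment ("inc") or a consolidated value ("set:N").
    final List<String> entries = new ArrayList<>();
    long value = 0;

    void increment() {
        entries.add("inc");
        value++;
    }

    // Periodically called: commit the current value as one entry, then drop
    // every prior entry, since "set:N" alone reproduces the state on replay.
    void consolidate() {
        entries.clear();
        entries.add("set:" + value);
    }

    // Replaying the remaining log must rebuild the same value.
    long replay() {
        long v = 0;
        for (String e : entries) {
            if (e.startsWith("set:")) v = Long.parseLong(e.substring(4));
            else v++;
        }
        return v;
    }
}
```

The point of the sketch is that the consolidated entry plus any later entries is equivalent, on replay, to the full history, which is exactly what makes it safe to clean the prior commits.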
There are some challenges here with respect to session events. Copycat transparently ensures that commands that trigger messages sent to a client will not be removed from the log until received by the client, so that servers can safely restart and replay event messages. In order to ensure the same is true with snapshots, a state machine would have to take a snapshot of its state and wait for the `completeIndex` (which indicates the lowest event index received by all clients) to reach the snapshot index before persisting it; by using log cleaning underneath, this really wouldn't be an issue. But if snapshots and log cleaning were working together, we'd need some way to determine the entries that can be removed as a result of a snapshot being installed (or the state machine could still be responsible for that), but that's not hard.
The most recent PR actually makes some changes that may make it easier to extend to add snapshots as well. It could be pretty trivial to come up with a decent implementation of support for small snapshots this week. I'll try to come up with some ideas. Basically, the new PR already adds a ConfigureRequest to send configurations to other nodes. This could also be used to send the most recent snapshot as well. I may take a stab at that.
BTW DistributedAtomicLong obviously has this problem as well. What I did there was just use optimistic locking to set counter values in commands. That's obviously not ideal, but ultimately that's what Apache Curator has to do for counters in ZooKeeper as well, so I don't feel so bad since at least there's a (not ideal) solution :-)
On Nov 29, 2015, at 5:59 PM, Madan Jampani [email protected] wrote:
If I want to implement a state machine that tracks the number of times it was updated then the current implementation presents some difficulties. We work under the assumption that entries that no longer contribute to state machine evolution need not be replicated. But for operation counting state machines all operations are meaningful. I guess if we annotate the command entry as a tombstone we can clean it up after it is replicated to all instances. But a cluster configuration change to add new members poses a challenge.
I'm opening this issue to get some thoughts on this subject. My initial thought was that for some state machine types snapshotting is the only way to go if we want to garbage collect log entries from time to time.
from copycat.
@madjam I hacked together a commit that shows what this could look like: f6dfb2d
What I did was add a `SnapshotAware` interface for state machines. When that interface is implemented, state machines will periodically take snapshots. A snapshot is held in memory until `lastCompleted` equals the snapshot index to ensure events are received by clients before commands are removed from the log, at which point it's persisted to the `MetaStore` file.
In this rudimentary implementation, snapshots are unopinionated with respect to the commits they affect. It's still up to the state machine to `clean` commits that were replaced by the snapshot. This allows for a highly flexible system that will allow Atomix to use both snapshotting and log cleaning where it makes sense, but it's still not ideal. We could have more marker interfaces to indicate whether a state machine does log cleaning at all.
Alternatively, individual commits could be cleanable/snapshottable based on the state machine interface. A state machine method that takes a `CleanableCommit` parameter rather than a `Commit` can be expected to clean it; otherwise snapshotting is used. There are a bunch of potential ways to go about this. I think I tend to favor something that makes state machine responsibilities obvious to the state machine author. Maybe the `CleanableCommit` approach works best since the behavior is defined by the state machine interface. Also, snapshotting introduces the possibility that command commits can be garbage collected (or actually released back to the commit pool) without being cleaned, for a state machine that cleans all prior commits after a snapshot, but with `CleanableCommit` we can still throw exceptions when commits that should be cleaned aren't cleaned properly.
There are also other ways to go about it, like marking commands as cleanable, or perhaps adding a third option to `Command.PersistenceLevel` (ooh, this actually seems nice) like `Command.PersistenceLevel.SNAPSHOT`. I'm actually not a fan of the way command persistence is done right now, since it's really the state machine logic that determines whether a command should be treated as a tombstone and needs to be retained, and similarly it's the state machine logic that determines whether a commit needs to be cleaned or can be cleaned automatically after a snapshot is taken. But I think maybe commands are close enough to state machines that defining the persistence level there is justified. But I digress.
Thoughts? Does the idea of adding a `PersistenceLevel.SNAPSHOT` seem reasonable? Unfortunately, there's no great way to enforce that snapshots store the state of commands marked with the `SNAPSHOT` persistence level. I suppose forcing all commands to be `clean`ed sort of does that, insofar as state machines must at least retain commands and explicitly clean them when a snapshot is taken, but that feels like unnecessary work for a lot of use cases.
One more note: the commit above is really only designed for small snapshots. I think this is acceptable for the time being, as state machines with larger state like data structures can use log cleaning, and I think the idea of using a mixture of both is pretty intriguing. The commit above sends a snapshot with the `ConfigureRequest` that is sent when the leader begins sending entries to a new follower or a configuration change occurs. The logic here is actually not correct, though: it should be based on `nextIndex` and not `matchIndex`, but you get the idea.
`ConfigureRequest` was added in the `client-server-transports` PR from which this was branched.
TBH I'm a bit confused by the `PersistenceLevel` options currently available. In a couple of state machines I implemented, I did not pay attention to this field at all (maybe I should). In my state machine I know when I should clean up a prior commit, so this additional piece of state seemed a bit superfluous. Maybe I missed something important, or is this what you mean when you say you are not a fan of how command persistence is done right now?
For this reason I'm not very clear on how a state machine developer will use and interpret the proposed `PersistenceLevel.SNAPSHOT`.
Obviously a state machine that can snapshot itself periodically is the simplest to implement (for a state machine developer). All the low-level details of which log entries to clean up and when it is safe to clean them are managed by Copycat. To me this is the ideal state (from a developer perspective). Anything above this means the developer is willing to pay some cost (in terms of implementation complexity) in return for potential performance gain.
Also, I thought each state machine instance that is `SnapshotAware` would take its own snapshot. Is that not the case?
So, `PersistenceLevel` right now is what defines whether a `Command` should be persisted as a normal entry or a tombstone. `PersistenceLevel.EPHEMERAL` indicates that a command can be safely removed from the log once `clean` is called, and `PersistenceLevel.PERSISTENT` indicates that a command should be retained until it's applied on all servers, even if the command was already applied to and `clean`ed by the state machine.
The problem is, state machines have no context wrt the cluster. State machines will presumably always `clean` a commit at the same point regardless of whether it's actually safe for it to be physically removed from disk. So, state machines say when it's safe for a command to be removed from disk from the state machine's perspective (it no longer contributes to the state machine's state), but the `PersistenceLevel` dictates when it's actually safe to be removed from disk by essentially indicating (right now) whether a command needs to be applied on all servers. Without `PersistenceLevel` indicating commands that remove state machine state, the following could happen in a `ValueState` in Atomix:
- Client commits "set 1" to S1, S2 and S3
- S3 is partitioned
- Client commits "remove 1" to S1 and S2
- S1 and S2 state machines both clean "set 1" since it was overwritten and clean "remove 1" since it no longer contributes to their state (the value is nil)
- S1 and S2 both compact their logs, removing "set 1" and "remove 1"
- Partition heals and S3 still has the state resulting from "set 1"
Obviously, as you know, Copycat will retain tombstones in this scenario to ensure they're sent to S3. The `PersistenceLevel` is just a marker to indicate that the `CommandEntry` that holds the `Command` is a tombstone.
There are also other use cases for the `PERSISTENT` persistence level. It can simply be used for any case where a user wants to ensure a command is applied on all servers. State machines can't reliably determine when that's the case, so state machines are responsible only for cleaning commits once they're no longer needed, and Copycat will ensure they're applied on as many or as few servers as is necessary, even after they're cleaned, according to the `PersistenceLevel`.
So, the idea with adding the `SNAPSHOT` persistence level is similar in that it will allow Copycat to handle the removal of command entries from the log without the state machine ever having to call `clean` on any commit. So, in that sense the `SNAPSHOT` persistence level is just a third way of saying when a command can be removed (after a snapshot is taken). If all of a state machine's commands have `PersistenceLevel.SNAPSHOT` (which could be the default), the state machine only has to store and load a snapshot and doesn't have to clean any commits. When a snapshot is taken, Copycat will automatically `clean` all prior `SNAPSHOT` entries from the log. This also enables a state machine that can use a mixture of both.
Another option for state machines is to make the `Commit` implementation indicative of the command type. State machine methods could accept `SnapshottableCommit` or `CleanableCommit` or `TombstoneCommit` as parameters, and state machines could be responsible for handling the commits based on the contract defined by the state machine interface. What that does is place the word `Snapshot` inside the commit class name, perhaps simplifying the understanding that the state machine is responsible for storing snapshots of state resulting from that commit, or that the state machine is responsible for calling `clean` on a `CleanableCommit`. But I still think `PersistenceLevel` is fine, particularly if snapshots are the default.
So, I think my favorite idea is to just make `SNAPSHOT` the default persistence level, ensuring that the persistence level must be explicitly overridden to enable log cleaning inside the state machine.
As to the last question, Atomix state machines are multiplexed and managed by a single core state machine: the `ResourceManager`. The resource manager creates logical sessions for each resource/state machine instance to allow different resource state machines to interact with specific objects on the client side. It will do something similar for snapshots. Copycat will trigger a snapshot of the `ResourceManager` state machine, and the resource manager will compile a snapshot of each of the `SnapshotAware` resource state machines. So, each Atomix state machine can take a snapshot, but they'll be stored in a single snapshot file. This also means different Atomix state machines can use log cleaning or snapshots at will, which is a nice side effect of having done snapshots after log cleaning.
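The multiplexing described here might look something like the following sketch: a single manager snapshot containing one entry per snapshot-aware resource. The interface and class names are hypothetical stand-ins, not the actual Atomix internals.

```java
// Sketch of compiling per-resource snapshots into one combined snapshot
// that would be written to a single file. Names are illustrative.
import java.util.*;

interface SnapshotAware {
    byte[] snapshot();
}

class ResourceManagerSketch {
    // Resources keyed by a logical resource/session id.
    final Map<Long, SnapshotAware> resources = new HashMap<>();

    // Take one snapshot per resource and combine them under their ids.
    Map<Long, byte[]> takeSnapshot() {
        Map<Long, byte[]> combined = new HashMap<>();
        for (Map.Entry<Long, SnapshotAware> e : resources.entrySet()) {
            combined.put(e.getKey(), e.getValue().snapshot());
        }
        return combined;
    }
}
```

On restore, the manager would hand each resource its own bytes back by id, which is what lets individual resources opt into snapshots or log cleaning independently.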
If the `SNAPSHOT` persistence level becomes the default - which I think is wise, and you're right, it's the simplest and probably most common use - I'll have to take some extra time to support large snapshots. Shouldn't be a big deal. Just a longish day of work to handle replicating snapshots without elections timing out.
Thanks for clarifying the intent behind `PersistenceLevel`. I now feel `EPHEMERAL` and `PERSISTENT` are a bit misleading given what they accomplish. Basically, by selecting a specific `PersistenceLevel`, what one is selecting is a precondition for command entry cleaning. `SNAPSHOT` is a good name, as it tells you a command entry can be cleaned once a snapshot is taken. We probably want something similar for the other two to better convey their purpose. My first suggestions are `QUORUM_COMMIT` in place of `EPHEMERAL` and `GLOBAL_COMMIT` in place of `PERSISTENT`. Also, rename `PersistenceLevel` to something like `GarbageCollectionPreCondition`. Maybe you can think of better names.
Also, I like the `SNAPSHOT` option being the default. It is very easy to implement (and get right) for a newbie state machine writer. More advanced users can play around with other options.
I agree. I think they're poorly named, and actually they could even be switched if looked at from the perspective of a state machine (`EPHEMERAL` could be thought of as a command that is immediately cleaned, e.g. as tombstones typically are, and `PERSISTENT` as a command that remains in the state machine until overridden) rather than from the perspective of the log/replication, so the lack of clarity in naming there is not good.
I like your suggestions. I'll get a PR opened for this hopefully tomorrow.
Alternatively, this could also be implemented as something like a `GarbageCollectionStrategy` with implementations indicating whether it's safe to remove an entry from disk based on the current cluster state. The problem with the strategy pattern is that it implies a user could provide a custom strategy, but I'm not sure there's actually any safe use case other than these three, so that may not make sense.
I see what you mean, and I agree that we should come up with a name that makes sense from the perspective of the state machine. `SNAPSHOT` is already such a name. Another way of looking at it is: we want to know if a command is safe to clean up immediately. For `EPHEMERAL` commands the answer is yes, and for `PERSISTENT` commands the answer is no. Diving one level deeper, the question is: what class of commands are `PERSISTENT`? It looks like commands that delete stuff are usually `PERSISTENT`: these could be commands that remove keys/entries from a map/set, release locks, etc. I'm not sure if there is a way to generalize this. But if we can, this would mean the state machine writer can specify that a given command is an information-shedding operation, and we take that as a hint to make it persistent.
But in the end I suspect this might end up being very state machine specific, and there might not be a way to generalize this simply for all state machines.
Another alternative for a name is `GarbageCollectionTrigger`. This feels less like something you can customize.
@madjam I improved upon the `snapshotting` branch pretty significantly. I'll write tests for it today and hopefully open a PR. I imagine it will take a few days of testing and cleanup to get it right, since there's a lot that can go wrong here, but overall it's not particularly complicated.
What I did with the `snapshotting` branch is extend that initial hack to provide full support for snapshots. What this means in the initial implementation is:
- State machines can take arbitrarily large snapshots (with a configurable size limit)
- Snapshots are sent in a separate `InstallRequest` when `nextIndex` equals the leader's last snapshot index
- The `InstallRequest` supports multi-part snapshots. By default, the leader will send something like 32KiB chunks at a time. Each `InstallRequest` has a snapshot index, offset, and `complete` boolean to indicate when the last snapshot chunk is sent. This IIRC is exactly what Diego did in LogCabin
- The log compaction algorithm will now automatically remove commands where `compact` is `SNAPSHOT` and a snapshot has been taken or received at a later index
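The multi-part transfer in the list above can be sketched as splitting the snapshot bytes into fixed-size chunks, each carrying its offset and a flag marking the final chunk. Field and class names here are illustrative, not the actual `InstallRequest` shape.

```java
// Sketch of chunking a snapshot for transfer: fixed-size chunks with an
// offset and a "complete" flag on the last one. Names are hypothetical.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotChunk {
    final long index;       // index at which the snapshot was taken
    final int offset;       // byte offset of this chunk within the snapshot
    final byte[] data;
    final boolean complete; // true only on the final chunk

    SnapshotChunk(long index, int offset, byte[] data, boolean complete) {
        this.index = index;
        this.offset = offset;
        this.data = data;
        this.complete = complete;
    }

    // Split a snapshot into chunkSize-byte pieces for sequential transfer.
    static List<SnapshotChunk> split(long index, byte[] snapshot, int chunkSize) {
        List<SnapshotChunk> chunks = new ArrayList<>();
        for (int offset = 0; offset < snapshot.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, snapshot.length);
            chunks.add(new SnapshotChunk(index, offset,
                Arrays.copyOfRange(snapshot, offset, end), end == snapshot.length));
        }
        return chunks;
    }
}
```

The receiver would append each chunk at its offset and finalize the snapshot when the `complete` chunk arrives, which keeps individual requests small enough not to stall heartbeats.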
In order to maintain safety for the session events system, snapshots taken of state machines are stored on disk but aren't completed until all events up to the snapshot index have been received. That ensures that entries that contribute unacknowledged events are not removed from disk. In practice, this should represent a very short time window.
When a snapshot is taken of the state machine and the snapshot is completed, rather than trying to find entries in the log that can be cleaned, this branch modifies the log compaction algorithm itself to recognize `SNAPSHOT` entries during compaction. So, this should effectively have no impact on performance on the logging side. But there are potentially optimizations that can be made in the future wrt taking snapshots. There's no parallelism here that would allow state machine operations to continue while a snapshot is taking place, but I think that's an optimization that needs to be made after the feature is stabilized and released.
This is also still a little sloppy. As new features have been added recently, the code is feeling more and more disorganized. For instance, in this branch the `ServerStateMachine` sets the `snapshotIndex` for log compaction, but in other areas specific server states set the `minorIndex` and `majorIndex`. But again, even if that's not cleaned up now, that internal stuff can be cleaned up over time.
@kuujo This is very cool. I look forward to your PR.