Comments (13)
I have run into this problem in the past too. I think you're right that some form of snapshotting is the only way to do this completely safely and efficiently. I'm certainly not opposed to allowing some state to be stored in snapshots.
The alternative is to provide the state machine with a way to commit entries to the Raft log. In this case, state machines can essentially periodically store their own small snapshots as a single commit. For instance, for DistributedAtomicLong, the state machine would periodically commit some tombstone command to the log and then clean any prior commits. This would work fine, but it's not very graceful. Ultimately I'd favor some way to specify how the log is compacted for state machine state.
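The "state machine commits its own snapshot as a log entry" pattern described above can be sketched roughly as follows. This is an illustrative model, not the Copycat API: a counter periodically consolidates its state into a single `set` entry, after which all earlier increments are redundant on replay.

```java
// Hypothetical sketch of consolidating counter state into one log entry.
// None of these names come from Copycat; this just models the idea.
import java.util.ArrayList;
import java.util.List;

class CounterLog {
    // Each entry is either an increment ("inc") or a consolidated value ("set:N").
    final List<String> entries = new ArrayList<>();
    long value = 0;

    void increment() {
        entries.add("inc");
        value++;
    }

    // Periodically called: commit the current value as one entry, then drop
    // every prior entry, since "set:N" alone reproduces the state on replay.
    void consolidate() {
        entries.clear();
        entries.add("set:" + value);
    }

    // Replaying the remaining log must rebuild the same value.
    long replay() {
        long v = 0;
        for (String e : entries) {
            if (e.startsWith("set:")) v = Long.parseLong(e.substring(4));
            else v++;
        }
        return v;
    }
}
```

The point of the sketch is that the consolidated entry plus any later entries is equivalent, on replay, to the full history, which is exactly what makes it safe to clean the prior commits.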
There are some challenges here with respect to session events. Copycat transparently ensures that commands that trigger messages sent to a client will not be removed from the log until received by the client, so that servers can safely restart and replay event messages. In order to ensure the same is true with snapshots, a state machine would have to take a snapshot of its state and wait for the `completeIndex` (which indicates the lowest event index received by all clients) to reach the snapshot index before persisting it; by using log cleaning underneath, this really wouldn't be an issue. But if snapshots and log cleaning were working together, we'd need some way to determine the entries that can be removed as a result of a snapshot being installed (or the state machine could still be responsible for that), but that's not hard.
The most recent PR actually makes some changes that may make it easier to extend to add snapshots as well. It could be pretty trivial to come up with a decent implementation of support for small snapshots this week. I'll try to come up with some ideas. Basically, the new PR already adds a ConfigureRequest to send configurations to other nodes. This could also be used to send the most recent snapshot as well. I may take a stab at that.
BTW DistributedAtomicLong obviously has this problem as well. What I did there was just use optimistic locking to set counter values in commands. That's obviously not ideal, but ultimately that's what Apache Curator has to do for counters in ZooKeeper as well, so I don't feel so bad since at least there's a (not ideal) solution :-)
On Nov 29, 2015, at 5:59 PM, Madan Jampani [email protected] wrote:
If I want to implement a state machine that tracks the number of times it was updated then the current implementation presents some difficulties. We work under the assumption that entries that no longer contribute to state machine evolution need not be replicated. But for operation counting state machines all operations are meaningful. I guess if we annotate the command entry as a tombstone we can clean it up after it is replicated to all instances. But a cluster configuration change to add new members poses a challenge.
I'm opening this issue to get some thoughts on this subject. My initial thought was that for some state machine types snapshotting is the only way to go if we want to garbage collect log entries from time to time.
from copycat.
@madjam I hacked together a commit that shows what this could look like: f6dfb2d
What I did was add a `SnapshotAware` interface for state machines. When that interface is implemented, state machines will periodically take snapshots. A snapshot is held in memory until `lastCompleted` equals the snapshot index to ensure events are received by clients before commands are removed from the log, at which point it's persisted to the `MetaStore` file.
In this rudimentary implementation, snapshots are unopinionated with respect to the commits they affect. It's still up to the state machine to `clean` commits that were replaced by the snapshot. This allows for a highly flexible system that will allow Atomix to use both snapshotting and log cleaning where it makes sense, but it's still not ideal. We could have more marker interfaces to indicate whether a state machine does log cleaning at all.
Alternatively, individual commits could be cleanable/snapshottable based on the state machine interface. A state machine method that takes a `CleanableCommit` parameter rather than a `Commit` can be expected to clean it; otherwise snapshotting is used. There are a bunch of potential ways to go about this. I think I tend to favor something that makes state machine responsibilities obvious to the state machine author. Maybe the `CleanableCommit` approach works best since the behavior is defined by the state machine interface. Also, snapshotting introduces the possibility that command commits can be garbage collected (or actually released back to the commit pool) without being cleaned, for a state machine that cleans all prior commits after a snapshot, but with `CleanableCommit` we can still throw exceptions when commits that should be cleaned aren't cleaned properly.
There are also other ways to go about it, like marking commands as cleanable, or perhaps adding a third option to `Command.PersistenceLevel` (ooh, this actually seems nice) like `Command.PersistenceLevel.SNAPSHOT`. I'm actually not a fan of the way command persistence is done right now, since it's really the state machine logic that determines whether a command should be treated as a tombstone and needs to be retained, and similarly it's the state machine logic that determines whether a commit needs to be cleaned or can be cleaned automatically after a snapshot is taken. But I think maybe commands are close enough to state machines that defining the persistence level there is justified. But I digress.
Thoughts? Does the idea of adding a `PersistenceLevel.SNAPSHOT` seem reasonable? Unfortunately, there's no great way to enforce that snapshots store the state of commands marked with the `SNAPSHOT` persistence level. I suppose forcing all commands to be `clean`ed sort of does that, insofar as state machines must at least retain commands and explicitly clean them when a snapshot is taken, but that feels like unnecessary work for a lot of use cases.
One more note: the commit above is really only designed for small snapshots. I think this is acceptable for the time being, as state machines with larger state like data structures can use log cleaning, and I think the idea of using a mixture of both is pretty intriguing. The commit above sends a snapshot with the `ConfigureRequest` that is sent when the leader begins sending entries to a new follower or a configuration change occurs. The logic here is actually not correct, though: it should be based on `nextIndex` and not `matchIndex`, but you get the idea.
`ConfigureRequest` was added in the `client-server-transports` PR from which this was branched.
TBH I'm a bit confused by the `PersistenceLevel` options currently available. In a couple of state machines I implemented, I did not pay attention to this field at all (maybe I should). In my state machine I know when I should clean up a prior commit, so this additional piece of state seemed a bit superfluous. Maybe I missed something important, or is this what you mean when you say you are not a fan of how command persistence is done right now?
For this reason I'm not very clear on how a state machine developer will use and interpret the proposed `PersistenceLevel.SNAPSHOT`.
Obviously a state machine that can snapshot itself periodically is the simplest to implement (for a state machine developer). All the low-level details of which log entries to clean up and when it is safe to clean them are managed by Copycat. To me this is the ideal state (from a developer perspective). Anything above this means the developer is willing to pay some cost (in terms of implementation complexity) in return for potential performance gain.
Also, I thought each state machine instance that is `SnapshotAware` would take its own snapshot. Is that not the case?
So, `PersistenceLevel` right now is what defines whether a `Command` should be persisted as a normal entry or a tombstone. `PersistenceLevel.EPHEMERAL` indicates that a command can be safely removed from the log once `clean` is called, and `PersistenceLevel.PERSISTENT` indicates that a command should be retained until it's applied on all servers, even if the command was already applied to and `clean`ed by the state machine.
The problem is, state machines have no context wrt the cluster. State machines will presumably always `clean` a commit at the same point regardless of whether it's actually safe for it to be physically removed from disk. So, state machines say when it's safe for a command to be removed from disk from the state machine's perspective (it no longer contributes to the state machine's state), but the `PersistenceLevel` dictates when it's actually safe to be removed from disk by essentially indicating (right now) whether a command needs to be applied on all servers. Without `PersistenceLevel` indicating commands that remove state machine state, the following could happen in a `ValueState` in Atomix:
- Client commits "set 1" to S1, S2 and S3
- S3 is partitioned
- Client commits "remove 1" to S1 and S2
- S1 and S2 state machines both clean "set 1" since it was overwritten and clean "remove 1" since it no longer contributes to their state (the value is nil)
- S1 and S2 both compact their logs, removing "set 1" and "remove 1"
- Partition heals and S3 still has the state resulting from "set 1"
Obviously, as you know, Copycat will retain tombstones in this scenario to ensure they're sent to S3. The `PersistenceLevel` is just a marker to indicate that the `CommandEntry` that holds the `Command` is a tombstone.
There are also other use cases for the `PERSISTENT` persistence level. It can simply be used for any case where a user wants to ensure a command is applied on all servers. State machines can't reliably determine when that's the case, so state machines are responsible only for cleaning commits once they're no longer needed, and Copycat will ensure they're applied on as many or as few servers as is necessary, even after they're cleaned, according to the `PersistenceLevel`.
So, the idea with adding the `SNAPSHOT` persistence level is similar in that it will allow Copycat to handle the removal of command entries from the log without the state machine ever having to call `clean` on any commit. So, in that sense the `SNAPSHOT` persistence level is just a third way of saying when a command can be removed (after a snapshot is taken). If all of a state machine's commands have `PersistenceLevel.SNAPSHOT` (which could be the default), the state machine only has to store and load a snapshot and doesn't have to clean any commits. When a snapshot is taken, Copycat will automatically `clean` all prior `SNAPSHOT` entries from the log. This also enables a state machine that can use a mixture of both.
Another option for state machines is to make the `Commit` implementation indicative of the command type. State machine methods could accept `SnapshottableCommit` or `CleanableCommit` or `TombstoneCommit` as parameters, and state machines could be responsible for handling the commits based on the contract defined by the state machine interface. What that does is place the word `Snapshot` inside the commit class name, perhaps simplifying the understanding that the state machine is responsible for storing snapshots of state resulting from that commit, or that the state machine is responsible for calling `clean` on a `CleanableCommit`. But I still think `PersistenceLevel` is fine, particularly if snapshots are the default.
So, I think my favorite idea is to just make `SNAPSHOT` the default persistence level, ensuring that the persistence level must be explicitly overridden to enable log cleaning inside the state machine.
As to the last question, Atomix state machines are multiplexed and managed by a single core state machine: the `ResourceManager`. The resource manager creates logical sessions for each resource/state machine instance to allow different resource state machines to interact with specific objects on the client side. It will do something similar for snapshots. Copycat will trigger a snapshot of the `ResourceManager` state machine, and the resource manager will compile a snapshot of each of the `SnapshotAware` resource state machines. So, each Atomix state machine can take a snapshot, but they'll be stored in a single snapshot file. This also means different Atomix state machines can use log cleaning or snapshots at will, which is a nice side effect of having done snapshots after log cleaning.
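The multiplexing described here might look something like the following sketch: a single manager snapshot containing one entry per snapshot-aware resource. The interface and class names are hypothetical stand-ins, not the actual Atomix internals.

```java
// Sketch of compiling per-resource snapshots into one combined snapshot
// that would be written to a single file. Names are illustrative.
import java.util.*;

interface SnapshotAware {
    byte[] snapshot();
}

class ResourceManagerSketch {
    // Resources keyed by a logical resource/session id.
    final Map<Long, SnapshotAware> resources = new HashMap<>();

    // Take one snapshot per resource and combine them under their ids.
    Map<Long, byte[]> takeSnapshot() {
        Map<Long, byte[]> combined = new HashMap<>();
        for (Map.Entry<Long, SnapshotAware> e : resources.entrySet()) {
            combined.put(e.getKey(), e.getValue().snapshot());
        }
        return combined;
    }
}
```

On restore, the manager would hand each resource its own bytes back by id, which is what lets individual resources opt into snapshots or log cleaning independently.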
If the `SNAPSHOT` persistence level becomes the default - which I think is wise, and you're right, it's the simplest and probably most common use - I'll have to take some extra time to support large snapshots. Shouldn't be a big deal. Just a longish day of work to handle replicating snapshots without elections timing out.
Thanks for clarifying the intent behind `PersistenceLevel`. I now feel `EPHEMERAL` and `PERSISTENT` are a bit misleading given what they accomplish. Basically, by selecting a specific `PersistenceLevel`, what one is selecting is a precondition for command entry cleaning. `SNAPSHOT` is a good name, as it tells you a command entry can be cleaned once a snapshot is taken. We probably want something similar for the other two to better convey their purpose. My first suggestions are `QUORUM_COMMIT` in place of `EPHEMERAL` and `GLOBAL_COMMIT` in place of `PERSISTENT`. Also, rename `PersistenceLevel` to something like `GarbageCollectionPreCondition`. Maybe you can think of better names.
Also, I like the `SNAPSHOT` option being the default. It is very easy to implement (and get right) for a newbie state machine writer. More advanced users can play around with other options.
I agree. I think they're poorly named, and actually they could even be switched if looked at from the perspective of a state machine (`EPHEMERAL` could be thought of as a command that is immediately cleaned, e.g. as tombstones typically are, and `PERSISTENT` as a command that remains in the state machine until overridden) rather than from the perspective of the log/replication, so the lack of clarity in naming there is not good.
I like your suggestions. I'll get a PR opened for this hopefully tomorrow.
Alternatively, this could also be implemented as something like a `GarbageCollectionStrategy` with implementations indicating whether it's safe to remove an entry from disk based on the current cluster state. The problem with the strategy pattern is that it implies a user could provide a custom strategy, but I'm not sure there's actually any safe use case other than these three, so that may not make sense.
I see what you mean, and I agree that we should come up with a name that makes sense from the perspective of the state machine. `SNAPSHOT` is already such a name. Another way of looking at it is: we want to know if a command is safe to clean up immediately. For `EPHEMERAL` commands the answer is yes, and for `PERSISTENT` commands the answer is no. Diving one level deeper, the question is: what class of commands are `PERSISTENT`? It looks like commands that delete stuff are usually `PERSISTENT`: these could be commands that remove keys/entries from a map/set, release locks, etc. I'm not sure if there is a way to generalize this. But if we can, this would mean the state machine writer can specify that a given command is an information-shedding operation, and we take that as a hint to make it persistent.
But in the end I suspect this might end up being very state machine specific, and there might not be a way to generalize this simply for all state machines.
Another alternative for a name is `GarbageCollectionTrigger`. This feels less like something you can customize.
@madjam I improved upon the `snapshotting` branch pretty significantly. I'll write tests for it today and hopefully open a PR. I imagine it will take a few days of testing and cleanup to get it right, since there's a lot that can go wrong here, but overall it's not particularly complicated.
What I did with the `snapshotting` branch is extend that initial hack to provide full support for snapshots. What this means in the initial implementation is:
- State machines can take arbitrarily large snapshots (with a configurable size limit)
- Snapshots are sent in a separate `InstallRequest` when `nextIndex` equals the leader's last snapshot index
- The `InstallRequest` supports multi-part snapshots. By default, the leader will send something like 32KiB chunks at a time. Each `InstallRequest` has a snapshot index, offset, and `complete` boolean to indicate when the last snapshot chunk is sent. This IIRC is exactly what Diego did in LogCabin
- The log compaction algorithm will now automatically remove commands where `compact` is `SNAPSHOT` and a snapshot has been taken or received at a later index
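The multi-part transfer in the list above can be sketched as splitting the snapshot bytes into fixed-size chunks, each carrying its offset and a flag marking the final chunk. Field and class names here are illustrative, not the actual `InstallRequest` shape.

```java
// Sketch of chunking a snapshot for transfer: fixed-size chunks with an
// offset and a "complete" flag on the last one. Names are hypothetical.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotChunk {
    final long index;       // index at which the snapshot was taken
    final int offset;       // byte offset of this chunk within the snapshot
    final byte[] data;
    final boolean complete; // true only on the final chunk

    SnapshotChunk(long index, int offset, byte[] data, boolean complete) {
        this.index = index;
        this.offset = offset;
        this.data = data;
        this.complete = complete;
    }

    // Split a snapshot into chunkSize-byte pieces for sequential transfer.
    static List<SnapshotChunk> split(long index, byte[] snapshot, int chunkSize) {
        List<SnapshotChunk> chunks = new ArrayList<>();
        for (int offset = 0; offset < snapshot.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, snapshot.length);
            chunks.add(new SnapshotChunk(index, offset,
                Arrays.copyOfRange(snapshot, offset, end), end == snapshot.length));
        }
        return chunks;
    }
}
```

The receiver would append each chunk at its offset and finalize the snapshot when the `complete` chunk arrives, which keeps individual requests small enough not to stall heartbeats.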
In order to maintain safety for the session events system, snapshots taken of state machines are stored on disk but aren't completed until all events up to the snapshot index have been received. That ensures that entries that contribute unacknowledged events are not removed from disk. In practice, this should represent a very short time window.
When a snapshot is taken of the state machine and the snapshot is completed, rather than trying to find entries in the log that can be cleaned, this branch modifies the log compaction algorithm itself to recognize `SNAPSHOT` entries during compaction. So, this should effectively have no impact on performance on the logging side. But there are potentially optimizations that can be made in the future wrt taking snapshots. There's no parallelism here that would allow state machine operations to continue while a snapshot is taking place, but I think that's an optimization that needs to be made after the feature is stabilized and released.
This is also still a little sloppy. As new features have been added recently, the code is feeling more and more disorganized. For instance, in this branch the `ServerStateMachine` sets the `snapshotIndex` for log compaction, but in other areas specific server states set the `minorIndex` and `majorIndex`. But again, even if that's not cleaned up now, that internal stuff can be cleaned up over time.
@kuujo This is very cool. I look forward to your PR.