Comments (13)
@juanjovazquez, on a related note, what would ConsulCoordination
do if you stop a system and start it again quickly, i.e. before Consul has removed the respective entry? Currently EtcdCoordination
fails in this case, because it expects a 201, but it gets a 200. I wonder if failing is the right thing here – probably not.
from constructr.
@hseeberger, in the process of checking this out, we've discovered a bug related to the way we're creating the consul sessions. It's easy to fix. We'll raise a PR ASAP. Please, stay tuned.
from constructr.
@hseeberger, answering your question about stopping and quickly starting the system, ConsulCoordination
is not failing, but I think it should. Looking into addSelf
function, as the ttl associated to the previous session is not outdated, the new session cannot acquire the key, so the ConstructrMachine
will have a sessionId
that is not the one who owns the key. After the ttl, the previous session will expire and the key will be deleted so that the machine state will be inconsistent.
This can be easily fixed by doing something similar as what it's done in the lock
function, i.e. to fail in case of the key is already acquired by a previous session, and come back probably to BeforeGettingNodes
to try again.
However, in the process of study this edge case, we've realized that other edge cases can also occur, e.g. if the node is already in the list after a reboot, how do we know what's its state in the cluster?. I've seen that you've created this related issue #31 , so I think you must be reaching similar conclusions. We're now experimenting this edge cases in the context of a demo project trying to better understand all trouble can be derived from coordination between Consul and Akka cluster.
from constructr.
I think that stopping and quickly starting a node again should not cause any issues. Hence both EtcdCoordination
and ConsulCoordination
must not fail in addingSelf
if there's already an entry. For EtcdCoordination
I can easily fix this by allowing Created
and OK
.
@juanjovazquez, what do you think is needed for Consul?
from constructr.
I have created a new issue for that: #38.
from constructr.
@hseeberger, sorry for the delay. Unfortunately, the way Consul deals with locks and ttls makes this harder (i'm starting to feel guilty for bring this drawbacks here :)). Basically, if we don't want to change the current workflow in ConstructrMachine
we'd have to do the following:
(1) Catch the failure in addingSelf
thanks to the fact that the response is 200 OK but carries a false
in the body.
(2) In that case, retrieve the sessionId
that (still) owns the lock with a GET request to the key. The response from Consul contains the sessionId
and looks as follows:
[
{
"CreateIndex": 100,
"ModifyIndex": 200,
"LockIndex": 200,
"Key": "zip",
"Flags": 0,
"Value": "dGVzdA==",
"Session": "adf4238a-882b-9ddc-4a9d-5b6758e4159e"
}
]
(3) Refresh the obtained session.
(4) Manage the response of this last refresh due to the fact that in the meantime the session could have expired. If so, act as usual creating a new session and acquiring the key with this new session.
(5) As usual, raise the SelfAdded
event with the sessionId
that really owns the lock, the previous or a new one.
Please, tell me if you like this approach.
We could avoid the step 2 if we could keep more information in the Node Type after getNodes
. For example, our Node type could be something as follows:
type Node[B <: Coordination.Backend] = (Address, B#Context)
Please, consider this just an example to illustrate the idea. That way, we could keep the sessionId
upfront and wouldn't need to retrieve it later.
Of course, it would be much easier for us, just to come back to BeforeGettingNodes
in case of failure with the consequence of having to wait until the lock expires.
Please, tell me your point of view before we start with the implementation.
from constructr.
@juanjovazquez, wait my friend. So Consul returns 200 in both cases, i.e. when self was added and when it was already there? That's perfect, not?
from constructr.
@hseeberger, yes, it is, but the devil is in the details. We cannot silently continue with the new generated session, we need to retrieve the previous one and refresh it. If we just do the former, we wouldn't be able to associate a proper session to the key and so the expiration mechanism wouldn't work. To be clear, the first session would block the second one and the key would disappear after the ttl.
Is it clearer now?.
from constructr.
Ah, now I see. I'm starting to wonder whether I like Consul or not ;-)
Your approach makes sense to me. Rather sooner than later we need an extensive test suite for, then we'll see.
from constructr.
Yes, Consul is way more involved than etcd. However the user has some benefits in return. For example, think that you could associate a single session to a set of keys setting the same expiration behavior to all of them. That gives you a lot of power implementing different coordination and locking algorithms.
OTOH, If you agree with the approach I'll raise a ticket to face it. We'll improve the test suite also. Thanks!.
from constructr.
ACK
from constructr.
The ticket dealing with the issue discussed here #42.
from constructr.
Closed somewhere ;-)
from constructr.
Related Issues (20)
- Setting etcd-port via environment HOT 2
- Get rid of retry limit for communication with coordination service HOT 15
- When Connecting to another akka using TCP it returns exception can not bind to the port HOT 1
- Publish 0.16.0 to maven central HOT 2
- Java API HOT 3
- Multiple ActorSystems and ClusterActorRefProvider cause duplicate init of akka-remoting
- Fixed path of cluster nodes in EtcdCoordination HOT 3
- Risk of island creation if max-nr-of-seed-nodes < actual number of seeds and host is reused HOT 2
- Can't build project due to java.lang.NoSuchMethodError
- akka.remote.netty.tcp.bind-hostname vs hostname HOT 1
- Release for 2.12 HOT 1
- Add close() method to Coordination for resource cleanup
- Join timing out in handover scenario HOT 8
- Delete key-value entries from backend if possible HOT 3
- Propagate ConstructR failures to user-space HOT 2
- Discussion: Given Akka Cluster Bootstrap, do we need ConstructR any longer HOT 21
- Deadlock when error during ConstructrMachine initialization
- Make the path in etcd customisable
- constructr 1.8.1 not available for scala 2.11 HOT 2
- Retirement HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from constructr.