Giter Club home page Giter Club logo

Comments (13)

hseeberger avatar hseeberger commented on July 20, 2024

@juanjovazquez, on a related note, what would ConsulCoordination do if you stop a system and start it again quickly, i.e. before Consul has removed the respective entry? Currently EtcdCoordination fails in this case, because it expects a 201, but it gets a 200. I wonder if failing is the right thing here – probably not.

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

@hseeberger, in the process of checking this out, we've discovered a bug related to the way we're creating the consul sessions. It's easy to fix. We'll raise a PR ASAP. Please, stay tuned.

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

@hseeberger, answering your question about stopping and quickly starting the system, ConsulCoordination is not failing, but I think it should. Looking into addSelf function, as the ttl associated to the previous session is not outdated, the new session cannot acquire the key, so the ConstructrMachine will have a sessionId that is not the one who owns the key. After the ttl, the previous session will expire and the key will be deleted so that the machine state will be inconsistent.

This can be easily fixed by doing something similar as what it's done in the lock function, i.e. to fail in case of the key is already acquired by a previous session, and come back probably to BeforeGettingNodes to try again.

However, in the process of study this edge case, we've realized that other edge cases can also occur, e.g. if the node is already in the list after a reboot, how do we know what's its state in the cluster?. I've seen that you've created this related issue #31 , so I think you must be reaching similar conclusions. We're now experimenting this edge cases in the context of a demo project trying to better understand all trouble can be derived from coordination between Consul and Akka cluster.

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

I think that stopping and quickly starting a node again should not cause any issues. Hence both EtcdCoordination and ConsulCoordination must not fail in addingSelf if there's already an entry. For EtcdCoordination I can easily fix this by allowing Created and OK.

@juanjovazquez, what do you think is needed for Consul?

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

I have created a new issue for that: #38.

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

@hseeberger, sorry for the delay. Unfortunately, the way Consul deals with locks and ttls makes this harder (i'm starting to feel guilty for bring this drawbacks here :)). Basically, if we don't want to change the current workflow in ConstructrMachine we'd have to do the following:

(1) Catch the failure in addingSelf thanks to the fact that the response is 200 OK but carries a false in the body.
(2) In that case, retrieve the sessionId that (still) owns the lock with a GET request to the key. The response from Consul contains the sessionId and looks as follows:

[
  {
    "CreateIndex": 100,
    "ModifyIndex": 200,
    "LockIndex": 200,
    "Key": "zip",
    "Flags": 0,
    "Value": "dGVzdA==",
    "Session": "adf4238a-882b-9ddc-4a9d-5b6758e4159e"
  }
]

(3) Refresh the obtained session.
(4) Manage the response of this last refresh due to the fact that in the meantime the session could have expired. If so, act as usual creating a new session and acquiring the key with this new session.
(5) As usual, raise the SelfAdded event with the sessionId that really owns the lock, the previous or a new one.

Please, tell me if you like this approach.

We could avoid the step 2 if we could keep more information in the Node Type after getNodes. For example, our Node type could be something as follows:

type Node[B <: Coordination.Backend] = (Address, B#Context)

Please, consider this just an example to illustrate the idea. That way, we could keep the sessionId upfront and wouldn't need to retrieve it later.

Of course, it would be much easier for us, just to come back to BeforeGettingNodes in case of failure with the consequence of having to wait until the lock expires.

Please, tell me your point of view before we start with the implementation.

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

@juanjovazquez, wait my friend. So Consul returns 200 in both cases, i.e. when self was added and when it was already there? That's perfect, not?

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

@hseeberger, yes, it is, but the devil is in the details. We cannot silently continue with the new generated session, we need to retrieve the previous one and refresh it. If we just do the former, we wouldn't be able to associate a proper session to the key and so the expiration mechanism wouldn't work. To be clear, the first session would block the second one and the key would disappear after the ttl.

Is it clearer now?.

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

Ah, now I see. I'm starting to wonder whether I like Consul or not ;-)

Your approach makes sense to me. Rather sooner than later we need an extensive test suite for, then we'll see.

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

Yes, Consul is way more involved than etcd. However the user has some benefits in return. For example, think that you could associate a single session to a set of keys setting the same expiration behavior to all of them. That gives you a lot of power implementing different coordination and locking algorithms.

OTOH, If you agree with the approach I'll raise a ticket to face it. We'll improve the test suite also. Thanks!.

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

ACK

from constructr.

juanjovazquez avatar juanjovazquez commented on July 20, 2024

The ticket dealing with the issue discussed here #42.

from constructr.

hseeberger avatar hseeberger commented on July 20, 2024

Closed somewhere ;-)

from constructr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.