Hey hiredis-cluster team, We encountered an issue where hiredis-clus



Support cluster scale down (AWS) about hiredis-cluster HOT 13 CLOSED

Moo64c commented on September 25, 2024 2

Support cluster scale down (AWS)

from hiredis-cluster.

Comments (13)

zuiderkwast commented on September 25, 2024

Hi Amir!

Are you suggesting a failed reconnect should trigger update slot mapping?


zuiderkwast commented on September 25, 2024

If there are many queries, the slot mapping will be updated and the issue will resolve itself after a while, but it's better to update slot mapping at failed reconnect too. Thanks for the idea!

We'll accept a PR. Maybe we'll do it ourselves, but I'm not sure when we'll have time.


Moo64c commented on September 25, 2024

@zuiderkwast thanks for answering so quickly!

Failed reconnect is not exactly the description - in this case we have other connected nodes - and some are failing. This does sound like the situation where we can still connect to the configured host and let the cluster node mapping take over.
I will check if we can devote some time for a PR :)


zuiderkwast commented on September 25, 2024

Great. It would also help to have an accurate description of exactly what should trigger updating the slot mapping.


Moo64c commented on September 25, 2024

We will analyze it more thoroughly and add more info tomorrow. Here's a breakdown of possible cases:

When can we fail when running a command? (breaking down redis_cluster_command_execute - feel free to correct me if I'm wrong here)

1. Bad config (host/port, ssl, etc.)
   - Never connected to any node
   - Bad reply from cluster nodes or cluster slots command (i.e. wrong addresses)
2. Network issues
   - Slow network reaching timeout
   - Disconnected client
3. Node has gone away (cluster failover to replica)
   - Node connection will close, driver sends the command to arbitrary node, reply should be MOVED which will trigger reconnect.
4. Node has gone away (scale down), driver was connected before to the specific node
   - Like the previous option, with a chance to successfully run commands on the arbitrary node since it might have gotten the relevant shard (will not trigger a reconnect).
5. Node has gone away (scale down), driver was not connected before to the specific node
   - connection fails, connection returns NULL - goto error.

Our issue stems from the last option. The driver was connected to at least one node but not to the node that was scaled down. After the scale down it would try to connect to the node that no longer exists, reaching an edge case in the driver that would not try to re-map the cluster (as far as I can tell from the code). Reproducing it requires a relatively low-traffic application instance, so that the driver has not yet connected to every node.

@zuiderkwast If you would like to discuss further, we can also schedule a Webex (or other) meeting.

Thanks


zuiderkwast commented on September 25, 2024

Yes, I think your analysis makes sense. As you might know, we have taken over the maintenance so we are not fully aware of why things are the way they are. This is what I think when I read through redis_cluster_command_execute:

    node = node_get_by_table(cc, (uint32_t)command->slot_num);
    if (node == NULL) {
        __redisClusterSetError(cc, REDIS_ERR_OTHER, "node get by table error");
        return NULL;
    }

This means the slot is not covered in the cluster. Maybe we should update the routes in this case too? But maybe not every time because it may not help, but only if we didn't do it within some period of time.
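The "only if we didn't do it within some period of time" idea could be sketched as a small throttle around the route update. This is an illustration only: `route_throttle` and `route_update_allowed` are hypothetical names, not hiredis-cluster APIs, and the caller is assumed to pass a monotonic timestamp in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical throttle state; not part of hiredis-cluster. */
typedef struct {
    int64_t last_update_ms;  /* time of the last refresh, -1 if never */
    int64_t min_interval_ms; /* minimum time between two refreshes */
} route_throttle;

/* Returns true if a slot map refresh is allowed at now_ms and, if so,
 * records now_ms as the time of the latest refresh. */
static bool route_update_allowed(route_throttle *t, int64_t now_ms) {
    assert(t != NULL && t->min_interval_ms >= 0);
    if (t->last_update_ms >= 0 &&
        now_ms - t->last_update_ms < t->min_interval_ms) {
        return false; /* refreshed too recently, skip this one */
    }
    t->last_update_ms = now_ms;
    return true;
}
```

With a check like this in front of the route update, an uncovered slot would trigger at most one refresh per interval rather than one per failed command.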

    c = ctx_get_by_node(cc, node);
    if (c == NULL) {
        return NULL;
    } else if (c->err) {
        node = node_get_which_connected(cc);

As you mentioned, if connect or reconnect fails, ctx_get_by_node returns NULL. Maybe we should fall through to the else and send the command to a random node here? (node_get_which_connected returns an arbitrary node.) If the command is sent to a random node, we'll get a MOVED redirect and then we'll update the routes.
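A rough sketch of that fall-through, using stand-in types instead of the real driver structures. Everything here is assumed for illustration: `sim_node`, `sim_any_connected` (playing the role of node_get_which_connected) and `sim_pick_target` are hypothetical, and the `reachable` flag simply simulates whether connect or reconnect would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a cluster node; the real node struct holds much more. */
typedef struct {
    const char *addr; /* made-up address, for readability only */
    bool reachable;   /* simulates whether connect/reconnect succeeds */
} sim_node;

/* Returns the first reachable node, or NULL if there is none. */
static sim_node *sim_any_connected(sim_node *nodes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].reachable) return &nodes[i];
    }
    return NULL;
}

/* Picks the node to send a command to: the slot owner if it can be
 * reached, otherwise any other reachable node. *fell_back tells the
 * caller that a MOVED reply (and a route update) is likely to follow. */
static sim_node *sim_pick_target(sim_node *nodes, size_t n, size_t owner,
                                 bool *fell_back) {
    assert(owner < n && fell_back != NULL);
    *fell_back = false;
    if (nodes[owner].reachable) return &nodes[owner];
    *fell_back = true;
    return sim_any_connected(nodes, n);
}
```

A NULL result with `*fell_back` set corresponds to the case where no node at all can be reached, which would still have to be reported as an error.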

I don't know how c->err can ever be true here. After ctx_get_by_node it seems impossible. Maybe it's a mistake. WDYT?

If we change this, should we change the pipelining functions (redisClusterAppendCommand family) and the async API functions (redisClusterAsyncCommand, actx_get_by_node) too in the same way? I'm not sure it's possible to fallback to a random node in these cases. @bjosv are you familiar with this code?

Some tests for these scenarios would be useful. :-)


bjosv commented on September 25, 2024

> I don't know how c->err can ever be true here. After ctx_get_by_node it seems impossible. Maybe it's a mistake. WDYT?

My guess is that if the hiredis function redisReconnect() in ctx_get_by_node() fails, c->err would be true.
I'm not sure of the benefits of reusing the hiredis context via redisReconnect() instead of starting from scratch via redisConnect() here.

> If we change this, should we change the pipelining functions (redisClusterAppendCommand family) and the async API functions (redisClusterAsyncCommand, actx_get_by_node) too in the same way? I'm not sure it's possible to fallback to a random node in these cases. @bjosv are you familiar with this code?

It should be possible to fall back to a random node here as well. The async API performs similar actions before sending commands, and it's mostly the responses that are handled differently, via the callback code.
It would be nice to have more common error handling, since the async API seems to have its own additional band-aid in its internal callback (which forwards to the user callback or handles retries):

static void redisClusterAsyncRetryCallback(redisAsyncContext *ac, void *r,
                                           void *privdata) {
....
        // Note:
        // I can't decide which is the best way to deal with connect
        // problem for hiredis cluster async api.
        // But now the way is : when enough null reply for a node,
        // we will update the route after the cluster node timeout.
        // If you have a better idea, please contact with me. Thank you.
        // My email: [email protected]
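One possible reading of that note, sketched with hypothetical names (`sim_node_health`, `sim_on_reply`). This is not the actual callback logic: the threshold `max_nulls`, the reset behavior and the way the node timeout is measured are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node bookkeeping; not a hiredis-cluster struct. */
typedef struct {
    int null_replies;      /* consecutive NULL replies seen so far */
    int64_t first_null_ms; /* time of the first NULL reply in this run */
} sim_node_health;

/* Records one reply for a node. Returns true when the routes should be
 * updated: at least max_nulls consecutive NULL replies were seen and
 * node_timeout_ms has passed since the first of them. */
static bool sim_on_reply(sim_node_health *h, bool reply_is_null,
                         int64_t now_ms, int max_nulls,
                         int64_t node_timeout_ms) {
    assert(h != NULL && max_nulls > 0);
    if (!reply_is_null) {
        h->null_replies = 0; /* a real reply ends the failure run */
        return false;
    }
    if (h->null_replies == 0) h->first_null_ms = now_ms;
    h->null_replies++;
    if (h->null_replies >= max_nulls &&
        now_ms - h->first_null_ms >= node_timeout_ms) {
        h->null_replies = 0; /* start counting again after the update */
        return true;
    }
    return false;
}
```

A counter like this only reacts after several failed commands, which is part of why a shared error path for the sync and async APIs is attractive.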


Moo64c commented on September 25, 2024

> ... As you might know, we have taken over the maintenance so we are not fully aware of why things are the way they are. ...

Thank you for your work for the community :) not many projects of this size have people familiar with every part of it, so no worries.

> This means the slot is not covered in the cluster. Maybe we should update the routes in this case too? ...

You are right, but I'm not sure this is a possibility if we already have a connected cluster context unless we somehow got bad inputs (like a slot number over the limit or a partial node mapping parse results).

> I don't know how c->err can ever be true here. After ctx_get_by_node it seems impossible. Maybe it's a mistake. WDYT?

I think the redis connection could have an error from a previous call. In a scale down scenario for a connected client, it might be a socket close error after trying to read from it (educated guess).

> As you mentioned, if connect or reconnect fails, ctx_get_by_node returns NULL. Maybe we should fall through to the else and send the command to a random node here? (node_get_which_connected returns an arbitrary node.) If the command is sent to a random node, we'll get a MOVED redirect and then we'll update the routes.

It's a good option, but consider a scale down from two nodes to one node in the cluster - the driver will keep going to a "randomly" selected node - the only one left will have all the shards in the cluster. We will not get a MOVED reply and cluster reroute will not trigger.

This is not really terrible since in node_get_which_connected we only run the ping command if the connection exists and has no error - so a max of one ping until we dismiss the node without a real performance degradation. However keeping a weird state in the driver might have other issues. Among them could be losing support for pipelining or other special command flows if they do not have the same fallback scenario.

Also I was going to go into detail with the AWS scale down scenario (DNS/address behavior, etc.), is it still necessary? I feel like the root of the problem has been communicated :)


zuiderkwast commented on September 25, 2024

> if the hiredis function redisReconnect() in ctx_get_by_node() fails c->err would be true

@bjosv Right, connect/reconnect is the difference between 5 and 3-4 in @Moo64c's list. It's weird that the function returns NULL in one case and c->err in the other.

> I'm not sure of the benefits of reusing the hiredis context via redisReconnect() instead of starting from scratch via redisConnect() here.

redisReconnect (hiredis) looks like a small optimization compared to doing redisFree() followed by redisConnect(). Some allocations are reused, but that's nothing compared to what a reconnect costs. I'm fine with scrapping reconnect. Or we can just make sure we handle reconnect errors the same way as connect errors.


zuiderkwast commented on September 25, 2024

> consider a scale down from two nodes to one node in the cluster - the driver will keep going to a "randomly" selected node - the only one left will have all the shards in the cluster. We will not get a MOVED reply and cluster reroute will not trigger.

@Moo64c A Redis cluster must have 3 masters or more, at least when you start it. I don't think it's possible to scale down below 3.

Either way, we don't have to wait for the MOVED. We can update the slot routes when we get a reconnect failure. I'm fine with either solution.


Moo64c commented on September 25, 2024

Yeah, so failing to connect to a node that we got from a cluster mapping should trigger the slot routing (second solution you suggested). It is a clear indication something changed.

I've also verified with our OPS team - it is possible to go down to a single master (in our case, with one replica). Of course this is a tradeoff in other parameters, but cost is a major one :)
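To make the agreed behavior concrete, here is a toy simulation of "connect failure triggers a slot map refresh", including the scale down to a single master. It does not use hiredis-cluster at all: `sim_cluster`, `sim_client`, `sim_refresh` and `sim_execute` are hypothetical, and the slot count is shrunk to 4 for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SIM_SLOTS = 4, SIM_NODES = 2 };

/* The actual state of the (simulated) cluster. */
typedef struct {
    bool node_up[SIM_NODES]; /* false once a node was scaled away */
    int owner[SIM_SLOTS];    /* node index that really owns each slot */
} sim_cluster;

/* The client's cached view, possibly stale. */
typedef struct {
    int route[SIM_SLOTS]; /* cached slot -> node index */
    int refreshes;        /* how many times the map was reloaded */
} sim_client;

/* Stand-in for reloading the slot map from any live node. */
static void sim_refresh(sim_client *cl, const sim_cluster *cluster) {
    for (int s = 0; s < SIM_SLOTS; s++) cl->route[s] = cluster->owner[s];
    cl->refreshes++;
}

/* Returns the node index that served the command, or -1 on failure.
 * A failed connect to the cached owner is treated as a sign that the
 * topology changed: refresh the map and retry once, without waiting
 * for a MOVED reply. */
static int sim_execute(sim_client *cl, const sim_cluster *cluster, int slot) {
    assert(slot >= 0 && slot < SIM_SLOTS);
    int node = cl->route[slot];
    if (!cluster->node_up[node]) {
        sim_refresh(cl, cluster);
        node = cl->route[slot];
        if (!cluster->node_up[node]) return -1;
    }
    return node;
}
```

In this model a client whose cache still points slots 2 and 3 at the removed node recovers on the first failed connect, even though the remaining node would never answer with MOVED.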


lior-parsi commented on September 25, 2024

Hi :)
First, thank you for your work.
We are experiencing this issue as well. Do you know when this fix will be added to the driver?


bjosv commented on September 25, 2024

We have #87 to mitigate this problem, and we aim to get the fix in soon (maybe with more tests..).
