Hi to all, I have a problem with a haproxy instance (1.9.4) in front of a redis cluste

Server goes UP without tcp-check if it resolves again

Leen15 commented on September 7, 2024

from haproxy.

Comments (12)

wtarreau commented on September 7, 2024

From what I'm seeing in the code, the resolver sets the server into the maintenance state, which makes sense and matches what appears in Luca's logs above. So what puts the server UP is that it leaves maintenance mode. After all, being able to configure this state when leaving maintenance is more general and not specific to the resolver. For instance, an admin who disabled a server for an upgrade could want it to start with checks first when turning it back on.

The code responsible for this is in srv_update_status(), in this block:

        else if ((s->cur_admin & SRV_ADMF_MAINT) && !(s->next_admin & SRV_ADMF_MAINT)) {

More specifically, this part:

                if (s->check.state & CHK_ST_ENABLED) {
                        s->check.state &= ~CHK_ST_PAUSED;
                        check->health = check->rise; /* start OK but check immediately */
                }

I'm seeing that we already support switching out of maintenance to the down or starting state: this is the case when the server tracks another one, in which case it takes that other server's state. So for me this proves that all the logic to handle the transition exists and is safe to reuse. Thus we could have this "init-state" config element to change this behaviour.

I'm tagging this as a good first issue in case someone is interested in jumping into this development which seems quite accessible to me.

git001 commented on September 7, 2024

What are the settings in your resolvers block?

Leen15 commented on September 7, 2024

global
  daemon
  maxconn 1000

resolvers kubedns
  nameserver namesrv1 kube-dns.kube-system.svc.cluster.local:53
  resolve_retries  3
  timeout retry 1s
  hold other 1s
  hold refused 1s
  hold nx 1s
  hold timeout 1s
  hold valid 1s

defaults REDIS
  mode tcp
  timeout connect  4s
  timeout server  30s
  timeout client  30s
  option  log-health-checks

frontend ft_redis
  bind :6379 name redis
  default_backend bk_redis

backend bk_redis
  option tcp-check
  tcp-check send AUTH\ RedisTest\r\n
  tcp-check expect string +OK
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  default-server  check resolvers kubedns inter 1s downinter 1s fastinter 1s fall 1 rise 30 maxconn 330 no-agent-check on-error mark-down
  server redis-0 redis-ha-server-0.redis-ha.redis-ha.svc.cluster.local:6379
  server redis-1 redis-ha-server-1.redis-ha.redis-ha.svc.cluster.local:6379
  server redis-2 redis-ha-server-2.redis-ha.redis-ha.svc.cluster.local:6379
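
For readers unfamiliar with the check above: only the node whose `info replication` reply contains `role:master` passes, so replicas are deliberately kept DOWN. Below is a minimal sketch of that expect step, assuming `expect string` amounts to a substring search in the response buffer. `expect_string()` is a hypothetical helper, not an HAProxy function.

```c
/* Illustrative only: models "tcp-check expect string" as a substring test. */
#include <assert.h>
#include <string.h>

/* Return 1 when the response buffer contains the expected string, else 0. */
static int expect_string(const char *response, const char *expected)
{
    assert(response != NULL && expected != NULL);
    return strstr(response, expected) != NULL;
}
```

With an abbreviated reply such as "role:slave\r\n", expect_string(reply, "role:master") returns 0, so the check fails on a replica even though the TCP connection itself succeeded.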

Leen15 commented on September 7, 2024

It's OK that it changes IP, because a new instance exists. It's not normal that haproxy sets it as UP before doing the tcp-check and respecting the rise parameter...

lukastribus commented on September 7, 2024

It is expected behavior that new servers are up by default, not down, before the first health check is done.

This of course does not work with the configuration you are using, because here you are basically misusing the health check system for application master/slave logic - which is not the use-case it's designed for.

I can see how it would be useful to be able to configure the default state for new servers coming from DNS though.

@bedis any opinion about this?

Leen15 commented on September 7, 2024

I don't understand how this is possible... If the server is DOWN, shouldn't it first follow the "rise" parameter logic before going UP?
It's the same as when the master role passes to another server: the replica is down (but the DNS is OK) and it follows the rise logic to go UP. So why does the DNS resolver have a higher priority than the rise one?

lukastribus commented on September 7, 2024

Servers are UP by default.

When health checks are not used, all servers are UP. When health checks are used, but did not start yet or the status is not yet determined, then the current server status will be UP.

This is documented and expected behavior.

"rise" is about health check behavior. Not about pre health-check behavior.

wtarreau commented on September 7, 2024

We wanted to have an "init-state {up|down}" setting a while ago when developing the server-template stuff, but we figured that "init-addr none" already covered that, so it was not implemented. Here is an example which proves that this is not the case. While we were originally focused on the server state when the process starts, we didn't think about the state once the server has an address again. I'm wondering if it's a resolver thing or a status thing. I don't know what happens when we set an IP on a server from the CLI: does it automatically go up? If not, then we could address this with an extra resolver option. If it does, it's a wider thing to address: we need to set the server state after it is assigned an address.

sveniu commented on September 7, 2024

I'm hitting the same issue. My use case is for often-changing cloud infrastructure in AWS:

Running HAProxy on AWS ECS Fargate.
Running server backends on AWS ECS Fargate, too.
Backend ECS tasks (the containers, basically) register their IPs using AWS Cloud Map (service discovery).
Backends end up being available on service.sd.example.com, resolving to all IPs.
haproxy.cfg uses server-template mybackend 16 ... to handle a maximum of 16 backends.
Backend ECS tasks go up and down as new deploys and autoscaling happens.
HAProxy does the "right thing", reusing backends by detecting that IPs change.
When detecting new backends via DNS, they're immediately marked as UP.
The TCP check then follows a bit after, marking them down since they're still starting up.
The TCP check succeeds a bit after that, marking them up again.

Brief excerpt from haproxy.cfg:

resolvers dnsserver
  parse-resolv-conf
  hold valid 1s

defaults
  default-server init-addr none resolvers dnsserver weight 50 check inter 10s fastinter 2s fall 3 rise 20

listen myservice
  option tcp-check
  tcp-check connect
  tcp-check send-binary ...
  server-template mybackend 16 myservice.sd.example.com:80

rayitopy commented on September 7, 2024

I find this behavior very annoying too. I think that assuming a server is UP without performing the healthcheck is not a good thing.

VigneshSP94 commented on September 7, 2024

@wtarreau can I send a fix for this?

wtarreau commented on September 7, 2024

I don't have this one in mind anymore, but if you see that it still affects 2.8-dev, feel free to give it a try. We're going to tighten the merging rules for 2.8 now, but if it's a simple one it has its chance. Thanks!
