I am trying a simple resilience test. Is this a valid scenario? I am

New Node is not created when cloud VM is destroyed (machine-api-operator, CLOSED)

openshift commented on September 23, 2024

New Node is not created when cloud VM is destroyed

from machine-api-operator.

Comments (3)

enxebre commented on September 23, 2024 1

@sohrab- thanks for reporting this. The Machine API is your source of truth for managing machines, so terminating nodes via the console is not recommended. As a workaround for the inconsistent status you have two choices:
1 - Add the machine.openshift.io/exclude-node-draining annotation to the orphaned machine so that its deletion can go through (this is fixed in newer versions, so the annotation will no longer be needed).
2 - Manually approve the newly generated CSR via kubectl certificate approve so that the new instance becomes a node. This behaviour is likely to change soon in favour of no new instance being created for the old machine object, so we encourage you to delete corrupted machines and let the MachineSet create new ones.
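
As a rough sketch of the two workarounds above, the following Python helpers only build the kubectl command lines and do not run anything. The openshift-machine-api namespace, the annotation value "true", and the example object names are assumptions for illustration; check them against your cluster before using the printed commands.

```python
import shlex

# Assumption: Machine objects live in the openshift-machine-api namespace.
MACHINE_NAMESPACE = "openshift-machine-api"
EXCLUDE_DRAIN_ANNOTATION = "machine.openshift.io/exclude-node-draining"


def annotate_and_delete_machine(machine_name, namespace=MACHINE_NAMESPACE):
    """Option 1: mark an orphaned machine to skip node draining, then delete it."""
    annotate = [
        "kubectl", "annotate", "machine", machine_name,
        "--namespace", namespace,
        # Assumption: the value is arbitrary; "true" is used here for readability.
        f"{EXCLUDE_DRAIN_ANNOTATION}=true",
    ]
    delete = ["kubectl", "delete", "machine", machine_name, "--namespace", namespace]
    return [annotate, delete]


def approve_csr(csr_name):
    """Option 2: approve a pending CSR so the new instance can join as a node."""
    return ["kubectl", "certificate", "approve", csr_name]


if __name__ == "__main__":
    # "worker-example" and "csr-example" are hypothetical names.
    commands = annotate_and_delete_machine("worker-example") + [approve_csr("csr-example")]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; review the output and paste it into a shell that has cluster access.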

> If this is expected, is there any other way to protect against worker loss automatically (self-healing)?

There's a "tech preview" feature, "Machine health checking", where you can target the machines that you want to be self-healed; they are automatically recreated when a problem is detected.
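
To give an idea of what targeting machines for health checking can look like, here is a minimal sketch that builds a MachineHealthCheck manifest as JSON. It assumes the field names of the later machine.openshift.io/v1beta1 API, which may differ from the tech preview version discussed here; the resource name, label selector, timeout, and maxUnhealthy values are hypothetical.

```python
import json


def machine_health_check(name, machine_labels, timeout="300s", max_unhealthy="40%"):
    """Build a MachineHealthCheck manifest as a plain dict."""
    return {
        # Assumption: the later v1beta1 API, not necessarily the tech preview one.
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {"name": name, "namespace": "openshift-machine-api"},
        "spec": {
            # Machines matching these labels are the ones being watched.
            "selector": {"matchLabels": machine_labels},
            # A node that stays not Ready (or unknown) for `timeout` is unhealthy.
            "unhealthyConditions": [
                {"type": "Ready", "status": "False", "timeout": timeout},
                {"type": "Ready", "status": "Unknown", "timeout": timeout},
            ],
            # Stop remediating when more than this share of machines is unhealthy.
            "maxUnhealthy": max_unhealthy,
        },
    }


if __name__ == "__main__":
    # Hypothetical name and label selector for worker machines.
    labels = {"machine.openshift.io/cluster-api-machine-role": "worker"}
    print(json.dumps(machine_health_check("worker-health", labels), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f once the fields have been checked against the API version your cluster actually serves.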

vikaschoudhary16 commented on September 23, 2024

@sohrab- This is known behavior. Deleting the cloud instance is not the expected way to remove a machine. Machines should be removed through the MachineSet only.

Now the explanation for the behavior:
The newly created instance could not register as a k8s node. Why? Because the CSR approver, https://github.com/openshift/cluster-machine-approver, did not approve the CSR for the new instance, as a matter of security policy. And because the instance did not become a node, the machine object was not updated to point to the new instance either.
Why is the old machine not getting deleted? There is a finalizer on the machine that blocks deletion until the cloud instance deletion API returns a successful response. In this case, I guess that call is returning an error because you deleted the instance from the console, and therefore the machine is not getting deleted.
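
Both symptoms described above can be spotted from the JSON that kubectl get csr -o json and kubectl get machines -n openshift-machine-api -o json return. The sketch below is one possible way to do that, not the operator's own logic: it treats a CSR with no Approved or Denied condition as pending, and a machine with a deletionTimestamp and remaining finalizers as stuck in deletion. The sample objects, including the finalizer name, are made up for illustration.

```python
def pending_csrs(csr_list):
    """Return names of CSRs that have neither been approved nor denied."""
    names = []
    for csr in csr_list.get("items", []):
        conditions = csr.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            names.append(csr["metadata"]["name"])
    return names


def machines_stuck_deleting(machine_list):
    """Return names of machines marked for deletion but held by a finalizer."""
    names = []
    for machine in machine_list.get("items", []):
        meta = machine["metadata"]
        if meta.get("deletionTimestamp") and meta.get("finalizers"):
            names.append(meta["name"])
    return names


if __name__ == "__main__":
    # Hypothetical sample data shaped like `kubectl get ... -o json` output.
    csrs = {"items": [
        {"metadata": {"name": "csr-a"}, "status": {}},
        {"metadata": {"name": "csr-b"},
         "status": {"conditions": [{"type": "Approved"}]}},
    ]}
    machines = {"items": [
        {"metadata": {"name": "m-old",
                      "deletionTimestamp": "2020-01-01T00:00:00Z",
                      "finalizers": ["example-finalizer"]}},
        {"metadata": {"name": "m-ok", "finalizers": ["example-finalizer"]}},
    ]}
    print("pending CSRs:", pending_csrs(csrs))
    print("stuck machines:", machines_stuck_deleting(machines))
```

In practice the two dicts would come from json.loads on the kubectl output rather than from literals.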

/close

sohrab- commented on September 23, 2024

Thanks, @vikaschoudhary16.

A few more questions:

When you say "known behaviour", do you mean this is working as designed or a known bug?
If this is expected, is there any other way to protect against worker loss automatically (self-healing)?
How can we clean up those orphaned Machines now? The underlying instances are gone. Do I need to get into etcd for this?
