fence-agents-remediation's Issues

Unable to fence nodes with `fence_azure_arm` agent

Hi!

I'm currently experimenting with FAR on Azure VMs. I've been able to install NHC and FAR in an OCP 4.13 cluster, create the FAR Template, and start the remediation process. This is the FAR Template I'm currently using:

apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
kind: FenceAgentsRemediationTemplate
metadata:
  name: fenceagentsremediationtemplate-default
  namespace: openshift-operators
spec:
  template:
    spec:
      sharedparameters:
        '--action': reboot
        '-l': ea6bxxx
        '-p': y~xxx
        '--resourceGroup': jcano-cluster-mfxww-rg
        '--tenantId': 60xxx
        '--subscriptionId': 89xxx
      nodeparameters:
        '--plug=':
          jcano-cluster-mfxww-master-0: jcano-cluster-mfxww-master-0
          jcano-cluster-mfxww-master-1: jcano-cluster-mfxww-master-1
          jcano-cluster-mfxww-master-2: jcano-cluster-mfxww-master-2
          jcano-cluster-mfxww-worker-germanywestcentral1-b58kw: jcano-cluster-mfxww-worker-germanywestcentral1-b58kw
          jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd: jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd
          jcano-cluster-mfxww-worker-germanywestcentral3-xd7h5: jcano-cluster-mfxww-worker-germanywestcentral3-xd7h5
      agent: fence_azure_arm

I tried the fence_azure_arm tool standalone, locally, to restart a faulty VM hosting an OCP node. To bring the node into an unhealthy state, I stopped its kubelet process. The agent worked, but only after a small modification; see: Azure/azure-sdk-for-python#30983 (comment)

However, it does not work when run through the FAR operator, which logs the following errors:

2023-10-10T15:08:07.128294848Z	INFO	controllers.FenceAgentsRemediation	Begin FenceAgentsRemediation Reconcile
2023-10-10T15:08:07.128341449Z	INFO	controllers.FenceAgentsRemediation	Check FAR CR's name
2023-10-10T15:08:07.138883921Z	INFO	controllers.FenceAgentsRemediation	Finalizer was added	{"CR Name": "jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd"}
2023-10-10T15:08:07.138914222Z	INFO	controllers.FenceAgentsRemediation	Updating Status Condition	{"processingConditionStatus": "True", "fenceAgentActionSucceededConditionStatus": "Unknown", "succededConditionStatus": "Unknown", "reason": "RemediationStarted", "LastUpdateTime": "2023-10-10 15:08:07.138913322 +0000 UTC m=+23184.695547222"}
2023-10-10T15:08:07.151777431Z	INFO	controllers.FenceAgentsRemediation	Finish FenceAgentsRemediation Reconcile
2023-10-10T15:08:07.151923434Z	INFO	controllers.FenceAgentsRemediation	Begin FenceAgentsRemediation Reconcile
2023-10-10T15:08:07.151954534Z	INFO	controllers.FenceAgentsRemediation	Check FAR CR's name
2023-10-10T15:08:07.152025935Z	INFO	controllers.FenceAgentsRemediation	Try adding FAR (Medik8s) remediation taint	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd"}
2023-10-10T15:08:07.170359134Z	INFO	taints	Taint was added	{"taint effect": "NoExecute", "taint list": [{"key":"node.kubernetes.io/unreachable","effect":"NoSchedule","timeAdded":"2023-10-10T15:03:06Z"},{"key":"node.kubernetes.io/unreachable","effect":"NoExecute","timeAdded":"2023-10-10T15:03:12Z"},{"key":"medik8s.io/fence-agents-remediation","effect":"NoExecute","timeAdded":"2023-10-10T15:08:07Z"}]}
2023-10-10T15:08:07.170395735Z	INFO	controllers.FenceAgentsRemediation	Fetch FAR's pod
2023-10-10T15:08:07.170512137Z	INFO	controllers.FenceAgentsRemediation	Combine fence agent parameters	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd"}
2023-10-10T15:08:07.170539037Z	INFO	controllers.FenceAgentsRemediation	Execute the fence agent	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd"}
2023-10-10T15:08:07.340974815Z	ERROR	executer	Failed to run exec command	{"stdout": "", "stderr": "time=\"2023-10-10T15:08:07Z\" level=error msg=\"exec failed: unable to start container process: exec: \\\"fence_azure_arm\\\": executable file not found in $PATH\"\n", "error": "command terminated with exit code 255"}
github.com/medik8s/fence-agents-remediation/pkg/cli.executer.Execute
	/remote-source/app/pkg/cli/cliexecuter.go:92
github.com/medik8s/fence-agents-remediation/controllers.(*FenceAgentsRemediationReconciler).Reconcile
	/remote-source/app/controllers/fenceagentsremediation_controller.go:203
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226
2023-10-10T15:08:07.341030816Z	ERROR	controllers.FenceAgentsRemediation	Fence Agent response was a failure	{"CR's Name": "jcano-cluster-mfxww-worker-germanywestcentral2-h6zwd", "error": "command terminated with exit code 255"}
github.com/medik8s/fence-agents-remediation/controllers.(*FenceAgentsRemediationReconciler).Reconcile
	/remote-source/app/controllers/fenceagentsremediation_controller.go:206
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226
2023-10-10T15:08:07.350733575Z	INFO	controllers.FenceAgentsRemediation	Finish FenceAgentsRemediation Reconcile

It looks like FAR is not able to find the fence_azure_arm executable in its PATH.
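
One way to confirm this from inside the container, assuming a Python interpreter is available there (which this issue does not establish), is to ask the standard library where the agent resolves on PATH. The helper names `locate_agent` and `describe` are hypothetical and only illustrate the check.

```python
import shutil
from typing import Optional


def locate_agent(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable, or None if it is not found.

    When search_path is None, shutil.which falls back to the PATH
    environment variable of the current process.
    """
    return shutil.which(name, path=search_path)


def describe(name: str, search_path: Optional[str] = None) -> str:
    """Build a one-line, human-readable result for the lookup."""
    found = locate_agent(name, search_path)
    if found is None:
        return f"{name}: not found on PATH"
    return f"{name}: found at {found}"


if __name__ == "__main__":
    # The agent name comes from the template; the result depends on the image.
    print(describe("fence_azure_arm"))
```

If the lookup returns nothing, the agent is either not installed in that image or installed in a directory that is not on the PATH seen by the exec'd process.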

Environment:

  • OCP version: 4.13
  • NHC version: 0.6.0
  • FAR version: 0.2.0

Thanks in advance!
