kubernetes-crd's Introduction

Kubernetes CRD/operator for running Chaos Toolkit experiments on-demand

This repository contains a Kubernetes operator to control Chaos Toolkit experiments on-demand by submitting custom-resource objects.

Read its documentation.

Contribute

If you wish to contribute more functions to this package, you are more than welcome to do so. Please fork this project, make your changes following the usual PEP 8 code style, add appropriate tests and submit a PR for review.

The Chaos Toolkit projects require all contributors to sign a Developer Certificate of Origin on each commit they would like to merge into the master branch of the repository. Please make sure you can abide by the rules of the DCO before submitting a PR.

kubernetes-crd's People

Contributors

dmartin35, hchenxa, lawouach, tgpski

kubernetes-crd's Issues

Add support for custom POD labels

When defining the ChaosExperiment CRO, we could allow custom labels to be added to the pod that will be created.

These labels could carry the experiment/verification ID or URL.

They could then be used to retrieve the pod for a specific experiment, linking the experiment IDs/URLs shown in the console to the ChaosExperiment resources created.
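
A minimal sketch of how such labels could be merged into the pod template. It assumes a hypothetical `labels` key under `spec.pod` (not part of the current CRD) and a pod template held as a plain dictionary; the function name is illustrative only.

```python
from typing import Any, Dict


def apply_custom_labels(pod_tpl: Dict[str, Any], cro_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-supplied labels into the pod template metadata.

    `spec.pod.labels` is a hypothetical key used for illustration only.
    """
    custom = cro_spec.get("pod", {}).get("labels") or {}
    labels = pod_tpl.setdefault("metadata", {}).setdefault("labels", {})
    for key, value in custom.items():
        # Kubernetes label values must be strings.
        labels[key] = str(value)
    return pod_tpl


pod = {"metadata": {"labels": {"app": "chaostoolkit"}}}
spec = {"pod": {"labels": {"experiment-id": "13c95573"}}}
apply_custom_labels(pod, spec)
```

The pod could then be looked up with a label selector, for example `kubectl -n chaostoolkit-run get pods -l experiment-id=13c95573`. Note that label values are limited to 63 characters of alphanumerics, `-`, `_` and `.`, so a full URL would not fit in a label and would probably have to go into an annotation instead.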

Add shortcut to support `chaos verify` command

Currently, the chaos experiment template only supports the chaos run command.

We can override this command, but doing so requires passing the entire pod template, which is complex when the only goal is to verify instead of run.

I suggest adding an option at the chaos experiment spec level to easily switch to verify mode, runAsVerification: true:

---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: my-chaos-verif
  namespace: chaostoolkit-crd
spec:
  namespace: chaostoolkit-run
  pod:
    image:
      name: chaosiq/chaostoolkit
    runAsVerification: true
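
A rough sketch of how the controller might pick the sub-command from this flag. The `runAsVerification` key is the one proposed above; the function name and the rule that an explicit `chaosArgs` list takes precedence are assumptions for illustration.

```python
from typing import Any, Dict, List


def chaos_command_args(cro_spec: Dict[str, Any], experiment_path: str) -> List[str]:
    """Build the arguments passed to the `chaos` command."""
    pod_spec = cro_spec.get("pod", {})
    # Assumption: a user-provided chaosArgs list still overrides everything.
    explicit = pod_spec.get("chaosArgs")
    if explicit:
        return list(explicit)
    subcommand = "verify" if pod_spec.get("runAsVerification", False) else "run"
    return [subcommand, experiment_path]


args = chaos_command_args({"pod": {"runAsVerification": True}}, "/home/svc/experiment.json")
```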

Cannot create experiment when chaosArgs contains empty value

When a manifest has a trailing empty value in chaosArgs, as in the example below:

spec:
  namespace: chaostoolkit-run
  pod:
    image: chaosiq/chaostoolkit
    chaosArgs:
    - --verbose
    - run
    - http://console.chaosiq.dev/assets/experiments/13c95573-db46-4472-b1a8-c335102e4cb6.json
    - 

This manifest is valid YAML and can be loaded. However, once loaded, the empty item becomes a None in the chaosArgs list:

{ 
    ..., 
    'chaosArgs': ['--verbose', 'run', 'http://console.chaosiq.dev/assets/experiments/13c95573-db46-4472-b1a8-c335102e4cb6.json', None], 
    ...
}

This causes an exception in the controller, which prevents the experiment (CTK) pod from being created and run.

 Error  Logging  49s   kopf  Handler 'create_chaos_experiment' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 291, in execute_handler_once
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 380, in invoke_handler
    **kwargs,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/invocation.py", line 117, in invoke
... "controller.py", line 101, in create_chaos_experiment
    pod_tpl = await create_pod(v1, cm, spec, ns, name_suffix, meta)
  File "controller.py", line 386, in run
    return await loop.run_in_executor(executor, pfunc)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "controller.py", line 623, in create_pod
    f"Override default chaos command arguments: "
TypeError: sequence item 3: expected str instance, NoneType found
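
One possible guard, sketched under the assumption that the arguments are joined with `" ".join(...)` for the log message, as the traceback suggests. The helper name `clean_chaos_args` is hypothetical and not part of the controller.

```python
from typing import Any, Iterable, List, Optional


def clean_chaos_args(raw_args: Optional[Iterable[Any]]) -> List[str]:
    """Drop None and blank entries so that " ".join(...) cannot fail."""
    cleaned: List[str] = []
    for arg in raw_args or []:
        if arg is None:
            continue
        text = str(arg).strip()
        if text:
            cleaned.append(text)
    return cleaned


loaded = [
    "--verbose",
    "run",
    "http://console.chaosiq.dev/assets/experiments/13c95573-db46-4472-b1a8-c335102e4cb6.json",
    None,  # produced by the trailing "- " item in the manifest
]
args = clean_chaos_args(loaded)
message = "Override default chaos command arguments: $ chaos " + " ".join(args)
```

An alternative would be to reject the manifest with a clear validation error rather than silently dropping the empty item.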

Create unique namespace when not specified

When the namespace name is not specified, we should create a unique namespace using the generated suffix, as we do for pods and other resources.

If users want a common namespace, it is their responsibility to provide its name.
We cannot know in advance whether a user wants all experiments to run in the same namespace or each in a unique namespace.
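
A small sketch of the proposed rule. The `chaostoolkit-run` prefix and the five-character lowercase suffix mirror what appears elsewhere in these issues, but the helper names and the exact suffix scheme are assumptions.

```python
import secrets
import string
from typing import Any, Dict, Tuple

_ALPHABET = string.ascii_lowercase + string.digits


def generate_suffix(length: int = 5) -> str:
    """Return a random suffix made of lowercase letters and digits."""
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))


def resolve_namespace(cro_spec: Dict[str, Any], prefix: str = "chaostoolkit-run") -> Tuple[str, bool]:
    """Return (namespace name, whether the name was generated).

    A user-provided name is used as-is; otherwise a unique one is generated.
    """
    name = cro_spec.get("namespace")
    if name:
        return name, False
    return f"{prefix}-{generate_suffix()}", True


shared, shared_generated = resolve_namespace({"namespace": "my-team-chaos"})
unique, unique_generated = resolve_namespace({})
```

The boolean lets the caller record whether the operator owns the namespace and may delete it together with the experiment.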

Load kubernetes secrets as a file

Currently, the only way to load a Kubernetes secret as a volume-mounted file is to redefine the full pod template.

Do we want to make secrets easier to use in the CTK definition, or leave this as an advanced use case?

To be discussed...

Re-run an experiment

Currently, applying the same experiment definition twice does nothing on the second apply.

$ k apply -f examples/basic.yaml 
namespace/chaostoolkit-run configured
configmap/chaostoolkit-experiment created
chaostoolkitexperiment.chaostoolkit.org/my-chaos-exp created

$ k apply -f examples/basic.yaml 
namespace/chaostoolkit-run unchanged
configmap/chaostoolkit-experiment unchanged
chaostoolkitexperiment.chaostoolkit.org/my-chaos-exp unchanged

How could we re-run the experiment a second time without deleting it first?
$ k -n chaostoolkit-crd delete ctk my-chaos-exp

Could we have something (API/command) at the experiment level to retrigger pod creation or restart the existing pod?

RBAC issue in chaostoolkit operator when updating the service account

When creating chaos experiments, there is an RBAC issue when the operator updates the service account (adding finalizers) in the chaostoolkit-run namespace.

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 291, in execute_handler_once
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 380, in invoke_handler
    **kwargs,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/invocation.py", line 117, in invoke
    result = await fn(*args, **kwargs)  # type: ignore
  File "controller.py", line 55, in create_chaos_experiment
    await update_sa(v1, ns, sa_tpl)
  File "controller.py", line 386, in run
    return await loop.run_in_executor(executor, pfunc)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "controller.py", line 691, in update_sa
    api, ns, "service_account", body
  File "controller.py", line 685, in _update_namespaced_resource
    return api_func(name, ns, body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 16730, in patch_namespaced_service_account
    (data) = self.patch_namespaced_service_account_with_http_info(name, namespace, body, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 16830, in patch_namespaced_service_account_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 344, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 178, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 403, in request
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 286, in PATCH
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e83bbe05-fe8e-4e33-bc96-1455c8d03349', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 13 Jul 2020 13:35:33 GMT', 'Content-Length': '350'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"serviceaccounts \"chaostoolkit-qeguy\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , \u003cnil\u003e","reason":"Forbidden","details":{"name":"chaostoolkit-qeguy","kind":"serviceaccounts"},"code":403}

`chaostoolkit-run` namespace is not created by default / should it be?

When deploying the CRD, we create the chaostoolkit-crd namespace, but not chaostoolkit-run, the default one for running experiments.

Is it created automatically when a ChaosExperiment object is created via the CRD?

The issue title might be wrong or might not cover all my questions/thoughts:

  • Should we create the default -run namespace when installing the CRD?
  • When creating multiple ChaosExperiment objects via the CRD, what happens when we delete one experiment? Will it delete the namespace if this experiment was the first one and created it? What happens to other experiments that were created later in the same namespace? Will all resources from the other experiments be removed as well? Or will removing the namespace be forbidden because other resources are not yet deleted? (I don't think this is normal behavior.)

Use ClusterRoles instead of Roles by default

Greetings,

I want to get some thoughts on an upstream PR I put together.

As part of initial technical exploration using chaostoolkit/kubernetes-crd, I wanted to quickly test the chaostoolkit-kubernetes extensions with the CRD. After building my own runner docker image, I still was unable to modify resources outside of the chaostoolkit-run namespace. I see this path as being critical for first line evaluation of the crd: users will incorporate the CRD and manifests into their clusters, and then want to be able to operate on any namespace from the experiment pod.

I updated the CRD schema to support a new spec property, clusterRoleBindNamespaces. Each namespace specified in the list will generate an additional RoleBinding, allowing the service account associated with an experiment to interact with the k8s api for the given namespace.

TGPSKI#2
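
To illustrate the idea, here is a hedged sketch of how one RoleBinding per listed namespace could be generated as plain manifests. `clusterRoleBindNamespaces` is the property described above; the function name, the binding naming scheme and the ClusterRole name are hypothetical and may differ from the linked PR.

```python
from typing import Any, Dict, List


def build_role_bindings(
    namespaces: List[str],
    sa_name: str,
    sa_namespace: str,
    cluster_role: str,
) -> List[Dict[str, Any]]:
    """Create one RoleBinding per namespace, each referencing the same ClusterRole."""
    return [
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            # Hypothetical naming scheme: <service account>-<target namespace>.
            "metadata": {"name": f"{sa_name}-{ns}", "namespace": ns},
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": cluster_role,
            },
            "subjects": [
                {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
            ],
        }
        for ns in namespaces
    ]


spec = {"clusterRoleBindNamespaces": ["default", "staging"]}
bindings = build_role_bindings(
    spec["clusterRoleBindNamespaces"], "chaostoolkit", "chaostoolkit-run", "chaostoolkit-experiment"
)
```

A RoleBinding that references a ClusterRole grants that role's permissions only inside the binding's own namespace, which keeps access limited to the namespaces listed.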

Document behavior with default/custom namespaces

If the namespace is created by the user, it will not be deleted.

If a namespace is not specified, a unique namespace will be created, used to hold the resources, and deleted when the chaos experiment is deleted.

If the user specifies a namespace name that does not exist, the namespace is created and will be deleted along with the experiment. Bear in mind that the first experiment creating the namespace is its owner and will delete it, even if other experiments have been created in that same custom namespace!

A default namespace can be used to keep all experiments in a single namespace.

Change image name location in pod template

We currently have a nested block for defining the ChaosExperiment pod image in the pod template:

spec:
  namespace: chaostoolkit-run
  pod:
    image:
      name: my/chaostoolkit

This sub-level should be removed to give a simpler description:

spec:
  namespace: chaostoolkit-run
  pod:
    image: my/chaostoolkit

This requires the code, tests, README and CTK documentation to be updated.

Load kubernetes secrets as environment

We can currently only load environment variables from a K8s ConfigMap, which contains plain-text key/value pairs. How can we use encrypted secrets and load them as environment variables for the experiment pod?

Two options:

  • Reuse the env block and define a secretName property alongside configMapName. The enabled flag would be shared, so it would not be possible to disable only one of them.
    If secretName is not defined, it is not used (no default value). This solution could also load both secure and insecure variables, from a K8s ConfigMap and a Secret respectively.
  pod:
    env:
      configMapName: chaostoolkit-env
      secretName: chaostoolkit-secrets
  • Create a new secret block, similar to the env one.
  pod:
    secret:
      secretName: chaostoolkit-secrets
      enabled: true

This solution could be enabled with a default secret name, and could be enabled while env is disabled. We could also add another property to indicate how to load the secrets: as environment variables or as a mounted file.
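
As an illustration of the first option, a minimal sketch that turns the env block into the container's `envFrom` list. The `secretName` key is the one proposed above; the helper name and the `chaostoolkit-env` default for `configMapName` are assumptions.

```python
from typing import Any, Dict, List


def build_env_from(pod_spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the container `envFrom` entries from the env block (option 1)."""
    env = pod_spec.get("env", {})
    # The single `enabled` flag switches both sources on or off together.
    if not env.get("enabled", True):
        return []
    sources: List[Dict[str, Any]] = [
        {"configMapRef": {"name": env.get("configMapName", "chaostoolkit-env")}}
    ]
    # No default value: the secret is only loaded when explicitly named.
    secret_name = env.get("secretName")
    if secret_name:
        sources.append({"secretRef": {"name": secret_name}})
    return sources


env_from = build_env_from(
    {"env": {"configMapName": "chaostoolkit-env", "secretName": "chaostoolkit-secrets"}}
)
```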

CTK network policies are installed in the wrong namespace

When we apply the kustomization, the metadata namespace of the network policies is overridden from chaostoolkit-run to chaostoolkit-crd, so they end up in the wrong namespace.
As a result, the CRD does not work properly, and the CTK pods do not get the intended protection.

When the namespace property is used in a kustomization file, it overrides the namespace of all resources underneath.

See related issue kubernetes-sigs/kustomize#880.

Allow running an experiment that already exists

When the CRD creates a Chaos Experiment, it cannot be created again unless it is manually deleted first.

We could add a flag in the experiment definition to indicate that the [existing] experiment can be deleted/updated with the new one.

This is mainly useful when we want to run the same experiment multiple times by simply applying it, without an explicit deletion:
kubectl delete ctk x && kubectl apply -f manifest.yaml

We could use a flag like:

spec:
  replaceExisting: true

AttributeError: 'V1ConfigMap' object has no attribute 'setdefault'

It seems there is a problem in the controller, as it tries to make kopf adopt a V1ConfigMap instance rather than a dictionary.

def create_experiment_env_config_map(..):
    return v1.create_namespaced_config_map(namespace, body)

should be replaced with:

cm = v1.create_namespaced_config_map(namespace, body)
return cm.to_dict()

Auto delete Chaos Experiment once completed

Currently, the Chaos Experiment stays in the list of CRD objects until the user deletes it manually.
(NB: this also prevents re-creating an experiment with the same name without deleting it first.)

We could use a TTL to indicate that the experiment (and optionally all related resources) should be removed a certain amount of time after completion.

Similar to pod TTL mechanism
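
A minimal sketch of the expiry check such a TTL would need, comparable in spirit to `ttlSecondsAfterFinished` on Kubernetes Jobs. The function name and the idea of storing a completion timestamp on the experiment are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(
    completed_at: datetime,
    ttl_seconds: Optional[int],
    now: Optional[datetime] = None,
) -> bool:
    """Return True once the TTL has elapsed since the experiment completed."""
    if ttl_seconds is None:
        # No TTL configured: keep the experiment until manual deletion.
        return False
    now = now or datetime.now(timezone.utc)
    return now >= completed_at + timedelta(seconds=ttl_seconds)


done = datetime(2020, 3, 23, 13, 31, 38, tzinfo=timezone.utc)
check = datetime(2020, 3, 23, 13, 41, 38, tzinfo=timezone.utc)
expired = is_expired(done, ttl_seconds=300, now=check)
```

A periodic handler in the operator could run this check and delete the experiment, and optionally its related resources, when it returns True.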

Kustomize installation on travis does not work all the time

Strangely, kustomize sometimes cannot be installed on Travis as part of the before_install CI step.

The workaround is to restart the build and wait for it to succeed.

I followed the official Kustomize documentation to install it via curl | bash.
We should investigate why it does not work every time, or find an alternative install procedure.

Schedule & repeat an experiment

We could leverage the K8s CronJob to create schedulable/repeatable experiments.

We could add a new block in the CTK object definition to indicate cron-like scheduling:

  • Very simple definition (similar to the K8s CronJob schedule definition)
spec:
  schedule: "*/1 * * * *"
  • More complex structure (more easily extensible)
spec:
  schedule:
    enabled: true
    kind: cronJob
    value: "*/1 * * * *"

Do we also want to schedule at a particular day/time?
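
Both shapes could be accepted and normalized to a single internal form. The sketch below assumes the `schedule` key and sub-keys shown above; the function name and the basic five-field check are illustrative, not a full cron validator.

```python
from typing import Any, Dict, Optional


def normalize_schedule(cro_spec: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Accept either the simple string or the structured block."""
    schedule = cro_spec.get("schedule")
    if schedule is None:
        return None
    if isinstance(schedule, str):
        # Simple form: treat the string as an enabled cron expression.
        schedule = {"enabled": True, "kind": "cronJob", "value": schedule}
    if not schedule.get("enabled", True):
        return None
    value = schedule.get("value", "")
    if len(value.split()) != 5:
        raise ValueError(f"Expected a 5-field cron expression, got: {value!r}")
    return {"kind": schedule.get("kind", "cronJob"), "value": value}


simple = normalize_schedule({"schedule": "*/1 * * * *"})
structured = normalize_schedule(
    {"schedule": {"enabled": True, "kind": "cronJob", "value": "*/1 * * * *"}}
)
```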

Cannot write the log to the chaos log path in the container

Refer to the chaostoolkit-run pod log below: there is a permission issue when writing logs to the home path.

[root@hchenxa-inf kubernetes-crd]# oc logs -n chaostoolkit-run chaostoolkit-43yr5
Traceback (most recent call last):
  File "/usr/local/bin/chaos", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 1256, in invoke
    Command.invoke(self, ctx)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/chaostoolkit/cli.py", line 68, in cli
    context_id=str(uuid.uuid4()))
  File "/usr/local/lib/python3.5/site-packages/chaostoolkit/logging.py", line 75, in configure_logger
    loglevel=logging.DEBUG)
  File "/usr/local/lib/python3.5/site-packages/logzero/__init__.py", line 416, in logfile
    rotating_filehandler = RotatingFileHandler(filename, mode=mode, maxBytes=maxBytes, backupCount=backupCount, encoding=encoding)
  File "/usr/local/lib/python3.5/logging/handlers.py", line 150, in __init__
    BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/local/lib/python3.5/logging/handlers.py", line 57, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/local/lib/python3.5/logging/__init__.py", line 1014, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/local/lib/python3.5/logging/__init__.py", line 1043, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
PermissionError: [Errno 13] Permission denied: '/home/svc/chaostoolkit.log'

CRD does not respect custom pod image / location

Greetings,

I have been exploring the chaostoolkit k8s-crd as a chaos experiment runner for work. I have struggled a lot with the documentation and examples, and want to provide assistance if possible. Currently, I am unable to progress with any work on chaostoolkit because I cannot get a custom image to be deployed by the crd.

My setup:

  • Kubernetes 1.18 on x86_64
  • 1 Master / 3 Agents

I have set up the generic example configuration with RBAC. Everything matches the example manifests, except for the addition of a custom run image, image pull secrets, and an image pull policy. I've tried two different methods to get the CRD watcher to pick up the new pod definition and create the correct pods.

    "chaostoolkit-pod.yaml" = <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: chaostoolkit
  labels:
    app: chaostoolkit
spec:
  restartPolicy: Never
  serviceAccountName: chaostoolkit
  containers:
  - name: chaostoolkit
    image: <MY_PRIVATE_REGISTRY>/chaostoolkit:latest
    imagePullPolicy: Always
    command:
    - /usr/local/bin/chaos
    args:
    - run
    - $(EXPERIMENT_PATH)
    env:
    - name: CHAOSTOOLKIT_IN_POD
      value: "true"
    - name: EXPERIMENT_PATH
      value: "/home/svc/experiment.json"
    envFrom:
    - configMapRef:
        name: chaostoolkit-env
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    volumeMounts:
    - name: chaostoolkit-settings
      mountPath: /home/svc/.chaostoolkit/
      readOnly: true
    - name: chaostoolkit-experiment
      mountPath: /home/svc/experiment.json
      subPath: experiment.json
      readOnly: true
  volumes:
  - name: chaostoolkit-settings
    secret:
      secretName: chaostoolkit-settings
  - name: chaostoolkit-experiment
    configMap:
      name: chaostoolkit-experiment
  imagePullSecrets:
  - name: regcred
    EOF

In the chaostoolkit-pod.yaml definition, I have added my custom docker repository, as well as an image pull secret and pull policy. Otherwise, all other fields are the same.

Here's my example experiment:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-experiment8
  namespace: chaostoolkit-run
data:
  "experiment.json": |-
    {
        "version": "1.0.0",
        "title": "Moving a file from under our feet is forgivable",
        "description": "Our application should re-create a file that was removed",
        "steady-state-hypothesis": {
            "title": "The file must be around first",
            "probes": [
                {
                    "type": "probe",
                    "name": "file-must-exist",
                    "tolerance": true,
                    "provider": {
                        "type": "python",
                        "module": "os.path",
                        "func": "exists",
                        "arguments": {
                            "path": "/etc"
                        }
                    }
                }
            ]
        },
        "method": [
            {
                "ref": "file-must-exist"
            }
        ]
    }
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: chaostoolkit-experiment8
  namespace: chaostoolkit-crd

When I create these resources, the crd watcher will create a pod in the chaostoolkit-run namespace. Every time, without fail, the created pod uses the "chaostoolkit/chaostoolkit" image. Looking into the config map for chaostoolkit-resources-templates, the yaml / json looks valid.

I checked the documentation, and thought perhaps I needed to add the custom spec to the experiment definition. Therefore, I retried multiple times with the following experiment definition:

---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: chaostoolkit-experiment8
  namespace: chaostoolkit-crd
spec:
  namespace: chaostoolkit-run
  pod:
    image: <MY_PRIVATE_REGISTRY>/chaostoolkit:latest
    imagePullPolicy: Always
  imagePullSecrets:
  - name: regcred

Again, the created pods still use "chaostoolkit/chaostoolkit", and not my custom image.

Where do I go from here?

CRD fails to create experiment when using custom service account

When using a custom service account name, the CRD is not able to create the experiment. It fails when creating the role binding: the default role is not created, but the role binding does not look for the same key in the spec.

apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
spec:
  namespace: chaostoolkit-run
  role:
    name: chaostoolkit-pod

Logs from the CRD:

[2020-10-01 14:52:24,819] kopf.objects         [INFO    ] Namespace 'chaostoolkit-run' already exists. Let's continue...
[2020-10-01 14:52:24,820] kopf.objects         [INFO    ] [chaostoolkit-crd/3e6a7ec7-2af3-4c8a-bb00-fbe3c30ea346] chaostoolkit resources will be created in namespace 'chaostoolkit-run'
[2020-10-01 14:52:24,820] kopf.objects         [INFO    ] [chaostoolkit-crd/3e6a7ec7-2af3-4c8a-bb00-fbe3c30ea346] Suffix for resource names will be '-1683h'
[2020-10-01 14:52:24,917] kopf.objects         [INFO    ] [chaostoolkit-crd/3e6a7ec7-2af3-4c8a-bb00-fbe3c30ea346] Created service account
[2020-10-01 14:52:25,024] kopf.objects         [ERROR   ] [chaostoolkit-crd/3e6a7ec7-2af3-4c8a-bb00-fbe3c30ea346] Handler 'create_chaos_experiment' failed permanently: Failed to bind to role: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b1fd3b39-2cea-4009-9e3c-014da2684108', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 01 Oct 2020 14:52:25 GMT', 'Content-Length': '290'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"roles.rbac.authorization.k8s.io \"chaostoolkit-experiment-1683h\" not found","reason":"NotFound","details":{"name":"chaostoolkit-experiment-1683h","group":"rbac.authorization.k8s.io","kind":"roles"},"code":404}

In the create_role function, we check for
role_name = cro_spec.get("role", {}).get("name")

while in create_role_binding, we check for
role_bind_name = cro_spec.get("role", {}).get("bind")

It seems to me that the second function should also check for the role name key in the CRO spec.
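
One way this lookup could be made consistent, sketched with a hypothetical helper. The order of precedence (explicit `bind`, then `name`, then the generated default) is an assumption, not the project's confirmed fix.

```python
from typing import Any, Dict


def resolve_bound_role_name(cro_spec: Dict[str, Any], default_role_name: str) -> str:
    """Pick the role that the role binding should reference."""
    role_spec = cro_spec.get("role", {})
    # Prefer an explicit binding target, then the custom role name,
    # and only fall back to the generated default role last.
    return role_spec.get("bind") or role_spec.get("name") or default_role_name


spec = {"namespace": "chaostoolkit-run", "role": {"name": "chaostoolkit-pod"}}
bound = resolve_bound_role_name(spec, "chaostoolkit-experiment-1683h")
```

With the manifest from this issue, the binding would then target `chaostoolkit-pod` instead of the default role that was never created.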

Missing OpenAPIV3 Validation for CRD

Using the example template for a chaos experiment and the custom resource, I get the following Validation error on an attempted apply.

error: error validating "chaos-experiments/first.yaml": error validating data: ValidationError(ChaosToolkitExperiment): unknown field "spec" in org.chaostoolkit.v1.ChaosToolkitExperiment; if you choose to ignore these errors, turn validation off with --validate=false

Setup:

  • Kubernetes 1.18 on x86_64
  • 1 Master / 3 Agents

CRD:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: chaosexperiments.chaostoolkit.org
spec:
  scope: Namespaced
  group: chaostoolkit.org
  versions:
    - name: v1
      served: true
      storage: true
  names:
    kind: ChaosToolkitExperiment
    plural: chaosexperiments
    singular: chaosexperiment
    shortNames:
      - ctk
      - ctks

Template:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-experiment8
  namespace: chaostoolkit-run
data:
  "experiment.json": |-
    {
        "version": "1.0.0",
        "title": "Moving a file from under our feet is forgivable",
        "description": "Our application should re-create a file that was removed",
        "steady-state-hypothesis": {
            "title": "The file must be around first",
            "probes": [
                {
                    "type": "probe",
                    "name": "file-must-exist",
                    "tolerance": true,
                    "provider": {
                        "type": "python",
                        "module": "os.path",
                        "func": "exists",
                        "arguments": {
                            "path": "/etc"
                        }
                    }
                }
            ]
        },
        "method": [
            {
                "ref": "file-must-exist"
            }
        ]
    }
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: chaostoolkit-experiment8
  namespace: chaostoolkit-crd
spec:
  namespace: chaostoolkit-run
  pod:
    image: <PRIVATE_REGISTRY>/chaostoolkit:latest
  imagePullSecrets:
  - name: regcred

Digging into the validation error, the CRD expects a schema key in the versions array. Below is a stub showing the location of the necessary schema key.

  group: chaostoolkit.org
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object

Install the operator in another namespace than the default one

For security reasons, some users are given a single namespace to work with and cannot create another one.

The current kustomization prevents installing the operator in any namespace other than the default chaostoolkit-crd namespace.

To change the installation namespace, two steps are required:

  • change the namespace in the kustomization.yaml files (depending on the overlay)
  • change the default command for the container in the deployment so that it watches the other namespace (otherwise the operator will not work and will remain silent)

Install with the Kubernetes operator does not work

Hi,

I'm following the documentation, and can't get anything deployed.
I have cloned the project locally and just ran the first command:

kustomize build manifests/overlays/generic-rbac | kubectl apply -f -
Error: accumulating resources: 2 errors occurred:
	* accumulateFile error: "accumulating resources from '../../base': '/Users/user/PycharmProjects/kubernetes-crd/manifests/base' must resolve to a file"
	* accumulateDirector error: "recursed accumulation of path '/Users/user/PycharmProjects/kubernetes-crd/manifests/base': accumulating resources: 2 errors occurred:\n\t* accumulateFile error: \"accumulating resources from './common/kustomization.yaml': missing metadata.name in object {map[apiVersion:kustomize.config.k8s.io/v1beta1 kind:Kustomization namespace:crd resources:[./serviceaccount.yaml ./crd.yaml ./configmap.yaml ./deployment.yaml]]}\"\n\t* loader.New error: \"error loading ./common/kustomization.yaml with git: url lacks host: ./common/kustomization.yaml, dir: got file 'kustomization.yaml', but '/Users/user/PycharmProjects/kubernetes-crd/manifests/base/common/kustomization.yaml' must be a directory to be a root, get: invalid source string: ./common/kustomization.yaml\"\n\n"

Is there anything I missed here?

After a bit of digging, I pulled the install-in-another-namespace branch and got something deployed by going into the manifests directory and running:

kustomize build base/common/ | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/chaosexperiments.chaostoolkit.org unchanged
serviceaccount/chaostoolkit-crd created
configmap/chaostoolkit-resources-templates created
deployment.apps/chaostoolkit-crd created

But then the pod does not start:

kubectl -n crd get pods
NAME                                READY   STATUS   RESTARTS   AGE
chaostoolkit-crd-6b9cf74d69-btzhk   0/1     Error    2          44s

Here's the log:

kopf.reactor.running [ERROR ] Root task 'watcher of chaosexperiments.chaostoolkit.org' is failed: 403, message='chaosexperiments.chaostoolkit.org is forbidden: User "system:serviceaccount:crd:chaostoolkit-crd" cannot list resource "chaosexperiments" in API group "chaostoolkit.org" in the namespace "crd"', url=URL('https://X.X.X.X:443/apis/chaostoolkit.org/v1/namespaces/crd/chaosexperiments')

But probably the first issue is the one to look at.

How far do we go with the templating of the resources from the CRO definition

The usual goal of a CRO definition is to remain simple and hide all the complexity in the CRD. But we have so many combinations (due to passing all the Kubernetes information) that we end up with an equally complex definition for the user.

Should we instead keep the basic options lean and simple, and promote overriding the whole pod otherwise?

Just thinking out loud :)

cc @dmartin35

Issue with ConfigMap when creating ChaosToolkitExperiment

[2020-03-19 16:03:07,467] kopf.objects         [ERROR   ] [chaostoolkit-crd/my-chaos-exp] Handler 'create_chaos_experiment' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 291, in execute_handler_once
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 380, in invoke_handler
    **kwargs,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/invocation.py", line 117, in invoke
    result = await fn(*args, **kwargs)  # type: ignore
  File "controller.py", line 67, in create_chaos_experiment
    kopf.adopt(cm_tpl, owner=body)
  File "/usr/local/lib/python3.7/site-packages/kopf/toolkits/hierarchies.py", line 139, in adopt
    append_owner_reference(objs, owner=real_owner)
  File "/usr/local/lib/python3.7/site-packages/kopf/toolkits/hierarchies.py", line 28, in append_owner_reference
    refs = obj.setdefault('metadata', {}).setdefault('ownerReferences', [])
AttributeError: 'V1ConfigMap' object has no attribute 'setdefault'

Install CTK extensions at runtime

Currently, to use the CTK with extensions, we need to build a new Docker image that includes the desired plugins. This solution is not very flexible, unless we create one big image that contains all possible extensions.

Could we provide a way to specify the list of CTK extensions to be used by the ChaosExperiment CRO?

We could provide something like:

---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: my-chaos-exp
  namespace: chaostoolkit-crd
spec:
  pod:
    extensions:
    - chaostoolkit-slack
    - chaostoolkit-kubernetes

Can we do the pip install within an init container?

Drawback: extensions would be installed dynamically on every run, which slows down the start of the experiment. This might not be critical if we consider the flexibility of extensions a gain.
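
A speculative sketch of what the generated init container could look like. It assumes the extensions are installed with `pip install --target` into a shared volume that the main container would add to its `PYTHONPATH`; the function, container and volume names are hypothetical, and the init image would need pip available.

```python
import re
from typing import Any, Dict, List

# Conservative check for Python distribution names (letters, digits, ".", "_", "-").
_NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def extensions_init_container(
    extensions: List[str], image: str, target: str = "/extensions"
) -> Dict[str, Any]:
    """Build an init container that pip-installs the extensions into a shared volume."""
    for name in extensions:
        if not _NAME_RE.match(name):
            raise ValueError(f"Invalid extension name: {name!r}")
    return {
        "name": "install-extensions",
        "image": image,
        "command": ["pip", "install", "--target", target, *extensions],
        "volumeMounts": [{"name": "extensions", "mountPath": target}],
    }


container = extensions_init_container(
    ["chaostoolkit-slack", "chaostoolkit-kubernetes"], "chaostoolkit/chaostoolkit"
)
```

Validating the names before building the command avoids passing arbitrary user input to pip as options.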

Add Travis CI

We should run the unit tests on Travis as part of the CI/CD.

Deleting `chaosexperiment` CRO does not delete created resources

When deleting a chaosexperiment CRO in the chaostoolkit-crd namespace, the resources (service account, pods, etc.) in the chaostoolkit-run namespace are not removed:

$ k -n chaostoolkit-run get pods
NAME                 READY   STATUS      RESTARTS   AGE
chaostoolkit-k7ssq   0/1     Completed   0          41m

$ k -n chaostoolkit-crd delete ctk my-chaos-exp
chaostoolkitexperiment.chaostoolkit.org "my-chaos-exp" deleted

$ k -n chaostoolkit-run get pods
NAME                 READY   STATUS      RESTARTS   AGE
chaostoolkit-k7ssq   0/1     Completed   0          42m
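
One possible cause is that Kubernetes does not allow owner references across namespaces, so resources in chaostoolkit-run cannot be garbage-collected through an owner that lives in chaostoolkit-crd. A workaround could be to label everything the controller creates and delete by label selector when the CRO is removed. The sketch below only builds the labels and the selector; the label keys and function names are hypothetical.

```python
from typing import Dict


def experiment_labels(name: str, namespace: str) -> Dict[str, str]:
    """Labels identifying which experiment CRO a resource belongs to."""
    return {
        "chaostoolkit.org/experiment-name": name,            # hypothetical key
        "chaostoolkit.org/experiment-namespace": namespace,  # hypothetical key
    }


def to_label_selector(labels: Dict[str, str]) -> str:
    """Render labels as an equality-based selector string."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


labels = experiment_labels("my-chaos-exp", "chaostoolkit-crd")
selector = to_label_selector(labels)

# In a delete handler, the selector could then be passed to the client's
# collection-delete calls, for example (not executed here):
#   v1.delete_collection_namespaced_pod(
#       namespace="chaostoolkit-run", label_selector=selector)
```

The same labels would have to be set on the service account, role, role binding, config maps and pod at creation time for the selector to match them.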

CRD cannot create experiment pod

With the following CTK experiment CRO:

---
apiVersion: v1
kind: Secret
metadata:
  name: chaostoolkit-settings
  namespace: chaostoolkit-run
data:
  settings.yaml: *********
---
apiVersion: chaostoolkit.org/v1
kind: ChaosToolkitExperiment
metadata:
  name: my-chaos-exp
  namespace: chaostoolkit-crd
spec:
  namespace: chaostoolkit-run
  pod:
    image:
      name: chaosiq/chaostoolkit
    chaosArgs:
      - --verbose
      - run
      - https://console.chaosiq.dev/assets/experiments/71340a18-96fd-478a-bfdd-2f454bf429e1.json

The CRD controller is not able to create pods to run the experiment. See its logs:


[2020-03-23 13:31:38,521] kopf.objects         [INFO    ] Namespace 'chaostoolkit-run' already exists. Let's continue...
[2020-03-23 13:31:38,526] kopf.objects         [INFO    ] [chaostoolkit-crd/my-chaos-exp] chaostoolkit resources will be created in namespace 'chaostoolkit-run'
[2020-03-23 13:31:38,526] kopf.objects         [INFO    ] [chaostoolkit-crd/my-chaos-exp] Suffix for resource names will be '-ltva8'
[2020-03-23 13:31:38,570] kopf.objects         [INFO    ] [chaostoolkit-crd/my-chaos-exp] Created service account
[2020-03-23 13:31:38,672] kopf.objects         [INFO    ] [chaostoolkit-crd/my-chaos-exp] Created role
[2020-03-23 13:31:38,717] kopf.objects         [INFO    ] [chaostoolkit-crd/my-chaos-exp] Created rolebinding
[2020-03-23 13:31:38,870] kopf.objects         [INFO    ] Env config map named 'chaostoolkit-env'
[2020-03-23 13:31:38,874] kopf.objects         [INFO    ] Settings secret volume named 'chaostoolkit-settings'
[2020-03-23 13:31:38,875] kopf.objects         [INFO    ] Experiment config map named 'chaostoolkit-experiment'
[2020-03-23 13:31:38,875] kopf.objects         [INFO    ] Override default chaos command arguments: $ chaos --verbose run https://console.chaosiq.dev/assets/experiments/71340a18-96fd-478a-bfdd-2f454bf429e1.json
[2020-03-23 13:31:38,879] kopf.objects         [ERROR   ] [chaostoolkit-crd/my-chaos-exp] Handler 'create_chaos_experiment' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 291, in execute_handler_once
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/handling.py", line 380, in invoke_handler
    **kwargs,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/invocation.py", line 117, in invoke
    result = await fn(*args, **kwargs)  # type: ignore
  File "controller.py", line 71, in create_chaos_experiment
    pod_tpl = await create_pod(v1, cm, spec, ns, name_suffix)
  File "controller.py", line 268, in run
    return await loop.run_in_executor(executor, pfunc)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "controller.py", line 474, in create_pod
    pod = api.create_namespaced_pod(body=tpl, namespace=ns)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 6115, in create_namespaced_pod
    (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 6206, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 344, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 178, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 387, in request
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '7ccf57ba-353b-44da-988f-b83e66ee5074', 'Content-Type': 'application/json', 'Date': 'Mon, 23 Mar 2020 13:31:38 GMT', 'Content-Length': '478'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod in version \"v1\" cannot be handled as a Pod: v1.Pod.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Image: ReadString: expects \" or n, but found {, error found in #10 byte of ...|\"image\": {\"name\": \"c|..., bigger context ...|\"containers\": [{\"name\": \"chaostoolkit\", \"image\": {\"name\": \"chaosiq/chaostoolkit\"}, \"imagePullPolicy\"|...","reason":"BadRequest","code":400}
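
The response body indicates that the container `image` field reached the API as an object (`{"name": "chaosiq/chaostoolkit"}`) where a string is expected. A minimal sketch of a normalization step follows, assuming the controller may receive either form. The helper name `image_reference` and the optional `tag` key are assumptions, not part of the documented spec.

```python
from typing import Union


def image_reference(image: Union[str, dict]) -> str:
    """Flatten an image spec into the plain string a container expects."""
    if isinstance(image, str):
        return image
    if isinstance(image, dict):
        name = image.get("name")
        if not name:
            raise ValueError("image spec needs a non-empty 'name'")
        tag = image.get("tag")  # hypothetical optional key
        return f"{name}:{tag}" if tag else name
    raise TypeError(f"unsupported image spec: {type(image).__name__}")


# Both forms yield a string that is valid for container["image"].
from_dict = image_reference({"name": "chaosiq/chaostoolkit"})
from_str = image_reference("chaosiq/chaostoolkit")
```

Applying such a step before the pod template is submitted would keep a nested `image` mapping from being forwarded to the API unchanged.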
