
Kubernetes Security Profiles Operator


This project is the starting point for the Security Profiles Operator (SPO), an out-of-tree Kubernetes enhancement which aims to make it easier for users to use SELinux, seccomp and AppArmor in Kubernetes clusters.

About

The motivation behind the project can be found in the corresponding RFC.

Related Kubernetes Enhancement Proposals (KEPs) which have direct influence on this project:

In addition to those KEPs, here are existing approaches for security profiles in the Kubernetes world:

Features

The SPO's features are implemented for each of the underlying supported technologies, namely seccomp, SELinux and AppArmor. Here's the feature parity status across them:

Feature                            Seccomp   SELinux   AppArmor
Profile CRD                        Yes       Yes       Yes
ProfileBinding                     Yes       No        No
Deploy profiles into nodes         Yes       Yes       Yes
Remove profiles no longer in use   Yes       Yes       Yes
Profile Auto-generation (logs)     Yes       WIP       No
Profile Auto-generation (eBPF)     Yes       No        No
Audit log enrichment               Yes       WIP       Yes

For information about the security model and what permissions each feature requires, refer to SPO's security model.

Personas & User Stories

Like any other piece of software, this operator is meant to help people. The target personas are therefore described in a document in this repo.

The functionality that this operator is meant to enable is captured as user stories. If you feel that a user story is not captured properly, feel free to submit a Pull Request. The team will be more than happy to review it and help you capture the requirement.

Roadmap

The project aims not to overlap with those existing implementations, but to provide valuable additions for a more secure Kubernetes context. We created a mind map to give a better overview of all the features we want to implement to better support certain security areas within Kubernetes:

mind-map

Going forward, the operator will extend its purpose to assist Kubernetes users in creating, distributing and applying security profiles for seccomp, AppArmor, SELinux, PodSecurityPolicies and RBAC permissions.

Community, discussion, contribution, and support

If you're interested in contributing to SPO, please see the developer-focused document.

We hold a monthly meeting on the last Thursday of each month.

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

security-profiles-operator's Issues

Move this project into kubernetes-sigs

The idea is to get sponsorship from SIG Node to develop it in the kubernetes-sigs organization. This would primarily allow us to use Prow and other community resources.

Provide user guidance around tighter RBAC controls

What would you like to be added:

The operator currently has cluster-wide get and list permissions for ConfigMap objects.
Guidance is needed to help users tighten the RBAC configuration without impacting the operator's core functionality.

Why is this needed:

Help users implement more restrictive security controls.
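A minimal sketch of what such guidance could look like, assuming the operator can work with ConfigMap access restricted to its own namespace (resource names here are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spo-configmap-reader
  namespace: security-profiles-operator
rules:
# Read-only access to ConfigMaps, limited to the operator's namespace
# instead of a cluster-wide grant.
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch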

/area security

make all fails

What happened:

I forked the repo, tried to run make, and it failed.

What you expected to happen:

make to succeed

How to reproduce it (as minimally and precisely as possible):

make all
mkdir -p build
go build -ldflags '-s -w -linkmode external -extldflags "-static" -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.buildDate='2021-01-12T16:15:06Z' -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.gitCommit=c7f337312e70b657af8e8c51b9310c47ea74f2a7 -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.gitTreeState=clean -X sigs.k8s.io/security-profiles-operator/internal/pkg/version.version=0.3.0-dev' -tags 'netgo osusergo' -o build/security-profiles-operator ./cmd/security-profiles-operator
# github.com/containers/common/pkg/seccomp
../../../../pkg/mod/github.com/containers/[email protected]/pkg/seccomp/supported.go:38:12: undefined: unix.Prctl
../../../../pkg/mod/github.com/containers/[email protected]/pkg/seccomp/supported.go:38:23: undefined: unix.PR_GET_SECCOMP
../../../../pkg/mod/github.com/containers/[email protected]/pkg/seccomp/supported.go:40:13: undefined: unix.Prctl
../../../../pkg/mod/github.com/containers/[email protected]/pkg/seccomp/supported.go:40:24: undefined: unix.PR_SET_SECCOMP
../../../../pkg/mod/github.com/containers/[email protected]/pkg/seccomp/supported.go:40:45: undefined: unix.SECCOMP_MODE_FILTER
make: *** [build/security-profiles-operator] Error 2

Anything else we need to know?:

A better contributing guide would be helpful. I will open a PR to bootstrap one once I can get past this.

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): osx
  • Kernel (e.g. uname -a):
  • Others:

Provide seccomp profile for the operator

What would you like to be added:

I think a default profile for the operator itself could be a good starting point for a demo. It should be highly restrictive, and we could create some documentation around how to build such profiles as well as provide additional application profiles out of the box.

Why is this needed:

We should aim for the maximum amount of security for the operator deployment; a seccomp profile will help with that.
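As a rough sketch, the operator Deployment could then reference the restrictive profile through the pod securityContext (the profile path is illustrative; the manifests in this document still rely on the runtime/default seccomp annotation shown later):

securityContext:
  seccompProfile:
    type: Localhost
    # Path relative to the kubelet seccomp root, e.g.
    # /var/lib/kubelet/seccomp/operator/security-profiles-operator.json
    localhostProfile: operator/security-profiles-operator.json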

Setup regular community meeting

The idea is to sync-up on new features, ongoing development and contributors on a regular basis. I suggest that we do that once a month. Do you think that this would be sufficient?

Check if a node has seccomp support.

We could verify if a node supports seccomp by:

// Requires: import "golang.org/x/sys/unix"
func IsEnabled() bool {
	// Check if Seccomp is supported, via CONFIG_SECCOMP.
	if err := unix.Prctl(unix.PR_GET_SECCOMP, 0, 0, 0, 0); err != unix.EINVAL {
		// Make sure the kernel has CONFIG_SECCOMP_FILTER.
		if err := unix.Prctl(unix.PR_SET_SECCOMP, unix.SECCOMP_MODE_FILTER, 0, 0, 0); err != unix.EINVAL {
			return true
		}
	}
	return false
}

The question is: what would we do if this function returns false? We could make the reconciliation a no-op or emit a warning event. Right now this would not bring much benefit to the user. How could we prevent a workload with a profile from being scheduled on a node that does not support seccomp?

RBAC Permissions for controllers to be created by kubebuilder using markers

What would you like to be added:

The RBAC permissions for the controllers should be generated automatically using kubebuilder markers.

Example
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.tutorial.kubebuilder.io,resources=cronjobs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch,resources=jobs/status,verbs=get

https://book.kubebuilder.io/reference/markers/rbac.html

Why is this needed:

This avoids manually updating the permissions for Kubernetes objects in the generated YAML.
It also avoids over-provisioning the system with excess permissions.

Here is an example

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app: security-profiles-operator
  name: security-profiles-operator
  namespace: security-profiles-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - services/finalizers
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - configmaps/finalizers
  - secrets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resourceNames:
  - selinux-operator
  resources:
  - deployments/finalizers
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - apps
  resources:
  - replicasets
  - deployments
  verbs:
  - get
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
---

The above role can delete events, which is not required.

Move the e2e to ginkgo

What would you like to be added:

Move the e2e tests to ginkgo

Why is this needed:

I am proposing to utilize Ginkgo for e2e testing instead of the standard go test. Ginkgo is used by kubebuilder (https://github.com/kubernetes-sigs/kubebuilder/blob/master/test/e2e/v2/plugin_cluster_test.go) and comes as a standard package for any new CRD.

Ginkgo is BDD-driven and has features like BeforeEach and AfterEach.
Here is an example from Kubebuilder which shows how it keeps the code clean and simple:

BeforeEach(func() {
	var err error
	kbc, err = utils.NewTestContext(utils.KubebuilderBinName, "GO111MODULE=on")
	Expect(err).NotTo(HaveOccurred())
	Expect(kbc.Prepare()).To(Succeed())

	// Install cert-manager with v1beta2 CRs.
	By("installing cert manager bundle")
	Expect(kbc.InstallCertManager(true)).To(Succeed())

	By("installing prometheus operator")
	Expect(kbc.InstallPrometheusOperManager()).To(Succeed())
})

AfterEach(func() {
	By("clean up created API objects during test process")
	kbc.CleanupManifests(filepath.Join("config", "default"))

	By("uninstalling prometheus manager bundle")
	kbc.UninstallPrometheusOperManager()

	// Uninstall cert-manager with v1beta2 CRs.
	By("uninstalling cert manager bundle")
	kbc.UninstallCertManager(true)

	By("remove container image and work dir")
	kbc.Destroy()
})

Ginkgo can run tests in parallel, which will decrease the test completion time. It also comes with built-in wait helpers like Eventually:

Eventually(verifyCurlUp, 30*time.Second, time.Second).Should(Succeed())

It has a rich API for writing tests.

IMO, Ginkgo tests are easier to read and maintain.

"error"="updating SeccompProfile: updating SeccompProfile

What happened:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: SeccompProfile
metadata:
  name: whitelist
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  targetWorkload: custom-profiles
  architectures:
  - SCMP_ARCH_X86_64
  syscalls:
  - action: SCMP_ACT_LOG
    names:
    - _llseek
    - _newselect
    - accept
    # long list here

kubectl create -f whitelist.yaml

kubectl get seccompprofile -o wide
NAME STATUS AGE LOCALHOSTPROFILE
whitelist Active 19m operator/default/custom-profiles/whitelist.json

kubectl logs security-profiles-operator-2n7wx -n security-profiles-operator
E1213 16:36:52.535201 1 controller.go:246] controller "msg"="Reconciler error" "error"="updating SeccompProfile: updating SeccompProfile: Operation cannot be fulfilled on seccompprofiles.security-profiles-operator.x-k8s.io "whitelist": the object has been modified; please apply your changes to the latest version and try again" "controller"="pods" "name"="busyboxsecc" "namespace"="default" "reconcilerGroup"="" "reconcilerKind"="Pod"

The file is in the correct place: ls /var/lib/kubelet/seccomp/operator/default/custom-profiles/whitelist.json
/var/lib/kubelet/seccomp/operator/default/custom-profiles/whitelist.json

kubectl get pods
NAME READY STATUS RESTARTS AGE
busyboxsecc 0/1 Error 0 27s

What you expected to happen:

The profile to be loaded and the Pod to be running.

How to reproduce it (as minimally and precisely as possible):

This also happened after re-installing the security-profiles-operator.

Anything else we need to know?:

Environment:

  • Cloud provider or hardware configuration: selfmade KVM K8s
  • OS (e.g: cat /etc/os-release): Debian 10
  • Kernel (e.g. uname -a): Linux master1 4.19.0-13-cloud-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
  • Others: cri-o runtime, kube version 1.20.0

Dynamic Seccomp Profiles based on Cluster-level rules to Enforce syscalls

What would you like to be added:

Custom seccomp profiles could be dynamically modified to always allow specific system calls.
The always allowed system calls would be low risk/impact and built into the operator.

Why is this needed:

Some essential system calls (e.g. exit, exit_group) can be left out of seccomp profiles leading to containers being caught up on endless exit loops, as reported here. This is a non-deterministic behaviour, meaning that the same profile and container combination may have the issue in one kernel/platform, and not have it in another.
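A minimal sketch of how such a built-in baseline could be merged into every rendered profile; the field layout follows the SeccompProfile spec shown elsewhere in this document, and the syscall selection is an illustrative assumption:

syscalls:
# Always-allowed, low-risk syscalls injected by the operator so that
# containers can at least terminate cleanly.
- action: SCMP_ACT_ALLOW
  names:
  - exit
  - exit_group
  - rt_sigreturn
  - sigreturn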

Define behaviour for `CRD` and `ConfigMap` containing the same profile

The operator is gaining full support for representing seccomp profiles as a CRD.

When used in conjunction, having the same profile name in both CRD and ConfigMap forms may lead to non-deterministic results: whichever resource is created last would overwrite the former. Here are a few ideas on how we could tackle it:

  1. Only allow a single method to represent profiles per cluster. When CRD is enabled, ConfigMap is disabled and vice-versa.

  2. Create slightly different path for each (i.e. cm/myprofile.json and crd/myprofile.json).

selinux: Use events in the selinux controller

What would you like to be added:

Decorate the SELinux policy controller with events.

Why is this needed:

It was suggested in PR#214 that the selinux controller should use events so that admins are aware of what has happened.

Cut initial release

This is the placeholder issue for cutting the first release. I think we should wait for kubernetes/test-infra#18667 and everything in milestone v1.

Cutting the release would imply a PR changing the version in the operator.yaml manifest. This commit could be tagged afterwards as the actual version. The built image (after finishing the postsubmit job) could be promoted to the actual tag.

I see inferring the version from the git tag as an issue, because it breaks the workflow described above. We probably have to stick to hard-coded versions.

Use k3d instead of kind

What would you like to be added:

Evaluate utilizing k3d instead of kind cluster for e2e test

Why is this needed:

k3s is a lightweight Kubernetes distribution and k3d is the kind equivalent for it. We should investigate and measure whether this can speed up our development/testing times and what the drawbacks of using k3d vs. kind are.

Decide whether and how to add default syscalls to the SeccompProfile CRD

Part of the idea of #117 was to ensure that a standard set of default syscalls is present in a profile so that a minimum viable pod could be launched in Kubernetes on e.g. runc. This issue is to evaluate whether this is worthwhile and discuss how to implement it.

The first consideration is that it's possible that the syscalls used by runc could change over time, and in fact they haven't been fully documented (opencontainers/runc#2097) so we would need to maintain our own list. Moreover, a container runtime other than runc might be used, and container managers on top of runc like docker/cri-o might also call other syscalls. Also, the list might be different depending on the architecture.

Another consideration is the pause container, which seems to use an additional set of syscalls on top of what is required for runc. Should those syscalls be included in the defaults?

Finally, the actions to take for a syscall aren't a blanket "allow" or "deny", there is also e.g. "log", "trace", "kill", etc. If a user creates a complex profile with different actions for different syscalls, should the default syscalls use the "allow" action or the "log" action? What happens if the user adds one of the default syscalls to their "log" rule, but the controller is automatically adding it to the "allow" rule?

One idea could be to simply maintain an example SeccompProfile that contains the minimum required syscalls instead of building it into the CRD/controller. Thoughts?
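A sketch of what such a maintained example could look like, assuming it lives in documentation rather than in the controller; the syscall list is illustrative and not an authoritative runc baseline:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: SeccompProfile
metadata:
  name: runtime-minimum
  namespace: security-profiles-operator
spec:
  defaultAction: SCMP_ACT_ERRNO
  targetWorkload: examples
  architectures:
  - SCMP_ARCH_X86_64
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
    - execve
    - exit_group
    - futex
    - read
    - write
    # ... the full runc/pause baseline would be maintained here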

Create AppArmor profile for the operator

What would you like to be added:

A tailor-made AppArmor profile to restrict functionality to the minimum required for the operator to run.

Why is this needed:

Better define the attack surface, giving users a clear view of what the operator can/cannot do.

/area security
/priority important-longterm

Clean up profiles when ConfigMap deleted

What would you like to be added:

Currently profiles are not cleaned up on a node when their ConfigMap is deleted. The lifecycle of the profile on a node should match that of the ConfigMap that created it.

Why is this needed:

In most cases it is actually fine for the profile to be orphaned. In fact, cleaning up a profile after a workload has started using it can be a problem if that workload restarts and the profile is no longer there. However, there are a few cases in which orphaning profiles can be a problem:

  • A user creates a profile that is overly permissive or has a typo and does not want it to be able to be applied to containers anymore.
  • A user creates a new profile with a similar name to an old one that no longer exists as a ConfigMap, but still exists on the node. They then create a workload and accidentally apply the old profile. The operator would not block this because it would be using a valid profile, but it would be unexpected and potentially difficult to troubleshoot.
  • A user cannot get an accurate global list of seccomp profiles without interacting with each node.

Define groupName for new SeccompProfile CRD

The groupName for the new SeccompProfile CRD is based on the current name of the operator, which may soon change. It may also make sense to append .k8s.io or .kubernetes.io to the name since it is part of an official Kubernetes SIG, but this requires approval. Once a decision is reached about the new name for the operator, we should revisit the groupName of the CRD.

Fix failing image promotion

Create distinct cluster-bound and namespace-bound resources

Context

Currently, the resources (SeccompProfile and SelinuxPolicy) are scoped according to how the operator is deployed. For this, we have two distinct kustomize overlays called cluster and namespaced, which set the RBAC and Deployment mode, amongst other things.

What would you like to be added:

The proposal is to have two types per resource:

  • A "cluster" scoped resource which would be accessible from all namespaces
  • A "namespaced" resource which would only be accessible from the namespace it was created.

e.g.

We would have:

  • ClusterSeccompProfile and ClusterSelinuxPolicy as cluster-scoped resources
  • SeccompProfile and SelinuxPolicy as namespace-scoped resources

This is similar to how other kube resources are scoped. e.g. ClusterRole and Role.

Why is this needed:

Less maintenance

This would allow us to have a more unified deployment model (no need for two overlays) where we could start auto-generating our RBAC rules with kubebuilder (less maintenance burden).

Unifies use-cases

This would also enable the use-cases of both overlays in one operator, as opposed to having to deploy the two overlays.

Shared policies

Finally, it would also allow deployers to create cluster-wide policies for common scenarios. e.g. A cluster-wide SeccompProfile that's usable for nginx-based web-services. Validation of permissions can be done via a webhook.
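A sketch of how the two scopes could look side by side; the ClusterSeccompProfile kind is part of the proposal and does not exist yet:

# Cluster-scoped: usable from every namespace (proposed kind).
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ClusterSeccompProfile
metadata:
  name: nginx-base
spec:
  defaultAction: SCMP_ACT_ERRNO
---
# Namespace-scoped: only usable inside its own namespace.
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: SeccompProfile
metadata:
  name: payment-service
  namespace: payments
spec:
  defaultAction: SCMP_ACT_ERRNO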

Bind CRD profiles to workloads

What would you like to be added:

The goal is to find a way to make applying profiles easier than specifying the localhost path in the security context.

Why is this needed:

Users may not know where the profile is actually saved on disk. We could implement a new resource like a ProfileBinding, which uses a selector to mutate workloads and set the right path to the profile in their annotation/securityContext.

It is questionable how to handle deletions of those bindings. The annotations/fields for seccomp and AppArmor behave differently with respect to their mutability.
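As a rough sketch of the idea, a ProfileBinding could look like the following; the kind, fields and selector semantics are illustrative, not an existing API:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileBinding
metadata:
  name: bind-nginx-profile
  namespace: default
spec:
  # Profile to resolve to a localhost path on the node.
  profileRef:
    kind: SeccompProfile
    name: nginx-profile
  # Workloads to mutate via a webhook.
  selector:
    matchLabels:
      app: nginx

A mutating webhook would then rewrite matching pods to point their seccomp annotation or securityContext at the resolved localhost path.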

selinux: expand the selinuxpolicy status subresource to include error messages

What would you like to be added:

  • expand the status subresource of the selinuxpolicy object with an error message
  • populate it on errors
  • add an e2e test that tries loading a broken policy

Why is this needed:

The SELinux controller already reads errors from selinuxd, but only displays them in the logs. It would be more user-friendly if the error were also displayed in the status subresource.
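An illustrative sketch of how the expanded status could surface the selinuxd error; the field names are assumptions:

status:
  state: Error
  # Propagated from selinuxd instead of only appearing in the logs.
  errorMessage: "failed to install policy: syntax error on line 3"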

Use kustomize for base deployment and derive namespaced one

The idea is to combine both deployment variants to increase their maintainability.

We could create one kustomize base and change the namespaced deployment to be an overlay of the base. We could also generate those two deployments statically for the end users, so that they do not have to rely on the kustomize binary itself (like now, but with increased maintainability). The verify CI job can then check if the deployments are in sync.
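A sketch of what the namespaced overlay could look like on top of a shared base, assuming a deploy/base and deploy/overlays layout (file names are illustrative):

# deploy/overlays/namespaced/kustomization.yaml
namespace: security-profiles-operator
resources:
- ../../base
patchesStrategicMerge:
# Narrow the RBAC rules and Deployment to a single namespace.
- namespaced-rbac.yaml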

Alternatively, we could also create a helm chart. WDYT?

Fix post submit jobs are not running

What happened:

Post submit jobs for cri-tools do not seem to be running:
https://testgrid.k8s.io/sig-node-cri-tools
https://prow.k8s.io/?job=post-cri-tools-push-image-user-multi-arch

What you expected to happen:

The jobs to run when there was a change to https://github.com/kubernetes-sigs/cri-tools/tree/master/images/image-user as per PR kubernetes-sigs/cri-tools#698

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Post submit job added in kubernetes/test-infra#20092

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:

Separate ServiceAccount/RBAC permissions for DaemonSet and Deployment

Context

With the recent change of making the DaemonSet be deployed by a Deployment workload (#180), the operator is left using the same ServiceAccount with the same permissions for both workloads.

What would you like to be added:

We should move the workloads to have individual service accounts so they can have their own RBAC permissions.

Why is this needed:

If one of these pieces were compromised, this would reduce the attack surface.

Unable to run e2e tests on macos

What happened:

Unable to e2e test on osx

What you expected to happen:

Able to run e2e tests on osx

How to reproduce it (as minimally and precisely as possible):

make test-e2e
go test -timeout 40m -count=1 ./test/... -v
=== RUN   TestSuite
cluster-type: kind
I0114 10:40:49.347318   83140 suite_test.go:288]  "msg"="Setting up suite"
=== RUN   TestSuite/TestSecurityProfilesOperator
I0114 10:40:49.347847   83140 suite_test.go:288]  "msg"="Deploying the cluster"
    suite_test.go:133:
        	Error Trace:	suite_test.go:133
        	            				suite.go:148
        	Error:      	Expected nil, but got: &os.PathError{Op:"fork/exec", Path:"/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/build/kind", Err:0x8}
        	Test:       	TestSuite/TestSecurityProfilesOperator
I0114 10:40:49.350537   83140 suite_test.go:288]  "msg"="Waiting for cluster to be ready"
time="2021-01-14T10:40:49-05:00" level=info msg="+ /usr/local/bin/kubectl wait --for condition=ready nodes --all"
Unable to connect to the server: dial tcp 192.168.99.106:8443: i/o timeout
    suite_test.go:246:
        	Error Trace:	suite_test.go:246
        	            				suite_test.go:274
        	            				suite_test.go:137
        	            				suite.go:148
        	Error:      	Expected nil, but got: command /usr/local/bin/kubectl wait --for condition=ready nodes --all did not succeed: Unable to connect to the server: dial tcp 192.168.99.106:8443: i/o timeout
        	Test:       	TestSuite/TestSecurityProfilesOperator
I0114 10:43:19.447856   83140 suite_test.go:288]  "msg"="Destroying cluster"
time="2021-01-14T10:43:19-05:00" level=info msg="+ /Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/build/kind delete cluster --name=spo-e2e-1610638849 -v=3"
    suite_test.go:246:
        	Error Trace:	suite_test.go:246
        	            				suite_test.go:151
        	            				suite.go:141
        	            				panic.go:969
        	            				panic.go:212
        	            				signal_unix.go:742
        	            				command.go:326
        	            				suite_test.go:247
        	            				suite_test.go:274
        	            				suite_test.go:137
        	            				suite.go:148
        	Error:      	Expected nil, but got: &os.PathError{Op:"fork/exec", Path:"/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/build/kind", Err:0x8}
        	Test:       	TestSuite/TestSecurityProfilesOperator
    suite.go:63: test panicked: runtime error: invalid memory address or nil pointer dereference
        goroutine 16 [running]:
        runtime/debug.Stack(0xc00014b500, 0x1facf40, 0x2b34520)
        	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/debug/stack.go:24 +0x9f
        github.com/stretchr/testify/suite.failOnPanic(0xc000102f00)
        	/Users/naveen/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:63 +0x57
        panic(0x1facf40, 0x2b34520)
        	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:969 +0x1b9
        k8s.io/release/pkg/command.(*Stream).OutputTrimNL(...)
        	/Users/naveen/go/pkg/mod/k8s.io/[email protected]/pkg/command/command.go:326
        sigs.k8s.io/security-profiles-operator/test_test.(*e2e).run(0xc00012a100, 0xc0004d6a20, 0x56, 0xc00014b780, 0x4, 0x4, 0x19, 0x10e9651)
        	/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/test/suite_test.go:247 +0xc1
        sigs.k8s.io/security-profiles-operator/test_test.(*kinde2e).TearDownTest(0xc00012a100)
        	/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/test/suite_test.go:151 +0x14d
        github.com/stretchr/testify/suite.Run.func1.1(0xc000114030, 0xc000102f00, 0x1ef6e90, 0x1c, 0x0, 0x0, 0x22e0ee0, 0xc000481920, 0xc000481920, 0xc000114b28, ...)
        	/Users/naveen/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:141 +0x103
        panic(0x1facf40, 0x2b34520)
        	/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:969 +0x1b9
        k8s.io/release/pkg/command.(*Stream).OutputTrimNL(...)
        	/Users/naveen/go/pkg/mod/k8s.io/[email protected]/pkg/command/command.go:326
        sigs.k8s.io/security-profiles-operator/test_test.(*e2e).run(0xc00012a100, 0xc000128460, 0x16, 0xc00014bc48, 0x5, 0x5, 0x0, 0x2111260)
        	/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/test/suite_test.go:247 +0xc1
        sigs.k8s.io/security-profiles-operator/test_test.(*e2e).kubectl(...)
        	/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/test/suite_test.go:274
        sigs.k8s.io/security-profiles-operator/test_test.(*kinde2e).SetupTest(0xc00012a100)
        	/Users/naveen/go/src/github.com/naveensrinivasan/security-profiles-operator/test/suite_test.go:137 +0x41c
        github.com/stretchr/testify/suite.Run.func1(0xc000102f00)
        	/Users/naveen/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:148 +0x658
        testing.tRunner(0xc000102f00, 0xc0000de090)
        	/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1123 +0xef
        created by testing.(*T).Run
        	/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1168 +0x2b3
--- FAIL: TestSuite (150.10s)
    --- FAIL: TestSuite/TestSecurityProfilesOperator (150.10s)
FAIL
FAIL	sigs.k8s.io/security-profiles-operator/test	150.436s
FAIL
make: *** [test-e2e] Error 1

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): darwin
  • Kernel (e.g. uname -a):
  • Others:

Main operator Deployment is missing its seccomp profile

What happened:

With the operator splitting part of its functionality from a DaemonSet to a Deployment, it lost its seccomp profile and the init code that set it up, which now only exists for the DaemonSet:

b44de6f#diff-ee1324908347cf8841925458a9937fb58cd6593d4002af28b74a0071bb12dbf4L16

What you expected to happen:

The Deployment should still run with a seccomp profile, at least if the administrator chooses to run it with seccomp.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:

e2e tests are broken due to failing cert-manager certificate creation

The e2e tests are currently not able to deploy the operator, because cert-manager seems to fail to create the certificate in kind.

The operator deployment gets stuck in ContainerCreating state:

> k describe pod security-profiles-operator-844b669589-fxvvm
Name:           security-profiles-operator-844b669589-fxvvm
Namespace:      security-profiles-operator
Priority:       0
Node:           spo-e2e-1611320144-control-plane/172.18.0.2
Start Time:     Fri, 22 Jan 2021 14:06:49 +0100
Labels:         app=security-profiles-operator
                name=security-profiles-operator
                pod-template-hash=844b669589
Annotations:    seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/security-profiles-operator-844b669589
Containers:
  security-profiles-operator:
    Container ID:
    Image:         security-profiles-operator:latest
    Image ID:
    Port:          9443/TCP
    Host Port:     0/TCP
    Args:
      manager
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      RELATED_IMAGE_OPERATOR:          security-profiles-operator:latest
      RELATED_IMAGE_NON_ROOT_ENABLER:  bash:5.0
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from security-profiles-operator-token-p4wvt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-server-cert
    Optional:    false
  security-profiles-operator-token-p4wvt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  security-profiles-operator-token-p4wvt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                 node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    17s               default-scheduler  Successfully assigned security-profiles-operator/security-profiles-operator-844b669589-fxvvm to spo-e2e-1611320144-control-plane
  Warning  FailedMount  1s (x6 over 16s)  kubelet            MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found

The cert-manager-cainjector pod reported the following error:

E0122 13:03:23.740344       1 sources.go:114] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="unable to fetch associated certificate" "error"="Certificate.cert-manager.io \"webhook-cert\" not found" "certificate"={"Namespace":"security-profiles-operator","Name":"webhook-cert"} "resource_kind"="MutatingWebhookConfiguration" "resource_name"="spo-mutating-webhook-configuration" "resource_namespace"="" "resource_version"="v1beta1"
I0122 13:03:23.740375       1 controller.go:167] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="could not find any ca data in data source for target" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="spo-mutating-webhook-configuration" "resource_namespace"="" "resource_version"="v1beta1"

We may have to add additional configuration to make cert-manager work in kind.

cc @cmurphy

selinux: selinuxd container hardening

Not sure if this is an RFE or a bug, feel free to change the issue.

What would you like to be added:

  • resource requests and limits for the selinuxd container
  • the selinuxd container should use read-only rootfs

Why is this needed:

To harden the selinuxd container. Note that the read-only root filesystem depends on mounting the database selinuxd uses in a writable location; the database location is not configurable at the moment, so this also depends on a selinuxd RFE.

Record seccomp profiles

The idea is to allow the recording of seccomp profiles directly inside the Kubernetes cluster. To achieve that we could utilize the oci-seccomp-bpf-hook, which has the following requirements:

  • Kernel headers have to be installed (could be handled by the operator or the sysadmin)
  • An OCI hook compatible container runtime

We could think about annotating a node, then installing the kernel headers (may be a prerequisite) and the hook. After that, the operator would have to annotate the target workload correctly (that is how the hook knows what to do). The recorded profile would then have to be written into a ConfigMap, which would cause all nodes to sync the profile to disk.

WDYT? It's overall a bit complicated but seems like a valuable feature after the initial release.
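A rough sketch of the workload annotation step, assuming the container runtime forwards the annotation to the hook; the annotation key and value format follow the oci-seccomp-bpf-hook documentation, and the pod and output path are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: record-me
  annotations:
    # Tells the hook to trace syscalls and write the resulting profile
    # to the given output file on the node.
    io.containers.trace-syscall: "of:/tmp/record-me-profile.json"
spec:
  containers:
  - name: app
    image: nginx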


Edit: We could also use the seccomp notifier feature to think about a more elegant way to record profiles. This needs more evaluation but might be achievable by creating a PID namespace-shared sidecar.

Implement minimal valuable implementation

Summary

Create an operator which synchronizes seccomp profiles from config maps to file system locations.
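A sketch of the ConfigMap-based profile format this first iteration could consume; the naming convention and on-disk layout are assumptions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-workload-profile
  namespace: security-profiles-operator
data:
  # The DaemonSet would write this entry to the node, e.g. under
  # /var/lib/kubelet/seccomp/operator/<namespace>/<configmap>/my-workload.json.
  my-workload.json: |
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "syscalls": [
        { "action": "SCMP_ACT_ALLOW", "names": ["read", "write", "exit_group"] }
      ]
    }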

Use cases and possible issues

  • A user creates a new profile and the operator syncs it to disk on each node
    • How to map local paths to the profiles?
  • A user deletes a profile
    • TBD
  • A user updates a profile
    • Probably create a new profile and do not delete the old one
  • A user wants to apply a profile to a workload
    • The operator takes care of adding the right annotation to the workload
    • We have to define what the user has to do to apply the profile (apply a label, annotation, …)

Implementation details

  • Operator DaemonSet
  • Watch ConfigMaps only on operator namespace
  • Security best practices
    • Non-root operator!
    • Non-root agents?
    • RBAC permissions for the ServiceAccount
  • Testing
    • E2E - prow (blocked - sig repo)
  • Documentation
    • Security concerns?
  • Easy deployment method
    • Single plain file
  • Events
    • Persisted files
    • Event for failures

Validate profiles before writing them to disk

We could add a pre-validation step to parse the seccomp profiles before writing them to disk. This way we can ensure that only valid profiles are written. I would report errors via events as well as directly at the reconciliation level. WDYT? Would this be suitable for v1?

/area security

Refers to seccomp/containers-golang#34

SELinux profile support

Some time ago, we noticed a gap in how folks install SELinux modules for their workloads, and came up with an operator to do just that: https://github.com/JAORMX/selinux-operator . The intent of that project is to be able to create SELinux policies as Custom Resources, and to install them and put them to use on a per-namespace basis. It comes with a controller that reads policies and schedules workloads to install them, as well as a webhook that validates that folks aren't able to use policies that are not installed in their namespaces. So, to some extent, it has a very similar model to the security-profiles-operator.

If you'd like a bit more of an explanation and to see it in action, we gave a talk about it in DevConf this year: https://www.youtube.com/watch?v=iMO6rwA-i_s

What would you like to be added:

Let's merge the projects!

This would come in the form of adding the SELinux policy types, controllers, and webhook from the selinux-operator to the security-profiles-operator instead.

Why is this needed:

It would add SELinux support.

selinux: the selinuxpolicy controller should not trigger needless writes to the policy file

What would you like to be added:

Reschedule the selinux controller reconcile loop until selinuxd's /ready endpoint returns true, indicating that selinuxd is ready to accept requests.

Why is this needed:

There was a race in selinuxd that caused the daemon to only start watching for selinuxpolicy files after startup, which is quite expensive and takes a long time. This was uncovered when I tried to remove the code in the reconciler that always writes the policy files on every reconcile loop. If we want to write only once, in order not to trigger more inotify watches for selinuxd and not make the policy state flap, we need to know when selinuxd is ready to accept requests, which was implemented in JAORMX/selinuxd#25.

Constrain resource usage

What would you like to be added:

The operator should have reasonable requests/limits for CPU, memory and storage defined.

Why is this needed:

This provides safe defaults while decreasing the likelihood of the operator being used to starve a cluster of its resources.
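A sketch of what such defaults could look like in the operator's manifests; the values are illustrative and would need to be backed by measurements:

resources:
  requests:
    cpu: 100m
    memory: 64Mi
    ephemeral-storage: 50Mi
  limits:
    cpu: 300m
    memory: 128Mi
    ephemeral-storage: 200Mi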

/priority important-soon
/assign

Custom Resource Definition (CRD) for profiles

The idea is to create a CRD on top of profiles, which helps to:

  • focus on a better user experience
  • provide a useful abstraction, like the minimum set of syscalls needed for container runtimes such as runc
  • make it easier to create profiles at a higher level

pull-security-profiles-operator-verify failing intermittently

What happened:

Now that we have multiple CRDs, make verify-deployments fails intermittently, which breaks pull-security-profiles-operator-verify.

What you expected to happen:

On a clean git branch, make verify-deployments should yield consistent results.

How to reproduce it (as minimally and precisely as possible):

git fetch origin pull/176/head:test2
git checkout test2
make verify-deployments
git add .
git commit -m "test"
make verify-deployments
make verify-deployments

Anything else we need to know?:

Upon initial investigation the culprit seems to be:

go run -tags generate sigs.k8s.io/controller-tools/cmd/controller-gen "crd:crdVersions=v1" paths="./api/..." output:crd:stdout > deploy/base/crd.yaml

It yields results in a different order every time.

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): Linux ... 5.4.0-58-generic #64-Ubuntu
  • Others:

Dynamic Seccomp Profiles based on Cluster-level rules to Block syscalls

What would you like to be added:

Custom seccomp profiles could be dynamically modified to always block specific system calls.
This would ensure that profiles could not allow their execution (considering both allow and deny list profiles).

Why is this needed:

A good example of the need for this is when it is insecure to use a given system call. For example, before kernel 4.8, ptrace could be used to escape seccomp enforcement. The rule to block such a case could look like this:

{
  "syscalls": [
      "ptrace"
  ],
  "outcome": "BLOCK",
  "criteria": {
     "minKernel": "4.8"
  }
}

This would automatically protect users by removing ptrace from allow-list-based seccomp profiles with the SCMP_ACT_LOG or SCMP_ACT_ALLOW actions, and by adding ptrace to deny-list-based seccomp profiles.

If this functionality becomes customisable by users, the list of "blockable" syscalls would need to be built into the operator to ensure that essential system calls could not be blocked.
