
controller-rs's Introduction

kube-rs

Crates.io · Rust 1.75 · Tested against Kubernetes v1.25 and above · Best Practices · Discord chat

A Rust client for Kubernetes in the style of a more generic client-go, a runtime abstraction inspired by controller-runtime, and a derive macro for CRDs inspired by kubebuilder. Hosted by CNCF as a Sandbox Project.

These crates build upon Kubernetes apimachinery + api concepts to enable generic abstractions. These abstractions allow Rust reinterpretations of reflectors, controllers, and custom resource interfaces, so that you can write applications easily.

Installation

Select a version of kube along with the generated k8s-openapi structs at your chosen Kubernetes version:

[dependencies]
kube = { version = "0.91.0", features = ["runtime", "derive"] }
k8s-openapi = { version = "0.22.0", features = ["latest"] }

See features for a quick overview of default-enabled / opt-in functionality.

Upgrading

See kube.rs/upgrading. Noteworthy changes are highlighted in releases, and archived in the changelog.

Usage

See the examples directory for how to use any of these crates.

Official examples live in the examples directory; for real-world projects see ADOPTERS.

Api

The Api is what interacts with Kubernetes resources, and is generic over Resource:

use k8s_openapi::api::core::v1::Pod;
use kube::{api::{DeleteParams, Patch, PatchParams}, Api, Client};
use serde_json::json;

let client = Client::try_default().await?;
let pods: Api<Pod> = Api::default_namespaced(client);

let pod = pods.get("blog").await?;
println!("Got pod: {pod:?}");

let patch = json!({"spec": {
    "activeDeadlineSeconds": 5
}});
let pp = PatchParams::apply("kube");
let patched = pods.patch("blog", &pp, &Patch::Apply(patch)).await?;
assert_eq!(patched.spec.unwrap().active_deadline_seconds, Some(5));

pods.delete("blog", &DeleteParams::default()).await?;

See the examples ending in _api for more detail.

Custom Resource Definitions

Working with custom resources uses automatic code-generation via proc_macros in kube-derive.

You need to add #[derive(CustomResource, JsonSchema)] and some #[kube(attrs..)] on a spec struct:

use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(CustomResource, Debug, Serialize, Deserialize, Default, Clone, JsonSchema)]
#[kube(group = "kube.rs", version = "v1", kind = "Document", namespaced)]
pub struct DocumentSpec {
    title: String,
    content: String,
}

Then you can use the generated wrapper struct Document as a kube::Resource:

use kube::{Api, CustomResourceExt};

let docs: Api<Document> = Api::default_namespaced(client);
let d = Document::new("guide", DocumentSpec::default());
println!("doc: {:?}", d);
println!("crd: {:?}", serde_yaml::to_string(&Document::crd()));

There are many kubebuilder-like attributes you can annotate with here. See the documentation or the crd_-prefixed examples for more.

NB: #[derive(CustomResource)] requires the derive feature enabled on kube.
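As a quick illustration of using the generated type against a cluster, here is a minimal sketch (assuming a reachable cluster, sufficient RBAC, and an async context where ? can propagate errors):

use k8s_openapi::apiextensions_apiserver::pkg::apis::apiextensions::v1::CustomResourceDefinition;
use kube::{api::PostParams, Api, Client, CustomResourceExt};

let client = Client::try_default().await?;

// install the CRD itself (cluster-scoped, usually a one-off admin step);
// in practice, wait for it to become established before creating instances
let crds: Api<CustomResourceDefinition> = Api::all(client.clone());
crds.create(&PostParams::default(), &Document::crd()).await?;

// then create an instance of it in the current namespace
let docs: Api<Document> = Api::default_namespaced(client);
docs.create(&PostParams::default(), &Document::new("guide", DocumentSpec::default())).await?;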

Runtime

The runtime module exports the kube_runtime crate and contains higher level abstractions on top of the Api and Resource types so that you don't have to do all the watch/resourceVersion/storage book-keeping yourself.

Watchers

A low level streaming interface (similar to informers) that presents Applied, Deleted or Restarted events.

use kube::runtime::{watcher, watcher::Config, WatchStreamExt};

let api = Api::<Pod>::default_namespaced(client);
let stream = watcher(api, Config::default()).applied_objects();

This gives a continual stream of events, without you having to care about the watch restarting or connections dropping.

use futures::TryStreamExt;
use kube::ResourceExt;
futures::pin_mut!(stream); // pin the stream before polling it
while let Some(event) = stream.try_next().await? {
    println!("Applied: {}", event.name_any());
}

NB: the plain items in a watcher stream are different from WatchEvent. If you are following along to "see what changed", you should flatten it with one of the utilities from WatchStreamExt, such as applied_objects.
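For completeness, consuming the unflattened stream might look like the following. This is a minimal sketch against the Applied/Deleted/Restarted variants described above (kube 0.91); later kube versions rename these variants:

use futures::{pin_mut, TryStreamExt};
use kube::{runtime::watcher, runtime::watcher::Event, Api, ResourceExt};

let api = Api::<Pod>::default_namespaced(client);
let stream = watcher(api, watcher::Config::default());
pin_mut!(stream);
while let Some(event) = stream.try_next().await? {
    match event {
        Event::Applied(pod) => println!("applied {}", pod.name_any()),
        Event::Deleted(pod) => println!("deleted {}", pod.name_any()),
        Event::Restarted(pods) => println!("watch restarted with {} objects", pods.len()),
    }
}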

Reflectors

A reflector is a watcher with a Store for K. It acts on all the Event<K> exposed by watcher to keep the state in the Store as accurate as possible.

use k8s_openapi::api::core::v1::Node;
use kube::runtime::{reflector, watcher, watcher::Config};

let nodes: Api<Node> = Api::all(client);
let lp = Config::default().labels("kubernetes.io/arch=amd64");
let (reader, writer) = reflector::store();
let rf = reflector(writer, watcher(nodes, lp));

At this point you can listen to the reflector as if it were a watcher, but you can also query the reader at any point.
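For example, a minimal sketch of driving the reflector and reading the store (assuming a tokio runtime; wait_until_ready is available on the store in recent kube-runtime versions):

use futures::StreamExt;
use kube::{runtime::WatchStreamExt, ResourceExt};

// the reflector stream must be polled for the store to stay up to date,
// so drain it in a background task here
tokio::spawn(async move {
    rf.applied_objects().for_each(|_| futures::future::ready(())).await;
});

// wait for the initial LIST to complete, then query the cache at will
reader.wait_until_ready().await.expect("reflector writer dropped");
for node in reader.state() {
    println!("cached node: {}", node.name_any());
}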

Controllers

A Controller is a reflector combined with an arbitrary number of watchers; it schedules events internally and feeds them through a reconciler:

Controller::new(root_kind_api, Config::default())
    .owns(child_kind_api, Config::default())
    .run(reconcile, error_policy, context)
    .for_each(|res| async move {
        match res {
            Ok(o) => info!("reconciled {:?}", o),
            Err(e) => warn!("reconcile failed: {}", Report::from(e)),
        }
    })
    .await;

Here reconcile and error_policy refer to functions you define. The first will be called when the root or child elements change, and the second when the reconciler returns an Err.

See the controller guide for how to write these.
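As a rough illustration (not the repo's actual implementation), the signatures could look like this, assuming the Document type from earlier and a small user-defined Context struct passed as context:

use std::{sync::Arc, time::Duration};
use kube::runtime::controller::Action;

// hypothetical shared state injected via `context` above
struct Context {
    client: kube::Client,
}

async fn reconcile(doc: Arc<Document>, _ctx: Arc<Context>) -> Result<Action, kube::Error> {
    // inspect `doc` and converge cluster state towards its spec here,
    // then ask to be requeued for a periodic re-check
    Ok(Action::requeue(Duration::from_secs(300)))
}

fn error_policy(_doc: Arc<Document>, _err: &kube::Error, _ctx: Arc<Context>) -> Action {
    // back off and retry after a minute on errors
    Action::requeue(Duration::from_secs(60))
}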

TLS

By default rustls is used for TLS, but openssl is supported. To switch, turn off default-features, and enable the openssl-tls feature:

[dependencies]
kube = { version = "0.91.0", default-features = false, features = ["client", "openssl-tls"] }
k8s-openapi = { version = "0.22.0", features = ["latest"] }

This will pull in openssl and hyper-openssl. If default-features is left enabled, you will pull in two TLS stacks, and the default will remain as rustls.

musl-libc

Kube will work with distroless, scratch, and alpine (it's also possible to use alpine as a builder with some caveats).

License

Apache 2.0 licensed. See LICENSE for details.

controller-rs's People

Contributors

clux, ianstanton, kazk, sjmiller609, yangsoon


controller-rs's Issues

unable to upgrade to 0.72.0 due to quote version selection

error: failed to select a version for `quote`.
    ... required by package `darling_core v0.14.0`
    ... which satisfies dependency `darling_core = "=0.14.0"` of package `darling v0.14.0`
    ... which satisfies dependency `darling = "^0.14.0"` of package `kube-derive v0.71.0 (https://github.com/kube-rs/kube-rs.git?rev=7715cabd4d1976493e6b8949471f283df927a79e#7715cabd)`
    ... which satisfies git dependency `kube-derive` of package `kube v0.71.0 (https://github.com/kube-rs/kube-rs.git?rev=7715cabd4d1976493e6b8949471f283df927a79e#7715cabd)`
    ... which satisfies git dependency `kube` of package `controller v0.11.6 (/home/clux/kube/controller-rs)`
versions that meet the requirements `^1.0.18` are: 1.0.18

all possible versions conflict with previously selected packages.

  previously selected package `quote v1.0.15`
    ... which satisfies dependency `quote = "^1.0.3"` (locked to 1.0.15) of package `actix-macros v0.2.3`
    ... which satisfies dependency `actix-macros = "^0.2.3"` (locked to 0.2.3) of package `actix-web v4.0.1`
    ... which satisfies dependency `actix-web = "^4.0.1"` (locked to 4.0.1) of package `controller v0.11.6 (/home/clux/kube/controller-rs)`

Cannot currently install 0.72 because of darling 0.14 pulling in quote 1.0.18
https://github.com/TedDriggs/darling/blob/master/core/Cargo.toml

but actix-macros also pulls in quote at 1.0.3
https://github.com/actix/actix-net/blob/master/actix-macros/Cargo.toml#L19

I didn't think this would normally cause a problem, but there's the line:

... which satisfies dependency `quote = "^1.0.3"` (locked to 1.0.15) of package `actix-macros v0.2.3`

which indicates there's a cap at 1.0.15 somehow.

Status update/patch

Hello Eirik @clux,

Nice to meet you virtually. Thank you so much for this awesome 🚀 codebase! I started playing with it, evaluating it as an alternative to the canonical Go knative source controller implementation.

Issue description

While playing with controller-rs at the latest main revision (1d97d57...), I've noticed the Foo resource's status is updated only on the first resource creation. For all subsequent Foo instance creations, the status is not updated. I tried to debug this with additional traces; it appears that status_patch.await does not return the second time, but I could be totally wrong.

Reproduction steps

  1. Install controller
➜  controller-rs git:(main) ✗ k create -f yaml/foo-crd.yaml
customresourcedefinition.apiextensions.k8s.io/foos.clux.dev created
➜  controller-rs git:(main) ✗ k create -f yaml/deployment.yaml
serviceaccount/foo-controller created
clusterrole.rbac.authorization.k8s.io/control-foos created
clusterrolebinding.rbac.authorization.k8s.io/foo-controller-binding created
service/foo-controller created
deployment.apps/foo-controller created
  2. Create the first Foo instance and check if the status is updated
➜  controller-rs git:(main) ✗ k create -f yaml/instance-bad.yaml
foo.clux.dev/bad created
➜  controller-rs git:(main) ✗ k get foo bad -o yaml | grep is_bad
  is_bad: true
  3. Create a second Foo instance and check if the status is updated
➜  controller-rs git:(main) ✗ k create -f yaml/instance-good.yaml
foo.clux.dev/good created
➜  controller-rs git:(main) ✗ k get foo good -o yaml | grep is_bad
➜  controller-rs git:(main) ✗

Alternative Reproduction steps

The same symptoms are observed for the following scenario:

  1. Create the first Foo instance and check if the status is updated
➜  controller-rs git:(main) ✗ k create -f yaml/instance-good.yaml
foo.clux.dev/good created
➜  controller-rs git:(main) ✗ k get foo good -o yaml | grep is_bad
  is_bad: false
  2. Delete the first instance
➜  controller-rs git:(main) ✗ k delete foo good
foo.clux.dev "good" deleted
  3. Recreate the same instance and check if the status is updated
➜  controller-rs git:(main) ✗ k create -f yaml/instance-good.yaml
foo.clux.dev/good created
➜  controller-rs git:(main) ✗ k get foo good -o yaml | grep is_bad
➜  controller-rs git:(main) ✗

Please share your thoughts...

Best regards and stay safe!

docker push on ci is not pushing with correct tags

The GHA docker-metadata action is meant to infer tags via docker-metadata pep440:

- name: Configure tags based on git tags + latest
  uses: docker/metadata-action@v4
  id: meta
  with:
    images: clux/controller
    tags: |
      type=pep440,pattern={{version}}
      type=raw,value=latest,enable={{is_default_branch}}
      type=ref,event=pr

but as can be seen in the last job it outputs:

Processing tags input
  type=pep440,pattern={{version}},prefix=otel-,value=,enable=true,priority=900
  type=ref,event=pr,prefix=otel-,enable=true,priority=600
  type=raw,value=otel,enable={{is_default_branch}},priority=200
Processing flavor input
  latest=auto
  prefix=
  prefixLatest=false
  suffix=
  suffixLatest=false
Docker image version
  otel

which means the e2e CI job, which is meant to test the built image from the chart, fails.

I think this could be because we are not running the job in response to a tag push, but as a normal build.
There's also the bad ordering in the current setup: https://github.com/kube-rs/controller-rs/actions/runs/5814041092

We run docker-base, which e2e depends on, but the e2e job does a kubectl apply on the helm output (which always uses the last released version).

Not sure I have enough time to fix this today, and I'm likely off for a week, so here's a rough plan:

  • make e2e job do a helm template with custom values (with version set) that ensures it's the currently built image that gets tested (as opposed to either latest or the chart.yaml pin)
  • figure out a way to get tag versions to get picked up by docker-metadata (tag based build)?
  • figure out how e2e job works in the tag based setup? (skip?)
  • possibly: combine features of base build + telemetry build and just have one docker build and get rid of docker-otel to simplify otel version selection in the chart (did a hacky _helper for it in gotpl...)
  • update test doc on kube.rs with any fixes

future long term

  • migrate from my personal dockerhub to kube-rs github registry
  • push chart to kube-rs github registry (also oci now)

convert the crd to a more real world example

The current Foo crd does not map onto anything real, and is not the best thing to show newcomers.

We should have something more akin to the configmap-generator crd in kube-rs/examples: something that needs some state machinery that can be unit tested.

tilt up doesn't seem to install CRD

$ tilt up
Tilt started on http://localhost:10350/
v0.32.0, built 2023-03-13

(space) to open the browser
(s) to stream logs (--stream=true)
(t) to open legacy terminal mode (--legacy=true)
(ctrl-c) to exit
Tilt started on http://localhost:10350/
v0.32.0, built 2023-03-13

Initial Build
Loading Tiltfile at: /home/mkm/controller-rs/Tiltfile
compiling with features:
Successfully loaded Tiltfile (9.664509ms)
      compile │
      compile │ Initial Build
      compile │ Running cmd: just compile
WARNING: You are running Kind without a local image registry.
Tilt can use the local registry to speed up builds.
Instructions: https://github.com/tilt-dev/kind-local
      compile │     Finished release [optimized] target(s) in 0.14s

uncategorized │
uncategorized │ Initial Build
uncategorized │ STEP 1/1 — Deploying
uncategorized │      Applying YAML to cluster
uncategorized │      Objects applied to cluster:
uncategorized │        → doc-controller:serviceaccount
uncategorized │        → doc-controller:clusterrole
uncategorized │        → doc-controller:clusterrolebinding
uncategorized │
uncategorized │      Step 1 - 0.11s (Deploying)
uncategorized │      DONE IN: 0.11s
uncategorized │
doc-controll… │
doc-controll… │ Initial Build
doc-controll… │ STEP 1/3 — Building Dockerfile: [clux/controller]
doc-controll… │ Building Dockerfile for platform linux/amd64:
doc-controll… │   FROM cgr.dev/chainguard/static
doc-controll… │   COPY --chown=nonroot:nonroot ./controller /app/
doc-controll… │   EXPOSE 8080
doc-controll… │   ENTRYPOINT ["/app/controller"]
doc-controll… │
doc-controll… │
doc-controll… │      Building image
doc-controll… │      [background] read source files
doc-controll… │      [1/2] FROM cgr.dev/chainguard/static@sha256:f410bf52742e6feffaea0ec77ee0da46e3891c66aed4d99eb5fff70e154df01f
doc-controll… │      [1/2] FROM cgr.dev/chainguard/static@sha256:f410bf52742e6feffaea0ec77ee0da46e3891c66aed4d99eb5fff70e154df01f [done: 16ms]
doc-controll… │      [background] read source files 21.51MB [done: 99ms]
doc-controll… │      [1/2] FROM cgr.dev/chainguard/static@sha256:f410bf52742e6feffaea0ec77ee0da46e3891c66aed4d99eb5fff70e154df01f 488.64kB / 488.64kB [done: 614ms]
doc-controll… │      [2/2] COPY --chown=nonroot:nonroot ./controller /app/
doc-controll… │      [2/2] COPY --chown=nonroot:nonroot ./controller /app/ [done: 109ms]
doc-controll… │      exporting to image
doc-controll… │      exporting to image [done: 90ms]
doc-controll… │
doc-controll… │ STEP 2/3 — Pushing clux/controller:tilt-41cd18d1e986baac
doc-controll… │      Loading image to KIND
doc-controll… │      Image: "docker.io/clux/controller:tilt-41cd18d1e986baac" with ID "sha256:41cd18d1e986baac3dea88effd356b84143d3260feeeaa1ff655eb6633c34805" not yet present on node "kind-control-plane", loading...
doc-controll… │
doc-controll… │ STEP 3/3 — Deploying
doc-controll… │      Applying YAML to cluster
doc-controll… │      Objects applied to cluster:
doc-controll… │        → doc-controller:service
doc-controll… │        → doc-controller:deployment
doc-controll… │
doc-controll… │      Step 1 - 2.80s (Building Dockerfile: [clux/controller])
doc-controll… │      Step 2 - 1.36s (Pushing clux/controller:tilt-41cd18d1e986baac)
doc-controll… │      Step 3 - 0.02s (Deploying)
doc-controll… │      DONE IN: 4.18s
doc-controll… │
doc-controll… │
doc-controll… │ Tracking new pod rollout (doc-controller-5787c66f9-jtd7v):
doc-controll… │      ┊ Scheduled       - <1s
doc-controll… │      ┊ Initialized     - (…) Pending
doc-controll… │      ┊ Ready           - (…) Pending
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Container image "docker.io/istio/proxyv2:1.13.1" already present on machine
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Container image "clux/controller:tilt-41cd18d1e986baac" already present on machine
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Container image "docker.io/istio/proxyv2:1.13.1" already present on machine
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Container image "clux/controller:tilt-41cd18d1e986baac" already present on machine
doc-controll… │ 2023-03-14T10:33:03.303910Z  INFO actix_server::builder: starting 16 workers
doc-controll… │ 2023-03-14T10:33:03.310268Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
doc-controll… │ 2023-03-14T10:33:03.390673Z DEBUG HTTP{http.method=GET http.url=https://10.96.0.1/apis/kube.rs/v1/documents?&limit=1 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
doc-controll… │ 2023-03-14T10:33:03.410695Z ERROR HTTP{http.method=GET http.url=https://10.96.0.1/apis/kube.rs/v1/documents?&limit=1 otel.name="list" otel.kind="client" otel.status_code="ERROR"}: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
doc-controll… │ 2023-03-14T10:33:03.410724Z ERROR controller::controller: CRD is not queryable; HyperError(hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }))). Is the CRD installed?
doc-controll… │ 2023-03-14T10:33:03.410734Z  INFO controller::controller: Installation: cargo run --bin crdgen | kubectl apply -f -
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Back-off restarting failed container
doc-controll… │ WARNING: Detected container restart. Pod: doc-controller-5787c66f9-jtd7v. Container: doc-controller.
doc-controll… │ 2023-03-14T10:33:04.304458Z  INFO actix_server::builder: starting 16 workers
doc-controll… │ 2023-03-14T10:33:04.385413Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
doc-controll… │ 2023-03-14T10:33:04.390388Z DEBUG HTTP{http.method=GET http.url=https://10.96.0.1/apis/kube.rs/v1/documents?&limit=1 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
doc-controll… │ 2023-03-14T10:33:04.416035Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
doc-controll… │
doc-controll… │ 2023-03-14T10:33:04.416054Z DEBUG kube_client::client: Unsuccessful: ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 } (reconstruct)
doc-controll… │ 2023-03-14T10:33:04.416061Z ERROR controller::controller: CRD is not queryable; Api(ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 }). Is the CRD installed?
doc-controll… │ 2023-03-14T10:33:04.416065Z  INFO controller::controller: Installation: cargo run --bin crdgen | kubectl apply -f -
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Back-off restarting failed container
WARNING: You are running Kind without a local image registry.
Tilt can use the local registry to speed up builds.
Instructions: https://github.com/tilt-dev/kind-local
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Back-off restarting failed container
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Search Line limits were exceeded, some search paths have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local tail244ec.ts.net influxdata.io a.influxcloud.net
doc-controll… │ [event: pod doc-controller-5787c66f9-jtd7v] Container image "clux/controller:tilt-41cd18d1e986baac" already present on machine
doc-controll… │ WARNING: Detected container restart. Pod: doc-controller-5787c66f9-jtd7v. Container: doc-controller.
doc-controll… │ 2023-03-14T10:33:19.787903Z  INFO actix_server::builder: starting 16 workers
doc-controll… │ 2023-03-14T10:33:19.794091Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
doc-controll… │ 2023-03-14T10:33:19.799238Z DEBUG HTTP{http.method=GET http.url=https://10.96.0.1/apis/kube.rs/v1/documents?&limit=1 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
doc-controll… │ 2023-03-14T10:33:19.892258Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
doc-controll… │
doc-controll… │ 2023-03-14T10:33:19.892278Z DEBUG kube_client::client: Unsuccessful: ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 } (reconstruct)
doc-controll… │ 2023-03-14T10:33:19.892285Z ERROR controller::controller: CRD is not queryable; Api(ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 }). Is the CRD installed?
doc-controll… │ 2023-03-14T10:33:19.892290Z  INFO controller::controller: Installation: cargo run --bin crdgen | kubectl apply -f -
doc-controll… │ WARNING: Detected container restart. Pod: doc-controller-5787c66f9-jtd7v. Container: doc-controller.
doc-controll… │ 2023-03-14T10:33:48.796764Z  INFO actix_server::builder: starting 16 workers
doc-controll… │ 2023-03-14T10:33:48.803127Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
doc-controll… │ 2023-03-14T10:33:48.886924Z DEBUG HTTP{http.method=GET http.url=https://10.96.0.1/apis/kube.rs/v1/documents?&limit=1 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
doc-controll… │ 2023-03-14T10:33:48.912853Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
doc-controll… │
doc-controll… │ 2023-03-14T10:33:48.912870Z DEBUG kube_client::client: Unsuccessful: ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 } (reconstruct)
doc-controll… │ 2023-03-14T10:33:48.912876Z ERROR controller::controller: CRD is not queryable; Api(ErrorResponse { status: "404 Not Found", message: "\"404 page not found\\n\"", reason: "Failed to parse error data", code: 404 }). Is the CRD installed?
doc-controll… │ 2023-03-14T10:33:48.912881Z  INFO controller::controller: Installation: cargo run --bin crdgen | kubectl apply -f -

add integration tests for the crd

In particular, find a nice, standard way to test what happens when a new crd instance is added and modified.

We don't want to test the full state machinery here (that should be handled by #22),
but we do want an e2e-like test that verifies the controller performs the right action when kubectl applies a test crd; one possible shape is sketched below.

This is a harder issue though; kube-rs does not have amazing support for this yet.
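The sketch below assumes a reachable cluster with the crd installed, a running controller, and a crd that declares a status subresource (as the Document crd in this repo does); the names are illustrative:

#[tokio::test]
#[ignore = "needs a cluster with the crd and controller deployed"]
async fn applied_instance_gets_a_status() {
    use kube::{api::PostParams, Api, Client};

    let client = Client::try_default().await.unwrap();
    let docs: Api<Document> = Api::default_namespaced(client);
    docs.create(&PostParams::default(), &Document::new("e2e-doc", DocumentSpec::default()))
        .await
        .unwrap();

    // poll until the controller has written a status, or give up
    for _ in 0..30 {
        if docs.get("e2e-doc").await.unwrap().status.is_some() {
            return;
        }
        tokio::time::sleep(std::time::Duration::from_secs(1)).await;
    }
    panic!("controller never reconciled e2e-doc");
}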

migrate ci to github actions and push two variants of the image

To stay in line with the rest of CI in the org, plus make optional features less awkward (currently people have to build the image locally to test against a cluster without otel, and it should be fairly easy to avoid that).

  • port github actions from version-rs
  • push a -no-telemetry tag-suffixed image

automate tracing tests

Currently we have a single manual test for checking that our trace setup produces valid trace ids (TraceId != TraceId::INVALID). See #45.

To test this we need a valid tracer that (AFAICT) needs to talk to something.

The easiest way to currently test this is to setup a tempo instance using something like:

replicas: 1
tempo:
  retention: 24h
  authEnabled: false
  server:
    http_listen_port: 3100
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:55681"
        grpc:
          endpoint: "0.0.0.0:4317"
    jaeger:
      protocols:
        grpc:
          endpoint: "0.0.0.0:14250"
persistence:
  enabled: false

as values for the grafana/tempo chart (last tested against 1.0.1).

Then run just forward-tempo to redirect traffic to it when doing cargo test locally.

Once tempo is being forwarded to, we can run just test-telemetry.

We should do one of:

Option 1: Find a way to get valid traceIds without requiring a valid collector

I couldn't figure out how; the mock NoopSampler returns invalid traces: #45 (comment)

Option 2: Install tempo into the integration test cluster

via something like

      - uses: azure/setup-helm@v3
      - run: helm install grafana/tempo --wait -f values.yaml
      - run: just forward-tempo
      - run: just test-telemetry &

then run the integration tests there

Option 3: Run tests manually (current)

Run these tests manually when upgrading the opentelemetry/tracing ecosystem. Not ideal, but these tests also don't give you the biggest signal, and there's an argument to be made for not having more complicated integration test setups for a test controller.

Option 4: Find alternate tests

Provided we have consistent versions of opentelemetry, this should really never break.
Maybe we can replace this integration test with a cargo tree based duplicate-version check instead.
Then again, it's nice to check that this all works, so probably not.

add linting of yaml

We're doing helm here now, so we should do some basic linting in the lint job:

  • helm template | kubeconform to lint the schema
  • helm chart lint to lint values

exemplars support

We have Loki -> Tempo support so people can discover bad reconcile traces from a controller's logs. However, it would be much easier to do this based on exemplars in the tail end of its new histogram.

This currently isn't working. Here's a WIP issue.

I have a hacky implementation of exemplars in tikv/rust-prometheus#395.
With it in use on master, it outputs:

# HELP foo_controller_handled_events handled events
# TYPE foo_controller_handled_events counter
foo_controller_handled_events 3
# HELP foo_controller_reconcile_duration_seconds The duration of reconcile to complete in seconds
# TYPE foo_controller_reconcile_duration_seconds histogram
foo_controller_reconcile_duration_seconds_bucket{le="0.01"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.1"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.25"} 0
foo_controller_reconcile_duration_seconds_bucket{le="0.5"} 0
foo_controller_reconcile_duration_seconds_bucket{le="1"} 0
foo_controller_reconcile_duration_seconds_bucket{le="5"} 0
foo_controller_reconcile_duration_seconds_bucket{le="15"} 3 # {trace_id="27c2e480c02d586c98934828324eeb9a"} 9 1617533722.954
foo_controller_reconcile_duration_seconds_bucket{le="60"} 3
foo_controller_reconcile_duration_seconds_bucket{le="+Inf"} 3
foo_controller_reconcile_duration_seconds_sum 25
foo_controller_reconcile_duration_seconds_count 3

which SHOULD be in line with the OpenMetrics spec on exemplars, and even matches the exemplar example.

promtool 2.26 does not give good info on this (but then, I'm not sure it has support yet; exemplars are experimental thus far).

kubectl port-forward svc/foo-controller 8080:80
curl 0.0.0.0:8080/metrics -sSL | ./promtool check metrics
error while linting: text format parsing error in line 12: expected integer as timestamp, got "#"

but it looks like the grafana agent (0.13) also fails to scrape it:

kubectl port-forward -n monitoring grafana-agent-5gkqg 8000:80
curl http://0.0.0.0:8000/agent/api/v1/targets | jq
...
      "last_scrape": "2021-04-04T10:40:08.843113131Z",
      "scrape_duration_ms": 7,
      "scrape_error": "expected timestamp or new record, got \"MNAME\""

so we are probably blocked upstream on the scraper not understanding the comment hash.

Image that SHOULD work: clux/controller:0.9.3

get the k3d local registry working with tilt

Supposedly this is pretty easy. I made a registry node for my k3d, but it still pushed to my dockerhub. Might need to create the cluster with either --registry-create or --registry-use.

Anyway, I want the tilt up experience working for users, and currently it won't because it needs to push to my dockerhub.

  • try to follow https://k3d.io/usage/guides/registries/#using-a-local-registry and make it work
  • link to that guide in the readme (we already mention the local registry)
  • keep it simple enough so that we don't have to pollute the readme in this repo with linux specifics
  • some fallback mechanic if they don't have a k3d cluster?
  • test with k3d 5

suggestions welcome, coming back to this at some point.

build fails with just 1.13.0

$ uname -sm
Darwin arm64
$ just --version
just 1.13.0
$ just compile
error: Expected backtick, identifier, '(', ')', '/', or string, but found '{'
   |
47 | _build features="": (compile {{features}})
   |

I noticed the issue because I tried tilt up, which uses just:

$ tilt up
Tilt started on http://localhost:10350/
v0.32.0, built 2023-03-13

(space) to open the browser
(s) to stream logs (--stream=true)
(t) to open legacy terminal mode (--legacy=true)
(ctrl-c) to exit
Tilt started on http://localhost:10350/
v0.32.0, built 2023-03-13

Initial Build
Loading Tiltfile at: /Users/mkm/Build/controller-rs/Tiltfile
compiling with features:
Successfully loaded Tiltfile (6.091125ms)
WARNING: You are running Kind without a local image registry.
Tilt can use the local registry to speed up builds.
Instructions: https://github.com/tilt-dev/kind-local
      compile │
      compile │ Initial Build
      compile │ Running cmd: just compile
      compile │ error: Expected backtick, identifier, '(', ')', '/', or string, but found '{'
      compile │    |
      compile │ 47 | _build features="": (compile {{features}})
      compile │    |                              ^
      compile │ ERROR: just compile exited with exit code 1
      compile │ ERROR: Build Failed: Command "just compile" failed: exit status

trace and span IDs are invalid (zero)

The TraceId returned by telemetry::get_trace_id is invalid (zero). This is also the case for the SpanId. I suspect it's something to do with the integration between tracing and opentelemetry (in src/telemetry.rs):

///  Fetch an opentelemetry::trace::TraceId as hex through the full tracing stack
pub fn get_trace_id() -> TraceId {
    use opentelemetry::trace::TraceContextExt as _; // opentelemetry::Context -> opentelemetry::trace::Span
    use tracing_opentelemetry::OpenTelemetrySpanExt as _; // tracing::Span to opentelemetry::Context

    tracing::Span::current()
        .context()
        .span()
        .span_context()
        .trace_id()
}

Running locally against my Kubernetes cluster, with the lorem example doc created, the trace_id value is always zero:

2023-03-06T03:00:22.530885Z INFO reconciling object{object.ref=Document.v1.kube.rs/lorem.default object.reason=object updated}:reconcile{trace_id=00000000000000000000000000000000}: controller::controller: Reconciling Document "lorem" in default

Adding a getter and field for span_id shows similar output:

2023-03-06T04:12:32.778508Z INFO reconciling object{object.ref=Document.v1.kube.rs/lorem.default object.reason=object updated}:reconcile{trace_id=00000000000000000000000000000000 span_id=0000000000000000}: controller::controller: Reconciling Document "lorem" in default

how to implement a reconcile function

Hi, thanks for providing this inspiring project. If I'm not wrong, the operator acts like a k8s controller, creating the resource reflector and managing the control loop in state::init to keep the store in sync. I'm wondering where/how I can define a reconcile function in the operator itself to trigger other behaviors according to event notifications. I mean, if I want to create new k8s objects after a new Foo resource has been added, what is your suggestion for implementing this? Thanks
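To illustrate one common pattern (a hedged sketch, not this project's actual code): a reconcile function can create child objects with an owner reference back to the parent resource, so that a Controller's .owns() watches pick up changes and garbage collection cleans up the children. The helper name and the generated ConfigMap below are made up; Document/DocumentSpec are the derived types shown earlier on this page (substitute Foo or your own crd):

use std::collections::BTreeMap;
use k8s_openapi::api::core::v1::ConfigMap;
use kube::{
    api::{Api, ObjectMeta, Patch, PatchParams},
    Client, Resource, ResourceExt,
};

// hypothetical helper a reconcile fn could call to create/update a child object
async fn ensure_child_configmap(client: Client, doc: &Document) -> Result<(), kube::Error> {
    let ns = doc.namespace().unwrap_or_else(|| "default".into());
    // the owner reference ties the ConfigMap's lifecycle to the parent Document
    let oref = doc.controller_owner_ref(&()).unwrap();
    let cm = ConfigMap {
        metadata: ObjectMeta {
            name: Some(format!("{}-generated", doc.name_any())),
            namespace: Some(ns.clone()),
            owner_references: Some(vec![oref]),
            ..Default::default()
        },
        data: Some(BTreeMap::from([("title".to_string(), doc.spec.title.clone())])),
        ..Default::default()
    };
    let cms: Api<ConfigMap> = Api::namespaced(client, &ns);
    // server-side apply makes this idempotent across reconcile runs
    cms.patch(&cm.name_any(), &PatchParams::apply("doc-controller"), &Patch::Apply(&cm))
        .await?;
    Ok(())
}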

add finalizer usage on the new crd

After #21 we should have finalizer support herein.

Adapt the secret_syncer example from kube-rs to add support for finalizers to the crd herein.

If possible, it could be a compile-time feature, but that's a nice-to-have; if it ends up looking ugly, let us just have it always enabled.
Setting a finalizer would require us to have a non-trivial cleanup loop, so we would need #21 done.
