tektoncd / plumbing Goto Github PK

View Code? Open in Web Editor NEW

60.0 10.0 110.0 16.08 MB

This repo holds configuration for infrastructure used across the tektoncd org 🏗️

License: Apache License 2.0

Go 54.10% Shell 30.70% Makefile 2.84% Dockerfile 7.76% HTML 0.78% Python 3.68% Smarty 0.14%

tekton pipeline plumbing infrastructure test-infrastructure test-infra hacktoberfest

plumbing's Introduction

Plumbing

This repo holds configuration for infrastructure used across the tektoncd org 🏗️:

Tekton resources for:
- Continuous Delivery: release projects, build docker images and other periodic jobs
- Continuous Integration: run CI jobs for the various Tekton Projects. NOTE this responsibility is used shared with Prow
Prow manifests and configuration for:
- Continuous Integration: run CI jobs, merge approved changes (via Tide)
- Support functionality via various plugins
Ingress configuration for access via tekton.dev
Gubernator is used for holding and displaying Prow logs
Boskos is used to control a pool of GCP projects which end to end tests can run against
Peribolos is used to control org and repo permissions
bots
custom interceptors, used for Tekton based CI
Catlin, which provides validation for catalog resources

Automation runs in the tektoncd GCP projects.

More details on the infrastructure are available in the documentation.

Support

If you need support, reach out in the tektoncd slack via the #plumbing channel.

Members of the Tekton governing board have access to the underlying resources.

plumbing's People

Contributors

Stargazers

Watchers

Forkers

bobcatfish vdemeester dlorenc chmouel wlynch houshengbo ncskier dibyom imjasonh carolynmabbott 16yuki0702 afrittoli syldej azkaoru mnitta mnuttall quarkus-snapshot-12-16-2019 danielhelfand natalieparellano bernie990 piyush-garg tomgeorge guitcastro eddycharly timmyers adshmh gijsvandulmen chetan-rns nikhil-thomas sthaha fenglixa gabemontero savitaashture jerop jacobhjkim snehlatamohite gwonsoolee andreafrittoli alangreene puneetpunamiya sm43 pritidesai gyliu513 pratap0007 barthy1 hixio-mh a-roberts devopstoday11 jimmyma yaoxiaoqi popcor255 vinamra28 pradeepitm12 marcusmaday fiunchinho marcelmue rajesh-ibm-power praveen4g0 svghadi jmcshane priyawadhwa boost-entropy-k8s justaugustus mrutkows laassiliawissal openshift-pipelines zhouhaibing089 xauthulei jkdihenkar vaikas lbernick frerikandriessen spidersouleater abayer doytsujin sayan-biswas chases2 yongxuanzhang jacksgt manojkumar238 yuzp1996 abdelazizmohamed calizarr lastravex jeromeju t0masd chitrangpatel lcarva quanzhang-william jagathprakash bnallapeta enarha basavaraju-g patrickfarrell khrm qpc-github quantum-platinum-cloud ghatwala chuangw6 xinredhat

plumbing's Issues

Process: Update test-runner image

I recently updated knative/test-infra to update githubhelp to reduce the number of API calls it makes for listing changed files - knative/test-infra#1555. I'd like to update the prow image so prow can actually run tests for tektoncd/pipeline#1607.

test-runner uses the latest clone (perhaps we should also pin this similar to #125?), so all we need to regen the test-runner image as is. I don't have permissions to do this, so someone else needs to. 🙃

Add more boskos projects

Expected Behavior

We should have enough boskos clusters available that tests are able to run across all our projects.

Actual Behavior

Sometimes projects fail with:

I0528 16:44:41.318] 2019/05/28 16:44:41 main.go:312: Something went wrong: failed to prepare test environment: --provider=gke boskos failed to acquire project: resource not found

Which we think is caused when too many tests are running at the same time

(e.g. tektoncd/pipeline#909)

Additional Info

https://github.com/tektoncd/plumbing#boskos

Automate config.yaml prow configuration

The prow/config.yaml is getting bigger and bigger — but lot's of configuration share the same content. We may want to have a smaller configuration file that we use to generate the final config.yaml.

Inspiration comes from knative/test-infra

/kind question

Find a home for `taps` formulas

For the CLI we want to have a homebrew tap repository to install easily on OSX.

We have currently one on my own github user at :

https://github.com/chmouel/homebrew-tektoncd-cli/

which is generated at release time with goreleaser :

https://github.com/tektoncd/cli/blob/master/.goreleaser.yml#L48-L64

It would be better we have this to something more official, the solutions i can see are :

Move to the official core Formulas on https://github.com/Homebrew/homebrew-core/

Pros:

it's official yeah!
It's easier for our user (no tap to tap)

Cons:

We are dependent of being approved before release announcement etc..
We can't use goreleaser (not kosher enough for core formula), it's a manual update and no one (or at least me :)) like the extra release steps.

Have a tektoncd/homebrew-tools

Pros:

Related: tektoncd/cli#99

User just have to do brew tap tektoncd/tools and this will subscribe to the tektoncd repo.
We can then have multiple formulas in there for different tools that we need.

Cons:

I can't see much to be honest, but opinions may diverge

Another suggestion has been made to have it directly in the tektoncd/cli repo, but brew require an adjacent repo with a homebrew prefix. :(

My vote would be to create a new github repo and use it as tap and not having in the official core repo.

/cc @siamaksade @vdemeester @sthaha

/kind question

Configure the github release task to use a functional user

Expected Behavior

When drafting a GitHub release using the create-draft-release task, the release should be created with the token of a functional user.

Actual Behavior

It is created using @afrittoli token.

Migrate tektoncd project to Tekton (instead of prow)

This is kind of an Epic issue for moving towards having Tekton building Tekton.
The ultimate goal is to have all CI/CD (aka PR builds, periodics builds, release builds, commands handling, …) of the tektoncd projects handled by tektoncd projects (aka tektoncd/pipeline, tektoncd/triggers, …).

Tektoncd project convention for pipeline-as-code

This has been discussed a tiny bit during last working group.

As of today, each project that uses Tekton to do some work (release most likely), is using a tekton folder. Now that we are getting more into dogfooding and that we are able to start experimenting with pipeline-as-code (thanks to tektoncd/triggers and #100, we should try to define a convention for the tektoncd projects (nothing official yet).

Few questions we may want to answer

One file vs a folder with multiple files possible ?
Hidden folder or not (aka tekton/ vs .tekton/) ?
Experimenting with a DSL (from experimental maybe) or stick with plain yaml ?

All those questions do not need to be answer at the same time, and the answer might evolve over time. This issue is there to track the work around that.

/cc @bobcatfish @dibyom @skaegi @chmouel @mnuttall @imjasonh @vtereso @afrittoli @dlorenc @sbwsg @akihikokuroda @abayer

Linting should catch exported values without docstrings

Expected Behavior

If someone opens a PR with exported values that don't have docstrings, linting should prompt them to add a docstring.

Actual Behavior

Nothing happens! e.g. in tektoncd/triggers#63

Steps to Reproduce the Problem

Add a new type to a _types.go in the triggers repo or pipeline repo, e.g.

type SomeNewThing struct {
	TriggerBinding     TriggerBindingRef  `json:"triggerBinding"`
	TriggerTemplate    TriggerTemplateRef `json:"triggerTemplate"`
	ServiceAccountName string             `json:"serviceAccountName"`
}

Open a PR
Note that the linting doesnt say anything! But actually if you opened this in an editor (e.g. vscode with the Go plugin) you'd get a warning that there should be docstrings for SomeNewThing and its attributes

Additional Info

From @vtereso :

Adding `-E golint` to our lint command (`golangci-lint run`) _should_ allow us to get [this behavior](https://github.com/golang/lint/blob/master/lint.go), but when I tried it only complained about naming stutter errors.

(Those stutter errors are probably decent to surface as well imo! Why not :D )

Build log is missing in the pipeline nightly release bucket

Expected Behavior

The pipeline log should be stored along to the release.yaml in the bucket

Actual Behavior

It is not (anymore) https://console.cloud.google.com/storage/browser/tekton-releases-nightly/pipeline/latest?project=tekton-nightly

Setup controlled access to Tekton Dashboard for the dogfooding cluster

Expected Behavior

Project maintainers should be able to use the Tekton Dashboard for the dogfooding cluster, with no need to have direct access to the k8s cluster itself.

Actual Behavior

The only way to access the cluster today is via port-forwarding.

Additional Info

We'll need a dedicated subdomain of tekton.dev
We need an auth system, it could be a web-server with HTTP basic auth or IAP on GKE
We need automation and documentation for the whole setup

Eventually it might be nice to get authentication integrated in the tekton dashboard itself.

tekton-releases bucket organization

Currently, we are using tekton-releases bucket to store releases yamls for tektoncd/pipeline. We may want to use it for other tektoncd projects, like dashboard (that also has a yaml or two for releases). My question is:

Should we create another bucket for those, like dashboard-releases ?
Should we have some sort of folder layout for the future ?

pipeline/
  latest/
  previous/
dashboard/
  latest/
  previous/

/cc @bobcatfish

Update version of `go` in base image

Expected Behavior

Our unit tests as run by Prow (which is configured to use gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f ) should catch race conditions such as tektoncd/pipeline#1124.

Actual Behavior

The unit tests are run with -race (you can find the code for this somewhere in library.sh in the vendored scripts) but they don't catch the issue. I'm guessing this is b/c of the version of go inside the image, but not sure:

docker run --entrypoint go gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f version      
go version go1.11.4 linux/amd64

Steps to Reproduce the Problem

Checkout tekton pipelines before the fix in tektoncd/pipeline#1308
Run go test ./... -race - see the data race detected
Run the tests in the image we use with prow instead, e.g.:

# from the dir you've checked out tekton pipelines into
docker run -it -v `pwd`:/go/src/github.com/tektoncd/pipeline/ --entrypoint /bin/bash gcr.io/tekton-releases/tests/test-runner@sha256:a4a64b2b70f85a618bbbcc6c0b713b313b2e410504dee24c9f90ec6fe3ebf63f
$ cd /go/src/github.com/tektoncd/pipeline/
$ go test ./... -race

See that there is no data race!

Automate obtaining a test cluster

Expected Behavior

We need a Tekton based way to obtain a test cluster for testing, e.g. when doing a release we need a test cluster to test the release against.

Actual Behavior

At the moment we can either use a pre-provisioned cluster or use test-infra tools to claim a cluster from Boskos, but that's not very "Tektonik".
I was thinking we could have one (or both):

an interceptor that can allocate a test cluster and enrich the binding with the name of a cluster resource that embeds credentials for access
a tekton task that allocates the cluster (from Boskos? from scratch?) and builds the cluster resource

How do we cleanup the cluster once the job is done? I guess in a similar way we could trigger cleanup once the test pipeline is complete. We might also need a way to garbage collect them in case of missed signalling.

Steps to Reproduce the Problem

Additional Info

Replace Boskos StatefulSet with Deployment

Expected Behavior

If you make changes to the boskos configuration and apply it, you should be able to see the changes in the resulting pod.

Actual Behavior

The boskos StatefulSet has update strategy "ondelete" so you have to delete the pod before it'll get the update.

Steps to Reproduce the Problem

Change the image in boskos/boskos.yaml
Apply the config
Watch nothing happen

Additional Info

The k8s prow folks seem to have gone through this same cycle kubernetes/test-infra#11956 and are now using a Deployment (and they got rid of the PVC as well) https://github.com/kubernetes/test-infra/pull/13594/files

Tag catalog repository on pipeline release

When we do a major pipeline release it would be useful to tag the tektoncd/catalog repository.

/cc @sthaha

/assign

Jan 8 Nightly Pipeline Release failed

The jan 8 nightly release of Tekton Pipelines failed, so the most recent release is Jan 7. I found the pod that failed with:

 kubectl --context dogfood get pods -l tekton.dev/pipelineRun=pipeline-release-nightly-4zc78-dv7rz

The tag-images step failed and the logs ended with:

+ for REGION in "${REGIONS[@]}"
+ for TAG in "latest" v20200109-74bf8b82bb
+ gcloud -q container images add-tag gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:50ed8fc999392f349aea82f9ca7c7b85fd76a319ab3b3469c08f65a8f9cb96aa asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter:latest
Created [asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter:latest].
Updated [gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:50ed8fc999392f349aea82f9ca7c7b85fd76a319ab3b3469c08f65a8f9cb96aa].
+ for TAG in "latest" v20200109-74bf8b82bb
+ gcloud -q container images add-tag gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:50ed8fc999392f349aea82f9ca7c7b85fd76a319ab3b3469c08f65a8f9cb96aa asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter:v20200109-74bf8b82bb
Created [asia.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter:v20200109-74bf8b82bb].
Updated [gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/kubeconfigwriter@sha256:50ed8fc999392f349aea82f9ca7c7b85fd76a319ab3b3469c08f65a8f9cb96aa].
+ for IMAGE in "${BUILT_IMAGES[@]}"
+ IMAGE_WITHOUT_SHA=gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init:v20200109-74bf8b82bb
+ IMAGE_WITHOUT_SHA_AND_TAG=gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init
+ IMAGE_WITH_SHA=gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init@sha256:e475ad7e4fa81844f5a08c606be20f8af6830aa8aca1ef9e62684f5f65b472b9
+ gcloud -q container images add-tag gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init@sha256:e475ad7e4fa81844f5a08c606be20f8af6830aa8aca1ef9e62684f5f65b472b9 gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init:latest
Created [gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init:latest].
Updated [gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init@sha256:e475ad7e4fa81844f5a08c606be20f8af6830aa8aca1ef9e62684f5f65b472b9].
+ for REGION in "${REGIONS[@]}"
+ for TAG in "latest" v20200109-74bf8b82bb
+ gcloud -q container images add-tag gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init@sha256:e475ad7e4fa81844f5a08c606be20f8af6830aa8aca1ef9e62684f5f65b472b9 us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init:latest
ERROR: Error during upload of: us.gcr.io/tekton-nightly/github.com/tektoncd/pipeline/cmd/creds-init:latest
ERROR: gcloud crashed (V2DiagnosticException): response: {'status': '504', 'content-length': '82', 'x-xss-protection': '0', 'transfer-encoding': 'chunked', 'server': 'Docker Registry', '-content-encoding': 'gzip', 'docker-distribution-api-version': 'registry/2.0', 'cache-control': 'private', 'date': 'Thu, 09 Jan 2020 02:16:19 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json'}
Unable to determine the upload's size.: None

If you would like to report this issue, please run the following command:
  gcloud feedback

To check gcloud for common problems, please run the following command:
  gcloud info --run-diagnostics

So actually most of the process completed, but partway through tagging the images, gcloud crashed, I guess after receiving a 504?

I think this is probably just a flake (we'll see tomorrow) but if we see this again we might want to try to make the tagging steps more robust (e.g. maybe backoff and retry in that case?)

Make the Mario bot production ready

The current Mario bot demo works, but we need to properly automate deployment and maintenance. Things that we need to do:

obtain and automate setup of DNS name + TLS for mario-bot webhook(s)
setup an ingress linked to that name and to mario event listener service
automate the creation of the github secret and the deployment of the webhook
setup a github account for mario-bot
cd mario-bot's tekton resources
refine and merge #1

The code of the interceptor is pretty rough. Configuration params needs to be extracted and added to the interceptor manifest. We also need to define how the mapping command -> pipeline will look like in terms of interceptor + binding + listener + trigger + pipeline.

Pipeline nightly release sometimes fails on linting timeout

Expected Behavior

The pipeline works

Actual Behavior

The linting task sometimes fails

Steps to Reproduce the Problem

Trigger a pipelinerun - or even directly the linting task:

k create job --from=cronjob/pipeline-cron-trigger-pipeline-nightly-release pipeline-cron-trigger-pipeline-nightly-release-timeout

Additional Info

level=info msg="[runner] Issues before processing: 311, after processing: 0"
level=info msg="[runner] Processors filtering stat (out/in): filename_unadjuster: 311/311, exclude-rules: 1/1, identifier_marker: 215/215, exclude: 1/215, nolint: 0/1, cgo: 311/311, path_prettifier: 311/311, skip_files: 311/311, skip_dirs: 215/311, autogenerated_exclude: 215/215"
level=info msg="[runner] processing took 163.75751ms with stages: skip_dirs: 41.343572ms, exclude: 35.927496ms, identifier_marker: 33.289256ms, path_prettifier: 28.207542ms, filename_unadjuster: 11.651277ms, autogenerated_exclude: 10.700937ms, nolint: 1.859581ms, cgo: 768.229µs, max_same_issues: 3.275µs, diff: 1.211µs, uniq_by_line: 1.107µs, max_from_linter: 842ns, source_code: 761ns, skip_files: 725ns, path_shortener: 715ns, exclude-rules: 659ns, max_per_file_from_linter: 325ns"
level=info msg="[runner] linters took 2m0.222550318s with stages: goanalysis_metalinter: 1m43.982837899s, unused: 16.075581672s"
level=info msg="File cache stats: 0 entries of total size 0B"
level=error msg="Timeout exceeded: try increase it by passing --timeout option"
level=info msg="Memory: 2498 samples, avg is 237.0MB, max is 2297.8MB"
level=info msg="Execution took 5m3.295881998s"

Automated release for tektoncd projects

It should be possible for any projects in the tektoncd umbrella to do release without having to access the prow “production” cluster. Owner from a project should be able to create a release branch, maybe configure prow to build that release (ideally it would be automated) and it should just works.

As of today, tektoncd/pipeline and tektoncd/cli releases are done manually, accessing the prow “production” cluster.

Design docs is here

/area prow
/kind feature

PipelineRun log app should be deployable with ko

Expected Behavior

The PipelineRun log app in plumbing can be deployed to a cluster with ko to make testing easier.

Actual Behavior

The log app is only deployable manually with the deployment yaml in the repo at the moment, not with ko.

Add a Tekton based check to enforce a `kind` label

Expected Behavior

To help release automation, it should not be possible to merge a PR if it doesn't have a "kind" label set.

Implementation plan

Create a new trigger that listens for PR changes.
Create a task that fetches the list of labels, if zero or more then one are found the check is failed.
Update the PR from the task with check passed or failed - and instructions on how to add a label if it's missing.

Isolate certain Pipeline examples into their own namespace

Desirable Behavior

A couple of the examples we have in Pipeline need to run in their own namespace in order to test them without leaking some state into other running examples. Specifically the LimitRangeName examples which currently have to live under a no-ci directory so they don't run. If run in the same namespace as other examples they will cause some minor explosions that prevent the PR from passing checks.

So it'd be desirable for the examples to either choose their own namespace name (using whatever default that is used currently if they don't specify one) OR we could run each example in its own tiny namespace that gets torn down after it finishes.

Make Coverage report consistent

Expected Behavior

Coverage job should consistently produce a report for all PRs.

Actual Behavior

When it works, it works. But sometimes, the job passes but it does not actually produce a report that we can view e.g. tektoncd/pipeline#1757

Steps to Reproduce the Problem

This seems to be happening intermittently

Additional Info

Coverage job might be using a different golang version
Maybe we can use coveralls.io as an alternative instead?

TektonCD catalog tests e2e on pipeline PR

As we have seen from the $() to ${} syntax move, API changes is happenning and breaks user templates.

It would be nice to have a job on pipeline PR to identify them that checkout latest catalog and try to apply them with the PR's pipeline install.

I think we should make it as a passive job, since updating both repositories at the same time can get us to the chicken and egg problem,

(there probably should be a CLI story as well in the future when we get some good e2e testing in there)

Towards tektoncd project with no dependencies to plumbing

Related to tektoncd/triggers#180 (comment) — use go modules for tektoncd/triggers. This issue aims to discuss and track a way to make so that tektoncd projects do not need explicit dependency to tektoncd/plumbing.

hm interesting! I think this highlights something very odd (and historical) about our plumbing repo - we were using go to vendor in bash scripts basically thinking (is there anything besides the bash scripts we need from plumbing?)

(My experiences with git submodules in the past were painful but I can't remember the specifics so maybe it's okay?)

I think no matter what we should work toward not sharing scripts from plumbing like this, so this is more motivation to migrate from the bash scripts to some combo of

Tekton Tasks (shared via one of @vdemeester 's catalog designs? tektoncd/pipeline#964)

Code (I thought python made a lot of sense but then we're going to have to start publishing python packages? maybe sticking to Go makes more sense)

Images

Couple ideas:

Use git submodule for now

Update whatever actually needs the plumbing scripts to fetch them (probably our .sh e2e scripts)

Hold off on this until we migrate away from the bash scripts

I'm partial to (2) tho it's more work but would settle for (1) (can always do 2 as a next step :))

Today, most of the projects (tektoncd/pipeline, tektoncd/cli, …) use a small hack with dep or go mod to get a specific version of pluming scripts as a go dependency. As @bobcatfish highlight, this highlight an historical odd thing with plumbing (inherited from knative).

We should aim towards a tektoncd/plumbing dependency-free. Having all project using tekton pipelines is the end goal and would make those projects standing on their own, tektoncd/plumbing would be the infra setup and cross project CI/CD if any. That said, as discussed on the issue with @bobcatfish, we should do this in several step, ensuring all project are doing the same (easier to follow and adapt).

Here are my initial proposals

For now, either keep with the hack or use git submodules
- tektoncd/experimental is using something different too, so we are gonna need to change that in any case
- Please shout if you don't want git submodules 😝. I (and @bobcatfish too) have little experience with them in the past few years so, not sure what to think about it yet.
Bake the current script in the test-runner image used for CI (with prow), so that we don't need to have any as dependency. This would force us to use script only for CI and make sure anything can run locally without those scripts (and document this)
Use Tekton to build Tekton projects, using our own pipeline-as-code as convention (nothing official at first, just the convention we use/experiment with in tektoncd organization).

/cc @bobcatfish @dibyom @skaegi @chmouel @mnuttall @imjasonh @vtereso @afrittoli @dlorenc @sbwsg

Figure out why compute networks are leaking

We keep leaking test projects because of errors cleaning up compute networks in GCP. I've seen this a few times:

{"error":"exit status 1","level":"error","msg":"failed to clean up project tekton-prow-5, error info: Activated service account credentials for: [[email protected]]\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-f/instances/gke-tdashboard-e2e-cls11-default-pool-ce7baba3-3c3j].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-a/instances/gke-tdashboard-e2e-cls11-default-pool-9ba771cf-6v2z].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-c/instances/gke-tdashboard-e2e-cls11-default-pool-fd962099-kjkv].\nERROR: (gcloud.compute.disks.delete) Could not fetch resource:\n - The disk resource 'projects/tekton-prow-5/zones/us-central1-f/disks/gke-tdashboard-e2e-cls11-default-pool-ce7baba3-3c3j' is already being used by 'projects/tekton-prow-5/zones/us-central1-f/instances/gke-tdashboard-e2e-cls11-default-pool-ce7baba3-3c3j'\n\nError try to delete resources: CalledProcessError()\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-a/disks/gke-tdashboard-e2e-cls-pvc-c77e294f-8155-11e9-8601-42010a800ff0].\nERROR: (gcloud.compute.disks.delete) Could not fetch resource:\n - The disk resource 'projects/tekton-prow-5/zones/us-central1-a/disks/gke-tdashboard-e2e-cls11-default-pool-9ba771cf-6v2z' is already being used by 'projects/tekton-prow-5/zones/us-central1-a/instances/gke-tdashboard-e2e-cls11-default-pool-9ba771cf-6v2z'\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.disks.delete) Could not fetch resource:\n - The disk resource 'projects/tekton-prow-5/zones/us-central1-c/disks/gke-tdashboard-e2e-cls11-default-pool-fd962099-kjkv' is already being used by 'projects/tekton-prow-5/zones/us-central1-c/instances/gke-tdashboard-e2e-cls11-default-pool-fd962099-kjkv'\n\nError try to delete resources: CalledProcessError()\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/gke-tdashboard-e2e-cls1133378684342767618-b8f6cd5c-all].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/gke-tdashboard-e2e-cls1133378684342767618-b8f6cd5c-ssh].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/gke-tdashboard-e2e-cls1133378684342767618-b8f6cd5c-vms].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-2wnao3jebww7ldrn463stwke].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-7mzjmae3tlidh4yoidvnpe53].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-ba5h4uquy4cktbsldj6ba2g3].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-pjf5geul3rfwfyd5mynbasya].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-rpeifunzrsxxfwyacvtmctpr].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-uuaca3r2wn7yqlbdyzue5lrn].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-x233wi3sx7onq4enxk4rf6mk].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-yieefqzg6fev2ygounhzgoee].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/k8s-e9a2e6c01b399a0a-node-http-hc].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/k8s-fw-abf21be97815511e99e8642010a800ff].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-knsku4qwwbtr3bhcf3y6vcmu].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/routes/gke-tdashboard-e2e-cls1133-89060406-8156-11e9-8601-42010a800ff0].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/routes/gke-tdashboard-e2e-cls1133-916eac97-8157-11e9-9e86-42010a800ff1].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/routes/gke-tdashboard-e2e-cls1133-ea76028e-8156-11e9-ad37-42010a800fef].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/regions/us-central1/forwardingRules/abf21be97815511e99e8642010a800ff].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/regions/us-central1/targetPools/abf21be97815511e99e8642010a800ff].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/httpHealthChecks/k8s-e9a2e6c01b399a0a-node].\nDeleting Managed Instance Group...\n............................................Deleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-f/instanceGroupManagers/gke-tdashboard-e2e-cls11-default-pool-ce7baba3-grp].\ndone.\nDeleting Managed Instance Group...\n..........................................Deleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-a/instanceGroupManagers/gke-tdashboard-e2e-cls11-default-pool-9ba771cf-grp].\ndone.\nDeleting Managed Instance Group...\n........................Deleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/zones/us-central1-c/instanceGroupManagers/gke-tdashboard-e2e-cls11-default-pool-fd962099-grp].\ndone.\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/instanceTemplates/gke-tdashboard-e2e-cls11-default-pool-ce7baba3].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/instanceTemplates/gke-tdashboard-e2e-cls11-default-pool-9ba771cf].\nDeleted [https://www.googleapis.com/compute/v1/projects/tekton-prow-5/global/instanceTemplates/gke-tdashboard-e2e-cls11-default-pool-fd962099].\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.subnets.delete) Could not fetch resource:\n - Invalid resource usage: 'Cannot delete auto subnetwork from an auto subnet mode network.'.\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.compute.networks.delete) Could not fetch resource:\n - The network resource 'projects/tekton-prow-5/global/networks/tdashboard-e2e-net1133378684342767618' is already being used by 'projects/tekton-prow-5/global/firewalls/tdashboard-e2e-net1133378684342767618-rpeifunzrsxxfwyacvtmctpr'\n\nError try to delete resources: CalledProcessError()\nERROR: (gcloud.container.clusters.list) ResponseError: code=404, message=Not Found.\nDeleting cluster tdashboard-e2e-cls1133378684342767618...\n...................................................................................................................................................................................................................................................................................................................done.\nDeleted [https://container.googleapis.com/v1/projects/tekton-prow-5/zones/us-central1/clusters/tdashboard-e2e-cls1133378684342767618].\n[=== Start Janitor on project 'tekton-prow-5' ===]\n[=== Activating service_account /etc/test-account/service-account.json ===]\n[=== Finish Janitor on project 'tekton-prow-5' with status 1 ===]\n","time":"2019-05-28T15:05:48Z"}

I fix this by manually going into the projects and deleting the tdashboard-e2e-* networks.

Setup auto config application for Prow

Expected Behavior

When we make changes to our prow config:

The pull request that proposes the changes should verify them as much as possible
Once they are merged, they should be automatically applied to our prow cluster

Actual Behavior

We have nothing verifying our config.yaml changes
Once the changes are made, they must be applied manually with:

kubectl create configmap config --from-file=config.yaml=prow/config.yaml --dry-run -o yaml | kubectl replace configmap config -f -

Additional Info

Bonus: can we do this via a Task/Pipeline? See tektoncd/pipeline#267 for more info on our dogfood epic, will need zenhub access to see subtasks.

Create a `Task` that can create a Tekton Pipelines cluster to test against

Expected Behavior

We should be testing and releasing Pipelines with Pipelines.

We should have a Task (and related Resources) defined that we can use to:

Create a cluster to test against (currently done using boskos and triggered by e2e-tests.sh)
Deploy Pipelines to the cluster

The Task

Inputs:
- git repo
- cluster (to deploy Pipelines to and run end to end tests) (or is this an output?)
Outputs:
- cluster

Actual Behavior

We currently use prow and e2e-tests.sh, which also uses boskos to control clusters to test against.

Additional Info

If possible let's try to keep using the boskos cluster we were already using
Actually triggering this is not in scope for this issue, it's fine to test manually (we'll be linking these together in a later issue)

Udated prow to the latest version

Motivation

The latest version of Prow supports a more recent version of Tekton, so we should update both Prow and Tekton in the prow cluster and benefit from that.
It would be nice to get rid of pipelines with ${} and other legacy syntax for good.

Task details

Since prow runs our CI system, we need to make sure the update is as swift as possible. It would be nice to have a rollback plan in place should things go wrong.
We should write down a detailed update and rollback process.
Prow + Tekton is only used for nightly builds, and those are also built by Tekton alone in the dogfooding cluster, so we can do the two updates pretty much in isolation one from the other.

Markdown Link check is failing for valid URLs

Expected Behavior

Markdown link check is should pass!

Actual Behavior

Markdown link check is failing for valid URLs e.g. tektoncd/triggers#273
I can reproduce this locally from the Triggers repo.

Steps to Reproduce the Problem

liche -d $(pwd) -v $(find docs/ -name *.md)

Additional Info

Failing URLs:

tektoncd/triggers#275

I1212 16:03:05.422] 	ERROR	https://docs.gitlab.com/ee/user/project/integrations/webhooks.html#events
I1212 16:03:05.422] 		Not Found (HTTP error 404)
I1212 16:03:05.422] 	ERROR	https://docs.gitlab.com/ee/user/project/integrations/webhooks.html/
I1212 16:03:05.422] 		Not Found (HTTP error 404)

tektoncd/pipeline#1736

I1212 08:38:19.837] docs/tasks.md
I1212 08:38:19.837] 	ERROR	https://en.wikipedia.org/wiki/Shebang_(Unix
I1212 08:38:19.838] 		Not Found (HTTP error 404)

tektoncd/pipeline#1696

I1211 16:59:30.399] ---- Checking links in the markdown files ----
I1211 16:59:30.399] ----------------------------------------------
I1211 16:59:32.799] docs/resources.md
I1211 16:59:32.800] 	ERROR	https://godoc.org/github.com/jenkins-x/go-scm/scm#State
I1211 16:59:32.800] 		Not Found (HTTP error 404)

Stop using Boskos

Expected Behavior

We should create and manage as few GCP projects as possible since it is hard to do things easily across projects such as creating roles and permission/access across projects (e.g. I can't make one role and use it in 14 different projects).

We should look into making it so that we can use just one project and avoid needing to use boskos to manage it. We still need to:

Avoid tests interfering with each other (e.g. by overwriting each other's published images or using the same test cluster)
To make sure cleanup happens between tests
All governing board members should have the access needed to be able to support tekton projects (https://github.com/tektoncd/community/blob/master/governance.md#permissions-and-access)

Actual Behavior

In #29 we are adding even more boskos projects, and for knative's needs they are up to 40 separate GCP projects https://github.com/knative/test-infra/blob/master/ci/prow/boskos/resources.yaml.

It turns out we don't really need to use separate projects, we just need:

To be able to create and teardown GKE clusters
To be able to separate images built and published by test runs (e.g. in a subfolder in GCR)

Broken pipe in report_go_test kills go tests early

Expected Behavior

report_go_test() reliably runs go tests and reports the results

Actual Behavior

Go tests randomly terminate with output of the form,

I0627 16:00:23.903] {"level":"debug","msg":"Deleting webhook info from ConfigMap"}
I0627 16:00:23.903] {"level":"debug","msg":"Found and deleting webhook info hook-loop0-num-49 from githubwebhook"}
I0627 16:00:23.954] {"level":"deFinished run, return code is 1
I0627 16:00:24.090] XML report written to /workspace/_artifacts/junit_Xm8BLWQs.xml

"Finished run, return code is 1" appears right in the middle of simple output statements. We have also seen,

I0627 15:46:27.041] ok      github.com/tektoncd/dashboard/pkg/endpoints    8.707s
I0627 15:46:27.041] ?       github.com/tektoncd/dashboard/pkg/logging    [no test files]
I0627 15:46:27.041] tee: 'standard output': Resource temporarily unavailable

Our belief is that the ${go_test} | tee ${report} statement in report_go_test results in a periodically broken pipe, terminating the test run.

Steps to Reproduce the Problem

Write go tests
Run go tests
Be unlucky

Additional Info

See for example tektoncd/experimental@3c81950 in which overriding unit_tests() allowed a test which was failing reliably to then pass reliably.

Boskos seems to be wedged

Expected Behavior

Boskos should clean up projects once they are done being use and make them available for future use.

Actual Behavior

tektoncd/pipeline#1541 and tektoncd/pipeline#1888 both have consistently failing integration tests with an error like:

I0117 21:16:30.477] 2020/01/17 21:16:30 main.go:734: provider gke, will acquire project type gke-project from boskos
I0117 21:21:30.475] 2020/01/17 21:21:30 main.go:316: Something went wrong: failed to prepare test environment: --provider=gke boskos failed to acquire project: resources not found

In #29 and other times in the past we have responded to this error by provisioning more projects for boskos.

This time though it's definitely not the case that all the projects are in use:

https://pantheon.corp.google.com/home/activity?project=tekton-prow-9 <-- had current activity
https://pantheon.corp.google.com/home/activity?project=tekton-prow-10 <-- hasnt had an activity since the 14th

When I look at the logs from the boskos Janitor I see this kind of error:

 msg: "failed to clean up project tekton-prow-10, error info: Activated service account credentials for: [[email protected]]
ERROR: (gcloud.compute.instances.list) Some requests did not succeed:
 - Invalid value for field 'zone': 'asia-northeast3-a'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-b'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-c'. Unknown zone.

Fail to list resource 'instances' from project 'tekton-prow-10'
ERROR: (gcloud.compute.disks.delete) unrecognized arguments: --global 

To search the help text of gcloud commands, run:
  gcloud help -- SEARCH_TERMS
Error try to delete resources: CalledProcessError()
ERROR: (gcloud.container.clusters.list) ResponseError: code=404, message=Not Found.
[=== Start Janitor on project 'tekton-prow-10' ===]
[=== Activating service_account /etc/test-account/service-account.json ===]
[=== Finish Janitor on project 'tekton-prow-10' with status 1 ===]

I think the gcloud error might be a red herring, maybe a state that boskos gets into after some other kind of error first.

CPU and memory usage for both boskos + the boskos janitor started going up a few hours ago but its hard to say if that is causing the problem or if the problem is causing it:

Also this particular janitor pod has been steadily using more and more memory (interestingly this one was started on Jan 6 but the other two janitor pods had been around since like may)

The other 2 janitor pods look like:

Additional Info

kubernetes/test-infra#15866 is related (detecting when janitor fails repeatedly)
kubernetes/test-infra#14611 hints that this might be a quota problem - we're definitely near our cpu limit so ive requested that increased

I couldn't find any other quotas that seemed like they needed increasing. I think there's a good job that boskos got into a bad state and just restarting everything will fix it.

coincidentally there was a (seemingly unrelated?) GCP outage at the time when these errors started: https://status.cloud.google.com/incident/zall/20001 So maybe that put things into a bad state

it's also possible that this is because we're using such an old version of boskos and it might need an update - i think there's a good chance that updating boskos will solve the whole thing but I didn't want to rush to do that since we might run into other problems.

Setup peribolos for tektoncd org management

Currently adding users to the tektoncd org is manual. We should automate this using something like peribolos: https://github.com/kubernetes/test-infra/tree/master/prow/cmd/peribolos

Plan:

Add a peribolos config to the tektoncd/community github repo
Add documentation for users on how to add themselves
Run this manually for now
Wrap peribolos in a tekton Task
Automate the peribolos syncing

Try enabling statusreconciler

Expected Behavior

Our prow deployment should use all useful and available Prow tools.

Actual Behavior

Since we installed Prow, a new component called statusreconciler was added (kubernetes/test-infra#12258) - it looks like this has something to do with preventing PRs from becoming stuck? Would be good to understand what this actually does before turning it on.

Additional Info

Example configuration is available at:

https://github.com/kubernetes/test-infra/blob/6a85e1e9718d1d43aaa4b756826b36e8d22eff71/prow/cluster/starter.yaml#L456-L504

Automate build/test of PipelineRun log app in Plumbing CI

Expected Behavior

The pipelinerun log app should be built and tested on every PR to the plumbing repo (ideally only if its files have been changed).

Actual Behavior

The log app does not have automated build or test.

Setup periodics job for tektoncd projects (using dogfooding)

All periodics job for any project (pipeline, cli, …) should use tekton to be be run, thus running on dogfooding cluster as a start.

This depends on #137 (for e2e tests)
This would use kubernetes cronjobs and tektoncd/triggers
The definition of those pipeline could temporarly start in plumbing but we need to aim towards having those definition and configuration in each project, this is related to #102

/kind feature
/area test-infra

Standardize yaml/markdown formatters / linters

We should have standardized yaml and markdown linters. In a few issues, there's been problems with different forms of linters conflicting with each other causing issues (e.g. tektoncd/triggers#323 uses prettier, which seems to conflict with what yamllint wants).

Leading candidate is prettier due to its popularity (35k+ stars on GitHub), support for both markdown and yaml (among other formats), and support for various editors (vim, emacs, VS Code, etc)

Open questions

Is prettier the best option? Are there other tools that would be a better fit? Let us know!

Support Go >= 1.13 for ko-ci

Expected Behavior

gcr.io/tekton-releases/ko-ci should be able to build tektoncd/triggers.

Actual Behavior

[publish-images : run-ko] + ko resolve --preserve-import-paths -f /workspace/go/src/github.com/tektoncd/triggers/config/                                                                                                                                                                  
[publish-images : run-ko] 2019/11/27 01:11:16 Building github.com/tektoncd/triggers/cmd/controller                                                                                                                                                                                        
[publish-images : run-ko] 2019/11/27 01:11:16 Building github.com/tektoncd/triggers/cmd/eventlistenersink                                                                                                                                                                                 
[publish-images : run-ko] 2019/11/27 01:11:16 Building github.com/tektoncd/triggers/cmd/webhook                                             
[publish-images : run-ko] 2019/11/27 01:13:49 Unexpected error running "go build": exit status 2                                                                                                                                                                                          
[publish-images : run-ko] # github.com/tektoncd/triggers/pkg/interceptors/webhook                                                                                                                                                                                                         
[publish-images : run-ko] pkg/interceptors/webhook/interceptors.go:47:15: original.Clone undefined (type *http.Request has no field or method Clone)

Looks like ko-ci is using an older go version, and triggers needs >= 1.13. gcr.io/tekton-nightly/ko-ci works, so all we need to do is promote this to tekton-releases.

Prow cert is expired?

Expected Behavior

https should work for prow.tekton.dev

Actual Behavior

As pointed out by @chmouel when you visit prow.tekton.dev (e.g. https://prow.tekton.dev/log?job=pull-tekton-pipeline-unit-tests&id=1204443482555420674 to look at the raw build log for tektoncd/pipeline#1702) you see an error like this:

Additional Info

Everything we know about how the certs work is written at https://github.com/tektoncd/plumbing/tree/master/prow#ingress - also @dlorenc will know more if he wants to jump in here :D

p.s. as a follow up to fixing this immediate issue we should open an issue to proactively prevent this in the future (unless we feel its not worth the effort)

Flakyness reports and retryer

Prow has some nice tools to reports Flakyness :

https://github.com/knative/test-infra/tree/master/tools/flaky-test-reporter

and even automatically retry :

https://github.com/knative/test-infra/tree/master/tools/flaky-test-retryer

It would be a good idea to try installing both, at least at first running the reporter for a bit so we can have some visibilities and see if we can enable retryer on a test repo (I like to propose CLI because we have quite a lot of them there).

Create pre-commit hook configuration for running presubmits

Expected Behavior

I should be able to catch common presubmit issues prior to creating a PR.

Actual Behavior

I forget to run golang-ci and doc linters/formaters, push my code anyway, get a notification ~5 mins later that I don't actually read until ~30 mins later that I need to format a file.

Additional Info

We should set up recommended pre-commit/push hooks to automatically run common tests/linting/etc. This should be a pruned set of tests to be relatively fast - O(seconds).

https://pre-commit.com/ has been suggested as a mechanism to manage these. Alternatively, we could just have a script to use git-hooks directly (https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)

Upgrade triggers to release v0.2.0

This issue is meant to track the work on migrating to triggers to v0.2.0 in the dogfooding cluster.
https://github.com/tektoncd/triggers/releases/tag/v0.2.0

Upgrade the running service
- it could just be a kubectl apply , since we don't have any special configuration in place
- it would be good to try and use the "deploy tekton service" pipeline instead, and make sure we have overlays in place if needed. That will make it easier to switch to CD of the service in future
Upgrade the resources from the plumbing repo:
- Upgrade bindings to use JSONPath syntax
- Upgrade event listener to use bindings (as opposed to binding)
- Upgrade event listener to use interceptors (as opposed to interceptor)
- Update deployed resources
Upgrade the Mario bot
- Update Mario to exploit the inbuilt github interceptor and CEL filters
- Update the deployed Mario bot
Upgrade the Prow config CD mechanism
- The solution runs on dogfooding but the PR is not merged yet
- Update the PR
- Redeploy resources
Test periodic jobs, mario and prow config cd

Automate and share labels accross repositories

So, to integrate better with tide and, for example, letting users/contriubtors add some labels to their issues, we need some specific labels (e.g. question -> kind/question, …) — I am refering to tide commands.

We could use https://github.com/kubernetes/test-infra/tree/master/label_sync (with our own configuration and colors :catdance: ) or https://github.com/icecrime/label-all-the-things for automating that
We need to define the list we want (for kind/…, area/…, priority/…, etc.) and store this configuration here

cc @bobcatfish @abayer @dlorenc

/assign
/kind enhancement

Add automation for all tektoncd repos

Expected Behavior

All of the repos in tektoncd should have a minimum of prow automation applied.

We should have tide automation for:

We should add unit test support for:

dashboard (tektoncd/dashboard#72)
experimental (in each subdir?) : #37 and tektoncd/experimental#60
catalog
cli (tektoncd/cli#14) – a bit more than unit tests

Unit test support is going to be interesting, the easiest way to add this initially is probably to copy-paste the prow jobs as is and add exactly the script the knative-tests image expects:

plumbing/prow/config.yaml

Lines 107 to 108 in 2ff5265

 - "./test/presubmit-tests.sh" 

 - "--unit-tests"

OR we could be adventurous and try using our own script.

OR be very adventurous and try our own image.

Actual Behavior

Pipelines has all the bells and whistles, the rest of the projects seem to have only a random smattering (e.g. community has tide merging, but this repo doesn't).

Additional Info

Each project may decide to add additional automation (e.g. integration tests, coverage) which is totally cool, hopefully setting up a minimum should give them a starting point.
The work to do here is basically copy paste in the prow config to copy the unit-test section from pipelines and duplicate it for the other repos.

Deploy Tekton resources daily from plumbing

Setup a task that deploys all Tekton resources defined in under plumbing/tekton to the dogfooding cluster. We should run this on a daily basis for now, using CronJob + Triggers + TaskRun.

Eventually we should think of a way to deploy resources from other tekton repos too, but we might want to do that in a namespaced way.

Add image vulnerability scanning for releases/nightly builds

Container scans can be performed on Dockerhub for pushed images, we could push the images there too so we can access/republish new images based on any problems found or we can look into whether we get regular scans performed on gcr.io images.

At the very least it'd be great to do this before publishing release images and I think this is a useful step to take with a view to getting Tekton really ready for production.

For convenience here's what one would see on Dockerhub for, say, a Node.js release with vulns (maybe this scanning is only done for official images?):

Maintain tekton services configuration overlays in git

We need a way to maintain the custom configurations we need for Tekton services in the dogfooding cluster as overlays in git.

It should be possible to apply them on top of release files (e.g. full release or nightly) and ideally also to combine them with a ko based deploy (which could simply be a ko resolve + same solution as for releases).

This will allow us to store custom configurations in git, e.g. for pipeline we may want a custom default timeout and bucket configuration.

This is a step towards being able to CD services in the dogfooding cluster, but it's also required for manual deploys.

Current configuration:

$ ky get cm/config-artifact-bucket -n tekton-pipelines | egrep (...)
apiVersion: v1
data:
  bucket.service.account.field.name: GOOGLE_APPLICATION_CREDENTIALS
  bucket.service.account.secret.key: release.json
  bucket.service.account.secret.name: release-secret
  location: gs://dogfooding-pipelines
kind: ConfigMap
metadata:
  name: config-artifact-bucket
  namespace: tekton-pipelines

Install golangci-lint from a release instead of master

When building the test-runner image, we are installing golangci-lint from master (using RUN go get -u github.com/golangci/golangci-lint/cmd/golangci-lint).

We should install it from a release version instead to make sure its behavior doesn't change implicitly when bumping the test-runner image.

/cc @chmouel

Ref: tektoncd/pipeline#1634

/area test-infra

Create a pipeline to validate a released version of a Tekton project

We need a way to validate releases for various Tekton projects, and we should use a Tekton pipeline for that. Eventually we should be able to automatically trigger the pipeline as a post release job.

The aim is not to validate that the code that we just released works fine, since that is tested before the release as part of our CI, but rather to verify that the release artifacts were correctly generated. We had an issue in the past with .ko.yaml not being generated properly, which caused the wrong base image to be used when building the release.