kubernetes-csi / csi-release-tools
Shared build and test files used by kubernetes-csi projects.
License: Apache License 2.0
Right now it's temporarily set to a commit hash on master to pick up a fix: kubernetes/test-infra#14948
background: kubernetes/test-infra#19477
We now have prod images in k8s.gcr.io/sig-storage, but canary images are in gcr.io/k8s-staging-sig-storage
Warning Failed 5m38s (x2 over 5m50s) kubelet Failed to pull image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:canary": rpc error: code = NotFound desc = failed to pull and unpack image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:canary": failed to resolve reference "k8s.gcr.io/sig-storage/csi-node-driver-registrar:canary": k8s.gcr.io/sig-storage/csi-node-driver-registrar:canary: not found
CI is failing because quay.io is down. It's reported in this issue:
kubernetes/kubernetes#91242
Should we move our images to a different repo, i.e., the community infra?
While cutting release v5.0.0 for external-snapshotter, the job timed out before building the image for the snapshot validation webhook. We've increased the timeout value, but it would be good to add timestamps to the build logs to show how long each step takes.
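A minimal sketch of how per-step timestamps could be added to the build logs; the helper name is illustrative, not an existing release-tools function:

```shell
# Hypothetical helper: prefix every output line of a build step with a UTC
# timestamp, so the release log shows when each step started and how long it ran.
run_with_timestamps () {
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
    done
}

# Usage (illustrative): run_with_timestamps make push-multiarch
```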
#178 (comment)
/kind feature
So we can remove difficult-to-maintain logic like https://github.com/kubernetes-csi/csi-release-tools/pull/113/files#diff-c2d96beb8cf73c62f7ae059aff96e381a8d73317df3ce356f36ba6b46338959dR768
https://github.com/kubernetes-sigs/kustomize/blob/master/examples/image.md
we need to use the start/end shas instead
Due to spelling and boilerplate errors in some files, the spellcheck and boilerplate tests are failing.
The boilerplate headers are wrong in cloudbuild.sh, .prow.sh, verify-subtree.sh, and prow.sh.
Some spellings are wrong in SIDECAR_RELEASE_PROCESS.md and prow.sh.
When kubelet or the CSI hostpath driver set up a loop device during testing in Prow and don't unbind that loop device before testing ends, then the KinD cluster teardown (in contrast to a VM) will not unbind those loop devices either because there's nothing that associates them with the containers in which they were created (loop devices aren't namespaced).
We should try to minimize the impact by trying to identify loop devices that need to be unbound when testing ends.
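A sketch of what that cleanup could look like, assuming losetup from util-linux; the helper name is an illustration, not existing prow.sh code:

```shell
# Snapshot loop devices before the tests, then detach only the ones that
# appeared afterwards (loop devices aren't namespaced, so the KinD teardown
# won't do this for us).
new_loop_devices () {
    # prints devices listed in $2 (after) but not in $1 (before)
    printf '%s\n' "$2" | grep -vxF -f <(printf '%s\n' "$1") || true
}

before=$(losetup -nl -O NAME 2>/dev/null || true)
# ... run the Prow E2E tests here ...
after=$(losetup -nl -O NAME 2>/dev/null || true)
new_loop_devices "$before" "$after" | while read -r dev; do
    [ -n "$dev" ] && losetup -d "$dev" || true
done
```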
Currently we just apply the raw yaml depending on the snapshotter version.
https://github.com/kubernetes-csi/csi-release-tools/blob/master/prow.sh#L743
Instead, we should use the locally built image and RBAC for PR jobs.
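A minimal sketch of swapping the image in the raw yaml for the locally built one before applying it; the image names and sed pattern are illustrative:

```shell
# Rewrite the snapshot-controller image reference in the deployment yaml to
# the image built by the PR job, instead of the released one.
patch_image () {
    sed -e "s|image: .*snapshot-controller:.*|image: $1|"
}

yaml='image: registry.k8s.io/sig-storage/snapshot-controller:v6.0.0'
patched=$(printf '%s\n' "$yaml" | patch_image "csi-snapshot-controller:csiprow")
# e.g. curl .../setup-snapshot-controller.yaml | patch_image "$local_image" | kubectl apply -f -
```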
After adding jobs for K8s v1.29, CI started to fail.
https://testgrid.k8s.io/sig-storage-csi-ci#1.29-test-on-1.29
Our release-1.19 and master CI jobs are failing because they can't build kind images.
Building node image in: /tmp/kind-node-image746303175
ERROR: error building node image: failed to copy build artifact: lstat /home/prow/go/pkg/csiprow.RySNwxo32r/src/kubernetes/bazel-bin/cmd/kubeadm/linux_amd64_stripped/kubeadm: no such file or directory
ERROR: 'kind build node-image' failed
When a change is proposed for csi-release-tools, we need to be sure that the change works for the repos which later will get updated with the modified csi-release-tools. Ideally, updating csi-release-tools should work without changes in those repos, but that's not always possible.
Currently we rely on manual testing in other repos, or on test PRs in the other repos with the proposed change. Both are cumbersome to set up.
It would be useful to have Prow jobs for a number of key components (csi-driver-host-path, external-provisioner, external-snapshotter?) that automatically run some of the tests for that component (unit testing, sanity, E2E for one Kubernetes release?) with the updated csi-release-tools.
We only collect driver logs, but not k8s component logs, which makes it difficult to debug test failures, like kubernetes-csi/csi-driver-host-path#96
e.g. the following golang binary path, https://dl.google.com/go/go1.21.3.linux-amd64.tar.gz,
is the right path; release-tools is missing the patch version, so it now fails here:
Line 437 in 267b40e
how can I fix this issue? @pohly
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/canary-external-snapshotter-push-images/1734355030900740096
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-csi-driver-host-path-push-images/1735030620733575168
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-csi-driver-smb-push-images/1735616921413357568
Setting /usr/bin/qemu-hexagon-static as binfmt interpreter for hexagon
Mon Dec 11 23:32:17 UTC 2023 go1.20.5 $ curl --fail --location https://dl.google.com/go/go1.21.linux-amd64.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 1449 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404
tar: invalid magic
tar: short read
ERROR: installation of Go 1.21 failed
ERROR
ERROR: build step 0 "gcr.io/k8s-testimages/gcb-docker-gcloud:v20230623-56e06d7c18" failed: step exited with non-zero status: 1
Now that the hostpath driver supports topology, we should run a kind cluster with 2 nodes so we can better test topology.
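A sketch of the kind cluster config this would need, written via heredoc as prow.sh generates its other config files; the file path is illustrative:

```shell
# Two-node cluster: one control-plane plus one worker, so topology-aware
# provisioning actually has distinct nodes to schedule across.
kind_config="${TMPDIR:-/tmp}/kind-config.yaml"
cat >"$kind_config" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF
# kind create cluster --config "$kind_config"
```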
/help
Hi,
This proposal is to build and push CSI sidecar container images for ppc64le. csi-release-tools currently builds binaries for Windows too while running on the amd64 arch; we can add ppc64le to that list as well. As a further step, we would also like to build and push the images for the ppc64le arch.
A ref PR to cross build ppc64le has been sent here: #47
Please let me know your thoughts on the above. Thanks.
This is the current suspected reason why jobs are timing out
/assign
In a failing job like this, we are not collecting kubelet logs, and it's hard to debug:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_node-driver-registrar/60/pull-kubernetes-csi-node-driver-registrar-1-15-on-kubernetes-1-15/1206475118583222272
/help
Not sure where the best place to open this issue is, but we need a script that automatically moves un-tracked SIG-Storage issues from k/k to our issue triage board.
When doing a multiarch build (make push-multiarch) we might need to override additional args that might be set in the Dockerfile.Windows file for a multi-image build, for example:
ARG CORE_IMAGE=servercore
ARG CORE_IMAGE_TAG=1809
ARG BUILD_IMAGE=nanoserver
ARG BUILD_IMAGE_TAG=1809
ARG REGISTRY=mcr.microsoft.com/windows
FROM ${REGISTRY}/${CORE_IMAGE}:${CORE_IMAGE_TAG} as core
FROM ${REGISTRY}/${BUILD_IMAGE}:${BUILD_IMAGE_TAG}
LABEL description="CSI Node driver registrar"
COPY ./bin/csi-node-driver-registrar.exe /csi-node-driver-registrar.exe
COPY --from=core /Windows/System32/netapi32.dll /Windows/System32/netapi32.dll
USER ContainerAdministrator
ENTRYPOINT ["/csi-node-driver-registrar.exe"]
We can override BUILD_IMAGE and BUILD_IMAGE_TAG before building the docker image, e.g.:
docker build \
--build-arg BUILD_IMAGE=X \
--build-arg BUILD_IMAGE_TAG=Y \
-f Dockerfile.windows .
The existing Makefile has support for https://github.com/kubernetes-csi/csi-release-tools/blob/master/prow.sh#L80 BUILD_PLATFORMS="linux amd64; windows amd64 .exe; ..."; in the Makefile this string is split by ';' and used in this fragment:
echo "$$build_platforms" | tr ';' '\n' | while read -r os arch suffix; do \
docker buildx build --push \
--tag $(IMAGE_NAME):$$arch-$$os-$$tag \
--platform=$$os/$$arch \
--file $$(eval echo \$${dockerfile_$$os}) \
--build-arg binary=./bin/$*$$suffix \
--build-arg ARCH=$$arch \
--label revision=$(REV) \
.; \
done; \
We'd like to include additional build args when calling docker build
One approach that I've seen in PD CSI is to have multiple targets per Windows build, see https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/cfe67c8322c9094030bdf28737a50eb5b1c2a542/Makefile#L51-L78: split BUILD_PLATFORMS into LINUX_BUILD_PLATFORMS and WINDOWS_BUILD_PLATFORMS and then do:
LINUX_BUILD_PLATFORMS="linux ppc64le -ppc64le; linux s390x -s390x; linux arm64 -arm64 ..."
WINDOWS_BUILD_PLATFORMS="windows amd64 nanoserver:1809; windows amd64 servercore:20H2 ..."
echo "$$linux_build_platforms" | tr ';' '\n' | while read -r os arch suffix; do \
docker buildx build --push \
--tag $(IMAGE_NAME):$$arch-$$os-$$tag \
--platform=$$os/$$arch \
--file $$(eval echo \$${dockerfile_$$os}) \
--build-arg binary=./bin/$*$$suffix \
--build-arg ARCH=$$arch \
--label revision=$(REV) \
.; \
done; \
echo "$$windows_build_platforms" | tr ';' '\n' | while read -r os arch base_image; do \
docker buildx build --push \
--tag $(IMAGE_NAME):$$arch-$$os-$$tag \
--platform=$$os/$$arch \
--file $$(eval echo \$${dockerfile_$$os}) \
--build-arg binary=./bin/$*.exe \
--build-arg BASE_IMAGE=$$base_image \
--build-arg ARCH=$$arch \
--label revision=$(REV) \
.; \
done; \
Another approach would be to add more tokens to the string for Windows (possibly at the end, to keep it backwards compatible), e.g.
BUILD_PLATFORMS="windows amd64 .exe nanoserver:1909; ..."
echo "$$build_platforms" | tr ';' '\n' | while read -r os arch suffix base_image; do \
docker buildx build --push \
--tag $(IMAGE_NAME):$$arch-$$os-$$tag \
--platform=$$os/$$arch \
--file $$(eval echo \$${dockerfile_$$os}) \
--build-arg binary=./bin/$*$$suffix \
--build-arg BASE_IMAGE=$$base_image \
--build-arg ARCH=$$arch \
--label revision=$(REV) \
.; \
done; \
These are just a few approaches, any feedback would be appreciated!
Also ref kubernetes-csi/node-driver-registrar#135
cc @jingxu97
We use this env var to build the images:
# BUILD_PLATFORMS contains a set of tuples [os arch suffix base_image addon_image]
# separated by semicolon. An empty variable or empty entry (= just a
# semicolon) builds for the default platform of the current Go
# toolchain.
BUILD_PLATFORMS =
The default value is:
configvar CSI_PROW_BUILD_PLATFORMS "linux amd64; linux ppc64le -ppc64le; linux s390x -s390x; linux arm -arm; linux arm64 -arm64; windows amd64 .exe nanoserver:1809 servercore:ltsc2019; windows amd64 .exe nanoserver:1909 servercore:1909; windows amd64 .exe nanoserver:2004 servercore:2004; windows amd64 .exe nanoserver:20H2 servercore:20H2; windows amd64 .exe nanoserver:ltsc2022 servercore:ltsc2022" "Go target platforms (= GOOS + GOARCH) and file suffix of the resulting binaries"
After trying to add linux arm/v7 -armv7 (ref kubernetes-csi/external-provisioner#691) as another target, I realized that even though arm/v7 is a valid docker buildx platform value, it's not a correct GOARCH value, and therefore compilation of the binary fails.
We could add another arg after arch that's the docker buildx platform. The new tuple would be [os arch buildx_platform suffix base_image addon_image], and the config for armv7 would be linux arm arm/v7 -armv7.
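A sketch of how build.make could read such a 6-field tuple, falling back to os/arch when no explicit buildx platform is given; the field order follows the proposal above, not the current format, and the helper name is illustrative:

```shell
# Parse one entry of the proposed tuple
# [os arch buildx_platform suffix base_image addon_image] and print the
# platform string to pass to docker buildx --platform.
buildx_platform_for () {
    read -r os arch buildx_platform suffix base_image addon_image <<EOF
$1
EOF
    printf '%s\n' "${buildx_platform:-$os/$arch}"
}
```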
/cc @pohly @msau42 @xing-yang
We successfully built amd64, but ppc64le is failing:
> [1/2] FROM gcr.io/distroless/static:latest:
failed to solve: rpc error: code = Unknown desc = failed to load cache key: no match for platform in manifest sha256:7d687a1d8809b89c60211ff08a9277ffbe76895907aedffac4d2400a4e1a2dc7: not found
+ docker buildx rm multiarchimage-buildertest
make: *** [release-tools/build.make:128: push-multiarch-csi-provisioner] Error 1
ERROR
ERROR: build step 0 "gcr.io/k8s-testimages/gcb-docker-gcloud:v20200421-a2bf5f8" failed: step exited with non-zero status: 2
Opening this issue to discuss the feasibility, benefits, and drawbacks of combining CSI controller sidecars and maintaining less repos and images.
One downside of the current "microservices" model of maintaining one controller per repo/image is that an update to a common library, csi-lib-utils for example, requires manual changes that are sometimes duplicated across multiple controllers. One possible solution is to combine controllers into one binary, modeled after kube-controller-manager.
Combining sidecars is non-trivial and would require a strong signal that the improvements are worth the engineering effort. To make such a signal more visible, this bug could be used to collect potential pain points in the current model that are hard to address by other means.
AFAIK there have been some thoughts and work in this area in the past. If you have thought about this, let us know what you think. Thanks!
/cc @chrishenzie
We should have more static checks to catch most of these errors, like linters, markdown checks, etc. This issue will keep track of that.
I have observed the same kind of issue in various kubernetes-csi projects; this happens because, after the localization, too many modifications were made in the various directories.
I have observed the same issue on this page as well: it has a broken link to the contributor cheat sheet, which needs to be fixed.
I will try to look into the other CSI repos as well and fix them as soon as I can.
/kind bug
/assign
Original proposal from @bertinatto in kubernetes-csi/driver-registrar#77:
With CSI becoming GA in Kubernetes v1.13, it would be interesting to provide CSI driver developers with a mechanism to have their sidecar images always updated without having to be aware of new builds of those images.
This can be achieved by also tagging a vA.B.C release image as vA.B and vA.
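A sketch of deriving the floating tags from a release tag; the registry path is illustrative, and the commands are echoed as a dry run rather than executed:

```shell
# Derive vA.B and vA from a vA.B.C release tag by stripping patch components.
tag=v1.2.3                                  # illustrative release tag
minor=${tag%.*}                             # -> v1.2
major=${minor%.*}                           # -> v1

IMAGE=registry.k8s.io/sig-storage/csi-node-driver-registrar   # illustrative
for t in "$minor" "$major"; do
    # dry run: the real release job would run docker tag + docker push here
    echo "docker tag $IMAGE:$tag $IMAGE:$t"
done
```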
Users of those tags must be aware of the caveat that automatically updating by always pulling the latest vA
image may miss some changes between releases, for example:
The generated PRs all add the replace directive back:
kubernetes-csi/csi-driver-host-path#383
kubernetes-csi/external-attacher#395
kubernetes-csi/external-provisioner#834
kubernetes-csi/external-snapshotter#797
kubernetes-csi/lib-volume-populator#61
kubernetes-csi/node-driver-registrar#257
kubernetes-csi/volume-data-source-validator#64
kubernetes-csi/external-health-monitor#149
kubernetes-csi/external-resizer#247
kubernetes-csi/livenessprobe#172
kubernetes-sigs/sig-storage-lib-external-provisioner#133
/kind bug
Here https://github.com/kubernetes-csi/csi-release-tools/blob/master/build.make#L146 it passes one --build-arg binary=xx.
For Windows images, Dockerfile.Windows typically has BASE_IMAGE etc., like here: https://github.com/kubernetes-csi/node-driver-registrar/blob/master/Dockerfile.Windows.
Should we add those build-args too?
To help catch common typos
/help
It seems like latest 1.15 branch requires cgo, which we don't have enabled:
ERROR: /home/prow/go/pkg/csiprow.Uakw8FT0JH/src/kubernetes/vendor/golang.org/x/sys/unix/BUILD:3:1: in @io_bazel_rules_go//go/private:rules/aspect.bzl%go_archive_aspect[pure="off",static="auto",msan="auto",race="auto",goos="auto",goarch="auto",linkmode="normal"] aspect on go_library rule //vendor/golang.org/x/sys/unix:go_default_library:
Traceback (most recent call last):
File "/home/prow/go/pkg/csiprow.Uakw8FT0JH/src/kubernetes/vendor/golang.org/x/sys/unix/BUILD", line 3
@io_bazel_rules_go//go/private:rules/aspect.bzl%go_archive_aspect(...)
File "/root/.cache/bazel/_bazel_root/dbb3a0bf0fb768dd6a59e5dab737c453/external/io_bazel_rules_go/go/private/rules/aspect.bzl", line 59, in _go_archive_aspect_impl
go.archive(go, source = source)
File "/root/.cache/bazel/_bazel_root/dbb3a0bf0fb768dd6a59e5dab737c453/external/io_bazel_rules_go/go/private/actions/archive.bzl", line 92, in go.archive
cgo_configure(go, srcs = (((((split.go + split.c...), <5 more arguments>)
File "/root/.cache/bazel/_bazel_root/dbb3a0bf0fb768dd6a59e5dab737c453/external/io_bazel_rules_go/go/private/rules/cgo.bzl", line 101, in cgo_configure
fail("Go toolchain does not support c...")
Go toolchain does not support cgo
ERROR: Analysis of target '//cmd/kubelet:kubelet' failed; build aborted: Analysis of target '//vendor/golang.org/x/sys/unix:go_default_library' failed; build aborted
Maybe we can also take this opportunity to upgrade to Go 1.13+?
If GOPATH is undefined, prow.sh attempts to create directories in the root of the filesystem instead of exiting with an error.
$ make
./release-tools/verify-go-version.sh "go"
mkdir: cannot create directory ‘/pkg’: Permission denied
mkdir: cannot create directory ‘/artifacts’: Permission denied
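A sketch of a fail-fast guard prow.sh could run before creating its work directories; the function name is illustrative:

```shell
# Refuse to continue when GOPATH is empty or unset, instead of letting the
# later mkdir calls try to create /pkg and /artifacts in the filesystem root.
require_gopath () {
    if [ -z "${GOPATH:-}" ]; then
        echo "ERROR: GOPATH must be set; refusing to create work dirs in /" >&2
        return 1
    fi
}

# Usage (illustrative): require_gopath || exit 1
```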
Consider adopting a form of the Kubernetes golang backport process for CSI releases as well.
A summary of the process is:
Before we can realistically adopt this, we need to get our release process into a more automated state so that we can regularly release patch versions. @sunnylovestiramisu is investigating that.
The script does not update some rc dependencies, like k8s.io/apiserver v0.26.0-rc.0 in kubernetes-csi/external-provisioner#834 and k8s.io/component-base v0.25.4 in kubernetes-csi/lib-volume-populator#61.
/kind bug
In our various Kubernetes-CSI repos we should regularly update dependencies to ensure that we have included all upstream bug fixes.
This includes:
- dep ensure -update
- git subtree pull for release-tools (see https://github.com/kubernetes-csi/csi-release-tools/blob/master/README.md#sharing-and-updating)
Then once the PR is pending, CI testing will cover the initial round of testing. Even if that fails, having a PR pending is a reminder to the maintainer that the component is not up-to-date.
The steps above could be automated with a script, like update-dependencies.sh in this repo. The hub tool might be useful for managing the PRs. Bonus points for making it possible to update release branches.
Running this script for one or more component is probably best done by the same person at a regular cadence. If multiple people do it, we end up with multiple competing PRs. We could also try to move it into a Prow job, but that would be a second step.
@msau42 does that sound useful?
Rather than defaulting to always-on features, I think we should allow test suites to toggle them. Something like:
CSI_ENABLED_FEATURES="controllerExpansion=true,nodeExpansion=true"
This will enable tests like https://github.com/kubernetes-csi/external-resizer/pull/53/files to enable certain features without having to update csi-release-tools. Also, because features can differ between versions, it will allow the .prow.sh file in individual branches to have its own configuration. I am not sure if this is preferable to having if-elses built into prow.sh, though.
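A sketch of how prow.sh could consume such a variable; the variable name is the one proposed above, and the helper is illustrative:

```shell
# Look up a single gate in a comma-separated feature list like
# CSI_ENABLED_FEATURES="controllerExpansion=true,nodeExpansion=true".
CSI_ENABLED_FEATURES="controllerExpansion=true,nodeExpansion=true"

feature_enabled () {
    case ",${CSI_ENABLED_FEATURES}," in
        *",$1=true,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```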