vm-operator's People

Contributors

akrits, akutz, altonf4, angarg, aruneshpa, bryanv, corrieb, cyclexuxu, dilyar85, dougm, dramdass, fstrudel, gdaniel1205, jainvarun21, janitha, jvrahav, kay-ge, ljoycevmware, mayankbh, narendramadanapalli, neosab, srajashekar-vmw, sreyasn, srm09, srvaroa, tvs, xudongliuharold, yi0909, zjs, zyiyi11

vm-operator's Issues

Collapse API into a single repo.

From a consumption perspective, it's awkward to have a split api/controller that can be rev'd separately. We should consolidate them into a single source of truth.

ci: The testing-needed labels appear to not always work

I've noticed the testing-needed labels are not working as intended. For example, on #72 the label did not cause the PR check to fail until I removed and then re-added it manually. PR #71 is a good example where the label is present but is not blocking the PR.

@dilyar85, can you PTAL at this? Thanks!

telemetry: Add tests in `metrics` package

The metrics package registers Prometheus metrics for various controllers, such as the VM, VM publish request, and content library item controllers. We should add tests to cover the existing metrics and to guard future changes in this package.
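
A minimal sketch of what such a test could look like, using the Prometheus client's testutil package. The metric name and labels below are placeholders rather than the package's actual collectors:

package metrics_test

import (
	"testing"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

// vmReconcileTotal stands in for one of the package's collectors; the real
// metric names and labels in the metrics package may differ.
var vmReconcileTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "vmoperator_vm_reconcile_total",
		Help: "Number of VM reconcile attempts.",
	},
	[]string{"result"},
)

func TestVMReconcileCounter(t *testing.T) {
	// Use a fresh registry so the test does not depend on global state.
	reg := prometheus.NewRegistry()
	if err := reg.Register(vmReconcileTotal); err != nil {
		t.Fatalf("register: %v", err)
	}

	vmReconcileTotal.WithLabelValues("success").Inc()

	// testutil.ToFloat64 collects the metric and returns its current value.
	if got := testutil.ToFloat64(vmReconcileTotal.WithLabelValues("success")); got != 1 {
		t.Fatalf("expected counter to be 1, got %v", got)
	}
}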

How to run on TKGm

Is it possible to run VM Operator on a TKGm cluster, or on any Kubernetes cluster other than a Supervisor cluster?
If it is possible, what is the process?

Possible to have same update performed twice

Hey,

I was looking through the controller code and have a question. Suppose I create a VM through the operator, and the controller sends a call to vCenter to create a new VM, which vCenter receives. Now suppose the pod where the controller is running goes down before we can record the MoRef ID for the VM (or even know whether vCenter actually received the call). When another controller pod comes up and starts reconciling again, it will first check whether the VM exists. If the VM is still being created, that check finds no VM, and the controller would send another call to create one.
What would happen in this scenario? Would we end up creating two VMs? And more generally, what happens when there is a failure while we are performing an update that is not idempotent?
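
For context, here is a minimal sketch of what I understand the usual mitigation to be (not the controller's actual code): give the VM a deterministic name and look it up before issuing another create. The package and function names are illustrative, using govmomi:

package vsphere

import (
	"context"

	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

// ensureVM is a hypothetical, simplified illustration of find-before-create.
// A VM created by a previous, interrupted reconcile is found by name instead
// of being created a second time, provided the name is deterministic (e.g.,
// derived from the custom resource's namespace/name or UID).
func ensureVM(ctx context.Context, finder *find.Finder, folder *object.Folder,
	pool *object.ResourcePool, spec types.VirtualMachineConfigSpec) (*object.VirtualMachine, error) {

	if vm, err := finder.VirtualMachine(ctx, spec.Name); err == nil {
		return vm, nil // created by an earlier reconcile; do not create again
	}

	// NOTE: if a previous CreateVM call is still in flight, the lookup above
	// can miss it. That is exactly the window the question describes; it is
	// commonly narrowed by persisting intent (e.g., a "creating" condition)
	// in the CR's status before calling out to vCenter.
	task, err := folder.CreateVM(ctx, spec, pool, nil)
	if err != nil {
		return nil, err
	}
	if err := task.Wait(ctx); err != nil {
		return nil, err
	}
	return finder.VirtualMachine(ctx, spec.Name)
}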

Create a means of deploying the manager binary into a K8S cluster

Being able to deploy the code and run end-to-end tests against it is an important part of the developer experience.

As it stands, we have the ability to create a container and test it in our internal pipelines, but we should have that capability in this OSS repo.

We should add the following capabilities, with appropriate makefile targets:

  • Build a container image from the manager binary. The makefile can use an opinionated name.
  • Load the container image into a Kind K8S cluster or similar
  • Deploy the CRDs necessary to support VM Operator into the K8S cluster
  • Deploy Pod / Deployment YAML that will create one or more instances of VM Operator in the cluster
    • This should come with its own checks to ensure that the pod came up successfully

Most of this work is relatively simple scripting and plumbing. Once we have this work done, we can move on to creating some e2e tests against the deployed operator.

Enable test parallelism to identify and correct flaky tests

I've noticed a bunch of new, flaky tests in the content source and vmpubreq controller integration tests that seem related to poorly structured Ginkgo suites. For example, from the first run of the integration-test job for PR #26:

------------------------------
• Failure [10.293 seconds]
Integration tests
/home/runner/work/vm-operator/vm-operator/test/builder/test_suite.go:251
  Reconcile ContentSource
  /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:165
    when ContentSource and ContentLibraryProvider exists
    /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:166
      when a new ContentSource with duplicate vm images is created
      /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:279
        should reconcile and generate a new VirtualMachineImage object [It]
        /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:319

        Timed out after 10.001s.
        Expected
            <int>: 3
        to equal
            <int>: 2

        /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:326

Re-running just the IT job usually clears up flakes like the one above. I believe these occur because our tests are racy, and once people start creating PRs in this project, we will see these errors more frequently. When that happens:

  1. Click on the job that failed, ex. integration-test.
  2. Click the icon to re-run just the failed job.
  3. Click on the Re-run jobs button to re-run the failed job and its dependents.

This is usually enough to fix things. However, I want to set a goal that we run Ginkgo with the -p flag, which enables suite parallelism. This would very quickly identify all of the issues we have related to the way we've constructed our tests.

This issue tracks the need to enable parallelism for our test suites.
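
To illustrate the failure mode behind most of these flakes, here is a hypothetical, self-contained example (not one of our actual suites; Ginkgo v2 import paths are used for brevity). State that leaks between specs produces exactly the "expected 2, got 3" style of failure seen above, and -p makes the leak visible immediately:

package parallel_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestParallelSafety(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Parallel-safety suite")
}

// created simulates state that outlives a single spec, e.g. objects written
// to a shared envtest API server and never cleaned up.
var created []string

var _ = Describe("specs sharing mutable state", func() {
	// Without this reset, whichever spec runs second sees the first spec's
	// objects and counts more items than it expected. Resetting here (or
	// better, giving each spec its own namespace) keeps specs
	// order-independent and safe under `ginkgo -p`.
	BeforeEach(func() {
		created = nil
	})

	It("creates two images", func() {
		created = append(created, "image-a", "image-b")
		Expect(created).To(HaveLen(2))
	})

	It("creates one image", func() {
		created = append(created, "image-c")
		Expect(created).To(HaveLen(1))
	})
})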

Issue templates

Please create issue templates for different issue types. Upstream cluster-api is a good example to follow.

🔥 Update v1alpha1 VirtualMachineImage to Namespace-scoped

❗❗ This issue should be handled ASAP ❗❗

Now that the Image Registry API is enabled on Supervisor, the v1alpha1 VirtualMachineImage needs to be changed to Namespace-scoped. This results in a few errors:

  • The message `an empty namespace may not be set during creation` occurs during tests that require a dummy image, ex.

    STEP: Creating a temporary namespace
    • Failure in Spec Setup (BeforeEach) [0.033 seconds]
    Integration tests
    /home/runner/work/vm-operator/vm-operator/test/builder/test_suite.go:252
      Invoking Create
      /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:21
        create table [BeforeEach]
        /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:102
          should work
          /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:103
    
          Unexpected error:
              <*errors.errorString | 0xc0008d3af0>: {
                  s: "an empty namespace may not be set during creation",
              }
              an empty namespace may not be set during creation
          occurred
    
          /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
  • This would seem like a straightforward fix, ex.

    diff --git a/test/builder/util.go b/test/builder/util.go
    index bcc89d5a..6451a988 100644
    --- a/test/builder/util.go
    +++ b/test/builder/util.go
    @@ -340,10 +340,11 @@ func DummyVirtualMachineSetResourcePolicy2(name, namespace string) *vmopv1.Virtu
     	}
     }
     
    -func DummyVirtualMachineImage(imageName string) *vmopv1.VirtualMachineImage {
    +func DummyVirtualMachineImage(imageName, namespace string) *vmopv1.VirtualMachineImage {
     	return &vmopv1.VirtualMachineImage{
     		ObjectMeta: metav1.ObjectMeta{
    -			Name: imageName,
    +			Name:      imageName,
    +			Namespace: namespace,
     		},
     		Spec: vmopv1.VirtualMachineImageSpec{
     			ProductInfo: vmopv1.VirtualMachineImageProductInfo{
  • However, this breaks much of the rest of the test framework, which never expects a Namespace to be required to set up the world when creating dummy images, ex.

    Summarizing 12 Failures:
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should work 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should work for cluster vm image 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should not work for invalid image name 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Update [BeforeEach] when update is performed with changed image name should deny the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] when update is performed with changed storageClass name should deny the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Ports are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when VmMetadata is updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when NetworkInterfaces are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Volumes are updated when a vSphere volume is added rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Volumes are updated when a PV is added does not reject the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when AdvancedOptions VolumeProvisioningOptions are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Delete [BeforeEach] when delete is performed should allow the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:280
    
    Ran 12 of 12 Specs in 13.812 seconds
    FAIL! -- 0 Passed | 12 Failed | 0 Pending | 0 Skipped

Enforce `testing-needed-e2e-fast` label in merging PRs

The new labeler workflow will apply the testing-needed-e2e-fast label to PRs that change 99+ lines of code (LoC). To ensure this policy is followed, PRs requiring e2e testing should be blocked from merging if the label has not been applied. However, some flexibility should be allowed for PRs that contain mostly generated files, as the labeler does not currently filter them out.

Create README and How To guide(s)

The README should be concise and should describe VM Service at a high level, linking to external blogs and information. It should also link to any guides we add to the repo.

Given that we are designing a simple experience for build and test, we should create a How To guide that walks through the process of modifying, building, and testing. This is separate from the "how to contribute" guide. It doesn't have to be elaborate, but it should cover the basics of getting an environment set up, building the code, and what the different test targets are designed to cover.

Create some simple e2e tests that can be run against a deployed Operator

This work should continue on from that outlined in #2 and provide the ability to sanity-test the function of VM Operator with a minimum of external dependencies.

The vcsim project is likely the ideal means of testing against a mock vCenter.

It would be good to start with a simple CRUD VM lifecycle test.
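
For a rough idea of the shape this could take, here is a minimal, self-contained sketch of bootstrapping vcsim in-process with govmomi's simulator package. The harness details are assumptions; the real e2e suite would point VM Operator's vSphere provider at the simulator's URL:

package e2e_test

import (
	"context"
	"testing"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/simulator"
)

func TestAgainstVCSim(t *testing.T) {
	ctx := context.Background()

	// VPX() builds a vCenter-like inventory (datacenter, cluster, hosts).
	model := simulator.VPX()
	defer model.Remove()
	if err := model.Create(); err != nil {
		t.Fatal(err)
	}

	server := model.Service.NewServer()
	defer server.Close()

	client, err := govmomi.NewClient(ctx, server.URL, true)
	if err != nil {
		t.Fatal(err)
	}
	defer client.Logout(ctx)

	// From here, a CRUD lifecycle test would drive VM Operator to create,
	// update, and delete a VM, asserting the results against the simulator.
}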

Remove CPU and Memory limits since they are not supported

VM Service does not support setting resource limits via the VirtualMachineClass API today. As a result, the Limits field in the VirtualMachineClass custom resource is never set. Filing this issue to remove all references to Limits from our codebase. If we decide to leverage this existing field for ConfigSpec-based CPU and memory limits (e.g., to enable kubectl printer columns), then we can close out this issue.

pcidevices: Refactor PCI device code in CreateConfigSpecForPlacement

I believe this code needs to be updated for the "VM Class as Config" feature. Otherwise, duplicate vGPU and/or DDPIO PCI devices will be added to the placement spec, since said devices will exist in both the VM Class VAPI and the ConfigSpec once the feature is enabled.

I believe the following change is needed:

	// Do not add the PCI devices from the VAPI if the following FSS is enabled, because
	// the devices will have already been copied to the ConfigSpec by Supervisor.
	if !lib.IsVMClassAsConfigFSSDaynDateEnabled() {
		for _, dev := range CreatePCIDevicesFromVMClass(vmClassSpec.Hardware.Devices) {
			configSpec.DeviceChange = append(configSpec.DeviceChange, &vimtypes.VirtualDeviceConfigSpec{
				Operation: vimtypes.VirtualDeviceConfigSpecOperationAdd,
				Device:    dev,
			})
		}
	}

docs: Provide Dockerfile example for validating Cloud-Init user data

Getting cloud-init user data correct may prove difficult for users. The following Dockerfile can be used to validate the user data:

FROM ubuntu:latest

RUN apt-get update -y && \
    apt-get install -y cloud-init

ENTRYPOINT [ "cloud-init", "schema", "--config-file", "/cloud-config.yaml" ]

Let's imagine we build the following image from the above Dockerfile:

docker build -t civ .

Now, imagine the following cloud config file exists as "my-config.yaml":

#cloud-config
ssh_pwauth: true
users:
    - default
    - name: vmware
      sudo: ALL=(ALL) NOPASSWD:ALL
      lock_passwd: false
      passwd: '$1$salt$SOC44fVbA/ZxeIwD5yw1u1'
      shell: /bin/bash
write_files:
    - content: |
      "VMSVC Says Hello World"
      path: /helloworld

Let's run the validator on the above file:

$ docker run -it --rm -v $(pwd)/my-config.yaml:/cloud-config.yaml:ro civ
Error:
Cloud config schema errors: format-l13.c7: File /cloud-config.yaml is not valid yaml. while scanning a simple key
  in "<byte string>", line 13, column 7:
          "VMSVC Says Hello World"
          ^
could not find expected ':'
  in "<byte string>", line 14, column 7:
          path: /helloworld
          ^
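
The validator is correctly flagging broken YAML here: the literal block under write_files is not indented past its key, so the parser treats the quoted string as a mapping key. For reference (an illustrative fix, indentation only), the stanza parses once written as:

write_files:
    - content: |
          "VMSVC Says Hello World"
      path: /helloworld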

We should add documentation to this repo's doc area with the above example.
