vm-operator's People

Contributors

akrits, akutz, altonf4, angarg, aruneshpa, bryanv, corrieb, cyclexuxu, dilyar85, dougm, dramdass, fstrudel, gdaniel1205, jainvarun21, janitha, jvrahav, kay-ge, ljoycevmware, mayankbh, narendramadanapalli, neosab, srajashekar-vmw, sreyasn, srm09, srvaroa, tvs, xudongliuharold, yi0909, zjs, zyiyi11

vm-operator's Issues

Collapse API into a single repo.

From a consumption perspective, it's awkward to have a split api/controller that can be rev'd separately. We should consolidate them into a single source of truth.

ci: The testing-needed labels appear to not always work

I've noticed the testing-needed labels are not working as intended. For example, on #72 the label did not cause the PR check to fail until I removed and then re-added it manually. PR #71 is a good example where the label is present but is not blocking the PR.

@dilyar85, can you PTAL at this? Thanks!

telemetry: Add tests in `metrics` package

The metrics package registers Prometheus metrics for various controllers, such as the VM, VM publish request, and content library item controllers. We should add tests to cover the existing metrics and to guard future changes in this package.
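
A minimal sketch of what such a test could look like, using the Prometheus client's testutil package. The metric name and labels below are placeholders rather than the package's actual collectors:

package metrics_test

import (
	"testing"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
)

// vmReconcileTotal stands in for one of the package's collectors; the real
// metric names and labels in the metrics package may differ.
var vmReconcileTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "vmoperator_vm_reconcile_total",
		Help: "Number of VM reconcile attempts.",
	},
	[]string{"result"},
)

func TestVMReconcileCounter(t *testing.T) {
	// Use a fresh registry so the test does not depend on global state.
	reg := prometheus.NewRegistry()
	if err := reg.Register(vmReconcileTotal); err != nil {
		t.Fatalf("register: %v", err)
	}

	vmReconcileTotal.WithLabelValues("success").Inc()

	// testutil.ToFloat64 collects the metric and returns its current value.
	if got := testutil.ToFloat64(vmReconcileTotal.WithLabelValues("success")); got != 1 {
		t.Fatalf("expected counter to be 1, got %v", got)
	}
}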

How to run on TKGm

Is it possible to run VM Operator on a TKGm cluster, or on any Kubernetes cluster other than a Supervisor cluster?
If it is possible, what is the process?

Possible to have same update performed twice

Hey,

I was looking through the controller code and have a question. Suppose I create a VM through the operator, and the controller sends a call to vCenter to create a new VM, which vCenter receives. Now suppose the pod where the controller is running goes down before we can record the MoRef ID for the VM (or even know whether vCenter actually received the call). When another controller pod comes up and starts reconciling again, it will first check whether the VM exists. If the VM is still being created, that check finds no VM, and the controller would send another call to create one.
What would happen in this scenario? Would we end up creating two VMs? And more generally, what happens when there is a failure while we are performing an update that is not idempotent?
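
For context, here is a minimal sketch of what I understand the usual mitigation to be (not the controller's actual code): give the VM a deterministic name and look it up before issuing another create. The package and function names are illustrative, using govmomi:

package vsphere

import (
	"context"

	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

// ensureVM is a hypothetical, simplified illustration of find-before-create.
// A VM created by a previous, interrupted reconcile is found by name instead
// of being created a second time, provided the name is deterministic (e.g.,
// derived from the custom resource's namespace/name or UID).
func ensureVM(ctx context.Context, finder *find.Finder, folder *object.Folder,
	pool *object.ResourcePool, spec types.VirtualMachineConfigSpec) (*object.VirtualMachine, error) {

	if vm, err := finder.VirtualMachine(ctx, spec.Name); err == nil {
		return vm, nil // created by an earlier reconcile; do not create again
	}

	// NOTE: if a previous CreateVM call is still in flight, the lookup above
	// can miss it. That is exactly the window the question describes; it is
	// commonly narrowed by persisting intent (e.g., a "creating" condition)
	// in the CR's status before calling out to vCenter.
	task, err := folder.CreateVM(ctx, spec, pool, nil)
	if err != nil {
		return nil, err
	}
	if err := task.Wait(ctx); err != nil {
		return nil, err
	}
	return finder.VirtualMachine(ctx, spec.Name)
}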

Create a means of deploying the manager binary into a K8S cluster

Being able to deploy the code and run end-to-end tests against it is an important part of the developer experience.

As it stands, we have the ability to create a container and test it in our internal pipelines, but we should have that capability in this OSS repo.

We should add the following capabilities, with appropriate makefile targets:

  • Build a container image from the manager binary. The makefile can use an opinionated name.
  • Load the container image into a Kind K8S cluster or similar
  • Deploy the CRDs necessary to support VM Operator into the K8S cluster
  • Deploy Pod / Deployment YAML that will create one or more instances of VM Operator in the cluster
    • This should come with its own checks to ensure that the pod came up successfully

Most of this work is relatively simple scripting and plumbing. Once we have this work done, we can move on to creating some e2e tests against the deployed operator.

Enable test parallelism to identify and correct flaky tests

I've noticed a bunch of new, flaky tests in the content source and vmpubreq controller integration tests that seem related to poorly structured Ginkgo suites. For example, from the first run of the integration-test job for PR #26:

------------------------------
• Failure [10.293 seconds]
Integration tests
/home/runner/work/vm-operator/vm-operator/test/builder/test_suite.go:251
  Reconcile ContentSource
  /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:165
    when ContentSource and ContentLibraryProvider exists
    /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:166
      when a new ContentSource with duplicate vm images is created
      /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:279
        should reconcile and generate a new VirtualMachineImage object [It]
        /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:319

        Timed out after 10.001s.
        Expected
            <int>: 3
        to equal
            <int>: 2

        /home/runner/work/vm-operator/vm-operator/controllers/contentlibrary/contentsource/contentsource_controller_intg_test.go:326

Re-running just the IT job usually clears up flakes like the one above. I believe these occur because our tests are racy, and once people start creating PRs in this project, we will see these errors more frequently. When that happens:

  1. Click on the job that failed, ex. integration-test.
  2. Click the icon to re-run just the failed job.
  3. Click on the Re-run jobs button to re-run the failed job and its dependents.

This is usually enough to fix things. However, I want to set a goal that we run Ginkgo with the -p flag, which enables suite parallelism. This would very quickly identify all of the issues we have related to the way we've constructed our tests.

This issue tracks the need to enable parallelism for our test suites.
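
To illustrate the failure mode behind most of these flakes, here is a hypothetical, self-contained example (not one of our actual suites; Ginkgo v2 import paths are used for brevity). State that leaks between specs produces exactly the "expected 2, got 3" style of failure seen above, and -p makes the leak visible immediately:

package parallel_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestParallelSafety(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Parallel-safety suite")
}

// created simulates state that outlives a single spec, e.g. objects written
// to a shared envtest API server and never cleaned up.
var created []string

var _ = Describe("specs sharing mutable state", func() {
	// Without this reset, whichever spec runs second sees the first spec's
	// objects and counts more items than it expected. Resetting here (or
	// better, giving each spec its own namespace) keeps specs
	// order-independent and safe under `ginkgo -p`.
	BeforeEach(func() {
		created = nil
	})

	It("creates two images", func() {
		created = append(created, "image-a", "image-b")
		Expect(created).To(HaveLen(2))
	})

	It("creates one image", func() {
		created = append(created, "image-c")
		Expect(created).To(HaveLen(1))
	})
})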

Issue templates

Please create issue templates for different issue types. Upstream cluster-api is a good example to follow.

🔥 Update v1alpha1 VirtualMachineImage to Namespace-scoped

❗❗ This issue should be handled ASAP ❗❗

Now that the Image Registry API is enabled on Supervisor, the v1alpha1 VirtualMachineImage needs to be changed to Namespace-scoped. This results in a few errors:

  • The message `an empty namespace may not be set during creation` occurs during tests that require a dummy image, ex.

    STEP: Creating a temporary namespace
    • Failure in Spec Setup (BeforeEach) [0.033 seconds]
    Integration tests
    /home/runner/work/vm-operator/vm-operator/test/builder/test_suite.go:252
      Invoking Create
      /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:21
        create table [BeforeEach]
        /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:102
          should work
          /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:103
    
          Unexpected error:
              <*errors.errorString | 0xc0008d3af0>: {
                  s: "an empty namespace may not be set during creation",
              }
              an empty namespace may not be set during creation
          occurred
    
          /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
  • This would seem like a straightforward fix, ex.

    diff --git a/test/builder/util.go b/test/builder/util.go
    index bcc89d5a..6451a988 100644
    --- a/test/builder/util.go
    +++ b/test/builder/util.go
    @@ -340,10 +340,11 @@ func DummyVirtualMachineSetResourcePolicy2(name, namespace string) *vmopv1.Virtu
     	}
     }
     
    -func DummyVirtualMachineImage(imageName string) *vmopv1.VirtualMachineImage {
    +func DummyVirtualMachineImage(imageName, namespace string) *vmopv1.VirtualMachineImage {
     	return &vmopv1.VirtualMachineImage{
     		ObjectMeta: metav1.ObjectMeta{
    -			Name: imageName,
    +			Name:      imageName,
    +			Namespace: namespace,
     		},
     		Spec: vmopv1.VirtualMachineImageSpec{
     			ProductInfo: vmopv1.VirtualMachineImageProductInfo{
  • However, this breaks much of the rest of the test framework, which never expects a Namespace to be required to set up the world when creating dummy images, ex.

    Summarizing 12 Failures:
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should work 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should work for cluster vm image 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Create [BeforeEach] create table should not work for invalid image name 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:82
    
    [Fail] Integration tests Invoking Update [BeforeEach] when update is performed with changed image name should deny the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] when update is performed with changed storageClass name should deny the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Ports are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when VmMetadata is updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when NetworkInterfaces are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Volumes are updated when a vSphere volume is added rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when Volumes are updated when a PV is added does not reject the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Update [BeforeEach] VirtualMachine update while VM is powered on when AdvancedOptions VolumeProvisioningOptions are updated rejects the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:123
    
    [Fail] Integration tests Invoking Delete [BeforeEach] when delete is performed should allow the request 
    /home/runner/work/vm-operator/vm-operator/webhooks/virtualmachine/validation/virtualmachine_validator_intg_test.go:280
    
    Ran 12 of 12 Specs in 13.812 seconds
    FAIL! -- 0 Passed | 12 Failed | 0 Pending | 0 Skipped

Enforce `testing-needed-e2e-fast` label in merging PRs

The new labeler workflow will apply the testing-needed-e2e-fast label to PRs that change 99+ lines of code (LoC). To ensure this policy is followed, PRs requiring e2e testing should be blocked from merging if the label has not been applied. However, some flexibility should be allowed for PRs that contain mostly generated files, as the labeler does not currently filter them out.

Create README and How To guide(s)

The README should be concise and should describe VM Service at a high level, linking to external blogs and information. It should also link to any guides we add to the repo.

Given that we are designing a simple experience for build and test, we should create a How To guide that walks through the process of modifying, building, and testing. This is separate from the "how to contribute" guide. It doesn't have to be elaborate, but it should cover the basics of getting an environment set up, building the code, and what the different test targets are designed to cover.

Create some simple e2e tests that can be run against a deployed Operator

This work should continue on from that outlined in #2 and provide the ability to sanity-test the function of VM Operator with a minimum of external dependencies.

The vcsim project is likely the ideal means of testing against a mock vCenter.

It would be good to start with a simple CRUD VM lifecycle test.
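
For a rough idea of the shape this could take, here is a minimal, self-contained sketch of bootstrapping vcsim in-process with govmomi's simulator package. The harness details are assumptions; the real e2e suite would point VM Operator's vSphere provider at the simulator's URL:

package e2e_test

import (
	"context"
	"testing"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/simulator"
)

func TestAgainstVCSim(t *testing.T) {
	ctx := context.Background()

	// VPX() builds a vCenter-like inventory (datacenter, cluster, hosts).
	model := simulator.VPX()
	defer model.Remove()
	if err := model.Create(); err != nil {
		t.Fatal(err)
	}

	server := model.Service.NewServer()
	defer server.Close()

	client, err := govmomi.NewClient(ctx, server.URL, true)
	if err != nil {
		t.Fatal(err)
	}
	defer client.Logout(ctx)

	// From here, a CRUD lifecycle test would drive VM Operator to create,
	// update, and delete a VM, asserting the results against the simulator.
}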

Remove CPU and Memory limits since they are not supported

VM Service does not support setting resource limits via the VirtualMachineClass API today. As a result, the Limits field in the VirtualMachineClass custom resource is never set. Filing this issue to remove all references to Limits from our codebase. If we decide to leverage this existing field for ConfigSpec-based CPU and memory limits (e.g., to enable kubectl printer columns), then we can close out this issue.

pcidevices: Refactor PCI device code in CreateConfigSpecForPlacement

I believe this code needs to be updated for the "VM Class as Config" feature. Otherwise, duplicate vGPU and/or DDPIO PCI devices will be added to the placement spec, since said devices will exist in both the VM Class VAPI and the ConfigSpec once the feature is enabled.

I believe the following change is needed:

	// Do not add the PCI devices from the VAPI if the following FSS is enabled, because
	// the devices will have already been copied to the ConfigSpec by Supervisor.
	if !lib.IsVMClassAsConfigFSSDaynDateEnabled() {
		for _, dev := range CreatePCIDevicesFromVMClass(vmClassSpec.Hardware.Devices) {
			configSpec.DeviceChange = append(configSpec.DeviceChange, &vimtypes.VirtualDeviceConfigSpec{
				Operation: vimtypes.VirtualDeviceConfigSpecOperationAdd,
				Device:    dev,
			})
		}
	}

docs: Provide Dockerfile example for validating Cloud-Init user data

Getting cloud-init user data correct may prove difficult for users. The following Dockerfile can be used to validate the user data:

FROM ubuntu:latest

RUN apt-get update -y && \
    apt-get install -y cloud-init

ENTRYPOINT [ "cloud-init", "schema", "--config-file", "/cloud-config.yaml" ]

Let's imagine we build the following image from the above Dockerfile:

docker build -t civ .

Now, imagine the following cloud config file exists as "my-config.yaml":

#cloud-config
ssh_pwauth: true
users:
    - default
    - name: vmware
      sudo: ALL=(ALL) NOPASSWD:ALL
      lock_passwd: false
      passwd: '$1$salt$SOC44fVbA/ZxeIwD5yw1u1'
      shell: /bin/bash
write_files:
    - content: |
      "VMSVC Says Hello World"
      path: /helloworld

Let's run the validator on the above file:

$ docker run -it --rm -v $(pwd)/my-config.yaml:/cloud-config.yaml:ro civ
Error:
Cloud config schema errors: format-l13.c7: File /cloud-config.yaml is not valid yaml. while scanning a simple key
  in "<byte string>", line 13, column 7:
          "VMSVC Says Hello World"
          ^
could not find expected ':'
  in "<byte string>", line 14, column 7:
          path: /helloworld
          ^
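
The validator is correctly flagging broken YAML here: the literal block under write_files is not indented past its key, so the parser treats the quoted string as a mapping key. For reference (an illustrative fix, indentation only), the stanza parses once written as:

write_files:
    - content: |
          "VMSVC Says Hello World"
      path: /helloworld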

We should add documentation to this repo's doc area with the above example.
