cluster-api-provider-proxmox's Issues

scheduler: allow overprovisioning hosts and add a "reserved host memory" setting

Describe the solution you'd like
It would be nice to have the option to allow overprovisioning of hosts: especially when just playing around, even though VMs might have been granted, say, 10GB of memory, they won't always actually allocate 10GB on the host, particularly with Linux KSM and KVM memory ballooning.

Additionally, keeping ZFS' ARC cache and the host's own health in mind, it would be nice to have a configurable, optional safety buffer for the host in the scheduler.
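A minimal sketch of how both knobs could factor into the scheduler's memory check; the overprovision factor, the reserved-memory buffer and the surrounding types are hypothetical and do not exist in CAPMOX today:

package scheduler

// Hypothetical inputs: what a host reports and what the operator configures.
type hostMemory struct {
	Total     uint64 // bytes physically installed on the host
	Allocated uint64 // bytes already granted to scheduled VMs
}

type schedulerConfig struct {
	OverprovisionFactor float64 // e.g. 1.5 allows granting 150% of physical memory
	ReservedHostMemory  uint64  // bytes kept free for the host itself (ZFS ARC, etc.)
}

// canReserve reports whether `request` bytes of memory can still be granted.
func canReserve(h hostMemory, cfg schedulerConfig, request uint64) bool {
	budget := uint64(float64(h.Total) * cfg.OverprovisionFactor)
	if budget <= cfg.ReservedHostMemory {
		return false
	}
	budget -= cfg.ReservedHostMemory
	return h.Allocated+request <= budget
}

With OverprovisionFactor: 1.0 and ReservedHostMemory: 0 this degenerates to the current strict check.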

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

scheduler: memory assignment of templates is included in calculations

What steps did you take and what happened:

  • Assigned a VM template (qm set $VMID --template 1) a high amount of memory, greater than the host's available resources
  • Actually tried to provision a VM with 1024MB of memory
  • Was presented with a "0B available memory left" error

What did you expect to happen:

  • My requested VM to be provisioned: VMs with the template flag cannot be started, so their memory assignment shouldn't be taken into consideration (see the sketch below).
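A hedged sketch of the expected fix: when summing up a node's allocated memory, guests flagged as templates would be skipped. The vmResource type is only an illustrative stand-in for the per-guest data the Proxmox API exposes (a template flag and the configured memory); the real calculation lives in the CAPMOX scheduler.

package scheduler

// vmResource is an illustrative stand-in for per-guest data from the Proxmox API.
type vmResource struct {
	Template bool   // true for VM templates, which can never be started
	MaxMem   uint64 // configured memory in bytes
}

// allocatedMemory sums the memory of schedulable guests only.
func allocatedMemory(vms []vmResource) uint64 {
	var total uint64
	for _, vm := range vms {
		if vm.Template {
			continue // templates never run, so they consume no memory
		}
		total += vm.MaxMem
	}
	return total
}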

Anything else you would like to add:

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Issue deleting machine that was added into High Availability

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

We have discovered that the CAPMOX provider cannot delete machines which are part of high availability.
Steps to reproduce:

Step 1. VM128 which is a worker node for a cluster is added into High Availability in the proxmox GUI (Datacenter > High Availability)

Step 2. Find the machine resource in the management cluster and delete it

kubectl get machines|grep mk1-busi-cl-worker-wrn46
kubectl delete machine mk1-busi-cl-workers-lnlc8-q98l6 

Step 3 - Check the CAPMOX provider logs
E0515 09:29:06.758515 1 controller.go:329] "Reconciler error" err="cannot delete vm with id 128: 500 unable to remove VM 128 - used in HA resources and purge parameter not set." controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/mk1-busi-cl-worker-wrn46" namespace="default" name="mk1-busi-cl-worker-wrn46" reconcileID="6631f758-0c64-4776-9f4a-aadf0435510f"

What did you expect to happen:
I would have expected the machine to be deleted

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
There is a purge parameter for the DELETE API request.
https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}

Environment:

  • Cluster-api-provider-proxmox version: v1.6.3/v0.40
  • Kubernetes version: (use kubectl version): 1.28.8
  • OS (e.g. from /etc/os-release): ubuntu 2204

e2e tests: provide capmox controller log dumps

Describe the solution you'd like
If e2e is failing because the capmox controller is not coming up, there's nothing to debug and no hint as to what to fix. As such, if clusterctl init fails, dumping the controller pod log would be helpful.

Support for linked clones

What steps did you take and what happened:
Set "full: false", "format: qcow" or "format: raw"...also tried to not set format. Backend storage is NFS.

I0412 17:22:22.707171       1 recorder.go:104] "events: failed to sync MachineSet replicas: failed to clone infrastructure machine from ProxmoxMachineTemplate qa-worker while creating a machine: ProxmoxMachine.infrastructure.cluster.x-k8s.io \"qa-worker-72zf9\" is invalid: spec: Invalid value: \"object\": Must set full=true when specifying format" type="Warning" object={"kind":"MachineSet","namespace":"default","name":"qa-workers-gh8gs","uid":"8451b7f1-f5be-4505-a0c6-13bf0ac175db","apiVersion":"cluster.x-k8s.io/v1beta1","resourceVersion":"444395"} reason="ReconcileError"

What did you expect to happen:
A linked clone would be created

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Maybe I am an idiot, but I couldn't find anything in the docs about enabling linked clones.
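For what it's worth, the validation error above suggests a linked clone is requested by setting full: false and omitting format entirely. The following ProxmoxMachineTemplate excerpt is only a guess assembled from that error message; the apiVersion and field names are assumptions, not confirmed documentation:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: qa-worker
spec:
  template:
    spec:
      sourceNode: pve1     # assumed field names
      templateID: 9000
      full: false          # request a linked clone
      # format: qcow2      # must be omitted: "Must set full=true when specifying format"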

Environment:

  • Cluster-api-provider-proxmox version: 0.3.0
  • Kubernetes version: (use kubectl version): 1.28.7
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 from image-builder

Webhook should deny clusters with common IP addresses

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

I created a cluster with a specific IP range and later created another cluster whose IP range intersects
with the first one. The second cluster starts provisioning, and a new machine gets the same IP as a machine from the other cluster, which leads to a conflict.

I got an apiserver "cannot verify certificate" error because of the conflict.

What did you expect to happen:

The new cluster has to be rejected prior to provisioning.
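A minimal sketch of the overlap check such a validating webhook could run against the IP ranges of every existing ProxmoxCluster before admitting a new one (not the actual webhook code):

package webhook

import "net/netip"

// rangesOverlap reports whether the inclusive IP ranges [aStart, aEnd] and
// [bStart, bEnd] intersect. Two ranges overlap unless one ends before the
// other starts.
func rangesOverlap(aStart, aEnd, bStart, bEnd netip.Addr) bool {
	return !(aEnd.Less(bStart) || bEnd.Less(aStart))
}

For example, rangesOverlap for 192.168.254.50–80 and 192.168.254.70–90 returns true, so the second cluster would be rejected.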

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version: v0.1.0
  • Kubernetes version: (use kubectl version): v1.27.8
  • OS (e.g. from /etc/os-release): ubuntu

Allow Multiple Proxmox Clusters

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

We need to make credentials part of the ProxmoxCluster CR so we can support multiple Proxmox clusters.
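A hypothetical shape for this, purely to illustrate the request; the credentialsRef field does not exist in the current API:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: cluster-a
spec:
  credentialsRef:                      # proposed per-cluster credentials reference
    name: proxmox-cluster-a-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: proxmox-cluster-a-credentials
stringData:
  url: https://pve-a.example.com:8006/api2/json
  token: capi@pve!token1
  secret: <redacted>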

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Possible wrong Cloud-Init network-config generated file

What steps did you take and what happened:
Sorry, I'm submitting this as a bug, but maybe it isn't one.
When deploying a cluster with Talos as the bootstrap and control-plane provider, Talos' init process finds a cloud-init drive but then complains about the network-config file. Talos' error says: "network-config metadata version=0 is not supported", maybe because the file starts with "network:". Is that supported? Reading the manual, it shouldn't be.

cloud-init manual

What did you expect to happen:
Maybe cluster deployment should generate a network-config file without the top-level "network:" key.
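A sketch of the difference, with placeholder addresses; in the unwrapped form, version sits at the top level of network-config, which is presumably what Talos detects as metadata version 2:

# Currently generated (wrapped) form:
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses: [192.0.2.10/24]

# Unwrapped form without the top-level "network:" key:
version: 2
ethernets:
  eth0:
    dhcp4: false
    addresses: [192.0.2.10/24]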

Environment:

  • Cluster-api-provider-proxmox version: 0.1.0
  • Kubernetes version: (use kubectl version): 1.28.3
  • OS (e.g. from /etc/os-release): Talos 1.5.5

Default gateway creation

From @65278:

At the moment, we try to create a default gateway for every interface.
That only works by accident, and netplan can stop working at any time.

Can't clone to non-shared storage

Hi, I'm facing a proxmox issue and I'm wondering if a workaround could be implemented here to fix this behavior.

What steps did you take and what happened:

Context: I have a Proxmox cluster with shared storage (RBD) containing my template disk, and I want to create a cluster spread across multiple nodes. I want to use the nodes' local storage pools, as they are faster than my Ceph cluster.

The problem is:
If the destination storage is non-shared, I can only clone the template on the node where the template is located; otherwise I get this error: "500 Can't clone to non-shared storage 'local-lvm'"

What did you expect to happen:

As the disk is shared, I would expect the clone to be successful

Anything else you would like to add:

A solution could be to do a linked clone of the template on the node, clone the VM from it, and then delete the linked clone.

cf: https://forum.proxmox.com/threads/500-cant-clone-to-non-shared-storage-local.49078/
cf: https://bugzilla.proxmox.com/show_bug.cgi?id=2059

Environment:

  • Cluster-api-provider-proxmox version: 0.2.0
  • Kubernetes version: (use kubectl version): 1.29.1
  • OS (e.g. from /etc/os-release): Debian 12

The ability for the CAPMOX provider to work with HA

We have discovered that the CAPMOX provider cannot delete machines which are part of high availability.
Steps to reproduce:

Step 1. VM128 which is a worker node for a cluster is added into High Availability in the proxmox GUI (Datacenter > High Availability)

Step 2. Find the machine resource in the management cluster and delete it

kubectl get machines|grep mk1-busi-cl-worker-wrn46
kubectl delete machine mk1-busi-cl-workers-lnlc8-q98l6

Step 3 - Check the CAPMOX provider logs
E0515 09:29:06.758515 1 controller.go:329] "Reconciler error" err="cannot delete vm with id 128: 500 unable to remove VM 128 - used in HA resources and purge parameter not set." controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/mk1-busi-cl-worker-wrn46" namespace="default" name="mk1-busi-cl-worker-wrn46" reconcileID="6631f758-0c64-4776-9f4a-aadf0435510f"

I would like to request that HA support be added to the Proxmox machine template, and that, if HA is enabled, the machine is added to HA.

When deleting a machine that is part of HA, the purge parameter can be set so that the machine can be removed.

There is a purge parameter for the DELETE API request.
https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}
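A minimal sketch of the raw API call with that flag, assuming API-token authentication; the provider itself goes through its Proxmox client library, so this only illustrates the documented purge parameter:

package proxmox

import (
	"fmt"
	"net/http"
)

// deleteVMWithPurge issues the documented DELETE request with purge=1, which
// also removes the VM from HA resources (and from backup/replication jobs).
func deleteVMWithPurge(apiURL, node string, vmid int, tokenID, secret string) error {
	url := fmt.Sprintf("%s/nodes/%s/qemu/%d?purge=1", apiURL, node, vmid)
	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	// Proxmox API token authentication header.
	req.Header.Set("Authorization", fmt.Sprintf("PVEAPIToken=%s=%s", tokenID, secret))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status deleting VM %d: %s", vmid, resp.Status)
	}
	return nil
}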

Environment:

Cluster-api-provider-proxmox version: v1.6.3/v0.40
Kubernetes version: (use kubectl version): 1.28.8
OS (e.g. from /etc/os-release): ubuntu 2204

`ProxmoxCluster.spec.controlPlaneEndpoint.host` does not accept FQDNs

As we discovered today, ProxmoxCluster.spec.controlPlaneEndpoint.host does not accept FQDNs, while CAPI's Cluster CR accepts them, and with validation disabled on our end it also works just fine.

We should address this issue by, as discussed, checking whether the provided value is an FQDN and, if so, resolving it and continuing the validation with the returned records.
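A minimal sketch of that check (not the actual webhook code): accept an IP literal directly, otherwise treat the value as an FQDN and validate its resolved records.

package webhook

import (
	"fmt"
	"net"
	"net/netip"
)

// resolveEndpointHost returns the addresses to validate: the host itself if it
// is already an IP literal, otherwise the records the FQDN resolves to.
func resolveEndpointHost(host string) ([]netip.Addr, error) {
	if addr, err := netip.ParseAddr(host); err == nil {
		return []netip.Addr{addr}, nil
	}
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, fmt.Errorf("controlPlaneEndpoint.host %q is neither an IP nor a resolvable FQDN: %w", host, err)
	}
	addrs := make([]netip.Addr, 0, len(ips))
	for _, ip := range ips {
		if addr, ok := netip.AddrFromSlice(ip); ok {
			addrs = append(addrs, addr)
		}
	}
	return addrs, nil
}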

Add ability to skip e2e tests

Describe the solution you'd like
We need an automated way to skip e2e tests when we deem them unnecessary, e.g. because a PR only changes documentation.
This is to save the resources needed to run an e2e test.
I added a label to mark such PRs, and we should hook it up with the action. One way to do it would be for the e2e action to simply succeed when the label is applied, and otherwise to request deployment.

BUG: unable to initialize proxmox api client not authorized to access endpoint

What steps did you take and what happened:
After installing all prerequisites and running the command

clusterctl init --infrastructure proxmox --ipam in-cluster --core cluster-api:v1.5.3

when watching the pods being created, I noticed that the capmox-controller is in CrashLoopBackOff.
Pod Logs:

I0124 23:54:17.007521       1 main.go:87] "setup: starting capmox"
I0124 23:54:17.008105       1 listener.go:44] "controller-runtime/metrics: Metrics server is starting to listen" addr="localhost:8080"
I0124 23:54:17.008556       1 main.go:126] "setup: feature gates: ClusterTopology=false\n"
E0124 23:54:20.031775       1 main.go:133] "setup: unable to setup proxmox API client" err="unable to initialize proxmox api client: not authorized to access endpoint"

clusterctl.yaml

PROXMOX_URL: "https://pve.dev.local/api2/json"
PROXMOX_TOKEN: "capi@pve!token1"
PROXMOX_SECRET: "REDACTED"
PROXMOX_SOURCENODE: "pve01"
TEMPLATE_VMID: "9000"
ALLOWED_NODES: "[pve01,pve02,pve03]"
VM_SSH_KEYS: "ssh-rsa ..."
CONTROL_PLANE_ENDPOINT_IP: "192.168.254.40"
NODE_IP_RANGES: "[192.168.254.50-192.168.254.80]"
GATEWAY: "192.168.254.1"
IP_PREFIX: "24"
DNS_SERVERS: "[192.168.254.1]"
BRIDGE: "vmbr0"
BOOT_VOLUME_DEVICE: "scsi0"
BOOT_VOLUME_SIZE: "64"
NUM_SOCKETS: "1"
NUM_CORES: "4"
MEMORY_MIB: "4096"
EXP_CLUSTER_RESOURCE_SET: "false"
CONTROL_PLANE_MACHINE_COUNT: "3"
WORKER_MACHINE_COUNT: "3"

providers:
  - name: in-cluster
    url: https://github.com/kubernetes-sigs/cluster-api-ipam-provider-in-cluster/releases/latest/ipam-components.yaml
    type: IPAMProvider

capi@pve!token1 has the Proxmox Administrator role on /, so it's a cluster admin.
I have used the same user with Terraform, Packer and Ansible, and everything works fine there.

What did you expect to happen:
The capmox-controller should connect to proxmox and continue provisioning

Anything else you would like to add:
Using kind as the local Kubernetes provisioner and rootless Docker as the driver; everything is at its default configuration.

Environment:

  • Cluster-api-provider-proxmox version: 0.1.1
  • Kubernetes version: (use kubectl version):
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3
  • OS (e.g. from /etc/os-release):
NAME="AlmaLinux"
VERSION="9.3 (Shamrock Pampas Cat)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.3 (Shamrock Pampas Cat)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

It's systematic, so I must be doing something wrong...

Add more proxmox tags to the vms

Describe the solution you'd like
Right now there are two tags present for the vms in proxmox:

  • go-proxmox+cloud-init (What is the value / reason of this?)
  • ip_net0_<the IP of the vm>

It would be nice to have additional tags / labels like:

  • clustername to identify the cluster of that node
  • mgmt cluster name or any other reference to the capi cluster that is managing this vm / workload cluster in case there are more management clusters on that proxmox environment.

Optional:
It would be nice to add custom tags as well (at least at some point in the future)

Make e2e fail early

There is no need to continue running e2e tests if one of them has failed; it's just a waste of time and compute.
Instead, the process should fail as soon as any error is encountered.

Issue with CloudInit ISO creation

What steps did you take and what happened:

I am building a Kubernetes cluster with 3 control-plane nodes and 3 worker nodes on a 3-node Proxmox cluster. There are 3 possible Proxmox URLs that can be used, one per node.
I can build a cluster successfully if all Kubernetes nodes are placed on the Proxmox node that is used for the API URL.
If I build a Kubernetes cluster with nodes spread across the Proxmox cluster (using the ALLOWED_NODES env var), then Kubernetes nodes will only build successfully on the node that hosts the API URL; the other nodes fail to build.
For example, if https://node1.domain.cloud:8006/api2/json/ is used for the API URL, then a control-plane and a worker node will build successfully on node1, while node2 and node3 will each have a worker node that has been cloned and has the IP tag against it.
In the capmox controller logs the following error is logged repeatedly:

E0214 13:26:35.944055       1 controller.go:329] "Reconciler error" err="failed to reconcile VM: cloud-init iso inject failed: unable to inject CloudInit ISO: Post \"https://node1.domain.cloud:8006/api2/json/nodes/node2/storage/local/upload\": EOF" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/capi-management-control-plane-xfv5m" namespace="default" name="capi-management-control-plane-xfv5m" reconcileID="136fe780-6b15-49b4-b6c1-bda537915f48"

In proxmox there will be 1 Successful resize task and 2 failed Copy data tasks which will then keep repeating. The error given is:

starting file import from: /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7
target node: node2
target file: /var/lib/vz/template/iso/user-data-107.iso
file size is: 65536
command: /usr/bin/scp -o BatchMode=yes -p -- /var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7 [10.20.1.22]:/var/lib/vz/template/iso/user-data-107.iso
TASK ERROR: import failed: /usr/bin/scp: stat local "/var/tmp/pveupload-d8b6bc08693595b3c3911689a95457d7": No such file or directory

If you log onto the node you can see the pveupload-* file being created and then removed. Something is removing this file before the copy can take place.
The contents of the pveupload file look to be correct.

What did you expect to happen:
The nodes should be built successfully across all 3 proxmox nodes.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:
3 proxmox nodes 8.1.3 which also host ceph cluster used for storage
clusterctl version 1.6.1
IPAM provider v0.1.0-alpha.3

  • Cluster-api-provider-proxmox version: cluster-api-provider-proxmox:v0.2.0.
  • Kubernetes version: (use kubectl version): 1.28.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS

Too many calls when deleting machine

What steps did you take and what happened:
During a rolling update, I noticed multiple calls for deleting the same VM in the Proxmox Task log

(screenshot of the Proxmox task log showing repeated delete tasks for the same VM)

Is this the normal behavior, or should there be fewer calls when deleting machines?

Environment:

  • Cluster-api-provider-proxmox version: v0.1.0
  • Kubernetes version: (use kubectl version): v1.28.5
  • OS (e.g. from /etc/os-release): ubuntu-22.04

bug: machines fail to reconcile due to wrong name

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

 failureMessage: expected VM name to match "capmox-e2e-p0jq9a-control-plane-b4t76"     but it was "capmox-e2e-buedhi-control-plane-cskrb"

What did you expect to happen:

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

feat: allow passing caBundle via credentials Secret

Currently, CAPMOX plainly forces InsecureSkipVerify: true for the ProxmoxClient. We should support passing a caBundle via the credentials Secret, to give users the choice of whether or not to enable TLS verification.
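A minimal sketch of what the client construction could look like under that proposal; the caBundle key in the Secret and the function itself are assumptions, not existing CAPMOX code:

package proxmox

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
)

// newHTTPClient builds the HTTP client used to talk to the Proxmox API. When a
// caBundle is supplied (e.g. read from the credentials Secret), it is used for
// verification; only without a bundle would verification be skipped, matching
// today's behaviour.
func newHTTPClient(caBundle []byte) (*http.Client, error) {
	tlsCfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if len(caBundle) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caBundle) {
			return nil, fmt.Errorf("credentials Secret contains an invalid caBundle")
		}
		tlsCfg.RootCAs = pool
	} else {
		tlsCfg.InsecureSkipVerify = true // current behaviour, kept as a fallback
	}
	return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}, nil
}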

Ability of Placing Machines in specific Proxmox Nodes

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Currently, the provider uses a scheduler to find the best node to run a VM based on CPU/memory.
In some cases, we need to enforce placing Machines onto specific Proxmox nodes.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Full IPAM 0.1.0 and Go 1.21 support

As documented in a93bff1, IPAM 0.1.0 requires Go 1.21. Some of our tests aren't working with Go 1.21; however, we do know that we can work with IPAM 0.1.0 otherwise.

This is a blocker for 0.4.0.

incorrect cloudinit template (network gateway) with ipv6only setup

The IPv6 address and gateway only get set correctly when execution reaches this default case:

https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/a1666e478f0b4f990da0a0ef59a7f6d63397b2cd/internal/service/vmservice/bootstrap.go#L197C4-L197C4

It never gets to that default case because len(config.MacAddress) == 0 is true.

This results in a cloud-init template that looks like this (a 0.0.0.0/0 default route instead of ::/0):

instance-id: e7fc6bd1-dd59-4d01-aa49-c1e9dade2672
local-hostname: proxmox-quickstart-control-plane-wvlt7
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      match:
        macaddress: BC:24:11:B9:4A:C3
      dhcp4: 'no'
      addresses:
        - 2a03:4000:20:18b:aaaa::a/64
      routes:
        - to: 0.0.0.0/0
          via: 2a03:4000:20:18b:be24:11ff:fe20:b2ed
      nameservers:
        addresses:
          - 2a03:4000:20:18b:be24:11ff:fe20:b2ed

Full configuration I used here: https://gist.github.com/lucasl0st/4fe31d5f9b936520f177c42efeb157e7

The v4/v6 should probably be handled in this function instead:

func getNetworkConfigDataForDevice(ctx context.Context, machineScope *scope.MachineScope, device string) (*cloudinit.NetworkConfigData, error) {
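A minimal sketch of the family-aware route selection that function could apply (illustrative only, not the actual fix):

package vmservice

import "net/netip"

// defaultRouteFor returns the catch-all destination matching the address
// family of the configured gateway, so IPv6-only interfaces get ::/0 instead
// of 0.0.0.0/0.
func defaultRouteFor(gateway string) (string, error) {
	gw, err := netip.ParseAddr(gateway)
	if err != nil {
		return "", err
	}
	if gw.Is6() && !gw.Is4In6() {
		return "::/0", nil
	}
	return "0.0.0.0/0", nil
}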

Cluster name tag for VMs, and ability to add custom tags

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Currently we add an IP tag; it would be great to also add the cluster name as a tag on the machines,
and to allow adding more tags.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Release v0.4.0

Release

Release v0.4.0

Checklist

  • Update metadata & clusterctl-settings
  • Update docs, (compatibility table, usage etc) .
  • Create tag.
  • Update the created draft release to include (BREAKING Changes, Important Notes).
  • Publish the release.

Crashloop Backoff capmox-controller-manager

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

After clusterctl init --bootstrap talos --control-plane talos --infrastructure proxmox --ipam in-cluster
the capmox-controller-manager can't set up the Proxmox API client.

I0326 00:40:45.030910       1 main.go:88] "starting capmox" logger="setup"
I0326 00:40:45.057362       1 main.go:122] "feature gates: ClusterTopology=false\n" logger="setup"
E0326 00:40:45.159914       1 main.go:129] "unable to setup proxmox API client" err="unable to initialize proxmox api client: 501 Method 'GET /api2/json/version' not implemented" logger="setup"

What did you expect to happen:
The container should be up and running.
With the same Proxmox node and the same credentials it worked previously.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version: 0.3.0
  • Kubernetes version: (use kubectl version):
    Client Version: v1.29.3
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.2 - kind Cluster
  • OS (e.g. from /etc/os-release):
    MacOS 14.4

Unify DefaultNetworkDevice and AdditionalNetworkDevices

Describe the solution you'd like
At the moment, network devices are arbitrarily split into two categories, default and additional. This duplicates code and serves no purpose.
If we just have an array of NetworkDevices and require there to be at least one, we get the same guarantees.

Anything else you would like to add:

  • IP addresses for the first network device are a problem, because they're coming from the cluster object. This needs some architecture.
  • This changes the public API. We'll need to go to v1alpha2 and provide migration patches (see the sketch below).
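A rough sketch of what the unified field could look like; type and field names are illustrative, not the agreed API:

package v1alpha2

// NetworkSpec sketches a single list of devices replacing the current
// DefaultNetworkDevice / AdditionalNetworkDevices split. The first entry
// plays the role of today's default device.
type NetworkSpec struct {
	// +kubebuilder:validation:MinItems=1
	NetworkDevices []NetworkDevice `json:"networkDevices"`
}

// NetworkDevice describes one virtual NIC of the VM.
type NetworkDevice struct {
	Bridge string  `json:"bridge"`
	Model  string  `json:"model,omitempty"`
	MTU    *uint16 `json:"mtu,omitempty"`
}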

Release v0.3.0

Release

Release v0.3.0

Checklist

  • Update metadata & clusterctl-settings
  • Update docs, (compatibility table, usage etc) .
  • Create tag.
  • Update the created draft release to include (BREAKING Changes, Important Notes).
  • Publish the release.

Release v0.2.0

Release

Release v0.2.0

Checklist

  • Update metadata & clusterctl-settings
  • Update docs, (compatibility table, usage etc) .
  • Create tag.
  • Update the created draft release to include (BREAKING Changes, Important Notes).
  • Publish the release.
  • OPTIONAL: create release branch.

PROXMOX_TOKEN in CAPMOX is PROXMOX_SECRET in the Image-Builder which is error-prone

I created an image with image-builder; there, the environment variable for the "secret-key" is PROXMOX_TOKEN.
In CAPMOX the same environment variable contains the "token-user".

During clusterctl init the environment variable is preferred over the config file, so I ended up with a secret like this:

apiVersion: v1
data:
  secret: M3OWRi
  token: M3OWRi
  url: aHqc29u

I would like to suggest using the same naming as in the image-builder to save others the troubleshooting:
PROXMOX_USERNAME & PROXMOX_TOKEN

I should be able to create a PR for that, if desired.

Another cluster-api-provider-proxmox

This is neither an issue nor a feature request.

Hi, are you aware of another cluster-api-provider-proxmox? That cluster-api-provider-proxmox has been developed by k8s-proxmox for about 8 months. When I started our cluster-api-provider-proxmox project, there was no other cluster-api-proxmox implementation, which is why I decided to develop it myself.
We have now noticed that there is another cluster-api-proxmox developed by @ionos-cloud (thanks @3deep5me for pointing it out to us). Of course it's neither efficient nor ideal to develop two cluster-api-provider-proxmox projects separately if there is no reason to, so let me ask some questions to decide whether we should merge our efforts.

  1. Were you aware of our cluster-api-provider-proxmox when you started (or while developing) the CAPMOX project?
  2. (If yes to 1.) What was the reason for developing another ionos-cloud/cluster-api-provider-proxmox instead of using our k8s-proxmox/cluster-api-provider-proxmox?
  3. What do you think about merging our efforts into one repository? Are you interested in merging/adopting our efforts into one place?

FYI, some of the differences between our providers are mentioned here: k8s-proxmox/cluster-api-provider-proxmox#163 (reply in thread)

Document IPv6-only clusters incompatible with kube-vip < 0.7.2

Describe the solution you'd like
kube-vip tries to determine the cluster interface by looking for the default route. For versions < v0.7.2, this does not work in IPv6-only environments because IPv6 routing tables are not considered:

root@ipv6test01-control-plane-5000b-mg75w:~# crictl logs d1dd2691d6aab
time="2024-03-14T11:48:58Z" level=info msg="Starting kube-vip.io [v0.5.10]"
time="2024-03-14T11:48:58Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[false]"
time="2024-03-14T11:48:58Z" level=info msg="No interface is specified for VIP in config, auto-detecting default Interface"
....
time="2024-03-14T11:52:30Z" level=fatal msg="unable to detect default interface -> [Unable to find default route]"

Document this (and the kube-vip default interface template variable) for future releases.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

ProxmoxMachine deletion fails and loops with `cannot find vm with id -1`

What steps did you take and what happened:
CAPMOX can get stuck during ProxmoxMachine deletion, when a VM has never been provisioned. It then tries to delete a VM with the ID -1, which does not (and probably can not) exist.

capmox-controller-manager-6c57697857-6lxsg manager E1214 12:10:49.116477 1 controller.go:324] "Reconciler error" err="cannot find vm with id -1: bad request: 400 Parameter verification failed. - {\"vmid\":\"invalid format - value does not look like a valid VM ID\\n\"}" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="capmox/test-worker-kf5pm" namespace="capmox" name="test-worker-kf5pm" reconcileID="f1d47249-fe64-49ba-9a3f-9d69b8e8fbce"

What did you expect to happen:
If a VMID for a VM is not known, instead of computing it to be -1 and attempting to delete it (which will not work, as you see in the error message above), just accept the deletion request and complete it.
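A hedged sketch of that expected behaviour; the machineScope interface is only an illustrative stand-in for the provider's internals:

package vmservice

// machineScope is an illustrative stand-in for the provider's machine scope.
type machineScope interface {
	VirtualMachineID() int64 // -1 when no VM was ever provisioned
	RemoveFinalizer()
	DeleteVM() error
}

// reconcileDelete accepts the deletion outright when no VM was ever
// provisioned, instead of looping on "cannot find vm with id -1".
func reconcileDelete(m machineScope) error {
	if m.VirtualMachineID() < 0 {
		m.RemoveFinalizer()
		return nil
	}
	return m.DeleteVM()
}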

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version: v0.1.0
  • Kubernetes version: (use kubectl version): v1.28.4
  • OS (e.g. from /etc/os-release):

Support externally managed Kubernetes Control Planes

Describe the solution you'd like

Cluster API supports referencing a third-party Control Plane Provider, such as Kamaji.

The Control Plane provider will be responsible for provisioning a Control Plane backed by a Kubernetes API Server which will be used by the infrastructure cluster with the controlPlaneEndpoint contract.

The current version of CAPMOX doesn't consider this, as it requires the Control Plane endpoint to be known in advance.

ipAddr, err := netip.ParseAddrPort(fmt.Sprintf("%s:%d", ep.Host, ep.Port))
if err != nil {
	return apierrors.NewInvalid(
		gk,
		name,
		field.ErrorList{
			field.Invalid(
				field.NewPath("spec", "controlplaneEndpoint"),
				fmt.Sprintf("%s:%d", ep.Host, ep.Port),
				"provided endpoint is not in a valid IP and port format"),
		})
}

Kamaji has successfully integrated with other infrastructure providers by relying on externally managed Control Planes; this approach has been taken into consideration by other CAPI infrastructure providers we have worked with.

Anything else you would like to add:

To avoid breaking the current UX, it could be useful to have a knob to skip the ControlPlane endpoint address for the given ProxmoxCluster.

Once the Control Plane provider has provisioned the Kubernetes API Server address, it can patch the infrastructure cluster to continue the required reconciliation, e.g.: https://github.com/clastix/cluster-api-control-plane-provider-kamaji/blob/75b0578114b236f1741d4b79b60eb39b23dfcbeb/controllers/kamajicontrolplane_controller_cluster_patch.go#L20-L52

I'm open to providing a PR to address this feature request.

Environment:

  • Cluster-api-provider-proxmox version: N.R.
  • Kubernetes version: (use kubectl version): N.R.
  • OS (e.g. from /etc/os-release): N.R.

Support different cloud-init network-config

Describe the solution you'd like

Currently we only support network-config version 2, which targets netplan-based distributions.

We need a way of making this configurable so the user can choose which network-config version to use.

Anything else you would like to add:

This will make sure CAPMOX supports various distributions.
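For illustration, the same static configuration in the two cloud-init network-config versions (placeholder addresses only):

# version 2 (netplan-style), what CAPMOX renders today:
version: 2
ethernets:
  eth0:
    addresses: [192.0.2.10/24]
    routes:
      - to: 0.0.0.0/0
        via: 192.0.2.1

# version 1, understood by non-netplan renderers (sysconfig, ENI, ...):
version: 1
config:
  - type: physical
    name: eth0
    subnets:
      - type: static
        address: 192.0.2.10/24
        gateway: 192.0.2.1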

Unable to create a cluster on a PVE node with sufficient memory

What steps did you take and what happened:

I followed the quickstart guide step by step, used the Proxmox VE builder to successfully create a PVE template, then configured ~/.cluster-api/clusterctl.yaml, and finally I used the following command to create the cluster:

clusterctl generate cluster proxmox-quickstart \
    --infrastructure proxmox \
    --kubernetes-version v1.27.8 \
    --control-plane-machine-count 1 \
    --worker-machine-count 3 > cluster.yaml

kubectl apply -f cluster.yaml

Then I received an error message:

E0120 05:46:40.670202       1 controller.go:324] "Reconciler error" err="failed to reconcile VM: cannot reserve 2147483648B of memory on node newpve: 0B available memory left" controller="proxmoxmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="ProxmoxMachine" ProxmoxMachine="default/proxmox-quickstart-control-plane-gnjd5" namespace="default" name="proxmox-quickstart-control-plane-gnjd5" reconcileID="bd2f39b8-51fb-4cce-bd4d-429d596a8e31"

What did you expect to happen:
Successfully created the cluster.

Anything else you would like to add:
#36 (comment)

Environment:

  • Cluster-api-provider-proxmox version: 0.1.1
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): Ubuntu 22.04

Unmount cloud-init ISO after machines are provisioned

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

The machines are started, but the cloud-init ISO stays mounted
and the go-proxmox+cloud-init tag is still there.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Unmount the CD-ROM and remove the tag.

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Allow plain IPv6 addresses as controlPlaneEndpoints

Describe the solution you'd like
At the moment, spec.controlPlaneEndpoint is treated as a hostname. This means that while IPv4 addresses work (they're valid hostnames), IPv6 addresses aren't:

admission webhook "validation.proxmoxcluster.infrastructure.cluster.x-k8s.io" denied the request: ProxmoxCluster.infrastructure.cluster.x-k8s.io "ipv6test01" is invalid: spec.controlplaneEndpoint: Invalid value: "2001:db8::6443": provided endpoint is not in a valid IP and port format

IPv6 addresses have to be encapsulated in square brackets ([2001:db8::]) to be valid hostnames. Unfortunately this causes further issues down the line.
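A minimal sketch of how the validation could accept plain IPv6 literals by adding the brackets before parsing (not the actual webhook code):

package webhook

import (
	"net"
	"net/netip"
	"strconv"
)

// endpointAddrPort joins host and port so that plain IPv6 literals validate:
// net.JoinHostPort adds the square brackets that netip.ParseAddrPort expects.
func endpointAddrPort(host string, port int32) (netip.AddrPort, error) {
	return netip.ParseAddrPort(net.JoinHostPort(host, strconv.Itoa(int(port))))
}

With this, both "2001:db8::" and "192.168.254.40" pass, and users never have to write the brackets themselves.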

Anything else you would like to add:
We already have existing deployments with the webhook disabled, so this is not optional.

Environment:

  • Cluster-api-provider-proxmox version: all
  • Kubernetes version: (use kubectl version): 1.28.6

Support DHCP

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Our current setup is based on IPAM and static IP allocation.
Since the QEMU guest agent is pre-installed, we can support DHCP in the network-config.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

PLEASE NOTE: since we rely on kube-vip for control planes, the CONTROL_PLANE_ENDPOINT shall remain static and must be set when creating a new cluster.
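For illustration, a DHCP-based network-config (version 2) as it might be rendered under this proposal, with the control-plane endpoint still handled statically by kube-vip:

version: 2
ethernets:
  eth0:
    dhcp4: true
    dhcp6: false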

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Fix dual-stack cluster-template

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

It seems the dual-stack flavor is not fully working because we forgot to add the CidrBlock entries to the Cluster CR (see the sketch below).
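A sketch of the missing piece on the Cluster CR, with example CIDRs only:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dual-stack-example
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 10.244.0.0/16
        - fd00:10:244::/56
    services:
      cidrBlocks:
        - 10.96.0.0/12
        - fd00:10:96::/112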

What did you expect to happen:

Cluster uses IPv6

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-proxmox version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
