
The LVM Operator - part of LVMS

For the latest information about usage and installation of LVMS (Logical Volume Manager Storage) in OpenShift, please refer to the official product documentation.

Overview

Use the LVM Operator with LVMCluster custom resources to deploy and manage LVM storage on OpenShift clusters.

The LVM Operator leverages the TopoLVM CSI Driver on the backend to dynamically create LVM physical volumes, volume groups, and logical volumes, and to bind them to PersistentVolumeClaim resources. This allows applications running on the cluster to consume storage from LVM logical volumes backed by the TopoLVM CSI Driver.

The LVM Operator, in conjunction with the TopoLVM CSI Driver, Volume Group Manager, and other related components, collectively comprise the Logical Volume Manager Storage (LVMS) solution.

Here is a brief overview of how the Operator works. See the repository documentation for the full architecture diagram.

graph LR
LVMOperator((LVMOperator))-->|Manages| LVMCluster
LVMOperator-->|Manages| StorageClass
StorageClass-->|Creates| PersistentVolumeA
StorageClass-->|Creates| PersistentVolumeB
PersistentVolumeA-->LV1
PersistentVolumeB-->LV2
LVMCluster-->|Comprised of|Disk1((Disk1))
LVMCluster-->|Comprised of|Disk2((Disk2))
LVMCluster-->|Comprised of|Disk3((Disk3))

subgraph Logical Volume Manager
  Disk1-->|Abstracted|PV1
  Disk2-->|Abstracted|PV2
  Disk3-->|Abstracted|PV3
  PV1-->VG
  PV2-->VG
  PV3-->VG
  LV1-->VG
  LV2-->VG
end

Deploying the LVM Operator

Due to the absence of a CI pipeline that builds this repository, you will need to either build it yourself or use a pre-built image that has been made available. Please note that the pre-built image may not be in sync with the current state of the repository.

Using the pre-built images

If you are comfortable using the pre-built images, simply proceed with the deployment steps.

Building the Operator yourself

To build the Operator, install Docker or Podman and log into your registry.
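For example, with Podman and quay.io as the target registry:

    $ podman login quay.io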

  1. Set the following environment variables to the repository where you want to host your image:

    $ export IMAGE_REGISTRY=<registry URL, e.g. quay.io>
    $ export REGISTRY_NAMESPACE=<registry-username>
    $ export IMAGE_TAG=<some-tag>
  2. Build and push the container image:

    $ make docker-build docker-push

Building the Operator for OLM deployment

If you intend to deploy the Operator using the Operator Lifecycle Manager (OLM), there are some additional steps you should follow.

  1. Build and push the bundle image:

    $ make bundle-build bundle-push
  2. Build and push the catalog image:

    $ make catalog-build catalog-push

Ensure that the OpenShift cluster has read access to that repository. Once this is complete, you are ready to proceed with the next steps.

Deploying the Operator

You can begin the deployment by running the following command:

$ make deploy

Deploying the Operator with OLM

You can begin the deployment using the Operator Lifecycle Manager (OLM) by running the following command:

$ make deploy-with-olm

The process involves the creation of several resources to deploy the Operator using OLM. These include a custom CatalogSource to define the Operator source, the openshift-storage namespace to contain the Operator components, an OperatorGroup to manage the lifecycle of the Operator, a Subscription to subscribe to the Operator catalog in the openshift-storage namespace, and finally, the creation of a ClusterServiceVersion to describe the Operator's capabilities and requirements.
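
For reference, the Subscription created by this step looks roughly like the following. The exact CatalogSource, channel, and package names are generated by the Makefile and may differ, so treat this as an illustrative sketch rather than the generated manifest:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: lvms-operator
  namespace: openshift-storage
spec:
  channel: alpha                      # illustrative channel name
  name: lvms-operator                 # package name in the custom catalog
  source: lvms-catalogsource          # illustrative CatalogSource name
  sourceNamespace: openshift-storage
  installPlanApproval: Automatic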

Wait until the ClusterServiceVersion (CSV) reaches the Succeeded status:

$ kubectl get csv -n openshift-storage

NAME                   DISPLAY       VERSION   REPLACES   PHASE
lvms-operator.v0.0.1   LVM Storage   0.0.1                Succeeded

After the previous command has completed successfully, switch over to the openshift-storage namespace:

$ oc project openshift-storage

Wait until all pods have started running:

$ oc get pods -w

Once all pods are running, create a sample LVMCluster custom resource (CR):

$ oc create -n openshift-storage -f https://github.com/openshift/lvm-operator/raw/main/config/samples/lvm_v1alpha1_lvmcluster.yaml

After the CR is deployed, the following actions are executed:

  • A Logical Volume Manager (LVM) volume group named vg1 is created, utilizing all available disks on the cluster.
  • A thin pool named thin-pool-1 is created within vg1, with a size equivalent to 90% of vg1.
  • The TopoLVM Container Storage Interface (CSI) plugin is deployed, resulting in the launch of the topolvm-controller and topolvm-node pods.
  • A storage class and a volume snapshot class are created, both named lvms-vg1. This facilitates storage provisioning for OpenShift workloads. The storage class is configured with the WaitForFirstConsumer volume binding mode that is utilized in a multi-node configuration to optimize the scheduling of pod placement. This strategy prioritizes the allocation of pods to nodes with the greatest amount of available storage capacity.
  • The LVMS system also creates two additional internal CRs to support its functionality:
    • LVMVolumeGroup is generated and managed by LVMS to monitor the individual volume groups across multiple nodes in the cluster.
    • LVMVolumeGroupNodeStatus is created by the Volume Group Manager. This CR is used to monitor the status of volume groups on individual nodes in the cluster.
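
The sample CR is roughly equivalent to the following illustrative sketch (field names can vary between LVMS versions, so prefer the sample file linked above):

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90        # thin pool uses 90% of the volume group
          overprovisionRatio: 10 # illustrative; check the sample file for the actual value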

Wait until the LVMCluster reaches the Ready status:

$ oc get lvmclusters.lvm.topolvm.io my-lvmcluster

NAME            STATUS
my-lvmcluster   Ready

Wait until all pods are active:

$ oc get pods -w

The topolvm-node pod remains in the initialization phase until the vg-manager completes all the necessary preparations.

Once all the pods have been launched, the LVMS is ready to manage your logical volumes and make them available for use in your applications.

Inspecting the storage objects on the node

Prior to the deployment of the Logical Volume Manager Storage (LVMS), there are no pre-existing LVM physical volumes, volume groups, or logical volumes associated with the disks.

sh-4.4# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb       8:16   0 893.8G  0 disk
|-sdb1    8:17   0     1M  0 part
|-sdb2    8:18   0   127M  0 part
|-sdb3    8:19   0   384M  0 part /boot
`-sdb4    8:20   0 893.3G  0 part /sysroot
sr0      11:0    1   987M  0 rom
nvme0n1 259:0    0   1.5T  0 disk
nvme1n1 259:1    0   1.5T  0 disk
nvme2n1 259:2    0   1.5T  0 disk
sh-4.4# pvs
sh-4.4# vgs
sh-4.4# lvs

After successful deployment, the necessary LVM physical volumes, volume groups, and thin pools are created on the host.

sh-4.4# pvs
  PV           VG  Fmt  Attr PSize  PFree
  /dev/nvme0n1 vg1 lvm2 a--  <1.46t <1.46t
  /dev/nvme1n1 vg1 lvm2 a--  <1.46t <1.46t
  /dev/nvme2n1 vg1 lvm2 a--  <1.46t <1.46t
sh-4.4# vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  vg1   3   0   0 wz--n- <4.37t <4.37t
sh-4.4# lvs
  LV          VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  thin-pool-1 vg1 twi-a-tz-- <3.93t             0.00   1.19

Testing the Operator

Once you have completed the deployment steps, you can proceed to create a basic test application that will consume storage.

To initiate the process, create a Persistent Volume Claim (PVC):

$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvms-test
  labels:
    type: local
spec:
  storageClassName: lvms-vg1
  resources:
    requests:
      storage: 5Gi
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
EOF

Upon creation, you may observe that the PVC remains in a Pending state.

$ oc get pvc

NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
lvms-test   Pending                                      lvms-vg1       7s

This behavior is expected as the storage class awaits the creation of a pod that requires the PVC.
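
You can confirm the binding mode on the storage class that LVMS created:

$ oc get sc lvms-vg1 -o jsonpath='{.volumeBindingMode}'
WaitForFirstConsumer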

To move forward, create a pod that can utilize this PVC:

$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: lvms-test
spec:
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: lvms-test
  containers:
    - name: container
      image: public.ecr.aws/docker/library/nginx:latest
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: storage
EOF

Once the pod has been created and associated with the corresponding PVC, the PVC is bound, and the pod transitions to the Running state.

$ oc get pvc,pods

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/lvms-test   Bound    pvc-a37ef71c-a9b9-45d8-96e8-3b5ad30a84f6   5Gi        RWO            lvms-vg1       3m2s

NAME            READY   STATUS    RESTARTS   AGE
pod/lvms-test   1/1     Running   0          28s

Cleanup

To perform a full cleanup, follow these steps:

  1. Remove all the application pods that use PVCs created with LVMS, and then remove those PVCs (see the example after this list).

  2. Ensure that there are no remaining LogicalVolume custom resources that were created by LVMS.

    $ oc get logicalvolumes.topolvm.io
    No resources found
  3. Remove the LVMCluster CR.

    $ oc delete lvmclusters.lvm.topolvm.io my-lvmcluster
    lvmcluster.lvm.topolvm.io "my-lvmcluster" deleted

    If the previous command is stuck, it may be necessary to perform a forced cleanup procedure.

  4. Verify that the only remaining resource in the openshift-storage namespace is the Operator.

    $ oc get pods -n openshift-storage
    NAME                                 READY   STATUS    RESTARTS   AGE
    lvms-operator-8bf864c85-8zjlp        3/3     Running   0          125m
  5. To begin the undeployment process of LVMS, use the following command:

    $ make undeploy
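
For the test application created earlier in this document, step 1 amounts to deleting the pod and its PVC:

$ oc delete pod/lvms-test pvc/lvms-test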

E2E Tests

There are a few steps required to run the end-to-end tests for LVMS.

You will need the following environment variables set:

IMAGE_REGISTRY={{REGISTRY_URL}} # Ex: quay.io
REGISTRY_NAMESPACE={{REGISTRY_NAMESPACE}} # Ex: lvms-dev, this should be your own personal namespace

Once the environment variables are set, you can run

# build and deploy your local code to the cluster
$ make deploy-local

# Wait for the lvms-operator to have status=Running
$ oc -n openshift-storage get pods
# NAME                             READY   STATUS    RESTARTS   AGE
# lvms-operator-579fbf46d5-vjwhp   3/3     Running   0          3m27s

# run the e2e tests
$ make e2e

# undeploy the operator from the cluster
$ make undeploy

Metrics

To enable monitoring on OpenShift clusters, assign the openshift.io/cluster-monitoring label to the same namespace that you deployed LVMS to.

$ oc patch namespace/openshift-storage -p '{"metadata": {"labels": {"openshift.io/cluster-monitoring": "true"}}}'

LVMS provides TopoLVM metrics and controller-runtime metrics, which can be accessed via OpenShift Console.

Known Limitations

Unsupported Device Types

Here is a list of the types of devices that are excluded by LVMS. To get more information about the devices on your machine and to check if they fall under any of these filters, run:

$ lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE
  1. Read-Only Devices:

    • Condition: Devices marked as read-only are unsupported.
    • Why: LVMS requires the ability to write and modify data dynamically, which is not possible with devices set to read-only mode.
    • Filter: ro is set to true.
  2. Suspended Devices:

    • Condition: Devices in a suspended state are unsupported.
    • Why: A suspended state implies that a device is temporarily inactive or halted, and attempting to incorporate such devices into LVMS can introduce complexities and potential issues.
    • Filter: state is suspended.
  3. Devices with Invalid Partition Labels:

    • Condition: Devices with partition labels such as bios, boot, or reserved are unsupported.
    • Why: These labels indicate reserved or specialized functionality associated with specific system components. Attempting to use such devices within LVMS may lead to unintended consequences, as these labels may be reserved for system-related activities.
    • Filter: partlabel has either bios, boot, or reserved.
  4. Devices with Invalid Filesystem Signatures:

    • Condition: Devices with invalid filesystem signatures are unsupported. This includes:
      • Devices with a filesystem type set to LVM2_member (only valid if no children).
      • Devices with no free capacity as a physical volume.
      • Devices already part of another volume group.
    • Why: These conditions indicate that the device is either already used by another volume group or has no free capacity to be used within LVMS.
    • Filter: fstype is not null, or fstype is set to LVM2_member and has children block devices, or pvs --units g -v --reportformat json returns pv_free for the block device set to 0G.
  5. Devices with Children:

    • Condition: Devices with children block devices are unsupported.
    • Why: LVMS operates optimally with standalone block devices that are not part of a hierarchical structure. Devices with children can complicate volume management, potentially causing conflicts, errors, or difficulties in tracking and managing logical volumes.
    • Filter: the children field is not empty (the device has child block devices).
  6. Devices with Bind Mounts:

    • Condition: Devices with bind mounts are unsupported.
    • Why: Managing logical volumes becomes more complex when dealing with devices that have bind mounts, potentially causing conflicts or difficulties in maintaining the integrity of the logical volume setup.
    • Filter: cat /proc/1/mountinfo | grep <device-name> returns mount points for the device in the 4th or 10th field.
  7. ROM Devices:

    • Condition: Devices of type rom are unsupported.
    • Why: Such devices are designed for static data storage and lack the necessary read-write capabilities essential for dynamic operations performed by LVMS.
    • Filter: type is set to rom.
  8. LVM Partitions:

    • Condition: Devices of type LVM partition are unsupported.
    • Why: These partitions are already dedicated to LVM and are managed as part of an existing volume group.
    • Filter: type is set to lvm.
  9. Loop Devices:

    • Condition: Loop Devices must not be used if they are already in use by Kubernetes.
    • Why: When loop devices are utilized by Kubernetes, they are likely configured for specific tasks or processes managed by the Kubernetes environment. Integrating loop devices that are already in use by Kubernetes into LVMS can lead to potential conflicts and interference with the Kubernetes system.
    • Filter: type is set to loop, and losetup <loop-device> -O BACK-FILE --json returns a back-file which contains plugins/kubernetes.io.

Devices meeting any of these conditions are filtered out for LVMS operations.
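
As a rough, illustrative check (not something LVMS itself runs), the following one-liner lists devices that match some of the simpler filters above. Note that the JSON field encoding varies between util-linux versions (for example, ro may be reported as the strings "0"/"1" rather than booleans), so adjust the jq expression accordingly:

$ lsblk --paths --json -o NAME,TYPE,RO,STATE,PARTLABEL,FSTYPE \
    | jq -r '.blockdevices[]
        | select(.ro == true or .type == "rom" or .type == "lvm" or .children != null)
        | .name'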

NOTE: It is strongly recommended to perform a thorough wipe of a device before using it within LVMS to proactively prevent unintended behaviors or potential issues.

Single LVMCluster support

LVMS does not support the reconciliation of multiple LVMCluster custom resources simultaneously.

Upgrades from v4.10 and v4.11

It is not possible to upgrade from release-4.10 or release-4.11 to a newer version due to a breaking change. For further information on this matter, consult the relevant documentation.

Missing native LVM RAID Configuration support

Currently, the LVM Operator forces all LVMClusters to work with thinly provisioned volumes in order to support snapshotting and cloning of PVCs. This is backed by an LVM logical volume of type thin, which is reflected in the LVM volume attributes. LVM's built-in RAID capabilities conflict with this thin attribute, because the same flag also indicates whether a volume is part of an LVM RAID configuration (the r or R flag). This means the only way to support RAID from within LVM would be to convert two RAID arrays into a thin pool with lvconvert, after which the RAID is no longer recognized by LVM (due to said conflict in the volume attributes). While this would enable initial synchronization and redundancy, repair and extend operations would no longer respect the RAID topology in the volume group, and operations such as lvconvert --repair would no longer be supported at all. Recovering from such a situation would therefore be quite complex.

Instead of LVM-based RAID, we recommend using the Linux mdraid subsystem. Simply create a RAID array with mdadm and then use it in your deviceSelector within LVMCluster:

  1. For a simple RAID1, you could use mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
  2. Then you can reference /dev/md0 in the deviceSelector as normal (see the sketch after this list)
  3. Any recovery and syncing will then happen with mdraid: replacing disks and repairing will work transparently to LVMS and can be handled by the node's sysadmin.

NOTE: Currently, RAID Arrays created with mdraid are not automatically recognized when not using any deviceSelector, thus they MUST be specified explicitly.
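
An illustrative deviceClasses entry that references the mdraid device could look like the following sketch; check the LVMCluster API reference for the exact field names in your version:

spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        deviceSelector:
          paths:
            - /dev/md0        # the mdadm RAID array created above
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90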

Missing LV-level encryption support

Currently, the LVM Operator does not have native LV-level encryption support. Instead, you can encrypt the entire disk or partition and use it within LVMCluster. This way, all LVs created by LVMS on this disk are encrypted out of the box.

Here is an example MachineConfig that can be used to configure encrypted partitions during an OpenShift installation:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 98-encrypted-disk-partition-master
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      disks:
        - device: /dev/nvme0n1
          wipeTable: false
          partitions:
            - sizeMiB: 204800
              startMiB: 600000
              label: application
              number: 5
      luks:
        - clevis:
            tpm2: true
          device: /dev/disk/by-partlabel/application
          name: application
          options:
          - --cipher
          - aes-cbc-essiv:sha256
          wipeVolume: true

Then, the path to the encrypted partition /dev/mapper/application can be specified in the deviceSelector.

For non-OpenShift clusters, you can encrypt a disk using LUKS with cryptsetup, and then use this in your deviceSelector within LVMCluster:

  1. Set up the /dev/sdb device for encryption. This will also remove all the data on the device:

    cryptsetup -y -v luksFormat /dev/sdb

    You'll be prompted to set a passphrase to unlock the volume.

  2. Create a logical device-mapper device named encrypted, mounted to the LUKS-encrypted device:

    cryptsetup luksOpen /dev/sdb encrypted

    You'll be prompted to enter the passphrase you set when creating the volume.

  3. You can now reference /dev/mapper/encrypted in the deviceSelector.

Snapshotting and Cloning in Multi-Node Topologies

In general, since LVMCluster does not ensure data replication, VolumeSnapshots and their consumers are always limited to the original dataSource. Thus, snapshots must be created on the same node as the original data, and all pods relying on a PVC that uses the snapshot data have to be scheduled on the node that contained the original LogicalVolume in TopoLVM.

It should be noted that snapshotting is based on thin-pool snapshots from upstream TopoLVM and is still considered experimental upstream. In multi-node Kubernetes clusters, the scheduler decides pod placement logically across nodes (based on the node topology known to the native Kubernetes scheduler), while snapshot provisioning follows the CSI topology known to TopoLVM; as a result, it cannot always be guaranteed that snapshots are provisioned on the same node as the original data if the PersistentVolumeClaim is not created upfront.

If you are unsure what to make of this, always make sure that the original PersistentVolumeClaim that you want to snapshot is already created and Bound. With these prerequisites in place, all follow-up VolumeSnapshot objects, as well as PersistentVolumeClaim objects depending on the original one, are guaranteed to be scheduled correctly. The easiest way to achieve this is to use pre-created PersistentVolumeClaims and a non-ephemeral StatefulSet for your workload.
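
For example, once the lvms-test PVC from the testing section above is Bound, a snapshot of it can be requested with the standard VolumeSnapshot API and the lvms-vg1 volume snapshot class (the snapshot name here is illustrative):

$ cat <<EOF | oc apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: lvms-test-snapshot
spec:
  volumeSnapshotClassName: lvms-vg1
  source:
    persistentVolumeClaimName: lvms-test
EOF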

NOTE: All of the above also applies for cloning the PersistentVolumeClaims directly by using the original PersistentVolumeClaims as data source instead of using a Snapshot.

Validation of LVMCluster CRs outside the openshift-storage namespace

When an LVMCluster CR is created outside the openshift-storage namespace after installing the Operator via ClusterServiceVersion, the Operator is not able to validate the CR. This is because the ValidatingWebhookConfiguration is restricted to the openshift-storage namespace and does not have access to LVMCluster CRs in other namespaces. Thus, the Operator cannot prevent the creation of invalid LVMCluster CRs outside the openshift-storage namespace; however, it also does not pick up such CRs and simply ignores them.

This is because the Operator Lifecycle Manager (OLM) does not allow a ClusterServiceVersion with installMode OwnNamespace to ship a webhook configuration that is not restricted to that namespace. Validation in the openshift-storage namespace is processed normally.

Troubleshooting

See the troubleshooting guide.

Contributing

See the contribution guide.

lvm-operator's People

Contributors

agarwal-mudit, aruniiird, brandisher, emmahone, iamniting, jakobmoellerdev, javierpena, jeff-roche, jmolmo, leelavg, madhu-1, mulbc, nbalacha, odedviner, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, qjkee, riya-singhal31, rohantmp, sp98, suleymanakbas91


lvm-operator's Issues

components stuck in init

Hello, after deploying the operator and an LVMCluster, the pods fail to start:

[root@cnf10-worker-0 ~]# oc get pod -n odf-lvm
NAME                                  READY   STATUS             RESTARTS        AGE
controller-manager-765f44745b-hgcpn   3/3     Running            0               31m
topolvm-controller-5ffdc8cd9f-sktg9   4/4     Running            8 (7m41s ago)   31m
topolvm-node-8ffm7                    0/4     Init:0/1           0               31m
topolvm-node-w2rvd                    0/4     Pending            0               31m
topolvm-node-w5s7w                    0/4     Init:0/1           0               31m
vg-manager-8mvn8                      0/1     CrashLoopBackOff   7 (4m4s ago)    31m
vg-manager-mg9xj                      0/1     CrashLoopBackOff   7 (4m19s ago)   31m
vg-manager-wvhbv                      0/1     CrashLoopBackOff   7 (3m53s ago)   31m
[root@cnf10-worker-0 ~]# oc describe pod  -n odf-lvm topolvm-node-8ffm7
Name:         topolvm-node-8ffm7
Namespace:    odf-lvm
Priority:     0
Node:         ci-ovirt-master-0.karmalabs.com/10.19.135.249
Start Time:   Wed, 20 Apr 2022 13:53:03 -0400
Labels:       app=topolvm-node
              controller-revision-hash=5685697cf9
              pod-template-generation=1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.133.0.246"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.133.0.246"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: odf-lvm-topolvm-node
Status:       Pending
IP:           10.133.0.246
IPs:
  IP:           10.133.0.246
Controlled By:  DaemonSet/topolvm-node
Init Containers:
  file-checker:
    Container ID:  cri-o://a8a17b40bc03851f13063e7bb245e4a0214b39411a54ab1ebfabec0b634ef14b
    Image:         registry.redhat.io/odf4/odf-lvm-rhel8-operator@sha256:2bad9a3ab52faf43f8f5258c64ea6734ab40114addfdde116c0bd27d9088bf49
    Image ID:      registry.redhat.io/odf4/odf-lvm-rhel8-operator@sha256:2bad9a3ab52faf43f8f5258c64ea6734ab40114addfdde116c0bd27d9088bf49
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/bash
      -c
      until [ -f /etc/topolvm/lvmd.yaml ]; do echo waiting for lvmd config file; sleep 5; done
    State:          Running
      Started:      Wed, 20 Apr 2022 13:53:14 -0400
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/topolvm from lvmd-config-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pck6f (ro)
Containers:
  lvmd:
    Container ID:
    Image:         registry.redhat.io/odf4/odf-topolvm-rhel8@sha256:4fb7b673d4a14021df0ad89cd99eed68dd837163bfc32aa8dc8b3eb10d60acee
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /lvmd
      --config=/etc/topolvm/lvmd.yaml
      --container=true
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  250Mi
    Requests:
      cpu:        250m
      memory:     250Mi
    Environment:  <none>
    Mounts:
      /etc/topolvm from lvmd-config-dir (rw)
      /run/lvmd from lvmd-socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pck6f (ro)
  topolvm-node:
    Container ID:
    Image:         registry.redhat.io/odf4/odf-topolvm-rhel8@sha256:4fb7b673d4a14021df0ad89cd99eed68dd837163bfc32aa8dc8b3eb10d60acee
    Image ID:
    Port:          9808/TCP
    Host Port:     0/TCP
    Command:
      /topolvm-node
      --lvmd-socket=/run/lvmd/lvmd.sock
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  250Mi
    Requests:
      cpu:     250m
      memory:  250Mi
    Liveness:  http-get http://:healthz/healthz delay=10s timeout=3s period=60s #success=1 #failure=3
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /run/lvmd from lvmd-socket-dir (rw)
      /run/topolvm from node-plugin-dir (rw)
      /var/lib/kubelet/plugins/kubernetes.io/csi from csi-plugin-dir (rw)
      /var/lib/kubelet/pods from pod-volumes-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pck6f (ro)
  csi-registrar:
    Container ID:
    Image:         registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:3308ef98afab494b80aa1a702924407cf114bce6e0ad92436e508d7dc951521c
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/run/topolvm/csi-topolvm.sock
      --kubelet-registration-path=/var/lib/kubelet/plugins/topolvm.cybozu.com/node/csi-topolvm.sock
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /registration from registration-dir (rw)
      /run/topolvm from node-plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pck6f (ro)
  liveness-probe:
    Container ID:
    Image:         registry.redhat.io/openshift4/ose-csi-livenessprobe@sha256:6b40bb1cb5bffc8e8689b8d01e43096a2d57981aa20ae7859618054ed3800bd7
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/run/topolvm/csi-topolvm.sock
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /run/topolvm from node-plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pck6f (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  Directory
  node-plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/topolvm.cybozu.com/node
    HostPathType:  DirectoryOrCreate
  csi-plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/kubernetes.io/csi
    HostPathType:  DirectoryOrCreate
  pod-volumes-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods/
    HostPathType:  DirectoryOrCreate
  lvmd-config-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/topolvm
    HostPathType:  Directory
  lvmd-socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  kube-api-access-pck6f:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       31m   default-scheduler  Successfully assigned odf-lvm/topolvm-node-8ffm7 to ci-ovirt-master-0.karmalabs.com
  Normal  AddedInterface  31m   multus             Add eth0 [10.133.0.246/23] from openshift-sdn
  Normal  Pulling         31m   kubelet            Pulling image "registry.redhat.io/odf4/odf-lvm-rhel8-operator@sha256:2bad9a3ab52faf43f8f5258c64ea6734ab40114addfdde116c0bd27d9088bf49"
  Normal  Pulled          30m   kubelet            Successfully pulled image "registry.redhat.io/odf4/odf-lvm-rhel8-operator@sha256:2bad9a3ab52faf43f8f5258c64ea6734ab40114addfdde116c0bd27d9088bf49" in 7.936196491s
  Normal  Created         30m   kubelet            Created container file-checker
  Normal  Started         30m   kubelet            Started container file-checker

got an error when making deploy

Hi All,

I got an error when running make deploy. Does anyone know how to fix it? Is it a bug? Thanks!

output rules (optionally as output:<generator>:...)

+output:artifacts[:code=<string>],config=<string>  package  outputs artifacts to different locations, depending on whether they're package-associated or not.   
+output:dir=<string>                               package  outputs each artifact to the given directory, regardless of if it's package-associated or not.      
+output:none                                       package  skips outputting anything.                                                                          
+output:stdout                                     package  outputs everything to standard-out, with no separation.                                             

run `controller-gen rbac:roleName=manager-role crd webhook paths=./... output:crd:artifacts:config=config/crd/bases -w` to see all available markers, or `controller-gen rbac:roleName=manager-role crd webhook paths=./... output:crd:artifacts:config=config/crd/bases -h` for usage
make: *** [Makefile:92: manifests] Error 1

Quick starting with the operator

Hi there,
I am starting to play with this operator with the idea to add a Dynamic Storage Provisioning to an SNO. I managed to build and deploy the operator (I have created and pushed my own operator image, because the default one points to a quay.io closed repo).
It seems the operator is working okay:

$> oc get pods
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-66b84d759f-9zpv9   3/3     Running   0          2m54s

also I created a first StorageClass according to the documentation from topolvm:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topolvm-provisioner
provisioner: topolvm.cybozu.com
parameters:
  "csi.storage.k8s.io/fstype": "xfs"
  "topolvm.cybozu.com/device-class": "ssd"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

$ oc get sc
NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-sc                        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  2d4h
topolvm-provisioner (default)   topolvm.cybozu.com             Delete          WaitForFirstConsumer   true                   8m46s

I have tried to create a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-pv-claim
spec:
  storageClassName: topolvm-provisioner
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

but it remains Pending (waiting for a pod, which is expected) and no PV is created:

$ oc get pvc
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS          AGE
topolvm-pv-claim   Pending                                      topolvm-provisioner   2m43s
$> oc describe pvc topolvm-pv-claim
Name:          topolvm-pv-claim
Namespace:     lvm-operator-system
StorageClass:  topolvm-provisioner
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: topolvm.cybozu.com
               volume.kubernetes.io/selected-node: master-0.apollo2.hpecloud.org
               volume.kubernetes.io/storage-provisioner: topolvm.cybozu.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       task-pv-pod
Events:
  Type    Reason                Age                   From                         Message
  ----    ------                ----                  ----                         -------
  Normal  WaitForFirstConsumer  3m9s (x2 over 3m19s)  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  9s (x14 over 3m6s)    persistentvolume-controller  waiting for a volume to be created, either by external provisioner "topolvm.cybozu.com" or manually created by system administrator

Should the external provisioner create the PV? Looking at the operator logs, it does not seem to be aware of these resources, so I imagine I have to create resources of kind LogicalVolume or LVMCluster.

But now I am not sure how to proceed. I guess I have to create a LogicalVolume pointing to one local volume in my node.
Please, some examples or quickstart would be really appreciated.

Resources documentation

Hi there,

would it be possible to have short documentation about the different resources managed by the operator? The installation document covers the LVMCluster resource, which is easy to understand in terms of how to use it and what it does. But would it be possible to learn a little about LogicalVolume, LVMVolumeGroupNodeStatus, etc.?

many thanks,

No new disks are added to the LVMCluster/LVMVolumeGroupNodeStatuses

I had one SNO with an LVMCluster created to manage the VG vg1 (/dev/nvme0n1, /dev/nvme1n1, /dev/sda). I wanted to test the addition of new disks. So, I rebooted and created some more disks from the raid I have in the server.

After creating the new disks and rebooting the SNO:

  • The new disks (/dev/sdc, /dev/sdd) have been added to the vg1 in the node:
[root@master-0 core]# vgs
  VG  #PV #LV #SN Attr   VSize VFree 
  vg1   5   4   0 wz--n- 6.00t <6.00t
[root@master-0 core]# pvs
  PV           VG  Fmt  Attr PSize   PFree  
  /dev/nvme0n1 vg1 lvm2 a--  745.21g 745.21g
  /dev/nvme1n1 vg1 lvm2 a--  745.21g 745.21g
  /dev/sda     vg1 lvm2 a--   <2.73t   2.72t
  /dev/sdc     vg1 lvm2 a--  931.48g 931.48g
  /dev/sde     vg1 lvm2 a--  931.48g 931.48g

  • These new disks are not recognized by the LVMCluster/LVMVolumeGroupNodeStatuses:
$ oc get lvmvolumegroupnodestatuses -o yaml
apiVersion: v1
items:
- apiVersion: lvm.topolvm.io/v1alpha1
  kind: LVMVolumeGroupNodeStatus
  metadata:
    creationTimestamp: "2022-02-05T18:58:01Z"
    generation: 1
    name: master-0.apollo2.hpecloud.org
    namespace: lvm-operator-system
    resourceVersion: "6276721"
    uid: b6cc0395-e17c-4116-941f-9b95b8c4ed82
  spec:
    nodeStatus:
    - devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
      - /dev/sda
      name: vg1
      status: Ready
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ oc get lvmcluster  lvmcluster-sample -o yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"lvm.topolvm.io/v1alpha1","kind":"LVMCluster","metadata":{"annotations":{},"name":"lvmcluster-sample","namespace":"lvm-operator-system"},"spec":{"deviceClasses":[{"name":"vg1"}]}}
  creationTimestamp: "2022-02-05T18:56:54Z"
  finalizers:
  - lvmcluster.topolvm.io
  generation: 1
  name: lvmcluster-sample
  namespace: lvm-operator-system
  resourceVersion: "6276722"
  uid: fb91728e-1300-4fe1-9041-78d5a8e166d6
spec:
  deviceClasses:
  - name: vg1
status:
  deviceClassStatuses:
  - name: vg1
    nodeStatus:
    - devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
      - /dev/sda
      node: master-0.apollo2.hpecloud.org
      status: Ready
  ready: true

I created a second LVMCluster because I thought it would pick up the new disks. Now I understand that only one LVMCluster is supported; could this have interfered?

Migrate the sqlite based catalog to file based catalogs

WARN[0000] DEPRECATION NOTICE:
Sqlite-based catalogs and their related subcommands are deprecated. Support for
them will be removed in a future release. Please migrate your catalog workflows
to the new file-based catalog format.

lsblk detects my mini-SAS HD connected drives as removable

vg-manager logs:

{"level":"info","ts":1644897063.4425514,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sdc","filter.Name":"notRemovable"}
{"level":"info","ts":1644897063.449244,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sdd","filter.Name":"notRemovable"}```

After operator upgrade, PVCs cannot be processed due to provisioner change

I upgraded the operator with the make deploy command. The old storage class had the provisioner topolvm.cybozu.com; it is now topolvm.io, but it was not updated in my existing storage class. After all pods restarted, they cannot mount their PVCs (the error mentions topolvm.cybozu.com), and all PVs cannot be resolved with the error: kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name topolvm.cybozu.com not found in the list of registered CSI drivers. How can I correctly upgrade the storage class? Is it possible to migrate the existing PVCs without losing data, and how can such unexpected issues be avoided in the future?

Deployment fails due to dockerhub pull rate limit

The LVMCluster creation is stuck in a loop, because the topolvm-controller waits for its initContainer, which wants to pull "alpine/openssl".
It fails to pull this container from Docker Hub (the only location where it is available) due to Docker Hub's pull rate limit:

Failed to pull image "alpine/openssl": rpc error: code = Unknown desc = initializing source docker://alpine/openssl:latest: reading manifest latest in docker.io/alpine/openssl: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Can we use a different image? As a quick workaround, I rehosted the image at quay.io/mulbc/alpine-openssl

topolvm container getting OOM killed

From the topolvm-node pod:

  • containerID: cri-o://ad1270fd321439c82dfad3a0775065e15cc3ce96ce3d085dc94c66c5a2138302
    image: quay.io/topolvm/topolvm:0.10.3
    imageID: quay.io/topolvm/topolvm@sha256:4fbbac323f1cc2310d717efa76b9205f1a79d360291be2efd08eb2d1f6971ca2
    lastState:
    terminated:
    containerID: cri-o://ea27712cbf16533dff038f2a1871f24868f4a3bab6d264342c1e4c25c1e57b34
    exitCode: 137
    finishedAt: "2022-01-10T13:10:53Z"
    reason: OOMKilled
    startedAt: "2022-01-10T13:10:36Z"

vg-manager failed to create/extend volume group.

Using the pre-built LVM Operator image.

sh-4.4# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 250G 0 disk
|-sda1 8:1 0 1M 0 part
|-sda2 8:2 0 127M 0 part
|-sda3 8:3 0 384M 0 part
`-sda4 8:4 0 249.5G 0 part /dev/termination-log
sdb 8:16 0 250G 0 disk
sr0 11:0 1 1024M 0 rom

Pod log:
{"level":"info","ts":1652381771.6423192,"logger":"controller.lvmvolumegroup.vg-manager","msg":"reconciling","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","lvmvolumegroup":"lvm-operator-system/vg1"}
{"level":"info","ts":1652381771.6431682,"logger":"controller.lvmvolumegroup.vg-manager","msg":"getting block devices for volumegroup","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","VGName":"vg1"}
{"level":"info","ts":1652381771.6524365,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sda","filter.Name":"noChildren"}
{"level":"info","ts":1652381771.6624043,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sr0","filter.Name":"usableDeviceType"}
{"level":"info","ts":1652381771.6625292,"logger":"controller.lvmvolumegroup.vg-manager","msg":"lvmd config file doesn't exist, will create","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system"}
{"level":"info","ts":1652381771.7175312,"logger":"controller.lvmvolumegroup.vg-manager","msg":"creating a new volume group","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","VGName":"vg1"}
{"level":"error","ts":1652381771.7751362,"logger":"controller.lvmvolumegroup.vg-manager","msg":"failed to create/extend volume group","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","VGName":"vg1","error":"failed to create or extend volume group "vg1". exit status 5","stacktrace":"github.com/red-hat-storage/lvm-operator/pkg/vgmanager.(*VGReconciler).Reconcile\n\t/workspace/pkg/vgmanager/vgmanager_controller.go:97\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

oc describe pod vg-manager-9zjnk
Name: vg-manager-9zjnk
Namespace: lvm-operator-system
Priority: 0
Node: mx-sno-ocp/192.168.5.89
Start Time: Thu, 12 May 2022 18:53:41 +0000
Labels: app.lvm.openshift.io=vg-manager
controller-revision-hash=749865ff7b
pod-template-generation=1
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.128.1.3/23"],"mac_address":"0a:58:0a:80:01:03","gateway_ips":["10.128.0.1"],"ip_address":"10.128.1.3/23","...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.128.1.3"
],
"mac": "0a:58:0a:80:01:03",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.128.1.3"
],
"mac": "0a:58:0a:80:01:03",
"default": true,
"dns": {}
}]
openshift.io/scc: odf-lvm-vgmanager
Status: Running
IP: 10.128.1.3
IPs:
IP: 10.128.1.3
Controlled By: DaemonSet/vg-manager
Containers:
vg-manager:
Container ID: cri-o://48330dd0356d83e383da84159ffd544c1a17b68f3bf778ace9f407e43a76947a
Image: quay.io/ocs-dev/lvm-operator:latest
Image ID: quay.io/ocs-dev/lvm-operator@sha256:642717a0be4c9fbb5be6b3c9891bcaa16634dadacb7a269996722af0bb8e7fb6
Port:
Host Port:
Command:
/vgmanager
State: Running
Started: Thu, 12 May 2022 18:53:43 +0000
Ready: True
Restart Count: 0
Environment:
NODE_NAME: (v1:spec.nodeName)
POD_NAMESPACE: lvm-operator-system (v1:metadata.namespace)
POD_NAME: vg-manager-9zjnk (v1:metadata.name)
Mounts:
/dev from device-dir (rw)
/etc/topolvm from lvmd-conf (rw)
/run/udev from run-udev (rw)
/sys from sys (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qt64z (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
lvmd-conf:
Type: HostPath (bare host directory volume)
Path: /etc/topolvm
HostPathType: DirectoryOrCreate
device-dir:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType: Directory
run-udev:
Type: HostPath (bare host directory volume)
Path: /run/udev
HostPathType:
sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
kube-api-access-qt64z:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional:
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message


Normal Scheduled 7m11s default-scheduler Successfully assigned lvm-operator-system/vg-manager-9zjnk to mx-sno-ocp
Normal AddedInterface 7m9s multus Add eth0 [10.128.1.3/23] from ovn-kubernetes
Normal Pulling 7m9s kubelet Pulling image "quay.io/ocs-dev/lvm-operator:latest"
Normal Pulled 7m9s kubelet Successfully pulled image "quay.io/ocs-dev/lvm-operator:latest" in 478.429234ms
Normal Created 7m9s kubelet Created container vg-manager
Normal Started 7m9s kubelet Started container vg-manager

The image referenced in the documentation lacks required utils (lsblk)

When trying to deploy the operator using the prebuilt image referenced in the documentation (quay.io/mulbc/lvm-operator), everything seems to work fine until you create the LVMCluster resource. All pods are started, but when creating a pvc and a pod, nothing happens.

Looking at the pod logs, I found the following in the vg-manager pod:

{"level":"error","ts":1644504805.3839536,"logger":"controller.lvmvolumegroup","msg":"Reconciler error","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vglvmoperator","namespace":"lvm-operator-system","error":"failed to list block devices: exec: "lsblk": executable file not found in $PATH","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

It looks like the image is based on Fedora 35, and lacks the util-linux package that provides lsblk.

I managed to build the images manually (replacing centos:8 with centos:stream8 in the Dockerfile definition), and then everything worked as expected.

[VgManager] Cannot use /dev/dm-0: device is not in a usable state

vgManager fails to filter dm devices

{"level":"error","ts":1640139614.0910594,"logger":"controller.lvmcluster.vg-manager","msg":"could not prepare volume group","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMCluster","name":"lvmcluster-sample","namespace":"lvm-operator-system","name":"vg1","error":"exit status 5","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1640139614.091191,"logger":"controller.lvmcluster.vg-manager","msg":"reconcile complete","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMCluster","name":"lvmcluster-sample","namespace":"lvm-operator-system","result":{"Requeue":false,"RequeueAfter":120000000000}}

Config file is not created by the LVM operator.

I am getting "waiting for lvmd config file" in the topolvm-node pod's log for the file checker container.

sh-4.4# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
nvme0n1 259:0 0 1000G 0 disk
|-nvme0n1p1 259:1 0 1M 0 part
|-nvme0n1p2 259:2 0 127M 0 part
|-nvme0n1p3 259:3 0 384M 0 part
`-nvme0n1p4 259:4 0 999.5G 0 part /dev/termination-log

lvm-operator vs topolvm

Hi,

I'm looking at this operator to add Dynamic Storage Provisioning to an SNO (without the ability to use ODF) and I stumbled upon this operator.
Is there a bit more documentation on what this operator provides compared to what TopoLVM already does? Is it planned to be made available in OpenShift's OperatorHub soon? I'd like to avoid an operator not yet in the OperatorHub.

Is the ODF LVM Operator in the 4.10 OperatorHub related to this operator? What subscription is required for this Operator: only ODF, or is the whole RHACM required, as specified here (https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/deploying_openshift_data_foundation_on_single_node_openshift_clusters/installing-odf-logical-volume-manager-operator-using-rhacm_sno)?

Thank you for your time!

cannot make it work after re-install

After making a mistake I decided to delete everything, uninstall, and re-install the operator. I followed your guide for that: delete all pods, PVCs, PVs, and all the resources from the operator. Then I manually connected to the only SNO node and deleted the VG and LVs. I uninstalled the operator.
After re-installing it again and creating a new LVMCluster (I am using a different device name this time) I cannot make it work:

$ oc get pods
NAME                                  READY   STATUS     RESTARTS   AGE
controller-manager-66b84d759f-mpg5p   3/3     Running    0          6m7s
topolvm-controller-df459cdd5-pfkjb    4/4     Running    0          5m50s
topolvm-node-fd92q                    0/4     Init:0/1   0          5m50s
vg-manager-2mxdr                      1/1     Running    0          5m50s

the topolvm-node pod never starts. If I connect to the host and look at the processes, there is something waiting for the file that I deleted during the uninstall process:

root       31332  0.0  0.0  11920  2888 ?        Ss   18:29   0:00 sh -c until [ -f /etc/topolvm/lvmd.yaml ]; do echo waiting for lvmd config file; sleep 5; done

The vg-manager also reports that this file does not exist, but it seems to be trying to create it. However, it also fails to create the VG:

{"level":"info","ts":1644085993.3831933,"logger":"controller.lvmvolumegroup.vg-manager","msg":"reconciling","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","lvmvolumegroup":"lvm-operator-system/vg3"}
{"level":"info","ts":1644085993.3832853,"logger":"controller.lvmvolumegroup.vg-manager","msg":"listing block devices","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","VGName":"vg3"}
{"level":"info","ts":1644085993.4152625,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","Device.Name":"sdb","filter.Name":"noChildren"}
{"level":"info","ts":1644085993.4331303,"logger":"controller.lvmvolumegroup.vg-manager","msg":"lvmd config file doesn't exist, will create","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system"}
{"level":"info","ts":1644085993.5088563,"logger":"controller.lvmvolumegroup.vg-manager","msg":"creating a new volume group","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","Name":"vg3"}
{"level":"error","ts":1644085993.6091464,"logger":"controller.lvmvolumegroup.vg-manager","msg":"failed to create/extend volume group","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","VGName":"vg3","error":"failed to create or extend volume group \"vg3\". exit status 5","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1644085993.6093926,"logger":"controller.lvmvolumegroup.vg-manager","msg":"lvmvolumegroupnodestatus unchanged","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system"}
{"level":"error","ts":1644085993.6094112,"logger":"controller.lvmvolumegroup.vg-manager","msg":"reconcile error","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","error":"failed to create or extend volume group \"vg3\". exit status 5","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1644085993.6094544,"logger":"controller.lvmvolumegroup.vg-manager","msg":"reconcile complete","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","result":{"Requeue":true,"RequeueAfter":60000000000}}
{"level":"error","ts":1644085993.6095316,"logger":"controller.lvmvolumegroup","msg":"Reconciler error","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg3","namespace":"lvm-operator-system","error":"failed to create or extend volume group \"vg3\". exit status 5","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

Is there something I could try?

Rename the service account for the controller manager

The current serviceAccount for the controller manager is the auto-generated controller-manager. This should be renamed to odf-lvm-operator or similar in order to distinguish it from any other operators installed in the same namespace.

LVM operator configuration does not create volume groups.

Installed SNO using Assisted-Installer on Bare Metal with two disks (1 TB each)
Installed ODF LVM operator
Configured using vg2.

  • Storage class was created
  • vg2 was not created (output is attached)
  • topolvm-node pod is not running.

Attached Logs and Screenshots:

topolvm-node-8x8q7-file-checker.log
vg-manager-896xc-vg-manager.log
Screenshot 2022-09-28 122938
Screenshot 2022-09-28 122324
topolvm-controller-57d5786b76-hbf9z-topolvm-controller.log
controller-manager-744c976db4-bpqzn-kube-rbac-proxy.log

Add copyright to files

The LVMO files do not have a copyright notice at present. This needs to be added to all the code files.
