
MicroCeph


MicroCeph is snap-deployed Ceph with built-in clustering.

Get it from the Snap Store


💡 Philosophy

Deploying and operating a Ceph cluster is complex because Ceph is designed to be a general-purpose storage solution. This is a significant overhead for small Ceph clusters. MicroCeph solves this by being opinionated and focused on the small scale. With MicroCeph, deploying and operating a Ceph cluster is as easy as a Snap!

🎯 Features

  • Quick and consistent deployment with minimal overhead
  • Single-command operations (bootstrapping, adding OSDs, enabling services, etc.)
  • Isolated from the host and upgrade-friendly
  • Built-in clustering so you don't have to worry about it!

⚡️ Quickstart

The commands below will set you up with a testing environment on a single machine using file-backed OSDs - you'll need about 15 GiB of available space on your root drive:

sudo snap install microceph --channel quincy/edge
sudo snap refresh --hold microceph
sudo microceph cluster bootstrap
sudo microceph disk add loop,4G,3
sudo ceph status

You're done!
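
The loop-file OSDs above are only meant for testing. On real hardware you would typically hand MicroCeph whole block devices instead; a hedged example (the device path is illustrative, and --wipe destroys any data on that device):

sudo microceph disk add /dev/sdb --wipe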

You can remove everything cleanly with:

sudo snap remove microceph

📖 Documentation

The documentation is found in the docs directory. It is written in RST format, built with Sphinx, and published on Read The Docs:

MicroCeph documentation

💫 Project & community

Excited about MicroCeph? Become one of our Stargazers!

📰 License

MicroCeph is free software, distributed under the AGPLv3 license (GNU Affero General Public License version 3.0). Refer to the COPYING file (the actual license) for more information.


microceph's Issues

GH's top page with README.md doesn't have any documentation

The current README.md is minimal at the moment.

microceph/README.md

Lines 1 to 7 in 8135a30

# MicroCeph
[![Documentation Status](https://readthedocs.com/projects/canonical-microceph/badge/?version=latest)](https://canonical-microceph.readthedocs-hosted.com/en/latest/?badge=latest)
MicroCeph is snap-deployed Ceph with built-in clustering.
MicroCeph handles scaling out the monitor cluster, as well as automating placement and management of the manager and metadata processes in a Ceph cluster when units are dynamically added via easy-to-use tooling. It also helps with adding disks (OSDs) to the Ceph cluster.

It would be good to have an overview like the snapcraft page:
https://snapcraft.io/microceph
or a link to the documentation:
https://canonical-microceph.readthedocs-hosted.com/en/latest/
so that visitors know what to do with microceph.

microceph.rbd map

Hi,

Is it expected/understood that microceph.rbd map doesn't work from inside the snap package, even if the kernel modules are loaded externally before it is invoked?

microceph status times out

I noticed while deploying Sunbeam that microceph status times out after 30 seconds and reports an error: Error: Get "http://control.socket/cluster/1.0/cluster": context deadline exceeded
Sunbeam is using microceph:edge:rev6.
It only times out on multi-node deployments.

Automatically update ceph.conf

Currently ceph.conf is generated on bootstrap/join but not refreshed afterwards.
This should change to have it be regenerated on startup and then every minute or so, in case the cluster has changed.

The "every minute or so" part isn't amazing as it means we need to pull the list of monitors from the database every minute or so, but this saves us from having to do reasonably complex notifications across the cluster.

microceph radosgw crashes after snap refresh

The microceph radosgw services crash after a snap refresh with the following stack trace:

Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]: warning: unable to create /var/snap/microceph/483/runNo such file or directory
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]: terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:   what():  filesystem error: cannot set permissions: No such file or directory [/var/snap/microceph/483/run]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]: *** Caught signal (Aborted) **
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  in thread 7fdd0f663e40 thread_name:radosgw
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7fdd1377e520]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  2: pthread_kill()
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  3: raise()
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  4: abort()
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe) [0x7fdd11eaebbe]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c) [0x7fdd11eba24c]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7) [0x7fdd11eba2b7]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518) [0x7fdd11eba518]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  9: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa702e) [0x7fdd11eb302e]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  10: (global_init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_>
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  11: (radosgw_Main(int, char const**)+0x213) [0x7fdd13f0de13]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fdd13765d90]
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  13: __libc_start_main()
Jul 11 22:38:25 juju-ba47b1-default-0 microceph.rgw[328259]:  14: _start()

This is due to the rgw config file containing the current revision via the $SNAP_DATA environment variable, which is used to render the run dir part of the ceph config (e.g. /var/snap/microceph/483/run). After a refresh the active revision changes (e.g. 483 becomes 509), so the run dir path baked into the config no longer exists after multiple refreshes.
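
A hedged way to see the mismatch on an affected host: snapd keeps a revision-independent current symlink next to the per-revision data directories, so a run dir rendered through it would survive refreshes, while the revisioned path eventually disappears (revision numbers are illustrative):

ls -ld /var/snap/microceph/current /var/snap/microceph/483
# current -> <active revision>, e.g. 509; the old 483 directory is gone after enough refreshes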

User interface: `--debug` and `--verbose` not informative

We have a --verbose and a --debug flag for microceph, but they're not really doing much for some commands, e.g.

ubuntu@vm-0:~$ sudo microceph disk list
Disks configured in MicroCeph:
+-----+----------+-----------------------------------------------------------+
| OSD | LOCATION |                           PATH                            |
+-----+----------+-----------------------------------------------------------+
| 0   | vm-0     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--0--disk0 |
+-----+----------+-----------------------------------------------------------+
| 1   | vm-1     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--1--disk0 |
+-----+----------+-----------------------------------------------------------+
| 2   | vm-2     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--2--disk0 |
+-----+----------+-----------------------------------------------------------+

Available unpartitioned disks on this system:
+-------+----------+------+------+
| MODEL | CAPACITY | TYPE | PATH |
+-------+----------+------+------+
ubuntu@vm-0:~$ sudo microceph disk list -v
Disks configured in MicroCeph:
+-----+----------+-----------------------------------------------------------+
| OSD | LOCATION |                           PATH                            |
+-----+----------+-----------------------------------------------------------+
| 0   | vm-0     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--0--disk0 |
+-----+----------+-----------------------------------------------------------+
| 1   | vm-1     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--1--disk0 |
+-----+----------+-----------------------------------------------------------+
| 2   | vm-2     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--2--disk0 |
+-----+----------+-----------------------------------------------------------+

Available unpartitioned disks on this system:
+-------+----------+------+------+
| MODEL | CAPACITY | TYPE | PATH |
+-------+----------+------+------+
ubuntu@vm-0:~$ sudo microceph disk list -d
Disks configured in MicroCeph:
+-----+----------+-----------------------------------------------------------+
| OSD | LOCATION |                           PATH                            |
+-----+----------+-----------------------------------------------------------+
| 0   | vm-0     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--0--disk0 |
+-----+----------+-----------------------------------------------------------+
| 1   | vm-1     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--1--disk0 |
+-----+----------+-----------------------------------------------------------+
| 2   | vm-2     | /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_lxd_vm--2--disk0 |
+-----+----------+-----------------------------------------------------------+

Available unpartitioned disks on this system:
+-------+----------+------+------+
| MODEL | CAPACITY | TYPE | PATH |
+-------+----------+------+------+

UNIQUE constraint failed: disks.osd

While adding several OSDs on two cluster members concurrently I got these errors:

Error: Failed adding new disk: Failed to record disk: Failed to create "disks" entry: UNIQUE constraint failed: disks.osd
Error: Failed adding new disk: Failed to record disk: Failed to create "disks" entry: UNIQUE constraint failed: disks.osd

I think we run into a race starting here, where we compute an OSD id but only record it much later.

As the whole OSD formatting happens between these two steps, there's ample time for nodes to race each other.

This could probably be fixed by doing the id computation and the INSERT in a single transaction.
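
For illustration only, the shape of that single-transaction fix in plain SQLite, run from a shell against a throwaway database (the column names are assumptions and this is not how MicroCeph accesses its dqlite store; it only demonstrates that computing the id and inserting it in one write transaction removes the race):

sqlite3 /tmp/osd-id-demo.db <<'EOF'
-- illustrative schema; disks.osd carries the UNIQUE constraint seen in the error above
CREATE TABLE IF NOT EXISTS disks (osd INTEGER UNIQUE, member TEXT, path TEXT);
BEGIN IMMEDIATE;  -- take the write lock before computing the next id
INSERT INTO disks (osd, member, path)
  VALUES ((SELECT COALESCE(MAX(osd), -1) + 1 FROM disks), 'node-1', '/dev/sdb');
COMMIT;
SELECT * FROM disks;
EOF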

setHostFailureDomain unknown crush rule 'microceph_auto_osd'

User report:

Hey I'm getting this error very intermittently: Error: Peer "micro03" failed to join the cluster: Failed to update cluster status of services: Failed to join "MicroCeph" cluster: Failed adding new disk: Failed to set host failure domain: Failed to run: ceph osd crush rule dump microceph_auto_osd: exit status 2 (Error ENOENT: unknown crush rule 'microceph_auto_osd')

When adding several disks, we could possibly hit a race between setHostFailureDomain and the removal of the auto crush rule in updateFailureDomain; we need to add a safeguard.

Allow adding/removing services

This is to track the implementation of the microceph enable and microceph disable commands which should use the service API to start/stop services on the particular system. This will be interacting with #22.

Improve OSD next number detection

Currently we rely on ceph osd ls to figure out the next OSD id.
The problem is that if an OSD is bootstrapping, it may not be listed yet.

Instead we should consider both ceph osd ls and our own database records to find the next unused ID.
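
A hedged shell sketch of that combined lookup (the cluster sql invocation and the disks.osd column are assumptions about the internal schema, borrowed from the UNIQUE-constraint error elsewhere in this document):

# take the highest id known to either the cluster or our own records, then add one
next=$(( $( { sudo ceph osd ls; sudo microceph cluster sql "SELECT osd FROM disks"; } | grep -Eo '[0-9]+' | sort -n | tail -n 1 ) + 1 ))
echo "next unused OSD id: ${next}"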

Access MicroCeph as a Storage Class for k8s and as a kernel mount - just like we do in Ceph Octopus/Pacific/Quincy

Hi all,
I am new to MicroCeph.

  1. I need to access the MicroCeph storage from my Ubuntu Linux clients.
  2. I want it to work as a kernel mount, just like in a regular Ceph deployment.
  3. I want it to be accessible from the existing k8s cluster as a Storage Class.

Please point me to the commands, steps, or any reference guide that will help me meet the above requirements.

Thanks and Best Regards,
Mark.
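
A hedged sketch of the kernel-mount part only, using standard Ceph tooling from a client host that has ceph-common installed plus a copy of the cluster's ceph.conf and a client keyring (pool, image, and mount point names are illustrative; the Kubernetes StorageClass side would typically go through ceph-csi and is not shown):

# on a MicroCeph node: create a pool and an RBD image
sudo ceph osd pool create rbdpool
sudo rbd pool init rbdpool
sudo rbd create rbdpool/vol1 --size 10G

# on the client: map, format and mount the image via the rbd kernel module
sudo modprobe rbd
sudo rbd map rbdpool/vol1
sudo mkfs.ext4 /dev/rbd0
sudo mount /dev/rbd0 /mnt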

delete join-token or redisplay it

There is no way to delete a pending cluster join if it fails.
There is no way to retrieve the join token if you copied it incorrectly.

ceph osd won't start after reboot - where are the logs?

I am running a 3 member ceph cluster in separate VMs, each with a 10GB block disk passed in for the OSDs.

All was well until suddenly, during the LXD ceph test suite run, all VMs hung and consumed 100% CPU.
So I had to forcibly stop them, and now the OSD on one of the VMs won't come online.

root        1917  4.3  2.2 1342264 32472 ?       Ssl  13:10   0:16 microcephd --state-dir /var/snap/microceph/common/state
root        1922  0.0  1.8 693148 27048 ?        Ssl  13:10   0:00 ceph-mds -f --cluster ceph --id ceph1
root        1927  4.2 19.0 1193448 278760 ?      Ssl  13:10   0:15 ceph-mgr -f --cluster ceph --id ceph1
root        1935  2.8  4.3 783600 63848 ?        Ssl  13:10   0:10 ceph-mon -f --cluster ceph --id ceph1
root        1944  0.0  0.0   2888  1056 ?        Ss   13:10   0:00 /bin/sh /snap/microceph/35/commands/osd.start
root        2023  0.0  0.0   2788  1004 ?        S    13:10   0:00 sleep infinity
root        2166  0.0  0.0   8368  1012 pts/0    S    13:10   0:00 sleep infinity

The /bin/sh /snap/microceph/35/commands/osd.start sleeping for infinity is a concern.

The problem is I can't diagnose this as there doesn't appear to be any logs.
I've looked in /var/snap/microceph/common/logs but it is empty.

microceph.ceph status
  cluster:
    id:     4bb5c238-1fef-461b-8bc3-cfd06f2c6011
    health: HEALTH_WARN
            Reduced data availability: 65 pgs inactive
 
  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 6m)
    mgr: ceph1(active, since 6m), standbys: ceph2, ceph3
    osd: 3 osds: 2 up (since 109m), 2 in (since 12m)
 
  data:
    pools:   3 pools, 65 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             65 unknown
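
Since the daemons run as snap services, their output should normally land in the journal rather than under /var/snap/microceph/common/logs; hedged suggestions for digging it out (the exact OSD service name is an assumption based on the snap's app names seen elsewhere in this document):

sudo snap logs microceph -n 100
sudo journalctl -b -u snap.microceph.osd.service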

Tries to wipe a directory - dd: failed to open '/dev/disk/by-id/': Is a directory

$ snap list microceph
Name       Version        Rev  Tracking       Publisher   Notes
microceph  0+git.6208776  220  latest/stable  canonical✓  -

While following the tutorial:
https://canonical-microceph.readthedocs-hosted.com/en/latest/tutorial/add_osds/
microceph tried to wipe a directory.

$ sudo microceph disk add /dev/vdc --wipe
Error: Failed adding new disk: Failed to wipe the device: Failed to run: dd if=/dev/zero of=/dev/disk/by-id/ bs=4M count=10 status=none: exit status 1 (dd: failed to open '/dev/disk/by-id/': Is a directory)

Document tutorial using containers and partitions or a notice with reason why it cannot work

I'm about to explore microceph to see if and how it can create OSDs in a homelab setup with one or more LXD containers and partitions, on a single laptop with a single disk. (The current microceph tutorial focuses on VMs + disks, which are not an option here due to resource constraints, so I need to explore containers and partitions. This homelab is for learning; HA is not required nor expected.)

In a separate setup I'm running a homelab with a laptop + ubuntu desktop + lxd + k8s(kubeadm) + rook + ceph, but having problems with creating OSDs. rook-discover detects the partition, but rook-ceph-osd-prepare complains.

I am wondering whether the same issue I have with rook-ceph will also show up with microceph, i.e. whether the issue is shared by both microceph and rook-ceph. It may be related to how partitions are recognized and made accessible with snap and LXD. I'm looking for a way to resolve the issue so it works with containers and partitions, thinking that solving it for either will solve it for both.

  • rook-ceph-osd-prepare output:
exec: Running command: lsblk /dev/sda4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
Stream closed EOF for rook-ceph/rook-ceph-osd-prepare-km-lw9t2 (copy-bins)
sys: lsblk output: "SIZE=\"68719476736\" ROTA=\"0\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/sda\" NAME=\"/dev/sda4\" KNAME=\"/dev/sda4\" MOUNTPOINT=\"\" FSTYPE=\"\""
exec: Running command: udevadm info --query=property /dev/sda4
sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:00:11.0-ata-2.0-part4 /dev/disk/by-path/pci-0000:00:11.0-ata-2-part4\nDEVNAME=/dev/sda4\nDEVPATH=/devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda4\nDEVTYPE=partition\nDISKSEQ=9\nID_BUS=scsi\nID_PATH=pci-0000:00:11.0-ata-2.0\nID_PATH_ATA_COMPAT=pci-0000:00:11.0-ata-2\nID_PATH_TAG=pci-0000_00_11_0-ata-2_0\nID_SCSI=1\nMAJOR=8\nMINOR=4\nPARTN=4\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=95900940776"
inventory: discovered disks are:
inventory: &{Name:sda4 Parent:sda HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:00:11.0-ata-2.0-part4 /dev/disk/by-path/pci-0000:00:11.0-ata-2-part4 Size:68719476736 UUID: Serial: Type:part Rotational:false Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sda4 KernelName:sda4 Encrypted:false}
cephosd: creating and starting the osds
cephosd: desiredDevices are [{Name:all OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:true IsDevicePathFilter:false}]
cephosd: context.Devices are:
cephosd: &{Name:sda4 Parent:sda HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:00:11.0-ata-2.0-part4 /dev/disk/by-path/pci-0000:00:11.0-ata-2-part4 Size:68719476736 UUID: Serial: Type:part Rotational:false Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model: WWN: WWNVendorExtension: Empty:false CephVolumeData: RealPath:/dev/sda4 KernelName:sda4 Encrypted:false}
cephosd: old lsblk can't detect bluestore signature, so try to detect here
exec: Running command: udevadm info --query=property /dev/sda4
sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:00:11.0-ata-2.0-part4 /dev/disk/by-path/pci-0000:00:11.0-ata-2-part4\nDEVNAME=/dev/sda4\nDEVPATH=/devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda4\nDEVTYPE=partition\nDISKSEQ=9\nID_BUS=scsi\nID_PATH=pci-0000:00:11.0-ata-2.0\nID_PATH_ATA_COMPAT=pci-0000:00:11.0-ata-2\nID_PATH_TAG=pci-0000_00_11_0-ata-2_0\nID_SCSI=1\nMAJOR=8\nMINOR=4\nPARTN=4\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=95900940776"
exec: Running command: lsblk /dev/sda4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
sys: lsblk output: "SIZE=\"68719476736\" ROTA=\"0\" RO=\"0\" TYPE=\"part\" PKNAME=\"/dev/sda\" NAME=\"/dev/sda4\" KNAME=\"/dev/sda4\" MOUNTPOINT=\"\" FSTYPE=\"\""
exec: Running command: ceph-volume inventory --format json /dev/sda4
cephosd: skipping device "sda4": ["Failed to determine if parent device is BlueStore", "Insufficient space (<5GB)"].
cephosd: configuring osd devices: {"Entries":{}}
cephosd: no new devices to configure. returning devices already configured with ceph-volume.
exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
cephosd: {}
cephosd: 0 ceph-volume lvm osd devices configured on this node
exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
cephosd: {}
cephosd: 0 ceph-volume raw osd devices configured on this node
cephosd: skipping OSD configuration as no devices matched the storage settings for this node "km"
  • Context
  1. parted ...create an unformatted partition on the disk (/dev/sda4)
  2. sudo modprobe rbd
  3. lxc launch -p km images:ubuntu/22.04/cloud km
  4. lxc config device add km sda4 unix-block source=/dev/sda4
  5. ...create k8s control plane node with kubeadm + cilium cni stuff + rook-ceph csi stuff (or skip creating k8s and just do the ceph stuff)
  6. lxc exec km bash
    apt install -y lvm2 # installed, but not sure it matters in this case
    apt install -y ceph # only installed to help with inspection
  7. ...inspect
  fdisk -x /dev/sda4
      Disk /dev/sda4: 64 GiB, 68719476736 bytes, 134217728 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
  lsblk /dev/sda4
      NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
      sda4   8:4    0  64G  0 part 
  df -h | grep sda4
      /dev/sda2        62G   20G   39G  34% /dev/sda4
  udevadm info --query=property /dev/sda4 | grep sda4
      DEVPATH=/devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sda/sda4
      DEVNAME=/dev/sda4
  ceph-volume inventory --format json /dev/sda4
      {"path": "/dev/sda4", "sys_api": {}, "ceph_device": false, "lsm_data": {}, "available": false, "rejected_reasons": ["Failed to determine if parent device is BlueStore", "Insufficient space (<5GB)"], "device_id": "", "lvs": []}

empty path of /dev/disk/by-id/ for virtio drives

Follow-up of #108

$ snap list microcloud microceph
Name        Version        Rev  Tracking     Publisher   Notes
microceph   0+git.38a6bb6  289  latest/edge  canonical✓  -
microcloud  0+git.445d39a  264  latest/edge  canonical✓  -

When a system has virtio drives, the following table will be shown in microcloud init.

$ sudo microcloud init
Please choose the address MicroCloud will be listening on [default=192.168.122.68]: 
Scanning for eligible servers...
Press enter to end scanning for servers
 Found "microcloud-2" at "192.168.122.143"
 Found "microcloud-3" at "192.168.122.145"

Ending scan
Would you like to setup local storage? (yes/no) [default=yes]: 
Select exactly one disk from each cluster member:
Space to select; Enter to confirm; Esc to exit; Type to filter results.
Up/Down to move; Right to select all; Left to select none.
       +--------------+---------------+-----------+--------+-------------------------------------------------+
       |   LOCATION   |     MODEL     | CAPACITY  |  TYPE  |                      PATH                       |
       +--------------+---------------+-----------+--------+-------------------------------------------------+
> [ ]  | microcloud-1 | QEMU HARDDISK | 32.00GiB  | scsi   | /dev/disk/by-id/scsi-SATA_QEMU_HARDDISK_QM00001 |
  [ ]  | microcloud-1 |               | 32.00GiB  | virtio | /dev/disk/by-id/                                |
  [ ]  | microcloud-1 |               | 372.00KiB | virtio | /dev/disk/by-id/                                |
  [ ]  | microcloud-2 | QEMU HARDDISK | 32.00GiB  | scsi   | /dev/disk/by-id/scsi-SATA_QEMU_HARDDISK_QM00001 |
  [ ]  | microcloud-2 |               | 32.00GiB  | virtio | /dev/disk/by-id/                                |
  [ ]  | microcloud-2 |               | 372.00KiB | virtio | /dev/disk/by-id/                                |
  [ ]  | microcloud-3 | QEMU HARDDISK | 32.00GiB  | scsi   | /dev/disk/by-id/scsi-SATA_QEMU_HARDDISK_QM00001 |
  [ ]  | microcloud-3 |               | 32.00GiB  | virtio | /dev/disk/by-id/                                |
  [ ]  | microcloud-3 |               | 372.00KiB | virtio | /dev/disk/by-id/                                |
       +--------------+---------------+-----------+--------+-------------------------------------------------+


Would you like to setup distributed storage? (yes/no) [default=yes]: 
Select from the available unpartitioned disks:

Space to select; Enter to confirm; Esc to exit; Type to filter results.
Up/Down to move; Right to select all; Left to select none.
       +--------------+-------+-----------+--------+------------------+
       |   LOCATION   | MODEL | CAPACITY  |  TYPE  |       PATH       |
       +--------------+-------+-----------+--------+------------------+
  [x]  | microcloud-1 |       | 32.00GiB  | virtio | /dev/disk/by-id/ |
  [ ]  | microcloud-1 |       | 372.00KiB | virtio | /dev/disk/by-id/ |
  [x]  | microcloud-2 |       | 32.00GiB  | virtio | /dev/disk/by-id/ |
  [ ]  | microcloud-2 |       | 372.00KiB | virtio | /dev/disk/by-id/ |
> [x]  | microcloud-3 |       | 32.00GiB  | virtio | /dev/disk/by-id/ |
  [ ]  | microcloud-3 |       | 372.00KiB | virtio | /dev/disk/by-id/ |
       +--------------+-------+-----------+--------+------------------+

Automatically re-assign services on server removal

When a server is removed from the cluster with microceph cluster remove, its roles should be re-assigned to other servers to avoid degrading the cluster.

This will most likely involve the removal hook, with the leader then calling the services API on the new servers to have them bring up the needed services. Depends on #23.

Add rgw support

Should be a prime example for microceph enable and microceph disable.

Better way to spawn OSD processes

Currently the OSD processes are spawned by a looping shell script which scans for listed OSDs every 5 seconds and starts them as needed.

If we can have snapd provide per-instance systemd units, that'd be a far better way of doing things.

[RFE] mgr nfs module?

Would it be possible to add the nfs manager module? I think such a simple NFS service is still in microceph scope.

Error adding partition as an OSD

I'm trying to add a partition from one of my disks to microceph.

I tried with ext4 format, and unformatted.

microceph disk add /dev/sda3

Error: Failed adding new disk: Failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 0: exit status 250 (2022-12-12T18:43:58.421+0000 7fc07269a5c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2022-12-12T18:43:58.421+0000 7fc07269a5c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2022-12-12T18:43:58.421+0000 7fc07269a5c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied)

I tried to run the command with sudo and as root, but nothing worked. /var/lib/ceph/osd/ceph-0/block does not exist (and neither does /var/lib/ceph/). Shouldn't it be /var/snap/microceph/common/data/osd/ceph-0?

Microceph disk add error

When attempting to add disks to a new cluster or an existing one without any disks I receive the following error messages:

Error: Failed adding new disk: Failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 0: exit status 250 (2023-01-25T10:57:28.939-0500 ffff8c68b020 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db_and_around failed to load os-type: (2) No such file or directory
2023-01-25T10:57:28.939-0500 ffff8c68b020 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs fsck found fatal error: (2) No such file or directory
2023-01-25T10:57:28.939-0500 ffff8c68b020 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2023-01-25T10:57:28.939-0500 ffff8c68b020 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory)


I have confirmed the location /var/lib/ceph does not exist.

Implement the `status` command

The status command should show a brief overview of the cluster.
Basically, a short list of the servers, their IP address, service list and disk count.

cluster sql tables should be documented

The microceph cluster sql command provides no feedback about what the SQL schema is. The schema should be documented, or a way for users to query the structure should be documented.
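
Until that is documented, a hedged way to inspect the schema from the CLI, assuming the backing store answers standard SQLite catalog queries:

sudo microceph cluster sql "SELECT name FROM sqlite_master WHERE type='table'"
sudo microceph cluster sql "SELECT sql FROM sqlite_master WHERE name='disks'"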

Setup automated snap builds

Once the repo is public, we should set up Launchpad to automatically build and publish to the edge channel.

ceph daemon ... commands do not work

In a microceph installation, ceph daemon commands do not work.

For example:

root@demonax:/home/ubuntu# ps -efa | grep osd
root       34627       1  0 Jun02 ?        00:00:00 /bin/sh /snap/microceph/338/commands/osd.start
root       36645       1  0 Jun02 ?        02:15:10 ceph-osd --cluster ceph --id 0
root       38275       1  0 Jun02 ?        02:17:26 ceph-osd --cluster ceph --id 1
root       39929       1  0 Jun02 ?        02:16:48 ceph-osd --cluster ceph --id 2
root      195108  195052  0 09:23 pts/1    00:00:00 grep --color=auto osd
root@demonax:/home/ubuntu# ceph daemon osd.0 help
Can't get admin socket path: "ceph-conf" not found

Is the issue that it's looking for ceph.conf in /etc while it actually is in /var/snap/microceph/current/conf?
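
ceph daemon resolves the admin socket path via the ceph-conf helper, which isn't on the host PATH here. A hedged workaround is to point at the admin socket directly, assuming the sockets live under the snap's run directory with Ceph's default $cluster-$name.asok naming (use microceph.ceph instead of ceph if the alias isn't set up):

sudo ceph --admin-daemon /var/snap/microceph/current/run/ceph-osd.0.asok help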

Three Node Cluster Losing Quorum

We've run into this issue twice now: when the master node of a three-node microceph cluster suffers a temporary failure, the cluster somehow ends up in a state where there are no active masters and everything more or less locks up and becomes unavailable.

We haven't yet been able to fully isolate or reproduce the issue, but we did grab a raft/dqlite trace from when we restarted the former master node; it is attached below.

Cluster instability issues are critical blockers to us being able to use this in production, which we think is a huge shame because we see massive potential here.

microcephtrace.txt

OOM killer is invoked with the documented steps

The installation tutorial uses the default LXD VM size, which translates to 1 CPU core and 1 GB of memory if I'm not mistaken.
https://github.com/canonical/microceph/blob/38a6bb606617ec00355912af2aeef4d6e6520ffe/docs/tutorial/install.rst#setup-vms

This leads to OOM kills after some time, even when the cluster is idle. When this happens, the system gets into a state that is very tricky to diagnose, especially when microceph is used alongside other components such as microcloud with LXD and dqlite.

Mar 27 21:48:34 microceph-3 kernel: [29726.347384] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 27 21:48:34 microceph-3 kernel: [29726.347450]  oom_kill_process.cold+0xb/0x10
Mar 27 21:48:34 microceph-3 kernel: [29726.347615] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Mar 27 21:48:34 microceph-3 kernel: [29726.347697] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/snap.microceph.mgr.service,task=ceph-mgr,pid=2968,uid=0
Mar 27 21:48:34 microceph-3 kernel: [29726.347724] Out of memory: Killed process 2968 (ceph-mgr) total-vm:1002128kB, anon-rss:250092kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:752kB oom_score_adj:0
Mar 27 21:48:34 microceph-3 kernel: [29728.401021] oom_reaper: reaped process 2968 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Mar 27 21:48:35 microceph-3 kernel: [29729.682899] oom_reaper: reaped process 4956 (snapd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Control Group                                                       Tasks   %CPU   Memory  Input/s Output/s
/                                                                     367    4.9   899.0M       0B   113.2K
system.slice                                                          221    6.9   657.6M       0B    18.0K
system.slice/snap.microceph.mgr.service                                70    0.3   274.1M       0B     1.3K
system.slice/snap.microceph.mon.service                                24    1.5   117.9M       0B    15.9K
system.slice/snap.microceph.daemon.service                             13    2.4    37.6M       0B     711B
system.slice/snap.microovn.switch.service                               9    1.3    35.6M        -        -
system.slice/snap.microovn.daemon.service                              13    0.2    31.4M        -        -
system.slice/lxd-agent.service                                         11    0.5    30.5M        -        -
system.slice/snapd.service                                             10      -    21.2M        -        -
system.slice/snap.microcloud.daemon.service                            13    0.2    19.9M        -        -
system.slice/snap.microceph.mds.service                                15    0.0    19.2M        -        -
system.slice/networkd-dispatcher.service                                1      -     8.8M        -        -
system.slice/systemd-udevd.service                                      1      -     8.2M        -        -
system.slice/unattended-upgrades.service                                2      -     8.2M        -        -
system.slice/snap.microovn.central.service                             11    0.4     7.5M        -        -
init.scope                                                              1      -     5.5M        -        -
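
A hedged mitigation when following the tutorial with LXD-managed VMs is to give each VM more memory than the default (VM name and size are illustrative):

lxc config set microceph-1 limits.memory=4GiB
lxc restart microceph-1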

Module 'devicehealth' has failed: disk I/O error

Since #67 was fixed, I'm starting to see these errors:

microceph.ceph -s
  cluster:
    id:     016b1f4a-bbe5-4c6a-aa66-64a5ad9fce7f
    health: HEALTH_ERR
            Module 'devicehealth' has failed: disk I/O error
 
  services:
    mon: 3 daemons, quorum v1,v2,v3 (age 19s)
    mgr: v1(active, since 8s), standbys: v3, v2
    osd: 3 osds: 3 up (since 13s), 3 in (since 6m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 1 objects, 0 B
    usage:   17 MiB used, 30 GiB / 30 GiB avail
    pgs:     1 active+clean

I'm using the demo setup described here:

#67 (comment)

Can you help me diagnose this please?

Thanks

dpkg-dev seems to be missing

Building the snap failed with:

...
patching file src/ceph-volume/ceph_volume/util/system.py
'microceph' has dependencies that need to be staged: ceph-volume
Skipping pull ceph-volume (already ran)
Building ceph-volume 
/bin/sh: 32: dpkg-source: not found
Failed to run 'override-build': Exit code was 127.
Run the same command again with --debug to shell into the environment if you wish to introspect this failure.
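
A hedged workaround until the missing build dependency is declared in snapcraft.yaml (e.g. in the part's build-packages list): install dpkg-dev in the build environment, for example from the shell that snapcraft --debug drops you into:

apt-get install -y dpkg-dev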

Add support for LUKS encryption

It would be great to have a story for encryption at rest.

Ideally, we'd want the ability to provide a single passphrase or key file to then unlock all the OSDs.
OSDs probably still ought to be individually encrypted with individual keys.

I'd suggest using one small LUKS volume that contains key files for each of the OSDs.
After system reboot, the user would need to do microceph unlock which would prompt for the passphrase (or key file) used to unlock that small LUKS volume. At that point, the OSD startup script will be able to read the keys for each of the OSDs from that volume and unlock them individually.

This makes it easy to change the key without having to alter each of the OSDs.

A microceph lock command could be added which would unmount and close the small LUKS volume.
Doing so would prevent any access to the individual OSD keys, which would in turn prevent the addition of further OSDs but also prevent crashed OSD daemons from respawning (so a bit of a double-edged sword).
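
A hedged shell sketch of what the proposed microceph unlock flow could do under the hood (device paths, mount point, and key file names are all illustrative):

# open and mount the small LUKS volume holding the per-OSD key files
sudo cryptsetup open /dev/vdb1 microceph-keys        # prompts for the passphrase, or use --key-file
sudo mount /dev/mapper/microceph-keys /var/snap/microceph/common/keys

# unlock each OSD with its own key file so the OSD startup script can bring it up
sudo cryptsetup open --key-file /var/snap/microceph/common/keys/osd.0.key /dev/sdb microceph-osd-0

# microceph lock would reverse this:
sudo umount /var/snap/microceph/common/keys
sudo cryptsetup close microceph-keys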

VM creation fails when using microcloud with microceph

I have a microcloud cluster with 3 nodes using microceph with a disk on each machine.

When I try to launch a VM, it fails with:

root@juju-4617ca-microcloud-4:~# lxc launch ubuntu:22.04 --vm
Creating the instance
Retrieving image: Unpack: 100% (494.55MB/s)
Instance name is: rational-humpback
Starting rational-humpback
Error: Failed setting up device via monitor: Failed adding block device for disk device "root": Failed adding block device: error reading conf file /etc/ceph/ceph.conf: Permission denied
Try `lxc info --show-log local:rational-humpback` for more info
root@juju-4617ca-microcloud-4:~# lxc info --show-log local:rational-humpback
Name: rational-humpback
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Location: juju-4617ca-microcloud-5
Created: 2023/02/14 16:51 UTC

Log:

warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied

snap versions:

root@juju-4617ca-microcloud-4:~# snap list | egrep "micro|lxd"
lxd         5.0.2-838e1b2  24322  5.0/stable/…   canonical**  -
microceph   0+git.00fe8d8  120    latest/stable  canonical**  -
microcloud  0+git.d78a41a  70     latest/stable  canonical**  -

Implement `init` command

The goal behind the init command is to avoid having the user directly deal with cluster bootstrap and cluster join.

Instead init should provide an interactive workflow where it asks the user whether to create a new cluster or join an existing one.
When creating a new one, it should ask if the user wants to add additional servers to it and if so, issue join tokens.

Once the initial clustering part is handled, it should scan available disks and ask the user which ones should be added as OSDs.

Allow for custom service list at join time

Allow a list of services to be passed to microceph cluster join, overriding the default logic, which just checks what's currently deployed and deploys services until we reach 3 mon/mgr/mds.

This will require decoupling the service spawning from the joining stage, as we can't feed data to the join hook.
Basically, we'll need to change both microceph cluster join and microceph init (if already implemented, see #21) so that the join process uses two separate client calls.

Document backup/recovery procedures

Document how to back up the critical pieces so that a microceph instance can be restored back into a running cluster, or how to remove it and rejoin, etc.

Bootstrap fails on a host with ceph-common installed

Because the snap uses a symlink layout for /etc/ceph, /var/log/ceph, etc., bootstrapping fails when the ceph-common package is installed, since these locations aren't empty on the host system.

root@mceph:~# microceph cluster bootstrap
cannot update snap namespace: cannot create symlink in "/etc/ceph": existing file in the way
snap-update-ns failed with code 1
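
A hedged workaround is to move the host's Ceph packaging and /etc/ceph out of the way before bootstrapping, for example:

sudo apt-get remove ceph-common
sudo mv /etc/ceph /etc/ceph.host-backup   # only if nothing else on the host still needs it
sudo microceph cluster bootstrap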

RFE: non-interactive `microceph init`

For automation it would be desirable to have a method of initializing a cluster non-interactively, e.g. by supplying all necessary info as CLI args to microceph init.
