sesdev's Issues

bug: provision destination needs trailing slash if directories are copied

When invoking sesdev with:

sesdev create ses7 --single-node --qa-test test_ses7

it fails during the file provisioning phase with:

DEBUG ssh: Uploading: $sesdev_path_to_qa to /home/vagrant/sesdev-qa
DEBUG ssh: Re-using SSH connection.
ERROR warden: Error occurred: scp: error: unexpected filename: .

This is probably due to how the destination key in the Vagrantfile is implemented.

node.vm.provision "file", source: "{{ sesdev_path_to_qa }}", destination: "/home/vagrant/sesdev-qa" <- without a trailing slash, the provisioner looks for a . in sesdev_path_to_qa

Adding the trailing slash fixes the issue: the entire sesdev_path_to_qa directory is copied under /home/vagrant/sesdev-qa/

-> /home/vagrant/sesdev-qa/qa/$content_of_qa_dir
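
For reference, the same provisioner line with the trailing slash added, as described above (untested sketch):

node.vm.provision "file", source: "{{ sesdev_path_to_qa }}", destination: "/home/vagrant/sesdev-qa/"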

feature request: "sesdev make-check TARGET"

The overall CI effort includes regularly running "make check" for the various products/environments (e.g. SES5 on SLE-12-SP3, Octopus on openSUSE Leap 15.2, etc.).

Implement a "sesdev make-check TARGET" command which would (a rough shell sketch of these steps follows the list):

  • deploy a single node with 16 GB of memory
  • add all the repos needed to run "make check" on that target
  • install the sudo package
  • create a normal user and give it passwordless sudo privileges
  • as that normal user:
    • clone a ceph repo/branch (each target would have a default repo/branch, which could be overridden by supplying --repo and/or --branch options)
    • run install-deps.sh, and if that succeeds:
    • run run-make-check.sh
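
A rough, hedged sketch of those provisioning steps in shell, assuming a zypper-based target; the user name, repo URL, and branch are placeholders that the real command would derive from TARGET and the --repo/--branch options:

    set -e
    zypper --non-interactive install sudo git-core
    useradd -m makecheck                                    # placeholder user name
    echo 'makecheck ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/makecheck
    sudo -i -u makecheck bash -c '
        git clone --branch master https://github.com/ceph/ceph.git   # placeholder repo/branch
        cd ceph
        ./install-deps.sh && ./run-make-check.sh
    '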

`--memory` parameter cannot be used due to missing cgroup feature

Podman's --memory parameter is used for the monitoring component containers. This fails with:

Your kernel does not support swap limit capabilities, or the cgroup is not mounted. Memory limited without swap.

Removing the --memory parameter from the run files works, but it is tedious. The problem can be avoided by adding cgroup_enable=memory swapaccount=1 to the GRUB_CMDLINE_LINUX variable in /etc/default/grub, followed by running update-bootloader and rebooting the VM.
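
A minimal sketch of that workaround, assuming GRUB_CMDLINE_LINUX is already present and double-quoted in /etc/default/grub and that update-bootloader is available on the guest:

    # Append the cgroup parameters to the kernel command line, regenerate the
    # bootloader configuration, and reboot for the change to take effect.
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 cgroup_enable=memory swapaccount=1"/' /etc/default/grub
    update-bootloader
    reboot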

It may also be noteworthy that my host system has no swap file configured (but sufficient memory); I am not sure whether that is related.

Allow user to override default image (registry) paths via "~/.sesdev/config.yaml"

The default registry path(s) to container image(s) are hardcoded. They can be overridden via a command-line option.

Currently octopus deployments only need a single registry/image path because only one container image is used. However, in the future, multiple images might be needed.

Users might conceivably need to point sesdev (by default) at a non-standard image path, or set of image paths. Providing these on the command line each time sesdev is run would be tedious.
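
A hypothetical sketch of what such an override in ~/.sesdev/config.yaml could look like, following the same style as the version_os_repo_mapping example elsewhere on this page; the image_paths key, its structure, and the registry URL are illustrative assumptions, not documented sesdev settings:

    # hypothetical keys -- shown only to illustrate the idea
    image_paths:
        ses7: 'registry.example.com/my-mirror/ses/7/ceph/ceph'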

Two-node cluster gets only two disks per storage node, regardless of number of storage nodes

The following command unexpectedly brings up a cluster with only two OSDs:

$ sesdev create nautilus --roles "[admin, storage, mon, mgr, mds], [client]" cephfuse 
=== Creating deployment with the following configuration ===
Deployment VMs:
  -- admin:
     - OS:               leap-15.1
     - ses_version:      nautilus
     - deployment_tool:  deepsea
     - roles:            ['admin', 'storage', 'mon', 'mgr', 'mds']
     - fqdn:             admin.cephfuse.com
     - public_address:   10.20.24.200
     - cpus:             2
     - ram:              4G
     - storage_disks:    2
       - /dev/vdb        8G
       - /dev/vdc        8G

  -- node1:
     - OS:               leap-15.1
     - ses_version:      nautilus
     - roles:            ['client']
     - fqdn:             node1.cephfuse.com
     - public_address:   10.20.24.201
     - cpus:             2
     - ram:              4G
     - custom_repos:
       - https://download.opensuse.org/repositories/filesystems:/ceph:/nautilus:/test

(See where it says - storage_disks: 2)

Since there is only one node with role storage, that node should get four disks.

ses5 deployment fails due to NTP issue

How to reproduce:

sesdev create ses5 ses5_test1

DeepSea Stage 3 fails with:

admin: Failures summary:
    admin: ceph.time (/srv/salt/ceph/time):
    admin:   node3.ses5.com:
    admin:     start ntp: The named service ntpd is not available
    admin:     sync time: Command "sntp -S -c admin.ses5.com" run
    admin:         stdout: sntp [email protected] Wed Mar 13 12:24:27 UTC 2019 (1)
    admin:         stderr: sock_cb: 10.20.197.200 not in sync, skipping this server
    admin:   node1.ses5.com:
    admin:     start ntp: The named service ntpd is not available
    admin:     sync time: Command "sntp -S -c admin.ses5.com" run
    admin:         stdout: sntp [email protected] Wed Mar 13 12:24:27 UTC 2019 (1)
    admin:         stderr: sock_cb: 10.20.197.200 not in sync, skipping this server
    admin:   node2.ses5.com:
    admin:     start ntp: The named service ntpd is not available
    admin:     sync time: Command "sntp -S -c admin.ses5.com" run
    admin:         stdout: sntp [email protected] Wed Mar 13 12:24:27 UTC 2019 (1)
    admin:         stderr: sock_cb: 10.20.197.200 not in sync, skipping this server
Command '['vagrant', 'up']' failed: ret=1 stderr:

sesdev create --help does not list available optional arguments

% sesdev create --help 
Usage: sesdev create [OPTIONS] COMMAND [ARGS]...                                         

  Creates a new Vagrant based SES cluster.

  It creates a deployment directory in <working_directory>/<deployment_id>
  with a Vagrantfile inside, and calls `vagrant up` to start the deployment.

  By default <working_directory> is located in `~/.sesdev`.

  Checks all the options available with:

  $ sesdev create --help

Options:
  --help  Show this message and exit.

Commands:
  nautilus  Creates a Ceph Nautilus cluster using openSUSE Leap 15.1 and...
  octopus   Creates a Ceph Octopus cluster using openSUSE Leap 15.2 and...
  ses5      Creates a SES5 cluster using SLES-12-SP3
  ses6      Creates a SES6 cluster using SLES-15-SP1
  ses7      Creates a SES7 cluster using SLES-15-SP2

Separate qa run from deployment

For CI, it's useful to be able to first do the deployment and then do the QA run. With that, it's easier to see what actually fails.

oS Leap 15.2: podman: podman: undefined symbol: g_date_copy

Seen via sesdev create octopus.

node1:/var/log/ceph # cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.2 Alpha"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.2"
PRETTY_NAME="openSUSE Leap 15.2 Alpha"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.2"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
node1:/var/log/ceph # cat cephadm.log 
INFO:root:Cluster fsid: e4da5eea-48e5-11ea-ac19-52540035718e
DEBUG:cephadm:Acquiring lock 139682512336824 on /run/cephadm/e4da5eea-48e5-11ea-ac19-52540035718e.lock
DEBUG:cephadm:Lock 139682512336824 acquired on /run/cephadm/e4da5eea-48e5-11ea-ac19-52540035718e.lock
INFO:cephadm:Verifying IP 10.20.57.201 port 3300 ...
INFO:cephadm:Verifying IP 10.20.57.201 port 6789 ...
DEBUG:cephadm:Final addrv is [v2:10.20.57.201:3300,v1:10.20.57.201:6789]
INFO:cephadm:Pulling latest ceph/daemon-base:latest-master-devel container...
DEBUG:cephadm:Running command: /usr/bin/podman pull ceph/daemon-base:latest-master-devel
DEBUG:cephadm:/usr/bin/podman:stderr /usr/bin/podman: symbol lookup error: /usr/lib64/libgobject-2.0.so.0: undefined symbol: g_date_copy
INFO:cephadm:Non-zero exit code 127 from /usr/bin/podman pull ceph/daemon-base:latest-master-devel
INFO:cephadm:/usr/bin/podman:stderr /usr/bin/podman: symbol lookup error: /usr/lib64/libgobject-2.0.so.0: undefined symbol: g_date_copy
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 2811, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1654, in command_bootstrap
    call_throws([container_path, 'pull', args.image])
  File "/usr/sbin/cephadm", line 491, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/podman pull ceph/daemon-base:latest-master-devel
DEBUG:cephadm:Releasing lock 139682512336824 on /run/cephadm/e4da5eea-48e5-11ea-ac19-52540035718e.lock
DEBUG:cephadm:Lock 139682512336824 released on /run/cephadm/e4da5eea-48e5-11ea-ac19-52540035718e.lock
node1:/var/log/ceph # zypper if podman
Loading repository data...
Reading installed packages...


Information for package podman:
-------------------------------
Repository     : Main Repository                                                      
Name           : podman                                                               
Version        : 1.4.4-lp152.2.10                                                     
Arch           : x86_64                                                               
Vendor         : openSUSE                                                             
Installed Size : 103.2 MiB                                                            
Installed      : Yes (automatically)                                                  
Status         : up-to-date                                                           
Source package : podman-1.4.4-lp152.2.10.src                                          
Summary        : Daemon-less container engine for managing containers, pods and images
Description    :                                                                      
    Podman is a container engine for managing pods, containers, and container
    images.
    It is a standalone tool and it directly manipulates containers without the need
    of a container engine daemon.
    Podman is able to interact with container images create in buildah, cri-o, and
    skopeo, as they all share the same datastore backend.

node1:/var/log/ceph # zypper if libgobject-2_0-0
Loading repository data...
Reading installed packages...


Information for package libgobject-2_0-0:
-----------------------------------------
Repository     : Main Repository                
Name           : libgobject-2_0-0               
Version        : 2.62.4-lp152.1.1               
Arch           : x86_64                         
Vendor         : openSUSE                       
Installed Size : 368.8 KiB                      
Installed      : Yes (automatically)            
Status         : up-to-date                     
Source package : glib2-2.62.4-lp152.1.1.src     
Summary        : Object-Oriented Framework for C
Description    :                                
    GLib is a general-purpose utility library, which provides many useful
    data types, macros, type conversions, string utilities, file utilities,
    a main loop abstraction, and so on.

    The GObject library provides an object-oriented framework for C.

My question is: where does this come from?

  • podman dependencies?
  • problem with the Leap repositories?

ceph-bootstrap deployment: salt cluster not ready after restarting salt-master.service

This is biting me quite often:

    admin: ++ chown -R salt:salt /srv/pillar
    admin: ++ systemctl restart salt-master
    admin: ++ sleep 5
    admin: ++ salt '*' saltutil.pillar_refresh
    admin: ERROR: Minions returned with non-zero exit code
    admin: admin.mini_ses7.com:
    admin:     Minion did not return. [No response]
Command '['vagrant', 'up']' failed: ret=1 stderr:
b'The SSH command responded with a non-zero exit status. Vagrant\nassumes that this means the command failed. The output for this command\nshould be in the log above. Please read the output to determine what\nwent wrong.\n'

In teuthology, we deal with this by running test.ping in a loop until all the minions start to respond - see https://github.com/SUSE/ceph/blob/ses6-downstream-commits/qa/tasks/salt_manager.py#L74-L90

Will implement something similar for sesdev.
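
A minimal shell sketch of such a wait loop, assuming (as the log above suggests) that the salt CLI exits non-zero while minions are still unresponsive:

    # Give the minions up to ~2 minutes to come back after the salt-master restart.
    for attempt in $(seq 1 24); do
        salt '*' test.ping > /dev/null 2>&1 && break
        sleep 5
    done
    salt '*' saltutil.pillar_refresh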

"sesdev create" returns 0 even when "vagrant up" reports that the command failed

    admin: total nodes actual/expected:  1/1
    admin: MON nodes actual/expected:    1/1
    admin: MGR nodes actual/expected:    1/1
    admin: OSD nodes actual/expected:    0/1
    admin: total OSDs actual/expected:   0/4
    admin: Actual number of nodes/node types/OSDs differs from expected number
    admin: 
    admin: Overall result: NOT_OK (error )
    admin: + local actual_osds=0
    admin: + set +x
Command '['vagrant', 'up']' failed: ret=1 stderr:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

(venv) smithfarm@vanguard1:~/sesdev> echo $?
0

deepsea: command not found

When installing a cluster using
When installing a cluster using VAGRANT_DEFAULT_PROVIDER=libvirt sesdev create octopus --roles="[admin, mon, mgr], [storage, mon, mgr, mds], [storage, mon, mds]" --use-deepsea --num-disks=4 --disk-size=10 octopus (the example from the --help screen), the installation fails (actually, it pretends to have succeeded) because the deepsea command is missing.

admin:~ # zypper ref
Repository 'octopus-repo' is up to date.                                                                                                                       
Repository 'Non-OSS Repository' is up to date.                                                                                                                 
Repository 'Main Repository' is up to date.                                                                                                                    
Repository 'Main Update Repository' is up to date.                                                                                                             
Repository 'Update Repository (Non-Oss)' is up to date.                                                                                                        
All repositories have been refreshed.
admin:~ # zypper se deepsea
Loading repository data...
Warning: Repository 'Main Update Repository' appears to be outdated. Consider using a different mirror or server.
Reading installed packages...
No matching items found.
admin:~ # zypper lr -u
Repository priorities in effect:                                                                                               (See 'zypper lr -P' for details)
      98 (raised priority)  :  1 repository  
      99 (default priority) :  4 repositories

#  | Alias                     | Name                               | Enabled | GPG Check | Refresh | URI                                                                                              
---+---------------------------+------------------------------------+---------+-----------+---------+--------------------------------------------------------------------------------------------------
 1 | octopus-repo              | octopus-repo                       | Yes     | (r ) Yes  | No      | https://download.opensuse.org/repositories/filesystems:/ceph:/master:/upstream/openSUSE_Leap_15.2
 2 | repo-debug                | Debug Repository                   | No      | ----      | ----    | http://download.opensuse.org/debug/distribution/leap/15.2/repo/oss/                              
 3 | repo-debug-non-oss        | Debug Repository (Non-OSS)         | No      | ----      | ----    | http://download.opensuse.org/debug/distribution/leap/15.2/repo/non-oss/                          
 4 | repo-debug-update         | Update Repository (Debug)          | No      | ----      | ----    | http://download.opensuse.org/debug/update/leap/15.2/oss/                                         
 5 | repo-debug-update-non-oss | Update Repository (Debug, Non-OSS) | No      | ----      | ----    | http://download.opensuse.org/debug/update/leap/15.2/non-oss/                                     
 6 | repo-non-oss              | Non-OSS Repository                 | Yes     | (r ) Yes  | No      | http://download.opensuse.org/distribution/leap/15.2/repo/non-oss/                                
 7 | repo-oss                  | Main Repository                    | Yes     | (r ) Yes  | No      | http://download.opensuse.org/distribution/leap/15.2/repo/oss/                                    
 8 | repo-source               | Source Repository                  | No      | ----      | ----    | http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/                             
 9 | repo-source-non-oss       | Source Repository (Non-OSS)        | No      | ----      | ----    | http://download.opensuse.org/source/distribution/leap/15.2/repo/non-oss/                         
10 | repo-update               | Main Update Repository             | Yes     | (r ) Yes  | No      | http://download.opensuse.org/update/leap/15.2/oss/                                               
11 | repo-update-non-oss       | Update Repository (Non-Oss)        | Yes     | (r ) Yes  | No      | http://download.opensuse.org/update/leap/15.2/non-oss/                                           
admin:~ # 

sesdev.log
Revision 362328c

saltutil.pillar_refresh fails

    admin: ++ cat
    admin: ++ chown -R salt:salt /srv/pillar
    admin: ++ systemctl restart salt-master
    admin: ++ sleep 5
    admin: ++ salt '*' saltutil.pillar_refresh
    admin: ERROR: Minions returned with non-zero exit code
    admin: admin.two_node_ses7.com:
    admin:     Minion did not return. [No response]
    admin: node1.two_node_ses7.com:
    admin:     Minion did not return. [No response]
    admin: ++ sleep 2
    admin: ++ ceph-bootstrap config /Cluster/Minions add admin.two_node_ses7.com
    admin: /tmp/vagrant-shell: line 95: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Cluster/Minions add node1.two_node_ses7.com
    admin: /tmp/vagrant-shell: line 96: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /SSH/ generate
    admin: /tmp/vagrant-shell: line 98: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Containers/Images/ceph set registry.suse.de/devel/storage/7.0/cr/images/ses/7/ceph/ceph
    admin: /tmp/vagrant-shell: line 99: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Time_Server/Server_Hostname set admin.two_node_ses7.com
    admin: /tmp/vagrant-shell: line 100: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Time_Server/External_Servers add 0.pt.pool.ntp.org
    admin: /tmp/vagrant-shell: line 101: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Deployment/Mon enable
    admin: /tmp/vagrant-shell: line 102: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Deployment/Mgr enable
    admin: /tmp/vagrant-shell: line 103: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Deployment/OSD enable
    admin: /tmp/vagrant-shell: line 105: ceph-bootstrap: command not found
    admin: +++ seq 1 3
    admin: +++ awk 'BEGIN{i=0; printf("[");}{if (i>0){printf(", ")}; i++; printf("\"/dev/vd%c\"", $1 + 97)}END{printf("]")}'
    admin: ++ DEV_LIST='["/dev/vdb", "/dev/vdc", "/dev/vdd"]'
    admin: ++ ceph-bootstrap config /Storage/Drive_Groups add 'value={"host_pattern": "node1*", "data_devices": { "paths": ["/dev/vdb", "/dev/vdc", "/dev/vdd"] }}'
    admin: /tmp/vagrant-shell: line 109: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Deployment/Dashboard/username set admin
    admin: /tmp/vagrant-shell: line 111: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config /Deployment/Dashboard/password set admin
    admin: /tmp/vagrant-shell: line 112: ceph-bootstrap: command not found
    admin: ++ ceph-bootstrap config ls
    admin: /tmp/vagrant-shell: line 114: ceph-bootstrap: command not found
    admin: ++ salt -G ceph-salt:member state.apply ceph-salt
    admin: ERROR: No return received
    admin: No minions matched the target. No command was sent, no jid was assigned.
Command '['vagrant', 'up']' failed: ret=1 stderr:
b"==> admin: An error occurred. The error will be shown after all tasks complete.\nAn error occurred while executing multiple actions in parallel.\nAny errors that occurred are shown below.\n\nAn error occurred while executing the action on the 'admin'\nmachine. Please handle this error then try again:\n\nThe SSH command responded with a non-zero exit status. Vagrant\nassumes that this means the command failed. The output for this command\nshould be in the log above. Please read the output to determine what\nwent wrong.\n"

A few seconds later, I ran this manually:

(venv) ➜  two_node_ses7 vagrant ssh admin
vagrant@admin:~> sudo bash
admin:/home/vagrant # salt '*' test.ping
admin.two_node_ses7.com:
    True
node1.two_node_ses7.com:
    True
admin:/home/vagrant # salt '*' saltutil.pillar_refresh
admin.two_node_ses7.com:
    True
node1.two_node_ses7.com:
    True
admin:/home/vagrant # 


Before running ceph-bootstrap deploy, print versions of ceph-bootstrap and cephadm

The ceph-bootstrap deploy operation installs the cephadm package anyway, so sesdev can install it itself immediately before running ceph-bootstrap deploy, and then it can:

  1. rpm -q cephadm
  2. print the version of ceph-bootstrap that is installed (how to obtain it depends on whether ceph-bootstrap was installed from source or from an RPM)

Rationale: deployment can fail due to a ceph-bootstrap/cephadm version mismatch. We need a way to debug this when it occurs, and possibly have sesdev stop and refuse to continue for version combinations known to be incompatible.
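
A sketch of the version reporting, assuming ceph-bootstrap may have been installed either as an RPM or from a source checkout (the /root/ceph-bootstrap path is a guess used only for illustration):

    rpm -q cephadm
    if rpm -q ceph-bootstrap > /dev/null 2>&1; then
        rpm -q ceph-bootstrap
    elif [ -d /root/ceph-bootstrap/.git ]; then
        # source install -- report the git revision instead (path is hypothetical)
        git -C /root/ceph-bootstrap describe --tags --always
    fi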

Not possible to set higher priority on repos defined via "version_os_repo_mapping"

This problem came to my attention when I tried to get the correct set of repos for the ceph in Devel:Storage:7.0:CR. Since this contains an "unadulterated" upstream build, the RPM version number is merely 15.1.0, which is lower than the 15.1.0.1521 on the SUSE:SLE-15-SP2:Update:Products:SES7 "Media1" image, even though the upstream build is newer. As a result, ceph-salt was installing cephadm 15.1.0.1521, because that is what zypper evaluated as the newest version available in the repos.

The fix would be to allow the user to set an explicit zypper priority on a per-repo basis.

For example, like this:

version_os_repo_mapping:
    ses7:
        sles-15-sp2:
            - 'http://download.suse.de/ibs/SUSE:/SLE-15-SP2:/Update:/Products:/SES7/images/repo/SUSE-Enterprise-Storage-7-POOL-x86_64-Media1/'
            - '96!http://download.suse.de/ibs/Devel:/Storage:/7.0:/CR/SLE_15_SP2/'

In my mind, it makes sense to include the priority inside the repo URL string using a "magic prefix" (like 96!, meaning "set repo priority to 96") for two reasons (a sketch of consuming such a prefix follows the list):

  1. this is the same convention that teuthology's install task uses
  2. it doesn't make the config.yaml structure any more complicated than it already is
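
For illustration, a shell sketch of how a provisioning script could consume such a prefixed entry; the repo alias is a placeholder, and zypper mr -p is what sets the priority:

    repo='96!http://download.suse.de/ibs/Devel:/Storage:/7.0:/CR/SLE_15_SP2/'
    prio="${repo%%!*}"    # "96" when a magic prefix is present
    url="${repo#*!}"      # the URL with the prefix stripped
    zypper addrepo --refresh "$url" custom-repo-1
    if [ "$prio" != "$repo" ]; then
        zypper modifyrepo --priority "$prio" custom-repo-1
    fi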

"sesdev create" sometimes hangs

Sometimes, during the provision phase of vagrant up (run by sesdev create), when some bash command fails (probably returning a non-zero exit code), sesdev does not return to the shell; it keeps running without doing anything.

Last time it happened was when running the following configuration:

sesdev create octopus --roles="[admin],[mon,mgr,storage],[mon,mgr,storage],[mon,mgr,storage]" --ram=2 --qa-test test

ses5 deployment tries to remove ntp and fails when it isn't there

Apparently, the SLE-12-SP3 Vagrant box used by sesdev used to have the ntp package installed, but that is no longer the case. As a result, the ses5 deployment fails with:

    master: ++ zypper -n rm ntp
    master: Loading repository data...
    master: Warning: No repositories defined. Operating only with the installed resolvables. Nothing can be installed.
    master: Reading installed packages...
    master: 'ntp' not found in package names. Trying capabilities.
    master: No provider of 'ntp' found.
    master: Resolving package dependencies...
    master: Nothing to do.
Command '['vagrant', 'up']' failed: ret=1 stderr:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

This is a consequence of #166, which added set -e at the top of the provisioning script.
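
With set -e in effect, a sketch of a tolerant variant that only attempts the removal when the package is actually installed:

    if rpm -q ntp > /dev/null 2>&1; then
        zypper -n rm ntp
    fi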

With vagrant-libvirt < 0.0.45, vagrant up fails with "The following settings shouldn't exist: qemu_use_session"

vagrant up
Bringing machine 'admin' up with 'libvirt' provider...
Bringing machine 'node1' up with 'libvirt' provider...
Bringing machine 'node2' up with 'libvirt' provider...
Command '['vagrant', 'up']' failed: ret=1 stderr:
b"==> node2: An error occurred. The error will be shown after all tasks complete.
==> node1: An error occurred. The error will be shown after all tasks complete.
==> admin: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'admin'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

Libvirt Provider:
* The following settings shouldn't exist: qemu_use_session



An error occurred while executing the action on the 'node1'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

Libvirt Provider:
* The following settings shouldn't exist: qemu_use_session



An error occurred while executing the action on the 'node2'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

Libvirt Provider:
* The following settings shouldn't exist: qemu_use_session
$ vagrant plugin list
vagrant-libvirt (0.0.43, system)

So - this is supposed to work in newer versions of vagrant-libvirt?

ceph_salt_deploy: refuse to deploy known-incompatible cephadm/ceph-salt version combinations

As a follow-up to #86, I'd like to consider breaking ceph_salt_deployment.sh into two scripts:

The first one, called "ceph_salt_prep.sh", would end by printing this version information. Sesdev would take the "cephadm" and "ceph-salt" versions and check that they are compatible. Then, and only if the versions are not known to be incompatible, it would run the second part, "ceph_salt_deployment.sh", which would just trigger ceph-salt deploy and possibly the QA tests.

Support for Ceph on K8s

Without diving into details, it looks like support for Rook deployments shouldn't be too hard to add...?

"Node 'admin' does not exist in this deployment" error when trying to establish an SSH tunnel

This happens in a current git checkout (rev f6f2697):

After creating a new cluster using sesdev create nautilus --single-node nautilus-singlenode, establishing an SSH tunnel using the command displayed in the end of the output fails:

$ sesdev tunnel nautilus-singlenode dashboard
Opening tunnel to service 'dashboard'...
Node 'admin' does not exist in this deployment

The node is actually called node1 in this setup.

Option "--no-deploy-mgrs" does not work

sesdev command line:

sesdev create octopus --ceph-salt-repo https://github.com/smithfarm/ceph-salt.git --ceph-salt-branch wip-fix-broken-osd-deploy --ceph-container-image="registry.opensuse.org/filesystems/ceph/master/upstream/images/ceph/ceph" --no-deploy-mons --no-deploy-mgrs --no-deploy-osds octopus_test1

Results in:

    admin:   |   o- Mgr .......................................................................................................... [Minions: 3]
    admin:   |   | o- node1.octopus_test1.com .............................................................................. [other roles: mon]
    admin:   |   | o- node2.octopus_test1.com .............................................................................. [other roles: mon]
    admin:   |   | o- node3.octopus_test1.com .............................................................................. [other roles: mon]
...
    admin:   o- Deployment .............................................................................................................. [...]
    admin:   | o- Bootstrap ......................................................................................................... [enabled]
    admin:   | o- Dashboard ............................................................................................................. [...]
    admin:   | | o- password ............................................................................................................ [***]
    admin:   | | o- username .......................................................................................................... [admin]
    admin:   | o- Mgr ............................................................................................................... [enabled]
    admin:   | o- Mon .............................................................................................................. [disabled]
    admin:   | o- OSD .............................................................................................................. [disabled]

which is wrong. With --no-deploy-mgrs it should say:

    admin:   | o- Mgr ............................................................................................................... [disabled]

octopus/ses7: Allow Ceph clusters to be deployed with AppArmor completely disabled

Octopus deployments currently have AppArmor enabled by default.

Past experience with testing Ceph deployments has shown that AppArmor, when enabled, can be a big headache, because it typically causes errors that are difficult to trace back to AppArmor.

Clearly, an easy way to turn off AppArmor is needed -- i.e., an --apparmor/--no-apparmor option to sesdev create.

However, this option must ensure that AppArmor is not running anywhere. It is not sufficient to disable it on the nodes themselves, because it might be running in the containers.
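
On the node side, a sketch of what such an option could run during provisioning, assuming the systemd unit is named apparmor; as noted above, this alone is not sufficient, since AppArmor may still be in effect inside the containers:

    # Stop AppArmor and unload any loaded profiles on the node itself.
    systemctl disable --now apparmor.service
    aa-teardown || true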

Need a way to turn off zypper repo priority elevation

Repos listed in VERSION_OS_REPO_MAPPING get an elevated priority, but this is not desired. (These repos are maintained and are not expected to have problems with version numbers.)

It's possible to specify custom zypper repos on the command line, but they automatically get an elevated priority setting which may not be desired.

When the repos are sane, elevated repo priority should not be needed. For custom repos specified on the command line, we can keep the default to support users who might be surprised to see their custom repo not being used. But there needs to be a way to turn off the priority elevation.

Conclusion:

  • implement a --no-repo-priority option providing the user a way to completely disable priority elevation
  • drop priority elevation for repos hard-coded in VERSION_OS_REPO_MAPPING

Create a new deployment version "pacific" (or "master"?)

Currently we have upstream master (call it "m") and octopus is "m-1". The current version of the product, SES7, was tracking "m" but is now tracking "m-1", and development on the next version, SES8, has not yet begun. Once it begins, it will likely want to track "m". So, we can distinguish two periods:

  1. the period during which the product-in-development is tracking "m", and
  2. the period during which the product-in-development is tracking "m-1"

Think about what a new deployment version "master" should mean during both of these periods, and also give some thought to the current naming convention, where the name of the upstream branch (e.g. "nautilus") actually means "deploy the downstream product code on openSUSE".

Default cluster has only two MGRs

This was found using --qa-test:

    admin: ++ ceph osd ls --format json
    admin: total nodes actual/expected:  4/4
    admin: MON nodes actual/expected:    3/3
    admin: MGR nodes actual/expected:    2/2
    admin: OSD nodes actual/expected:    3/3
    admin: total OSDs actual/expected:   6/6
    admin: number_of_nodes_actual_vs_expected_test: OK

It seems reasonable to have, by default, a MGR for every MON.

Jenkins runs sesdev with "--image-path null"

The new automated deployment test has a glitch:

+ sesdev create octopus --non-interactive --image-path null --qa-test --single-node mini
=== Creating deployment with the following configuration ===
1 Deployment VMs:
  -- admin:
     - OS:               leap-15.2
     - ses_version:      octopus
     - deployment_tool:  orchestrator
     - roles:            ['mds', 'igw', 'storage', 'prometheus', 'grafana', 'ganesha', 'mon', 'rgw', 'mgr', 'admin']
     - fqdn:             admin.mini.com
     - public_address:   10.20.47.200
     - cpus:             2
     - ram:              4G
     - storage_disks:    4
                         (device names will be assigned by vagrant-libvirt)
     - repo_priority:    True
     - qa_test:          True
     - image_path:       null

sesdev has no way of updating or deleting the Vagrant boxes it uses

The way sesdev currently works, once a vagrant box for a particular OS target has been downloaded, sesdev will continue to use it forever. The box will eventually go stale and need updating.

To address this issue, implement an --update-box option to sesdev create.

Alternatively, the issue could be addressed by implementing a new subcommand that deletes the vagrant box corresponding to a given target, e.g.: sesdev delete-box ses7. Then a subsequent sesdev create ses7 would download a fresh box.

And sesdev delete-box --all could be implemented to enable the user to get rid of all the boxes at once.
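
For reference, the underlying Vagrant commands such subcommands would wrap (the box name is an example taken from the sles-15-sp2 box mentioned elsewhere on this page):

    vagrant box list
    vagrant box remove sles-15-sp2 --provider libvirt
    vagrant box update --box sles-15-sp2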

Bootstrap minion doesn't have admin role

After ceph/ceph-salt#121, deploying octopus fails with:

    admin:   o- Containers .............................................................................................................. [...]
    admin:   | o- Images ................................................................................................................ [...]
    admin:   |   o- ceph ............................................................... [172.17.0.1:5001/ceph/daemon-base:latest-master-devel]
    admin:   o- Deployment .............................................................................................................. [...]
    admin:   | o- Bootstrap ......................................................................................................... [enabled]
    admin:   | o- Dashboard ............................................................................................................. [...]
    admin:   | | o- password ............................................................................................................ [***]
    admin:   | | o- username .......................................................................................................... [admin]
    admin:   | o- Mgr ............................................................................................................... [enabled]
    admin:   | o- Mon ............................................................................................................... [enabled]
    admin:   | o- OSD ............................................................................................................... [enabled]
    admin:   o- SSH ............................................................................................................ [Key Pair set]
    admin:   | o- Private_Key ............................................................... [cd:5f:30:e0:12:67:46:f9:97:bd:94:49:1e:66:0c:04]
    admin:   | o- Public_Key ................................................................ [cd:5f:30:e0:12:67:46:f9:97:bd:94:49:1e:66:0c:04]
    admin:   o- Storage ................................................................................................................. [...]
    admin:   | o- Drive_Groups ............................................................................................................ [1]
    admin:   |   o- {"testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": true}}} ..................................... [...]
    admin:   o- System_Update ........................................................................................................... [...]
    admin:   | o- Packages ......................................................................................................... [disabled]
    admin:   | o- Reboot ........................................................................................................... [disabled]
    admin:   o- Time_Server ......................................................................................................... [enabled]
    admin:     o- External_Servers ........................................................................................................ [1]
    admin:     | o- 0.pt.pool.ntp.org ................................................................................................... [...]
    admin:     o- Server_Hostname ............................................................................................ [admin.mini.com]
    admin: ++ ceph-salt status
    admin: Bootstrap minion must be 'Admin'

Error: no such option: --no-deploy-osds

According to sesdev create octopus --help, there are three options:

--deploy-osds
--deploy-mgrs
--deploy-mons

all of which default to "true". But I could find no way to turn these options off:

(venv) smithfarm@vanguard1:~/sesdev> sesdev create octopus --ceph-container-image="registry.opensuse.org/filesystems/ceph/master/upstream/images/ceph/ceph" --no-deploy-osds --no-deploy-mgrs --no-deploy-mons --single-node octopus_test1
Usage: sesdev create octopus [OPTIONS] DEPLOYMENT_ID
Try "sesdev create octopus --help" for help.

Error: no such option: --no-deploy-osds

podman container cannot be stopped due to AppArmor

Octopus cluster, but likely also affects ses7, where the package also isn't installed.

node2:~ # podman ps 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
019bb018c46c docker.io/prom/node-exporter:latest /bin/node_exporte... 3 minutes ago Up 3 minutes ago ceph-93c29e18-309f-11ea-83a2-52540028a9f3-node-exporter.ceph.com
23bd957645b8 docker.io/ceph/daemon-base:latest-master-devel /usr/bin/ceph-osd... 28 hours ago Up 28 hours ago ceph-93c29e18-309f-11ea-83a2-52540028a9f3-osd.4
f02f2155fe52 docker.io/ceph/daemon-base:latest-master-devel /usr/bin/ceph-osd... 28 hours ago Up 28 hours ago ceph-93c29e18-309f-11ea-83a2-52540028a9f3-osd.5
0251709461e1 docker.io/ceph/daemon-base:latest-master-devel /usr/bin/ceph-osd... 28 hours ago Up 28 hours ago ceph-93c29e18-309f-11ea-83a2-52540028a9f3-osd.6
09c9178b0506 docker.io/ceph/daemon-base:latest-master-devel /usr/bin/ceph-osd... 28 hours ago Up 28 hours ago ceph-93c29e18-309f-11ea-83a2-52540028a9f3-osd.7

node2:~ # podman stop ceph-93c29e18-309f-11ea-83a2-52540028a9f3-node-exporter.ceph.com
ERRO[0000] container_linux.go:389: signaling init process caused "permission denied" 
container_linux.go:389: signaling init process caused "permission denied"
Error: permission denied

node2:~ # aa-teardown 
Unloading AppArmor profiles 

node2:~ # podman stop ceph-93c29e18-309f-11ea-83a2-52540028a9f3-node-exporter.ceph.com
019bb018c46c78ebd2cb758ff7b13a1ca67a9546c6dc48ed13fbe9432edb6b8a

node2:~ # 

dmesg related output:

[101894.050966] audit: type=1400 audit(1578575016.796:3): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=19883 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined"
[101894.057036] audit: type=1400 audit(1578575016.800:4): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=19893 comm="runc" requested_mask="receive" denied_mask="receive" signal=term peer="unconfined"
[101894.058783] audit: type=1400 audit(1578575016.804:5): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=19883 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined"

"ceph device ls" does not work

ceph device ls does not work: it returns no devices. This is probably because the virtual machines do not have sufficient device metadata (vendor, serial number, etc.) for a device ID to be generated.

This is on an octopus cluster, but I expect this to be an issue with any other setup as well.

The Ceph Dashboard's functionality will also be limited by this issue, which may not be easy to resolve, as SMART data also cannot be read out in virtual machines.

➜  ~ ceph device ls
DEVICE HOST:DEV DAEMONS LIFE EXPECTANCY 
➜  ~ udevadm info /dev/vda
P: /devices/pci0000:00/0000:00:03.0/virtio0/block/vda
N: vda
S: disk/by-path/pci-0000:00:03.0
S: disk/by-path/virtio-pci-0000:00:03.0
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:03.0 /dev/disk/by-path/virtio-pci-0000:00:03.0
E: DEVNAME=/dev/vda
E: DEVPATH=/devices/pci0000:00/0000:00:03.0/virtio0/block/vda
E: DEVTYPE=disk
E: ID_PART_TABLE_TYPE=gpt
E: ID_PART_TABLE_UUID=42c809fd-9a5d-4dfe-84b2-6d931bf350bf
E: ID_PATH=pci-0000:00:03.0
E: ID_PATH_TAG=pci-0000_00_03_0
E: MAJOR=253
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=5101428

Adding device information for VMs might resolve the problem for Ceph creating device IDs: https://lists.opensuse.org/opensuse-bugs/2012-04/msg00361.html

--num-disks > 25 doesn't work properly

Will result in:

<snip>
       - /dev/vdy        20G
       - /dev/vdz        20G
       - /dev/vd{        20G
       - /dev/vd|        20G
       - /dev/vd}        20G
       - /dev/vd~        20G
       - /dev/vd        20G

in the === Creating deployment with the following configuration === dialog.
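
The garbled names presumably come from mapping the disk index to a single ASCII letter, which breaks past 'z'. A shell sketch of a mapping that continues with two-letter names (vdaa, vdab, ...), matching the kernel's own virtio-blk naming; sesdev itself is Python, so this only illustrates the scheme:

    # Convert a zero-based disk index into a /dev/vdX name:
    # 0 -> /dev/vda, 25 -> /dev/vdz, 26 -> /dev/vdaa, ...
    disk_dev_name() {
        local i=$1 suffix=""
        while [ "$i" -ge 0 ]; do
            suffix=$(printf "\\$(printf '%03o' $((97 + i % 26)))")${suffix}
            i=$(( i / 26 - 1 ))
        done
        echo "/dev/vd${suffix}"
    }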

openattic role gets added to ses5 automatically (forced)

There doesn't seem to be a way to deploy a ses5 cluster without openattic:

$ sesdev create ses5 --roles="[admin, storage, mon, mgr, mds]" mini_ses5
=== Creating deployment with the following configuration ===
Deployment VMs:
  -- admin:
     - OS:               sles-12-sp3
     - ses_version:      ses5
     - deployment_tool:  deepsea
     - roles:            ['admin', 'storage', 'mon', 'mgr', 'mds', 'openattic']
     - fqdn:             admin.mini_ses5.com
     - public_address:   10.20.36.200
     - cpus:             2
     - ram:              4G
     - storage_disks:    2
       - /dev/vdb        8G
       - /dev/vdc        8G


Do you want to continue with the deployment? [Y/n]: 

"sesdev list" fails if one or more remote libvirt deployments are not reachable

Currently, when you have a remote deployment that is unreachable, sesdev list fails with:

Command '['vagrant', 'status']' failed: ret=1 stderr:
b'Error while connecting to libvirt: Error making a connection to libvirt URI qemu+ssh://root@g113/system?no_verify=1&keyfile=/home/enno/.ssh/workstation_ses:\nCall to virConnectOpen failed: Cannot recv data: ssh: Could not resolve hostname g113.suse.de: Name or service not known: Connection reset by peer\n'

The output also exposes a second problem: the formatting of the error message (a raw bytes literal is printed).

feature request: documented way to transfer file from guest to host

Use case:

I need to copy a file (e.g. /var/log/ceph-bootstrap.log) from a guest VM to the host.

The existing documentation does not mention this anywhere, and I have no idea how to achieve it. Ordinarily I would try to run vagrant port to find out which port on the host the guest's SSH daemon is listening on, but this doesn't work:

$ cd ~/.vagrant.d/boxes/sles-15-sp2/0/libvirt
$ vagrant port
The libvirt provider does not support listing forwarded ports. This is
most likely a limitation of the provider and not a bug in Vagrant. If you
believe this is a bug in Vagrant, please search existing issues before
opening a new one.

Also, the synced directory /vagrant appears to be unidirectional: i.e. when the guest VM is created, /vagrant is populated from the host via rsync, but changes made by the guest to this directory are not reflected on the host (?).
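
One workaround sketch, assuming the deployment directory layout described above (~/.sesdev/<deployment_id> containing the Vagrantfile); the node name and file path are examples: dump the SSH configuration Vagrant already has and point scp at it:

    cd ~/.sesdev/<deployment_id>
    vagrant ssh-config admin > ssh_config
    scp -F ssh_config admin:/var/log/ceph-bootstrap.log .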

Deploy to public/private cloud (Amazon EC2, OpenStack) instead of libvirt/kvm

It would be nice if sesdev would support deploying a Ceph cluster not only on a local or remote machine running libvirt/qemu, but also "in the Cloud", e.g. on Amazon EC2, OpenStack, Azure, Public-Cloud-Flavor-Of-The-Week, etc. Ideally, I would only have to flip a config switch to use one of these instead of libvirt and provide my Public Cloud access credentials in a config file to switch the deployment mode.


UPDATE: this was originally just for Amazon EC2, but since sesdev does not currently support any public cloud, I repurposed the ticket to cover them all.

Change "admin role" semantics

Until ceph/ceph-salt#121 it made sense to call the Salt Master node the "admin" node and enforce that there is only one "admin" node in the cluster.

After ceph/ceph-salt#121, ceph-salt is using more or less the same semantics as DeepSea's policy.cfg - i.e. the "master" role means that this node is the Salt Master and the "admin" role means that this node should get ceph.conf and the admin keyring.

So, for all deployment versions except caasp4 it now makes sense to:

  • change current "admin" role to "master" everywhere
  • give all nodes the "admin" role by default, regardless of whether or not the user specified it
