redhatqe / rhui3-automation


Tools to deploy and test RHUI3

License: GNU General Public License v2.0

Shell 2.50% Python 93.93% Jinja 3.57%

rhui3-automation's Introduction

RHUI 3 Automation

Tools to deploy and test Red Hat Update Infrastructure 3 (RHUI 3).

Overview

RHUI 3 Automation consists of a script that prepares AWS EC2 machines and a set of Ansible playbooks that turn the machines into fully functional RHUI 3 nodes: RHUA, CDSes, HAProxy etc. Optionally, a test machine and a client machine can also be installed and made available for automated testing and/or further manual experimenting.

deploy/: Ansible playbooks to set up the individual nodes.
docs/: Documentation for test modules. Needs Sphinx to build.
scripts/: Scripts to simplify the deployment even more by creating a CloudFormation stack with the individual RHUI 3 nodes and a hosts configuration file for use by Ansible.
tests/: Test suite (test cases and libraries) to verify the functionality of an installed RHUI 3 environment or check for potential regressions in updated RHUI 3 ISOs.
hosts.cfg: A template for the hosts configuration.
{ATOMIC,RHEL{5,6,7{,_arm64},8}}mapping.json: IDs of the latest Atomic and RHEL 5, 6, 7 & 8 AMIs. The deployment script uses this data.

Usage

See the deployment readme file for details.

Data Files

In addition to hosts.cfg and the JSON files, the following data file exists:

tests/rhui3_tests/tested_repos.yaml: Names of repositories to test. These repositories must be part of the entitlement certificate which is uploaded to RHUA. For more information about the certificate, see the requirements section in the tests readme file.

rhui3-automation's People

vex21 pombredanne lmctv taftsanders

rhui3-automation's Issues

[TESTS] created in hosts.cfg instead of [TEST]

create-cf-stack.py --tests creates a [TESTS] section, but the expected role in the ansible playbook is [TEST]. Consequently, the test machine isn't configured and extra files aren't uploaded to RHUA.

Please make sure the script creates [TEST].

replace 'proceed_without_check' with 'proceed_with_check'

If possible, every occurrence of the 'proceed_without_check' function in the tests should be replaced with 'proceed_with_check'.

[1] https://github.com/RedHatQE/rhui3-automation/blob/master/tests/rhui3_tests_lib/rhuimanager.py#L115

rhui-installer fails when --remote-fs-type=glusterfs

rhui-manager is not installed

NFS install differs for RHEL6 and RHEL7

The current version of the automation returns an error for the NFS role on a RHEL6 ISO setup. On RHEL7 the service is called nfs-server, but on RHEL6 it's nfs.
deploy/roles/nfs/tasks/main.yml and deploy/roles/nfs/handlers/main.yml need to be updated.

Include username in stack ID

The stack ID is currently created this way:

STACK_ID = "STACK" + ''.join(random.choice(string.ascii_lowercase) for x in range(10))

Consequently, it's not clear from the cloudformation page in AWS who owns which stack. Or from a different perspective, I can't see which stacks are mine.

Please include the user name (the ec2_name variable) in the stack ID as well to provide direct ownership information.
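
A minimal sketch of what that could look like, assuming ec2_name holds a plain user name. The make_stack_id helper and the "STACK-owner-suffix" layout are made up for illustration and are not part of the current script:

```python
import random
import string

# Characters kept from the user name; CloudFormation stack names may only
# contain letters, digits and hyphens.
ALLOWED = string.ascii_letters + string.digits + "-"


def make_stack_id(ec2_name, suffix_length=10):
    """Build a stack ID that embeds the owner's name plus a random suffix."""
    owner = "".join(ch for ch in ec2_name if ch in ALLOWED)
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(suffix_length))
    return "STACK-" + owner + "-" + suffix


print(make_stack_id("jdoe"))
```

The random suffix is kept so that one user can still own several stacks at the same time.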

Test coverage for an expired certificate

Action item: Obtain an expired certificate and extend the test suite so it can verify that rhui-manager is able to handle it correctly. Both the interactive and the CLI mode tests are to be extended.

Variables and return values contain formatting

In https://github.com/RedHatQE/rhui3-automation/blob/master/tests/rhui3_tests/tested_repos.yaml#L6 --

version: " \\(7Server-x86_64\\)"

This doesn't look like clean code (data handling). The variable itself shouldn't contain that space character. If someone wants to connect "name" and "version" into one string, they are responsible for connecting these two variables with a space character.

In https://github.com/RedHatQE/rhui3-automation/blob/master/tests/rhui3_tests_lib/rhuimanager_repo.py#L181 --

return " \(" + repo_version + "\)"

This doesn't look like clean code, either. The function shouldn't return the initial space character, and one could argue that it shouldn't return the parentheses, either. If the purpose of the function is to get you the repo version, it should just:

return repo_version

That way the caller gets just the version string and can deal with it any way it wants. If/when the caller then needs to use the repo version in a specific way, it's responsible for adding whatever characters are needed around it according to the context.
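
To illustrate the split of responsibilities, here is a rough sketch. The format_repo_label helper and the repo name are hypothetical examples, not code from rhuimanager_repo.py:

```python
import re


def format_repo_label(name, version):
    """Join a repo name and a bare version string for display (hypothetical helper)."""
    return "%s (%s)" % (name, version)


# The data holds the bare version only: no leading space, no parentheses.
repo_version = "7Server-x86_64"
label = format_repo_label("Red Hat Enterprise Linux 7 Server from RHUI (RPMs)", repo_version)
print(label)

# A caller that needs a regular expression does the escaping itself.
pattern = re.escape(label)
print(re.fullmatch(pattern, label) is not None)
```

The data and the getter stay clean, and any spaces, parentheses or escaping live in the one place that actually needs them.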

update rhui3-automation to ansible 2.2

Using bare variables is deprecated. The playbooks need to be updated so that the environment value uses the full variable syntax.

libffi-devel and openssl-devel missing

The ansible installation of the test suite fails because a couple of -devel packages are missing:

libffi-devel
openssl-devel

Having installed those two manually, I was able to complete the ansible run.

test naming

@RadekBiba Is there any way test naming can be changed? It's not cool to update the numbering every time a new test is added.

get major.minor RHEL version

I'd like Util.get_rhua_version() from rhui3_tests_lib.util to return a float representing the complete major.minor version of RHEL running on the RHUA node. At present, the method only returns an integer with the major version (6 or 7), but I'd appreciate the minor version, too (6.9, 7.5), so I could assess the environment more accurately.

I'll make sure that existing test cases don't break.

Any objections, @alexxa ?
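
A possible shape for the parsing part, as a sketch only. It assumes the version is taken from the text of /etc/redhat-release, and parse_rhel_version is a hypothetical name rather than an existing method:

```python
import re


def parse_rhel_version(release_text):
    """Extract major.minor as a float from the contents of /etc/redhat-release."""
    match = re.search(r"release (\d+)\.(\d+)", release_text)
    if match is None:
        raise ValueError("no major.minor version in %r" % release_text)
    return float("%s.%s" % match.groups())


print(parse_rhel_version("Red Hat Enterprise Linux Server release 7.5 (Maipo)"))
print(parse_rhel_version("Red Hat Enterprise Linux Server release 6.9 (Santiago)"))
```

One caveat with a float: a minor version of 10 (as in RHEL 6.10) would come out as 6.1, so a (major, minor) tuple may be the safer return type if such versions are to be compared.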

Switch to make yum update prior to installation

As customers will probably want to install RHUI on fully updated OSes, we should also test these configurations. A switch controlling whether ansible runs "yum update -y" first is for situations where you actually want to skip this lengthy process (it might take several minutes or more), e.g. to test something unrelated to package versions, such as developing the install scripts themselves.

RHUIManagerRepo.add_rh_repo_by_repo works for Yum repos only or which repo attributes to store in tested_repos.yaml

Line [1] will cause an error if the repo has any type other than 'Yum', e.g. for

Red Hat Enterprise Linux Atomic Host (Trees) from RHUI
- 6 : Red Hat Enterprise Linux Atomic Host (Trees) from RHUI (Version 7.3.3) (Atomic)

One possible solution: update RHUIManagerRepo.add_rh_repo_by_repo so that it takes the name of a single repo as an argument, not a list of several repos, and add repo_type as another argument, with the default value 'Yum', since that type is used the most. Remember to update all tests calling that function.

Before fixing this issue, we need to define what repo attributes to store in tested_repos.yaml or how to store them. Some rhui-manager functions need the full name (+version+type), e.g. add_by_repo, schedule_sync, others just a name without version or type. Therefore, functions are supposed to take as is, concatenate, split or extract the relevant repo attributes.

To me, storing the full repo name (i.e. +version+type) looks the most reasonable right now, because it requires the minimum of actions/knowledge from a user, as well as from functions which may need only name+version or just a name. However, this solution is not applicable to the special Atomic repo, which has a floating version at the moment.

[1] https://github.com/RedHatQE/rhui3-automation/blob/master/tests/rhui3_tests_lib/rhuimanager_repo.py#L123

dnf support

In the future, when yum becomes obsolete, we might need the option to use dnf on those machines. Please note that we will still need yum for installation on RHEL6 machines.

Warnings from ansible 2.3

The following warnings are printed when ansible 2.3 is used with rhui3-automation:

< TASK [rhua : install rhui-installer] >
 --------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

changed: [ec2-54-74-204-139.eu-west-1.compute.amazonaws.com]
 __________________________________________
< TASK [rhua : call rhui installer if nfs] >
 ------------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{install_rhui_installer|changed and 'NFS' in groups and
groups['NFS']|length > 0 }}

changed: [ec2-54-74-204-139.eu-west-1.compute.amazonaws.com]
 ______________________________________________
< TASK [rhua : call rhui installer if gluster] >
 ----------------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{install_rhui_installer|success and 'GLUSTER' in groups and
groups['GLUSTER']|length > 0 }}

skipping: [ec2-54-74-204-139.eu-west-1.compute.amazonaws.com]

Limit queries to DNS server

In the wake of a security incident reported by Amazon, we need to limit access to the DNS server to the machines that are part of the RHUI setup only. We can either modify the named.conf file or work with security group rules, depending on which proves to be the better approach.

[RFE] merge CDS and HAProxy libraries

CDS and HAProxy libraries can be merged. Particularly, files:

rhui3_tests_lib/cds.py and rhui3_tests_lib/hap.py
rhui3_tests_lib/rhuimanager_cds.py and rhui3_tests_lib/rhuimanager_hap.py

The code is the same and differs in the object name only. 'cds' and 'haproxy' can be joined to 'cds_hap_instance', for example, and strings replaced with regular expressions, etc. When merged, comments should explain that the library is valid for both CDS and HAProxy instances.

test_repo_managment.py doesn't upload an entitlement certificate, and almost no other tests remove certificates before they finish

There's no step in test_repo_managment.py that would upload a Red Hat entitlement certificate, so unless a cert has been uploaded previously, e.g. by another test case in the test suite executed prior to this one, "Add a RH repo by repository" won't work. That's not nice because every test case should be self-sufficient.

On a related note, I think every test case that uploads a cert should remove it as part of the cleanup. There's RHUIManager.remove_rh_certs() for that, and it's currently only used in two test cases.
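
One way to guarantee the cleanup is a context manager, sketched below. FakeRHUIManager is a stand-in that only records calls, and its upload_rh_certificate method name is an assumption; only remove_rh_certs is taken from the text above:

```python
from contextlib import contextmanager


class FakeRHUIManager:
    """Stand-in that only records calls; not the real rhui3_tests_lib class."""
    calls = []

    @staticmethod
    def upload_rh_certificate(connection):  # hypothetical method name
        FakeRHUIManager.calls.append("upload")

    @staticmethod
    def remove_rh_certs(connection):
        FakeRHUIManager.calls.append("remove")


@contextmanager
def uploaded_certificate(manager, connection):
    """Upload a cert for the duration of the block and always remove it afterwards."""
    manager.upload_rh_certificate(connection)
    try:
        yield
    finally:
        # Runs even if the test body raises, so no cert is left behind.
        manager.remove_rh_certs(connection)


with uploaded_certificate(FakeRHUIManager, connection=None):
    FakeRHUIManager.calls.append("add repo")

print(FakeRHUIManager.calls)
```

Each test case that needs a cert would then upload its own and clean up after itself, regardless of the order in which the tests run.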

/etc/resolv.conf isn't preserved after rebooting

When you reboot an AWS system which is a component of your RHUI 3 environment, its hostname is no longer $ROLE$NUMBER.example.com as originally configured by ansible. It becomes the hostname that matches the IP address of eth0, i.e. ip-$IP.$REGION.compute.internal.

After a few hours of investigation, it turned out that cloud-init was the culprit. Specifically, the following line in /etc/cloud/cloud.cfg:

- update_hostname

In order to be able to preserve the custom hostname, we need to remove this line. Or uninstall cloud-init.

I've already verified that the custom hostname is preserved after rebooting when the offending line is missing from cloud.cfg.
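
For illustration, the line removal itself could be done along these lines. This is a sketch that works on the file contents as a string; the sample configuration is abbreviated and drop_update_hostname is a hypothetical helper:

```python
def drop_update_hostname(cloud_cfg_text):
    """Return the cloud.cfg text without the '- update_hostname' module line."""
    kept = [
        line
        for line in cloud_cfg_text.splitlines(keepends=True)
        if line.strip() != "- update_hostname"
    ]
    return "".join(kept)


# Abbreviated sample, not a complete cloud.cfg.
sample = (
    "cloud_init_modules:\n"
    " - set_hostname\n"
    " - update_hostname\n"
    " - ssh\n"
)
print(drop_update_hostname(sample))
```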

unify TESTS and MASTER

In playbooks, sometimes one of the hosts and roles is called TESTS, sometimes MASTER. It's the same and should be unified.

A separate list of AMIs used for stack creation

We need to find a way to maintain the list of AMIs that are used for stack creation.

The questions that need to be addressed are:

Do we want to keep the list in a file separate from the main script?
What is the easiest way to allow the user to change the AMIs that are used when need be?
We need to update the list of AMIs after each RHEL minor version; vhutsky to create a script which would generate the dictionary.

Add Server- and Client-side quorum configuration and tests

Quorum configuration should be optional.

use the yum module instead of running yum

< TASK [rhui_nodes : perform repolist] >
...
 [WARNING]: Consider using the yum module rather than running yum.

The "offending" task reads:

shell: "yum repolist --disableplugin=* --disablerepo=* --enablerepo=local-rhui3 -v | grep 'Repo-pkgs' | cut -d: -f 2"
...
assert: { that: "{{yum_repolist_result.stdout|int > 0 }}" }

IOW, with all yum plugins and repos disabled -- and only local-rhui3 (from the ISO) enabled -- we check if there are some available packages. Running the same command in a login shell shows there are 101 packages in the repo.

This should be done more elegantly using the yum Ansible module.

changing screens between functions in one test case

Currently, every function starts and ends on the same screen. If a user wants to call a function from a different screen in the same test case, they need to switch to the 'home' screen first. E.g.

upload_rh_certification()
Expect.enter(connection, "home")
Expect.expect(connection, ".*rhui (" + "home" + ") =>")
add_rh_repo_by_repo()
Expect.enter(connection, "home")
Expect.expect(connection, ".*rhui (" + "home" + ") =>")
sync_repo()
get_repo_status()
Expect.enter(connection, "home")
Expect.expect(connection, ".*rhui (" + "home" + ") =>")
delete_repo()

Therefore, switching to the 'home' screen or quitting RHUI should be added to every function, and every function should start from the relevant screen. Maybe there is a different solution...
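
One candidate, as a sketch: wrap the library functions in a decorator that returns to the 'home' screen when they finish. RecordingExpect below is a stand-in that only records calls in place of the real Expect class, and go_to_screen and returns_home are hypothetical helpers. The prompt pattern is copied from the example above:

```python
import functools


class RecordingExpect:
    """Stand-in for the Expect class; records calls instead of talking over SSH."""
    log = []

    @classmethod
    def enter(cls, connection, command):
        cls.log.append(("enter", command))

    @classmethod
    def expect(cls, connection, pattern):
        cls.log.append(("expect", pattern))


def go_to_screen(expect, connection, screen="home"):
    """Switch rhui-manager to the given screen and wait for its prompt."""
    expect.enter(connection, screen)
    expect.expect(connection, ".*rhui (" + screen + ") =>")


def returns_home(expect):
    """Decorator: go back to the 'home' screen after the wrapped function ends."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(connection, *args, **kwargs):
            try:
                return func(connection, *args, **kwargs)
            finally:
                go_to_screen(expect, connection, "home")
        return wrapper
    return decorator


@returns_home(RecordingExpect)
def sync_repo(connection):
    RecordingExpect.log.append(("call", "sync_repo"))


sync_repo(None)
print(RecordingExpect.log)
```

With something like this, test cases could call the functions back to back without repeating the enter/expect pair in between.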

tests duplication

Instead of duplicating tests/code across rhui3-automation/tests/rhui3_tests/test_*.py, the shared parts should be imported or maybe inherited, e.g.

add a task on rhui-manager login

Currently, the RHUI3 installation playbook returns the exit code of rhui-installer. If it returns '0', it doesn't necessarily mean that the installation went fine and a user can log in to rhui-manager. For example, a user will see "Network error" if the RHUA hostname doesn't match the hostname in the certs, even though rhui-installer returns '0'. A task to log in to rhui-manager should be added to the installation playbook.

Exceptions are "raised" not "risen" (cosmetic/grammar issue)

... rhui3-automation]$ git grep risen
tests/rhui3_tests_lib/rhuimanager.py: to be risen when the line isn't actually a selection line
tests/rhui3_tests_lib/rhuimanager_cds.py: To be risen when trying to add an already tracked Cds
tests/rhui3_tests_lib/rhuimanager_cds.py: To be risen when e.g. trying to select non-existing Cds
tests/rhui3_tests_lib/rhuimanager_cds.py: To be risen in case rhui-manager wasn't able to locate provided SSH key path
tests/rhui3_tests_lib/screenitem.py: to be risen in case the item can't be located

All of the occurrences of the word risen are used when referring to exceptions, but the correct usage is to raise an exception, past participle being "raised". You can't rise an exception, past participle being risen, because the word rise has a different meaning and isn't even transitive, so it can't be used in the passive voice.

update ReadMe

Add to ReadMe all the switches which go with
ansible-playbook -i ~/pathto/hosts.cfg site.yml --extra-vars "rhui_iso=~/Path/To/Your/RHUI.iso"

Add rh-amazon-rhui-client-ha to extra-vars to be installed if HAProxy is on RHEL6

improve the way some commands are executed and checked

There are several places in the code where the following statement is used:

Expect.ping_pong(connection, "some_command && echo SUCCESS", "[^ ]SUCCESS")

This isn't optimal, though. When the command fails, the code won't know until ping_pong times out. So the execution is halted and time is wasted basically waiting for a string that will never occur. To demonstrate the waiting, see the following reproducer:

[root@test ~]# cat ping_pong_and_success.py 
#!/usr/bin/python
import stitches
from stitches.expect import Expect

CONNECTION = stitches.Connection("rhua.example.com", "root", "/root/.ssh/id_rsa_test")

Expect.ping_pong(CONNECTION, "test -f /notafile && echo SUCCESS", "[^ ]SUCCESS")
[root@test ~]# time ./ping_pong_and_success.py 
Traceback (most recent call last):
  File "./ping_pong_and_success.py", line 7, in <module>
    Expect.ping_pong(CONNECTION, "test -f /notafile && echo SUCCESS", "[^ ]SUCCESS")
  File "/usr/lib/python2.7/site-packages/stitches/expect.py", line 176, in ping_pong
    return Expect.expect(connection, strexp, timeout)
  File "/usr/lib/python2.7/site-packages/stitches/expect.py", line 87, in expect
    timeout)
  File "/usr/lib/python2.7/site-packages/stitches/expect.py", line 63, in expect_list
    raise ExpectFailed(result)
stitches.expect.ExpectFailed: test -f /notafile && echo SUCCESS
[root@rhua ~]# 

real	1m41.560s
user	0m0.293s
sys	0m0.081s

The Expect class provides a different method for this, expect_retval, which runs the given command and checks its exit code. I suggest we use it instead.
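
The difference in principle can be shown locally with the standard library. This is an analogy using subprocess, not the stitches API, and command_succeeded is a hypothetical helper:

```python
import subprocess
import sys


def command_succeeded(argv):
    """Run a command and judge it by its exit code alone."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


# Child processes that exit with a non-zero and a zero status, respectively.
failing = [sys.executable, "-c", "raise SystemExit(1)"]
passing = [sys.executable, "-c", "raise SystemExit(0)"]

print(command_succeeded(failing))
print(command_succeeded(passing))
```

A failure is reported as soon as the command exits, so nothing waits for a SUCCESS string that will never be printed.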

add ATOMIC_CLI role into stack

--atomic_cli [number] - number of Atomic CLI machines, default = 0

name of a role in output conf file: [ATOMIC_CLI]

Atomic CLI can be only a RHEL7 machine, and only if RHUA role is also on RHEL7.

Deprecation warnings from Ansible

I noticed several deprecation warnings earlier, but after updating to Ansible 2.5, a lot more warnings appear. We should modernize our playbooks.

Here's a list of current warnings (squeezed by sort -u):

[DEPRECATION WARNING]: 'include' for playbook includes. You should use 'import_playbook' instead. This feature will be removed in version 2.8. Deprecation warnings can be 
[DEPRECATION WARNING]: state=running is deprecated. Please use state=started. This feature will be removed in version 2.7. Deprecation warnings can be disabled by setting 
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|changed` instead use `result is changed`. This feature will be removed in version 2.9.
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|success` instead use `result is success`. This feature will be removed in version 2.9.
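
The tests-as-filters cases are mechanical enough to rewrite with a regular expression. A rough sketch follows; the pattern only covers the two tests named in the warnings above and simple variable names, and modernize_condition is a hypothetical helper, so any result should be reviewed by hand:

```python
import re

# Matches "<name>|changed" and "<name>|success", with optional spaces around "|".
TEST_AS_FILTER = re.compile(r"(\w+)\s*\|\s*(changed|success)\b")


def modernize_condition(condition):
    """Rewrite 'result|changed' style expressions to 'result is changed'."""
    return TEST_AS_FILTER.sub(r"\1 is \2", condition)


old = "install_rhui_installer|changed and 'NFS' in groups and groups['NFS']|length > 0"
print(modernize_condition(old))
```

Real filters such as |length are left alone because they aren't in the list of test names.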

Make repo names as constants loaded from yaml file

It's not cool to update repo names used in tests every time we update the cert. Therefore, repo names can be stored in one yaml file and test cases can take them from there.