red-hat-data-services / ods-ci

odh qe tier tests

License: MIT License

Python 15.40% Shell 3.93% RobotFramework 78.92% Dockerfile 0.20% Jinja 0.22% HCL 0.10% Jupyter Notebook 1.23%

ods-ci's Introduction

ODS-CI

ODS-CI is a framework to test Red Hat Open Data Science and its upstream project, Open Data Hub.

Requirements

A Linux distribution that supports Selenium automation of either a Chromium/Google Chrome web browser using ChromeDriver or a Firefox web browser using geckodriver:
- the relevant web driver binaries are available from the ChromeDriver and geckodriver download pages
- the ChromeDriver version must match the installed version of Chromium/Google Chrome; for geckodriver, see the release notes of the particular release
- install your web driver so that it is visible to Robot Framework during test execution, e.g. in the ~/.local/bin path

The Poetry tool installed and added to your $PATH.

Quick Start

Move to the ods_ci folder inside ods-ci repo
```
cd ods_ci
```

Create a variables file for all of the global test values
```
# Create the initial test variables from the example template variables file
cp test-variables.yml.example test-variables.yml
```

Edit the test variables file to include the information required for this test run:
- URLs based on the test case you are executing.
  - OpenShift Console.
  - Open Data Hub Dashboard.
  - JupyterHub.
- Test user credentials.
- Browser webdriver to use for testing.

Run this script, which creates the virtual environment, installs the required packages and kicks off the Robot test suite.

```
# Running all the tests
sh run_robot_test.sh

# Running Smoke test suite via tag
sh run_robot_test.sh --include Smoke

# Running a specific test via tag
sh run_robot_test.sh --include ODS-XYZ

# Running tests in Open Data Hub:
# Set PRODUCT, APPLICATIONS_NAMESPACE, MONITORING_NAMESPACE, OPERATOR_NAMESPACE
# and NOTEBOOKS_NAMESPACE accordingly in test-variables.yml (or pass them as
# parameters when launching the tests), and overwrite some local variables used
# in the test suites by adding --variablefile ./ods_ci/test-variables-odh-overwrite.yml
sh run_robot_test.sh \
  --test-variable PRODUCT:ODH \
  --test-variable APPLICATIONS_NAMESPACE:opendatahub \
  --test-variable MONITORING_NAMESPACE:opendatahub \
  --test-variable OPERATOR_NAMESPACE:openshift-operators \
  --test-variable NOTEBOOKS_NAMESPACE:opendatahub \
  --extra-robot-args '--variablefile test-variables-odh-overwrite.yml' \
  --include OpenDataHub
```

The run_robot_test.sh script is a wrapper that creates the Python virtual environment and runs the Robot Framework CLI.
The wrapper script accepts several arguments; see run_args.md for details.
As an alternative, you can run any of the test cases by creating the Python virtual environment, installing the packages listed in poetry.lock, and running the robot command directly.

Contributing

See CONTRIBUTING.md

ODS-CI Container Image

See build README on how you can build and use a container to run ODS-CI automation in OpenShift.

License

This project is open sourced under MIT License.

ods-ci's Issues

Cluster provisioning doesn't properly handle errors

Currently, cluster creation does not properly report the reason why it fails, which makes it complicated to debug the root cause of the failure.

Various problems with `util.execute_command`

I have to admit that I know almost nothing about Python, but from what I tried, the while loop isn't called after the command has completed but while the command is running. Try to play with something like this:

```
import subprocess
import sys

# An endless command, to show that the loop below runs while the command is still running.
cmd = "while true; do echo 'ahoj'; sleep 2; done"
# cmd = "echo ahoj; sleep 2"
output = ""

print(f"CMD: {cmd}")

with subprocess.Popen(
    cmd,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
    encoding="utf-8",
    errors="replace",
) as p:
    while True:
        # readline() blocks until a full line is available; "" means end of output.
        line = p.stdout.readline()
        if line != "":
            output += line  # readline() already keeps the trailing newline
            # print(line)
        elif p.poll() is not None:
            break
    sys.stdout.flush()
    print(f"OUTPUT: {output}")
```

So, as I said, short-running calls are not a problem here, but we may eventually hit a case where a long-running command is terminated before it finishes and prints its final output, and we would miss that. Also, with this change we have no idea what is happening in the meantime for these long-running commands.
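To make the concern about long-running commands concrete, here is a minimal sketch of a helper that echoes each line as soon as it arrives and still returns the full output and exit code. The name `run_and_stream` and its `echo` parameter are hypothetical; this is not the project's `util.execute_command`, only an illustration of the streaming approach.

```python
import subprocess
import sys
from typing import Tuple


def run_and_stream(cmd: str, echo: bool = True) -> Tuple[int, str]:
    """Run a shell command, optionally echoing output live, and return (exit code, output).

    Hypothetical helper for illustration only.
    """
    lines = []
    with subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        encoding="utf-8",
        errors="replace",
    ) as proc:
        # Iterating over the pipe yields lines as they are produced and stops at EOF.
        for line in proc.stdout:
            lines.append(line)
            if echo:
                sys.stdout.write(line)
                sys.stdout.flush()
    # Leaving the "with" block waits for the process, so returncode is set here.
    return proc.returncode, "".join(lines)
```

With `echo=True` the caller sees progress while the command runs, and the collected output is still available afterwards for assertions or logging.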

Originally posted by @jstourac in #1234 (comment)

Items to backport to releases/2.8.0 branch

This serves as a tracking issue for the candidate changes to be backported to the releases/2.8.0 branch.

The target for these changes is currently before the RHOAI 2.8.2 z-stream release.

From https://github.com/red-hat-data-services/ods-ci/commits/master/:

Latest commit checked on master: 3e55ab3

Additional:

#1420 if the must-gather will be updated for the new 2.8.z release

Details on the expected variables for running ods-ci tests in the docs

The README does not explain the expected variables - https://github.com/red-hat-data-services/ods-ci/tree/master?tab=readme-ov-file

It would be helpful to have more details on:

- Which test users and AWS credentials are necessary, what exactly they refer to, and from which resource on the cluster they should be extracted: https://github.com/red-hat-data-services/ods-ci/blob/master/ods_ci/test-variables.yml.example
- What the users (TEST_USER.NAME, etc.) are: are they users of the user.openshift.io/v1 resource type in OpenShift? I understand these values are used for testing, but it is unclear where they should be extracted from.
- Which test cases need these variables.

Generic 'Click Action From Actions Menu' keyword

The keyword Click Action From Actions Menu is duplicated in Workbenches.resource and Pipelines.resource
We can make it generic.
See the discussion in https://github.com/red-hat-data-services/ods-ci/pull/843/files#r1252061113

robotidy stop working because configuring transformer 'AlignTestCases' failed

Following the documentation https://github.com/red-hat-data-services/ods-ci/blob/master/ods_ci/docs/check-code-style.md

(dev39) [dlovison@dlovison ods-ci]$ robotidy ods_ci/tests/Tests/400__ods_dashboard/430__data_science_pipelines/431__data-science-pipelines-api.robot
Loaded configuration from /home/dlovison/github/openshift-data-science/ods-ci/ods_ci/robotidy.toml
Error: Configuring transformer 'AlignTestCases' failed. Verify if correct name was provided.

Identity provider is installed. Skipping installation [Pipeline] echo

jenkins sanity failure due to skipped identity providers

https://opendatascience-jenkins-csb-rhods.apps.ocp4.prod.psi.redhat.com/job/rhods-sanity/132/console

OC login should retry few times


oc login -u OCP_ADMIN_USER_USERNAME -p ******** OPENSHIFT_CONSOLE_URL:6443 -

error: dial tcp: lookup OPENSHIFT_CONSOLE_URL on 10.11.5.19:53: no such host - verify you have provided the correct host and port and that the server is currently running.
[Pipeline] }
[Pipeline] // maskPasswords
Error when executing always post condition:
java.lang.Exception: Failed to perform oc login to test cluster OPENSHIFT_CONSOLE_URL
	at ocLoginToCluster.call(ocLoginToCluster.groovy:39)
	at ___cps.transform___(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
	at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:77)
	at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
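
One way the requested retry could look, sketched in Python under stated assumptions: `retry_command` and its parameters are hypothetical names, the linearly growing delay is an arbitrary choice, and the actual pipeline step (`ocLoginToCluster.groovy`) is Groovy, so this only illustrates the idea.

```python
import subprocess
import time


def retry_command(argv, attempts=3, delay_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Run a command until it exits with 0 or the attempts are exhausted.

    Hypothetical helper: returns the result of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    return result


# Example call (placeholders as in the log above, not real values):
# retry_command(["oc", "login", "-u", "OCP_ADMIN_USER_USERNAME", "-p", "PASSWORD",
#                "OPENSHIFT_CONSOLE_URL:6443"], attempts=3)
```

Retrying helps with transient failures such as a DNS lookup that is not yet resolvable; a wrong host name will still fail after the last attempt.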

JupyterHub Spawner UI: Verify that only non-negative integers are allowed for gpu count

Users should only be able to enter non-negative integers for the GPU count when spawning a user notebook. This value can never be greater than the number of GPUs available in the cluster.
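
The rule above can be written down as a small check. This is only a sketch of the expected behavior, not the Spawner UI's implementation; `parse_gpu_count` and `available_gpus` are hypothetical names.

```python
def parse_gpu_count(raw: str, available_gpus: int) -> int:
    """Return the GPU count if it is a non-negative integer within the limit.

    Hypothetical helper; raises ValueError for any other input.
    """
    text = raw.strip()
    # isdigit() rejects signs, decimal points and empty strings.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"GPU count must be a non-negative integer: {raw!r}")
    count = int(text)
    if count > available_gpus:
        raise ValueError(f"GPU count {count} exceeds the {available_gpus} available GPUs")
    return count
```

A test for this issue could feed values such as `-1`, `1.5`, an empty string and a number above the cluster's GPU count into the UI and expect each to be rejected.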

Selenium tests crashing with InvalidArgumentException: Message: invalid argument (Session info: chrome=120.0.6099.109) Stacktrace:

Hello, I am having the following issue running selenium tests

$ ./ods_ci/run_robot_test.sh --extra-robot-args '-i ODS-1680' --open-report true

Using project "default".
Kubernetes control plane is running at https://api.xxx.com:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

connected as openshift user ' htpasswd-cluster-admin-user '
since the oc login was successful, continuing.
Updating dependencies
[...]

Installing the current project: ods-ci (0.1.0)
==============================================================================
Tests                                                                         
==============================================================================
Tests.Ods Dashboard                                                           
==============================================================================
2023-12-18 10:48:16,538 - RPA.core.certificates - INFO - Truststore not in use, HTTPS traffic validated against `certifi` package. (requires Python 3.10.12 and 'pip' 23.2.1 at minimum)
Tests.Ods Dashboard.Ods Dashboard User Mgmt                                   
==============================================================================
[ WARN ] No Prometheus namespace found
Verify Unauthorized User Is Not Able To Spawn Jupyter Notebook :: ... | FAIL |
InvalidArgumentException: Message: invalid argument
  (Session info: chrome=120.0.6099.109)
Stacktrace:
#0 0x564154baad33 <unknown>
#1 0x564154867dbb <unknown>
#2 0x56415484e267 <unknown>
#3 0x56415484bad9 <unknown>
#4 0x56415484c2ca <unknown>
#5 0x56415486ae4e <unknown>
#6 0x564154900a45 <unknown>
#7 0x5641548e1342 <unknown>
#8 0x564154900297 <unknown>
#9 0x5641548e10e3 <unknown>
#10 0x5641548a9044 <unknown>
#11 0x5641548aa44e <unknown>
#12 0x564154b6f861 <unknown>
#13 0x564154b73785 <unknown>
#14 0x564154b5d285 <unknown>
#15 0x564154b7441f <unknown>
#16 0x564154b4120f <unknown>
#17 0x564154b98028 <unknown>
#18 0x564154b981f7 <unknown>
#19 0x564154ba9ed4 <unknown>
#20 0x7fbf147ad897 start_thread
#21 0x7fbf148346fc __clone3
------------------------------------------------------------------------------
Tests.Ods Dashboard.Ods Dashboard User Mgmt                           | FAIL |
1 test, 0 passed, 1 failed
==============================================================================
Tests.Ods Dashboard                                                   | FAIL |
1 test, 0 passed, 1 failed
==============================================================================
Tests                                                                 | FAIL |
1 test, 0 passed, 1 failed
==============================================================================
Output:  /home/jdanek/repos/ods-ci/ods_ci/test-output/ods-ci-2023-12-18-10-48-JeDylUytpL/output.xml
XUnit:   /home/jdanek/repos/ods-ci/ods_ci/test-output/ods-ci-2023-12-18-10-48-JeDylUytpL/xunit_test_result.xml
Log:     /home/jdanek/repos/ods-ci/ods_ci/test-output/ods-ci-2023-12-18-10-48-JeDylUytpL/log.html
Report:  /home/jdanek/repos/ods-ci/ods_ci/test-output/ods-ci-2023-12-18-10-48-JeDylUytpL/test_report.html
1

I suspected this might be caused by

SeleniumHQ/selenium#12649

but upgrading project.toml to use a newer (unaffected) version of selenium did not help. If I create a hello-world style Robot Framework Selenium project from scratch, it runs fine.

I'll keep trying with different versions of selenium/webdriver, etc.

use dockerfile locally for running tests

30 tests missing Polarion-ID's

Currently Smoke has 16 tests and Sanity has 42, but the corresponding Polarion test runs only have 10 and 14 tests, respectively.

I think this is because we have 30 tests without the polarion-id tag. We should add the polarion id to all of them and run smoke/sanity again to see if the results are shown properly in Polarion.

Two different `yq` implementations used among our tooling

As a result of my work on #943, I realized that there are two yq implementations and both are still alive (I was under the impression that the first one was dead, since we download the second one in our codebase):

https://github.com/kislyuk/yq - Python based
https://github.com/mikefarah/yq - GoLang based

I'm not sure what the reasons were for moving to the GoLang-based one. In our tests, we use it either to get a particular value from a YAML file or to update a particular variable in a YAML file (using the --inplace flag).

Unless I am missing something, both features are also implemented by the Python-based yq, which we already depend on in our Poetry definition files. See, e.g.:

yq -y --in-place '.BROWSER.NAME="ahoy"' /tmp/test-variables.yml

My proposal is to drop the GoLang-based yq from this project (and also from the Jenkins docker image) so that we can rely completely on the Poetry definition and the two implementations don't get mixed up in our environments.

remove chrome drive as it breaks the ci with latest centos/fedora

JupyterHub Spawner UI: Verify that only valid environment variable names are allowed

Users should only be allowed to add new environment variables with valid names (alphanumeric and underscore)
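
As a sketch of what "valid" could mean here, the check below accepts only letters, digits and underscores. Rejecting a leading digit is an extra assumption borrowed from common shell conventions and is not stated in the issue; `is_valid_env_var_name` is a hypothetical name.

```python
import re

# Letters, digits and underscore only; the "no leading digit" rule is an assumption.
_ENV_VAR_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_env_var_name(name: str) -> bool:
    """Return True if the whole name matches the allowed pattern."""
    return _ENV_VAR_NAME.fullmatch(name) is not None
```

Names such as `MY_VAR` or `_token2` pass, while `my-var`, `my var`, `2FAST` and the empty string are rejected.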

Migrate to Browser library

This was discussed on Automation meeting and on internal slack. Originally proposed by @tarukumar.

https://robotframework-browser.org/

Browser is incompatible with kfp

kfp (1.8.21) depends on protobuf (>=3.13.0,<4)
and robotframework-browser (18.0.0) depends on protobuf (4.25.1)

There is no easy solution, https://stackoverflow.com/questions/77939981/how-to-handle-diamond-dependency-in-python/77940013

Missing waits

When logging into OpenShift, we do

    ${oauth_prompt_visible} =  Is OpenShift OAuth Login Prompt Visible
    IF  ${oauth_prompt_visible}    Click Button  Log in with OpenShift
    ${login-required} =  Is OpenShift Login Visible
    ...

The problem is that Browser is too fast: it runs Is OpenShift Login Visible before the page has reloaded after Click Button.
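
The usual remedy for this kind of race is to poll for the expected state instead of checking it once. The sketch below shows the pattern in plain Python; `wait_until` is a hypothetical helper, and in a Robot Framework suite the same effect is normally achieved with an explicit wait keyword (for example BuiltIn's `Wait Until Keyword Succeeds`) rather than custom code.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Applied to the login flow, the condition would be "the OpenShift login form is visible", checked repeatedly after Click Button instead of immediately.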

Strict mode

Browser library will fail the test if locators match more than a single element. https://playwright.dev/docs/locators#strictness

Exception has occurred.
Suspended due to logged failure: Error: locator.waitFor: Error: strict mode violation: locator('//input[contains(@id, "minimal-notebook")]') resolved to 3 elements:
    1) <input type="radio" aria-invalid="false" name="Minimal …/> aka getByLabel('Minimal Python2023.2')
    2) <input type="radio" aria-invalid="false" data-ouia-safe…/> aka locator('[id="s2i-minimal-notebook\\:2023\\.2"]')
    3) <input type="radio" aria-invalid="false" data-ouia-safe…/> aka locator('[id="s2i-minimal-notebook\\:2023\\.1"]')

Call log:
  - waiting for locator('//input[contains(@id, "minimal-notebook")]')

robotframework-browser-migration is not a drop-in replacement

I had high hopes for https://pypi.org/project/robotframework-browser-migration. Sadly, it does not magically work when I put it in. Instead, the issues above appear, plus additional test failures that are not yet understood.

Assert URLs shouldn't consider slash in the end

The following URLs must match. Today the comparison fails:
"url": "https://docs.pachyderm.com/latest/getting_started/"
"url": "https://docs.pachyderm.com/latest/getting-started"

See: #712
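
A comparison that ignores a trailing slash could be sketched as follows. The function name is hypothetical and the example URLs are placeholders; note that this only normalizes the trailing slash, so URLs that also differ elsewhere in the path would still compare as different.

```python
from urllib.parse import urlsplit


def same_url_ignoring_trailing_slash(first: str, second: str) -> bool:
    """Compare two URLs, treating a trailing slash in the path as insignificant.

    Hypothetical helper; fragments are ignored, query strings must match.
    """
    def normalize(url: str):
        parts = urlsplit(url)
        return (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query)

    return normalize(first) == normalize(second)


# Example with placeholder URLs:
# same_url_ignoring_trailing_slash("https://example.com/docs/", "https://example.com/docs")
```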

Running a test:

ODS_CI_RUN_SCRIPT_ARGS=--test-case ods_ci/tests/Tests/400__ods_dashboard/400__ods_dashboard.robot --set-urls-variables true --extra-robot-args '-L DEBUG' --skip-oclogin

drop python version check in smoke

drop python version check in smoke and move to sanity as a single

[Tracking issue] Radio buttons on the Explore page cards ODH 2.6 change

But it would be nice to create a tracking issue for this based on the discussion, and eventually add a link to this PR there for future reference, so we can save a bit of time the next time the tests need fixing.

Originally posted by @jstourac in #1111 (comment)

Edit:

update readme and how stable branches are created

Needs update to Readme on how to run and updates to how branches are created

JupyterHub Spawner UI: Verify that duplicate environment variable names are not allowed

When adding new environment variables, duplicate environment variable names should not be allowed.
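
A minimal sketch of the duplicate check, assuming names are compared case-sensitively (the issue does not say); `find_duplicate_names` is a hypothetical helper, not code from the Spawner UI.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_names(names: Iterable[str]) -> List[str]:
    """Return each name that occurs more than once, in order of first appearance."""
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]
```

An empty result means the set of variable names is acceptable; a non-empty one lists the names the UI should refuse.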

Updates to dev documentation

[jdanek] ods-ci pre-commit; I suggest we merge it and ask volunteers to enable it and report back
https://github.com/red-hat-data-services/ods-ci/pull/1341
[jgarciao] Looking at the [rh-pre-commit Readme](https://gitlab.corp.redhat.com/infosec-public/developer-workbench/tools/-/blob/main/rh-pre-commit/README.md) it is possible to use rh-multi-pre-commit to install multiple pre-commit hooks in their projects (I haven’t tested it)

#1351
#1337

Todo

Remove notes about black, isort, explain ruff
Explain how to setup vs code project with pylance and robot support (we have private docs for this)
Instructions for pre-commit

Dockerfile and need for oc / kubeconfig

Hello,

I was trying to use ODS-CI yesterday and noticed some changes (I had last used it a few months ago).
I managed to get it working again, but before I submit PRs, I wanted to check a few things.

Dockerfile

It seems that the Dockerfile has not been updated in a while. That's what I prefer to use, personally.
If I get that working again, is it ok to send a PR to get that fixed?

oc and OpenShiftCLI

To get the container version of ods-ci to work I had to update the Dockerfile. I had to:

- install the oc cli in the container
- add the ./libs folder into the container
- provide a valid $HOME/.kube/config

That last one has given me the most trouble, and is also the one I'm the least sure about.
Since User and Pass are provided as variables, I would have thought those would be used to authenticate.
However, without a valid config file, it keeps failing with:

[ ERROR ] Error in file '/tmp/ods-ci/tests/Resources/Page/OCPDashboard/InstalledOperators/InstalledOperators.robot' 
on line 3: Initializing library 'OpenShiftCLI' with no arguments failed: 
ConfigException: Invalid kube-config file. No configuration found.

Is there a way to not require this file?

Thanks!

Pod crashes while running ods-ci tests for version v2.11.0

Description of problem:

Pod goes to Error status while running the ods-ci tests following this Readme - https://github.com/red-hat-data-services/ods-ci/blob/master/ods_ci/docs/ODS-CI-IMAGE-README.md#running-the-ods-ci-container-image-in-openshift and the following image

quay.io/modh/ods-ci:2.11.0

The issue is - the scripts and test folders are not copied in the expected path.

Prerequisites (if any, like setup, operators/versions):

All the prerequisites are satisfied, as mentioned here: [https://github.com/red-hat-data-services/ods-ci/blob/master/ods_ci/docs/ODS-CI-IMAGE-README.md#running-the-ods-ci-container-image-in-openshift]

Steps to Reproduce

For both the prerequisites and the deployment steps, I followed this doc; rather than building a custom image, I used quay.io/modh/ods-ci:v2.11.0

Actual results:

RUN SCRIPT ARGS: --test-variables-file /tmp/ods-ci-test-variables/test-variables.yml --skip-oclogin true --set-urls-variables true --include Smoke
ROBOT EXTRA ARGS: -L DEBUG --dryrun
SET TEST ENVIRONMENT: 0
-- USE OCM to install IDPs: 1
-----| ODS-CI is starting the tests run...|-----
./ods_ci/build/run.sh: line 28: ./run_robot_test.sh: No such file or directory

Also, it looks like the tests folder isn't copied properly either

skipping OC login as per parameter --skip-oclogin
[ ERROR ] Parsing '/tmp/ods-ci/tests/Tests' failed: File or directory to execute does not exist.

Expected results:
The tests should run and show some results. This is part of the output from the pod logs of the custom-built image:

Tests.Model Serving.Model Serving Modelmesh
==============================================================================
Verify Model Serving Installation :: Verifies that the core compon... | PASS |
------------------------------------------------------------------------------
Verify Openvino_IR Model Via UI :: Test the deployment of an openv... | PASS |
------------------------------------------------------------------------------
Test Inference Without Token Authentication :: Test the inference ... | PASS |
------------------------------------------------------------------------------
Tests.Model Serving.Model Serving Modelmesh                           | PASS |
3 tests, 3 passed, 0 failed

Reproducibility (Always/Intermittent/Only Once):
Always

Workaround:
Building a custom image with appropriate changes in Dockerfile.

Sanity test failed with "Element with locator 'id:KeyForm-' not found" error

Can Spawn Notebook | FAIL |
Element with locator 'id:KeyForm-' not found.

Can Launch Python3 | FAIL |
Text 'New' did not appear in 5 seconds.

Test | FAIL |
5 tests, 3 passed, 2 failed

Dockerfile installs Python 3.6 but uses 3.8

Tentative fix: #404

However, this fails later, failing to find robotframework-openshift==1.0.0:

Collecting robotframework-OpenShiftCLI==1.0.1
  Downloading robotframework-OpenShiftCLI-1.0.1.tar.gz (14 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
ERROR: Could not find a version that satisfies the requirement robotframework-openshift==1.0.0 (from ods-ci) (from versions: none)
ERROR: No matching distribution found for robotframework-openshift==1.0.0

I don't really understand the problem; this is the latest available release.
Is that Dockerfile even used?

run_robot_test.sh is not compatible with macOS

Script won't run on macOS

Smoke tests screenshots are showing "Authorize access" page for requesting permission to ldap user.

Smoke test screenshots are showing the "Authorize access" page requesting permission for the ldap user. However, this is seen only the very first time after ODS deployment and identity provider installation.