elabit / robotmk
Robotmk - the Robot Framework integration for Checkmk
Home Page: https://robotmk.org
License: GNU General Public License v2.0
(!!) (crit) and (!) (warn)
output_depth can hide some elements of the output in Checkmk which are not really needed.
The Unicode symbol in front of the keyword that caused the error thus cannot be seen. In this case, try to display it on the lowest visible layer.
It should be possible to see the total suite runtime even if the discovery suite level has been changed to a deeper level.
It should be possible to assign robot results to other hosts as piggyback data.
There is no possibility in the bakery rule to determine which OS the target system will be.
For this reason, we are forced to check the path fields with a regex matching both Windows and Linux paths.
Adapt the bakery script so that it produces a meaningful error when the target OS at build time is e.g. Windows but the path does not match the Windows regex, and vice versa.
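A combined pattern could be sketched like this (the regex below is illustrative, not the one actually used in the bakery rule):

```python
import re

# Hypothetical combined path check: accepts a Windows path (drive letter
# plus backslash-separated components) or an absolute Linux path.
WIN_OR_NIX_PATH = re.compile(
    r'^(?:[A-Za-z]:\\(?:[^\\/:*?"<>|\r\n]+\\?)*'   # Windows: C:\dir\file
    r'|/(?:[^/\0]+/?)*)$'                          # Linux: /dir/file
)

print(bool(WIN_OR_NIX_PATH.match(r'C:\ProgramData\checkmk\agent\robot')))  # True
print(bool(WIN_OR_NIX_PATH.match('/usr/lib/check_mk_agent/robotmk')))      # True
print(bool(WIN_OR_NIX_PATH.match('relative/path')))                        # False
```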
Even when WARNING is set to 0, all runtime exceedances are still reported as WARN in case of a failure.
File "/omd/sites/cmk/local/share/check_mk/checks/robotmk", line 179, in nagios_result_recursive
if this_runtime_threshold:
UnboundLocalError: local variable 'this_runtime_threshold' referenced before assignment
If nested suite folders are used, there is no reliable way to generate services from tests because the suite depth down to the test can vary for each subsuite.
=> disallow service discovery from keywords (makes no sense)
=> generate services from tests
=> generate services from suites (provide a level)
Wrong:
suites:
  sum_prices:
    name: Buchpreis Summe
    variables:
      - BOOKPRICE_SOLLWERT:500
Correct:
suites:
  sum_prices:
    name: Buchpreis Summe
    variable:
      BOOKPRICE_SOLLWERT: 500
Currently only WARNING is supported for thresholds of suites/tests/keywords.
Also introduce CRITICAL.
(Whoever uses this: bear in mind to use it with caution. When you generate SLA reports in Checkmk, you will mix the CRITICAL states of real application outages with CRITICAL states generated by excessive runtimes, even if the application worked in the end.)
The checkmk agent runs by default with SYSTEM user under session ID 0. This session ID is reserved for all processes which are started as services. Processes started/forked from this session ID 0 have by definition NO access to the desktop (security reasons).
This means that robot tests will fail when they import/use libraries which need to scrape the screen content, like SikuliX and ImageHorizonLibrary ("OSError: screen grab failed" from the pyautogui module).
Even when the agent runs with a normal user instead of SYSTEM, it still runs as a service => session ID 0 => no UI access.
Solution: the plugin should make use of win32process.CreateProcessAsUser
to start the robot tests as a user process so that robot runs on session ID 1.
This is not needed for e.g. Selenium based tests as browsers like Chrome or Firefox can run fine without a desktop environment. For that reason, it should be configurable in WATO if there is a need to run robot with another user than SYSTEM.
User context switch should only be done if WATO setting is set. By default, robot should still be executed via the robot API so that it is possible to do deep debugging of the Plugin and Robot Framework as well.
References:
https://stackoverflow.com/questions/59788303/how-in-windows-create-process-as-user-using-python
Win32 API - Privilege Constants: https://docs.microsoft.com/en-us/windows/win32/secauthz/privilege-constants?redirectedfrom=MSDN
https://stackoverflow.com/questions/22615365/using-win32-api-createprocessasuser-in-python
As an alternative to the async execution of the robotmk plugin, the plugin should be able to write its result directly into the agent's spooldir (instead of to STDOUT).
This is important for tests which use screen capturing libraries like Sikuli, ImageHorizon etc.
discovery_suite_level is currently bound to one host and cannot be specified by the top level suite
It should be possible to remap certain results to another state:
While the CMK agent gets fed with the STDOUT of the robotmk plugin, we have to make sure that STDOUT only contains the CMK section header and the XML data of Robot.
The problem here is that we rely on the fact that robot test does not produce any output.
For Robot itself this is controlled by the argument "--console none", which mutes Robot's output in general.
But there is still no control over the tools which are called by Robot libraries. The chromedriver.exe for example produces this line on STDOUT when starting up chrome:
DevTools listening on ws://127.0.0.1:52843/devtools/browser/db51a7ca-b606-4466-a20f-54b55a7c0738
This is then the first plugin output line (wrong), followed by the section header and XML data (correct).
It is no solution to tinker with the 3rd party tool options to mute them. This would break with the aim to be fully compatible with every robot test without changing anything in the test code. (Apart from that, we have to consider that there are libs which do not have such an option and will always be noisy).
Furthermore, it does not seem to be so easy to redirect stdout and stderr as this example shows:
robot --console none sum_prices_headless.robot 2>&1 > NUL
DevTools listening on ws://127.0.0.1:52988/devtools/browser/96f16f4a-7e39-4226-898c-157344979ec4
Although both FDs are supposed to be redirected to NUL (note that `2>&1 > NUL` sends stderr to the original console; `> NUL 2>&1` would be needed to silence both), Chrome still shows log lines.
We need a reliable solution which guarantees that there is no output at all when calling robot; independently from tools and libraries.
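One possible approach (a sketch, not Robotmk's actual implementation) is to run robot in a child process with both streams captured, and to read the payload exclusively from the output file robot wrote. The noisy child below is a stand-in for "robot -o output.xml suite.robot", so the example stays self-contained:

```python
import os
import subprocess
import sys
import tempfile

def run_robot_silently(cmd, result_file):
    """Run cmd with stdout AND stderr captured so that no child process
    (robot, chromedriver, ...) can pollute the plugin's own STDOUT."""
    subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # The payload is read from the file robot wrote, never from the pipes.
    with open(result_file) as f:
        return f.read()

# Stand-in child: prints noise the way chromedriver does, then writes
# the XML result file passed as its first argument.
with tempfile.TemporaryDirectory() as d:
    result = os.path.join(d, 'output.xml')
    noisy_child = [sys.executable, '-c',
                   "import sys;"
                   "print('DevTools listening on ws://...');"
                   "sys.stderr.write('noise\\n');"
                   "open(sys.argv[1], 'w').write('<robot/>')", result]
    xml = run_robot_silently(noisy_child, result)
```

This does not cover tools that detach and allocate their own console; those would need the process-creation flags to be set accordingly.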
Currently perfdata are only created for elements which match the pattern.
Add a check box which also includes the perfdata of the sublayer.
Rewrite the Travis test setup to test everything in the CEE Docker container
The output of Robotmk can get huge the more different libraries you use. Things get problematic if keywords from other libraries are themselves nested and produce a lot of noisy recursion.
It should be possible to control which libraries should be taken into account / be blacklisted when building the result structure, by defining regular expressions.
Please implement "Predictive Levels" for the runtime thresholds in order to have automatic baselining. See https://github.com/tribe29/checkmk/tree/master/doc/predictive for help.
This wrapper:
- starts robot for the suite given by --path, with -o result-dir/result-file
- reads result-dir/result-file
- optionally adds a piggyback header to assign the result to another host
- writes the result to cmk-spooldir/result-file
- removes result-dir/result-file
Intentionally there will be no conversion to another format (JSON etc.) to stay compatible on the server side with other tools which take the XML as the input source. (See also #10)
The result filename will be unique by adding --timestampoutputs to the robot args.
Arguments:
--path: path to the robot suite dir/test file
--pull: do a git pull before calling the test
--piggybackhost: hostname the result should be assigned to
--result-dir (opt): temporary dir where robot will save the XML results
--result-filename (opt): name of the result file in result-dir
--cmk-spooldir: CMK agent spool directory
--mode: how to start the robot test
  local: run the test locally
  docker: start a local robot container in Docker/Swarm (spool dir must be mounted)
  k8s: start a robot container in k8s/OpenShift
TODO:
Currently, all Robot services are prefixed with "Robot".
It should be possible to override this. Provide a field in the WATO discovery rule.
This can be used as an alternative to the regex pattern approach to get perfdata.
If the robot test should generate perfdata of every element e.g. on level 2, define:
perfdata_level=2
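A sketch of how such a perfdata_level could be evaluated, using a simplified nested result structure (the node layout below is illustrative, not Robotmk's real internal model):

```python
def collect_perfdata(node, level, depth=0):
    """Return (name, runtime) pairs for every element at the given level."""
    if depth == level:
        return [(node['name'], node['runtime'])]
    pairs = []
    for child in node.get('children', []):
        pairs += collect_perfdata(child, level, depth + 1)
    return pairs

# Level 0 is the root suite, level 1 its subsuites, level 2 the tests.
suite = {'name': 'root', 'runtime': 5.0, 'children': [
    {'name': 'sub1', 'runtime': 2.0, 'children': [
        {'name': 'test_a', 'runtime': 1.2},
        {'name': 'test_b', 'runtime': 0.8}]},
    {'name': 'sub2', 'runtime': 3.0, 'children': [
        {'name': 'test_c', 'runtime': 3.0}]}]}

collect_perfdata(suite, 2)
# -> [('test_a', 1.2), ('test_b', 0.8), ('test_c', 3.0)]
```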
The bakery also should bake the test repository into the agent.
Add global setting to WATO:
[ ] Clone test repository with GIT from this location:
[__________________________]
Requires git on the monitoring host and keybased passwordless access to the git server.
Bakery clones the repo into /usr/lib/check_mk_agent/robotmk.
By default you never get perfdata. To ease the generation of perfdata for the root suite, let's add a checkbox (enabled by default).
By default, suites can be omitted in the bakery rule; in this case, the plugin should simply execute all robot tests in the robotdir.
This does not work properly:
Traceback (most recent call last):
File "/omd/sites/xxx/lib/python/cmk_base/cee/agent_bakery.py", line 364, in execute_bakery_plugin
bake_func(*func_args)
File "/omd/sites/xxx/local/share/check_mk/agents/bakery/robotmk", line 28, in bake_robotmk
robot_test_suites = robot_global_conf.pop('suites')
KeyError: 'suites'
As a temporary solution, specify the suites which should be executed.
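The fallback described above (execute everything found in the robotdir) could be sketched roughly like this; the detection logic is an assumption, not Robotmk's actual code:

```python
import tempfile
from pathlib import Path

def discover_suites(robotdir):
    """Return every .robot file or suite directory directly below robotdir."""
    root = Path(robotdir)
    return sorted(p.name for p in root.iterdir()
                  if p.is_dir() or p.suffix == '.robot')

# Demo on a throwaway robotdir:
demo = tempfile.mkdtemp()
Path(demo, 'a.robot').touch()
Path(demo, 'suiteB').mkdir()
Path(demo, 'notes.txt').touch()
found = discover_suites(demo)   # ['a.robot', 'suiteB']
```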
To get a first version as quickly as possible, let's concentrate on the asynchronous execution of the robotmk plugin, which then executes all suites serially each time it gets triggered. For this, a proper result caching time must be determined carefully. Chances are (and will be, Mike :-) ) that tests run too long and test executions pass each other. This is hard to debug and error prone.
Because of this, in a second step this should be handled more intelligently. The plugin should be called on each agent execution (every minute) in "juggler" mode. The YML file provides the execution interval per suite. The plugin is then responsible for remembering the last execution time of test X and, if > interval, it should schedule a detached process of the plugin, which then acts in "wrapper" mode. In this case, the test output does not go to STDOUT but is written to the agent spool dir.
In juggler mode, the plugin also keeps track of the current number of running e2e tests and returns a proper error message if there is a risk of overloading (very bad, because tests running longer will produce WRONG perfdata and cause false alarms!). Further, it could respect the number of CPUs and free memory to ensure that e2e tests always run under proper conditions.
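The juggler decision described above can be sketched as follows; the function name, the state dict (standing in for whatever persistence the plugin would use) and the concurrency cap are assumptions for illustration:

```python
import time

def suites_to_start(suites, state, running, max_parallel, now=None):
    """Return the suites whose interval has elapsed, respecting a cap on
    concurrently running e2e tests. state maps suite -> last start time."""
    now = time.time() if now is None else now
    due = []
    for name, interval in suites.items():
        last = state.get(name, 0)
        if now - last >= interval and len(running) + len(due) < max_parallel:
            due.append(name)
    return due

suites = {'suiteA': 300, 'suiteB': 900}     # execution interval in seconds
state = {'suiteA': 1000, 'suiteB': 1200}    # last start timestamps
due = suites_to_start(suites, state, running=set(), max_parallel=2, now=1400)
# suiteA is due (400s >= 300s), suiteB is not (200s < 900s)
```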
A depth of 0 works, but setting it to a higher level outputs the complete subtree and shows FAILed elements with a filled Unicode symbol.
Enabling the log file also enables the creation of screenshots, see
https://robotframework.org/SeleniumLibrary/SeleniumLibrary.html#Run-on-failure functionality
and
https://robotframework.org/SeleniumLibrary/SeleniumLibrary.html#Capture Page Screenshot
HTML already gets transferred to Checkmk with commit #59
This issue covers the tasks on the CMK side.
Check:
Create an action menu for Robotmk checks which shows two entries:
There won't be long-term storage of HTML logs on the CMK side because the monitoring system should not be burdened with too many other tasks, including housekeeping. For a more detailed analysis of e2e errors, the logs on the client have to be inspected. See issue #91 (Configuring log rotation by state).
Robot options should also be configurable on a global basis so that e.g. variables are valid for all suites.
This is currently not possible because the bakery rule does not support it.
(This is a nice to have feature, let's implement this later)
Excerpt from #31:
# Robot options with a list of multiple values need special handling to
# add global options to suite options.
LISTOPTIONS = ['settag', 'variable', 'critical', 'noncritical', 'test',
               'task', 'suite', 'include', 'exclude', 'tagstatinclude',
               'tagstatexclude', 'tagstatcombine', 'tagdoc', 'tagstatlink',
               'removekeywords', 'flattenkeywords', 'listener',
               'prerunmodifier', 'prerebotmodifier', 'pythonpath']
...
# Merge the suite options with globally set options.
for key in cfg:
    if key == "suites":
        continue
    # If the global option is a list, we extend the suite list.
    # If the suite option is a string and the global option a list, we
    # make the suite option a list and extend it with the global list.
    # If the global option is a string and the suite option a list, we append it.
    # If both are strings, we build a list.
    if key in LISTOPTIONS and type(cfg[key]) == list:
        if key not in options or options[key] is None:
            options[key] = []
        if type(options[key]) == list:
            options[key].extend(cfg[key])
        else:
            options[key] = [options[key]]
            options[key].extend(cfg[key])
    if key in LISTOPTIONS and key in options and type(cfg[key]) == str:
        if type(options[key]) == list:
            options[key].append(cfg[key])
        elif type(options[key]) == str:
            options[key] = [options[key]]
            options[key].append(cfg[key])
    options.setdefault(key, cfg[key])
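A self-contained, runnable restatement of the merge logic from the excerpt (with a shortened LISTOPTIONS for brevity) behaves like this:

```python
# Sketch restating the option-merge excerpt above; LISTOPTIONS is cut
# down to two entries, and the function wrapper is added for testability.
LISTOPTIONS = ['variable', 'include']

def merge_global_options(cfg, options):
    """Merge globally set robot options (cfg) into suite options (options)."""
    for key in cfg:
        if key == "suites":
            continue
        if key in LISTOPTIONS and isinstance(cfg[key], list):
            if key not in options or options[key] is None:
                options[key] = []
            if isinstance(options[key], list):
                options[key].extend(cfg[key])
            else:
                options[key] = [options[key]]
                options[key].extend(cfg[key])
        if key in LISTOPTIONS and key in options and isinstance(cfg[key], str):
            if isinstance(options[key], list):
                options[key].append(cfg[key])
            elif isinstance(options[key], str):
                options[key] = [options[key]]
                options[key].append(cfg[key])
        options.setdefault(key, cfg[key])
    return options

# The global 'variable' list extends the suite's single string value,
# and the scalar global 'loglevel' is only set because the suite has none:
cfg = {'variable': ['GLOBAL_VAR:1'], 'loglevel': 'TRACE'}
options = {'variable': 'SUITE_VAR:2'}
merged = merge_global_options(cfg, options)
# merged == {'variable': ['SUITE_VAR:2', 'GLOBAL_VAR:1'], 'loglevel': 'TRACE'}
```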
robotdir: "C:\\ProgramData\\checkmk\\agent\\robot"
& Update WATO documentation
There are quite a few examples online. This lives at Contribute.md and will help show contributors how to get involved.
One thing that isn't clear to me: what kind of contributions do you want? Is there a path to maintainership?
It has been proven that the check becomes hard to debug when the results are not stored in a known location.
Better hardcode the path to c:\windows\temp
XML parsing on the CMK server should be done without the robot modules to make it independent from any external libraries.
See also #8
If there is more than one robot result in the spool dir, the XML data are concatenated. The check crashes with
WARNING: Exception while parsing agent section 'robotmk': ParseError(ExpatError('junk after document element: line 1, column 3506',),)
Solution in parse: split xmlstring at <?xml version
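A sketch of that parse fix, splitting the concatenated spool data at every XML declaration and parsing each document on its own:

```python
import xml.etree.ElementTree as ET

def split_concatenated_xml(data):
    """Split concatenated robot results at each XML declaration and
    parse every document separately, avoiding the ExpatError above."""
    marker = '<?xml version'
    parts = [marker + p for p in data.split(marker) if p.strip()]
    return [ET.fromstring(p) for p in parts]

# Two results concatenated in the spool dir:
blob = ('<?xml version="1.0"?><robot generator="a"/>'
        '<?xml version="1.0"?><robot generator="b"/>')
roots = split_concatenated_xml(blob)
# -> two separate <robot> roots instead of one parse error
```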
This ensures that the result of a running headless test never gets in conflict with a debugging session on the same machine.
Format the check output with unicode symbols