
rig's Issues

Should rig direct all file creation to a temp directory?

Need to decide if we should have any files created by actions moved to a rig-specific temp directory. I am initially thinking that any files reported via an action's add_report_file() should trigger a move of those files to a temp directory created for the rig.

The other thought here is: should these files be collected into a tarball after execution is completed?
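
A rough sketch of what this could look like, assuming a hypothetical per-rig temp directory and an archiving step at the end (the helper names collect_report_file and archive_reports are illustrative, not the existing rig API):

import os
import shutil
import tarfile

# Hypothetical helpers, not the current rig API.
def collect_report_file(path, rig_tmp_dir):
    """Move a file reported via an action's add_report_file() into the rig's temp dir."""
    os.makedirs(rig_tmp_dir, exist_ok=True)
    dest = os.path.join(rig_tmp_dir, os.path.basename(path))
    shutil.move(path, dest)
    return dest

def archive_reports(rig_id, rig_tmp_dir, out_dir='/var/tmp'):
    """Bundle everything the actions produced into a single tarball after execution."""
    archive = os.path.join(out_dir, 'rig-%s.tar.gz' % rig_id)
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(rig_tmp_dir, arcname='rig-%s' % rig_id)
    return archive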

[rfe] Rig 'Running' jobs should be restored after system reboot.

Observation: If the system is rebooted, all active rigs with a status of 'Running' are lost.
Expectation: All active rigs should be restored after the system reboots.

[root@rh83-new ~]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
qibit  1600  system  system utilization             System loadavg above 6.0            Running   
zphfg  25777  system  system utilization                                                 Running   
fgbuj  25130  logs    messages, journals: system     validate                            Running   
olfpe  24928  logs    dnf.log, journals: system      yum update kernel                   Running   


[root@rh83-new ~]# rig info -i qibit
{
    "id": "qibit",
    "pid": "1600",
    "rig_type": "system",
    "status": "Running",
    "restart_max": 0,
    "restart_count": 0,
    "cmdline": "/usr/bin/rig system --loadavg 6 --kdump",
    "debug": false,
    "watch": "system utilization",
    "trigger": "System loadavg above 6.0",
    "created": "11/23/20 22:41:22",
    "actions": {
        "kdump": {
            "name": "kdump",
            "priority": 10000,
            "expected_result": "A vmcore saved in your configured crash dump location"
        }
    }
}

[root@rh83-new ~]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
[root@rh83-new ~]# 

In the above scenario, rig qibit triggered the kdump action once the system load average reached 6 and rebooted the system. The vmcore was generated successfully as expected, but all other active rigs were lost. Similarly, whenever the system is rebooted, all running rig jobs are lost.

rig command returns "ModuleNotFoundError: No module named 'systemd'"

The following error is observed while executing rig on RHEL 8.3.
The rig command returns "ModuleNotFoundError: No module named 'systemd'" when installed from rig-1.0-2.el8.noarch.rpm.

[root@localhost ~]# rig --help
Traceback (most recent call last):
  File "/usr/bin/rig", line 67, in <module>
    supported_rigs = _load_supported_rigs()
  File "/usr/bin/rig", line 59, in _load_supported_rigs
    modules.extend(_import_modules(_mod))
  File "/usr/bin/rig", line 26, in _import_modules
    module = __import__(modname, globals(), locals(), [mod_short_name])
  File "/usr/lib/python3.6/site-packages/rigging/rigs/logs.py", line 17, in <module>
    from systemd import journal
ModuleNotFoundError: No module named 'systemd'

[root@localhost ~]# uname -a
Linux localhost.localdomain 4.18.0-240.1.1.el8_3.x86_64 #1 SMP Fri Oct 16 13:36:46 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# rpm -qa rig
rig-1.0-2.el8.noarch
[root@localhost ~]# which rig
/usr/bin/rig
[root@localhost ~]# python3
python3 python3.6 python3.6m
[root@localhost ~]# which python3
/usr/bin/python3
[root@localhost ~]#

[rig] cannot resolve the package dependency to python3-psutil

python3-psutil was not installed automatically as a dependency of the rig package on RHEL 8.

# rig --help
Traceback (most recent call last):
  File "/usr/bin/rig", line 67, in <module>
    supported_rigs = _load_supported_rigs()
  File "/usr/bin/rig", line 59, in _load_supported_rigs
    modules.extend(_import_modules(_mod))
  File "/usr/bin/rig", line 26, in _import_modules
    module = __import__(modname, globals(), locals(), [mod_short_name])
  File "/usr/lib/python3.6/site-packages/rigging/rigs/process.py", line 16, in <module>
    import psutil
ModuleNotFoundError: No module named 'psutil'

[RFE]: tcpdump action of rig should have option to set snap length (-s)

Currently, the tcpdump action captures the entire network frame (-s 0):

# ps -ef | grep tcpdump
root        1926       1  0 01:00 pts/0    00:00:00 /usr/sbin/tcpdump -Z root -s 0 -n -i ens3 -C 10 -W 1 -w /var/tmp/rig/mldrf/localhost.localdomain-22-11-2020-01:00:38-ens3.pcap
root        1936       1  0 01:00 ?        00:00:00 python3 rig logs -m tcpdump test --tcpdump --iface ens3
root        1993    1962  0 01:02 pts/1    00:00:00 grep --color=auto tcpdump

Usually we do not need the entire frame to analyze most network issues; the frame header information is good enough.

So, an option to modify the snap length would help with:

  1. Reducing the capture size
  2. Trimming the data portion of the network frame (in case the traffic is plain text), so that we avoid exposing data (from a security perspective) when the pcap is shared with someone for analysis

Please see if it is possible to give the user an option to modify the snap length (-s) for tcpdump.
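
For illustration, a snap-length option could be threaded into the tcpdump command line roughly like this (the option name and the command assembly below are assumptions, not the current rig implementation):

# Illustrative only: the --snaplen option name and command assembly are assumptions.
def build_tcpdump_cmd(iface, outfile, size_mb=10, captures=1, snaplen=0):
    return [
        '/usr/sbin/tcpdump', '-Z', 'root',
        '-s', str(snaplen),   # 0 = whole frame (current behaviour); e.g. 128 captures headers only
        '-n', '-i', iface,
        '-C', str(size_mb), '-W', str(captures),
        '-w', outfile,
    ]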

[process] Rig will terminate if one pid ends when watching all pids

User report from CEE SD testing:

Created the rig with two processes running, simulating some load. One of the processes finished and stopped before the load went above the target.

However, the rig stopped monitoring as soon as the first process died off.

Rig seems to be working as designed, and this might be an RFE rather than a bug, but I wanted to submit it just in case it is possible to change this behavior.

# rig process --all --process python3 --cpuperc 5 --foreground
Beginning watch of process 39455 for total cpu usage of 5.0% or higher
Beginning watch of process 39439 for total cpu usage of 5.0% or higher

Process 39455 is no longer running, stopping cpu percentage monitor.
No data generated to archive for this rig.
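
A sketch of the suggested behavior, assuming psutil is used as in the current process rig: keep monitoring as long as at least one of the watched PIDs is still alive, and only give up once all of them have exited.

import time
import psutil

def watch_cpu(pids, threshold=5.0, interval=1.0):
    """Trigger on CPU usage, but only stop once every watched process has exited."""
    procs = {pid: psutil.Process(pid) for pid in pids if psutil.pid_exists(pid)}
    while procs:
        for pid, proc in list(procs.items()):
            try:
                if proc.cpu_percent(interval=None) >= threshold:
                    return pid  # condition met: fire the rig's actions
            except psutil.NoSuchProcess:
                del procs[pid]  # drop this PID but keep watching the others
        time.sleep(interval)
    return None  # every watched process exited without crossing the threshold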

Destroy command keeps printing manually killed processes

$ sudo python3 ./rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
cfucr                 Cannot communicate with rig                                        Unknown   

<at this point I manually killed the rig above>

$ sudo python3 ./rig destroy -i all 
Could not destroy rig cfucr, rig is not running.

The error continues to be printed and the rig is still shown in the listing.

Can the tool detect when a rig is no longer running and then drop it from the list?
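
One way the tool could prune dead rigs from the listing is sketched below; the /var/run/rig socket directory is referenced in a traceback elsewhere in these issues, while the known_rigs mapping and cleanup logic are assumptions:

import os
import psutil

def prune_dead_rigs(known_rigs, sock_dir='/var/run/rig'):
    """Drop rigs whose backing process no longer exists and remove their stale sockets."""
    alive = {}
    for rig_id, pid in known_rigs.items():
        if pid and psutil.pid_exists(pid):
            alive[rig_id] = pid
        else:
            sock = os.path.join(sock_dir, rig_id)
            if os.path.exists(sock):
                os.unlink(sock)  # stop the dead rig from showing up in `rig list`
    return alive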

Running actions for a given amount of time

Say that I'd like to collect information from the monitor action for 1 hour and then stop collecting information. Right now I don't see an easy way to do this.

  • First approach, I created a noop rig that does nothing, forever. Then I run rig in the foreground as follows:

    rig noop --foreground --monitor

    and it will run until I press Ctrl+C. Not so elegant, but effective.
    However, the above requires attention from the user, as it needs to be stopped manually. So I was thinking about other ways to achieve this.

  • Similar to the above, I could just run something like this on the terminal:

    sleep 3600; rig trigger ID

    but that would require more work as I need to get the ID somehow

  • The most obvious solution to me is a timer or clock rig. They could trigger after a given amount of time or on a specific date+time. Something like this:

    rig timer --timer-timeout 1h --monitor

    the rig will run for 1 hour and exit.

  • Another possibility would be to add --max-monitor-time option to the monitor rig, so it could just stop collecting information after a given amount of time. However, this would still require some kind of noop rig as there's currently no way to run an action without an associated rig (AFAIK).

  • Finally, I was thinking that it might be useful to add a global --trigger-timeout, available to all rigs, so a rig configured with a timeout will terminate when the timeout is reached.
    This would still need some kind of noop rig.

Thoughts?
Is there currently any other approach that I'm missing that would allow me to run the monitor action for a certain amount of time?
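
A rough sketch of the timer/--trigger-timeout idea, purely illustrative (the function names are hypothetical): block for a fixed duration, then fire the rig's trigger so the monitor data gets archived.

import time

def wait_for_timeout(timeout_seconds, trigger_actions):
    """Run until the timeout expires, then trigger so the collected data is archived."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        time.sleep(max(0.0, min(1.0, deadline - time.monotonic())))
    trigger_actions()

# e.g. wait_for_timeout(3600, my_rig.trigger)  # run the monitor action for one hour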

[rigs] Add a system utilization rig

The process rig can monitor process(es) for state changes and resource consumption. We should have one that monitors total system resource consumption as well.

We should be able to do this with the psutil module as we did with process.

We should be able to monitor:

  • cpu usage
  • specific cpu substats like %steal
  • total memory usage
  • disk usage/activity

What I'm not sure is advantageous, but is possible to monitor (how easily remains to be seen):

  • load avg
  • runq size
  • temps/sensors
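
As a sketch of feasibility, most of the list above is directly available from psutil (plus os.getloadavg for load average); this is illustrative sampling code, not rig code:

import os
import psutil

def sample_system():
    """One sample of the stats the proposed system rig would compare against thresholds."""
    return {
        'cpu_percent': psutil.cpu_percent(interval=1),         # overall CPU usage
        'steal': psutil.cpu_times_percent(interval=1).steal,   # specific CPU substat, e.g. %steal
        'memory_percent': psutil.virtual_memory().percent,     # total memory usage
        'disk_percent': psutil.disk_usage('/').percent,        # disk usage
        'loadavg': os.getloadavg(),                            # 1/5/15 minute load averages
    }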

[rigs] Add a process monitoring rig

This is for tracking the creation of a process monitoring rig. This rig should be able to at minimum trigger when a provided process (PID or process name) stops running. Potentially, this should also support triggering based on process state (e.g. D state possibly).
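
A minimal sketch of that behavior using psutil (illustrative only, not the rig implementation): trigger when the watched process exits, or optionally when it enters a given state such as disk-sleep.

import time
import psutil

def watch_process(pid, state=None, interval=1.0):
    """Return 'exited' when the process stops, or 'state' when it matches the given state."""
    try:
        proc = psutil.Process(pid)
        while True:
            if state and proc.status() == state:  # e.g. psutil.STATUS_DISK_SLEEP ("D" state)
                return 'state'
            time.sleep(interval)
    except psutil.NoSuchProcess:
        return 'exited'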

Rig v2 - new design, renewed focus on applicability of project

It's been a long time since rig has received any attention or updates. That's completely on me as I've been tied up with other projects at work that have taken me away from here.

In the interim however, I've been scoping out a new design for rig that makes it easier to extend and maintain, while also being easier on end users and more in-line with other modern tools.

New design

As things currently stand, rig is designed around the concept of "a rig watches for one condition and then does one or many actions in response". While simple in concept, the underlying code for building "a rig" was...not the cleanest design. CLI commands were conflated with the handling of rig creation at a fundamental level, which leads to extensibility issues.

The new design changes this by instead making a rig "the" backgrounded process, from which one or many "monitors" may be launched; when any of those monitors detects its specified condition, the rig triggers one or many actions in response. In other words, whereas before we would have "a logs rig" that watches for specific messages, we now have "a rig that monitors logs for a message, and possibly monitors other things as well".

By making this abstraction, we can also re-arrange a number of code flows, making it easier to establish new commands and new functionality without as large a knock-on effect on the whole project.

Further, with so many rigs specifying rig-specific options, the CLI experience was, frankly, painful. One rig might use the --interface option, while another used --iface and another needed --ifname, all to reference the same physical piece of hardware.

v2 will resolve this by transitioning to using yaml-formatted configuration files, or "rigfiles". Most similar to ansible playbooks, these rigfiles will serve as the way to configure new rigs on a system. Rather than having a CLI subcommand for each rig type, there will simply be a main rig create command which will specify a rigfile to use, and then we will parse that rigfile to configure and deploy a rig as described.

By moving to this kind of deployment model, we simplify a number of aspects of the project:

  • rigs/monitors no longer need to handle the parser themselves. We can standardize how options are defined and validated much more easily
  • we no longer need to fumble with "enabling opts" to enable a rig, or munge around with various parser descriptions. By leveraging a yaml configuration, we can source the pertinent bits we need and compare that with what is supported by rig at that point in time.
  • monitors and actions will now be easier to create

An example of a rigfile that would have previously been deployed by a CLI command of rig logs --interval 5 --message 'this is my test message' --logfile='/var/log/messages','/var/log/foo/bar.log' --journals='foo' --count=5 --sosreport --only-plugins kernel,logs --initial-sos would be:

name: myrig
interval: 5
monitors:
  logs:
    message: this is my test message
    files:
      - /var/log/messages
      - /var/log/foo/bar.log
    journals: foo
    count: 5
actions:
  sos:
    report:
      initial_sos: true
      only_plugins:
        - kernel
        - logs

Which is far more grokable, and more reusable.
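
As a sketch of how rigfile loading might work under the new design (the schema and validation here are assumptions; the real implementation will live on the rig-v2 branch):

import yaml  # PyYAML

def load_rigfile(path, supported_monitors, supported_actions):
    """Load a rigfile and compare what it asks for with what this build of rig supports."""
    with open(path) as f:
        rigfile = yaml.safe_load(f) or {}
    unknown = (set(rigfile.get('monitors', {})) - set(supported_monitors)) | \
              (set(rigfile.get('actions', {})) - set(supported_actions))
    if unknown:
        raise ValueError("Unsupported monitors/actions: %s" % ', '.join(sorted(unknown)))
    return rigfile

# e.g. config = load_rigfile('myrig.yml', {'logs'}, {'sos'})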

I am beginning these changes on the rig-v2 branch and will be working my way through transitioning the various monitors, actions, and commands to this new design. Once done, I'll flip the changes over to master (or rather, it will be main at that point most likely).

Comments, feedback, and suggestions are surely welcome.

[process] State value should accept single-character shorthand for state

The process rig requires the full status string to be given to the state option when monitoring for process status. E.G. users must specify --state disk-sleep.

Realistically, sysadmins and support representatives typically use the single-character shorthand for these statuses, e.g. "D-state". The process rig should therefore accept these shorthand state strings.
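
One possible mapping from single-character shorthand to the full psutil status strings, purely illustrative (the exact set of accepted values is up for discussion):

import psutil

STATE_SHORTHAND = {
    'D': psutil.STATUS_DISK_SLEEP,
    'R': psutil.STATUS_RUNNING,
    'S': psutil.STATUS_SLEEPING,
    'T': psutil.STATUS_STOPPED,
    'Z': psutil.STATUS_ZOMBIE,
}

def normalize_state(value):
    """Accept either 'D' or 'disk-sleep' and return the full psutil status string."""
    return STATE_SHORTHAND.get(value.upper(), value.lower())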

rig trigger command 'yum update' fails to generate sosreport.

Scenario: Deploy a logs rig that watches for a specific, exact message, then trigger the rig.
Expected: An sosreport should be generated whenever the kernel is updated using the command 'yum update kernel'.

Steps
[1] Created a rig for the command 'yum update kernel', watching for the specific message 'yum update kernel' in /var/log/dnf.log.
[2] Completed the kernel upgrade with the command 'yum update kernel'.
[3] /var/log/dnf.log shows the message entry 2020-11-11T12:49:19Z DDEBUG Command: yum update kernel
[4] The sosreport was not generated at its default location /var/tmp or /var/tmp/rig/sxavr/.

[root@rh83-new rig]# rig logs --logfile /var/log/dnf.log --message "yum update kernel" --sosreport
sxavr
[root@rh83-new rig]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
sxavr  38565  logs    dnf.log, journals: system      yum update kernel                   Running   
ozqmh  37366  logs    dnf.log, journals: system      yum update                          Running   
cbfzw  37348  logs    dnf.log, journals: system      yum update                          Running   
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
[root@rh83-new rig]# 

[root@rh83-new tmp]# rig info -i sxavr
{
    "id": "sxavr",
    "pid": "38565",
    "rig_type": "logs",
    "status": "Running",
    "restart_max": 0,
    "restart_count": 0,
    "cmdline": "/usr/bin/rig logs --logfile /var/log/dnf.log --message yum update kernel --sosreport",
    "debug": false,
    "watch": "dnf.log, journals: system",
    "trigger": "yum update kernel",
    "created": "11/11/20 22:48:23",
    "actions": {
        "sosreport": {
            "name": "sosreport",
            "priority": 100,
            "expected_result": "An sosreport from the host in /var/tmp/rig/sxavr/"
        }
    }
}

[root@rh83-new rig]# yum update kernel
<../lines removed../>
Is this ok [y/N]: y
Downloading Packages:
(1/3): kernel-4.18.0-240.1.1.el8_3.x86_64.rpm                                   1.2 MB/s | 4.3 MB     00:03    
(2/3): kernel-modules-4.18.0-240.1.1.el8_3.x86_64.rpm                           2.6 MB/s |  26 MB     00:09    
(3/3): kernel-core-4.18.0-240.1.1.el8_3.x86_64.rpm                              2.7 MB/s |  30 MB     00:11    
-----------------------------------------------------------------------------------------------------------
<../lines removed../>
Installed:
  kernel-4.18.0-240.1.1.el8_3.x86_64   kernel-core-4.18.0-240.1.1.el8_3.x86_64   kernel-modules-4.18.0-240.1.1.el8_3.x86_64                              

#tail -f /var/log/dnf.log
2020-11-11T12:49:17Z INFO --- logging initialized ---
2020-11-11T12:49:17Z DDEBUG timer: config: 2 ms
2020-11-11T12:49:17Z DEBUG Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, needs-restarting, playground, product-id, repoclosure, repodiff, repograph, repomanage, reposync, subscription-manager, uploadprofile
2020-11-11T12:49:17Z INFO Updating Subscription Management repositories.
2020-11-11T12:49:19Z DEBUG YUM version: 4.2.23
2020-11-11T12:49:19Z DDEBUG Command: yum update kernel 
2020-11-11T12:49:19Z DDEBUG Installroot: /
2020-11-11T12:49:19Z DDEBUG Releasever: 8
2020-11-11T12:49:19Z DEBUG cachedir: /var/cache/dnf
2020-11-11T12:49:19Z DDEBUG Base command: update

[root@rh83-new rig]# ls -lrt  /var/tmp/rig/sxavr/
total 0
[root@rh83-new rig]# 

[actions] Add tcpdump action

Placeholder for tracking creation of a tcpdump action. This action should start a tcpdump on rig creation and terminate the capture when the rig is triggered.

[rfe] Allow rigs to auto restart their own monitoring

RFE from an email - rigs should be able to restart monitoring after data collection is completed.

Generally I agree with this, and it shouldn't be too hard to implement. The only question is: does the existing rig simply re-enter the monitoring function, or should it launch an entirely new rig? I'm leaning towards the former, but I have concerns about temp location availability. To be honest, I haven't looked into that in the slightest yet to even determine whether that's a valid concern or not.

action should be generic

Rather than hardwiring a set of actions (sosreport, tcpdump, etc.), just accept a command line to run as an argument to rig. Then we can put whatever we need in a script.
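
A sketch of what a generic 'run this command' action could look like (hypothetical; rig does not currently expose this):

import os
import shlex
import subprocess

def run_command_action(cmdline, report_dir, timeout=None):
    """Run an arbitrary user-supplied command when the rig triggers and save its output."""
    result = subprocess.run(
        shlex.split(cmdline),
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, timeout=timeout,
    )
    with open(os.path.join(report_dir, 'command_output.log'), 'w') as f:
        f.write(result.stdout)
    return result.returncode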

[rig] empty temporary directory was not deleted after rig destroy

The temporary directory is not deleted after rig destroy. It should be deleted by the destroy subcommand.
Otherwise, an unused empty directory is left behind for every destroyed rig and has to be removed manually.

# rig destroy -i jeygb
jeygb destroyed
# ls /var/tmp/rig
jeygb
# ls jeygb
# 

NameError: name 'CannotConfigureRigError' is not defined

Receiving exception while trying to list current rigs:

$ git rev-parse HEAD
baedc300174197dddee80e421e5b0aa343e4f45a
$ sudo python3 ./rig list
Traceback (most recent call last):
  File "./rig", line 56, in <module>
    ret = rig.execute()
  File "/home/rbost/code/rig/rigging/__init__.py", line 112, in execute
    self.list_rigs()
  File "/home/rbost/code/rig/rigging/__init__.py", line 178, in list_rigs
    socks = os.listdir('/var/run/rig')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/rig'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./rig", line 60, in <module>
    except CannotConfigureRigError:
NameError: name 'CannotConfigureRigError' is not defined

[rfe] Reoccurring log event actions

Log monitors could be combined with a counter and a delay so that the actions are taken a defined number of times, with a minimum delay between occurrences.

This would extend the feature to cover actions that should be taken a set number of times, separated by a time interval.

Example: # rig logs --reoccur 5 --delay=5 --message "This is my test message" --gcore my_failing_service
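
A sketch of the proposed flow (the reoccur/delay names mirror the example command line above and are not existing options): fire the actions up to a set number of times, with a minimum delay between runs.

import time

def handle_matches(wait_for_match, run_actions, reoccur=5, delay=5):
    """Fire the actions up to `reoccur` times, waiting at least `delay` seconds between runs."""
    for _ in range(reoccur):
        wait_for_match()   # block until the log message is seen again
        run_actions()      # e.g. gcore the failing service
        time.sleep(delay)  # enforce the minimum delay before re-arming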

rig is triggered when free memory is above threshold

Observation: The scheduled rig shows that it will trigger when free memory is above the threshold value ('free above 400').
Expectation: The rig should be triggered once free memory is at or below the threshold.

Sample: 1

[root@rh83-new tmp]# free -m
              total        used        free      shared  buff/cache   available
Mem:            808         247         445          41         115         422
Swap:             0           0           0

[root@rh83-new tmp]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
ttrdd  8303   system  system utilization             free above 300                      Running   
biuse  8295   system  system utilization             free above 400                      Running   
fogau  6857   system  system utilization             used above 512                      Running   

[root@rh83-new tmp]# head /proc/meminfo 
MemTotal:         827740 kB
MemFree:          456572 kB          <-
MemAvailable:     432516 kB
Buffers:               0 kB
Cached:            92660 kB
SwapCached:            0 kB
Active:           135824 kB
Inactive:         112484 kB
Active(anon):     117236 kB
Inactive(anon):    80424 

Sample: 2

[root@rh83-new tmp]# free -m
              total        used        free      shared  buff/cache   available
Mem:            808         225         452          41         130         436
Swap:             0           0           0

[root@rh83-new tmp]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
yplwz  9286   system  system utilization             free above 25                       Running   
whszc  9278   system  system utilization             free above 75                       Running   
txoze  9270   system  system utilization             free above 100                      Running   

[root@rh83-new tmp]# rig info --id txoze
{
    "id": "txoze",
    "pid": "9270",
    "rig_type": "system",
    "status": "Running",
    "restart_max": 0,
    "restart_count": 0,
    "cmdline": "/usr/bin/rig system --free 100 --noop",
    "debug": false,
    "watch": "system utilization",
    "trigger": "free above 100",
    "created": "12/04/20 18:14:50",
    "actions": {
        "noop": {
            "name": "noop",
            "priority": 10,
            "expected_result": "This action will generate no content"
        }
    }
}

[root@rh83-new tmp]# top
top - 18:22:04 up  1:32,  3 users,  load average: 4.27, 2.30, 1.41
Tasks: 138 total,   6 running, 118 sleeping,  14 stopped,   0 zombie
%Cpu(s):  8.0 us, 74.8 sy,  0.0 ni, 10.7 id,  2.8 wa,  2.7 hi,  0.5 si,  0.5 st
MiB Mem :    808.3 total,     41.1 free,    696.6 used,     70.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.      6.8 avail Mem 

[root@rh83-new tmp]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
yplwz  9286   system  system utilization             free above 25                       Running   

After running a stress tool, all the rigs triggered when free memory was at or below the threshold. This is as expected. However, the rig list Trigger column should say 'free below <value>'; showing 'free above' is not correct. Please let me know your opinion. Thanks.

"rig list --id tuvfo" doesnot show individual rig id

Using optional list argument(-i ID, --id ID) for listing the individual rig doesnot work. However, it works fine with 'rig destroy' .

optional arguments:
-h, --help show this help message and exit
-i ID, --id ID rig id for list or destroy

[root@rh83-new tmp]# rig logs --logfile /var/log/dnf.log --message "yum update" --sosreport
[root@rh83-new tmp]# rig logs --logfile /var/log/dnf.log --message "yum install" --sosreport

[root@rh83-new tmp]# rig list
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
vlkot  31021  logs    dnf.log, journals: system      yum update                          Running   
  
[root@rh83-new tmp]# rig list -i tuvfo
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
vlkot  31021  logs    dnf.log, journals: system      yum update                          Running   

[root@rh83-new tmp]# rig list --id tuvfo
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
vlkot  31021  logs    dnf.log, journals: system      yum update                          Running   

[root@rh83-new tmp]# rig list --id vlkot
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
vlkot  31021  logs    dnf.log, journals: system      yum update                          Running   


[root@rh83-new tmp]# rig destroy --id vlkot
vlkot destroyed
[root@rh83-new tmp]# rig list --id vlkot
ID     PID    Type    Watching                       Trigger                             Status    
====================================================================================================
tuvfo  31035  logs    dnf.log, journals: system      yum install                         Running   
[root@rh83-new tmp]# 

[actions] Add a gcore action

Placeholder for tracking creation of an action that will use gcore to generate coredumps of a PID or list of PIDs.

rig name conflict with existing command

Hi. I wanted to let you know about a name conflict with your command. I have nothing to do with it, but there is already a program called 'rig' that is the Random Identity Generator and has existed since 1999. In the Debian repo it is also called 'rig'. This already created some confusion when talking about the 'rig' command. I'm not suggesting a name change because it wouldn't be the first time there has been conflicting command names. However, since your project is much newer, I thought you might be open to the idea. Thanks.

rig is not triggering the tcpdump action

When we try to use tcpdump with rig, we get an invalid file size error:

# rig logs -m "Ncat: Connection refused." --logfile /tmp/nc_test.txt --tcpdump --iface ens3 --captures 2
Rig setup failed: tcpdump: invalid file size False

If we apply the size option to tcpdump, it fails with "Unknown option --size specified." even though --size is a valid option:

# rig logs -m "Ncat: Connection refused." --logfile /tmp/nc_test.txt --tcpdump --iface ens3 --size 20 --captures 2
Unknown option --size specified.
# man rig
...
       tcpdump
              Start  collecting  a  tcpdump  when the rig is initialized, and stop the collection when the rig triggers. This action
              will be triggered before most other actions, but after the gcore action.

              Note there will be a slight delay in configuring any rig that uses the tcpdump action as rig must verify that the tcp‐
              dump process started successfully during the initialization process.

              The tcpdump action supports the following options:

              --tcpdump
                     Enables this action

              --iface INTERFACE
                     Starts  the  tcpdump to monitor the provided INTERFACE. In almost all situations this should likely be set to a
                     specific interface on the system, however the value of 'any' is accepted by the tcpdump  command  in  order  to
                     listen on all interfaces. Be wary of using this however as use of 'any' means will make it impossible to deter‐
                     mine which interface a particular packet came in on in the resulting packet capture.

                     Default: eth0
              --filter FILTER
                     Provide a filter to use with tcpdump in order to reduce the amount of traffic recorded in the  packet  capture.
                     This value is passed directly to the tcpdump utility, and thus can be any valid filter accepted by tcpdump.

                     For most shells you must quote the filter string for rig to pass it correctly.

              --size SIZE
                     Limit the size of the packet capture file(s) to SIZE in MB.

                     Default: 10

              --captures CAPTURES
                     Specify  the  number of packet capture files to keep. If more than one (1), then tcpdump will rotate the packet
                     capture file when it reaches the --size value and keep CAPTURES number of files.

                     E.G. Using a CAPTURES of 2 and a SIZE of 5, then when the rig terminates you will have up to 2 5MB packet  
                       captures.

                     Default: 1 (packet capture file is replaced upon reaching SIZE limit).

rig info does not show debug as true even if --debug is used

Creating rig with --debug

# rig logs -m "Debug option check" --debug --noop
cgzsc

Checking the rig info, we do not see debug set to true:

# rig info -i cgzsc
{
    "id": "cgzsc",
    "pid": "6827",
    "rig_type": "logs",
    "status": "Running",
    "restart_max": 0,
    "restart_count": 0,
    "cmdline": "/usr/bin/rig logs -m Debug option check --debug --noop",
    "debug": false,     <<<--- shows false even though debug option added
    "watch": "messages, journals: system",
    "trigger": "Debug option check",
    "created": "11/21/20 10:24:52",
    "actions": {
        "noop": {
            "name": "noop",
            "priority": 10,
            "expected_result": "This action will generate no content"
        }
    }
}

[rigs] Add a network rig

Placeholder to create a rig that monitors network activity. Initial idea is that users will provide a tshark packet filter string and once we get a single packet matching the filter, the rig will trigger.

rig GA version?

Would like to know:

  • When will a GA version of rig be available?
  • Will it be shipped as an RPM in RHEL?

[rfe] Notify a user when a rig is triggered

Right now the only indication a user has that a rig has been triggered is checking rig list and/or /var/log/rig/rig.log. This isn't exactly a great experience and I would like to see us provide an active notification.

My first thought is to optionally send an email. I haven't explored this too deeply though as I'm unsure of how complex we would have to make this functionality.
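
As an illustration of the email idea, a minimal sketch assuming a local MTA on the host (addresses and message wording are placeholders):

import smtplib
from email.message import EmailMessage

def notify_triggered(rig_id, trigger, rcpt, sender='rig@localhost'):
    """Send a simple notification mail when a rig triggers."""
    msg = EmailMessage()
    msg['Subject'] = 'rig %s triggered: %s' % (rig_id, trigger)
    msg['From'] = sender
    msg['To'] = rcpt
    msg.set_content('Rig %s has triggered on "%s". See /var/log/rig/rig.log '
                    'and /var/tmp/rig/ for details.' % (rig_id, trigger))
    with smtplib.SMTP('localhost') as smtp:
        smtp.send_message(msg)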

[actions] Need a way to identify if a system can actually perform an action

Most of the actions today (or for the foreseeable future) rely on external tools being called - sosreport, gcore, tcpdump, etc. - but we do not make these tools install-time requirements, as that would quickly spiral out of control and install packages that users may well never need or use, even with rig.

So, actions need a way to specify what executables are required for it to function properly, and if they are not found on the system, fail immediately rather than having to wait for the rig to be triggered for an error to be logged.
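
A sketch of such a pre-flight check using shutil.which (the required_binaries attribute name is hypothetical):

import shutil

def verify_action_can_run(action):
    """Fail rig setup immediately if a required executable is missing from PATH."""
    missing = [b for b in getattr(action, 'required_binaries', ())
               if shutil.which(b) is None]
    if missing:
        raise RuntimeError('Cannot configure action: missing executables: %s'
                           % ', '.join(missing))

# e.g. an action would declare:  required_binaries = ('tcpdump',)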

[rig] --foreground doesn't work

The --foreground option doesn't work. The rig is started in the background even with the option.

# rig logs --foreground --logfile /var/log/messages -m "dumping master file: rename: example.com.zone: permission denied" --sosreport --gcore 15103 
jeygb
# 

[kdump] kdump action should be executed after archiving of other data

Right now the kdump action is executed last for all configured actions, but it still fires before a 'normal' rig termination.

Instead, if a kdump is requested, rig should go through its normal cleanup procedure and trigger kdump after any other data has been collected into a tar archive.

[actions] Add a kdump action

We should support a rig action that triggers creation of a vmcore. This should be run absolutely dead last in the order of action execution as it will reboot the node.
