aleex42 / netapp-cdot-nagios
Nagios-Checks for monitoring NetApp cDOT-Systems via NetApp Perl API
License: GNU General Public License v3.0
Command and output are anonymized. ONTAP 8.3.2P6, SnapMirror status: quiesced.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
Use of uninitialized value $current_transfer in string ne at ./check_cdot_snapmirror.pl line 73.
CRITICAL: 9 snapmirror(s) failed - 11 snapmirror(s) ok
Name Healthy Delay
SVM_iSCSI_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
SVM_CIFS_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
SVM_NFS_XXX_mirror false 184707s
SVM_iSCSI_XXX_mirror false 184707s
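The warning appears when `$current_transfer` is undef and is then compared with `ne`. Presumably the API omits the current-transfer field for quiesced relationships, but that is an assumption not verified against line 73 of the script. A minimal Perl sketch of a guard, using a hypothetical helper `transfer_in_progress()` and sample values:

```perl
use strict;
use warnings;

# Hypothetical helper: a missing (undef) or empty transfer type counts as
# "no transfer running", so 'ne' is never applied to undef.
sub transfer_in_progress {
    my ($current_transfer) = @_;
    return (defined $current_transfer && $current_transfer ne "") ? 1 : 0;
}

# Sample values: undef (field absent), empty string, and a made-up type.
for my $value (undef, "", "update") {
    my $shown = defined $value ? "'$value'" : "undef";
    print "$shown -> ", transfer_in_progress($value), "\n";
}
```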
Hi,
I'd like to ask why the performance data labels contain the script name, surrounded by "::".
Example:
Vol_vSphere/vol_vSphere_nfs_01::check_cdot_volume_usage::space_used
Vol_block/block_ROOT::check_cdot_volume_usage::space_used
This makes creating graphs from the performance data somewhat inconvenient.
Graphite seems to escape this nicely:
pnp4nagios, however, has problems with it. I haven't looked into why exactly, but no PNP graphs are generated for this check.
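As a workaround sketch (not the plugin's actual code), labels could be built from the volume name and the metric only. The helper `perf_label()` is hypothetical: it joins both parts with `_` and replaces every character outside `[A-Za-z0-9_.-]` with `_`, on the assumption that characters like `/` and `:` are what trip up pnp4nagios:

```perl
use strict;
use warnings;

# Hypothetical helper: label = "<volume>_<metric>", with every character
# outside [A-Za-z0-9_.-] (such as "/" or ":") replaced by "_".
sub perf_label {
    my ($volume, $metric) = @_;
    (my $label = "${volume}_${metric}") =~ s/[^A-Za-z0-9_.-]/_/g;
    return $label;
}

print perf_label("Vol_block/block_ROOT", "space_used"), "\n";
print perf_label("Vol_vSphere/vol_vSphere_nfs_01", "space_used"), "\n";
```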
Best regards and thanks for the great plugins!
Hi,
I have noticed that your wiki is down.
Is that intentional?
Getting this:
./check_cdot_efficiency.pl --help
Got a 0-length file from ./check_cdot_efficiency.pl via Pod::Perldoc::ToTerm!?
at /usr/bin/perldoc line 13.
Maybe others would like this too, and maybe I'm overlooking some issues this could cause.
My feature request:
Add the rebuild percentage to check_cdot_rebuild so that progress can be tracked.
Here is a quick and possibly dirty hack in patch format:
--- check_cdot_rebuild.pl 2022-02-28 11:17:27.142714000 +0100
+++ check_cdot_rebuild_percent.pl 2022-02-28 13:09:22.043997000 +0100
@@ -78,10 +78,12 @@
foreach my $rg (@rgs) {
my $rg_reconstruct = $rg->child_get_string( "is-reconstructing" );
my $rg_reconstruct_percent = $rg->child_get_string( "reconstruction-percentage" );
if ($rg_reconstruct eq "true") {
unless (grep(/$aggr_name/, @failed_aggrs)) {
push( @failed_aggrs, $aggr_name );
push( @failed_aggrs, $aggr_name.":".$rg_reconstruct_percent );
}
}
}
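The same idea as a self-contained Perl sketch. The RAID-group data is mocked as plain hashes with hypothetical keys; in the plugin the values come from `child_get_string("is-reconstructing")` and `child_get_string("reconstruction-percentage")`. It also swaps the `grep(/$aggr_name/, ...)` lookup for an exact-match hash, because the regex form treats `aggr1` as already listed once `aggr10` is in the list:

```perl
use strict;
use warnings;

# Collect aggregates with a reconstructing RAID group, one entry per
# aggregate, formatted as "<aggr>:<percent>%".
sub rebuilding_aggrs {
    my @raid_groups = @_;
    my (%seen, @failed);
    for my $rg (@raid_groups) {
        next unless $rg->{is_reconstructing} eq "true";
        # Exact-name dedup; only the first reconstructing RAID group of an
        # aggregate is reported.
        next if $seen{ $rg->{aggr} }++;
        push @failed, "$rg->{aggr}:$rg->{percent}%";
    }
    return @failed;
}

# Mocked RAID groups (hypothetical values).
my @rgs = (
    { aggr => "aggr10", is_reconstructing => "true",  percent => 17 },
    { aggr => "aggr1",  is_reconstructing => "true",  percent => 42 },
    { aggr => "aggr2",  is_reconstructing => "false", percent => 0 },
);
print "CRITICAL: ", join(", ", rebuilding_aggrs(@rgs)), "\n";
```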
Hello,
Regarding check_cdot_aggr.pl: I have fixed the output of $perfmsg and added a regex command-line parameter to this check (feature copied from check_cdot_volume.pl), among other changes.
Could you give me push access to the repository, so that I can create a new branch and open a pull request? Alternatively, you could simply add the following file:
MODIFIED_check_cdot_aggr.zip
The scripts didn't work out of the box on an Ubuntu 14.04 machine.
check_cdot_aggregates.pl worked when I ran it from bash, but not when Nagios ran it with its embedded Perl interpreter (ePN).
I don't know how to fix the code, or whether it is based on code from the NetApp Manageability SDK.
What helped was adding the line "# nagios: -epn" at the top of the script.
I haven't used GitHub before. Should I add this line to the scripts and create a pull request?
Used NetApp SDK / API 5.4
Hi,
Thank you for your scripts! We run them on ONTAP 9.6 with netapp-manageability-sdk-9.4 on the master branch (2ac4070) and get the following error:
# check_cdot_multipath.pl --hostname 123.123.123.123 -u nagios -p **********
Can't call method "child_get" on an undefined value at /usr/local/bin/nagios/netapp/netapp-cdot-nagios/check_cdot_multipath.pl line 51.
If I can provide some debug output, please let me know how.
I looked into the script's source code because it reported an error about DP volumes that I couldn't explain, and...
Why does the check look for the string "ata" in the aggregate name? Wouldn't it be better to check the aggregate's type and filter on it via a parameter?
If so, I'd be happy to work on it.
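One concrete pitfall of the name test: "ata" is also a substring of "data". A minimal sketch contrasting a name match with an exact type match; the records, the `disk_type` key and the helper `aggrs_of_type()` are hypothetical, not the plugin's data model:

```perl
use strict;
use warnings;

# Hypothetical aggregate records; the plugin would read these from the API.
my @aggrs = (
    { name => "aggr_data01", disk_type => "SAS"  },
    { name => "aggr_sata01", disk_type => "FSAS" },
);

# Substring test on the name: "ata" also occurs in "data", so both match.
my @by_name = grep { $_->{name} =~ /ata/ } @aggrs;

# Exact test on the type, with the wanted type passed in (for example from a
# command-line parameter).
sub aggrs_of_type {
    my ($wanted, @list) = @_;
    return grep { $_->{disk_type} eq $wanted } @list;
}
my @by_type = aggrs_of_type("FSAS", @aggrs);

printf "matched by name: %d, by type: %d\n", scalar(@by_name), scalar(@by_type);
```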
BR,
Giorgio
I am wondering what type of user is needed on my NetApp cDOT and 7-Mode systems to use these plugins, and what permissions it needs.
Thanks!
I ran check_cdot_clusterlinks.pl and received the following error:
user@server1:/usr/lib/nagios/plugins$ ./check_cdot_clusterlinks.pl --hostname netapp --username monitoruser --password pw
Undefined subroutine &main::Dumper called at ./check_cdot_clusterlinks.pl line 117.
After a little digging, and a hint from http://www.perlmonks.org/bare/?node_id=430132, I added
use Data::Dumper;
after the other use statements, which fixed the issue:
user@server1:/usr/lib/nagios/plugins$ ./check_cdot_clusterlinks.pl --hostname netapp --username monitoruser --password pw
OK: all clusterlinks up
Should the Data::Dumper be included there, or is there something wrong somewhere else in my install?
netapp-cdot-nagios/check_cdot_aggr.pl
Line 117 in 108f41f
I don't understand why there's a dot in the regex before the variable $excludeliststr.
Is it a mistake that should be removed?
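Without seeing the exact expression, note that an unescaped `.` in a Perl regex matches any single character. A sketch of one way to build an exclusion pattern from a list of names; the helper `build_exclude_re()` is hypothetical, not the plugin's code. `quotemeta` makes dots literal and the `^...$` anchors require a full-name match:

```perl
use strict;
use warnings;

# Hypothetical helper: build one anchored alternation from a list of names.
# quotemeta() escapes regex metacharacters such as "." in each name.
sub build_exclude_re {
    my @names = @_;
    my $alt = join "|", map { quotemeta($_) } @names;
    return qr/^(?:$alt)$/;
}

my $re = build_exclude_re("aggr0", "aggr_root.n1");
for my $aggr (qw(aggr0 aggr01 aggr_rootXn1 aggr_root.n1)) {
    printf "%-14s %s\n", $aggr, $aggr =~ $re ? "excluded" : "checked";
}
```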
Hi,
I am trying to monitor the number of volumes without SnapMirror relationships using the script "check_cdot_diff_snapmirror.pl". It works for some clusters but fails for others, e.g.:
For Cluster1 I get the following error:
./check_cdot_diff_snapmirror.pl --hostname Cluster1 --username USRTNAME --password PASWORD
Can't call method "children_get" on an undefined value at ./check_cdot_diff_snapmirror.pl line 64.
The same command works fine for Cluster2:
./check_cdot_diff_snapmirror.pl --hostname Cluster2 --username USRTNAME --password PASWORD
25 volume(s) without snapmirror:
----details-----
How can I fix this?
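Errors of the form "Can't call method ... on an undefined value" usually come from a chained call such as `->child_get(...)->children_get()` where the first call returned undef. One plausible trigger is a ZAPI `*-get-iter` response without records, which omits `attributes-list`; that is an assumption worth checking on the failing cluster. A guarded-lookup sketch, with a minimal stand-in class `FakeElement` instead of the SDK's `NaElement`:

```perl
use strict;
use warnings;

# Minimal stand-in for the SDK's NaElement: child_get() returns the named
# child or undef when it is absent.
package FakeElement;
sub new {
    my ($class, %children) = @_;
    my $self = { %children };
    return bless $self, $class;
}
sub child_get { my ($self, $name) = @_; return $self->{$name} }

package main;

# Return the child, or an UNKNOWN message naming the missing element, so the
# caller can print it and exit 3 instead of dying on a method call on undef.
sub child_or_unknown {
    my ($elem, $name) = @_;
    my $child = $elem->child_get($name);
    return ($child, undef) if defined $child;
    return (undef, "UNKNOWN: API response has no '$name' element");
}

my $output = FakeElement->new();    # e.g. a response without records
my ($list, $err) = child_or_unknown($output, "attributes-list");
print defined $err ? "$err\n" : "found attributes-list\n";
# A real plugin would 'exit 3' after printing the UNKNOWN message.
```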
The November 22 version of check_cdot_aggr.pl reads
print Dumper($aggr);
on line 105. That's probably leftover debug output.
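Both this report and the `&main::Dumper` error above involve debug dumps. A sketch of keeping such output behind a switch, assuming a hypothetical `--debug` option parsed with Getopt::Long; `Data::Dumper` is imported explicitly so `Dumper()` is always defined:

```perl
use strict;
use warnings;
use Data::Dumper;    # imported explicitly so Dumper() is always defined
use Getopt::Long;

# Hypothetical --debug switch: API objects are dumped to STDERR only on
# request, so normal plugin output stays a single status line.
my $debug = 0;
GetOptions("debug" => \$debug) or die "invalid options\n";

sub debug_dump {
    my ($label, $data) = @_;
    return unless $debug;
    local $Data::Dumper::Sortkeys = 1;
    print STDERR "$label: ", Dumper($data);
}

debug_dump("aggr", { name => "aggr1", percent_used => 42 });
print "OK: aggr1 (42%)\n";
```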
Hi
If a volume has a user quota but no disk limit, I get this error:
Argument "-" isn't numeric in multiplication (*) at ./check_cdot_quota.pl line 101.
I don't know how to fix it.
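The warning indicates that a limit field holds the string "-" (no limit) and is then multiplied. A sketch of a conversion that treats "-" as "unlimited"; the helper name `limit_bytes()` and the assumption that the raw value is in KB are both hypothetical:

```perl
use strict;
use warnings;

# Convert a raw quota limit to bytes. "-" (or an empty/missing value) means
# no limit is set and is returned as undef instead of being multiplied.
# Assumption: the raw value is in KB.
sub limit_bytes {
    my ($limit_kb) = @_;
    return if !defined $limit_kb || $limit_kb eq "-" || $limit_kb eq "";
    return $limit_kb * 1024;
}

for my $raw ("-", "2048") {
    my $bytes = limit_bytes($raw);
    print defined $bytes ? "limit: $bytes bytes\n" : "no disk limit\n";
}
```

Usage percentages would then only be computed when the limit is defined.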
Hello,
I get the following plugin output and don't know why. It only happens on one cDOT node:
Use of uninitialized value $state in string ne at /usr/lib/nagios/plugins/netapp-cdot-nagios/check_cdot_interfaces.pl line 126.
Use of uninitialized value $state in string ne at /usr/lib/nagios/plugins/netapp-cdot-nagios/check_cdot_interfaces.pl line 126.
OK: All IFGRP fully active
The cDOT version there is 8.3.2P1, while all other clusters run a newer release. Maybe that's the reason.
Maybe you can help me with this output.
Thanks
Greetings
Filers with internal drives only have 2 paths. The default of 4 paths reports CRITICAL because only 2 paths are found
(for example on a NetApp 500F).
I would propose a default of 2 paths (2 is still multipath).
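The proposal amounts to making the expected path count a parameter with a default of 2. A minimal sketch; the helper `path_status()` is hypothetical and not the plugin's code:

```perl
use strict;
use warnings;

# Hypothetical helper: compare the number of disk paths found with a
# configurable minimum (default 2, as proposed) instead of a fixed 4.
sub path_status {
    my ($paths_found, $min_paths) = @_;
    $min_paths = 2 unless defined $min_paths;
    return $paths_found >= $min_paths ? "OK" : "CRITICAL";
}

print path_status(2), "\n";      # internal drives, default minimum
print path_status(2, 4), "\n";   # system expected to have 4 paths
```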
Hi,
I am experiencing the following plugin output with all check_cdot commands, using NetApp-manageability-SDK 9.7.x-9.8.x.
UNKNOWN: in Zapi::invoke, cannot connect to socket
This happens when the checks are run by Icinga; when executed from the command line, they work.
Any ideas?
The transport type is HTTPS.
We are using check_cdot_disk.pl to monitor our NetApp. In the Icinga 2 log files we saw that there is a problem with the performance data sent to the InfluxWriter:
In Icinga Web 2, this is the output:
Looking at the Monitoring Plugins development guidelines, I saw that 'Disks' is not a valid UOM. I think that's why the error occurs and the output is wrong.
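According to the Monitoring Plugins development guidelines, the UOM is either empty (a plain number of things) or one of the defined units such as `s`, `%`, `B`/`KB`/`MB`/`TB` or `c`. A sketch of a perfdata formatter that leaves the unit empty for a disk count; the helper `perfdata()` and the label are hypothetical:

```perl
use strict;
use warnings;

# Format one perfdata item as 'label'=value[UOM];warn;crit (min/max omitted).
# A disk count is a plain number, so the UOM stays empty.
sub perfdata {
    my ($label, $value, $uom, $warn, $crit) = @_;
    $_ //= "" for ($uom, $warn, $crit);
    return sprintf("'%s'=%s%s;%s;%s", $label, $value, $uom, $warn, $crit);
}

print "OK: no failed disks | ", perfdata("failed_disks", 0, "", 1, 2), "\n";
```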
I am using ONTAP 9.3P2; earlier versions such as ONTAP 9.2 work fine. After adding a Data ONTAP 9.3P2 system, I get the following error:
Failed test query: NaServer::parse_xml - Error in parsing xml:
syntax error at line 1, column 49, byte 49:
================================================^
<title>400 Bad Request</title> at /usr/lib64/perl5/XML/Parser.pm line 187
Hi!
First, thanks very much for this splendid check set.
This may be similar to #47, but we get:
Can't call method "children_get" on an undefined value at ./check_cdot_diff_snapmirror.pl line 121.
on one of our 3 clusters.
We're using the latest ONTAP SDK, so I'm not sure what might be happening here. Unfortunately I know nothing about Perl, so I'm fairly useless without some pointers on collecting debug output for you.
Thanks!
Currently the script allows filtering/excluding volumes by name or regex.
It would be nice to have this option for the snapshot names as well.
E.g. to filter out snapshots containing the string "snapmirror".
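A sketch of what such an option could do, assuming a hypothetical option that receives a user-supplied pattern: compile it once with `qr//` inside `eval`, so an invalid pattern becomes an UNKNOWN message, then drop matching snapshot names. The helper `compile_user_regex()` and the snapshot names are made up:

```perl
use strict;
use warnings;

# Compile a user-supplied pattern once; an invalid pattern yields an
# UNKNOWN message instead of aborting inside the filter loop.
sub compile_user_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return ($re, $@ ? "UNKNOWN: invalid regex '$pattern'" : undef);
}

my ($re, $err) = compile_user_regex("snapmirror");
die "$err\n" if $err;

my @snapshots = qw(daily.2022-02-28_0010 snapmirror.1234_5678.2022-02-28 hourly.2022-02-28_1305);
my @kept = grep { $_ !~ $re } @snapshots;
print "$_\n" for @kept;
```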
Hi,
I am wondering how the plugins retrieve the information from the NetApp.
Do you use normal SNMP queries?
In some companies, security policies prohibit authentication with username and password. The API supports certificate-based authentication, so we should enable this option here too. It could be a minimal change or a proper implementation.
I made a minimal change by adding a --cert parameter, which reuses the --username and --password parameters for the public certificate file path and the private key file path. This is not a complete implementation, but it is a quick and simple fix that works for us. I can write something better, but please suggest how it should be implemented to best fit this project's conventions.
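A sketch of a cleaner option layout with dedicated `--cert` and `--key` options, using `GetOptionsFromArray` from Getopt::Long. The option names and the helper `parse_auth()` are hypothetical. Passing the files to the SDK's NaServer object is not shown; the exact certificate-related methods should be taken from the NetApp Manageability SDK documentation:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option layout: dedicated --cert/--key file options instead of
# reusing --username/--password. Returns (settings, undef) or (undef, error).
sub parse_auth {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt, "username=s", "password=s", "cert=s", "key=s")
        or return (undef, "invalid options");
    if (defined $opt{cert} || defined $opt{key}) {
        return (undef, "--cert and --key must be given together")
            unless defined $opt{cert} && defined $opt{key};
        return ({ mode => "certificate", %opt }, undef);
    }
    return (undef, "--username and --password are required")
        unless defined $opt{username} && defined $opt{password};
    return ({ mode => "login", %opt }, undef);
}

my ($auth, $err) = parse_auth("--cert", "/etc/nagios/netapp.crt", "--key", "/etc/nagios/netapp.key");
print defined $err ? "$err\n" : "auth mode: $auth->{mode}\n";
```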
Hi,
The volume check can output performance data as well, but this function doesn't work correctly. Only one pipe symbol "|" is allowed, but with "--perf" there are many of them. Maybe you can have a look at it.
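A sketch of assembling the output so that exactly one "|" separates the status text from all perfdata items, which are joined with spaces. The helper `plugin_output()` and the per-volume items are hypothetical:

```perl
use strict;
use warnings;

# Join the status text and all perfdata items so that exactly one "|"
# separates them; items are separated by spaces.
sub plugin_output {
    my ($status_text, @perf_items) = @_;
    return $status_text unless @perf_items;
    return $status_text . " | " . join(" ", @perf_items);
}

# Hypothetical per-volume perfdata items.
my @perf = map { sprintf("'%s'=50%%;80;90", $_) } qw(vol1 vol2 vol3);
print plugin_output("OK: 3 volumes below thresholds", @perf), "\n";
```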
Thanks and regards,
pgress
Hi,
We tried out your scripts to check our NetApp MetroCluster. First, thanks for the great work.
In our first tests we noticed that the output is not very clear when one or more disks have failed.
In that case we can't tell which disk in which node of our cluster has the problem. I saw that on line 114 you get the ownership of the disk with
my $owner = $disk->child_get( "disk-ownership-info" );
but you don't include this information in the output.
It would be very nice if you could add this, e.g. on line 117 with the following (or a better solution):
push @disk_list, $disk->child_get_string( "disk-name" )." (".$owner->{children}->[5]->{content}.")";
(I hope this is always element 6 of the array.)
My colleagues, the NetApp admins, explained to me that it is better to output "owner-node-name" instead of "home-node-name".
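Looking the owner up by element name avoids depending on the position `[5]`. A sketch using `child_get()` and `child_get_string()`, both already used in the plugin, with a minimal stand-in class `FakeElement` instead of the SDK's `NaElement`; the helper `disk_label()` and the sample values are hypothetical:

```perl
use strict;
use warnings;

# Minimal stand-in for the SDK's NaElement, modelling only child_get() and
# child_get_string() for this example.
package FakeElement;
sub new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}
sub child_get        { my ($self, $name) = @_; return $self->{$name} }
sub child_get_string { my ($self, $name) = @_; return $self->{$name} }

package main;

# Look the owner up by element name instead of by array position, which
# breaks if the element order changes; fall back to the bare disk name.
sub disk_label {
    my ($disk) = @_;
    my $name  = $disk->child_get_string("disk-name");
    my $owner = $disk->child_get("disk-ownership-info");
    my $node  = defined $owner ? $owner->child_get_string("owner-node-name") : undef;
    return defined $node ? "$name ($node)" : $name;
}

my $disk = FakeElement->new(
    "disk-name"           => "1.0.5",
    "disk-ownership-info" => FakeElement->new("owner-node-name" => "node1"),
);
print disk_label($disk), "\n";   # 1.0.5 (node1)
```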
Thank you!
When the warning/critical thresholds of check_cdot_aggr.pl are exceeded:
[root@netapp_cluster ~]# /usr/local/nagios/libexec/check_cdot_aggr.pl --hostname netapp_cluster --username nagios --password $PASS --warning 80 --critical 90 -aggr of_n1_aggr001_rP
CRITICAL: of_n1_aggr001_rP (95%)Use of uninitialized value $ok_msg in concatenation (.) or string at /usr/local/nagios/libexec/check_cdot_aggr.pl line 167.
OK:
[root@nagios ~]# echo $?
2
[root@nagios ~]#
The same strange behavior occurs for WARNING states.
Everything works fine when the results are within the thresholds:
[root@nagios ~]# /usr/local/nagios/libexec/check_cdot_aggr.pl --hostname netapp_cluster --username nagios --password $PASS --warning 96 --critical 97 -aggr of_n1_aggr001_rP
OK: of_n1_aggr001_rP (95%)
[root@nagios ~]# echo $?
0
[root@nagios ~]#
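The stray "OK:" line and the `$ok_msg` warning suggest the output is concatenated even when the OK list is empty. A sketch that only prints non-empty parts and derives the exit code from the same data; the helpers `has_text()` and `status_line()` are hypothetical:

```perl
use strict;
use warnings;

sub has_text { my ($s) = @_; return defined $s && length $s }

# Build the output from whichever message lists are non-empty and derive
# the exit code (2 = CRITICAL, 1 = WARNING, 0 = OK) from the same data.
sub status_line {
    my ($crit_msg, $warn_msg, $ok_msg) = @_;
    my @parts;
    push @parts, "CRITICAL: $crit_msg" if has_text($crit_msg);
    push @parts, "WARNING: $warn_msg"  if has_text($warn_msg);
    push @parts, "OK: $ok_msg"         if has_text($ok_msg);
    my $code = has_text($crit_msg) ? 2 : has_text($warn_msg) ? 1 : 0;
    return (join("\n", @parts), $code);
}

my ($text, $code) = status_line("of_n1_aggr001_rP (95%)", undef, undef);
print "$text\n";
# A plugin would now call: exit $code;
```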
Hello,
It would be great to add a DIMM status check to the check_cdot_global.pl plugin:
Output example:
>system controller memory dimm show
               DIMM      UECC   CECC   CPU               Slot
Node           Name      Count  Count  Socket   Channel  Number  Status
-------------- --------- ------ ------ -------- -------- ------- ------
node1          DIMM-1    0      0      0        0        0       ok
node1          DIMM-NV1  0      0      0        1        1       ok
node2          DIMM-1    1      0      0        0        0       ok
node3          DIMM-NV1  0      0      0        1        1       ok
4 entries were displayed.
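How the plugin would fetch this data through the API is not covered here. As a hedged sketch of the evaluation step only, the rows below are transcribed from the example output, and the thresholds in the hypothetical `dimm_state()` helper are one possible policy, not NetApp guidance:

```perl
use strict;
use warnings;

# Rows transcribed from the example output:
# [node, dimm name, UECC count, CECC count, status]
my @dimms = (
    [ "node1", "DIMM-1",   0, 0, "ok" ],
    [ "node1", "DIMM-NV1", 0, 0, "ok" ],
    [ "node2", "DIMM-1",   1, 0, "ok" ],
    [ "node3", "DIMM-NV1", 0, 0, "ok" ],
);

# One possible policy: CRITICAL for a non-"ok" status or any uncorrectable
# (UECC) error, WARNING for correctable (CECC) errors, OK otherwise.
sub dimm_state {
    my ($uecc, $cecc, $status) = @_;
    return 2 if $status ne "ok" || $uecc > 0;
    return 1 if $cecc > 0;
    return 0;
}

my ($worst, @problems) = (0);
for my $d (@dimms) {
    my ($node, $name, $uecc, $cecc, $status) = @$d;
    my $state = dimm_state($uecc, $cecc, $status);
    next unless $state;
    $worst = $state if $state > $worst;
    push @problems, "$node $name (UECC $uecc, CECC $cecc, $status)";
}
my @labels = ("OK", "WARNING", "CRITICAL");
print "$labels[$worst]: ", (@problems ? join(", ", @problems) : "all DIMMs ok"), "\n";
```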
BR,
Yannick
./check_netapp_aggr.pl -H 192.168.200.152 -u nagios -p Monitor_246 -w 40 -c 50 --aggr data04_SAS_SHELVES
UNKNOWN: NaServer::parse_xml - Error in parsing xml:
syntax error at line 1, column 49, byte 49:
================================================^
<title>400 Bad Request</title> at /usr/lib64/perl5/vendor_perl/XML/Parser.pm line 187
This happens after upgrading ONTAP from 9.2 to 9.3P4.
I tried netapp-manageability-sdk-9.3, with no luck.
Hello
I have a problem with plugin "check_cdot_lun.pl".
In March, I was asked to implement Icinga 1 checks for NetApp monitoring.
I downloaded the latest plugins ("netapp-cdot-nagios-master") and the latest NetApp SDK, and enabled monitoring.
Running the raw command line succeeds.
The NetApp system hosts over 200 volumes.
And this is where I have a problem with "check_cdot_lun.pl".
After some time I found that the performance data folder for this NetApp host contains a huge number of RRD files, and the file system fills up quickly.
Currently the folder for this NetApp host contains over 25,500 files occupying 9 GB of disk space.
After a short analysis I found that for a single volume there are already 259 RRD files within one week, and every new day adds even more files.
Example:
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 21 13:39 Netapp_lun ._ vol_server_db1_server_db01_ (Usage__427956.87890625_512071.875_MB; 83.574%) | vol_server_db1_server_db01.rrd
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 21 13:39 Netapp_lun . vol_server_db1_server_db01 (Usage__427986.7421875_512071.875_MB; 83.579%) | vol_server_db1_server_db01.rrd
..
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 23 00:29 Netapp_lun . vol_server_db1_server_db01 (Usage__428173.83203125_512071.875_MB; 83.616%) | vol_server_db1_server_db01.rrd
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 23 01:56 Netapp_lun . vol_server_db1_server_db01 (Usage__428173.91015625_512071.875_MB; 83.616%) | vol_server_db1_server_db01.rrd
..
..
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 25 11:49 Netapp_lun . vol_server_db1_server_db01 (Usage__429678.859375_512071.875_MB; 83.910%) | vol_server_db1_server_db01.rrd
-rw-rw-r-- 1 icingaadmin icingaadmin 384952 May 25 11:20 Netapp_lun . vol_server_db1_server_db01 (Usage__429678.890625_512071.875_MB; _83.910%) | _vol_server_db1_server_db01.rrd
In this case there are 259 files in total (from 21.05. to 25.05.) for one volume, differing only in the added "Usage__xxxx" part.
It looks like this for every volume (some have fewer RRD files, others even more).
Is it possible to switch off the performance data output?
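Judging by the file names, the perfdata label appears to include the current usage value ("Usage__427956.87890625_512071.875_MB; 83.574%"), so every run with a new value produces a new label and, in pnp4nagios, a new RRD file. A hedged sketch of a stable label, where the changing numbers go only into the value and max fields; the helper `lun_perfdata()` and the LUN path are hypothetical:

```perl
use strict;
use warnings;

# Keep only the LUN identity in the label and put the changing numbers in the
# value and max fields ('label'=value[UOM];warn;crit;min;max).
sub lun_perfdata {
    my ($lun_path, $used_mb, $size_mb) = @_;
    (my $label = $lun_path) =~ s{^/vol/}{};
    $label =~ s/[^A-Za-z0-9_.-]/_/g;
    return sprintf("'%s'=%.3fMB;;;0;%.3f", $label, $used_mb, $size_mb);
}

# Two runs with different usage produce the same label.
print lun_perfdata("/vol/server_db1/server_db01", 427956.87890625, 512071.875), "\n";
print lun_perfdata("/vol/server_db1/server_db01", 429678.859375,   512071.875), "\n";
```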
Thx