ZenPacks.daviswr.ZFS

ZenPack to model & monitor ZFS pools and datasets

Requirements

  • An OS that supports ZFS (Solaris/Illumos, FreeBSD, Linux with OpenZFS)
    • See "Illumos & FreeBSD notes" below for non-Linux hosts
  • An account on the ZFS-capable host, which can
    • Log in via SSH with a key
    • Use a bash-compatible shell
    • Run the zdb, zpool, and zfs commands with certain parameters via privilege escalation without a password
      • This may not be required on some hosts, depending on configuration
      • Currently tries to detect dzdo, doas, pfexec, and sudo
  • ZenPackLib
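The escalation probe described above amounts to "use the first tool that exists". The sketch below is a local approximation under stated assumptions: the probe order simply follows the list above, `detect_escalation` is a hypothetical name, and the real modeler performs its check over SSH on the monitored host rather than with `shutil.which` locally.

```python
import shutil

# Probe order is an assumption; it just follows the order listed above.
ESCALATION_TOOLS = ("dzdo", "doas", "pfexec", "sudo")


def detect_escalation(which=shutil.which):
    """Return the first escalation tool found, or "" when none is available.

    `which` takes a command name and returns a path or None; it is injectable
    so the lookup can be exercised without changing PATH.
    """
    for tool in ESCALATION_TOOLS:
        if which(tool):
            return tool
    return ""  # no prefix: commands run directly (e.g. when logged in as root)


if __name__ == "__main__":
    print(detect_escalation() or "(no escalation tool found)")
```

An empty result corresponds to the "may not be required on some hosts" case above.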

Example entries in /etc/sudoers

Cmnd_Alias ZDB = /sbin/zdb -L
Cmnd_Alias ZPOOL = /sbin/zpool get -pH all, /sbin/zpool iostat -y *, /sbin/zpool status -v, /sbin/zpool status -v *, /sbin/zpool status -x *
Cmnd_Alias ZFS = /sbin/zfs get -pH all, /sbin/zfs get -pH all *
zenoss ALL=(ALL) NOPASSWD: ZDB, ZPOOL, ZFS
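To sanity-check which command lines the aliases above permit, a rough approximation is a glob match over the whole command line. This is an assumption-laden sketch: real sudo compares the command path and its arguments separately and applies further rules, so `is_permitted` (a hypothetical helper) is only a quick check, not a substitute for running `sudo -l` as the zenoss user on the host.

```python
import fnmatch

# Command lines copied from the example Cmnd_Alias entries above.
ALLOWED = [
    "/sbin/zdb -L",
    "/sbin/zpool get -pH all",
    "/sbin/zpool iostat -y *",
    "/sbin/zpool status -v",
    "/sbin/zpool status -v *",
    "/sbin/zpool status -x *",
    "/sbin/zfs get -pH all",
    "/sbin/zfs get -pH all *",
]


def is_permitted(command):
    """Approximate sudoers matching with a case-sensitive glob per rule.

    '*' matches any run of characters (including spaces), roughly like a
    sudoers wildcard in the argument list.
    """
    return any(fnmatch.fnmatchcase(command, rule) for rule in ALLOWED)


if __name__ == "__main__":
    for cmd in ("/sbin/zpool status -v tank", "/sbin/zpool destroy tank"):
        print(cmd, "->", is_permitted(cmd))
```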

zProperties

  • zZFSDatasetIgnoreNames
    • Regex of dataset names for the modeler to ignore.
  • zZFSDatasetIgnoreTypes
    • List of dataset types for the modeler to ignore. Valid types:
      • filesystem
      • snapshot
      • volume
  • zZPoolIgnoreNames
    • Regex of pool names for the modeler to ignore.
  • zZPoolThresholdWarning
    • DEPRECATED: Replaced by per-pool thresholds
  • zZPoolThresholdError
    • DEPRECATED: Replaced by per-pool thresholds
  • zZPoolThresholdCritical
    • DEPRECATED: Replaced by per-pool thresholds
  • zZFSExecPrefix
    • DEPRECATED: Determined by modeler
  • zZFSBinaryPath
    • DEPRECATED: Determined by modeler
  • zZPoolBinaryPath
    • DEPRECATED: Determined by modeler
  • zZdbBinaryPath
    • DEPRECATED: Determined by modeler

Deprecated zProperties will be removed before the v1.0 release.
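A minimal sketch of how the three ignore properties above might be applied to a list of `(name, type)` pairs. It assumes `re.search` semantics for the regexes and takes the pool name as everything before the first `/` or `@`; the modeler's actual matching may differ, and `filter_datasets` is a hypothetical name.

```python
import re


def filter_datasets(datasets, pool_ignore="", name_ignore="", type_ignore=()):
    """Return the (name, type) pairs that survive the three ignore settings.

    pool_ignore: regex like zZPoolIgnoreNames, matched against the pool name
    name_ignore: regex like zZFSDatasetIgnoreNames, matched against the full name
    type_ignore: types like zZFSDatasetIgnoreTypes ("filesystem", "snapshot", "volume")
    """
    kept = []
    for name, ds_type in datasets:
        pool = re.split(r"[/@]", name, maxsplit=1)[0]  # "tank/home@daily" -> "tank"
        if pool_ignore and re.search(pool_ignore, pool):
            continue
        if name_ignore and re.search(name_ignore, name):
            continue
        if ds_type in type_ignore:
            continue
        kept.append((name, ds_type))
    return kept


if __name__ == "__main__":
    sample = [("tank", "filesystem"), ("tank/tmp", "filesystem"),
              ("tank@daily", "snapshot"), ("backup/vm", "volume")]
    print(filter_datasets(sample, r"^backup", r"/tmp$", ("snapshot",)))
```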

Illumos & FreeBSD notes

Being an OpenZFS/ZoL user, I'm primarily developing on Linux, but the paths to zdb, zfs, and zpool should be automatically determined by the modeler, as well as what, if anything, to use for privilege escalation (sudo, pfexec, etc.).

That said, this ZenPack's a work in progress; all of the zdb, zpool, and zfs parameters should work on an Illumos system, at least. Some patient friends that use SmartOS have helped me with that.
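Automatic path detection can be pictured as checking a list of candidate directories and then prefixing the escalation tool, if any. The directory list and both helper names below are hypothetical, not taken from the modeler; `exists` is injectable so the example runs on any machine.

```python
import os

# Hypothetical search order; the modeler's real logic may differ.
CANDIDATE_DIRS = ("/sbin", "/usr/sbin", "/usr/local/sbin", "/usr/bin")


def find_binary(name, exists=os.path.isfile):
    """Return the first candidate path for `name` that exists, else None."""
    for directory in CANDIDATE_DIRS:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None


def build_command(prefix, binary, *args):
    """Join an optional escalation prefix, the binary path, and its arguments."""
    return " ".join(part for part in (prefix, binary) + args if part)


if __name__ == "__main__":
    zpool = find_binary("zpool")
    if zpool:
        print(build_command("sudo", zpool, "status", "-v"))
```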

Usage

Modelers

I'm not going to make any assumptions about your device class organization, so it's up to you to configure the daviswr.cmd.ZPool and daviswr.cmd.ZFS modelers on the appropriate class or device. The ZPool modeler must be in the list of modelers before the ZFS one.
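If you script your device class setup, a quick check of the ordering requirement could look like this. It assumes the modeler list is available as a list of plugin names (in Zenoss this is the zCollectorPlugins property); `zpool_before_zfs` is a hypothetical helper.

```python
def zpool_before_zfs(modelers):
    """True if daviswr.cmd.ZPool appears before daviswr.cmd.ZFS in the list."""
    try:
        return modelers.index("daviswr.cmd.ZPool") < modelers.index("daviswr.cmd.ZFS")
    except ValueError:  # one of the two plugins is not configured at all
        return False


if __name__ == "__main__":
    plugins = ["zenoss.cmd.uname", "daviswr.cmd.ZPool", "daviswr.cmd.ZFS"]
    print(zpool_before_zfs(plugins))
```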

Zenoss configuration

I've found that reducing zSshConcurrentSessions on the device or class from 10 to maybe 5 helps with problems due to overrunning a monitored system's available SSH channels.

ZPool I/O stats

The zpool-iostat datasource will miss data since it's only noting what's happened in the last second when it polls. While an actual counter would be nice, that's the only source of pool activity information I can find. Any suggestions would be appreciated.
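The size of the gap is simple arithmetic. Assuming, for illustration only, a 300-second polling cycle and a one-second sample, each poll observes 1/300 of the interval (about 0.33%). Extrapolating the sampled rate across the interval is a common workaround, but it is an estimate that assumes the sampled second is representative; the datasource is not documented to do this, and both helper names are hypothetical.

```python
def coverage(sample_s, poll_interval_s):
    """Fraction of each polling interval that a sample of sample_s seconds observes."""
    return sample_s / poll_interval_s


def extrapolate(sampled_rate, poll_interval_s):
    """Estimate activity over the whole interval from a per-second sampled rate.

    Only meaningful if the sampled second is representative of the interval.
    """
    return sampled_rate * poll_interval_s


if __name__ == "__main__":
    interval = 300  # assumed polling cycle in seconds (illustrative)
    print("observed: {:.2f}% of each interval".format(100 * coverage(1, interval)))
    print("estimated ops: {}".format(extrapolate(120, interval)))
```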

Special Thanks

zenpacks.daviswr.zfs's Issues

Re-usable parsers for various ZFS tool outputs

Currently both modelers and all ZenCommand parsers implement parsing of tool output individually. Shared parsing functions for each output type would reduce complexity and perhaps make modeling a little more fault-tolerant.

Should be considered a prerequisite to #6.
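As a sketch of what one shared parser could look like, the function below handles `zfs get -pH all` and `zpool get -pH all` output; with `-H`, both print tab-separated `name`, `property`, `value`, and `source` columns without a header. The function name and return shape are assumptions, and values are left as strings so each caller decides how to convert them.

```python
from collections import defaultdict


def parse_get(output):
    """Parse `zfs get -pH all` or `zpool get -pH all` output.

    Returns {name: {property: value}}; blank or malformed lines are skipped
    rather than raising, which keeps one odd line from failing the whole run.
    """
    result = defaultdict(dict)
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name, prop, value = fields[:3]
        result[name][prop] = value
    return dict(result)


if __name__ == "__main__":
    sample = "tank\tused\t1024\t-\ntank\tcompression\tlz4\tlocal\n"
    print(parse_get(sample))
```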

Some Pools Return "Code: 2 - Msg: Misuse of shell builtins"

I'm seeing a host with 4 pools (a single-vdev rpool, a raidz1 pool, another raidz1, and a 5-wide span of 2-disk mirrors with a SLOG mirror and a 2-disk L2ARC span) that only returns data for two of the pools (rpool and a raidz1). The other two pools show a warning of Code: 2 - Msg: Misuse of shell builtins and no pool state whatsoever. The "stateless" pools' VDEVs are not accounted for, nor are their member storage devices.
The host systems run Arch Linux (so the latest bash), the account is root without sudo (isolated environment), and the ZFS version is 2.0.0.

REMOVED disk not correctly detected by zpool component

We use /dev/disk/by-id/ paths referencing the ata- or scsi-/sas- symlinks that point to our devices in our zpool configurations. We just noticed that a pool lost a drive; the OS removed it altogether at failure time, and the ZenPack is having some issues with this.
The disk is still showing as online, but there is a warning message generated saying:

Component:  raidz1-0
Event Class:    /Cmd/Fail
Status:     New
Message:    Traceback (most recent call last):
  File "/opt/zenoss/Products/ZenRRD/zencommand.py", line 819, in _processDatasourceResults
    parser.processResults(datasource, results)
  File "/opt/zenoss/packs/ZenPacks.daviswr.ZFS/ZenPacks/daviswr/ZFS/parsers/zpool/status.py", line 68, in processResults
    health = pool_match.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

I'm assuming a problem in the zpool status output parser.

As a result, the disk itself is not marked as being offline in Zenoss, but the VDEV does show yellow (warning state) due to the parsing problem resulting in the reference to a 'NoneType' object.
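The traceback shows `pool_match` being `None`, meaning the parser's regular expression did not match the output. A defensive pattern is to check the match before calling `.groups()`. The sketch below assumes the pool health is read from the ` state:` line of `zpool status`; the parser's actual regex is not reproduced here, and `pool_state` is a hypothetical name.

```python
import re

# Match the "state:" line only; [ \t]* avoids spilling onto the next line.
STATE_RE = re.compile(r"^[ \t]*state:[ \t]*(\S+)", re.MULTILINE)


def pool_state(status_output):
    """Return the pool state from `zpool status` output, or None if absent.

    Returning None (instead of calling .groups() on a failed match) lets the
    caller raise a clear event rather than an AttributeError.
    """
    match = STATE_RE.search(status_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(pool_state("  pool: tank\n state: DEGRADED\n"))
```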

Implement support for pending native crypto

@tcaputi has pretty much completed work on the native crypto implementation for OpenZFS (openzfs/zfs#4329). This work adds some complexity to how information is stored and presented, as well as to the CLI. Given that the ZenPack works off zdb output, and that dataset-level attributes remain cleartext (CT), I'm assuming that we should be able to see all relevant attributes whether or not a key is loaded (i.e., it should still work while the dataset is encrypted). We would, however, want to output information about the crypto config (on/off, keysource, cipher, and pbkdfiters) to be logged by Zenoss.

@daviswr: Could I ask you to take a look at implementing this? Every time I start working on this ZenPack I get bogged down by the idiosyncratic differences between Python and my 3rd-gen language of choice (Ruby) relating to string parsing, indentation, and set manipulation. I should have some cycles in January, but I'm massively behind on Metasploit work, so I'm opening this as an issue instead of a PR, presuming you have the cycles to tackle it. Thanks as always.
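Extracting the crypto configuration could look like the sketch below, which takes the `{property: value}` mapping for a single dataset (for example, from parsed `zfs get -pH all` output). Released OpenZFS names these properties `encryption`, `keyformat`, `keylocation`, `pbkdf2iters`, `keystatus`, and `encryptionroot`, which differ from some names used in the issue above; `crypto_summary` and its `encrypted` key are hypothetical.

```python
# Property names as used by released OpenZFS native encryption.
CRYPTO_PROPS = ("encryption", "keyformat", "keylocation",
                "pbkdf2iters", "keystatus", "encryptionroot")


def crypto_summary(props):
    """Pick encryption-related properties out of one dataset's property map.

    Adds a derived "encrypted" flag: any `encryption` value other than "off"
    (or a missing value) counts as encrypted.
    """
    summary = {key: props[key] for key in CRYPTO_PROPS if key in props}
    summary["encrypted"] = "yes" if props.get("encryption", "off") != "off" else "no"
    return summary


if __name__ == "__main__":
    print(crypto_summary({"encryption": "aes-256-gcm", "keystatus": "available"}))
```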

Integrate ZFS modeler into ZPool modeler

There is no guaranteed order in which modelers are executed, so the ZFS modeler may run before the ZPool modeler the first time a system is modeled, thus missing the datasets because the ZPool components have not yet been created.

Probably can't stub-out Pools in case ZFS runs after ZPool, which would replace the previously-made components (I think...)

Subsequent models are normally fine.
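If both command outputs are handled by a single modeler, datasets can be attached to pools in the same pass, so execution order stops mattering. The grouping step might look like this; `assign_datasets` is a hypothetical helper that relies on a dataset's first name component (before `/` or `@`) being its pool.

```python
def assign_datasets(pools, datasets):
    """Group dataset names under the pool they belong to.

    Datasets whose pool was not modeled (e.g. ignored pools) are dropped.
    """
    grouped = {pool: [] for pool in pools}
    for name in datasets:
        pool = name.split("/", 1)[0].split("@", 1)[0]
        if pool in grouped:
            grouped[pool].append(name)
    return grouped


if __name__ == "__main__":
    print(assign_datasets(["tank"], ["tank", "tank/home", "tank/home@daily"]))
```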

Cache VDEV Enumeration and Small Suggestions

Thank you for this ZenPack; it's a lifesaver in our environment. In the latest version (0.7.0), cache drive vdev enumeration fails, and I've had to comment it out (https://github.com/daviswr/ZenPacks.daviswr.ZFS/blob/master/ZenPacks/daviswr/ZFS/modeler/plugins/daviswr/cmd/ZPool.py#L153). I'll spin up a lab system to replicate the error, but it's along the lines of "NoneType has no member named 'dev'".

Separately, I've added local thresholds for pool capacity notification; it may be useful to have them in the ZenPack. A pool at 90% is something to be concerned about (especially with automated snapshots or heavy use). It would also be very useful to have a configuration option to disable enumeration of snapshots. Some of our systems have thousands of snapshots across datasets, and it gets painful pretty quickly (we are only monitoring pools for now anyway, but dataset usage and ZVOL I/O would be nice).
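A per-pool capacity check at the suggested 90% could be as simple as the sketch below. It computes usage from the pool's `allocated` and `size` values (zpool also reports a `capacity` property directly); `capacity_event` and its message format are hypothetical.

```python
def capacity_event(pool, alloc, size, threshold=90.0):
    """Return a warning message when used capacity reaches `threshold` percent.

    Returns None when the pool is below the threshold or size is unknown (0).
    """
    if size <= 0:
        return None
    pct = 100.0 * alloc / size
    if pct >= threshold:
        return f"{pool} is {pct:.1f}% full (threshold {threshold:.0f}%)"
    return None


if __name__ == "__main__":
    print(capacity_event("tank", 950, 1000))
```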

Parse error for 2.2.3

I will track down the cause but am noting this here for record-keeping: I'm catching this error against a 2.2.3 build, packaged and installed on Ubuntu 22.04. It may happen on others; I will check Arch shortly:

2024-03-26 23:38:57,653 ERROR zen.ZenModeler: Traceback (most recent call last):
  File "/opt/zenoss/Products/DataCollector/zenmodeler.py", line 669, in processClient
    datamaps = plugin.process(device, results, self.log)
  File "/opt/zenoss/ZenPacks/ZenPacks.daviswr.ZFS-0.8.0-py2.7.egg/ZenPacks/daviswr/ZFS/modeler/plugins/daviswr/cmd/ZFS.py", line 200, in process
    comp[key] = int(datasets[ds][key])
ValueError: invalid literal for int() with base 10: 'none'
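The traceback shows `int()` being applied to the string `none`. One hedged fix is a conversion helper that returns `None` for non-numeric placeholders so the caller can skip that key; `to_int` is a hypothetical name, and which property returned `none` is not identified above.

```python
def to_int(value):
    """Convert a `zfs get -p` value to int.

    Returns None for non-numeric placeholders such as "none" or "-" instead
    of raising ValueError, so one odd property cannot abort modeling.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


if __name__ == "__main__":
    for raw in ("1024", "none", "-"):
        print(repr(raw), "->", to_int(raw))
```

In the loop from the traceback, the caller would then set `comp[key]` only when `to_int(...)` is not `None`.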

Generate events based on zpool status errors

The zpool.status parser should generate events based on messages in the status and errors fields for the pool.

Additionally, it should generate vdev and device events if other components in the output have error messages.
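A minimal sketch of the proposed pool-level behaviour, assuming the usual `zpool status` layout (right-aligned `key:` labels, tab-indented continuation lines) and treating `No known data errors` as the healthy `errors:` value. The event dict shape and helper names are hypothetical; a real parser would hand events to Zenoss rather than return dicts. `status:` can also appear for benign conditions such as disabled pool features, which this sketch would still report as a warning.

```python
import re

# A field line: optional leading spaces, a lowercase key, a colon, the value.
FIELD_RE = re.compile(r"^ *([a-z]+):\s*(.*)$")


def status_fields(output):
    """Collect the 'key: value' fields of `zpool status` output.

    Continuation lines start with a tab and are appended to the previous field.
    """
    fields = {}
    current = None
    for line in output.splitlines():
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current and line.startswith("\t"):
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


def status_events(pool, output):
    """Build event-like dicts from the 'status' and 'errors' fields.

    Severities use the Zenoss scale: 3 = Warning, 4 = Error.
    """
    fields = status_fields(output)
    events = []
    if fields.get("status"):
        events.append({"component": pool, "summary": fields["status"], "severity": 3})
    errors = fields.get("errors", "")
    if errors and errors != "No known data errors":
        events.append({"component": pool, "summary": errors, "severity": 4})
    return events
```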

Failure to model a single pool in a system containing 3

An OpenZFS 2.1 host with 3 zpools hangs on modeling indefinitely; I had to disable the ZFS plugin to get the rest of the model committed to the DB. With the plugin enabled, the modeler hangs indefinitely even after the zpools are discovered.
There is an error message in Zenoss:

stderr: interval cannot be zero
usage:
    status [-c [script1,script2,...]] [-igLpPstvxD]  [-T d|u] [pool] ...
        [interval [count]]

The pool which is failing to model is a raidz2 of 6 drives. There's another raidz2 in there with more disks, and both of them have faults showing. The one which works has one UNAVAIL and one FAULTED - both show up as events in Zenoss. The failing pool has a single FAULTED disk in it.

SUSPENDED state not detected correctly

We had a failure go unnoticed this morning - pool shows up as ONLINE in Zenoss (with no IO in the graphs) but went SUSPENDED on the host hours ago.
Zenoss 6.3 with zenpack built off of 8a17ca6
Thanks as always
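One way to keep an unrecognized state from being shown as healthy is to map known states explicitly and default everything else to an error. The mapping below is hypothetical (the severities are a judgment call) and uses the Zenoss 0 (Clear) to 5 (Critical) scale; it does not diagnose why `SUSPENDED` was missed in this report.

```python
# Hypothetical mapping of zpool health values to Zenoss severities
# (0 Clear, 3 Warning, 4 Error, 5 Critical).
STATE_SEVERITY = {
    "ONLINE": 0,
    "DEGRADED": 3,
    "OFFLINE": 4,
    "REMOVED": 4,
    "UNAVAIL": 5,
    "FAULTED": 5,
    "SUSPENDED": 5,
}


def state_severity(state):
    """Map a pool state to a severity; unknown states count as errors (4)."""
    return STATE_SEVERITY.get(state.strip().upper(), 4)


if __name__ == "__main__":
    for state in ("ONLINE", "SUSPENDED", "SOMETHING_NEW"):
        print(state, state_severity(state))
```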
