rverchere / vmware_exporter


VMWare VCenter Exporter for Prometheus

License: BSD 3-Clause "New" or "Revised" License

Python 98.24% Dockerfile 1.76%

pyvmomi vcenter prometheus-exporter python prometheus

vmware_exporter's Introduction

vmware_exporter

VMWare VCenter Exporter for Prometheus.

Get VMWare VCenter information:

Current number of active snapshots
Snapshot Unix timestamp creation date
Datastore size and other stuff
Basic VM and Host metrics

Usage

Install with $ python setup.py install or $ pip install vmware_exporter (installing from pip will install an old version; this is likely something I won't pursue)
Create a config.yml file based on the configuration section. Some variables can be passed in as environment variables
Run $ vmware_exporter -c /path/to/your/config
Go to http://localhost:9272/metrics?vsphere_host=vcenter.company.com to see metrics
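
As a minimal sketch of the last step, the metrics URL can be built programmatically. The helper name metrics_url is hypothetical, and combining vsphere_host with the section parameter (described in the configuration section) is an assumption rather than documented behaviour.

```python
from urllib.parse import urlencode


def metrics_url(exporter="localhost:9272", vsphere_host=None, section=None):
    """Build the exporter scrape URL; both query parameters are optional."""
    params = {}
    if vsphere_host:
        params["vsphere_host"] = vsphere_host
    if section:
        params["section"] = section
    base = f"http://{exporter}/metrics"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


print(metrics_url(vsphere_host="vcenter.company.com"))
```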

Alternatively, if you don't wish to install the package, run using $ vmware_exporter/vmware_exporter.py or you can use the following docker command:

docker run -it --rm  -p 9272:9272 -e VSPHERE_USER=${VSPHERE_USERNAME} -e VSPHERE_PASSWORD=${VSPHERE_PASSWORD} -e VSPHERE_HOST=${VSPHERE_HOST} -e VSPHERE_IGNORE_SSL=True --name vmware_exporter pryorda/vmware_exporter

Configuration and limiting data collection

A configuration file is only needed if you are not going to use environment variables. If you do plan to use a configuration file, be sure to override the container entrypoint or add -c config.yml to the command args.

If you want to limit the scope of the metrics gathered, you can update the subsystems under collect_only in a config section (e.g. under default), or use the environment variables:

collect_only:
    vms: False
    datastores: True
    hosts: True

This would only collect datastores and hosts.

You can have multiple sections for different hosts and the configuration would look like:

default:
    vsphere_host: "vcenter"
    vsphere_user: "user"
    vsphere_password: "password"
    ignore_ssl: False
    collect_only:
        vms: True
        datastores: True
        hosts: True

esx:
    vsphere_host: vc.example2.com
    vsphere_user: 'root'
    vsphere_password: 'password'
    ignore_ssl: True
    collect_only:
        vms: False
        datastores: False
        hosts: True

limited:
    vsphere_host: slowvc.example.com
    vsphere_user: '[email protected]'
    vsphere_password: 'password'
    ignore_ssl: True
    collect_only:
        vms: False
        datastores: True
        hosts: False

Switching sections can be done by adding ?section=limited to the URL.
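
The section lookup can be pictured with a short sketch. It uses a plain dict in place of the parsed config.yml; the function name select_section and the fallback to default when no section parameter is given are assumptions for illustration, not the exporter's confirmed behaviour.

```python
from urllib.parse import urlparse, parse_qs

# Stand-in for the parsed config.yml shown above (trimmed to two sections).
CONFIG = {
    "default": {
        "vsphere_host": "vcenter",
        "collect_only": {"vms": True, "datastores": True, "hosts": True},
    },
    "limited": {
        "vsphere_host": "slowvc.example.com",
        "collect_only": {"vms": False, "datastores": True, "hosts": False},
    },
}


def select_section(url, config=CONFIG):
    """Return (name, settings) for the section named in ?section=..."""
    query = parse_qs(urlparse(url).query)
    name = query.get("section", ["default"])[0]  # assumed fallback
    if name not in config:
        raise KeyError(f"unknown config section: {name}")
    return name, config[name]


name, settings = select_section("http://localhost:9272/metrics?section=limited")
print(name, settings["collect_only"])
```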

Environment Variables

Variable	Precedence	Defaults	Description
`VSPHERE_HOST`	config, env, get_param	n/a	vsphere server to connect to
`VSPHERE_USER`	config, env	n/a	User for connecting to vsphere
`VSPHERE_PASSWORD`	config, env	n/a	Password for connecting to vsphere
`VSPHERE_IGNORE_SSL`	config, env	False	Ignore the ssl cert on the connection to vsphere host
`VSPHERE_COLLECT_HOSTS`	config, env	True	Set to false to disable collect of hosts
`VSPHERE_COLLECT_DATASTORES`	config, env	True	Set to false to disable collect of datastores
`VSPHERE_COLLECT_VMS`	config, env	True	Set to false to disable collect of virtual machines
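
A rough sketch of how the three VSPHERE_COLLECT_* variables could be turned into a collect_only mapping follows. The helper names and the set of strings treated as true are assumptions; the exporter's own parsing may differ.

```python
import os


def env_bool(name, default, environ=os.environ):
    """Read a boolean environment variable, falling back to a default."""
    raw = environ.get(name)
    if raw is None:
        return default
    # Assumed truthy spellings; anything else counts as False.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def collect_only_from_env(environ=os.environ):
    """Build a collect_only mapping using the defaults from the table."""
    return {
        "hosts": env_bool("VSPHERE_COLLECT_HOSTS", True, environ),
        "datastores": env_bool("VSPHERE_COLLECT_DATASTORES", True, environ),
        "vms": env_bool("VSPHERE_COLLECT_VMS", True, environ),
    }


print(collect_only_from_env({"VSPHERE_COLLECT_VMS": "false"}))
```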

Prometheus configuration

You can use the following parameters in the Prometheus configuration file. The params section is used to manage multiple logins/passwords.

  - job_name: 'vmware_vcenter'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - 'vcenter.company.com'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9272

  - job_name: 'vmware_esx'
    metrics_path: '/metrics'
    file_sd_configs:
      - files:
        - /etc/prometheus/esx.yml
    params:
      section: [esx]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9272
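
To make the relabeling rules easier to follow, here is a small simulation of what they do to one target. It mirrors standard Prometheus behaviour, where a __param_<name> label becomes a URL query parameter; the function name apply_relabeling is hypothetical. Note that these rules produce a target parameter, whereas the usage section queries with vsphere_host; which of the two the exporter accepts is not stated here.

```python
from urllib.parse import urlencode


def apply_relabeling(target, exporter="localhost:9272", extra_params=None):
    """Mimic the three relabel rules and return (instance, scrape_url)."""
    labels = {"__address__": target}
    # Rule 1: copy the original address into the "target" URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the original address as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: scrape the exporter instead of the vCenter/ESXi host.
    labels["__address__"] = exporter

    params = dict(extra_params or {})  # e.g. {"section": "esx"} from params:
    params["target"] = labels["__param_target"]
    url = f"http://{labels['__address__']}/metrics?{urlencode(params)}"
    return labels["instance"], url


print(apply_relabeling("vcenter.company.com"))
```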

Current Status

vCenter and ESXi 6 and 6.5 have been tested.
VM, snapshot, host and datastore basic information is exported, e.g.:

# HELP vmware_snapshots VMWare current number of existing snapshots
# TYPE vmware_snapshots gauge
vmware_snapshots{vm_name="My Super Virtual Machine"} 2.0
# HELP vmware_snapshot_timestamp_seconds VMWare Snapshot creation time in seconds
# TYPE vmware_snapshot_timestamp_seconds gauge
vmware_snapshot_timestamp_seconds{vm_name="My Super Virtual Machine",vm_snapshot_name="Very old snapshot"} 1478146956.96092
vmware_snapshot_timestamp_seconds{vm_name="My Super Virtual Machine",vm_snapshot_name="Old snapshot"} 1478470046.975632

# HELP vmware_datastore_capacity_size VMWare Datasore capacity in bytes
# TYPE vmware_datastore_capacity_size gauge
vmware_datastore_capacity_size{ds_name="ESX1-LOCAL"} 67377299456.0
# HELP vmware_datastore_freespace_size VMWare Datastore freespace in bytes
# TYPE vmware_datastore_freespace_size gauge
vmware_datastore_freespace_size{ds_name="ESX1-LOCAL"} 66349694976.0
# HELP vmware_datastore_uncommited_size VMWare Datastore uncommitted in bytes
# TYPE vmware_datastore_uncommited_size gauge
vmware_datastore_uncommited_size{ds_name="ESX1-LOCAL"} 0.0
# HELP vmware_datastore_provisoned_size VMWare Datastore provisoned in bytes
# TYPE vmware_datastore_provisoned_size gauge
vmware_datastore_provisoned_size{ds_name="ESX1-LOCAL"} 1027604480.0
# HELP vmware_datastore_hosts VMWare Hosts number using this datastore
# TYPE vmware_datastore_hosts gauge
vmware_datastore_hosts{ds_name="ESX1-LOCAL"} 1.0
# HELP vmware_datastore_vms VMWare Virtual Machines number using this datastore
# TYPE vmware_datastore_vms gauge
vmware_datastore_vms{ds_name="ESX1-LOCAL"} 0.0

# HELP vmware_host_power_state VMWare Host Power state (On / Off)
# TYPE vmware_host_power_state gauge
vmware_host_power_state{host_name="esx1.company.com"} 1.0
# HELP vmware_host_cpu_usage VMWare Host CPU usage in Mhz
# TYPE vmware_host_cpu_usage gauge
vmware_host_cpu_usage{host_name="esx1.company.com"} 2959.0
# HELP vmware_host_cpu_max VMWare Host CPU max availability in Mhz
# TYPE vmware_host_cpu_max gauge
vmware_host_cpu_max{host_name="esx1.company.com"} 28728.0
# HELP vmware_host_memory_usage VMWare Host Memory usage in Mbytes
# TYPE vmware_host_memory_usage gauge
vmware_host_memory_usage{host_name="esx1.company.com"} 107164.0
# HELP vmware_host_memory_max VMWare Host Memory Max availability in Mbytes
# TYPE vmware_host_memory_max gauge
vmware_host_memory_max{host_name="esx1.company.com"} 131059.01953125
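
As an illustration of how the exported values can be consumed, the sketch below parses two of the datastore samples above with a deliberately simple regular expression and derives the used percentage. The parser is a toy written for this example (it ignores escaping and timestamps) and is not part of the exporter.

```python
import re

# name{labels} value  -- enough for the samples shown above.
SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')


def parse_samples(text):
    """Map (metric_name, raw_label_string) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels)] = float(value)
    return samples


text = '''
# TYPE vmware_datastore_capacity_size gauge
vmware_datastore_capacity_size{ds_name="ESX1-LOCAL"} 67377299456.0
vmware_datastore_freespace_size{ds_name="ESX1-LOCAL"} 66349694976.0
'''
samples = parse_samples(text)
key = 'ds_name="ESX1-LOCAL"'
capacity = samples[("vmware_datastore_capacity_size", key)]
free = samples[("vmware_datastore_freespace_size", key)]
used_percent = 100.0 * (capacity - free) / capacity
print(f"{used_percent:.2f}% used")
```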

References

The VMWare exporter uses these libraries:

pyVmomi for VMWare connection
Prometheus client_python for Prometheus supervision
Twisted for http server

The initial code is mainly inspired from:

Maintainer

Daniel Pryor pryorda

License

See LICENSE file

vmware_exporter's Issues

Unclear values for vmware_vm_cpu_usage_average metric

I can't quite figure out the values of vmware_vm_cpu_usage_average metric, for example:

vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz1"} | 202
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz2"} | 225
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz3"} | 4015
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz4"} | 207
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz5"} | 209

According to https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html, the description of this counter is "Amount of actively used virtual CPU, as a percentage of total available CPU", but the values I'm seeing do not look like percentages.

Any clues?

Grafana dashboard

Add grafana dashboard using this exporter.

Add "cluster" label for hosts, vms and datastores

It would be great to add a "cluster" label to indicate which HA cluster the host/VM belongs to.

Same, for datastores that belong to a Datastore cluster.

We manage multiple clusters with the same vCenter, and would like to filter metrics per "PROD" and "STAGING" clusters, for example.

Getting Alarm Info

It would be great to get alarm info for VMs / hosts.
For VMs there is an overallStatus option, which works, but it would be great to record in a label which alarm was triggered.
This way we can easily check whether we need to deal with it, and how many alarms there are for the VM.

Transfer ownership

@rverchere since I'm maintaining this now, should we transfer ownership or something so that issues get opened on my fork?

Performance impact of refreshing each datastore info at every scrape

I've been running into an issue where one of my esxi hosts appears to be unreachable by the exporter.
I think what's actually happening is that the scrape is timing out.

The jump in scrape duration coincides with about the time that I deployed this code change #16
and it looks like I might be bumping up against my default 10 second timeout.
I've increased my timeout but I'm not sure if this is a great solution for everyone. Maybe the datastore refresh should be a configurable option.

exporter is refreshing vcenter datastore states every minute

Hello,

The exporter makes vCenter (5.5 / 6.0.2) refresh datastore information as a vCenter task on each query cycle, which floods the task viewer.

I would suggest removing ds.RefreshDatastoreStorageInfo() from vmware_exporter.py

Need /healthz endpoint(s)

Hi.

I'm looking to deploy this via container to Kubernetes, where health checking of HTTP endpoints is a matter of container life or death.

If Kubernetes can't get a 200 OK without polling vCenter API, using health checks with vmware_exporter could DoS vCenter API.

Could we get a /healthz endpoint (or two)?

200 OK if web server is up. /healthz
200 OK if vCenter API is up (without polling metrics). /healthz/api?vcenter.example.com
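
A minimal sketch of the first endpoint is shown below. It uses Python's standard http.server rather than the Twisted server the exporter is built on, purely to keep the example self-contained; the class and function names are hypothetical, and the vCenter API check is left out.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer /healthz with 200 without touching the vCenter API."""

    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling make_server(9272).serve_forever() would serve the endpoint on the exporter's usual port; a Kubernetes liveness probe could then target /healthz without triggering a metrics scrape.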

Edit: This is related:
https://stackoverflow.com/questions/43380939/where-does-the-convention-of-using-healthz-for-application-health-checks-come-f

Open to your thoughts.

Thanks for this awesome project!

-Joshua

Python 3 support

add exporter runtime metrics

It would be nice to add some exporter runtime metrics.
For example at the moment it would be really useful if I knew how long the exporter is taking to scrape vmware for metrics.
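
One way to picture the request is a wrapper that times a collection run and appends the duration as an extra sample. This is only a sketch: timed_collect and the metric name vmware_exporter_scrape_duration_seconds are made up for illustration, and metrics are represented as plain (name, value) tuples rather than prometheus_client objects.

```python
import time


def timed_collect(collect):
    """Run collect() and append how long it took as an extra sample."""
    start = time.monotonic()
    metrics = list(collect())
    duration = time.monotonic() - start
    # Hypothetical metric name, not one the exporter is known to expose.
    metrics.append(("vmware_exporter_scrape_duration_seconds", duration))
    return metrics


def fake_collect():
    """Stand-in for the real vCenter collection."""
    yield ("vmware_host_power_state", 1.0)


print(timed_collect(fake_collect))
```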

Limit metrics to be collected

Hey! First off, thanks for this work 👍

We are now testing this exporter in our environment. One thing we noticed is that it takes ~1m32.703s to scrape the vCenter metrics, probably because our environment is quite big.

I wonder if it's possible to limit which metrics are collected? For example, if we only want datastore metrics?

Too many open files when running for a long time

The application opens a connection at every vcenter request, and uses atexit() to handle disconnect.

As the application never stops (in theory), there are more and more open connections, leading to a "too many open files" error.

2 ways to fix it:

open/close connection at every requests
open connection when application starts, and close when it stops.
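
The first option could look roughly like the context manager below, which guarantees a disconnect after every request even when collection fails. The connect and disconnect callables are injected so the sketch runs without a vCenter; with pyVmomi they would presumably wrap SmartConnect and Disconnect from pyVim.connect. The name vsphere_session is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def vsphere_session(connect, disconnect):
    """Open a connection for one request and always close it afterwards."""
    session = connect()
    try:
        yield session
    finally:
        disconnect(session)


# Demo with stand-in callables instead of a real vCenter connection.
events = []
with vsphere_session(lambda: "conn", lambda s: events.append(f"closed {s}")) as s:
    events.append(f"using {s}")
print(events)
```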

support for multiple vcenter/esxi endpoints

It would be great if this exporter could support polling multiple vCenter/ESXi endpoints in the same way that the official snmp exporter works, by passing the host device as an HTTP GET request parameter.

I'm still investigating how to best handle this.

Here's an older snmp exporter that was still python based
https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py

Here's another one https://github.com/dhtech/snmpexporter/blob/master/snmpexporterd.py

Please let me know if you have any suggestions.

Send 500 status code when error connecting to host

When the exporter is not able to connect to a host, it logs an "Error, cannot connect to vmware" message but returns a blank HTTP metrics page. This causes Prometheus to think the target is still up.

It would be better if the exporter returned a 500 status code.

Here's an example https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py#L52
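
The requested behaviour can be sketched independently of the HTTP server: wrap the collection step and map any failure to a 500 response. The function render_metrics and the (status, body) return shape are assumptions made for this example.

```python
def render_metrics(collect):
    """Return (status_code, body); a failed collection becomes a 500."""
    try:
        body = collect()
    except Exception as exc:
        # A non-200 status lets Prometheus mark the target as down.
        return 500, f"Error, cannot connect to vmware: {exc}\n"
    return 200, body


def broken_collect():
    """Stand-in for a collection run against an unreachable host."""
    raise ConnectionError("host unreachable")


print(render_metrics(lambda: "vmware_host_power_state 1.0\n"))
print(render_metrics(broken_collect))
```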

connection to vcenter lost

Hi,

It seems that the vmware_exporter doesn't reconnect when the vCenter server is rebooted or switched over by vCenter HA.

regards
Robert

vmware_vm_cpu_ready_summation is not very useful without the number of vCPUs on the VM

This value is in milliseconds over the observed interval. On its own it doesn't provide much insight, and it is usually translated into a percentage value using the following formula:

(<value> / (<interval> * 1000)) * 100 = % CPU ready

for example, if the value is 1000 and the interval is 20 (Real-time), the result is:

(1000 / (20s * 1000)) * 100 = 5% CPU ready

The result of the above calculation is the sum of each virtual CPU's %RDY time. However, it is more accurate to calculate the % CPU ready per vCPU, mainly because 5% on a 1 vCPU VM is a problem, whereas 5% on an 8 vCPU VM is OK (0.625% per vCPU).

In order to get the % CPU Ready per vCPU, we need to know how many vCPU each VM has, currently the exporter doesn't collect this information.
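
The formula and the per-vCPU adjustment described above can be written as a small helper. The function name cpu_ready_percent is made up for this sketch; the arithmetic follows the formula in this issue.

```python
def cpu_ready_percent(summation_ms, interval_s, vcpus=1):
    """Convert a cpu ready summation (ms) into % CPU ready per vCPU."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    total_percent = (summation_ms / (interval_s * 1000)) * 100
    return total_percent / vcpus


# Real-time interval of 20 s, summation of 1000 ms.
print(cpu_ready_percent(1000, 20))           # whole VM, about 5%
print(cpu_ready_percent(1000, 20, vcpus=8))  # per vCPU on an 8 vCPU VM
```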

add host network metrics

I need some host network metrics so that I can keep an eye on host level network bottlenecks.
I probably would like to see throughput for each nic and not just an aggregate host throughput which seems useless.
I was trying to figure out what the net.throughput.contention.summation metric actually reports as this might be an easier alternative.

Missing VMs metrics

Add more VMs metrics

exceptions.TypeError: a float is required

Hello, I get an error when I request statistics, but only when I request VM statistics:

Error, cannot get vm metrics vmware_vm_disk_usage_average for ..... (lot of vmname)

[2018-07-03 17:21:02.748099+00:00] [Failure instance: Traceback: <type 'exceptions.TypeError'>: a float is required
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:1203:mainLoop
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:825:runUntilCurrent
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:393:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:501:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:588:_runCallbacks
build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py:129:generate_latest_metrics
build/bdist.linux-x86_64/egg/prometheus_client/core.py:775:_floatToGoString
]
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1203, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 825, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 393, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py", line 129, in generate_latest_metrics

  File "build/bdist.linux-x86_64/egg/prometheus_client/core.py", line 775, in _floatToGoString

exceptions.TypeError: a float is required

When I request datastore and host statistics, there is no problem.

running

Python 2.7.12
pyvmomi 6.5.0.2017.5.post1
prometheus-client 0.0.19
on Ubuntu 16.04

Connected to VMware vSphere 5.5.
I tested with Docker and directly, and the error is the same.
Do you have any idea?

Thank you!

Add package files

Add setup.py, requirements.txt and co to make it a clean python package.

Add arguments for CLI

i.e:

./vmware_exporter.py -c <config_file.yml>
./vmware_exporter.py -p <port>
./vmware_exporter.py -h # help

The response time is too long

When I get the metrics using the URL, the response usually takes more than ten seconds. How can I resolve this problem?

Rename config values

vcenter_ip and co are not correct variables as the program can also manage esxi hosts.

Rename to:

vcenter_ip --> vmware_target
vcenter_user --> vmware_user
vcenter_password --> vmware_password

metrics error when ESXi host is down

When one of the managed ESXi hosts is down, the related metrics raise an error because vCenter returns empty values.

Metrics gathering blocks if connection is timing out

Hi,

I have multiple ESXi hosts that I am scraping. Some of them are switched off most of the time.

It seems that while the vmware_exporter is trying to connect to a system that is currently switched off, it cannot process a second connection.

This can easily be reproduced using a web browser by connecting to a target that is currently "off" and opening a second tab that tries to connect to an existing machine. Both fail.

Can the exporter only process a single request at a time, or is the problem maybe in the Python VMware library?

Andreas

[Feature Request] Add Host Hardware Information

Dear All,

A suggestion:

Is it possible to include VMware host information such as:

ESXi Version, Build
Server Manufacturer
Server Model
Server Service Tag
CPU Cores
Processor Type
VMware Host Physical Network Adapter Transmit Traffic
VMware Host Physical Network Adapter Receive Traffic
VMware Host Physical Network Adapter Errors
VMware Host Physical Network Adapter Dropped Packets

Not working on plain ESXi

Hi,

When running against plain ESXi, it fails. Not being a Python/VMware expert, I think this might be due to pyvmomi not working with ESXi or to some (missing?) out-of-bounds checks. I suspect ESXi exposes other (or, if I recall correctly, limited) performance measurements.

Andreas

/vmware_exporter.py
[2017-06-17 10:40:22.423793+00:00] Start collecting vcenter metrics
Traceback (most recent call last):
  File "./vmware_exporter.py", line 352, in <module>
    REGISTRY.register(VMWareVCenterCollector())
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 50, in register
    names = self._get_names(collector)
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 86, in _get_names
    for metric in desc_func():
  File "./vmware_exporter.py", line 134, in collect
    self._vmware_get_vms(content, metrics, counter_info)
  File "./vmware_exporter.py", line 315, in _vmware_get_vms
    float(sum(result[0].value[0].value)))
IndexError: list index out of range
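
One plausible reading of the traceback is that the query returned an empty result list for some VM, so result[0] fails. A defensive version of the expression on line 315 might look like the sketch below. The helper name sum_first_series is hypothetical, and SimpleNamespace objects stand in for the pyVmomi result types.

```python
from types import SimpleNamespace


def sum_first_series(result):
    """Sum the first series of the first entity, or None if data is missing."""
    if not result:
        return None  # no entity returned at all
    series = result[0].value
    if not series or not series[0].value:
        return None  # entity present but no samples
    return float(sum(series[0].value))


# Stand-ins for the objects returned by a performance query.
full = [SimpleNamespace(value=[SimpleNamespace(value=[10, 20, 30])])]
print(sum_first_series(full))
print(sum_first_series([]))
```

A caller could then skip the metric for that VM when None comes back instead of aborting the whole collection.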

Prometheus error

Hi, when I completed the configuration and restarted the Prometheus service, I got the error below:
time="2017-08-09T17:34:37+08:00" level=info msg="Loading configuration file prometheus.yml" source="main.go:252"
time="2017-08-09T17:34:37+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
time="2017-08-09T17:34:38+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"

My config.yml configuration is:

default:
    vmware_user: '[email protected]'
    vmware_password: 'Er4545'
    ignore_ssl: True

esx:
    vmware_user: 'root'
    vmware_password: 'Er4545'
    ignore_ssl: True

Do you know why?