rverchere / vmware_exporter
VMWare vCenter Exporter for Prometheus
License: BSD 3-Clause "New" or "Revised" License
Add more VM metrics
This value is in milliseconds over the observed interval; on its own it doesn't provide much insight, and it is usually translated into a percentage value using the following formula:
(<value> / (<interval> * 1000)) * 100 = % CPU ready
for example, if the value is 1000 and the interval is 20 (Real-time), the result is:
(1000 / (20s * 1000)) * 100 = 5% CPU ready
The result of the above calculation is the sum of each virtual CPU's %RDY time. However, it is more accurate to calculate the % CPU Ready per vCPU, because 5% on a 1-vCPU VM is a problem, while 5% on an 8-vCPU VM is OK (0.625% per vCPU).
In order to get the % CPU Ready per vCPU, we need to know how many vCPU each VM has, currently the exporter doesn't collect this information.
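The per-vCPU calculation is a one-liner once the vCPU count is known; in pyVmomi it is available as config.hardware.numCPU on the VM object. A sketch of the arithmetic (plain Python, function name is mine):

```python
def cpu_ready_percent(ready_ms, interval_s=20, num_vcpus=1):
    """Convert a cpu.ready.summation sample (milliseconds over the
    sampling interval) into a percentage, optionally per vCPU.
    interval_s=20 matches the real-time interval."""
    total_percent = (ready_ms / (interval_s * 1000.0)) * 100.0
    return total_percent / num_vcpus

# 1000 ms over a 20 s real-time interval on a 1-vCPU VM:
print(cpu_ready_percent(1000))                 # 5.0
# the same summed value spread over an 8-vCPU VM:
print(cpu_ready_percent(1000, num_vcpus=8))    # 0.625
```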
When the exporter is not able to connect to a host, it logs an "Error, cannot connect to vmware"
message but returns a blank HTTP metrics page. This causes Prometheus to think the target is still up.
It would be better if the exporter returned a 500 status code.
Here's an example https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py#L52
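A minimal sketch of that idea using the standard library's http.server rather than the exporter's actual server code; collect_metrics is a placeholder for the real collection, which may raise on connection failure:

```python
from http.server import BaseHTTPRequestHandler

def collect_metrics():
    # Placeholder: stands in for the real vCenter collection,
    # which raises when the connection fails.
    raise ConnectionError("cannot connect to vmware")

def render_response(collect):
    """Return (status_code, body) for a scrape. A failed collection
    becomes a 500 instead of an empty 200 page, so Prometheus marks
    the target as down."""
    try:
        return 200, collect()
    except Exception as exc:
        return 500, ("Error, cannot connect to vmware: %s" % exc).encode()

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = render_response(collect_metrics)
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(body)
```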
vcenter_ip and co are not correct variable names, as the program can also manage ESXi hosts.
Rename to:
vcenter_ip
--> vmware_target
vcenter_user
--> vmware_user
vcenter_password
--> vmware_password
Hi, when I completed the configuration and restarted the Prometheus service, I got the error below:
time="2017-08-09T17:34:37+08:00" level=info msg="Loading configuration file prometheus.yml" source="main.go:252"
time="2017-08-09T17:34:37+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
time="2017-08-09T17:34:38+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
My config.yml configuration is:
default:
  vmware_user: '[email protected]'
  vmware_password: 'Er4545'
  ignore_ssl: True
esx:
  vmware_user: 'root'
  vmware_password: 'Er4545'
  ignore_ssl: True
Do you know why?
Thank you!
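Note: the "cannot unmarshal !!map into []*config.TargetGroup" error comes from Prometheus itself (file.go is Prometheus's file-based service discovery), which expects a list of target groups, whereas the exporter's config.yml is a map of sections. The exporter config should be passed to the exporter, not referenced from file_sd_configs in prometheus.yml; a file_sd file needs the target-group list shape, roughly like this (host, port, and label are illustrative):

```yaml
# file_sd target list, i.e. what []*config.TargetGroup unmarshals from
- targets:
    - 'vcenter.example.com:9272'
  labels:
    env: 'prod'
```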
Add setup.py, requirements.txt and co to make it a clean Python package.
Add grafana dashboard using this exporter.
It would be great to add a "cluster" label to indicate which HA cluster the host/VM belongs to.
Same for datastores that belong to a datastore cluster.
We manage multiple clusters with the same vCenter, and would like to filter metrics per "PROD" and "STAGING" clusters, for example.
When one of the managed ESXi hosts is down, the related metrics give an error, as vCenter returns empty values.
Hi,
when running against plain ESXi it fails. Not being a Python/VMware expert, I think this might be due to pyvmomi not working with ESXi or some missing out-of-bounds checks; I suspect ESXi exposes different (or limited, if I recall) performance measurements.
Andreas
/vmware_exporter.py
[2017-06-17 10:40:22.423793+00:00] Start collecting vcenter metrics
Traceback (most recent call last):
File "./vmware_exporter.py", line 352, in <module>
REGISTRY.register(VMWareVCenterCollector())
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 50, in register
names = self._get_names(collector)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 86, in _get_names
for metric in desc_func():
File "./vmware_exporter.py", line 134, in collect
self._vmware_get_vms(content, metrics, counter_info)
File "./vmware_exporter.py", line 315, in _vmware_get_vms
float(sum(result[0].value[0].value)))
IndexError: list index out of range
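The traceback points at indexing result[0].value[0].value without checking whether the perf query returned any samples. A defensive sketch (the nested result shape follows the traceback and is otherwise an assumption about pyVmomi's QueryPerf output):

```python
def sum_first_series(results):
    """Sum the first value series of a QueryPerf result, or return None
    when the entity (e.g. a plain ESXi host exposing fewer counters)
    produced no samples, instead of raising IndexError."""
    if not results or not results[0].value or not results[0].value[0].value:
        return None
    return float(sum(results[0].value[0].value))
```

The caller can then skip emitting the metric when the helper returns None.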
I've been running into an issue where one of my esxi hosts appears to be unreachable by the exporter.
I think what's actually happening is that the scrape is timing out.
The jump in scrape duration coincides roughly with the time I deployed this code change #16,
and it looks like I might be bumping up against my default 10-second timeout.
I've increased my timeout but I'm not sure if this is a great solution for everyone. Maybe the datastore refresh should be a configurable option.
Hi.
I'm looking to deploy this via container to Kubernetes, where health checking of HTTP endpoints is a matter of container life or death.
If Kubernetes can't get a 200 OK without polling vCenter API, using health checks with vmware_exporter could DoS vCenter API.
Could we get a /healthz
endpoint (or two)?
/healthz
/healthz/api?vcenter.example.com
Edit: This is related:
https://stackoverflow.com/questions/43380939/where-does-the-convention-of-using-healthz-for-application-health-checks-come-f
Open to your thoughts.
Thanks for this awesome project!
-Joshua
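The liveness part of the request above can be sketched as a tiny routing function (path names follow the request; they are not routes the exporter has today). The point is that /healthz answers from the process itself and never touches the vCenter API, so Kubernetes probes cannot DoS vCenter:

```python
def route(path):
    """Answer liveness paths locally; return None for anything else so
    the request falls through to the normal metrics handler."""
    if path == '/healthz':
        # Process is alive; deliberately no vCenter API call here.
        return 200, b'ok'
    return None
```

A separate readiness endpoint that does check the vCenter API would need its own rate limiting, for exactly the DoS reason raised above.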
It would be great if this exporter could support polling multiple vCenter/ESXi endpoints in the same way the official snmp_exporter works, by passing the target host as an HTTP GET parameter.
I'm still investigating how to best handle this.
Here's an older snmp exporter that was still python based
https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py
Here's another one https://github.com/dhtech/snmpexporter/blob/master/snmpexporterd.py
Please let me know if you have any suggestions.
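The snmp_exporter pattern boils down to reading the target from the query string of each scrape request; a small sketch (the parameter name target is borrowed from snmp_exporter, not something this exporter implements yet):

```python
from urllib.parse import urlparse, parse_qs

def parse_target(request_uri):
    """Extract the host to scrape from a /metrics?target=... URL.
    Returns None when no target parameter is present, so the caller
    can fall back to a configured default or return an error."""
    query = parse_qs(urlparse(request_uri).query)
    values = query.get('target')
    return values[0] if values else None
```

Prometheus would then relabel the target parameter per scrape job, as it does for snmp_exporter.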
I can't quite figure out the values of vmware_vm_cpu_usage_average
metric, for example:
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz1"} | 202
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz2"} | 225
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz3"} | 4015
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz4"} | 207
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz5"} | 209
according to this https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html
The description of this counter is "Amount of actively used virtual CPU, as a percentage of total available CPU", but the values I'm seeing do not look like percentages.
Any clues?
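If the counter unit semantics are what I remember from the vSphere SDK docs (percent-type counters delivered in hundredths of a percent), then 202 would be 2.02% and 4015 would be 40.15%; treat the claim as something to verify against the SDK reference. The conversion is trivial:

```python
def percent_counter_to_percent(raw):
    """VMware 'percent'-unit counters are (reportedly) delivered in
    hundredths of a percent; divide by 100 for a human-readable value."""
    return raw / 100.0

print(percent_counter_to_percent(202))   # 2.02
print(percent_counter_to_percent(4015))  # 40.15
```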
i.e.:
./vmware_exporter.py -c <config_file.yml>
./vmware_exporter.py -p <port>
./vmware_exporter.py -h # help
Hello,
The exporter makes vCenter (5.5/6.0.2) refresh datastore information as a vCenter task on every query cycle, flooding the task viewer.
I would suggest removing ds.RefreshDatastoreStorageInfo() from vmware_exporter.py.
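Rather than removing the call outright, it could be gated behind a config flag, defaulting to off; the flag name below is hypothetical, the exporter does not have it today:

```python
def collect_datastore(ds, refresh_storage_info=False):
    """Collect datastore stats from a pyVmomi Datastore object.
    refresh_storage_info is a hypothetical opt-in flag: only trigger
    the vCenter refresh task when explicitly enabled, so the task
    viewer is not flooded on every scrape."""
    if refresh_storage_info:
        ds.RefreshDatastoreStorageInfo()  # creates a vCenter task
    summary = ds.summary
    return {
        'capacity': summary.capacity,
        'free_space': summary.freeSpace,
    }
```

The trade-off is that capacity figures may be slightly stale between vCenter's own periodic refreshes.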
It would be nice to add some exporter runtime metrics.
For example at the moment it would be really useful if I knew how long the exporter is taking to scrape vmware for metrics.
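One lightweight way to expose this, sketched with a stdlib-only timer that appends a self-metric to the lines being rendered; the metric name is illustrative, not one the exporter currently emits:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(metrics):
    """Wrap a collection pass and append an exporter self-metric with
    the wall-clock time the collection took (name is illustrative)."""
    start = time.monotonic()
    try:
        yield
    finally:
        metrics.append(
            'vmware_exporter_collect_duration_seconds %f'
            % (time.monotonic() - start))
```

Usage: `with timed(lines): collect_everything(lines)` before serving the page.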
Hi,
it seems that the vmware_exporter doesn't reconnect when the vCenter server is rebooted or switched over by vCenter HA.
regards
Robert
@rverchere since I'm maintaining this now, should we transfer ownership or something, so that issues get opened on my fork?
I need some host network metrics so that I can keep an eye on host level network bottlenecks.
I would probably like to see throughput for each NIC and not just an aggregate host throughput, which seems less useful.
I was trying to figure out what the net.throughput.contention.summation
metric actually reports as this might be an easier alternative.
Hi,
I have multiple ESXi hosts I am scraping. Some of them are switched off most of the time.
It seems that while the vmware_exporter is in the process of trying to connect to a currently switched-off system, it cannot process a second connection.
This can easily be reproduced in a web browser by connecting to a currently "off" target and opening a second tab to connect to an existing machine. Both fail.
Can the exporter only process a single request at a time, or is the problem maybe in the Python VMware library?
Andreas
When I get the metrics via the URL, the response usually takes more than ten seconds. How can this be resolved?
Dear All,
A suggestion: is it possible to include VMware host information like:
The application opens a connection on every vCenter request and uses atexit()
to handle the disconnect.
As the application never stops (in theory), more and more connections stay open, eventually leading to a "too many open files"
error.
2 ways to fix it:
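Whether it matches either of the intended fixes is a guess, but one common approach is to disconnect explicitly after each scrape instead of relying on atexit, e.g. a context manager around pyVmomi's SmartConnect/Disconnect pair (passed in as callables here so the sketch stays self-contained):

```python
from contextlib import contextmanager

@contextmanager
def vmware_connection(connect, disconnect, **kwargs):
    """Open a connection per scrape and always close it, rather than
    deferring cleanup to atexit, which never fires while the exporter
    keeps running. connect/disconnect stand in for pyVmomi's
    SmartConnect and Disconnect."""
    si = connect(**kwargs)
    try:
        yield si
    finally:
        disconnect(si)
```

The other obvious direction would be keeping one long-lived connection and reusing it, at the cost of needing reconnect handling.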
It would be great to get alarm info for VMs/hosts.
For VMs there is an overallStatus property, which works, but it would be great to write which alarm was triggered to a label.
This way we can easily check whether we need to deal with it, and how many alarms there are for the VM.
Hey! First off, thanks for this work!
We are now testing this exporter in our environment; one thing we noticed is that it takes ~1m32.703s to scrape the vCenter metrics, probably because our environment is quite big.
I wonder if it's possible to limit which metrics are collected, for example if we only want datastore metrics?
Hello, I get an error only when I collect VM statistics:
Error, cannot get vm metrics vmware_vm_disk_usage_average for ..... (lot of vmname)
[2018-07-03 17:21:02.748099+00:00] [Failure instance: Traceback: <type 'exceptions.TypeError'>: a float is required
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:1203:mainLoop
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:825:runUntilCurrent
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:393:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:501:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:588:_runCallbacks
build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py:129:generate_latest_metrics
build/bdist.linux-x86_64/egg/prometheus_client/core.py:775:_floatToGoString
]
Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1203, in mainLoop
self.runUntilCurrent()
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 825, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 393, in callback
self._startRunCallbacks(result)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py", line 129, in generate_latest_metrics
File "build/bdist.linux-x86_64/egg/prometheus_client/core.py", line 775, in _floatToGoString
exceptions.TypeError: a float is required
When I collect datastore and host statistics, there is no problem.
It is running connected to VMware vSphere 5.5.
I tested with Docker and directly, and the error is the same.
Do you have any idea?
Thank you!
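The traceback ends in prometheus_client's _floatToGoString, which suggests a metric sample that is None (e.g. a missing vmware_vm_disk_usage_average value) rather than a number. A defensive coercion before handing values to the client library, as a sketch:

```python
def safe_float(value, default=float('nan')):
    """Coerce a counter sample to float before passing it to
    prometheus_client; missing samples (None, empty string) become NaN
    instead of raising 'a float is required'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default
```

Alternatively the exporter could skip emitting the sample entirely when coercion fails.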