rverchere / vmware_exporter
VMWare vCenter Exporter for Prometheus
License: BSD 3-Clause "New" or "Revised" License
Add more VM metrics
This value is in milliseconds over the observed interval; on its own it doesn't provide much insight, and it is usually translated into a percentage value using the following formula:
(<value> / (<interval> * 1000)) * 100 = % CPU ready
for example, if the value is 1000 and the interval is 20 (Real-time), the result is:
(1000 / (20s * 1000)) * 100 = 5% CPU ready
The result of the above calculation is the sum of each virtual CPU's %RDY time. However, it is more accurate to calculate the % CPU Ready per vCPU, because 5% on a 1-vCPU VM is a problem, while 5% on an 8-vCPU VM is OK (0.625% per vCPU).
In order to get the % CPU Ready per vCPU, we need to know how many vCPU each VM has, currently the exporter doesn't collect this information.
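The per-vCPU calculation is a one-liner once the vCPU count is known; in pyVmomi it is available as config.hardware.numCPU on the VM object. A sketch of the arithmetic (plain Python, function name is mine):

```python
def cpu_ready_percent(ready_ms, interval_s=20, num_vcpus=1):
    """Convert a cpu.ready.summation sample (milliseconds over the
    sampling interval) into a percentage, optionally per vCPU.
    interval_s=20 matches the real-time interval."""
    total_percent = (ready_ms / (interval_s * 1000.0)) * 100.0
    return total_percent / num_vcpus

# 1000 ms over a 20 s real-time interval on a 1-vCPU VM:
print(cpu_ready_percent(1000))                 # 5.0
# the same summed value spread over an 8-vCPU VM:
print(cpu_ready_percent(1000, num_vcpus=8))    # 0.625
```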
When the exporter is not able to connect to a host, it logs an "Error, cannot connect to vmware"
message but returns a blank HTTP metrics page. This causes Prometheus to think the target is still up.
It would be better if the exporter returned a 500 status code.
Here's an example https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py#L52
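A minimal sketch of that idea using the standard library's http.server rather than the exporter's actual server code; collect_metrics is a placeholder for the real collection, which may raise on connection failure:

```python
from http.server import BaseHTTPRequestHandler

def collect_metrics():
    # Placeholder: stands in for the real vCenter collection,
    # which raises when the connection fails.
    raise ConnectionError("cannot connect to vmware")

def render_response(collect):
    """Return (status_code, body) for a scrape. A failed collection
    becomes a 500 instead of an empty 200 page, so Prometheus marks
    the target as down."""
    try:
        return 200, collect()
    except Exception as exc:
        return 500, ("Error, cannot connect to vmware: %s" % exc).encode()

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = render_response(collect_metrics)
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(body)
```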
vcenter_ip and co are not correct variable names, as the program can also manage ESXi hosts.
Rename to:
vcenter_ip
--> vmware_target
vcenter_user
--> vmware_user
vcenter_password
--> vmware_password
Hi, when I completed the configuration and restarted the Prometheus service, I got the error below:
time="2017-08-09T17:34:37+08:00" level=info msg="Loading configuration file prometheus.yml" source="main.go:252"
time="2017-08-09T17:34:37+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
time="2017-08-09T17:34:38+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
My config.yml configuration is:
default:
  vmware_user: '[email protected]'
  vmware_password: 'Er4545'
  ignore_ssl: True
esx:
  vmware_user: 'root'
  vmware_password: 'Er4545'
  ignore_ssl: True
Do you know why?
Thank you!
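Note: the "cannot unmarshal !!map into []*config.TargetGroup" error comes from Prometheus itself (file.go is Prometheus's file-based service discovery), which expects a list of target groups, whereas the exporter's config.yml is a map of sections. The exporter config should be passed to the exporter, not referenced from file_sd_configs in prometheus.yml; a file_sd file needs the target-group list shape, roughly like this (host, port, and label are illustrative):

```yaml
# file_sd target list, i.e. what []*config.TargetGroup unmarshals from
- targets:
    - 'vcenter.example.com:9272'
  labels:
    env: 'prod'
```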
Add setup.py, requirements.txt and co to make it a clean Python package.
Add grafana dashboard using this exporter.
It would be great to add a "cluster" label to indicate which HA cluster the host/VM belongs to.
Same for datastores that belong to a datastore cluster.
We manage multiple clusters with the same vCenter, and would like to filter metrics per "PROD" and "STAGING" clusters, for example.
When one of the managed ESXi hosts is down, the related metrics give an error, as vCenter returns empty values.
Hi,
when running against plain ESXi it fails. Not being a Python/VMware expert, I think this might be due to pyvmomi not working with ESXi or some missing out-of-bounds checks; I suspect ESXi exposes different (or limited, if I recall) performance measurements.
Andreas
/vmware_exporter.py
[2017-06-17 10:40:22.423793+00:00] Start collecting vcenter metrics
Traceback (most recent call last):
File "./vmware_exporter.py", line 352, in <module>
REGISTRY.register(VMWareVCenterCollector())
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 50, in register
names = self._get_names(collector)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 86, in _get_names
for metric in desc_func():
File "./vmware_exporter.py", line 134, in collect
self._vmware_get_vms(content, metrics, counter_info)
File "./vmware_exporter.py", line 315, in _vmware_get_vms
float(sum(result[0].value[0].value)))
IndexError: list index out of range
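The traceback points at indexing result[0].value[0].value without checking whether the perf query returned any samples. A defensive sketch (the nested result shape follows the traceback and is otherwise an assumption about pyVmomi's QueryPerf output):

```python
def sum_first_series(results):
    """Sum the first value series of a QueryPerf result, or return None
    when the entity (e.g. a plain ESXi host exposing fewer counters)
    produced no samples, instead of raising IndexError."""
    if not results or not results[0].value or not results[0].value[0].value:
        return None
    return float(sum(results[0].value[0].value))
```

The caller can then skip emitting the metric when the helper returns None.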
I've been running into an issue where one of my esxi hosts appears to be unreachable by the exporter.
I think what's actually happening is that the scrape is timing out.
The jump in scrape duration coincides roughly with the time I deployed this code change #16,
and it looks like I might be bumping up against my default 10-second timeout.
I've increased my timeout but I'm not sure if this is a great solution for everyone. Maybe the datastore refresh should be a configurable option.
Hi.
I'm looking to deploy this via container to Kubernetes, where health checking of HTTP endpoints is a matter of container life or death.
If Kubernetes can't get a 200 OK without polling vCenter API, using health checks with vmware_exporter could DoS vCenter API.
Could we get a /healthz
endpoint (or two)?
/healthz
/healthz/api?vcenter.example.com
Edit: This is related:
https://stackoverflow.com/questions/43380939/where-does-the-convention-of-using-healthz-for-application-health-checks-come-f
Open to your thoughts.
Thanks for this awesome project!
-Joshua
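The liveness part of the request above can be sketched as a tiny routing function (path names follow the request; they are not routes the exporter has today). The point is that /healthz answers from the process itself and never touches the vCenter API, so Kubernetes probes cannot DoS vCenter:

```python
def route(path):
    """Answer liveness paths locally; return None for anything else so
    the request falls through to the normal metrics handler."""
    if path == '/healthz':
        # Process is alive; deliberately no vCenter API call here.
        return 200, b'ok'
    return None
```

A separate readiness endpoint that does check the vCenter API would need its own rate limiting, for exactly the DoS reason raised above.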
It would be great if this exporter could support polling multiple vCenter/ESXi endpoints in the same way the official snmp_exporter works, by passing the target host as an HTTP GET parameter.
I'm still investigating how to best handle this.
Here's an older snmp exporter that was still python based
https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py
Here's another one https://github.com/dhtech/snmpexporter/blob/master/snmpexporterd.py
Please let me know if you have any suggestions.
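The snmp_exporter pattern boils down to reading the target from the query string of each scrape request; a small sketch (the parameter name target is borrowed from snmp_exporter, not something this exporter implements yet):

```python
from urllib.parse import urlparse, parse_qs

def parse_target(request_uri):
    """Extract the host to scrape from a /metrics?target=... URL.
    Returns None when no target parameter is present, so the caller
    can fall back to a configured default or return an error."""
    query = parse_qs(urlparse(request_uri).query)
    values = query.get('target')
    return values[0] if values else None
```

Prometheus would then relabel the target parameter per scrape job, as it does for snmp_exporter.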
I can't quite figure out the values of vmware_vm_cpu_usage_average
metric, for example:
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz1"} | 202
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz2"} | 225
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz3"} | 4015
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz4"} | 207
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz5"} | 209
according to this https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html
The description of this counter is "Amount of actively used virtual CPU, as a percentage of total available CPU", but the values I'm seeing do not look like percentages.
Any clues?
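If the counter unit semantics are what I remember from the vSphere SDK docs (percent-type counters delivered in hundredths of a percent), then 202 would be 2.02% and 4015 would be 40.15%; treat the claim as something to verify against the SDK reference. The conversion is trivial:

```python
def percent_counter_to_percent(raw):
    """VMware 'percent'-unit counters are (reportedly) delivered in
    hundredths of a percent; divide by 100 for a human-readable value."""
    return raw / 100.0

print(percent_counter_to_percent(202))   # 2.02
print(percent_counter_to_percent(4015))  # 40.15
```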
i.e.:
./vmware_exporter.py -c <config_file.yml>
./vmware_exporter.py -p <port>
./vmware_exporter.py -h # help
Hello,
The exporter makes vCenter (5.5/6.0.2) refresh datastore information as a vCenter task on every query cycle, flooding the task viewer.
I would suggest removing ds.RefreshDatastoreStorageInfo() from vmware_exporter.py.
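Rather than removing the call outright, it could be gated behind a config flag, defaulting to off; the flag name below is hypothetical, the exporter does not have it today:

```python
def collect_datastore(ds, refresh_storage_info=False):
    """Collect datastore stats from a pyVmomi Datastore object.
    refresh_storage_info is a hypothetical opt-in flag: only trigger
    the vCenter refresh task when explicitly enabled, so the task
    viewer is not flooded on every scrape."""
    if refresh_storage_info:
        ds.RefreshDatastoreStorageInfo()  # creates a vCenter task
    summary = ds.summary
    return {
        'capacity': summary.capacity,
        'free_space': summary.freeSpace,
    }
```

The trade-off is that capacity figures may be slightly stale between vCenter's own periodic refreshes.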
It would be nice to add some exporter runtime metrics.
For example at the moment it would be really useful if I knew how long the exporter is taking to scrape vmware for metrics.
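One lightweight way to expose this, sketched with a stdlib-only timer that appends a self-metric to the lines being rendered; the metric name is illustrative, not one the exporter currently emits:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(metrics):
    """Wrap a collection pass and append an exporter self-metric with
    the wall-clock time the collection took (name is illustrative)."""
    start = time.monotonic()
    try:
        yield
    finally:
        metrics.append(
            'vmware_exporter_collect_duration_seconds %f'
            % (time.monotonic() - start))
```

Usage: `with timed(lines): collect_everything(lines)` before serving the page.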
Hi,
it seems that the vmware_exporter doesn't reconnect when the vCenter server is rebooted or switched over by vCenter HA.
regards
Robert
@rverchere since I'm maintaining this now, should we transfer ownership or something, so that issues get opened on my fork?
I need some host network metrics so that I can keep an eye on host level network bottlenecks.
I would probably like to see throughput for each NIC and not just an aggregate host throughput, which seems less useful.
I was trying to figure out what the net.throughput.contention.summation
metric actually reports as this might be an easier alternative.
Hi,
I have multiple ESXi hosts I am scraping. Some of them are switched off most of the time.
It seems that while the vmware_exporter is in the process of trying to connect to a currently switched-off system, it cannot process a second connection.
This can easily be reproduced in a web browser by connecting to a currently "off" target and opening a second tab to connect to an existing machine. Both fail.
Can the exporter only process a single request at a time, or is the problem maybe in the Python VMware library?
Andreas
When I get the metrics via the URL, the response usually takes more than ten seconds. How can this be resolved?
Dear All,
A suggestion: is it possible to include VMware host information like:
The application opens a connection on every vCenter request and uses atexit()
to handle the disconnect.
As the application never stops (in theory), more and more connections stay open, eventually leading to a "too many open files"
error.
2 ways to fix it:
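Whether it matches either of the intended fixes is a guess, but one common approach is to disconnect explicitly after each scrape instead of relying on atexit, e.g. a context manager around pyVmomi's SmartConnect/Disconnect pair (passed in as callables here so the sketch stays self-contained):

```python
from contextlib import contextmanager

@contextmanager
def vmware_connection(connect, disconnect, **kwargs):
    """Open a connection per scrape and always close it, rather than
    deferring cleanup to atexit, which never fires while the exporter
    keeps running. connect/disconnect stand in for pyVmomi's
    SmartConnect and Disconnect."""
    si = connect(**kwargs)
    try:
        yield si
    finally:
        disconnect(si)
```

The other obvious direction would be keeping one long-lived connection and reusing it, at the cost of needing reconnect handling.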
It would be great to get alarm info for VMs/hosts.
For VMs there is an overallStatus property, which works, but it would be great to write which alarm was triggered to a label.
This way we can easily check whether we need to deal with it, and how many alarms there are for the VM.
Hey! First off, thanks for this work!
We are now testing this exporter in our environment; one thing we noticed is that it takes ~1m32.703s to scrape the vCenter metrics, probably because our environment is quite big.
I wonder if it's possible to limit which metrics are collected, for example if we only want datastore metrics?
Hello, I get an error only when I collect VM statistics:
Error, cannot get vm metrics vmware_vm_disk_usage_average for ..... (lot of vmname)
[2018-07-03 17:21:02.748099+00:00] [Failure instance: Traceback: <type 'exceptions.TypeError'>: a float is required
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:1203:mainLoop
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:825:runUntilCurrent
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:393:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:501:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:588:_runCallbacks
build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py:129:generate_latest_metrics
build/bdist.linux-x86_64/egg/prometheus_client/core.py:775:_floatToGoString
]
Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1203, in mainLoop
self.runUntilCurrent()
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 825, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 393, in callback
self._startRunCallbacks(result)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py", line 129, in generate_latest_metrics
File "build/bdist.linux-x86_64/egg/prometheus_client/core.py", line 775, in _floatToGoString
exceptions.TypeError: a float is required
When I collect datastore and host statistics, there is no problem.
It is running connected to VMware vSphere 5.5.
I tested with Docker and directly, and the error is the same.
Do you have any idea?
Thank you!
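The traceback ends in prometheus_client's _floatToGoString, which suggests a metric sample that is None (e.g. a missing vmware_vm_disk_usage_average value) rather than a number. A defensive coercion before handing values to the client library, as a sketch:

```python
def safe_float(value, default=float('nan')):
    """Coerce a counter sample to float before passing it to
    prometheus_client; missing samples (None, empty string) become NaN
    instead of raising 'a float is required'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default
```

Alternatively the exporter could skip emitting the sample entirely when coercion fails.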