vmware_exporter's Issues

vmware_vm_cpu_ready_summation is not very useful without the number of vCPUs on the VM

This value is in milliseconds over the observed interval. On its own it doesn't provide much insight, and it is usually translated into a percentage using the following formula:

(<value> / (<interval> * 1000)) * 100 = % CPU ready

For example, if the value is 1000 and the interval is 20 seconds (real-time), the result is:

(1000 / (20s * 1000)) * 100 = 5% CPU ready

The result of the above calculation is the sum of each virtual CPU's %RDY time. However, it is more accurate to calculate the % CPU ready per vCPU, mainly because 5% on a 1 vCPU VM is a problem, whereas 5% on an 8 vCPU VM is OK (0.625% per vCPU).

In order to get the % CPU ready per vCPU, we need to know how many vCPUs each VM has. Currently the exporter doesn't collect this information.
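The formula above, extended with the per-vCPU division, can be sketched as follows. The function name and parameters are hypothetical and are not part of the exporter; the vCPU count would have to come from somewhere the exporter does not currently expose.

```python
def cpu_ready_percent(ready_ms, interval_s, num_vcpus=1):
    """Convert a CPU ready summation (milliseconds) into % CPU ready per vCPU.

    ready_ms   -- summation value reported for the interval, in milliseconds
    interval_s -- length of the sampling interval, in seconds (20 for real-time)
    num_vcpus  -- number of vCPUs on the VM (assumed to be known by the caller)
    """
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    # Total % ready summed across all vCPUs, as in the formula above.
    total_percent = ready_ms / (interval_s * 1000.0) * 100.0
    # Spread the total across the vCPUs to get a per-vCPU figure.
    return total_percent / num_vcpus


print(cpu_ready_percent(1000, 20))     # 1 vCPU VM
print(cpu_ready_percent(1000, 20, 8))  # 8 vCPU VM
```

With the numbers from the example, the first call gives 5% and the second gives 0.625% per vCPU.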

Rename config values

vcenter_ip and the related settings are misnamed, as the program can also manage ESXi hosts.

Rename to:

  • vcenter_ip --> vmware_target
  • vcenter_user --> vmware_user
  • vcenter_password --> vmware_password

Prometheus error

Hi, when I completed the configuration and restarted the Prometheus service, I got the error below:
time="2017-08-09T17:34:37+08:00" level=info msg="Loading configuration file prometheus.yml" source="main.go:252"
time="2017-08-09T17:34:37+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"
time="2017-08-09T17:34:38+08:00" level=error msg="Error reading file "/opt/vmware_exporter/config.yml": yaml: unmarshal errors:
line 1: cannot unmarshal !!map into []*config.TargetGroup" source="file.go:199"

My config.yml configuration is:
default:
    vmware_user: '[email protected]'
    vmware_password: 'Er4545'
    ignore_ssl: True

esx:
    vmware_user: 'root'
    vmware_password: 'Er4545'
    ignore_ssl: True

Do you know why?

Thank you!

Add package files

Add setup.py, requirements.txt and co to make it a clean python package.

Add "cluster" label for hosts, vms and datastores

It would be great to add a "cluster" label to indicate which HA cluster the host/VM belongs to.

The same applies to datastores that belong to a Datastore cluster.

We manage multiple clusters with the same vCenter, and would like to filter metrics per "PROD" and "STAGING" clusters, for example.

Not working on plain ESXi

Hi,

When running against plain ESXi it fails. Not being a Python/VMware expert, I think this might be due to pyvmomi not working with ESXi or to some (missing?) out-of-bounds checks. I suspect ESXi exposes different (or, if I recall correctly, fewer) performance measurements.

Andreas

/vmware_exporter.py
[2017-06-17 10:40:22.423793+00:00] Start collecting vcenter metrics
Traceback (most recent call last):
  File "./vmware_exporter.py", line 352, in <module>
    REGISTRY.register(VMWareVCenterCollector())
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 50, in register
    names = self._get_names(collector)
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 86, in _get_names
    for metric in desc_func():
  File "./vmware_exporter.py", line 134, in collect
    self._vmware_get_vms(content, metrics, counter_info)
  File "./vmware_exporter.py", line 315, in _vmware_get_vms
    float(sum(result[0].value[0].value)))
IndexError: list index out of range
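One way to avoid this IndexError is to treat a missing series as "no data" instead of indexing unconditionally. This is a minimal sketch, assuming the query simply returned an empty list for this counter on ESXi (the issue does not confirm that). The helper name is hypothetical; only the `result[0].value[0].value` shape is taken from the traceback.

```python
def sum_first_series(result):
    """Return the summed samples of the first series, or None if there are none.

    `result` is expected to have the shape used in the traceback:
    result[0].value[0].value is a list of numeric samples.
    """
    try:
        samples = result[0].value[0].value
    except (IndexError, AttributeError):
        # No entity, no series, or an unexpected object: report "no data".
        return None
    return float(sum(samples))
```

A caller could then skip the metric when the helper returns None rather than aborting the whole collection.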

Performance impact of refreshing each datastore info at every scrape

I've been running into an issue where one of my esxi hosts appears to be unreachable by the exporter.
I think what's actually happening is that the scrape is timing out.

[Image: vmware-exporter-scrape]

The jump in scrape duration coincides with about the time that I deployed this code change #16
and it looks like I might be bumping up against my default 10 second timeout.
I've increased my timeout but I'm not sure if this is a great solution for everyone. Maybe the datastore refresh should be a configurable option.

Need /healthz endpoint(s)

Hi.

I'm looking to deploy this via container to Kubernetes, where health checking of HTTP endpoints is a matter of container life or death.

If Kubernetes can't get a 200 OK without polling vCenter API, using health checks with vmware_exporter could DoS vCenter API.

Could we get a /healthz endpoint (or two)?

  • 200 OK if web server is up. /healthz
  • 200 OK if vCenter API is up (without polling metrics). /healthz/api?vcenter.example.com
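The first of the two endpoints above could look roughly like this. It is a sketch built on the Python standard library only, not on the web server the exporter actually uses; `route`, `HealthHandler`, `serve` and the default port are all assumptions made for illustration. The second endpoint (checking the vCenter API) is not covered.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def route(path):
    """Map a request path to (status, body) without contacting vCenter."""
    if path.split("?", 1)[0] == "/healthz":
        return 200, b"ok\n"
    return 404, b"not found\n"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The health check only proves that the web server answers.
        status, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9272):
    """Start the server; the port number is an assumed default."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Because `/healthz` never touches the vCenter API, frequent liveness probes would not add load there.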

Edit: This is related:
https://stackoverflow.com/questions/43380939/where-does-the-convention-of-using-healthz-for-application-health-checks-come-f

Open to your thoughts.

Thanks for this awesome project!

-Joshua

support for multiple vcenter/esxi endpoints

It would be great if this exporter could support polling multiple vCenter/ESXi endpoints, in the same way that the official snmp exporter does: by passing the target device as a parameter of the HTTP GET request.
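Picking the endpoint out of the request URL might be sketched as below. The parameter name `target` follows the snmp exporter convention and is an assumption here, as is the helper name.

```python
from urllib.parse import urlparse, parse_qs


def target_from_url(url, default=None):
    """Return the value of the `target` query parameter, or `default`."""
    query = parse_qs(urlparse(url).query)
    values = query.get("target")
    return values[0] if values else default


print(target_from_url("/metrics?target=vcenter.example.com"))
```

Prometheus could then scrape one exporter instance for several endpoints by setting a different `target` value per scrape.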

I'm still investigating how to best handle this.

Here's an older snmp exporter that was still python based
https://github.com/prometheus/snmp_exporter/blob/30cb5cc264d1a3c2329ed40e740f57f3670fe1ee/snmp_exporter/http.py

Here's another one https://github.com/dhtech/snmpexporter/blob/master/snmpexporterd.py

Please let me know if you have any suggestions.

Unclear values for vmware_vm_cpu_usage_average metric

I can't quite figure out the values of vmware_vm_cpu_usage_average metric, for example:

vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz1"} | 202
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz2"} | 225
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz3"} | 4015
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz4"} | 207
vmware_vm_cpu_usage_average{instance="<vcenter>",job="vmware-exporter",vm_name="xyz5"} | 209

According to this: https://www.vmware.com/support/developer/converter-sdk/conv61_apireference/cpu_counters.html

The description of this counter is "Amount of actively used virtual CPU, as a percentage of total available CPU", but the values I'm seeing do not look like percentages.

Any clues?
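One possible reading, offered tentatively: vSphere performance counters with a percent unit are, as far as I know, reported in hundredths of a percent, so 202 would mean 2.02%. Under that assumption the conversion is a plain division. The function name is hypothetical.

```python
def usage_to_percent(raw_value):
    """Convert a raw counter value to percent.

    Assumes the counter is expressed in hundredths of a percent.
    """
    return raw_value / 100.0


for vm_name, raw in [("xyz1", 202), ("xyz3", 4015)]:
    print(vm_name, usage_to_percent(raw))
```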

Add arguments for CLI

i.e:

./vmware_exporter.py -c <config_file.yml>
./vmware_exporter.py -p <port>
./vmware_exporter.py -h # help
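These options map naturally onto argparse, which provides `-h` automatically. This is a sketch only: the long option names and the default values are assumptions, not the exporter's actual settings.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line options listed above."""
    parser = argparse.ArgumentParser(
        description="VMware exporter for Prometheus (sketch)")
    parser.add_argument("-c", "--config", dest="config_file",
                        default="config.yml",  # assumed default
                        help="path to the YAML configuration file")
    parser.add_argument("-p", "--port", type=int,
                        default=9272,  # assumed default
                        help="HTTP port to listen on")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.config_file, args.port)
```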

add exporter runtime metrics

It would be nice to add some exporter runtime metrics.
For example at the moment it would be really useful if I knew how long the exporter is taking to scrape vmware for metrics.
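Measuring how long each part of a scrape takes could be done with a small timing helper like the one below. It is a sketch using the standard library only; the name `timed` and the idea of storing durations in a dict are assumptions, and exposing the values as Prometheus metrics is left out.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(durations, name):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed scrapes are timed too.
        durations[name] = time.monotonic() - start


durations = {}
with timed(durations, "vms"):
    time.sleep(0.01)  # stands in for the real collection work
print(durations)
```

Each recorded duration could then be exported as a gauge alongside the VMware metrics.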

connection to vcenter lost

Hi,

It seems that vmware_exporter doesn't reconnect when the vCenter server is rebooted or switched over by vCenter HA.

regards
Robert

Transfer ownership

@rverchere since I'm maintaining this now, should we transfer ownership or something, so that issues get opened on my fork?

add host network metrics

I need some host network metrics so that I can keep an eye on host level network bottlenecks.
I probably would like to see throughput for each nic and not just an aggregate host throughput which seems useless.
I was trying to figure out what the net.throughput.contention.summation metric actually reports as this might be an easier alternative.

Metrics gathering blocks if connection is timing out

Hi,

I have multiple ESXi hosts that I am scraping. Some of them are switched off most of the time.

It seems that while vmware_exporter is trying to connect to a system that is currently switched off, it cannot process a second connection.

This can easily be reproduced using a web browser by connecting to a target that is currently "off" and opening a second tab that tries to connect to a machine that is up. Both fail.

Can the exporter process only a single request at a time, or is the problem maybe in the Python VMware library?

Andreas

The response time is too long

When I fetch the metrics via the URL, the response often takes more than ten seconds. How can I resolve this problem?

[Feature Request] Add Host Hardware Information

Dear All,

A suggestion:

Is it possible to include VMware host information such as:

  • ESXi Version, Build
  • Server Manufacturer
  • Server Model
  • Server Service Tag
  • CPU Cores
  • Processor Type
  • VMware Host Physical Network Adapter Transmit Traffic
  • VMware Host Physical Network Adapter Receive Traffic
  • VMware Host Physical Network Adapter Errors
  • VMware Host Physical Network Adapter Dropped Packets

Too many open files when running for a long time

The application opens a connection at every vcenter request, and uses atexit() to handle disconnect.

As the application never stops (in theory), more and more connections stay open, leading to a "too many open files" error.

2 ways to fix it:

  • open/close connection at every request
  • open connection when application starts, and close when it stops.
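The first option (open and close at every request) can be sketched with a context manager, so the disconnect runs even when collection fails. The `connect` and `disconnect` callables are placeholders supplied by the caller; with pyvmomi they would presumably wrap `SmartConnect` and `Disconnect` from `pyVim.connect`, but that wiring is an assumption and is not shown.

```python
from contextlib import contextmanager


@contextmanager
def vmware_session(connect, disconnect):
    """Open a connection for one request and always close it afterwards."""
    session = connect()
    try:
        yield session
    finally:
        # Runs on success and on error, so no connection is left open.
        disconnect(session)


# Usage with stand-in callables instead of a real vCenter connection.
opened = []
with vmware_session(lambda: opened.append("s") or "s", opened.remove) as s:
    print("collecting with", s)
print("open sessions:", opened)
```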

Getting Alarm Info

It would be great to get alarm info for VMs and hosts.
For VMs there is an overallStatus option, which works, but it would be great to record in a label which alarm was triggered.
This way we can easily check whether we need to deal with it, and how many alarms there are for the VM.

Limit metrics to be collected

Hey! First off, thanks for this work 👍

We are now testing this exporter in our environment. One thing we noticed is that it takes ~1m32.703s to scrape the vCenter metrics, probably because our environment is quite big.

I wonder if it's possible to limit which metrics are collected? For example, if we only want datastore metrics?

exceptions.TypeError: a float is required

Hello, I get an error when I collect statistics, but only when I request the VM statistics:

Error, cannot get vm metrics vmware_vm_disk_usage_average for ..... (lot of vmname)

[2018-07-03 17:21:02.748099+00:00] [Failure instance: Traceback: <type 'exceptions.TypeError'>: a float is required
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:1203:mainLoop
/usr/lib/python2.7/dist-packages/twisted/internet/base.py:825:runUntilCurrent
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:393:callback
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:501:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:588:_runCallbacks
build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py:129:generate_latest_metrics
build/bdist.linux-x86_64/egg/prometheus_client/core.py:775:_floatToGoString
]
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1203, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 825, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 393, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 501, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "build/bdist.linux-x86_64/egg/vmware_exporter/vmware_exporter.py", line 129, in generate_latest_metrics

  File "build/bdist.linux-x86_64/egg/prometheus_client/core.py", line 775, in _floatToGoString

exceptions.TypeError: a float is required

When I request the datastore and host statistics, there is no problem.

running

  • Python 2.7.12
  • pyvmomi 6.5.0.2017.5.post1
  • prometheus-client 0.0.19
  • on Ubuntu 16.04

Connected to VMware vSphere 5.5.
I tested both with Docker and running it directly, and the error is the same.
Do you have any idea?

Thank you!
