zookeeper-exporter's Introduction

Prometheus zookeeper exporter

Exports ZooKeeper's mntr stats in Prometheus format. The zk_followers, zk_synced_followers and zk_pending_syncs metrics are available only on the cluster leader.

Build

The ./build.sh script builds the dabealu/zookeeper-exporter:latest docker image. To build an image with a different name, pass it to build.sh as the first argument.

Usage

Note: starting from zookeeper v3.4.10, the mntr command must be whitelisted (details: 4lw.commands.whitelist).
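For example, in zoo.cfg (whitelisting mntr alone is enough for this exporter; add any other 4-letter-word commands you use, such as srvr or ruok):

```
4lw.commands.whitelist=mntr
```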

Warning: the flag used to specify target zk hosts changed in v0.1.10, see below.

Usage of zookeeper-exporter:
  -listen string
        address to listen on (default "0.0.0.0:9141")
  -location string
        metrics location (default "/metrics")
  -timeout int
        timeout for connection to zk servers, in seconds (default 30)
  -zk-hosts string
        comma separated list of zk servers, e.g. '10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181'
  -zk-tls-auth bool
        zk tls client authentication (default false)
  -zk-tls-auth-cert string
        tls certificate for zk tls client authentication (required if -zk-tls-auth is true)
  -zk-tls-auth-key string
        tls key for zk tls client authentication (required if -zk-tls-auth is true)

An example docker-compose.yml can be used to manage a clustered zookeeper setup with exporters:

# start zk cluster and exporters
docker-compose up -d

# get metrics of first exporter (second and third exporters are on 9142 and 9143 ports)
curl -s localhost:9141/metrics

# the exporter on port 9144 handles multiple zk hosts
curl -s localhost:9144/metrics

# shutdown containers
docker-compose down -v
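A minimal Prometheus scrape configuration for these exporters might look like the following (job name and targets are illustrative, matching the compose setup above):

```yaml
scrape_configs:
  - job_name: zookeeper
    static_configs:
      - targets: ['localhost:9141', 'localhost:9142', 'localhost:9143']
```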

Dashboard

Example grafana dashboard: https://grafana.com/grafana/dashboards/11442

zookeeper-exporter's People

Contributors

dabealu, dandronov-alytics, dgo-, lex2606, mikespokefire, nickebbitt


zookeeper-exporter's Issues

leaderServes=no option breaks output

We're using the leaderServes=no option, meaning the leader only returns the string This ZooKeeper instance is not currently serving requests in response to any 4-letter-word command.

As a result, the exporter metrics output looks like this - note the bare string:

...
zk_ephemerals_count{zk_host="x.x.x.x:2181"} 129
zk_max_latency{zk_host="x.x.x.x:2181"} 4007
zk_packets_received{zk_host="x.x.x.x:2181"} 19406771080
zk_outstanding_requests{zk_host="x.x.x.x:2181"} 19
This{zk_host="ip.of.leader:2181"} ZooKeeper
zk_max_latency{zk_host="x.x.x.x:2181"} 5664
zk_num_alive_connections{zk_host="x.x.x.x:2181"} 469
...

Prometheus is not a fan of this, of course: strconv.ParseFloat: parsing "ZooKeeper": invalid syntax
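A possible guard (a sketch only; parseMntr and its shape are illustrative, not the exporter's actual code) is to check for the sentinel line before formatting metrics, and emit only zk_up 0 for that host:

```go
package main

import (
	"fmt"
	"strings"
)

const notServing = "This ZooKeeper instance is not currently serving requests"

// parseMntr turns a raw mntr response into Prometheus text lines for one host.
// If the node answers with the "not serving" sentinel, it reports zk_up 0
// instead of leaking the sentinel's words into the metric stream.
func parseMntr(host, raw string) []string {
	lines := strings.Split(strings.TrimSpace(raw), "\n")
	if len(lines) > 0 && strings.HasPrefix(lines[0], notServing) {
		return []string{fmt.Sprintf(`zk_up{zk_host=%q} 0`, host)}
	}
	out := []string{fmt.Sprintf(`zk_up{zk_host=%q} 1`, host)}
	for _, l := range lines {
		fields := strings.SplitN(l, "\t", 2)
		if len(fields) != 2 {
			continue // malformed line: skip it rather than emit garbage
		}
		out = append(out, fmt.Sprintf(`%s{zk_host=%q} %s`, fields[0], host, fields[1]))
	}
	return out
}

func main() {
	for _, l := range parseMntr("10.0.0.1:2181", "zk_max_latency\t4007") {
		fmt.Println(l)
	}
}
```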

alpine:3.6 has CVE-2019-14697

Hello,

Our vulnerability scanning tool detected CVE-2019-14697, a vulnerability in musl, which is used in alpine:3.6.

Vulnerabilities
---------------
Image                                ID                  CVE               Package     Version       Severity    Status                 CVSS
-----                                --                  ---               -------     -------       --------    ------                 ----
dabealu/zookeeper-exporter:latest    3366c4c4d651f5d4    CVE-2019-14697    musl        1.1.16-r15    critical                           9.8

Incompatible with 3.4.10 or other issues?

warning: cannot resolve zk hostname '"zk-0.zk-hs.zk.svc.cluster.local:2181': lookup "zk-0.zk-hs.zk.svc.cluster.local: no such host
warning: cannot resolve zk hostname 'zk-2.zk-hs.zk.svc.cluster.local:2181"': address tcp/2181": unknown port

/ # nslookup zk-0.zk-hs.zk.svc.cluster.local
Server: 10.43.0.10
Address: 10.43.0.10:53

Name: zk-0.zk-hs.zk.svc.cluster.local
Address: 10.42.0.71

/ # nslookup zk-2.zk-hs.zk.svc.cluster.local
Server: 10.43.0.10
Address: 10.43.0.10:53

Name: zk-2.zk-hs.zk.svc.cluster.local
Address: 10.42.2.127

Viewing metrics only shows information about zk-1

Add ability to add custom timeout

We are currently using this tool to provide Prometheus metrics coverage for our zookeeper cluster, using the --zk-list option. However, we've noticed that if we have network problems which result in tcp connections timing out to one of the servers, it takes ~2 minutes for the metrics endpoint to return.

Unfortunately, this means we get large gaps in our monitoring when one of the nodes becomes unresponsive, and we can't figure out which zk server is causing the problems.

It looks like from local testing that the default timeout is somewhere around 2 minutes (I verified this multiple times to be sure):

## curl to metrics endpoint
08:55 $ time curl localhost:8083/metrics
zk_up{zk_host="zoo3:2181"} 0

real	2m12.268s
user	0m0.013s
sys	0m0.017s

and we see this message in the logs:

2018/12/06 08:57:37 warning: cannot connect to zoo3:2181: dial tcp 172.28.99.58:2181: getsockopt: connection timed out

It would be good if we could specify a custom timeout for this case, so we get a zk_up metric of 0 earlier and still return the rest of the cluster's metrics in a timely manner.

I've got a pull request incoming with this functionality shortly.

HTTP 404 while trying to access the metrics

I started a local standalone Zookeeper instance on port 2181, then built this source and created the container.

When I run it like this:

docker run dabealu/zookeeper-exporter:latest -listen=127.0.0.1:8080
2019/08/15 04:20:41 zookeeper addresses [127.0.0.1:2181]
2019/08/15 04:20:41 starting serving metrics at 127.0.0.1:8080/metrics

When I try to access the metrics:
curl -s 127.0.0.1:8080/metrics

<title>Error 404 Not Found</title>

HTTP ERROR 404

Problem accessing /metrics. Reason:

    Not Found


Powered by Jetty:// 9.4.17.v20190418

error: 2019/03/27 06:53:05 lookup "10.15.5.4: no such host

I can run this in my docker environment.
When I run this in a k8s environment, I get an error like: error: 2019/03/27 06:53:05 lookup "10.15.5.4: no such host
Same configuration as docker, both in host network mode.

The configuration is as below:
-listen="0.0.0.0:9141" -zk-list="zk1:2181,zk2:2181,zk3:2181"

Grafana Dashboard

Hello,
do you have some grafana dashboard example for collected metrics?

Regards,
Andrii

Publish binary releases?

I'm fine building it on my machine for testing etc., but when I want to deploy it to a large number of servers, that is far from ideal. I'd suggest publishing binary releases along with source releases (so they can be pulled using Ansible, for example).

Issue zk_peer_state prometheus parsing

Hello,

I've found a strange issue related to the zk_peer_state metric when using the exporter with zookeeper 3.6.0. Using zookeeper 3.5.7 resolves this issue because the metric does not exist there.

Indeed, when I use prometheus to parse values from the exporter, I get the following error:
strconv.ParseFloat: parsing "leading": invalid syntax

Result of the mntr command:

echo mntr | nc localhost 2181 | grep peer
zk_peer_state leading - broadcast

The same happens when nodes are following:

echo mntr | nc localhost 2181 | grep peer
zk_peer_state following - broadcast

Maybe it's a known issue, but I couldn't find it.
Thank you :)
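One hedged workaround (illustrative code, not the exporter's): emit a line only when its value parses as a float, and optionally translate known string-valued keys such as zk_peer_state into a labeled gauge instead of dropping them:

```go
package main

import (
	"fmt"
	"strconv"
)

// renderMetric formats one mntr key/value pair for a host. It returns
// ok=false for values Prometheus cannot parse, e.g. "leading - broadcast".
func renderMetric(host, key, value string) (string, bool) {
	if _, err := strconv.ParseFloat(value, 64); err != nil {
		// Expose string states as a labeled gauge instead of dropping them.
		if key == "zk_peer_state" {
			return fmt.Sprintf(`zk_peer_state{zk_host=%q,state=%q} 1`, host, value), true
		}
		return "", false
	}
	return fmt.Sprintf(`%s{zk_host=%q} %s`, key, host, value), true
}

func main() {
	for _, kv := range [][2]string{
		{"zk_max_latency", "4007"},
		{"zk_peer_state", "leading - broadcast"},
		{"zk_version", "3.6.0-foo"}, // non-numeric value: silently skipped
	} {
		if line, ok := renderMetric("10.0.0.1:2181", kv[0], kv[1]); ok {
			fmt.Println(line)
		}
	}
}
```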

One exporter for multiple zookeeper

Hi

I'm wondering, is it possible to use one exporter to scrape metrics from multiple zookeeper servers (e.g. by using --zk-host more than once)? This would be very useful.

Thank you in advance.

Support for ruok in zk_up?

I like this exporter, but was hoping to also get a metric that covers the response from ruok. This could either be a new metric zk_ruok, or the zk_up metric could be updated to take it into account. I can submit a PR but wanted to see if you had any thoughts on a direction here.
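A sketch of what such a metric could look like (the names zk_ruok, fourLetterWord and ruokMetric are hypothetical, not part of the exporter): ZooKeeper answers a ruok probe with the literal string imok when healthy, which maps naturally onto a 0/1 gauge:

```go
package main

import (
	"fmt"
	"io"
)

// fourLetterWord sends a 4lw command over an open connection and reads
// the short reply until the server closes the stream.
func fourLetterWord(conn io.ReadWriter, cmd string) (string, error) {
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	buf, err := io.ReadAll(conn)
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

// ruokMetric maps the reply onto a 0/1 gauge: anything but "imok" counts as down.
func ruokMetric(host, reply string) string {
	v := 0
	if reply == "imok" {
		v = 1
	}
	return fmt.Sprintf(`zk_ruok{zk_host=%q} %d`, host, v)
}

func main() {
	fmt.Println(ruokMetric("10.0.0.1:2181", "imok"))
}
```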

Add support for TLS

Hi,
Since version 3.5, zookeeper has had an option to enable TLS, which is not supported by the current exporter.

Tag version for docker image

Hi,
On docker hub only the latest tag is available.
Could you please tag images with a version, so I can avoid using "FROM dabealu/zookeeper-exporter:latest" and use a pinned "FROM dabealu/zookeeper-exporter:<version>" instead.

Thank you in advance.

Prometheus scrape error: expected value after metric, got "MNAME"

Hi, I deployed a zookeeper cluster with this operator; the zookeeper version is 3.6.1, and the mntr command is in the 4lw.commands.whitelist.
After I deployed the exporter, prometheus got an error: expected value after metric, got "MNAME". When I check the metrics endpoint, I get something like this:

zk_concurrent_request_processing_in_commit_processor_count{zk_host="zookeeper-1.zookeeper-headless:2181"} 0.0
zk_write_per_namespace_count{key="config"}{zk_host="zookeeper-1.zookeeper-headless:2181"} 9.0
zk_jvm_memory_pool_bytes_max{pool="Compressed{zk_host="zookeeper-0.zookeeper-headless:2181"} Class
zk_jvm_threads_state{state="RUNNABLE"}{zk_host="zookeeper-0.zookeeper-headless:2181"} 15.0
zk_quorum_ack_latency{quantile="0.99"}{zk_host="zookeeper-0.zookeeper-headless:2181"} NaN
zk_local_write_committed_time_ms_sum{zk_host="zookeeper-0.zookeeper-headless:2181"} 770.0
zk_jvm_memory_bytes_init{area="nonheap"}{zk_host="zookeeper-0.zookeeper-headless:2181"} 7667712.0
zk_write_per_namespace{key="log_dir_event_notification",quantile="0.5"}{zk_host="zookeeper-1.zookeeper-headless:2181"} NaN
zk_om_proposal_process_time_ms_count{zk_host="zookeeper-2.zookeeper-headless:2181"} 0.0
zk_write_per_namespace_sum{key="cluster"}{zk_host="zookeeper-2.zookeeper-headless:2181"} 64.0
zk_proposal_latency_count{zk_host="zookeeper-0.zookeeper-headless:2181"} 0.0
zk_jvm_buffer_pool_used_bytes{pool="mapped"}{zk_host="zookeeper-1.zookeeper-headless:2181"} 0.0
zk_write_per_namespace{key="controller",quantile="0.5"}{zk_host="zookeeper-1.zookeeper-headless:2181"} NaN
zk_read_per_namespace_sum{key="cluster"}{zk_host="zookeeper-0.zookeeper-headless:2181"} 124.0
zk_write_per_namespace{key="latest_producer_id_block",quantile="0.5"}{zk_host="zookeeper-0.zookeeper-headless:2181"} NaN
zk_write_commitproc_time_ms{quantile="0.5"}{zk_host="zookeeper-1.zookeeper-headless:2181"} NaN
zk_jvm_memory_bytes_committed{area="heap"}{zk_host="zookeeper-2.zookeeper-headless:2181"} 2.51396096E8
zk_prep_processor_queue_time_ms_count{zk_host="zookeeper-0.zookeeper-headless:2181"} 179132.0
zk_jvm_memory_pool_bytes_used{pool="Tenured{zk_host="zookeeper-0.zookeeper-headless:2181"} Gen"}
zk_startup_txns_load_time_sum{zk_host="zookeeper-1.zookeeper-headless:2181"} 0.0
zk_last_client_response_size{zk_host="zookeeper-0.zookeeper-headless:2181"} 16.0
zk_commit_propagation_latency_sum{zk_host="zookeeper-0.zookeeper-headless:2181"} 0.0
zk_num_alive_connections{zk_host="zookeeper-1.zookeeper-headless:2181"} 1.0
zk_process_max_fds{zk_host="zookeeper-1.zookeeper-headless:2181"} 1048576.0
zk_max_file_descriptor_count{zk_host="zookeeper-1.zookeeper-headless:2181"} 1048576.0
zk_jvm_memory_pool_allocated_bytes_total{pool="Compressed{zk_host="zookeeper-0.zookeeper-headless:2181"} Class
zk_response_packet_cache_hits{zk_host="zookeeper-0.zookeeper-headless:2181"} 96.0
zk_read_commit_proc_req_queued_count{zk_host="zookeeper-0.zookeeper-headless:2181"} 179230.0
zk_reads_after_write_in_session_queue_sum{zk_host="zookeeper-0.zookeeper-headless:2181"} 1.0
zk_jvm_buffer_pool_capacity_bytes{pool="direct"}{zk_host="zookeeper-1.zookeeper-headless:2181"} 147861.0

I think the problem is the "Class" value of the zk_jvm_memory_pool_bytes_max metric, is that right? Is this a bug, or did I misconfigure something?
Thanks.
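The broken lines above suggest two distinct problems once zk 3.6+ metrics already carry labels: the line is split on every space (so label values containing spaces, like pool="Compressed Class Space", are torn apart), and zk_host is appended as a second {...} block instead of being merged into the existing one. A hedged sketch of both fixes (function names are illustrative, not the exporter's real code):

```go
package main

import (
	"fmt"
	"strings"
)

// splitMntrLine separates a (possibly labeled) metric name from its value.
// Splitting on the LAST whitespace keeps label values that contain spaces,
// such as pool="Compressed Class Space", inside the name part.
func splitMntrLine(line string) (name, value string, ok bool) {
	i := strings.LastIndexAny(line, " \t")
	if i < 0 {
		return "", "", false
	}
	return strings.TrimSpace(line[:i]), line[i+1:], true
}

// addHostLabel merges zk_host into an existing label set instead of
// appending a second {...} block, which Prometheus rejects.
// It assumes a labeled name always ends with "}".
func addHostLabel(name, host string) string {
	hostLabel := fmt.Sprintf(`zk_host=%q`, host)
	if strings.Contains(name, "{") {
		return name[:len(name)-1] + "," + hostLabel + "}"
	}
	return name + "{" + hostLabel + "}"
}

func main() {
	line := `zk_jvm_memory_pool_bytes_max{pool="Compressed Class Space"} 2.68435456E8`
	if name, value, ok := splitMntrLine(line); ok {
		fmt.Println(addHostLabel(name, "zookeeper-0.zookeeper-headless:2181") + " " + value)
	}
}
```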

zookeeper-exporter image has 1 Compliance and 44 Fixable Vulnerabilities

ISSUE

  • 1 compliance issue and 44 fixable vulnerabilities were found in the zookeeper-exporter image by a twistlock scan.
  • Scan summary:
{
  "complianceFailureSummary": "C:0|H:1|M:0|L:0|T:1",
  "vulnerabilityFailureSummary": "C:5|H:30|M:11|L:1|T:47",
  "complianceDistribution": {
    "critical": 0,
    "high": 1,
    "medium": 0,
    "low": 0,
    "total": 1
  },
  "vulnerabilityDistribution": {
    "critical": 5,
    "high": 30,
    "medium": 11,
    "low": 1,
    "total": 47
  }
}

Detailed compliance report for the image:
{
  "text": "",
  "id": 41,
  "severity": "high",
  "cvss": 0,
  "status": "",
  "cve": "",
  "cause": "",
  "description": "It is a good practice to run the container as a non-root user, if possible. Though user\nnamespace mapping is now available, if a user is already defined in the container image, the\ncontainer is run as that user by default and specific user namespace remapping is not\nrequired",
  "title": "(CIS_Docker_v1.3.1 - 4.1) Image should be created with a non-root user",
  "vecStr": "",
  "exploit": "",
  "link": "",
  "type": "image",
  "packageName": "",
  "packageVersion": "",
  "layerTime": 0,
  "templates": [
    "PCI",
    "DISA STIG"
  ],
  "twistlock": false,
  "cri": false,
  "published": 0,
  "fixDate": 0,
  "discovered": "0001-01-01T00:00:00Z",
  "functionLayer": "",
  "severityCHML": "H"
}

Solution

  • Update the go version to golang:1.19-alpine and the alpine version to alpine:3.17.0 in the Dockerfile for zookeeper-exporter. Updating these versions will fix all 44 fixable vulnerabilities.

Only one indicator?

[root@172-16-86-7 ~]# curl 172.16.121.13:9141/metrics
zk_up{zk_host="172.16.121.13:2181"} 0
[root@172-16-86-7 ~]#

Wrong up metric

Hello,

echo stats | nc 127.0.0.1 2181
This ZooKeeper instance is not currently serving requests

curl 127.0.0.1:8090/metrics
zk_up{zk_host="127.0.0.1:2181"} 1
zk_server_leader{zk_host="127.0.0.1:2181"} 1

Please fix this

Regards,
Andrii

Wrong zk_server_leader value for network partitioned node

Hi! Thanks for the exporter :)
I have found something that looks like a bug. If a zk node is network-partitioned from the quorum, it responds to the mntr command with the line This ZooKeeper instance is not currently serving requests. This response is processed on

if lines[0] == "This ZooKeeper instance is not currently serving requests" {
and the zk_server_leader metric for this host is set to 1, so the node is considered a leader while it is not.
I assume this specific processing was added for the case when zookeeper is configured not to serve requests from the leader node. It looks like there is another edge case to handle, but I'm not sure how to distinguish a partitioned node from a leader node which does not serve client requests :(

Exporter unknown port in kubernetes

Thank you very much for your open source project.

I run a zookeeper cluster in Kubernetes and use the exporter to collect data, but the exporter reports an error: unknown port. My exporter is also deployed in Kubernetes.

logs:

[root@k8s-master01 kafka]# kubectl logs -f zookeeper-exporter-7ddf65cc6-lbd6x -n public-service
2020/04/29 14:15:36 info: zookeeper addresses ["zookeeper-0.zookeeper-headless.public-service.svc.cluster.local:2181" --timeout=5]
2020/04/29 14:15:36 info: serving metrics at 0.0.0.0:8080/metrics
2020/04/29 14:17:13 warning: cannot resolve zk hostname '"zookeeper-0.zookeeper-headless.public-service.svc.cluster.local:2181" --timeout=5': address tcp/2181" --timeout=5: unknown port

Pods:

[root@k8s-master01 kafka]# kubectl get po -n public-service
NAME                                 READY   STATUS    RESTARTS   AGE
elasticsearch-logging-0              1/1     Running   0          8d
kafka-0                              1/1     Running   0          13m
zookeeper-0                          1/1     Running   0          132m
zookeeper-exporter-7ddf65cc6-lbd6x   1/1     Running   0          2m56s

nslookup:

[root@k8s-master01 kafka]# kubectl exec -ti zookeeper-exporter-7ddf65cc6-lbd6x -n public-service -- nslookup zookeeper-0.zookeeper-headless.public-service.svc.cluster.local
Server:		50.96.0.10
Address:	50.96.0.10:53

Name:	zookeeper-0.zookeeper-headless.public-service.svc.cluster.local
Address: 177.246.65.55

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: zookeeper-exporter
  name: zookeeper-exporter
  namespace: public-service
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: zookeeper-exporter
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: zookeeper-exporter
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - k8s-master03.xiqu
      containers:
      - args:
        - --zk-list="zookeeper-0.zookeeper-headless.public-service.svc.cluster.local:2181"
          --timeout=5
        env:
        - name: TZ
          value: Asia/Shanghai
        - name: LANG
          value: C.UTF-8
        image: dabealu/zookeeper-exporter:latest
        imagePullPolicy: IfNotPresent
        lifecycle: {}
        name: zookeeper-exporter
        ports:
        - containerPort: 8080
          name: web
          protocol: TCP
        resources:
          limits:
            cpu: 837m
            memory: 448Mi
          requests:
            cpu: 10m
            memory: 10Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          procMount: Default
          readOnlyRootFilesystem: false
          runAsNonRoot: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/zoneinfo/Asia/Shanghai
          name: tz-config
        - mountPath: /etc/localtime
          name: tz-config
        - mountPath: /etc/timezone
          name: timezone
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
          type: ""
        name: tz-config
      - hostPath:
          path: /etc/timezone
          type: ""
        name: timezone
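Judging from the warning in the log above, the whole string -zk-list=... --timeout=5, including the quotes, is being passed to the exporter as a single argument, so the quotes and --timeout=5 end up inside the host name. Splitting the args into separate, unquoted list items would likely fix it (a sketch, assuming a build that accepts these flags):

```yaml
      - args:
        - --zk-list=zookeeper-0.zookeeper-headless.public-service.svc.cluster.local:2181
        - --timeout=5
```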

Hope to add an ARM64 version

My current system architecture is aarch64, and I need zookeeper-exporter for monitoring. I would like to request a linux-arm64.tar.gz release.
I would be very grateful.

prometheus error ParseFloat: invalid syntax

The metric line "mntr{zk_host="localhost:2181"} is" seems not to be parseable by prometheus:

level=warn ts=2019-07-03T08:50:17.786Z caller=scrape.go:937 component="scrape manager" scrape_pool=zookeeper
target=http://HOST:4000/metrics msg="append failed" err="strconv.ParseFloat: parsing "
is": invalid syntax"
