
couchbase_exporter's Introduction

Couchbase Exporter


Exposes metrics from a Couchbase cluster for consumption by Prometheus.

News

Couchbase has released an official exporter: couchbase-exporter.

Getting Started

Run from the command line:

./couchbase_exporter [flags]

The exporter can be configured in several ways: command-line arguments take precedence over environment variables, which in turn take precedence over the configuration file.

A configuration file can be provided on the command line with the --config.file option. It must be written in JSON or YAML. If none is provided, the exporter looks for a file named config.json or config.yml in the same directory as the exporter binary. You can find complete examples of configuration files in the sources (examples directory).
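As an illustration, a YAML configuration file might look like the following. The key names here are assumed to mirror the command-line flags; refer to the examples directory for the authoritative format:

```yaml
# Hypothetical config.yml; key names are inferred from the flag names
# and may differ from the files shipped in the examples directory.
web:
  listen-address: ":9191"
  telemetry-path: "/metrics"
  timeout: 10s
db:
  uri: "http://127.0.0.1:8091"
  timeout: 10s
scrape:
  cluster: true
  node: true
  bucket: true
  xdcr: false
log:
  level: "info"
  format: "text"
```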

The available flags and their equivalent environment variables are listed below:

| Environment variable | Argument | Description | Default |
| --- | --- | --- | --- |
| | -config.file | Configuration file to load data from | |
| CB_EXPORTER_LISTEN_ADDR | -web.listen-address | Address to listen on for HTTP requests | :9191 |
| CB_EXPORTER_TELEMETRY_PATH | -web.telemetry-path | Path under which to expose metrics | /metrics |
| CB_EXPORTER_SERVER_TIMEOUT | -web.timeout | Server read timeout in seconds | 10s |
| CB_EXPORTER_DB_URI | -db.uri | Address of Couchbase cluster | http://127.0.0.1:8091 |
| CB_EXPORTER_DB_TIMEOUT | -db.timeout | Couchbase client timeout in seconds | 10s |
| CB_EXPORTER_TLS_ENABLED | -tls.enabled | If true, enable TLS communication with the cluster | false |
| CB_EXPORTER_TLS_SKIP_INSECURE | -tls.skip-insecure | If true, the certificate won't be verified | false |
| CB_EXPORTER_TLS_CA_CERT | -tls.ca-cert | Root certificate of the cluster | |
| CB_EXPORTER_TLS_CLIENT_CERT | -tls.client-cert | Client certificate | |
| CB_EXPORTER_TLS_CLIENT_KEY | -tls.client-key | Client private key | |
| CB_EXPORTER_DB_USER | not allowed | Administrator username | |
| CB_EXPORTER_DB_PASSWORD | not allowed | Administrator password | |
| CB_EXPORTER_LOG_LEVEL | -log.level | Log level: info, debug, warn, error, fatal | error |
| CB_EXPORTER_LOG_FORMAT | -log.format | Log format: text, json | text |
| CB_EXPORTER_SCRAPE_CLUSTER | -scrape.cluster | If false, won't scrape cluster metrics | true |
| CB_EXPORTER_SCRAPE_NODE | -scrape.node | If false, won't scrape node metrics | true |
| CB_EXPORTER_SCRAPE_BUCKET | -scrape.bucket | If false, won't scrape bucket metrics | true |
| CB_EXPORTER_SCRAPE_XDCR | -scrape.xdcr | If false, won't scrape xdcr metrics | false |
| | -help | Command-line help | |

Important: for security reasons, credentials cannot be set with command-line arguments.
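For example, credentials can be supplied through the environment while everything else comes from flags. The cluster address below is a placeholder:

```shell
# Credentials must come from the environment (or a config file),
# never from command-line arguments.
export CB_EXPORTER_DB_USER=admin
export CB_EXPORTER_DB_PASSWORD='complicatedpassword'
# Then start the exporter; flags override environment and file settings:
# ./couchbase_exporter -db.uri=http://10.0.0.1:8091 -log.level=info
```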

Metrics

All metrics are listed in resources/metrics.md.

Docker

Use it like this:

docker run --name cbexporter -p 9191:9191 -e CB_EXPORTER_DB_USER=admin -e CB_EXPORTER_DB_PASSWORD=complicatedpassword blakelead/couchbase-exporter:latest
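If you prefer Compose, a hypothetical docker-compose.yml equivalent of the command above could look like this:

```yaml
# Sketch of a docker-compose.yml mirroring the docker run command above.
version: "3"
services:
  cbexporter:
    image: blakelead/couchbase-exporter:latest
    ports:
      - "9191:9191"
    environment:
      CB_EXPORTER_DB_USER: admin
      CB_EXPORTER_DB_PASSWORD: complicatedpassword
```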

Examples

You can find example files in the resources directory.

Prometheus

Some simple alerting rules: resources/prometheus-alerts.yaml.
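To have Prometheus scrape the exporter, a minimal scrape configuration along these lines should work (the job name is illustrative):

```yaml
# Minimal Prometheus scrape config; job name chosen for illustration.
scrape_configs:
  - job_name: couchbase
    static_configs:
      - targets: ['localhost:9191']
```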

Grafana

Minimal dashboard (resources/grafana-dashboard.json):

Systemd

You can adapt and use the provided service template to run the exporter with systemd (resources/couchbase-exporter.service):

sudo mv couchbase-exporter.service /etc/systemd/system/couchbase-exporter.service
sudo systemctl enable couchbase-exporter.service
sudo systemctl start couchbase-exporter.service
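The provided template is not reproduced here; as a rough sketch, a unit file might look like the following, where paths, user, and credentials are assumptions to adapt. See resources/couchbase-exporter.service for the actual template:

```ini
# Hypothetical couchbase-exporter.service; adjust paths, user and
# environment to your setup.
[Unit]
Description=Couchbase Exporter
After=network.target

[Service]
User=couchbase-exporter
Environment=CB_EXPORTER_DB_USER=admin
Environment=CB_EXPORTER_DB_PASSWORD=complicatedpassword
ExecStart=/usr/local/bin/couchbase_exporter -db.uri=http://127.0.0.1:8091
Restart=on-failure

[Install]
WantedBy=multi-user.target
```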

Contributors

Special thanks to:

  • @Berchiche
  • @bitdba88
  • @CharlesRaymond1
  • @pandrieux


couchbase_exporter's Issues

couchbase_export build failed

go build

github.com/leansys-team/couchbase_exporter

./couchbase_exporter.go:62:3: unknown field 'TLSEnabled' in struct literal of type collector.Context
./couchbase_exporter.go:63:3: unknown field 'TLSSkipInsecure' in struct literal of type collector.Context
./couchbase_exporter.go:64:3: unknown field 'TLSCACert' in struct literal of type collector.Context
./couchbase_exporter.go:65:3: unknown field 'TLSClientCert' in struct literal of type collector.Context
./couchbase_exporter.go:66:3: unknown field 'TLSClientKey' in struct literal of type collector.Context

go mod collector source is not new

diff collector/common.go /mnt/e/go/dev/pkg/mod/github.com/blakelead/[email protected]/collector/common.go
9,10d8
< 	"crypto/tls"
< 	"crypto/x509"
40,52c38,45
< 	URI             string
< 	Username        string
< 	Password        string
< 	Timeout         time.Duration
< 	ScrapeCluster   bool
< 	ScrapeNode      bool
< 	ScrapeBucket    bool
< 	ScrapeXDCR      bool
< 	TLSEnabled      bool
< 	TLSSkipInsecure bool
< 	TLSCACert       string
< 	TLSClientCert   string
< 	TLSClientKey    string
---
> 	URI           string
> 	Username      string
> 	Password      string
> 	Timeout       time.Duration
> 	ScrapeCluster bool
> 	ScrapeNode    bool
> 	ScrapeBucket  bool
> 	ScrapeXDCR    bool
121,128d113
< 	tlsClientConfig := &tls.Config{}
< 	if c.TLSEnabled {
< 		tlsClientConfig, err = createTLSClientConfig(c)
< 		if err != nil {
< 			log.Error(err)
< 		}
< 	}
<
130,136c115
< 	client := http.Client{
< 		Timeout: c.Timeout,
< 		Transport: &http.Transport{
< 			TLSClientConfig: tlsClientConfig,
< 		},
< 	}
<
---
> 	client := http.Client{Timeout: c.Timeout}
138d116
<
143d120
<
266,288d242
< }
<
< // createTLSClientConfig loads certificates and create TLS config
< func createTLSClientConfig(c Context) (*tls.Config, error) {
< 	caCert, err := ioutil.ReadFile(c.TLSCACert)
< 	if err != nil {
< 		return nil, err
< 	}
< 	certPool := x509.NewCertPool()
< 	certPool.AppendCertsFromPEM(caCert)
<
< 	keyPair, err := tls.LoadX509KeyPair(c.TLSClientCert, c.TLSClientKey)
< 	if err != nil {
< 		return nil, err
< 	}
<
< 	config := tls.Config{
< 		Certificates:       []tls.Certificate{keyPair},
< 		ClientCAs:          certPool,
< 		InsecureSkipVerify: c.TLSSkipInsecure,
< 	}
<
< 	return &config, nil
Support for Couchbase 6.x

Hi,

is Couchbase 6.x supported?

I haven't tested it yet, thought you probably already know if it's the case or not.

Thanks

Issue with docker image run

I am trying to run the docker image with all default values, but the container is exiting with the error below:

docker run blakelead/couchbase-exporter:0.6.0
time="2019-06-19T11:09:45Z" level=info msg="stat /bin/config.yml: no such file or directory: using command-line parameters and/or environment variables if provided"
time="2019-06-19T11:09:45Z" level=info msg="Couchbase Exporter Version: 0.6.0"
time="2019-06-19T11:09:45Z" level=info msg="Supported Couchbase versions: 4.5.1, 4.6.5, 5.1.1"
time="2019-06-19T11:09:45Z" level=info msg="config.file=config.yml"
time="2019-06-19T11:09:45Z" level=info msg="web.listen-address=9191"
time="2019-06-19T11:09:45Z" level=info msg="web.telemetry-path=/metrics"
time="2019-06-19T11:09:45Z" level=info msg="web.timeout=0s"
time="2019-06-19T11:09:45Z" level=info msg="db.uri=http://127.0.0.1:8091"
time="2019-06-19T11:09:45Z" level=info msg="db.timeout=0s"
time="2019-06-19T11:09:45Z" level=info msg="log.level=info"
time="2019-06-19T11:09:45Z" level=info msg="log.format=text"
time="2019-06-19T11:09:45Z" level=info msg="scrape.cluster=true"
time="2019-06-19T11:09:45Z" level=info msg="scrape.node=true"
time="2019-06-19T11:09:45Z" level=info msg="scrape.bucket=true"
time="2019-06-19T11:09:45Z" level=info msg="scrape.xdcr=false"
time="2019-06-19T11:09:45Z" level=info msg="Started listening at 9191"
time="2019-06-19T11:09:45Z" level=fatal msg="listen tcp: address 9191: missing port in address"

Do you have the Kubernetes deployment file for this?

[xdcr] crash when scraping

go version go1.8.3 linux/amd64
-sh-4.2$ go run *go
INFO[0000] No configuration file was found in the working directory /tmp/go-build797253098/command-line-arguments/_obj/exe 
INFO[0000] Couchbase version: 3.0.1-1444-rel-community  
INFO[0000] Community version: true                      
WARN[0000] Version 3.0.1-1444-rel-community may not be supported by this exporter 
ERRO[0000] Could not read file /tmp/go-build797253098/command-line-arguments/_obj/exe/metrics/cluster-default.json 
ERRO[0000] Error during creation of cluster exporter. Cluster metrics won't be scraped 
INFO[0000] Listening at :9191         

json error

exporter version 0.7.0 (compiled binary not in a container)
couchbase version: 5.0.1

The exporter is handing out metrics BUT throws some errors on various servers:

level=error msg="json: cannot unmarshal number 530.5305305305305 into Go struct field .cmd_get of type int"
level=error msg="json: cannot unmarshal number 8.998001998001998 into Go struct field .diskFetches of type int"
level=error msg="json: cannot unmarshal object into Go struct field BucketData.autoCompactionSettings of type bool"

The new version is not running

Hey there,

I tried the new build and it exits immediately after being created.

I run it with: docker run --name cbexporter -p 9191:9191 -e CB_EXPORTER_TELEMETRY_PATH=/metrics -e CB_EXPORTER_DB_URI=http://10.7.62.132 -e CB_EXPORTER_DB_USER=prometheus -e CB_EXPORTER_DB_PASSWORD=prompwd blakelead/couchbase-exporter:latest

Then I see this: cdfb3009fb0f blakelead/couchbase-exporter:latest "/bin/sh -c /bin/cou…" 6 seconds ago Exited (1) 5 seconds ago cbexporter

Thanks for your input

Node vs Bucket metrics

I have a 3 node cluster. I'm wondering do I need a separate exporter for each and every node to grab each "NODE's" metrics (for example: cb_node_stats_cmd_get)? If my understanding is correct, each exporter would be calculating the same BUCKET and CLUSTER metrics redundantly.

Is there a way to just have 1 exporter running that points to a single node in the cluster, and then can identify all other nodes in that cluster and provide the *NODE metrics breakdown for each node? (Much like we have for BUCKETs, if we have multiple buckets, we report on each bucket from a single exporter).

What am I doing wrong

I tried to start this from the command line in a CentOS container, and I tried to download https://github.com/blakelead/couchbase_exporter/releases/download/0.1.0/couchbase_exporter-0.1.0-linux-amd64.tar.gz. I also tried the premade docker image "blakelead/couchbase-exporter". I am not understanding what to do here. After running the docker image I get the following:
sudo docker run -dit --name dbapocs_cbexp_004 --label triton.network.public="SDC-PCI-Dev-DB" -web.listen-address=":9191" -web.telemetry-path="/metrics" -db.url="http://xx.x.xx.xxx:8091" -db.user="prometheus" -db.pwd="prompwd" blakelead/couchbase-exporter
Password:
unknown shorthand flag: 'b' in -b.url=http://xx.x.xx.xxx:8091

I don't understand why it's not functioning properly.
In the CentOS container I run the command-line one and I see all of the metrics I need in the terminal window. However, when I go to the URL http://server:9191 I am not seeing all the metrics that I see in the terminal window.
[screenshot: 2018-08-22, 9:27:50 am]

I suspect that is the log that is showing there. See screenshot below.

[screenshot: 2018-08-22, 9:17:10 am]

If you would like a screen shot, I can send you one via email. Just let me know.

Thanks,
Michael

Incorrect binary in linux amd64 release

Wrong binary in tar.gz for linux amd64.

The linux-amd64.tar.gz from the release page does not run on alpine:3.9.3.

However, if I clone the repo and build the binary with env GOOS=linux GOARCH=amd64 go build ./couchbase_exporter.go, it runs on alpine with no problem.

This leads me to believe the release tar.gz does not contain the correct binary.

Support Couchbase 6.0

A similar question was raised before. When we start the exporter, we get the warnings below and we do not see any metrics emitted.

[root]# ./couchbase2_exporter -scrape.xdcr=false
INFO[0000] Couchbase version: 6.0.0-1693-community
INFO[0000] Community version: true
WARN[0000] Version 6.0.0-1693-community may not be supported by this exporter
INFO[0000] Listening at :9420

Is TYPE correct for cb_bucket_ep_oom_errors and cb_bucket_ep_tmp_oom_errors ?

I have 558 Couchbase nodes, and over the past week I've only ever seen these metrics increase and plateau, never decrease, even on machines that presently have 85% RAM free. The couchbase_exporter documents them as:

# HELP cb_bucket_ep_oom_errors Number of times unrecoverable OOMs happened while processing operations
# TYPE cb_bucket_ep_oom_errors gauge

I found DataDog documents these metrics as type 'gauge' but I worry these are actually of type counter, because the numbers stay high long after the machine's memory pressure is relieved.
I'm having a terrible time finding mention of "Samples.EpOomErrors" or "Samples.EpTmpOomErrors" in the Couchbase documentation. All I've found is a passing mention to "ep_oom_errors" and how it's a bad thing if you see it at all... and about a dozen other websites that copy-paste that one paragraph.

I'm certain the exporter is correctly relaying the information from Couchbase, but I would like assurance that these metrics are of type gauge, and if so, a more comprehensible description of their meaning. I.e., if these are a gauge measurement of erroneous operations, how many operations were sampled for this gauge? A minute's worth? An hour's worth? If I scrape less often than the sample range, could errors go undetected between scrapes?
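Until the metric type is confirmed, one defensive approach is to alert on growth rather than absolute values, which behaves the same whether the series is a counter or a gauge that only ratchets upward. A PromQL sketch (the window is illustrative):

```promql
# Fires if OOM errors grew in the last 10 minutes, regardless of
# whether the series is a counter or a plateauing gauge.
delta(cb_bucket_ep_oom_errors[10m]) > 0
```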

Errors when running against couchbase 3.0.1 -- timeout exceeded while awaiting headers

Hello,

Thanks for the help earlier -- it can run now without crashing :) One issue though: yes, it runs, but it doesn't actually get any stats when it runs now. I get these errors every time it scrapes (below). When I go to http://:8091/pools/default/buckets/presence/stats/replications I do actually see json, so the endpoints seem to be there. Any ideas? Thanks!

ERRO[0098] Get http://:8091/pools/default/buckets/mwi/stats: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
ERRO[0098] Get http://:8091/pools/default/buckets/presence/stats: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
ERRO[0098] Could not unmarshal bucketstats data for bucket mwi

Collect performance issues

Collect can take a long time eventually failing with a timeout, especially for bucket metrics fetching.

A solution would be to parallelize bucket API requests.

can't identify some couchbase cluster metrics

I am trying to add some Couchbase cluster metrics such as cluster RAM/disk, but I can't identify them, even though they seem to be defined in the metrics directory in the repo. By the way, what should I do if I need to add more custom metrics in the future?

    "name": "cluster",
    "route": "/pools/default",
    "list": [
        { "name": "ram_total_bytes",         "id": "StorageTotals.ram.total",       "description": "Total memory available to the cluster",              "labels": [] },
        { "name": "ram_used_bytes",          "id": "StorageTotals.ram.used",        "description": "Memory used by the cluster",                         "labels": [] },
        { "name": "ram_used_by_data_bytes",  "id": "StorageTotals.ram.usedByData",  "description": "Memory used by the data in the cluster",             "labels": [] },
        { "name": "ram_quota_total_bytes",   "id": "StorageTotals.ram.quotaTotal", "description": "Total memory allocated to Couchbase in the cluster", "labels": [] },
        { "name": "ram_quota_used_bytes",    "id": "StorageTotals.ram.quotaUsed",   "description": "Memory quota used by the cluster",                   "labels": [] },
        { "name": "disk_total_bytes",        "id": "StorageTotals.hdd.total",       "description": "Total disk space available to the cluster",          "labels": [] },
        { "name": "disk_used_bytes",         "id": "StorageTotals.hdd.used",        "description": "Disk space used by the cluster",                     "labels": [] },
        { "name": "disk_quota_total_bytes",  "id": "StorageTotals.hdd.quotaTotal",  "description": "Disk space quota for the cluster",                   "labels": [] },
        { "name": "disk_used_by_data_bytes", "id": "StorageTotals.hdd.usedByData",  "description": "Disk space used by the data in the cluster",         "labels": [] },
        { "name": "disk_free_bytes",         "id": "StorageTotals.hdd.free",        "description": "Free disk space in the cluster",                     "labels": [] },

    ]
}

Do not force installing exporters on each node in cluster

I have read your answer to @shipmak in issue #6 and I read the Prometheus deployment paragraph, but I think this should be an exception here.
It looks as if the exporter is meant to run on each node in the Couchbase cluster, but in our case we have a 6-node cluster (others might have more) and I'm running the exporter as a deployment in a separate Kubernetes cluster.
I don't think it makes sense in my case to have 6 deployments running with only different node variables; plus the redundant bucket metrics DO mess up your queries once you start summing things up and forget about splitting by the "instance" label.
I feel the best solution for this is a self-discovery mechanism for each node in the cluster, reporting them all as a label.
For example, instead of:
cb_node_service_up 1 =>
cb_node_service_up {node="node1"} 1
cb_node_service_up {node="node2"} 1

Also, in the Grafana dashboard that you provided (nice work btw!), the $node variable assumes the same idea of the exporter running on the Couchbase node itself, when in fact the "instance" label is just the Kubernetes node the exporter is running on.

multiple xdcr crashes the exporter

exporter version 0.9.0
couchbase version 5.0.1-5003 (community)

Issue: setting up a Couchbase cluster to replicate to 2 datacenters (each with a distinct cluster id and name) throws the exporter into a strange loop. After a few Prometheus scrapes it uses over 1030 file descriptors and errors out with this for every xdcr metric.

  • collected metric "cb_xdcr_error_count" { label:<name:"destination_bucket" value:"XXXXX" > label:<name:"remote_cluster_id" value:"YYYY" > label:<name:"remote_cluster_name" value:"backup" > label:<name:"source_bucket" value:"ZZZZ" > gauge:<value:0 > } was collected before with the same name and label values

Cluster configuration

Is it necessary to run the exporter against each node in the cluster? Also what permissions are required by the exporter to get statistics?

Make the application more secure

As pointed out by Brian Brazil in the Prometheus developer group, password is passed in an unsecure way.

I'm looking for the best way to address this issue but as I'm not proficient enough, I'm open to advice of any form.
