
pg_exporter's Introduction

PG Exporter

Prometheus exporter for PostgreSQL Server & Pgbouncer metrics.

PG Exporter aims to bring ultimate observability to Pigsty x PostgreSQL. Pigsty is a free RDS-for-PG alternative and battery-included open-source PostgreSQL distribution: Demo & Gallery

PG Exporter is fully customizable, flexible, and extensible. It defines almost all metrics with declarative YAML configuration files, and it's easy to add new metrics or modify existing ones. It is much more flexible than the Prometheus community exporter.

The latest stable version is 0.6.1, which supports PostgreSQL 10 ~ 16+ and Pgbouncer 1.8 ~ 1.21+.



Features

  • Supports Pigsty, the battery-included PostgreSQL distribution with ultimate observability.
  • Supports both Postgres & Pgbouncer (Pgbouncer is detected when the target dbname is pgbouncer).
  • Flexible: Almost all metrics are defined in customizable config files with SQL collectors.
  • Schedule: Fine-grained execution control: timeout, cache, skip, fatality, etc.
  • Dynamic Planning: Define multiple branches for a collector; a specific branch is installed when the server & exporter meet certain conditions.
  • Rich self-monitoring metrics about pg_exporter itself.
  • Auto-discovers multiple databases and runs database-level collectors on them.
  • Tested and verified in real-world production environments for several years (10K+ cores).

Quick Start

To run this exporter, you will need two things:

  • Where to scrape: A Postgres or pgbouncer URL given via PG_EXPORTER_URL or --url.
  • What to scrape: A path to a config file or directory, ./pg_exporter.yml or /etc/pg_exporter.yml by default.
export PG_EXPORTER_URL='postgres://postgres:password@localhost:5432/postgres'
export PG_EXPORTER_CONFIG='/path/to/conf/file/or/dir'
pg_exporter

pg_exporter has only 3 built-in metrics: pg_up, pg_version, and pg_in_recovery. All other metrics are defined in configuration files. You can use the pre-defined configuration file pg_exporter.yml, or use the separated per-metric queries in the conf dir.
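
Once pg_exporter is running, Prometheus can scrape it on the default port. Here is a minimal sketch of a Prometheus scrape job, assuming the default listen address and telemetry path (job name and interval are illustrative):

# Minimal Prometheus scrape job for pg_exporter (illustrative values)
scrape_configs:
  - job_name: pg_exporter
    metrics_path: /metrics            # default --web.telemetry-path
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9630']   # default --web.listen-address is :9630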


Usage

Parameters can be given via command-line arguments or environment variables; a docker-compose sketch wiring them together follows the flag reference below.

  • --web.listen-address is the web endpoint listen address, :9630 by default. This parameter cannot be set via an environment variable.
  • --web.telemetry-path or PG_EXPORTER_TELEMETRY_PATH is the URL path under which to expose metrics.
  • --url or PG_EXPORTER_URL defines where to scrape. It should be a valid DSN or URL. (Note that sslmode=disable must be specified explicitly for databases that do not use SSL.)
  • --config or PG_EXPORTER_CONFIG defines how to scrape. It can be a single YAML file or a directory containing a series of separate YAML configs, which are loaded in alphabetical order.
  • --label or PG_EXPORTER_LABEL defines constant labels that are added to all metrics. It should be a comma-separated list of label=value pairs.
  • --tag or PG_EXPORTER_TAG marks this exporter with the given tags. Tags are a comma-separated list of strings, which can be used for query filtering and execution control.
  • --disable-cache or PG_EXPORTER_DISABLE_CACHE disables the metric cache.
  • --auto-discovery or PG_EXPORTER_AUTO_DISCOVERY automatically spawns peripheral servers for other databases in the target PostgreSQL server, except for those listed in --exclude-database. (Not implemented yet)
  • --exclude-database or PG_EXPORTER_EXCLUDE_DATABASE is a comma-separated list of database names that are not scraped when --auto-discovery is enabled.
  • --namespace or PG_EXPORTER_NAMESPACE defines the internal metrics prefix, pg|pgbouncer by default.
  • --fail-fast or PG_EXPORTER_FAIL_FAST is a flag. During start-up, pg_exporter waits if the target is down; with --fail-fast=true, pg_exporter fails instead of waiting during the start-up procedure if the target is down.
  • --connect-timeout or PG_EXPORTER_CONNECT_TIMEOUT is the timeout for connecting to the target.
  • --dry-run prints the configuration files.
  • --explain actually connects to the target server, plans queries for it, and then explains which queries would be installed.
  • --log.level sets the logging level: one of debug, info, warn, error.
usage: pg_exporter [<flags>]

Flags:
  -h, --[no-]help            Show context-sensitive help (also try --help-long and --help-man).
  -u, --url=URL              postgres target url
  -c, --config=CONFIG        path to config dir or file
      --web.listen-address=:9630 ...
                             Addresses on which to expose metrics and web interface. Repeatable for multiple addresses.
      --web.config.file=""   [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication. See: https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md
  -l, --label=""             constant lables:comma separated list of label=value pair ($PG_EXPORTER_LABEL)
  -t, --tag=""               tags,comma separated list of server tag ($PG_EXPORTER_TAG)
  -C, --[no-]disable-cache   force not using cache ($PG_EXPORTER_DISABLE_CACHE)
  -m, --[no-]disable-intro   disable collector level introspection metrics ($PG_EXPORTER_DISABLE_INTRO)
  -a, --[no-]auto-discovery  automatically scrape all database for given server ($PG_EXPORTER_AUTO_DISCOVERY)
  -x, --exclude-database="template0,template1,postgres"
                             excluded databases when enabling auto-discovery ($PG_EXPORTER_EXCLUDE_DATABASE)
  -i, --include-database=""  included databases when enabling auto-discovery ($PG_EXPORTER_INCLUDE_DATABASE)
  -n, --namespace=""         prefix of built-in metrics, (pg|pgbouncer) by default ($PG_EXPORTER_NAMESPACE)
  -f, --[no-]fail-fast       fail fast instead of waiting during start-up ($PG_EXPORTER_FAIL_FAST)
  -T, --connect-timeout=100  connect timeout in ms, 100 by default ($PG_EXPORTER_CONNECT_TIMEOUT)
  -P, --web.telemetry-path="/metrics"
                             URL path under which to expose metrics. ($PG_EXPORTER_TELEMETRY_PATH)
  -D, --[no-]dry-run         dry run and print raw configs
  -E, --[no-]explain         explain server planned queries
      --log.level="info"     log level: debug|info|warn|error]
      --[no-]version         Show application version.
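
Putting these parameters together, here is a hypothetical docker-compose service that wires pg_exporter via environment variables (the image name, credentials, labels, and paths are illustrative, not an official image):

# Hypothetical docker-compose sketch for running pg_exporter
services:
  pg_exporter:
    image: pg_exporter:latest                 # illustrative image tag
    ports:
      - "9630:9630"                           # default --web.listen-address
    environment:
      PG_EXPORTER_URL: "postgres://postgres:password@db:5432/postgres?sslmode=disable"
      PG_EXPORTER_CONFIG: "/etc/pg_exporter.yml"
      PG_EXPORTER_LABEL: "cls=pg-test,ins=pg-test-1"
      PG_EXPORTER_EXCLUDE_DATABASE: "template0,template1,postgres"
    volumes:
      - ./pg_exporter.yml:/etc/pg_exporter.yml:ro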

API

Here are the pg_exporter REST APIs:

# Fetch metrics (metrics path depends on parameters)
curl localhost:9630/metrics

# Reload configuration
curl localhost:9630/reload

# Explain configuration
curl localhost:9630/explain

# Print Statistics
curl localhost:9630/stat

# Aliveness health check (200 up, 503 down)
curl localhost:9630/up
curl localhost:9630/health
curl localhost:9630/liveness
curl localhost:9630/readiness

# traffic route health check

### 200 if not in recovery, 404 if in recovery, 503 if server is down
curl localhost:9630/primary
curl localhost:9630/leader
curl localhost:9630/master
curl localhost:9630/read-write
curl localhost:9630/rw

### 200 if in recovery, 404 if not in recovery, 503 if server is down
curl localhost:9630/replica
curl localhost:9630/standby
curl localhost:9630/slave
curl localhost:9630/read-only
curl localhost:9630/ro

### 200 if server is ready for read traffic (including primary), 503 if server is down
curl localhost:9630/read
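
These endpoints can back load-balancer or orchestrator health checks. As one possible use, here is a hypothetical Kubernetes probe fragment for a pg_exporter container, assuming the default port:

# Illustrative Kubernetes probes using the pg_exporter health endpoints
livenessProbe:
  httpGet:
    path: /liveness                  # 200 when the exporter process is alive
    port: 9630
readinessProbe:
  httpGet:
    path: /readiness                 # 200 when the target server is up, 503 when down
    port: 9630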

Build

Build on your local machine:

go build

To build a static stand-alone binary for a docker scratch image:

CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o pg_exporter

To build a docker image, use:

make docker

Or download the latest prebuilt binaries from release pages.


Deployment

RedHat rpm and Debian/Ubuntu deb packages are made with nfpm.
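
For reference, a minimal nfpm.yaml sketch for such a package might look like the following; the field values are illustrative and not the project's actual packaging spec:

# Illustrative nfpm packaging sketch (not the shipped nfpm.yaml)
name: pg_exporter
arch: amd64
platform: linux
version: 0.6.1
maintainer: "Example Maintainer <maintainer@example.com>"   # placeholder
description: Prometheus exporter for PostgreSQL & Pgbouncer metrics
license: Apache-2.0
contents:
  - src: ./pg_exporter
    dst: /usr/bin/pg_exporter
  - src: ./pg_exporter.yml
    dst: /etc/pg_exporter.yml
    type: config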


Configuration

Configuration lies at the core of pg_exporter. In fact, this project contains more lines of YAML than Go.

The current pg_exporter ships with a set of metric collector definition files (see the conf directory).

Note

Supported version: PostgreSQL 10, 11, 12, 13, 14, 15, 16+

But you can still get PostgreSQL 9.4, 9.5, 9.6 support by switching to the older versions of the collector definitions.

pg_exporter will generate approximately 200~300 metrics for a completely new database cluster. For a real-world database with 10 ~ 100 tables, it may generate 1k ~ 10k metrics. You may need to modify or disable some database-level metrics on databases with several thousand or more tables in order to complete the scrape in time.
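
For example, an expensive object-level collector can be turned off with the skip flag described in the sample config below. This is only an illustrative sketch, not one of the shipped definitions:

# Illustrative: disable a per-table collector to keep scrape time bounded
pg_table_size_demo:
  name: pg_table_size_demo
  desc: per-table size metrics, disabled here to keep scrape time bounded
  query: SELECT relname, pg_total_relation_size(oid) AS bytes FROM pg_class WHERE relkind = 'r';
  skip: true        # collector will not be installed during planning
  metrics:
    - relname:
        usage: LABEL
    - bytes:
        usage: GAUGE
        description: total relation size in bytes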

Config files use the YAML format. There are lots of examples in the conf dir, and here is a sample config.

#################################################################
#┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓#
#┃ PostgreSQL/Pgbouncer Metric Collector Definition            ┃#
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┫#
#┃ Author:   Vonng ([email protected])                   ┃#
#┃ Desc  :   pg_exporter metrics collector definition          ┃#
#┃ Ver   :   PostgreSQL 10~14 pgbouncer 1.9+                   ┃#
#┃ Ctime :   2019-12-09                                        ┃#
#┃ Mtime :   2022-04-27                                        ┃#
#┃ Copyright (C) 2019-2022 Ruohang Feng                        ┃#
#┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛#
#################################################################



#┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#┃ 1. Configuration File
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ The configuration file for pg_exporter is a YAML file.
#┃ Default configuration is retrieved via the following precedence:
#┃     1. command line args:      --config=<config path>
#┃     2. environment variables:  PG_EXPORTER_CONFIG=<config path>
#┃     3. pg_exporter.yml        (Current directory)
#┃     4. /etc/pg_exporter.yml   (etc config file)
#┃     5. /etc/pg_exporter       (etc config dir)
#┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#┃ 2. Config Format
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ pg_exporter config could be a single YAML file, or a directory containing a series of separated YAML files.
#┃ Each YAML config file consists of one or more metric collector definitions, which are top-level objects.
#┃ If a directory is provided, all YAML files in that directory will be merged in alphabetical order.
#┃ Collector definition examples are shown below.
#┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#┃ 3. Collector Example
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃  # Here is an example of a metrics collector definition
#┃  pg_primary_only:       <---- Collector branch name. Must be UNIQUE among the entire configuration
#┃    name: pg             <---- Collector namespace, used as METRIC PREFIX, set to branch name by default, can be overridden
#┃                               The same namespace may contain multiple collector branches. It's the user's responsibility
#┃                               to make sure that AT MOST ONE collector is picked for each namespace.
#
#┃    desc: PostgreSQL basic information (on primary)                 <---- Collector description
#┃    query: |                                                        <---- SQL string
#
#┃      SELECT extract(EPOCH FROM CURRENT_TIMESTAMP)                  AS timestamp,
#┃             pg_current_wal_lsn() - '0/0'                           AS lsn,
#┃             pg_current_wal_insert_lsn() - '0/0'                    AS insert_lsn,
#┃             pg_current_wal_lsn() - '0/0'                           AS write_lsn,
#┃             pg_current_wal_flush_lsn() - '0/0'                     AS flush_lsn,
#┃             extract(EPOCH FROM now() - pg_postmaster_start_time()) AS uptime,
#┃             extract(EPOCH FROM now() - pg_conf_load_time())        AS conf_reload_time,
#┃             pg_is_in_backup()                                      AS is_in_backup,
#┃             extract(EPOCH FROM now() - pg_backup_start_time())     AS backup_time;
#
#┃                             <---- [OPTIONAL] metadata fields, control collector behavior
#┃    ttl: 10                  <---- Cache TTL: in seconds, how long will pg_exporter cache this collector's query result.
#┃    timeout: 0.1             <---- Query Timeout: in seconds, query that exceed this limit will be canceled.
#┃    min_version: 100000      <---- minimal supported version, boundary IS included. In server version number format,
#┃    max_version: 130000      <---- maximal supported version, boundary NOT included, In server version number format
#┃    fatal: false             <---- If a collector marked `fatal` fails, the entire scrape aborts immediately and is marked as failed
#┃    skip: false              <---- Collector marked `skip` will not be installed during the planning procedure
#
#┃    tags: [cluster, primary] <---- list of strings, which could be:
#┃                               * 'cluster' marks this query as cluster level, so it will only execute once for the same PostgreSQL Server
#┃                               * 'primary' or 'master'  mark this query can only run on a primary instance (WILL NOT execute if pg_is_in_recovery())
#┃                               * 'standby' or 'replica' mark this query can only run on a replica instance (WILL execute if pg_is_in_recovery())
#┃                             some special tag prefix have special interpretation:
#┃                               * 'dbname:<dbname>' means this query will ONLY be executed on database with name '<dbname>'
#┃                               * 'username:<user>' means this query will only be executed when connect with user '<user>'
#┃                               * 'extension:<extname>' means this query will only be executed when extension '<extname>' is installed
#┃                               * 'schema:<nspname>' means this query will only by executed when schema '<nspname>' exist
#┃                               * 'not:<negtag>' means this query WILL NOT be executed when exporter is tagged with '<negtag>'
#┃                               * '<tag>' means this query WILL be executed when exporter is tagged with '<tag>'
#┃                               ( <tag> could not be cluster,primary,standby,master,replica,etc...)
#
#
#┃    metrics:                 <---- List of returned columns, each column must have a `name` and `usage`, `rename` and `description` are optional
#┃      - timestamp:           <---- Column name, should be exactly the same as returned column name
#┃          usage: GAUGE       <---- Metric type, `usage` could be
#┃                                    * DISCARD: completely ignoring this field
#┃                                    * LABEL:   use columnName=columnValue as a label in metric
#┃                                    * GAUGE:   Mark column as a gauge metric, full name will be '<query.name>_<column.name>'
#┃                                    * COUNTER: Same as above, except it is a counter rather than a gauge.
#┃          rename: ts         <---- [OPTIONAL] Alias, optional, the alias will be used instead of the column name
#┃          description: xxxx  <---- [OPTIONAL] Description of the column, will be used as a metric description
#┃          default: 0         <---- [OPTIONAL] Default value, will be used when column is NULL
#┃          scale:   1000      <---- [OPTIONAL] Scale the value by this factor
#┃      - lsn:
#┃          usage: COUNTER
#┃          description: log sequence number, current write location (on primary)
#┃      - insert_lsn:
#┃          usage: COUNTER
#┃          description: primary only, location of current wal inserting
#┃      - write_lsn:
#┃          usage: COUNTER
#┃          description: primary only, location of current wal writing
#┃      - flush_lsn:
#┃          usage: COUNTER
#┃          description: primary only, location of current wal syncing
#┃      - uptime:
#┃          usage: GAUGE
#┃          description: seconds since postmaster start
#┃      - conf_reload_time:
#┃          usage: GAUGE
#┃          description: seconds since last configuration reload
#┃      - is_in_backup:
#┃          usage: GAUGE
#┃          description: 1 if backup is in progress
#┃      - backup_time:
#┃          usage: GAUGE
#┃          description: seconds since the current backup started, null if there is none
#
#
#┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#┃ 4. Collector Presets
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ pg_exporter is shipped with a series of preset collectors (already numbered and ordered by filename)
#
#┃ 1xx  Basic metrics:        basic info, metadata, settings
#┃ 2xx  Replication metrics:  replication, walreceiver, downstream, sync standby, slots, subscription
#┃ 3xx  Persist metrics:      size, wal, background writer, checkpoint, recovery, cache, shmem usage
#┃ 4xx  Activity metrics:     backend count group by state, wait event, locks, xacts, queries
#┃ 5xx  Progress metrics:     clustering, vacuuming, indexing, basebackup, copy
#┃ 6xx  Database metrics:     pg_database, publication, subscription
#┃ 7xx  Object metrics:       pg_class, table, index, function, sequence, default partition
#┃ 8xx  Optional metrics:     optional metrics collector (disable by default, slow queries)
#┃ 9xx  Pgbouncer metrics:    metrics from pgbouncer admin database `pgbouncer`
#
#┃ 100-599 Metrics for entire database cluster  (scrape once)
#┃ 600-899 Metrics for single database instance (scraped for each database, except for pg_db itself)
#
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 5. Cache TTL
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ Cache can be used for reducing query overhead, it can be enabled by setting a non-zero value for `ttl`
#┃ It is highly recommended to use the cache to avoid duplicate scrapes, especially when multiple Prometheus
#┃ servers scrape the same instance with slow monitoring queries. Setting `ttl` to zero or leaving it blank
#┃ disables result caching, which is the default behavior.
#
#┃ TTL has to be smaller than your scrape interval. 15s scrape interval and 10s TTL is a good start for
#┃ production environment. Some expensive monitoring queries (such as size/bloat check) will have longer `ttl`
#┃ which can also be used as a mechanism to achieve 'different scrape frequency'
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 6. Query Timeout
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ Collectors can be configured with an optional timeout. If the collector's query runs longer than that
#┃ timeout, it is canceled immediately. Setting `timeout` to 0 or leaving it blank resets it to the default
#┃ timeout of 0.1 (100ms). Setting it to any negative number disables the query timeout feature.
#┃ All queries have a default timeout of 100ms; if exceeded, the query is canceled immediately to avoid an
#┃ avalanche. You can explicitly override that option, but beware: in some extreme cases, if all your
#┃ timeouts sum up to more than your scrape/cache interval (usually 15s), queries may still be jammed.
#┃ Alternatively, you can just disable potentially slow queries.
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 7. Version Compatibility
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ Each collector has two optional version compatibility parameters: `min_version` and `max_version`.
#┃ These two parameters specify the version compatibility of the collector. If target postgres/pgbouncer
#┃ version is less than `min_version`, or higher than `max_version`, the collector will not be installed.
#┃ These two parameters use the PostgreSQL server version number format, which is a 6-digit integer
#┃ formatted as <major: 2 digits><minor: 2 digits><release: 2 digits>.
#┃ For example, 090600 stands for 9.6 and 120100 stands for 12.1.
#┃ Beware that the version compatibility range is left-inclusive, right-exclusive: [min, max); setting a bound
#┃ to zero or leaving it blank is treated as -inf or +inf respectively.
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 8. Fatality
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ If a collector marked `fatal` fails, the entire scrape operation is marked as failed and the key metrics
#┃ `pg_up` / `pgbouncer_up` are reset to 0. It is always good practice to set up AT LEAST ONE fatal
#┃ collector for pg_exporter. `pg.pg_primary_only` and `pgbouncer_list` are the default fatal collectors.
#
#┃ If a collector without the `fatal` flag fails, it increases the global fail counters, but the scrape operation
#┃ carries on. The entire scrape result is not marked as failed, and thus the `<xx>_up` metric is not affected.
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 9. Skip
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ A collector with the `skip` flag set to true will NOT be installed.
#┃ This is a handy option for disabling collectors.
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ 10. Tags and Planning
#┣┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
#┃ Tags are designed for collector planning & scheduling. They make it easy to customize which queries run
#┃ on which instances, so you can use a single monolithic config for multiple environments.
#
#┃  Tags are a list of strings, each string could be:
#┃  Pre-defined special tags
#┃    * 'cluster' marks this collector as cluster level, so it will ONLY BE EXECUTED ONCE for the same PostgreSQL Server
#┃    * 'primary' or 'master' mark this collector as primary-only, so it WILL NOT run if pg_is_in_recovery()
#┃    * 'standby' or 'replica' mark this collector as replica-only, so it will ONLY run if pg_is_in_recovery()
#┃  Special tag prefix which have different interpretation:
#┃    * 'dbname:<dbname>' means this collector will ONLY work on database with name '<dbname>'
#┃    * 'username:<user>' means this collector will ONLY work when connect with user '<user>'
#┃    * 'extension:<extname>' means this collector will ONLY work when extension '<extname>' is installed
#┃    * 'schema:<nspname>' means this collector will only work when schema '<nspname>' exists
#┃  Customized positive tags (filter) and negative tags (taint)
#┃    * 'not:<negtag>' means this collector WILL NOT work when exporter is tagged with '<negtag>'
#┃    * '<tag>' means this query WILL work if exporter is tagged with '<tag>' (special tags not included)
#
#┃  pg_exporter triggers the planning procedure after connecting to the target. It gathers database facts
#┃  and matches them against tags and other metadata (such as the supported version range). A collector
#┃  is installed only if it is compatible with the target server.
#┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
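
To make the format above concrete, here is a small standalone collector definition that exercises the metadata fields. The collector name, query, and thresholds are illustrative, not one of the shipped presets:

# Illustrative collector definition (not a shipped preset)
pg_demo:
  name: pg_demo
  desc: illustrative cluster-level collector with cache, timeout and version bounds
  query: |
    SELECT extract(EPOCH FROM now() - pg_postmaster_start_time()) AS uptime,
           pg_is_in_recovery()::int                               AS is_in_recovery;
  ttl: 10              # cache results for 10 seconds
  timeout: 0.1         # cancel the query after 100ms
  min_version: 100000  # requires PostgreSQL 10.0 or newer
  fatal: false
  skip: false
  tags: [cluster]      # run once per server, not once per database
  metrics:
    - uptime:
        usage: GAUGE
        description: seconds since postmaster start
    - is_in_recovery:
        usage: GAUGE
        description: 1 if the server is in recovery
        default: 0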

About

Author: Vonng ([email protected])

License: Apache License Version 2.0

Copyright: 2018-2023 [email protected]

pg_exporter's People

Contributors

dependabot[bot], dovreshef, japinli, maxgabriel, mjf, mouchar, vancona, vonng


pg_exporter's Issues

Allow to collect metrics from multiple sources

Hello. Thank you for the unique Postgres exporter first! 🥇 IMHO, it could be improved even further (with maybe quite a low effort in Golang) if you added a higher level of configuration allowing us to configure the exporter to collect metrics from multiple sources, distinguished by some "top level" configuration of either tags (e.g. a hostname), metric prefix (e.g. pg1, pg2), or both of them.

I don't like the idea of having a separate exporter instance running for every single Postgres server I happen to manage. Also, some of the clusters are replicated across distant locations (which indeed calls for collecting metrics from a far point or from multiple points). Sometimes it's viable to run the exporter on the same server as the Postgres cluster, and sometimes it would be really nice to monitor many Postgres instances belonging to a single cluster spread across various providers etc. from a single point.

Side note:

I am also playing with a (still vague) idea that having data from a replicated cluster collected in one place could lead to even further development of the exporter (e.g. so that we could use collected metrics in some sort of templating of the probes to gain additional flexibility and dynamicity of the monitoring process, or to slow down collection under certain conditions, or to add something like a priority queue to the scheduler, or perhaps even write some results back to the database in a "loop" or to a completely different database from the monitored ones; well, it's just SQL, and if it was templated, as well as the check parameters being templated based on some measurements, then...).

😁

AFAIK, no Postgres collector is as finely grained as this one, and none that I know of allows collecting metrics from multiple sources at once and making some use of that fact (if nothing else, then at least "administratively" or by saving overall resources). Therefore I think it would be a great improvement. Thank you.

fail connecting to primary server: fail fetching server version

Thanks for the great work on this! I'm running pg_exporter and I'm hitting an error in the precheck steps.

$ pg_exporter
INFO[0000] retrieve target url  from PG_EXPORTER_URL     source="pg_exporter.go:1938"
INFO[0000] retrieve config path pg_exporter.yaml from PG_EXPORTER_CONFIG  source="pg_exporter.go:2009"
ERRO[0000] fail connecting to primary server: fail fetching server version: driver: bad connection, retrying in 10s  source="pg_exporter.go:1517"

This appears to be where the query in question, SHOW server_version_num;, is made.

When I connect directly using psql at the same URI, I'm able to run it without issue.

$ psql $PG_EXPORTER_URL
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1), server 14.0 (Debian 14.0-1.pgdg110+1))
WARNING: psql major version 12, server major version 14.
         Some psql features might not work.
Type "help" for help.

postgres=# show server_version_num;
 server_version_num 
--------------------
 140000
(1 row)

I'm not sure how/if it matters but I'm accessing this db over a TCP proxy using kubectl proxy. It doesn't seem to impact any other postgres clients but worth mentioning.

What am I missing?

error with PG_EXPORTER_NAMESPACE="pg"

Hi,

When I put export PG_EXPORTER_NAMESPACE="pg" in the default env file, there is an error:
...pg_exporter[3999]: pg_exporter: error: strconv.ParseBool: parsing "pg": invalid syntax, try --help

If I exclude PG_EXPORTER_NAMESPACE but set --namespace=pg in PG_EXPORTER_OPTS, it works:
PG_EXPORTER_OPTS='--namespace=pg --log.level=info --log.format="logger:syslog?appname=pg_exporter&local=7"'

TLS/SSL support

Are there any plans for the exporter output to have TLS/SSL support? Just looking through docs and didn't see it mentioned.

Add option to test selected metric(s) and also options to select metric(s) to test, explain and configuration dump

Hello. I suggest adding:

Flags
       [...]

       --test
           Testing run (single pass run returning collected '/metrics' to the standard output to be revised).
           To limit the test run to a certain list of metrics use the '-M' option.

       -M, --metrics LIST
           Limit testing run (see the '--test' option) to a list of comma (',') separated metric names.
           Only the SQL queries of the selected metrics will be run (eg. '--test --metrics pg_up,pg_uptime'
           will run the SQL queries of the 'pg_up' and 'pg_uptime' metrics).

       [...]

I also suggest applying --metrics LIST to --explain similarly, if possible.

The --dry-run option should not produce any output other than e.g. warnings or errors. In fact, it must not even try to connect to Postgres. It is, after all, a "dry run" (!), and everybody expects it to do literally nothing, no real action (which a connection certainly is).

To dump the actual configuration, I suggest providing another option, say --show-config, --dump-config, --format-config or similar, and applying --metrics LIST to it as well.

That way you could easily

  1. see what's configured for a certain metric (eg. --show-config -M pg_up);
  2. check if there is no error or discrepancy in the configuration (eg. --dry-run);
  3. test the metric of interest (eg. --test -M pg_up).

The exporter should return proper exit codes so that these features could be used in scripts etc.

Whether to also provide environment variables for the suggested options remains a question. I generally do not see any point in not doing so (it could be good for container users, who tend to use environment variables).


I think all these things are somewhat related to each other, so I aggregated them into this single feature request; sorry for that.

Add includeDatabase option

There should be two kinds of multiple-database support policies:

  • List target database with --include-database
    • can be a comma separated datname list
    • can be a regex used for name matching
  • Exclude specific database with --exclude-database

Histogram Support

Histograms are perfect for lock & session duration distributions.
e.g.:

pg_lock_histo{le=0}
pg_lock_histo{le=10}
pg_lock_histo{le=50}
pg_lock_histo{le=100}
pg_lock_histo{le=1000}
...

PG_EXPORTER_INCLUDE_DATABASE doesn't work

When I set PG_EXPORTER_AUTO_DISCOVERY = true
to discover the other databases, they absolutely must be listed in --include_database.
The PG_EXPORTER_INCLUDE_DATABASE parameter is not supported.
Why is it necessary to have the list of databases in this parameter?
For example, if it is empty, then all the databases which are not in exclude_database should be taken.

Add Column.Default option to set default value when null

It would be convenient to have the ability to define a default value for metrics when NULL is acquired.

E.g. lots of metrics would be better off with 0 instead of NaN.

While using coalesce(col, 0) could work, it's just too ugly and hacky.

It would be nice to do it in configuration:

    - exec_time:
        usage: COUNTER
        default: 0
        description: Total time spent executing the statement, in µs

Built-in metrics definition

Embed a default metrics definition into pg_exporter which supports PG 10 - 14.

  • Embed static config into binary with Go 1.16 embed
  • Add an option to disable the default metrics.
  • Add an option to append new collectors rather than overwrite the default collectors.

Dial tcp: lookup postgres on 127.0.0.53:53: server misbehaving, retrying in 10s source="pg_exporter.go:1379"

When I run the following command.

pg_exporter --url=postgresql://user:password@localhost:5432/?sslmode=disable&host=/var/run/postgresql --config=pg_exporter.yaml

I get the following error

INFO[0000] retrieve target url postgresql://user:password@localhost:5432/?sslmode=disable from command line  source="pg_exporter.go:1626"
INFO[0000] fallback on default config path: pg_exporter.yaml  source="pg_exporter.go:1666"
ERRO[0000] fail connecting to primary server: fail fetching server version: dial tcp: lookup postgres on 127.0.0.53:53: server misbehaving, retrying in 10s  source="pg_exporter.go:1379"
ERRO[0010] fail connecting to primary server: fail fetching server version: dial tcp: lookup postgres on 127.0.0.53:53: server misbehaving, retrying in 10s  source="pg_exporter.go:1383"
ERRO[0020] fail connecting to primary server: fail fetching server version: dial tcp: lookup postgres on 127.0.0.53:53: server misbehaving, retrying in 10s  source="pg_exporter.go:1383"

I am using Ubuntu 20.04 Focal Fossa LTS.
pg_exporter version being used: v0.2.0
Any assistance will be highly appreciated.

0.5.0 coredumps

The upx-compressed binary in the rpm coredumps. Environment: 5.14.0-162.23.1.el9_1.x86_64.
A manually compiled binary without upx works fine.

Error when scraping with 2 databases

Having 2 databases assigned, when I try to extract the metrics I have the following error:
An error has occurred while serving metrics:

11 error(s) occurred:

  • collected metric "pg_bgwriter_checkpoints_timed" {counter: <value: 73>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_checkpoints_req" {counter: <value: 7>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_checkpoint_write_time" {counter: <value: 51>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_checkpoint_sync_time" {counter: <value: 17>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_buffers_checkpoint" {counter: <value: 84>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_buffers_clean" {counter: <value: 0>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_buffers_backend" {counter: <value: 48>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_maxwritten_clean" {counter: <value: 0>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_buffers_backend_fsync" {counter: <value: 0>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_buffers_alloc" {counter: <value: 1516>} was collected before with the same name and label values
  • collected metric "pg_bgwriter_stats_reset" {counter: <value: 1.621928245e + 09>} was collected before with the same name and label values

Add pg_exporter connect PG instance timeout setting

If we deploy Pigsty to monitor PG instances in the same datacenter, the pg_exporter connect timeout of 100ms is fine.

But in the real prod env, we will have multi-region PG instances. If we deploy one Pigsty to monitor all the PG instances in the different regions, pg_exporter will report this error:

Nov 24 10:27:07 staging-gcp-sg-vm-platform-pigsty-1 pg_exporter_staging-gcp-hk-pgsql12-platform-1-1[28699]: time="2021-11-24T10:27:07Z" level=error msg="fail connecting to primary server: fail fetching server version: driver: bad connection, retrying in 10s" source="pg_exporter.go:1521"
Nov 24 10:27:07 staging-gcp-sg-vm-platform-pigsty-1 pg_exporter: time="2021-11-24T10:27:07Z" level=error msg="fail connecting to primary server: fail fetching server version: driver: bad connection, retrying in 10s" source="pg_exporter.go:1521"

But in fact, we can use the psql command line to connect to the PG instance from the Pigsty host.

The timeout config in pg_exporter is shown in the attached screenshots.

As we discussed:

Queries over 100ms are actively canceled and the scrape is judged as failed, to avoid an avalanche. I hadn't anticipated the cross-datacenter scraping case before.
From this packet capture, it looks like it hits exactly the 100ms threshold.
The result is already returned at +150ms, but the request still errors out because of the timeout.

-- But for this timeout threshold, could the next release make it an optional parameter? Keep 100ms as the default, and allow increasing it for cross-region cases, e.g. to 1s, which should still be enough for monitoring. The benefit is that one Pigsty deployed on a VM in one region could monitor the PG instances of all regions.

--
Reasonable.
Feel free to open an issue for me: https://github.com/Vonng/pg_exporter
I'll change it in the next release.

So I suggest that pg_exporter allow setting the timeout threshold.
By default, the value is 100ms.
In special environments, such as using one Pigsty to monitor multi-datacenter PG instances, we can increase the value.

PostgreSQL 14 Support

Add support for PostgreSQL new metrics:

  • pg_stat_database
  • pg_locks
  • pg_stat_wal
  • pg_stat_replication_slots
  • pg_stat_progress_copy
  • pg_prepared_statements

IMPROVEMENT: Respond with fake pg_up metrics before planning

Since queries (collectors) are dynamically planned, PgExporter needs a live server to fetch facts from.

This leads to a dilemma: if the target PostgreSQL instance is dead, pg_exporter cannot gather facts and thus will abort or wait for the target to come online, depending on the config parameters. But sometimes we may want pg_exporter to report the situation that the target postgres instance is down.

There's a workaround: create a dummy server (responding with a constant pg_up 0) while PgExporter is waiting for the dead server to come online, and destroy that server after PgExporter successfully connects to the target server.

Add Column.Scale to multiply a factor to scraped metrics

It would be convenient to have the ability to scale a metric's value by a certain factor.

E.g. the original metric is in µs, while the common practice is to transform such metrics into a standard unit, such as seconds.

It would be annoying to do so in the original raw SQL, but it would be great to do it in configuration:

    - exec_time:
        usage: COUNTER
        scale: 1e-6
        description: Total time spent executing the statement, in µs
