teraslice-exporter's Introduction

Teraslice Job Exporter README

Note: This exporter is only meant for use with Teraslice using Kubernetes clustering. It hasn't been tested with Teraslice running in Native clustering mode.

Usage

So far it works like this:

TERASLICE_URL="https://localhost" \
  TERASLICE_DISPLAY_URL="https://teraslice-xyz.lan" \
  DEBUG=True \
  NODE_EXTRA_CA_CERTS=/path/to/ca.crt \
  node dist/index.js | bunyan

All options are passed as environment variables:

TERASLICE_URL="https://localhost" \
  TERASLICE_DISPLAY_URL="https://teraslice-xyz.lan" \
  DEBUG=True \
  NODE_EXTRA_CA_CERTS=/path/to/ca.crt \
  PORT=4242 \
  TERASLICE_QUERY_DELAY=90000 \
  node dist/index.js | bunyan

TERASLICE_URL is the only required environment variable.

Environment variables

  • TERASLICE_URL - URL of the Teraslice instance to monitor (required)
  • TERASLICE_DISPLAY_URL - Optional override of TERASLICE_URL, used for metric labels only
  • DEBUG - Enable debug logging
  • NODE_EXTRA_CA_CERTS - Standard Node.js variable that specifies extra CA certificates for TLS connections
  • PORT - The port that the Express HTTP server listens on
  • TERASLICE_QUERY_DELAY - The delay between updates of the Teraslice stats, in milliseconds
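
As an illustration of how the variables above could be read and validated, here is a minimal sketch. It is not taken from the exporter's source: the `ExporterConfig` shape, the `loadConfig` and `parsePositiveInt` helpers, and the default values for `PORT` (3000) and `TERASLICE_QUERY_DELAY` (30000) are all assumptions made for the example.

```typescript
// Hypothetical config shape; field names and defaults are assumptions.
interface ExporterConfig {
  terasliceUrl: string;
  displayUrl: string;
  debug: boolean;
  port: number;
  queryDelayMs: number;
}

type Env = Record<string, string | undefined>;

// Parses a positive integer, falling back to a default when the variable is unset.
function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): ExporterConfig {
  const terasliceUrl = env.TERASLICE_URL;
  if (!terasliceUrl) {
    // TERASLICE_URL is the only required variable.
    throw new Error("TERASLICE_URL is required");
  }
  return {
    terasliceUrl,
    // The display URL only affects metric labels; queries still use terasliceUrl.
    displayUrl: env.TERASLICE_DISPLAY_URL || terasliceUrl,
    debug: (env.DEBUG ?? "").toLowerCase() === "true",
    port: parsePositiveInt("PORT", env.PORT, 3000), // assumed default
    queryDelayMs: parsePositiveInt("TERASLICE_QUERY_DELAY", env.TERASLICE_QUERY_DELAY, 30000), // assumed default
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.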

Docker

Build the docker image:

docker build -t teraslice-exporter:v0.4.0 .

Run the docker image:

docker run --rm -p 3000:3000 \
    -e TERASLICE_URL="http://url.to.teraslice/" \
    teraslice-exporter:v0.4.0 | bunyan

Design

The exporter scrapes several of the Teraslice API endpoints every TERASLICE_QUERY_DELAY milliseconds and updates its exported metrics once each scrape has completed.
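
One way to picture the scrape cycle described above is a loop that runs an update, waits for the configured delay, and repeats. The sketch below is an assumption about the general shape, not the exporter's actual implementation. `pollLoop`, `Sleep`, and `shouldStop` are hypothetical names, and the real code may schedule updates differently (for example with a timer).

```typescript
type Sleep = (ms: number) => Promise<void>;

// Runs `update`, waits `delayMs`, and repeats until `shouldStop` returns true.
// Waiting only after `update` resolves means two scrapes never overlap.
async function pollLoop(
  update: () => Promise<void>,
  delayMs: number,
  sleep: Sleep,
  shouldStop: () => boolean,
): Promise<void> {
  while (!shouldStop()) {
    await update(); // scrape the API endpoints and refresh the metrics
    await sleep(delayMs); // corresponds to TERASLICE_QUERY_DELAY
  }
}
```

A real `sleep` can be built from `setTimeout` wrapped in a promise. Passing it in as a parameter lets tests substitute an instant stub.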

# HELP teraslice_controller_slicers_count Number of execution controllers (slicers) running for this execution.
# HELP teraslice_controller_slices_failed Number of slices failed.
# HELP teraslice_controller_slices_processed Number of slices processed.
# HELP teraslice_controller_slices_queued Number of slices queued for processing.
# HELP teraslice_controller_workers_active Number of Teraslice workers actively processing slices.
# HELP teraslice_controller_workers_available Number of Teraslice workers running and waiting for work.
# HELP teraslice_controller_workers_disconnected Total number of Teraslice workers that have disconnected from execution controller for this job.
# HELP teraslice_controller_workers_joined Total number of Teraslice workers that have joined the execution controller for this job.
# HELP teraslice_controller_workers_reconnected Total number of Teraslice workers that have reconnected to the execution controller for this job.
# HELP teraslice_execution_cpu_limit CPU core limit for a Teraslice worker container.
# HELP teraslice_execution_cpu_request Requested number of CPU cores for a Teraslice worker container.
# HELP teraslice_execution_created_timestamp_seconds Execution creation time.
# HELP teraslice_execution_info Information about Teraslice execution.
# HELP teraslice_execution_memory_limit Memory limit for Teraslice a worker container.
# HELP teraslice_execution_memory_request Requested amount of memory for a Teraslice worker container.
# HELP teraslice_execution_slicers Number of slicers defined on the execution.
# HELP teraslice_execution_status Current status of the Teraslice execution.
# HELP teraslice_execution_updated_timestamp_seconds Execution update time.
# HELP teraslice_execution_workers Number of workers defined on the execution.  Note that the number of actual workers can differ from this value.
# HELP teraslice_master_info Information about the Teraslice master node.
# HELP teraslice_query_duration Total time to complete the named query, in ms.
# HELP teraslice_exporter_errors Number of errors encountered by teraslice exporter.

teraslice-exporter's People

Contributors

briend, busma13, godber

teraslice-exporter's Issues

Capture error responses as a metric and log rather than crash

When teraslice-exporter gets a bad response from teraslice-master, the exporter crashes. Ideally these events would be caught, logged, and counted in the exporter's own metrics instead of exiting the process.

Example log and crash

> @0.2.0 start /app/source
> node dist/index.js
{
  "name": "teraslice_exporter",
  "terasliceUrl": "http://localhost:5678/",
  "hostname": "teraslice-foo-master-5c64d8c9f4-fvf6g",
  "pid": 19,
  "level": 50,
  "msg": "Error encountered getting terasliceStats: Error: Error getting http://localhost:5678/v1/cluster/controllers: HTTPError: Response code 500 (Internal Server Error)",
  "time": "2023-11-30T19:47:08.295Z",
  "v": 0
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! @0.2.0 start: `node dist/index.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the @0.2.0 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2023-11-30T19_47_08_318Z-debug.log
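
The behaviour requested in this issue could look roughly like the sketch below: wrap each update cycle, log the failure, and increment an error count instead of letting the rejection end the process. Everything here is hypothetical. `safeUpdate`, `countError`, and the in-memory `errorCounts` map stand in for whatever the exporter actually uses (a real implementation would more likely increment a Prometheus counter). The `update_metrics_errors` value is borrowed from the teraslice_exporter_errors example that appears elsewhere in these issues.

```typescript
// Hypothetical in-memory counter standing in for a real Prometheus counter.
const errorCounts = new Map<string, number>();

function countError(errorType: string): void {
  errorCounts.set(errorType, (errorCounts.get(errorType) ?? 0) + 1);
}

// Runs one update cycle. On failure the error is logged and counted,
// and the function resolves to false instead of rethrowing.
async function safeUpdate(
  update: () => Promise<void>,
  logError: (msg: string) => void,
): Promise<boolean> {
  try {
    await update();
    return true;
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    logError(`Error encountered getting terasliceStats: ${msg}`);
    countError("update_metrics_errors");
    return false;
  }
}
```

With this shape, a 500 response from the Teraslice master produces one log line and one counted error, and the next scheduled update still runs.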

Add tests

Add test coverage. A good start would be making sure the terasliceStats.update() and updateTerasliceMetrics() function calls return the proper results or error responses. We can use nock to mock the responses from teraslice.

This exporter crashes if pointed at a Teraslice cluster without state indices

When this exporter is bundled in a k8s pod with Teraslice and that Teraslice cluster is new, the state indices do not exist yet. This container then crashes with the following error and delays the pod startup:

> @0.2.0 start /app/source
> node dist/index.js
(node:20) UnhandledPromiseRejectionWarning: Error: Added label "error" is not included in initial labelset: [
  'arch',
  'clustering_type',
  'name',
  'node_version',
  'platform',
  'teraslice_version',
  'url',
  'name'
]
    at /app/source/node_modules/prom-client/lib/validation.js:26:10
    at Array.forEach (<anonymous>)
    at validateLabel (/app/source/node_modules/prom-client/lib/validation.js:24:22)
    at /app/source/node_modules/prom-client/lib/gauge.js:211:3
    at Gauge.set (/app/source/node_modules/prom-client/lib/gauge.js:79:32)
    at Object.updateTerasliceMetrics (/app/source/dist/metrics.js:403:30)
    at main (/app/source/dist/index.js:51:15)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:20) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:20) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I think this is because there are no state indices on the initial Teraslice pod startup. Possible fixes would be to look at this code and see if it can be fixed here, or to delay starting this container until some time after the Teraslice container comes up.
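
The trace shows prom-client rejecting a label named "error" that was not declared when the gauge was created. One possible reading is that an error payload from Teraslice was passed through as label values. If so, a guard like the following sketch would avoid the throw by keeping only declared label names and reporting the rest. `pickDeclaredLabels` is a hypothetical helper, not part of prom-client or the exporter.

```typescript
type Labels = Record<string, string>;

// Keeps only the keys listed in `labelNames`. Anything else (such as an
// unexpected "error" field) is dropped and reported back to the caller.
function pickDeclaredLabels(
  labelNames: readonly string[],
  raw: Record<string, unknown>,
): { labels: Labels; dropped: string[] } {
  const allowed = new Set(labelNames);
  const labels: Labels = {};
  const dropped: string[] = [];
  for (const [key, value] of Object.entries(raw)) {
    if (allowed.has(key)) {
      labels[key] = String(value);
    } else {
      dropped.push(key);
    }
  }
  return { labels, dropped };
}
```

A caller could treat a non-empty `dropped` list as a signal to log the response and skip the metric update for that cycle rather than setting the gauge.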

Type error in v0.3.0

I must have made a mistake in the update from 0.2.0 to 0.3.0.
When trying to hit the /metrics endpoint I get the following error:

    at new NodeError (node:internal/errors:399:5)
    at write_ (node:_http_outgoing:872:11)
    at ServerResponse.end (node:_http_outgoing:1019:5)
    at /Users/peterluitjens/WORKSPACE/teraslice-exporter/dist/index.js:40:13
    at Layer.handle [as handle_request] (/Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/layer.js:95:5)
    at next (/Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/route.js:144:13)
    at Route.dispatch (/Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/route.js:114:3)
    at Layer.handle [as handle_request] (/Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/layer.js:95:5)
    at /Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/index.js:284:15
    at Function.process_params (/Users/peterluitjens/WORKSPACE/teraslice-exporter/node_modules/express/lib/router/index.js:346:12)

Add TERASLICE_DISPLAY_URL

We currently have a TERASLICE_URL, which is the URL used for queries. In a sidecar deployment that URL ends up being localhost, so the URL presented in the metrics is localhost for all clusters. Let's add a TERASLICE_DISPLAY_URL environment variable that gets shown in the metrics (but is NOT used for queries).

Add teraslice_exporter_info metric and improve teraslice_exporter_errors

We should add an info metric, mostly so we can see the version and instance info. Looking at the other metrics, it seems that url and name are the labels used for the instance info:

teraslice_exporter_info{version="v0.4.0", url="http://ts-WHATEVER.example.com/",name="teraslice-WHATEVER"} 1

Furthermore, we should add the url and name labels to the recently added teraslice_exporter_errors:

teraslice_exporter_errors{error_type="update_metrics_errors", url="http://ts-WHATEVER.example.com/",name="teraslice-WHATEVER"} 0
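
To make the target output concrete, the sketch below renders one sample line in the Prometheus text exposition format, escaping backslashes, double quotes, and newlines in label values. In practice a client library such as prom-client produces this output. `formatSample` and `escapeLabelValue` are hypothetical helpers written only for illustration.

```typescript
// Escapes the characters that are special inside a quoted label value.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Renders a single sample as `name{label="value",...} value`.
function formatSample(name: string, labels: Record<string, string>, value: number): string {
  const pairs = Object.keys(labels).map((k) => `${k}="${escapeLabelValue(labels[k])}"`);
  const labelPart = pairs.length > 0 ? `{${pairs.join(",")}}` : "";
  return `${name}${labelPart} ${value}`;
}

const infoLine = formatSample(
  "teraslice_exporter_info",
  { version: "v0.4.0", url: "http://ts-WHATEVER.example.com/", name: "teraslice-WHATEVER" },
  1,
);
```

Info metrics conventionally carry a constant value of 1, with the useful data held in the labels.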
