
lemmy-meter's People

Contributors

bahmanm, renovate[bot]

lemmy-meter's Issues

Externally embeddable gauges

Investigate if it is possible to embed the health indicator gauges for a given instance in another website, the way a usual health "badge" works.


Thanks @unruffled for bringing this up.

Investigate alerts and notifications

Explore whether viewers can sign up to be notified when their favourite instances become (partially) unavailable.

This may be potentially helpful for admins as well.

For this to happen:

  1. There should be an un/subscribe form.
  2. lemmy-meter should be able to send e-mails - probably plenty of them.
  3. Reasonable alerts should be configured.

Configure alerts for slow DNS resolution

There have been a couple of incidents already when Blackbox Exporter takes a very long time (10s+) to finish the "resolve" phase.

One suspect is that the connection between the Docker daemon and the provider's nameserver can become stale (:man_shrugging:). I patched the configuration to always use DNS servers outside the internal network.

However, I'd like to be alerted the next time this happens so I can start investigating right away.
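
A minimal sketch of the check such an alert would perform, assuming the slow phase shows up in Blackbox Exporter's probe_http_duration_seconds metric under phase="resolve" and that the 10s figure above is the threshold. The function name and the sample response are hypothetical; the input follows the shape of a Prometheus /api/v1/query instant-vector response.

```python
# Assumed threshold, taken from the "10s+" figure mentioned in the issue.
SLOW_RESOLVE_SECONDS = 10.0

# PromQL that could be sent to Prometheus' /api/v1/query endpoint.
QUERY = 'probe_http_duration_seconds{phase="resolve"}'


def slow_resolves(response, threshold=SLOW_RESOLVE_SECONDS):
    """Return (instance, seconds) pairs whose resolve phase exceeded the threshold."""
    if response.get("status") != "success":
        return []
    slow = []
    for sample in response["data"]["result"]:
        # An instant-vector sample looks like {"metric": {...}, "value": [ts, "1.23"]}.
        seconds = float(sample["value"][1])
        if seconds > threshold:
            slow.append((sample["metric"].get("instance", "unknown"), seconds))
    return slow


# Hypothetical response for two probed instances.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "lemmy.example", "phase": "resolve"},
             "value": [1710000000, "12.5"]},
            {"metric": {"instance": "other.example", "phase": "resolve"},
             "value": [1710000000, "0.02"]},
        ],
    },
}

print(slow_resolves(sample_response))
```

In practice the same comparison would more likely live in a Prometheus alerting rule; the Python version only illustrates the condition.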

Configure Alertmanager

Configure Prometheus alerts and Alertmanager to notify instance admins/communities of outages/degraded performance, eg in a Matrix channel/chat or a Discord server.

Try out Kamal instead of Compose

Kamal v1.0.0, which has just been released, seems to be an interesting alternative to Docker Compose. It's worth trying out while lemmy-meter is in its early stages.

Import/export Grafana dashboards w/ zero downtime

It should be possible to transfer the changes between local lemmy-meter and lemmy-meter.info w/o requiring the cluster to be stopped.

One workflow is

  1. Grab latest dashboards from remote
  2. Experiment and make changes locally
  3. Upload the changes to remote

Even better would be to store the relevant Grafana configuration, such as data sources, users and dashboards, so that it can be versioned in git.
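
A minimal sketch of the "grab, then upload" step, assuming the dashboards are moved through Grafana's HTTP API: a dashboard fetched via GET /api/dashboards/uid/<uid> is wrapped into the body expected by POST /api/dashboards/db. The helper name and the sample dashboard are hypothetical, and the HTTP calls themselves are left out.

```python
import copy


def to_import_payload(exported, overwrite=True):
    """Turn a GET /api/dashboards/uid/<uid> response into a POST /api/dashboards/db body."""
    dashboard = copy.deepcopy(exported["dashboard"])
    # The numeric id is local to one Grafana instance; the uid is what
    # identifies the dashboard on both sides, so the id is cleared.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


# Hypothetical export as returned by the source Grafana instance.
exported = {
    "meta": {"slug": "instance-health"},
    "dashboard": {"id": 42, "uid": "abc123", "title": "Instance health", "panels": []},
}

payload = to_import_payload(exported)
print(payload["dashboard"]["uid"], payload["dashboard"]["id"], payload["overwrite"])
```

With overwrite set, the target instance replaces the dashboard carrying the same uid, which is what allows the transfer without stopping the cluster.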

Integrate Alertmanager with ntfy

  • Configure ntfy to run as a component in the cluster.
  • Write a webhook receiver which translates Alertmanager payload to ntfy model.
  • Configure Alertmanager to use the said receiver.
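
The translation step of such a receiver could be sketched as follows, assuming Alertmanager's standard webhook payload (a top-level "alerts" list with "status", "labels" and "annotations") and ntfy's JSON publishing format ("topic", "title", "message", "priority", "tags"). The topic name, priority mapping and tags are illustrative choices, not part of the project.

```python
def alert_to_ntfy(alert, topic):
    """Map one Alertmanager alert to one ntfy JSON message."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "unknown")
    firing = status == "firing"
    return {
        "topic": topic,
        "title": f"[{status.upper()}] {labels.get('alertname', 'alert')}",
        "message": annotations.get("summary")
        or annotations.get("description")
        or "No details provided.",
        # ntfy priorities range from 1 (min) to 5 (max); 3 is the default.
        "priority": 4 if firing else 3,
        "tags": ["warning"] if firing else ["white_check_mark"],
    }


def translate(payload, topic):
    """Translate a whole Alertmanager webhook payload into ntfy messages."""
    return [alert_to_ntfy(alert, topic) for alert in payload.get("alerts", [])]


# Hypothetical webhook payload with a single firing alert.
sample_payload = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "SlowDnsResolution", "instance": "lemmy.example"},
            "annotations": {"summary": "DNS resolve phase took more than 10s"},
        }
    ],
}

messages = translate(sample_payload, topic="lemmy-meter-alerts")
print(messages[0]["title"])
```

Each resulting dict would then be sent as the JSON body of a POST request to the ntfy server's root URL.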

Run matrix-webhook in the cluster

Currently, matrix-webhook, which is used for alert notifications, runs under a separate user. Move it to the same cluster as the other services to ensure fail-over and consistency.

Endpoint to validate scheduled downtime file

It would be helpful to implement an endpoint to assist admins in validating scheduled-downtime.json.

For example:

$ curl -X GET https://lemmy-meter.info/.metadata/validate-json?instance=<INSTANCE>
Invalid 
<detailed error message>
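
A rough sketch of the check behind such an endpoint. The schema of scheduled-downtime.json is not described here, so the one below (a list of objects with ISO 8601 "start" and "end" strings) is purely an assumption for illustration, as is the function name. The return value mirrors the two-part response above: a verdict and a detailed message.

```python
import json
from datetime import datetime

# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = ("start", "end")


def validate_schedule(text):
    """Return ("Valid", "") or ("Invalid", <detailed error message>)."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as err:
        return "Invalid", f"not valid JSON: {err.msg} (line {err.lineno})"
    if not isinstance(entries, list):
        return "Invalid", "top-level value must be a list"
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            return "Invalid", f"entry {index} must be an object"
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str):
                return "Invalid", f"entry {index}: missing or non-string '{field}'"
            try:
                datetime.fromisoformat(value)
            except ValueError:
                return "Invalid", f"entry {index}: '{field}' is not an ISO 8601 timestamp"
    return "Valid", ""


good = '[{"start": "2024-03-01T10:00:00+00:00", "end": "2024-03-01T11:00:00+00:00"}]'
print(validate_schedule(good))
print(validate_schedule('[{"start": "soon"}]'))
```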

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

cpanfile
cluster/downtime-processor/cpanfile
  • perl 5.39.8
  • Mojolicious 9.35
  • Net::Prometheus 0.12
  • Data::Dump 1.25
  • Schedule::Cron::Events 1.96
  • Text::CSV 2.04
  • Moose 2.2207
  • JSON 4.10
  • JSON::Validator 5.14
  • File::Slurper 0.014
  • Data::UUID 1.226
  • Log::Log4perl 1.57
docker-compose
cluster/docker-compose.yml
  • prom/prometheus v2.50.1
  • grafana/grafana 10.3.3
  • prom/blackbox-exporter v0.24.0
  • prometheuscommunity/json-exporter v0.6.0
  • nginx 1.25
  • postgres 16.2
  • prom/alertmanager v0.27.0
  • ixdotai/smtp v0.5.2
  • binwiederhier/ntfy v2.8.0
dockerfile
cluster/downtime-processor/Dockerfile
  • perl 5.39.8
pip_requirements
ansible/requirements.txt
  • ansible ==9.3.0
  • molecule ==6.0.3
  • molecule-plugins ==23.5.3
  • passlib ==1.7.4

Scrape downtime schedules off instances

Follow up on #22


It should be possible to scrape downtime schedules off predefined URLs from instances. For example, https://INSTANCE/.well-known/host-metadata.json or https://INSTANCE/.well-known/scheduled-downtime.json
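
As a small illustration, the predefined URLs for an instance could be derived as below. The two paths are the ones suggested above; the helper name and the example hostname are hypothetical, and fetching the files is left out.

```python
# Candidate locations, taken from the examples in the issue text.
WELL_KNOWN_PATHS = (
    "/.well-known/host-metadata.json",
    "/.well-known/scheduled-downtime.json",
)


def candidate_urls(instance):
    """Return the URLs to try, in order, for a given instance hostname."""
    host = instance.strip().strip("/")
    return [f"https://{host}{path}" for path in WELL_KNOWN_PATHS]


print(candidate_urls("lemmy.example"))
```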

Automate the rollout of a new version

The current process for deploying a new version is quite laborious and involves scp, wget and unzip which is just not right 😅

Ideally, there should be one or more Ansible playbooks to automate all or most aspects of that:

  • Deploying a new version of lemmy-meter
  • Deploying Grafana dashboards
  • Restarting the cluster
  • Restarting a particular service

For the sake of simplicity, the task of deploying a cluster to a new machine can be skipped.

Configure alerts

It should be possible to subscribe to a particular instance's alerts and receive a notification (eg an e-mail) whenever the alert is triggered.

Expose stats via APIs

It'd be useful to expose the health check results that lemmy-meter collects via some API to interested parties.

For example, uptime.lemmings.world could use such stats to generate uptime badges.


Things to note at the first pass:

  • The API shouldn't be public, at least not for now, as lemmy-meter simply hasn't got the infrastructure for that.
  • There are two types of data that lemmy-meter ingests and stores: snapshot and time-series. For the same infrastructural reason, the focus should be on the snapshot data for the time being.

Thanks @RikudouSage for bringing this up.

Retire matrix-webhook

With Prometheus alerts in place, there's no more need for the Grafana-Matrix bridge and it can be safely retired.
