bahmanm / lemmy-meter
A web application to track the performance of Lemmy instances and represent the results visually
Home Page: https://lemmy-meter.info
License: GNU General Public License v3.0
The link points to localhost:3000
Investigate if it is possible to embed the health indicator gauges for a given instance in another website, the way a usual health "badge" works.
Thanks @unruffled for bringing this up.
Explore whether it is possible for viewers to sign up for notifications when their favourite instances become (partially) unavailable.
This may be helpful for admins as well.
For this to happen:
There have already been a couple of incidents in which Blackbox Exporter took a very long time (10s+) to finish the "resolve" phase.
One suspect is that the connection between the Docker daemon and the provider's nameserver can become stale (:man_shrugging:). I patched the configuration to always use DNS servers outside the internal network.
However, I'd like to be alerted the next time this happens so I can start investigating right away.
As the first iteration, the following values and durations should be enough:
Currently, during a deploy, the cluster is torn down and recreated, which is very inefficient. It should be possible to instruct lemmy-meter to restart/recreate only the changed services.
Based on the admins' feedback and real world experience, a default duration of 5m to trigger the warning and 10m to trigger the error is quite reasonable.
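The two default durations above can be read as a simple severity ladder. The following is a minimal sketch of that logic only; the function name `classify_outage` and the labels `"ok"`, `"warning"` and `"error"` are assumptions made for illustration and are not taken from lemmy-meter's configuration.

```python
from datetime import timedelta

# Default durations mentioned above: 5m for a warning, 10m for an error.
WARNING_AFTER = timedelta(minutes=5)
ERROR_AFTER = timedelta(minutes=10)


def classify_outage(unhealthy_for: timedelta) -> str:
    """Map how long an instance has been unhealthy to a severity label.

    The labels are hypothetical; only the two durations come from the text.
    """
    if unhealthy_for >= ERROR_AFTER:
        return "error"
    if unhealthy_for >= WARNING_AFTER:
        return "warning"
    return "ok"
```

For example, `classify_outage(timedelta(minutes=7))` falls between the two thresholds and yields the warning level.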
Configure Prometheus alerts and Alertmanager to notify instance admins/communities of outages/degraded performance, e.g. in a Matrix channel/chat or a Discord server.
Currently all the cluster nodes use bind mounts, which are not totally reliable. Use volumes for at least Prometheus and Grafana.
Follow up from a conversation in #lemmy-meter:matrix.org
Kamal v1.0.0, which has just been released, seems to be an interesting alternative to Docker Compose. It's worth trying it out while lemmy-meter is in its early stages.
It should be possible to transfer the changes between a local lemmy-meter and lemmy-meter.info without requiring the cluster to be stopped.
One workflow is
Or, even better, store the relevant Grafana configurations such as data sources, users and dashboards so that they can be versioned in git.
Currently, the default configuration sends a good deal of HTTP requests per minute to an instance (~30 req/min.)
Tune it down to 2-4 req/min.
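To make the target concrete, the request rates can be converted into the spacing between requests. This is a small arithmetic sketch that assumes requests are spread evenly over the minute; `probe_interval_seconds` is a hypothetical helper and not part of lemmy-meter, where the total rate is likely split across several probes.

```python
def probe_interval_seconds(requests_per_minute: float) -> float:
    """Return the spacing between requests for a given per-minute rate.

    Assumes evenly spaced requests; the helper name is hypothetical.
    """
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


# The current default (~30 req/min) versus the proposed 2-4 req/min range.
current = probe_interval_seconds(30)
proposed_fastest = probe_interval_seconds(4)
proposed_slowest = probe_interval_seconds(2)
```

Under that assumption, roughly 30 req/min is one request every 2 seconds, while 2-4 req/min corresponds to one request every 15 to 30 seconds.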
As it stands, this project is very good for detecting unplanned service outages, but there is currently not a way to distinguish between planned and unplanned outages.
Currently, matrix-webhook, which is used for alert notifications, is run as a separate user. Move it to the same cluster as the other services to ensure fail-over and consistency.
There are calculations and metrics which rely on at least 90 days of data being available. The current retention period is 60 days.
It would be helpful to implement an endpoint to assist admins in validating scheduled-downtime.json.
For example:
$ curl -X GET https://lemmy-meter.info/.metadata/validate-json?instance=<INSTANCE>
Invalid
<detailed error message>
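The check behind such an endpoint might look roughly like the sketch below. The schema of scheduled-downtime.json is not described here, so this only verifies that the document is well-formed JSON. The function name `validate_schedule` and the `"Valid"` reply are assumptions; only the `Invalid` line followed by an error message mirrors the example above.

```python
import json


def validate_schedule(text: str) -> str:
    """Return a plain-text verdict for a candidate scheduled-downtime.json.

    Only JSON well-formedness is checked; the real schema is unknown here.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Mirror the response shape shown above: verdict, then details.
        return f"Invalid\n{exc}"
    return "Valid"  # assumed success reply, not taken from the source
```

A real implementation would presumably add schema validation on top of this, which is where most of the useful error messages would come from.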
Investigate whether it's possible to assume the site is in maintenance mode if it responds to probes with a special 5xx response such as 503.
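One way to picture the idea is a classifier over probe status codes, sketched below. Treating only 503 as the maintenance signal, counting 2xx and 3xx as "up", and the names `MAINTENANCE_STATUSES` and `classify_probe` are all assumptions for illustration; the issue leaves open which codes would actually qualify.

```python
# Status codes assumed to signal planned maintenance; 503 is the only
# example named in the text.
MAINTENANCE_STATUSES = frozenset({503})


def classify_probe(status_code: int) -> str:
    """Classify a probe's HTTP status as 'up', 'maintenance' or 'down'."""
    if status_code in MAINTENANCE_STATUSES:
        return "maintenance"
    if 200 <= status_code < 400:
        # Assumption: redirects still count as a healthy response.
        return "up"
    return "down"
```

The caveat with this approach is that a 503 can also be returned during a genuine unplanned overload, so the signal is a heuristic rather than proof of planned maintenance.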
Follow up from #22
In the case of the planned downtime Google sheet, there should be two new columns for cron schedules.
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
cluster/downtime-processor/cpanfile
perl 5.39.8
Mojolicious 9.35
Net::Prometheus 0.12
Data::Dump 1.25
Schedule::Cron::Events 1.96
Text::CSV 2.04
Moose 2.2207
JSON 4.10
JSON::Validator 5.14
File::Slurper 0.014
Data::UUID 1.226
Log::Log4perl 1.57
cluster/docker-compose.yml
prom/prometheus v2.50.1
grafana/grafana 10.3.3
prom/blackbox-exporter v0.24.0
prometheuscommunity/json-exporter v0.6.0
nginx 1.25
postgres 16.2
prom/alertmanager v0.27.0
ixdotai/smtp v0.5.2
binwiederhier/ntfy v2.8.0
cluster/downtime-processor/Dockerfile
perl 5.39.8
ansible/requirements.txt
ansible ==9.3.0
molecule ==6.0.3
molecule-plugins ==23.5.3
passlib == 1.7.4
Follow up on #22
It should be possible to scrape downtime schedules off predefined URLs from instances. For example, https://INSTANCE/.well-known/host-metadata.json
or https://INSTANCE/.well-known/scheduled-downtime.json
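A scraper along these lines could try each predefined URL in turn. The sketch below is illustrative only: `schedule_urls` and `fetch_schedule` are hypothetical names, the order in which the two paths are tried is arbitrary, and the structure of the returned JSON is not assumed.

```python
import json
import urllib.error
import urllib.request
from typing import Any, List, Optional


def schedule_urls(instance: str) -> List[str]:
    """Build the candidate URLs mentioned above for a given instance host."""
    base = f"https://{instance}/.well-known"
    return [
        f"{base}/host-metadata.json",
        f"{base}/scheduled-downtime.json",
    ]


def fetch_schedule(instance: str, timeout: float = 10.0) -> Optional[Any]:
    """Return the first document that downloads and parses, else None."""
    for url in schedule_urls(instance):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return json.load(resp)
        except (urllib.error.URLError, OSError, ValueError):
            # Unreachable host, HTTP error or malformed JSON: try the next URL.
            continue
    return None
```

Returning `None` instead of raising keeps a single misconfigured instance from interrupting a scrape across many instances.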
The current process for deploying a new version is quite laborious and involves scp, wget and unzip, which is just not right.
Ideally, there should be one or more Ansible playbooks to automate all or most aspects of that:
For the sake of simplicity, the task of deploying a cluster to a new machine can be skipped.
It should be possible to subscribe to a particular instance's alerts and receive a notification (e.g. an e-mail) whenever the alert is triggered.
It'd be useful to expose the health check results that lemmy-meter collects via some API to interested parties.
For example, uptime.lemmings.world could use such stats to generate uptime badges.
Things to note at the first pass:
Thanks @RikudouSage for bringing this up.
In cases like changes to Prometheus service discovery files (e.g. adding an instance), there's no need to restart the cluster, as Prometheus will pick the changes up out of the box.
With Prometheus alerts in place, there's no more need for the Grafana-Matrix bridge and it can be safely retired.