twin / gatus
⛑ Automated developer-oriented status page
Home Page: https://gatus.io
License: Apache License 2.0
If the health of your services is checked every 5 minutes, there's no point refreshing the page every 10 seconds.
Therefore, a dropdown should be added for setting the refresh interval.
It should be possible to click on a service and get a modal with the following information:
Is there any way to have it calculated automatically so we can get it?
Hi !
There is an extra comma at the end of the array, which may cause parsing errors. Mattermost (Slack-compatible), for example, rejects the payload with an HTTP 400 error.
Regards,
Hi!
I'm currently using a service called "Uptime Robot" that offers this check: if a certificate is going to expire in X days, it sends me a message on Slack or an email.
This could probably be a new condition, [CERTIFICATE] or [CERT_EXPIRE] or [CERTIFICATE_EXPIRE], like:
metrics: true # Whether to expose metrics at /metrics
services:
  - name: twinnation # Name of your service, can be anything
    url: "https://twinnation.org/health"
    interval: 30s # Duration to wait between every status check (default: 60s)
    conditions:
      - "[STATUS] == 200" # Status must be 200
      - "[BODY].status == UP" # The json path "$.status" must be equal to UP
      - "[RESPONSE_TIME] < 300" # Response time must be under 300ms
      - "[CERTIFICATE_EXPIRE] > 1week" # Certificate must expire in more than a week
  - name: example
    url: "https://example.org/"
    interval: 30s
    conditions:
      - "[STATUS] == 200"
The check can be done with the standard Go library or with something like github.com/genkiroid/cert. This summer I worked on a custom Go program similar to Gatus, written with this library:
// Load the certificate from the host.
certificate := cert.NewCert(domain.Host)
// Check if it's expiring in less than 1 month.
if certificate.Detail().NotAfter.Before(time.Now().AddDate(0, 1, 0)) {
	// ...
}
I'll leave the feature proposal here. I know this is not exactly something that is useful to check every 30 seconds, but in my specific use case it would be really useful if it could be tested once a day or so.
Anyway, thank you for Gatus! It's a really good self-hosted solution for this type of monitoring.
Adding 3 dots in the top right corner of each service would be pretty useful.
For now, a useful option to add to the dropdown would be "Badges", which would open up a modal with the following content:
![Uptime 7d](https://status.twinnation.org/api/v1/badges/uptime/7d/group-core-service-twinnation%20-%20external.svg)
![Uptime 24h](https://status.twinnation.org/api/v1/badges/uptime/24h/group-core-service-twinnation%20-%20external.svg)
![Uptime 1h](https://status.twinnation.org/api/v1/badges/uptime/1h/group-core-service-twinnation%20-%20external.svg)
Assume you have Gatus deployed in several security subnets (or zones) to monitor individual services because one single Gatus instance is not able to reach those services (administratively prohibited due to firewall rules, etc.)
But you want one main Gatus instance which is capable of retrieving health information of services from all those other Gatus instances to display them in one unified Gatus Dashboard.
Can we start a short discussion on this?
PS. Thank you for this great project!
By mocking the HTTP client, we can also mock the response, therefore letting us test much more specific use cases.
We could even add better tests for alerting providers.
For an example, see:
First of all, very good job! I was working on something like this as well, but not as nice.
Have you thought about adding support for exporting the host HTTP status as a Prometheus metric? This would be so nice to have when using tools like Grafana as your main source of monitoring.
Using a gauge to toggle between 0 (offline/not 200-ok) and 1 (online/200-ok) can be enough to give value to a Grafana dashboard and its alerting capabilities.
Again, solid work.
Supporting service groups could allow a cuter front end experience.
i.e.
services:
  - name: k8s-cluster-watch-dog
    url: http://k8s-cluster-watch-dog-v1.tools-${ENVIRONMENT}:8080/health
    group: core <-------
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
  - name: prometheus
    url: http://prometheus-operator-prometheus.kube-system:9090/-/healthy
    group: core <-------
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY] == Prometheus is Healthy."
would generate a dashboard that puts both k8s-cluster-watch-dog and prometheus under the "core" folder.
Could also support tags, and allow filtering by tags instead
i.e.
services:
  - name: k8s-cluster-watch-dog
    url: http://k8s-cluster-watch-dog-v1.tools-${ENVIRONMENT}:8080/health
    tags: <-------
      - core
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
  - name: prometheus
    url: http://prometheus-operator-prometheus.kube-system:9090/-/healthy
    tags: <-------
      - core
      - metrics
    interval: 1m
    conditions:
      - "[STATUS] == 200"
      - "[BODY] == Prometheus is Healthy."
Forgive the terrible drafts, just thought of this on the fly.
Hi,
first of all thank you for providing this great tool!
I am currently testing it in our environment and noticed that some health endpoints I like to monitor do not respond within 10 seconds as they check multiple system dependencies themselves. It would be great to be able to configure the timeout for HTTP (and TCP) requests.
Furthermore, I noticed that the user-agent is currently Go-http-client/2.0, which makes it hard to identify the requests as the ones coming from Gatus. It would be great if this became a config option as well, or a string that contains the name of the software.
Not all services are web based - but they can be checked fairly simply (i.e. see if they're at least running) by opening a TCP socket. If the socket opens successfully, the service is considered online.
This is crude - but at least opens the door to monitoring applications which are not based on HTTP.
This is pretty similar to this request: #5 but should be a lot more generic.
As mentioned in my previous issue #61, I'm currently using healthchecks, and the last feature I would need is some generic content check: a regex or string presence check. Now, I might be missing something everybody already knows, since you have this BODY placeholder, and maybe that's just it.
Anyway, it seems the process would basically mean that Gatus fetches the site content and runs a regex or literal string against it, and when it finds a match, the test passes.
What do you think?
In terms of performance, I think this is actually the heaviest thing healthchecks is doing, but I'm already running all the tests in parallel, and Gatus outperforms healthchecks by orders of magnitude. So nice.
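The check described above could be sketched like this with Go's regexp package. bodyMatches is a hypothetical name, and how it would be wired into Gatus conditions is left open.

```go
package main

import "regexp"

// bodyMatches reports whether the response body contains a match for expr.
// A literal string can be checked the same way by passing it through
// regexp.QuoteMeta first.
func bodyMatches(body []byte, expr string) (bool, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return false, err
	}
	return re.Match(body), nil
}
```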
At the moment, it seems that the status results are stored in an in-memory map: https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L16
And also limited to only 20 results:
https://github.com/TwinProduction/gatus/blob/3773f952a80058eb88f48fe9ae9ac51bf1c1efe7/watchdog/watchdog.go#L59-L62
It'd be great if these were stored in a persistent data store instead (e.g. a database, or files on disk). Whilst Gatus currently only returns the last 20 results, it'd be nice to keep the history to review outages that might have occurred in the past. Storing the results in a database/persistence layer would enable this as the first step.
Of course, the option to retain results in memory should also be kept as it makes Gatus very easy to get up and running.
Hello!
This might be a rather ambitious feature request, but I'll try my luck.
It would be great to have the ability to dynamically create monitored backends via labels within docker. Similar to how traefik uses labels to define backends for load balancing / content switching.
This would work by monitoring the docker socket for events, pulling the configuration out of labels and creating monitored backends dynamically from this information. The advantage of this is that configuration of gatus can be essentially static, and monitored configuration is applied per container.
By extension, this would allow additional status checks for docker based containers (i.e. is the container running or not). Any containers which cease to exist (i.e. deleted, not stopped) are removed from gatus' interface.
It would be great if we could add an alert when the service is up again.
We use Gatus to monitor our Kubernetes services, which can go down for a while and then come back up without any action on our part, so it would be great to be informed when a service is up again.
Something like this could be used:
alerts:
  - type: slack
    enabled: true
    description: "healthcheck failed 3 times in a row"
    upAlert: true
Hi !
I want to use a self-hosted Mattermost for custom alerting notifications. Unfortunately, it uses an internal certificate authority that is not recognized by Gatus:
[watchdog][handleAlertsToTrigger] Ran into error sending an alert: Post "https://mattermost.****.***/hooks/hy3ajdg*****": x509: certificate signed by unknown authority
It would be nice to add an insecure option for alerting.custom.
Thank you for Gatus ! Easy to setup and configure, gets the job well done !
Best regards,
Support alerting via Slack webhook and/or PagerDuty
alerting:
  slack: "http://...."
  pagerduty: "http://...."
services:
  - name: example
    url: https://example.org/
    interval: 30s
    alerts:
      - type: slack
        enabled: true
        threshold: 5
        description: "Request failed 5 times in a row"
      - type: pagerduty
        enabled: true
        threshold: 10
        description: "Request failed 10 times in a row"
    conditions:
      - "[STATUS] == 200"
Would trigger a Slack alert after 5 failures in a row and a PagerDuty alert after 10 failures in a row.
Could store the number of failures in a row in the service rather than in the result, that way there would be no need to store more than 20 results per service. Might also help prevent triggering the same alerts twice?
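A rough sketch of that idea: keep a consecutive-failure counter per alert, fire once when the threshold is reached, and stay quiet until the service recovers. The type and field names are hypothetical, and the "resolved" event goes slightly beyond what is proposed above.

```go
package main

// event is what a single health check result leads to for one alert.
type event int

const (
	none      event = iota // nothing to send
	triggered              // threshold reached, send the alert once
	resolved               // service recovered after an alert was sent
)

// alertState tracks consecutive failures for one alert of one service.
type alertState struct {
	threshold int  // failures in a row required to trigger
	failures  int  // current number of failures in a row
	firing    bool // whether the alert has already been sent
}

// observe records one check result and returns the event to act on.
func (a *alertState) observe(success bool) event {
	if success {
		a.failures = 0
		if a.firing {
			a.firing = false
			return resolved
		}
		return none
	}
	a.failures++
	if !a.firing && a.failures >= a.threshold {
		a.firing = true
		return triggered
	}
	return none
}
```

Because firing is remembered, a 4th or 5th failure in a row does not send the same alert twice.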
It's currently possible to use Discord as an alerting provider by using the custom configuration:
alerting:
  custom:
    url: "https://discord.com/api/webhooks/******/*********/slack"
    method: "POST"
    body: |
      {
        "text": ":helmet_with_cross: **Gatus**\nAn alert for **[SERVICE_NAME]** has been **[ALERT_TRIGGERED_OR_RESOLVED]**\n> [ALERT_DESCRIPTION]"
      }
The result from the custom alert above looks like this:
⛑️ Gatus
An alert for example has been TRIGGERED
wow
While this works, custom alerts don't currently support placeholders for individual conditions, which is why non-custom providers are necessary for sending visually pleasant notifications:
This issue is for implementing notifications similar to that of Slack, but for Discord.
The provider configuration should look like so:
alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/******/*********"
The easiest way to implement this feature would be to:
- append /slack to the end of alerting.discord.webhook-url
- send a simple { "text": "..." } body rather than slack.AlertProvider's more complex request body, as it doesn't seem to work with Discord's webhook.
I'm planning on implementing this soon, but if somebody else would like to give it a try before I do, feel free to let me know.
Gatus is awesome. Is there a way to set up a maintenance window to avoid being notified if a server (or many) goes down?
This is a feature request. I am thinking about using Gatus to replace https://github.com/arachnys/cabot, but it's currently missing low-level tests for infrastructure. Any chance it would make sense to implement this?
I would love to see DNS based Health checks
something like
type: dns
url: "udp://127.0.0.1:53"
queryname: "host.example.org"
querytype: "A"
conditions:
  - "[STATUS] == NOERROR"
  - "[DNS].response == 1.2.3.4"
Support client SSL certificate and SSL skip validation
Thanks so much for adding this!
Not sure if I am doing this wrong BUT I just found that if I generate a SHA512 password and the alpha chars are uppercase it doesn't seem to work.. Maybe the code needs to convert it to lowercase when read from the config
I.e. this didn't seem to work when I put it in the config:
BC547750B92797F955B36112CC9BDD5CDDF7D0862151D03A167ADA8995AA24A9AD24610B36A68BC02DA24141EE51670AEA13ED6469099A4453F335CB239DB5DA
But this did:
bc547750b92797f955b36112cc9bdd5cddf7d0862151d03a167ada8995aa24a9ad24610b36a68bc02da24141ee51670aea13ed6469099a4453f335cb239db5da
(the password here is "password1" ... safest password ever)
Hope that helps.
Is there an API for Gatus?
An ICMP check reports the host as connected, with a 0ms response time, on a host that is down right now. Trying a manual ping from the Docker host shows the same.
I've changed the condition to "[RESPONSE_TIME] > 0"
When performing an HTTP request, Gatus computes the duration for the whole HTTP client library call, which includes the DNS request time.
My proposal is to make the response time be only the HTTP response time (+- TLS?).
This can probably be implemented using Go's HTTP Tracing feature.
Hi,
I really like Gatus, and I have a question about configuration. As an example, I have a domain that has two A records, and either one could be the one that answers first. How could I configure Gatus for that? Something like adding an OR in the body condition (yeah, this did not work :) ).
Like:
dns:
  query-name: "mydomain.com"
  query-type: "A"
conditions:
  - "[BODY] == 199.1.1.0||199.1.1.1"
It would be great if Gatus had the possibility to retry a failed request X times with Y interval, to verify that an endpoint is "really" down and it is not just a short hiccup.
I saw this is already implemented for alerts (services[].alerts[].failure-threshold), but with this setting the service is shown as down on the dashboard anyway, and the interval for the retry is the same.
I was thinking of something like this:
- name: google
  url: "https://google.com"
  interval: 1m
  retry-count: 2
  retry-interval: 10s
  conditions:
    - "[STATUS] == 200"
If retry-interval is not set, interval is used. Not sure, though, whether it makes sense to have both the alert failure threshold and a generic retry logic.
Supporting patterns would open up a lot of possibilities, such as:
[IP] == 10.*
[BODY].url == *example.com/images/*
One of the challenges is detecting what's a pattern and what isn't.
To some extent, yes, * could be escaped, but because both sides of a condition are already resolved by the time they're compared, this could cause problems.
For instance, let's say you have the following condition:
[BODY].comment == *butterfl*
Once resolved, it would become the following, assuming [BODY].comment is "I love butterflies":
I love butterflies == *butterfl* // condition passes
At this point, we can easily detect which one is the pattern and which one isn't, but what if [BODY].comment contains a *?
f*** this == *butterfl* // condition fails
So what, do we compare two patterns? That's weird.
Now let's pretend that the condition doesn't actually contain a pattern, but the resolved placeholder does:
[BODY].name == john.doe
and let's assume [BODY].name is resolved to *:
* == john.doe // condition passes
So now you'd have placeholders potentially triggering a pattern check for no reason.
Just like the length function (len(), e.g. len([BODY].name)), a pattern function could be introduced:
[BODY].comment == pattern(*butterfl*)
is resolved to
I love butterflies == pattern(*butterfl*) // condition passes
and
[BODY].name == john.doe
assuming [BODY].name is resolved to *, would give this:
* == john.doe // condition fails
which would fail because * isn't wrapped by pattern( and ).
You could however make the placeholder value a pattern, if you wanted:
pattern([BODY].name) == john.doe
pattern(*) == john.doe // condition passes
Readability is important, but so is consistency.
The length function is len(...), a shorter version of length(...).
So for the sake of consistency, the pattern function should be pat(...), a shorter version of pattern(...).
I originally wanted to call it pattern despite the reasoning above, but after further consideration, since the resolved string will contain one or more asterisks (e.g. pat(blabla*)), it should be clear enough.
Here's an example:
Condition:
[IP] == pat(10.*)
Result:
10.0.0.0 == pat(10.*)
I believe that, by default, pat(10.*) is self-explanatory.
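To make the proposal concrete, here is a sketch of how resolved operands could be compared. matchPattern, unwrapPat and equals are hypothetical names, and the sketch assumes * is the only wildcard and matches any sequence of characters.

```go
package main

import (
	"regexp"
	"strings"
)

// matchPattern reports whether value matches pattern, where '*' matches any
// sequence of characters (including none) and everything else is literal.
func matchPattern(pattern, value string) bool {
	// Escape the whole pattern, then turn each escaped '*' into ".*".
	expr := "(?s)^" + strings.ReplaceAll(regexp.QuoteMeta(pattern), `\*`, ".*") + "$"
	return regexp.MustCompile(expr).MatchString(value)
}

// unwrapPat returns the content of pat(...) if s is wrapped in it.
func unwrapPat(s string) (string, bool) {
	if strings.HasPrefix(s, "pat(") && strings.HasSuffix(s, ")") {
		return s[len("pat(") : len(s)-1], true
	}
	return "", false
}

// equals compares two already-resolved operands. Wildcard matching is only
// used when one side is explicitly wrapped in pat(...); otherwise a '*' in a
// resolved value is just a character.
func equals(left, right string) bool {
	if p, ok := unwrapPat(right); ok {
		return matchPattern(p, left)
	}
	if p, ok := unwrapPat(left); ok {
		return matchPattern(p, right)
	}
	return left == right
}
```

This reproduces the outcomes walked through above: a resolved value of * compared to john.doe fails, while pat(*) compared to john.doe passes.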
Could be done in a separate repository and a link to that repository could be documented in Gatus' README
How can I send a request to an IP address with a Host request header? Is there any way?
Example:
- name: test
  url: "https://5.5.5.5/asd"
  headers:
    Host: www.google.com
  interval: 5s
  conditions:
    - "[STATUS] == 200"
    - "[RESPONSE_TIME] < 1000"
This scenario is not working:
</HEAD><BODY>
<H1>Invalid URL</H1>
The requested URL "[no URL]", is invalid.<p>
Reference #9.b5df3a17.1608149667.2020b03
</BODY></HTML>
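One possible explanation, offered as an assumption rather than a confirmed diagnosis: Go's net/http does not send a "Host" entry placed in Request.Header for client requests, it uses the Request.Host field instead. A sketch with a hypothetical newRequestWithHost helper:

```go
package main

import "net/http"

// newRequestWithHost builds a GET request that connects to the address in
// url but sends host in the Host header.
func newRequestWithHost(url, host string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// For client requests, req.Host overrides the Host header to send.
	// A "Host" key set through req.Header is not written to the wire.
	req.Host = host
	return req, nil
}
```

Note that for https URLs the TLS server name is still derived from the URL unless tls.Config.ServerName is set on the transport, so certificate verification against a bare IP may fail separately.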
On a single site I get this error.
From my quick googling, "server misbehaving" seems to be a DNS error with Go?
Maybe if that error occurs, a second try for the site should be done before marking the site as failing?
Out of our 35 sites, there's only one that does this.
Or at least give a better error message than this, something like "DNS error".
Good day
Awesome project.
I have an idea / suggestion:
How would you feel about a context root?
For example if someone wanted to host on /status/
It would be nice to be able to reload the services configuration. This would enable us to dynamically create the config from another source (like fetching services from Consul and generating the service config) and reload it via an HTTP API or at some interval.
Currently, assets are loaded using a relative path in the repository:
https://github.com/TwinProduction/gatus/blob/76d45d7eb8bca0dcf01e503234832d45e8aee884/main.go#L34
My proposal is to bundle assets into the gatus binary, to ease distribution with a single binary.
This can probably be done with go-bindata.
Currently, Gatus binds to *:8080.
While it is suitable for a container deployment, it may be desirable to, for example:
- bind to localhost to prevent direct access from the Internet (when running on the host),
- use another port when 8080 is already taken.
My proposal is to add an option for configuring the address and port to bind to.
Hi, I would like to deploy this project on a PaaS like Heroku, but the port is fixed.
On a PaaS, the application should use the port provided by an environment variable. This is also a good practice recommended by the 12 Factor methodology.
To solve this issue, I suggest a small modification as follows:
// Read the port to listen on from the PORT environment variable.
port, err := strconv.Atoi(os.Getenv("PORT"))
if err != nil {
	panic(fmt.Sprintf("PORT is missing or not a number: %v", err))
}
Thanks,
Regards
output on mattermost log:
Jan 16 18:02:23 host docker-compose[3074]: mattermost | {"level":"debug","ts":1610812943.5811453,"caller":"web/webhook.go:52","msg":"Incoming webhook received","webhook_id":"<secret>","request_id":"znoajjaje7gsuphp1yry9tkbcw","payload":"null"}
Jan 16 18:02:23 host docker-compose[3074]: mattermost | {"level":"error","ts":1610812943.5813825,"caller":"mlog/log.go:229","msg":"Unable to parse incoming data.","path":"/hooks/<secret>","request_id":"znoajjaje7gsuphp1yry9tkbcw","ip_addr":"10.0.0.23","user_id":"","method":"POST","err_where":"IncomingWebhookRequestFromJson","http_code":400,"err_details":"invalid character 'e' after object key:value pair"}
Jan 16 18:02:23 host docker-compose[3074]: mattermost | {"level":"debug","ts":1610812943.581708,"caller":"web/handlers.go:100","msg":"Received HTTP request","method":"POST","url":"/hooks/<secret>","request_id":"znoajjaje7gsuphp1yry9tkbcw","status_code":"400"}
I'd like to run gatus on my raspberry pi cluster, but the docker image twinproduction/gatus is built only with linux/amd64 architecture support.
RPi4 output:
arch; docker run twinproduction/gatus
armv7l
standard_init_linux.go:211: exec user process caused "exec format error"
failed to resize tty, using default size
This is how you can make sure that the manifest does not contain any CPU architecture description.
docker buildx imagetools inspect twinproduction/gatus
I added a few build steps to GitHub Actions for cross-compilation of Docker images, and now Gatus works on the Raspberry Pi.
arch; docker run pentusha/gatus
armv7l
2020/12/02 09:21:50 [config][Load] Reading configuration from configFile=config/config.yaml
2020/12/02 09:21:50 [config][validateAlertingConfig] Alerting is not configured
2020/12/02 09:21:50 [config][validateServicesConfig] Validated 5 services
2020/12/02 09:21:50 [main][main] Listening on 0.0.0.0:8080/
2020/12/02 09:21:51 [watchdog][monitor] Monitored serviceName=frontend; success=true; errors=0; requestDuration=648ms;
2020/12/02 09:21:51 [watchdog][monitor] Monitored serviceName=backend; success=true; errors=0; requestDuration=218ms;
2020/12/02 09:21:52 [watchdog][monitor] Monitored serviceName=monitoring; success=true; errors=0; requestDuration=223ms;
docker buildx imagetools inspect pentusha/gatus | grep Platform
Platform: linux/amd64
Platform: linux/arm/v6
Platform: linux/arm/v7
Platform: linux/arm64
I also tried some other platforms, but the build failed, so I kept only ARM.
Gatus could support dynamic POST body variables, to enable checking the functionality of APIs.
It could look like this:
- name: "test variables"
  url: "https://api.example.com/v1/orders"
  method: POST
  body: { "amount" : [RAND(1,10)]}
  conditions:
    - "[BODY].orders.amount == [LAST_RAND]"
It would be good to be able to put a password on the dashboard page so I can secure it from being visible to everyone.
Great project. I was looking to write something similar myself but have not had time. Don't need to now.
Gatus should provide a way to add conditions to validate the value of a header.
Here's what it could look like:
- name: example
  url: "https://example.org/"
  interval: 30s
  conditions:
    - "[HEADER].Content-Encoding == gzip"
Note that per RFC 2616, header names are case-insensitive, so the lookup of the header name must be case-insensitive too.
Some placeholders return durations, in milliseconds.
My proposal is to be able to compare durations with human-readable strings along with integers, such as 2 minutes or 28 days, which internally resolve to milliseconds.
The choice of supported formats is left for the implementation.
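As one possible format, a sketch that accepts either a bare integer (already milliseconds) or "<number> <unit>". parseMillis and the list of units are assumptions made for illustration; note that Go's time.ParseDuration does not understand spelled-out units or days.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// units maps singular unit names to their duration.
var units = map[string]time.Duration{
	"second": time.Second,
	"minute": time.Minute,
	"hour":   time.Hour,
	"day":    24 * time.Hour,
	"week":   7 * 24 * time.Hour,
}

// parseMillis converts "300", "2 minutes" or "28 days" to milliseconds.
func parseMillis(s string) (int64, error) {
	s = strings.TrimSpace(s)
	// A bare integer is taken to be milliseconds already.
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	n, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid number in duration %q", s)
	}
	// Accept both singular and plural unit names.
	unit, ok := units[strings.TrimSuffix(strings.ToLower(fields[1]), "s")]
	if !ok {
		return 0, fmt.Errorf("unknown unit in duration %q", s)
	}
	return n * unit.Milliseconds(), nil
}
```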
When a condition is failing, Gatus prints the full response body in output. It may be useful when monitoring APIs, but it is not for a big HTML page.
My proposal is to either:
Hi - this looks very promising. What is your view on supporting additional monitoring options besides sending HTTP requests? In particular it would be great to also monitor service health for backend services with gatus. I read you are deploying this in Kubernetes and I would want to do the same. It would be great if we could also probe for the Kubernetes health status of backend services, which do not provide HTTP endpoints. Anyway, would this be out of scope for your project here?
alerting.slack and alerting.pagerduty are strings while alerting.twilio and alerting.custom are structs.
For the sake of making adding providers easier, they should all be structs.
I'm aware that this will be a breaking change, but I'd rather do it sooner than later, lest the inconsistencies scare potential contributors away from wanting to implement other alerting providers.
So I had alerts come in for 3 of the sites I monitor over the weekend.
But I didn't look at them until this morning.
All I know is that something happened to make the alert send.
I have 2 conditions set; it's not uncommon for my sites' response time to go over 1000 every so often.
After 2 failed check-ins it will alert me via Teams of the issue.
- "[STATUS] == 200"
- "[RESPONSE_TIME] < 1000"
Is it possible to say in the alert sent to me via Teams (via the description of the custom alert) which condition was invalid?
It would also be useful to see what the value of the failing condition was.