Comments (6)
Hello,
vmalert supports hot config reload by calling the /-/reload endpoint or by using the -configCheckInterval flag.
I'd recommend adding a config reloader sidecar to your vmalert pod, which watches the rule files and calls /-/reload when the config is updated.
You can also use vm-operator to manage vmalert, which includes a config reloader by default.
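As a rough illustration of the sidecar approach, the pod spec fragment below pairs vmalert with a configmap-reload container that watches the mounted rules directory and calls /-/reload when it changes. This is only a sketch: the container names, mount path, ConfigMap name (vmalert-rules) and listen port are assumptions, and the required -datasource.url and -notifier.url flags are left out.

```yaml
# Hypothetical pod spec fragment; names, paths and ports are assumptions.
containers:
  - name: vmalert
    image: victoriametrics/vmalert:v1.89.1
    args:
      - -rule=/etc/vmalert/rules/*.yaml
      - -httpListenAddr=:8880
      # -datasource.url and -notifier.url omitted for brevity
    volumeMounts:
      - name: rules
        mountPath: /etc/vmalert/rules
  - name: config-reloader
    image: jimmidyson/configmap-reload:v0.3.0
    args:
      # Watch the mounted ConfigMap directory for updates
      - --volume-dir=/etc/vmalert/rules
      # Ask vmalert to re-read its rule files after a change
      - --webhook-url=http://localhost:8880/-/reload
    volumeMounts:
      - name: rules
        mountPath: /etc/vmalert/rules
        readOnly: true
volumes:
  - name: rules
    configMap:
      name: vmalert-rules  # assumed ConfigMap holding the rule files
```

With this layout a ConfigMap update triggers an in-place reload, so the vmalert container itself does not need to restart.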
from victoriametrics.
Thanks for your reply, but it doesn't help me. My vmalert is already managed by vm-operator and has the -configCheckInterval flag set, but the pod still restarts.
My Helm chart config looks like this:
# bare k8s deployment for vmalert
vmalert:
  enable: true
  serviceAccount:
    # Specifies whether a service account should be created
    create: true
    # Annotations to add to the service account
    annotations: {}
    # The name of the service account to use.
    # If not set and create is true, a name is generated using the fullname template
    name: ""
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
    # targetMemoryUtilizationPercentage: 80
  spec:
    replicaCount: 2
    image:
      repository:
      pullPolicy: Always
      tag: "v1.89.1"
    imagePullSecrets: []
    podAnnotations: {}
    podSecurityContext: {}
      # fsGroup: 2000
    securityContext: {}
      # capabilities:
      #   drop:
      #     - ALL
      # readOnlyRootFilesystem: true
      # runAsNonRoot: true
      # runAsUser: 1000
    resources:
      limits:
        cpu: 2
        memory: 2Gi
      requests:
        cpu: 100m
        memory: 128Mi
    # Allowed values: `soft` or `hard`
    podAntiAffinityPreset: hard
    # configMap name of the prometheusRules
    promRules:
      - prometheus-app-telemetry-middleware-prometheus-rulefiles-.+
    extraArgs: {}
      # Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
      # datasource.lookback: 5m
      # How far a value can fallback to when evaluating queries. For example, if -datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If set to 0, rule's evaluation interval will be used instead. (default 5m0s)
      # datasource.queryStep: 5m
    # Interval for checking for changes in '-rule' or '-notifier.config' files.
    # By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.
    configCheckInterval: 60s
    # How often to evaluate the rules (default 1m0s)
    evaluationInterval: 30s
    # External label to be applied for each rule
    externalLabels: []
    # - "prometheus=plat-diamond-metric/diamond-monitor-prometheus"
    # - "prometheus_replica=prometheus-cluster-monitor-diamond-mo-prometheus-0"
    service:
      type: ClusterIP
      port: 8080
    notifierConfig:
      dns_sd_configs:
        - names:
            - alertmanager-operated
          type: 'A'
          port: 9093
from victoriametrics.
@ALEX-yinhao, that's not expected.
However, in this picture I don't see the vmalert pods being restarted when rules are modified. I see the prometheus-rulefiles- pods (I assume they are not vmalert) being restarted. What do the prometheus-rulefiles- pods do here?
Do you see any logs when the vmalert pod gets terminated?
from victoriametrics.
prometheus-app-telemetry-middleware-prometheus-rulefiles- is a ConfigMap created by prometheus-operator. In the past I used Prometheus for alerting. Now I use vmalert instead of Prometheus, but I want to keep using the Prometheus rules.
prometheus-operator sometimes refreshes all of these rule files, and when that happens the vmalert pod is stopped and a new pod is started, which is why the pod restart count shows 0.
I can't see any error message in the vmalert pod. The only log I can see is this:
2024-05-09T03:20:48.670Z info VictoriaMetrics/app/vmalert/main.go:189 service received signal terminated
from victoriametrics.
From the log, something is sending a termination signal to vmalert. Since there is no config-reloader in the vmalert pod, I'd guess some external service is doing it.
my helmchart config like this.
...
# configMap name of the prometheusRules
promRules:
- prometheus-app-telemetry-middleware-prometheus-rulefiles-.+
If you already mount all the rule ConfigMaps in the vmalert pod, you can just call the /-/reload endpoint.
And if you're using vm-operator, I'd suggest using VMRule (vm-operator can auto-convert PrometheusRule to VMRule) and enabling ruleSelector in VMAlertSpec, which provides automatic config reload.
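To make the ruleSelector suggestion concrete, here is a minimal sketch of a VMAlert resource that selects VMRule objects by label, together with one matching VMRule. The resource names, the app label, the datasource URL and the sample alert are all hypothetical and would need to be adapted to the actual cluster.

```yaml
# Sketch only: resource names, labels and URLs are hypothetical.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: example
spec:
  replicaCount: 2
  evaluationInterval: 30s
  datasource:
    url: http://vmsingle-example:8429
  notifier:
    url: http://alertmanager-operated:9093
  # Pick up VMRule objects that carry this label
  ruleSelector:
    matchLabels:
      app: telemetry-middleware
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: example-rules
  labels:
    app: telemetry-middleware
spec:
  groups:
    - name: example
      rules:
        - alert: InstanceDown
          expr: up == 0
          for: 5m
```

With this approach the operator manages the rule files for vmalert, so the Prometheus rule ConfigMaps do not have to be mounted by name.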
from victoriametrics.
Yes. About the config-reloader: I set the vmalert config reloader env in my vm-operator chart, but I can't find the config-reloader in the vmalert pod.
The env looks like this:
env:
  - name: VM_VMAGENTDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/prometheus-operator/prometheus-config-reloader:v0.48.1
  - name: VM_VMAUTHDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/prometheus-operator/prometheus-config-reloader:v0.48.1
  - name: VM_VMALERTDEFAULT_CONFIGRELOADIMAGE
    value: registry.sensetime.com/diamond/jimmidyson/configmap-reload:v0.3.0
  - name: VM_PODWAITREADYTIMEOUT
    value: "180s"
  - name: VM_PODWAITREADYINTERVALCHECK
    value: "15s"
  - name: VM_PODWAITREADYINITDELAY
    value: "30s"
from victoriametrics.