
VictoriaMetrics operator

Overview

The design and implementation are inspired by prometheus-operator, a great tool for managing the monitoring configuration of your applications. VictoriaMetrics operator is API-compatible with it, so you can use the familiar CRD objects: ServiceMonitor, PodMonitor, PrometheusRule, Probe and ScrapeConfig. Or you can use the VictoriaMetrics CRDs:

  • VMServiceScrape - defines scraping metrics configuration from pods backed by services.
  • VMPodScrape - defines scraping metrics configuration from pods.
  • VMRule - defines alerting or recording rules.
  • VMProbe - defines a probing configuration for targets with blackbox exporter.
  • VMScrapeConfig - defines a scrape config using any of the service discovery options supported in VictoriaMetrics.
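As an illustration of the CRDs above, a minimal VMServiceScrape might look like this (the name, labels and port are hypothetical; adjust them to your own service):

```yaml
# Illustrative example: scrape metrics from pods backed by services
# labeled app: my-app, on the service port named "http".
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http
```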

Besides, the operator allows you to manage VictoriaMetrics applications inside a Kubernetes cluster and simplifies this process (see quick-start). With CRDs (Custom Resource Definitions) you can define application configuration and apply it to your cluster (see crd-objects).

The operator simplifies VictoriaMetrics cluster installation, upgrading and management.

It integrates with VictoriaMetrics vmbackupmanager - an advanced tool for making backups. See Backup automation for VMSingle or Backup automation for VMCluster.

Use cases

For Kubernetes cluster administrators, it simplifies installation, configuration and management of VictoriaMetrics applications. And the main feature of the operator is the ability to delegate application monitoring configuration to end users.

For application developers, it is a great way to manage the observability of applications. You can define metrics scraping and alerting configuration for your application and manage it as part of the application deployment process. Just define app_deployment.yaml, app_vmpodscrape.yaml and app_vmrule.yaml. That's it - you can apply them to a Kubernetes cluster. Check quick-start for an example.
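A sketch of what app_vmpodscrape.yaml and app_vmrule.yaml could contain, assuming a hypothetical application labeled app: my-app that exposes metrics on a container port named "metrics" (all names and the alert expression are illustrative):

```yaml
# app_vmpodscrape.yaml (sketch)
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
    - port: metrics
---
# app_vmrule.yaml (sketch)
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: my-app
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppDown
          expr: up{job=~".*my-app.*"} == 0
          for: 5m
```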

Operator vs helm-chart

VictoriaMetrics provides helm charts. The operator does the same job, simplifies it, and provides advanced features.

Documentation

  • quick start doc

  • high availability doc

  • relabeling configuration doc

  • managing crd objects versions doc

  • design and description of implementation design

  • operator objects description doc

  • backups docs

  • external access to cluster resources doc

  • security doc

  • resource validation doc

    NOTE: documentation was moved into the main VictoriaMetrics repo (link). All changes must be made there.

Configuration

The operator is configured by environment variables; the full list can be found at link.

These variables define default configuration options, such as component images, timeouts and features.

Kubernetes compatibility

The operator is tested on officially supported Kubernetes versions.

Community and contributions

Feel free to ask any questions regarding VictoriaMetrics:

Development

Dependencies:

  • kubebuilder v4
  • golang 1.22+
  • kubectl
  • docker

To start:

make run

To run unit tests:

make test

To run e2e tests on an automatically configured Kind cluster:

make test-e2e


Issues

Default installation reports error

Steps:

  • install crds from release
  • install operator
  • open operator logs
E0802 07:14:29.150717       1 reflector.go:178] k8s.io/[email protected]+incompatible/tools/cache/reflector.go:125: Failed to list *v1beta1.VMCluster: vmclusters.operator.victoriametrics.com is forbidden: User "system:serviceaccount:default:vm-operator" cannot list resource "vmclusters" in API group "operator.victoriametrics.com" at the cluster scope
{"level":"info","ts":1596352469.9754841,"logger":"controller_vmcluster","msg":"api resource doesnt exist, waiting for it","group":"monitoring.coreos.com/v1","kind":"PodMonitor"}
E0802 07:14:30.562840       1 reflector.go:178] k8s.io/[email protected]+incompatible/tools/cache/reflector.go:125: Failed to list *v1beta1.VMAgent: vmagents.operator.victoriametrics.com is forbidden: User "system:serviceaccount:default:vm-operator" cannot list resource "vmagents" in API group "operator.victoriametrics.com" at the cluster scope
{"level":"info","ts":1596352471.72563,"logger":"controller_vmcluster","msg":"api resource doesnt exist, waiting for it","group":"monitoring.coreos.com/v1","kind":"PrometheusRule"}
{"level":"info","ts":1596352471.8257701,"logger":"controller_vmcluster","msg":"api resource doesnt exist, waiting for it","group":"monitoring.coreos.com/v1","kind":"ServiceMonitor"}
E0802 07:14:33.111039       1 reflector.go:178] k8s.io/[email protected]+incompatible/tools/cache/reflector.go:125: Failed to list *v1beta1.VMRule: vmrules.operator.victoriametrics.com is forbidden: User "system:serviceaccount:default:vm-operator" cannot list resource "vmrules" in API group "operator.victoriametrics.com" at the cluster scope
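The "forbidden" errors above indicate that the operator's ServiceAccount lacks RBAC permissions on the operator's own CRDs. A ClusterRole fragment like the following (resource names taken from the errors; verify against the released RBAC manifests before use) would grant the missing access once bound to the vm-operator ServiceAccount:

```yaml
# Sketch of the missing permissions; the real released ClusterRole may
# cover more resources and verbs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vm-operator
rules:
  - apiGroups: ["operator.victoriametrics.com"]
    resources: ["vmagents", "vmclusters", "vmrules", "vmservicescrapes"]
    verbs: ["get", "list", "watch"]
```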

Errors creating CRDs

Hi,

customresourcedefinition.apiextensions.k8s.io/vmclusters.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmpodscrapes.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmrules.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmservicescrapes.operator.victoriametrics.com configured
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmagents.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalertmanagers.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalerts.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmsingles.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]

multiple vmagent/vmalert resources

For rule/monitor objects reconciliation we need to know corresponding vmalert/vmagent

  1. monitors:
    https://github.com/VictoriaMetrics/operator/blob/master/pkg/controller/servicemonitor/servicemonitor_controller.go#L114
  2. rules:
    https://github.com/VictoriaMetrics/operator/blob/master/pkg/controller/prometheusrule/prometheusrule_controller.go#L110

Currently it works only for exactly one match, but it's possible to have multiple agents.

So I think we have to reconcile objects for all vmagents/vmalerts. Possible performance degradation - needs to be investigated.

Quick start guide

I've observed an error when following the quick-start guide instructions.
No resources were created:

E0802 07:20:52.014949       1 reflector.go:178] k8s.io/[email protected]+incompatible/tools/cache/reflector.go:125: Failed to list *v1beta1.VMServiceScrape: vmservicescrapes.operator.victoriametrics.com is forbidden: User "system:serviceaccount:default:vm-operator" cannot list resource "vmservicescrapes" in API group "operator.victoriametrics.com" at the cluster scope
E0802 07:20:54.770054       1 reflector.go:178] k8s.io/[email protected]+incompatible/tools/cache/reflector.go:125: Failed to list *v1beta1.VMCluster: vmclusters.operator.victoriametrics.com is forbidden: User "system:serviceaccount:default:vm-operator" cannot list resource "vmclusters" in API group "operator.victoriametrics.com" at the cluster scope

The CustomResourceDefinition is invalid for 1.17 and 1.18 Kubernetes versions

There are two problems:

  1. with 1.16 and 1.17 clusters
 The CustomResourceDefinition "vmagents.victoriametrics.com" is invalid:
* spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.type: Invalid value: "object": must be a scalar or atomic type as item of a list with x-kubernetes-list-type=set
* spec.validation.openAPIV3Schema.properties[spec].properties[volumeMounts].items.type: Invalid value: "object": must be a scalar or atomic type as item of a list with x-kubernetes-list-type=set
* spec.validation.openAPIV3Schema.properties[spec].properties[remoteWrite].items.type: Invalid value: "object": must be a scalar or atomic type as item of a list with x-kubernetes-list-type=set
* spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.type: Invalid value: "object": must be a scalar or atomic type as item of a list with x-kubernetes-list-type=set
* spec.validation.openAPIV3Schema.properties[spec].properties[imagePullSecrets].items.type: Invalid value: "object": must be a scalar or atomic type as item of a list with x-kubernetes-list-type=set

listType=set must be removed to fix it.

  2. with 1.18 cluster version
The CustomResourceDefinition "appservices.app.example.com" is invalid: spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property

Kubernetes upstream problem.

operator-framework/operator-sdk#3235

We need to find out the minimal Kubernetes version that the operator supports.

Add ARM support

I understand that it's early and not a priority, I just want to raise my hand that my Raspberry Pi 4 64-bit cluster is forced to use prometheus-operator. So please, if it's not that big of an issue, let's make it happen 🙏

Operator default values review

  1. Fix the default port for vmalert to 8080.

  2. Change requests and limits for default values.

  3. Bump the version to the actual VictoriaMetrics release.

VictoriaMetrics cluster api design

It would be nice to have a VictoriaMetrics cluster object, which would manage a VictoriaMetrics setup with vmselect, vminsert and vmstorage.

I propose to use:

  1. statefulset for vmstorage

  2. deployment for vmselect/vminsert.

Publish operator to Hub

We need to define the flow of what we publish; consider 3 options:

  • publish prom crds as requirements to vm operator
  • provide same crds as prom
  • both

RBAC: possibly missing "replicasets" resource in vm-operator ClusterRole

When starting the operator:

vm-operator-85f8d8b599-xp5s8 vm-operator {"level":"info","ts":1592233312.4326746,"logger":"cmd","msg":"Could not create metrics Service","error":"failed to initialize service object for metrics: replicasets.apps \"vm-operator-85f8d8b599\" is forbidden: User \"system:serviceaccount:monitoring:vm-operator\" cannot get resource \"replicasets\" in API group \"apps\" in the namespace \"monitoring\""}

Fixed by adding replicasets to the ClusterRole definition of vm-operator (not sure if it was an oversight or a deliberate choice not to include them in the first place).
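The fix described above can be expressed as one extra rule appended to the vm-operator ClusterRole (a sketch; the verb list may need widening depending on how the metrics Service is built):

```yaml
# Additional rule for the vm-operator ClusterRole: lets the operator read
# its own ReplicaSet when setting the owner reference on the metrics Service.
- apiGroups: ["apps"]
  resources: ["replicasets"]
  verbs: ["get"]
```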

load tlsConfig for serviceMonitors isn't working

Prometheus-operator provides the ability to load TLS config for target scraping from secrets:

https://github.com/coreos/prometheus-operator/blob/master/pkg/prometheus/operator.go#L1519

```go
func (c *Operator) loadTLSAssets(mons map[string]*monitoringv1.ServiceMonitor) (map[string]TLSAsset, error)
```

We support the same API objects:

 https://github.com/VictoriaMetrics/operator/blob/master/pkg/apis/monitoring/v1/servicemonitor_types.go#L129

```go
type TLSConfig struct {
	...
	// Struct containing the CA cert to use for the targets.
	CA SecretOrConfigMap `json:"ca,omitempty"`
	Cert SecretOrConfigMap `json:"cert,omitempty"`

	// Secret containing the client key file for the targets.
	KeySecret *v1.SecretKeySelector `json:"keySecret,omitempty"`
	...
}
```
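From the scrape-object side, using those fields would look roughly like this (the secret name and key names are illustrative; the secret/key selector shape follows the prometheus-operator SecretOrConfigMap type):

```yaml
# Endpoint fragment of a ServiceMonitor/VMServiceScrape with TLS material
# loaded from a secret named "my-tls" (illustrative).
endpoints:
  - port: https
    scheme: https
    tlsConfig:
      ca:
        secret:
          name: my-tls
          key: ca.crt
      cert:
        secret:
          name: my-tls
          key: tls.crt
      keySecret:
        name: my-tls
        key: tls.key
```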

It would be nice to have the same behaviour.

add force configuration sync

This would be a nice feature to have for VMAlert and VMAgent.

After the configuration has changed in the service ConfigMap, we can update the corresponding pod annotation with the last configuration update time. This would trigger a ConfigMap sync and immediately update the service's configuration.
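The proposal above amounts to stamping the pod template with an annotation like the following (the annotation key is hypothetical, chosen here only to illustrate the mechanism):

```yaml
# Pod-template fragment; changing the annotation value forces the kubelet
# to re-sync mounted ConfigMaps for the restarted/updated pods.
spec:
  template:
    metadata:
      annotations:
        operator.victoriametrics.com/last-config-update: "2020-08-02T07:14:29Z"  # hypothetical key
```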

Deduplicate alerts when populating vmalert configmap

Type
Feature request

Description
The operator creates alert ConfigMaps for vmalert from PrometheusRules and VMRules. Since vmalert does not allow duplicated alerts (ref), it would be good to have the operator deduplicate the alerts in the ConfigMap.
Currently, vmalert fails to start if there is a duplicate alert.

Expectation
Rules with duplicate alerts should be deduplicated so that vmalert can boot up.

validation admission webhook support

https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/

Adding it would help validate CRD creation requests for logic errors.

For instance, we can parse a PrometheusRule and find syntax errors, or detect unknown flags for vmagent.

In general, what's needed:

  1. Prometheus rule syntax validation; check it against vmalert syntax.
  2. vmagent/vmsingle/vmalert checks for bad args.
  3. Discard (or mutate) attempts to reduce the size of a VMSingle PVC.

Some sanity checks?

Option to add maxScrapeSize to vmagent

Type
Feature request

Problem Statement
For some services in a big cluster, kube-state-metrics for example, the scrape response size is much larger than the default promscrape.maxScrapeSize (16777216 bytes). Hence, vmagent ignores the scrape with an error.

Describe the solution you'd like
Having maxScrapeSize as a parameter in the VMAgentSpec API would be a good start to unblock this.
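Until a dedicated field exists, one possible workaround (assuming the VMAgent spec supports passing arbitrary flags via extraArgs, and with an illustrative object name) could look like:

```yaml
# Sketch: raise the scrape-size limit via a raw flag instead of a typed field.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: example-vmagent
spec:
  # remoteWrite and other required fields omitted for brevity
  extraArgs:
    promscrape.maxScrapeSize: "67108864"  # 64 MiB; default is 16777216
```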

Additional context
Error log:

2020-10-28T18:51:37.921Z	error	VictoriaMetrics/lib/promscrape/scrapework.go:199	error when scraping "https://10.140.56.100:8443/metrics" from job "kube-state-metrics" with labels {endpoint="https-main", instance="10.140.56.100:8443", job="kube-state-metrics", namespace="monitoring", pod="kube-state-metrics-76b65c475d-j4ww6", prometheus="victoria-metrics/example-vmagent", service="kube-state-metrics"}: the response from "https://10.140.56.100:8443/metrics" exceeds -promscrape.maxScrapeSize=16777216; either reduce the response size for the target or increase -promscrape.maxScrapeSize

argocd misbehaves if removePvcAfterDelete is false

E.g. the YAML used to create a VMSingle:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMSingle
metadata:
  name: test
spec:
  retentionPeriod: "1"
  removePvcAfterDelete: true
  serviceAccountName: "test"
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi

If removePvcAfterDelete: true, it works perfectly via Argo CD, as the owner reference of the PVC is set to the VMSingle object. But in this case the PVC will be deleted when the VMSingle is deleted, as per the doc.

If removePvcAfterDelete: false, it doesn't work properly in Argo CD: the owner reference of the PVC is not set to the VMSingle object, so Argo tries to delete the PVC immediately.

Is it possible to set the owner reference of the PVC to the VMSingle object irrespective of removePvcAfterDelete being true/false?

auth_proxy_service.yaml contains system namespace

Should it be kube-system?

Error from server (NotFound): error when creating "auth_proxy_service.yaml": namespaces "system" not found
error validating "kustomization.yaml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

improve e2e tests

There is a method for checking a deployment:

```go
err = e2eutil.WaitForOperatorDeployment(t, f.KubeClient, namespace, "vm-operator", 1, retryInterval, timeout)
```

And I suggest adding a few additional methods:
1) WaitForConfigmap() - optionally check key names
2) WaitForStatefulSet() - validate ready status
3) WaitForSecret() - optionally check key names
 

Question about default serviceMonitor

A serviceMonitor is created by the operator for each new VMAgent object - I gather the aim is to generate the vmagent config to scrape itself.
Are users supposed to create additional serviceMonitors to scrape other targets and not to edit this one? Asking because it wasn't clear from the docs and examples :)

Alertmanager CRD overlaps with the Prometheus Operator one

Hi there,

Not sure if the files in master are the most up to date, since you mention an install directory in the docs and it's not there yet. But in deploy/crds for vmalertmanagers, the singular and plural terms mismatch - if the goal is to use the coreos CRD for alertmanager, better to remove it from the CRDs here, as it will conflict with an already deployed prometheus-operator:

  names:
    kind: Alertmanager
    listKind: AlertmanagerList
    plural: vmalertmanagers
    shortNames:
    - vma
    singular: alertmanager
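If instead the CRD is meant to stay, a consistent names block would use the vm-prefixed terms throughout. A sketch of what that fix might look like (the exact kind name should be checked against the operator's API types):

```yaml
names:
  kind: VMAlertmanager
  listKind: VMAlertmanagerList
  plural: vmalertmanagers
  singular: vmalertmanager
  shortNames:
    - vma
```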

Remove Prometheus Operator CRD and replace it with VM

I suggest completely removing the monitoring.coreos.com API and replacing:

  • ServiceMonitor with VMServiceScrape
  • PodMonitor with VMPodScrape
  • PrometheusRule with VMAlertRule

For users who want to migrate from prometheus-operator to VictoriaMetrics operator, we will add:

  • a VMPrometheusConvertor service: it will watch monitoring.coreos.com API objects (ServiceMonitor, PodMonitor, PrometheusRule) and create the corresponding VictoriaMetrics API objects.

Main benefits:

  • remove Prometheus-operator dependency
  • easy operator hub integration
  • remove possible conflicts with Prometheus operator installation
  • validation for our api objects.

vmagent relabel config support

vmagent provides a feature for relabeling metrics:

-remoteWrite.urlRelabelConfig

I suggest adding such an option to RemoteSpec and renaming it to RemoteWriteSpec:

// RemoteWriteSpec defines the remote_write configuration for vmagent.
type RemoteWriteSpec struct {
	// URL of the endpoint to send samples to.
	URL string `json:"url"`
	URLRelabelConfig *v1.ConfigMapKeySelector `json:"urlRelabelConfig,omitempty"`
}

Open question: how do we load this config - with a ConfigMap or a Secret selector?
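With the ConfigMap variant, usage might look like the fragment below (urlRelabelConfig is the field proposed above, not a released API; the ConfigMap name and key are illustrative):

```yaml
# VMAgent spec fragment under the proposed RemoteWriteSpec shape.
remoteWrite:
  - url: http://vmsingle-example:8429/api/v1/write
    urlRelabelConfig:
      name: relabel-config   # ConfigMap name (illustrative)
      key: relabel.yml       # key holding relabel_config entries
```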

add compatibility info

Hi,

Please add compatibility info to your docs.
Currently, your crd.yaml (v0.1.0) doesn't work on OpenShift 3.11 or Kubernetes 1.18.5.

For k8s 1.18.5:
customresourcedefinition.apiextensions.k8s.io/vmclusters.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmpodscrapes.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmrules.operator.victoriametrics.com configured
customresourcedefinition.apiextensions.k8s.io/vmservicescrapes.operator.victoriametrics.com configured
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmagents.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalertmanagers.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalerts.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmsingles.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]

http.pathPrefix fails on healthchecks

When setting http.pathPrefix in the VictoriaMetrics flags, the readiness and liveness probes fail because they use /health instead of <pathPrefix>/health.

2020-08-25T19:43:04.048Z	info	VictoriaMetrics/app/victoria-metrics/main.go:43	started VictoriaMetrics in 0.022 seconds
2020-08-25T19:43:04.048Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:77	starting http server at http://:8429/
2020-08-25T19:43:04.048Z	info	VictoriaMetrics/lib/httpserver/httpserver.go:78	pprof handlers are exposed at http://:8429/debug/pprof/
2020-08-25T19:43:06.348Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57744"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:08.747Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57748"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:11.348Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57758"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:13.747Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57768"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:16.348Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57782"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:18.747Z	warn	VictoriaMetrics/lib/httpserver/httpserver.go:175	remoteAddr: "10.20.16.1:57786"; cannot get canonical path: missing `-pathPrefix="/victoriametrics-1"` in the requested path: "/health"
2020-08-25T19:43:18.756Z	info	VictoriaMetrics/app/victoria-metrics/main.go:46	received signal terminated
2020-08-25T19:43:18.756Z	info	VictoriaMetrics/app/victoria-metrics/main.go:50	gracefully shutting down webservice at ":8429"

Is there any way to work around this issue?
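For illustration, this is what the corrected probes would need to look like when -http.pathPrefix=/victoriametrics-1 is set; whether the CRD exposes probe overrides (or the operator should derive the path from the flag itself) is exactly the open question of this issue:

```yaml
# Sketch of path-prefix-aware probes for the generated container.
livenessProbe:
  httpGet:
    path: /victoriametrics-1/health
    port: 8429
readinessProbe:
  httpGet:
    path: /victoriametrics-1/health
    port: 8429
```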

Update documentation

Improve readme:

  • add use cases.
  • problems that the operator may solve.
  • differences from the helm chart.
  • troubleshooting.

Wrong path for secrets and configs mount in API docs

Hello,

The API docs mention that secrets and ConfigMaps for vmagent are mounted to /etc/vmagent/secrets and /etc/vmagent/configs, but they are mounted to /etc/vm/secrets and /etc/vm/configs instead.

Not sure if it's a bug in the operator or in the docs, so I'm not proposing any PR.

metadata labels are not copied when converting from a prometheus-operator resource to a VM one

Hello, I just spotted that labels are not copied when converting resources from prometheus-operator to VM ones.
For example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2020-03-13T09:08:39Z"
  generation: 3
  labels:
    app: prometheus-operator-kube-etcd
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-operator-8.12.3
    heritage: Helm
    release: prometheus
  name: prometheus-prometheus-oper-kube-etcd
  namespace: monitoring
  resourceVersion: "270969920"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/prometheus-prometheus-oper-kube-etcd
  uid: deb272c5-555b-4151-b77a-0ecf2bb05fc7

and

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  creationTimestamp: "2020-08-18T04:05:25Z"
  generation: 3
  managedFields:
  - apiVersion: operator.victoriametrics.com/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:endpoints: {}
        f:jobLabel: {}
        f:namespaceSelector:
          .: {}
          f:matchNames: {}
        f:selector:
          .: {}
          f:matchLabels:
            .: {}
            f:app: {}
            f:release: {}
      f:status: {}
    manager: manager
    operation: Update
    time: "2020-08-20T18:05:12Z"
  name: prometheus-prometheus-oper-kube-etcd
  namespace: monitoring
  resourceVersion: "270969921"
  selfLink: /apis/operator.victoriametrics.com/v1beta1/namespaces/monitoring/vmservicescrapes/prometheus-prometheus-oper-kube-etcd
  uid: 056b1b68-f6dc-4a39-912f-e233162137db

This is also true for VMRule and other resources.

Also, there is a possible bug in UpdateServiceMonitor and its analogues: you are updating only the Spec part, but not the labels and annotations.

Add kustomization.yml example to the docs

Manual downloading of manifests can be avoided by using Kustomize resources. All we need is to add the following example to the documentation to let others quickly start deploying and modifying their operator.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yml
- github.com/VictoriaMetrics/operator/config/default?ref=v0.2.0

namespace: monitoring
images:
- name: victoriametrics/operator
  newTag: v0.2.0

Please let me know what you think, and I can send a patch right away.

As a bonus, it would be great if we could bake the version into the git revision so one can skip image patching, but I would expect that as a separate step, as it would probably require some changes to the release process.

sanity check CRD objects

Check:

  • VmSingle
  • VmAlert
  • VmAgent

for sanity; update descriptions and naming if needed, and add additional mandatory options.

Docs for operator

  1. use cases - basic setup, HA setup, init containers.

  2. VictoriaMetrics Single - docs; by design, you cannot have more than one replica of this object. If you need more, create an additional resource or use the Cluster version.

  3. API object description.

VMAgent Improve RemoteSpec

Let's improve the remote write spec and add all missing properties, like basic auth.

Full list of supported params for remote write:

-remoteWrite.basicAuth.password value
    	Optional basic auth password to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.basicAuth.username value
    	Optional basic auth username to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.bearerToken value
    	Optional bearer auth token to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.flushInterval duration
    	Interval for flushing the data to remote storage. Higher value reduces network bandwidth usage at the cost of delayed push of scraped data to remote storage. Minimum supported interval is 1 second (default 1s)
  -remoteWrite.label value
    	Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage
  -remoteWrite.maxBlockSize int
    	The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics (default 33554432)
  -remoteWrite.maxDiskUsagePerURL int
    	The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
  -remoteWrite.queues int
    	The number of concurrent queues to each -remoteWrite.url. Set more queues if a single queue isn't enough for sending high volume of collected data to remote storage (default 1)
  -remoteWrite.relabelConfig string
    	Optional path to file with relabel_config entries. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config for details
  -remoteWrite.sendTimeout duration
    	Timeout for sending a single block of data to -remoteWrite.url (default 1m0s)
  -remoteWrite.showURL
    	Whether to show -remoteWrite.url in the exported metrics. It is hidden by default, since it can contain sensitive auth info
  -remoteWrite.tlsCAFile value
    	Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.tlsCertFile value
    	Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.tlsInsecureSkipVerify
    	Whether to skip tls verification when connecting to -remoteWrite.url
  -remoteWrite.tlsKeyFile value
    	Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.tlsServerName value
    	Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
  -remoteWrite.tmpDataPath string
    	Path to directory where temporary data for remote write component is stored (default "vmagent-remotewrite-data")
  -remoteWrite.url value
    	Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems
  -remoteWrite.urlRelabelConfig value
    	Optional path to relabel config for the corresponding -remoteWrite.url
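As a rough sketch of how an extended remote write spec could map onto these flags, the following Go snippet renders a per-target spec into vmagent arguments. The struct and field names here are illustrative assumptions, not the operator's actual API:

```go
package main

import "fmt"

// RemoteSpec is a hypothetical sketch of a remote write spec with
// some of the missing properties mentioned above (field names are
// illustrative, not the operator's real types).
type RemoteSpec struct {
	URL               string
	BasicAuthUser     string
	BasicAuthPassword string
	SendTimeout       string
	TLSInsecure       bool
}

// buildArgs renders one spec into the corresponding vmagent flags,
// emitting only the flags the user actually set.
func buildArgs(s RemoteSpec) []string {
	args := []string{"-remoteWrite.url=" + s.URL}
	if s.BasicAuthUser != "" {
		args = append(args, "-remoteWrite.basicAuth.username="+s.BasicAuthUser)
	}
	if s.BasicAuthPassword != "" {
		args = append(args, "-remoteWrite.basicAuth.password="+s.BasicAuthPassword)
	}
	if s.SendTimeout != "" {
		args = append(args, "-remoteWrite.sendTimeout="+s.SendTimeout)
	}
	if s.TLSInsecure {
		args = append(args, "-remoteWrite.tlsInsecureSkipVerify=true")
	}
	return args
}

func main() {
	args := buildArgs(RemoteSpec{
		URL:           "http://vmsingle:8428/api/v1/write",
		BasicAuthUser: "user",
		SendTimeout:   "30s",
	})
	for _, a := range args {
		fmt.Println(a)
	}
}
```

Since several flags accept multiple values applied per corresponding -remoteWrite.url, the operator would call this once per remote write target and concatenate the results.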

prometheus selectors not working correctly

https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md#servicemonitor

There are a few selectors for filtering objects:

  1. serviceMonitorNamespaceSelector/serviceMonitorSelector - filters service monitors by namespace (if not specified, the namespace the vmagent is deployed in is used).
    At the moment, it filters only by the current namespace or the namespace selector:

https://github.com/VictoriaMetrics/operator/blob/master/pkg/controller/factory/serviceMonExec.go#L140

A possible solution: filter twice, first by namespace, then by object labels.

  2. podMonitorSelector/podMonitorNamespaceSelector
    https://github.com/VictoriaMetrics/operator/blob/master/pkg/controller/factory/serviceMonExec.go#L201
    The same problem applies.

  3. ruleSelector/ruleNamespaceSelector for vmalert
    https://github.com/VictoriaMetrics/operator/blob/master/pkg/controller/factory/rulescm.go#L145

As an example, you can specify a filter that matches everything:

apiVersion: monitoring.victoriametrics.com/v1beta1
kind: VmAgent
metadata:
  name: example-vmagent
spec:
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}

Or match every ServiceMonitor in the current namespace:

apiVersion: monitoring.victoriametrics.com/v1beta1
kind: VmAgent
metadata:
  name: example-vmagent
spec:
  serviceMonitorSelector: {}

Or some namespaces and every ServiceMonitor:

apiVersion: monitoring.victoriametrics.com/v1beta1
kind: VmAgent
metadata:
  name: example-vmagent
spec:
  serviceMonitorNamespaceSelector: 
       matchLabels:
              name:  dev
  serviceMonitorSelector: {}

and so on.
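The "filter twice" idea could look roughly like the following Go sketch, which first matches namespaces against the namespace selector and then matches the objects themselves. The types and matchLabels-only semantics are simplifying assumptions; an empty selector matches everything, mirroring the `{}` examples above:

```go
package main

import "fmt"

// Object stands in for a ServiceMonitor/PodMonitor-like resource.
type Object struct {
	Namespace string
	Labels    map[string]string
}

// matches implements matchLabels semantics: every selector key must
// be present with the same value; an empty selector matches anything.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// selectMonitors filters twice: first the object's namespace against
// the namespace selector, then the object's own labels.
func selectMonitors(nsLabels map[string]map[string]string, monitors []Object,
	nsSelector, monSelector map[string]string) []Object {
	var out []Object
	for _, m := range monitors {
		if !matches(nsSelector, nsLabels[m.Namespace]) {
			continue // first pass: namespace selector
		}
		if !matches(monSelector, m.Labels) {
			continue // second pass: object selector
		}
		out = append(out, m)
	}
	return out
}

func main() {
	nsLabels := map[string]map[string]string{
		"dev":  {"name": "dev"},
		"prod": {"name": "prod"},
	}
	monitors := []Object{
		{Namespace: "dev", Labels: map[string]string{"team": "a"}},
		{Namespace: "prod", Labels: map[string]string{"team": "a"}},
	}
	// Namespaces labeled name=dev, any monitor inside them:
	got := selectMonitors(nsLabels, monitors,
		map[string]string{"name": "dev"}, map[string]string{})
	fmt.Println(len(got)) // 1
}
```

A real implementation would build `labels.Selector` values from the CRD's metav1.LabelSelector fields instead of plain maps, but the two-pass structure is the same.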

VMAgent: unable to configure additional volumes

Hello.

I'm trying to set up a sidecar container in VMAgent and mount an additional volume inside it, using spec.volumes and spec.containers.volumeMounts like this:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent
spec:
  volumes:
  - name: vm-shared
    emptyDir: {}
  serviceScrapeNamespaceSelector: {}
  podScrapeNamespaceSelector: {}
  podScrapeSelector: {}
  serviceScrapeSelector: {}
  replicaCount: 1
  serviceAccountName: vmagent
  scrapeInterval: 60s
  remoteWrite:
    - url: "http://vmsingle-vmsingle-persisted.default.svc:8429/api/v1/write"
  containers:
  - image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    volumeMounts:
    - mountPath: /etc/vm-shared
      name: vm-shared

After applying this manifest, no deployment appears and there is an error in the VM Operator logs:

{"level":"error","ts":1604322110.788751,"logger":"controllers.VMAgent","msg":"cannot create or update vmagent deploy","vmagent":"default/vmagent","error":"cannot create new vmagent deploy: Deployment.apps \"vmagent-vmagent\" is invalid: spec.template.spec.containers[0].volumeMounts[2].name: Not found: \"vm-shared\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tgithub.com/go-logr/[email protected]/zapr.go:128\ngithub.com/VictoriaMetrics/operator/controllers.(*VMAgentReconciler).Reconcile\n\tgithub.com/VictoriaMetrics/operator/controllers/vmagent_controller.go:75\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:90"}

It looks like the operator ignores the volumes in the spec.volumes section when creating the Deployment object.

Moving the volume list inside spec.containers produces the same error, as does moving spec.volumeMounts from spec.containers to spec. I'm only able to mount the pre-set volumes (config, tls-assets and config-out) inside my sidecar container.

VM Operator is version 0.3.0.
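One possible fix, sketched in Go with illustrative types, is to merge the user-supplied spec.volumes into the generated pod spec instead of dropping them, while keeping the operator's pre-set volumes authoritative:

```go
package main

import "fmt"

// Volume stands in for corev1.Volume; only the name matters here.
type Volume struct{ Name string }

// mergeVolumes appends user-defined volumes after the operator's
// generated ones, skipping duplicates so users cannot shadow
// operator-managed volumes like config or tls-assets.
func mergeVolumes(generated, userDefined []Volume) []Volume {
	seen := map[string]bool{}
	out := append([]Volume{}, generated...)
	for _, v := range generated {
		seen[v.Name] = true
	}
	for _, v := range userDefined {
		if !seen[v.Name] { // don't shadow operator-managed volumes
			out = append(out, v)
			seen[v.Name] = true
		}
	}
	return out
}

func main() {
	merged := mergeVolumes(
		[]Volume{{"config"}, {"tls-assets"}, {"config-out"}},
		[]Volume{{"vm-shared"}, {"config"}}, // re-declared "config" is ignored
	)
	fmt.Println(len(merged)) // 4
}
```

With such a merge in place, the "vm-shared" volume from the manifest above would reach the Deployment and the sidecar's volumeMount would resolve.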

improper relabel configuration after serviceMonitor conversion

When a ServiceMonitor is converted to a VMServiceScrape, the following relabel config breaks the vmagent configuration:

 metricRelabelings:
    - action: keep

with this error from vmagent:

error when parsing relabel_config #1: missing source_labels for action=keep

Prometheus works OK with this action because it fills in defaults, effectively accepting the following relabel rule:

  - separator: ;
    regex: (.*)
    replacement: $1
    action: keep

I suggest filtering out such metricRelabelings as unsupported and meaningless.

Code links:

Prometheus handles this in its relabel config by assigning a default regex: https://github.com/prometheus/prometheus/blob/2fe1e9fa93772fd831b1c040968bb748eac12795/pkg/relabel/relabel.go#L101

But in the case of VictoriaMetrics, this validation fails: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/bca468bb55f0015c6a5d32cafcb1074f833448fe/lib/promrelabel/config.go#L105
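The suggested filter could look like this Go sketch, which drops keep/drop rules that have no source_labels. The struct is illustrative, not the operator's actual type:

```go
package main

import "fmt"

// RelabelConfig is a minimal stand-in for the operator's relabel type.
type RelabelConfig struct {
	Action       string
	SourceLabels []string
}

// filterIncomplete drops rules vmagent rejects: keep/drop actions
// without source_labels. Prometheus tolerates them only because it
// defaults the regex to (.*), which makes the rule a no-op anyway.
func filterIncomplete(rules []RelabelConfig) []RelabelConfig {
	var out []RelabelConfig
	for _, r := range rules {
		if (r.Action == "keep" || r.Action == "drop") && len(r.SourceLabels) == 0 {
			continue // meaningless without source_labels; skip it
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rules := []RelabelConfig{
		{Action: "keep"}, // the problematic bare rule from this issue
		{Action: "keep", SourceLabels: []string{"job"}},
	}
	fmt.Println(len(filterIncomplete(rules))) // 1
}
```

Applying such a filter during ServiceMonitor conversion would keep the generated vmagent.env.yaml parseable while preserving every rule that actually does something.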

remove hardcoded namespace default

Hi,
Please remove the hardcoded namespace=default from your rbac.yaml and manager.yaml.
The operator should be installable into any namespace of the user's choosing.
Thanks

problem with converting servicemonitor with tlsAuth

Hello, I'm trying to migrate our monitoring stack from prometheus-operator to the VictoriaMetrics operator but ran into a problem.

We are using the prometheus helm chart, and specifically the defaultRules section from it.

One of our ServiceMonitors looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2020-03-13T09:08:39Z"
  generation: 2
  labels:
    app: prometheus-operator-kubelet
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-operator-8.12.3
    heritage: Helm
    release: prometheus
  name: prometheus-prometheus-oper-kubelet
  namespace: monitoring
  resourceVersion: "158913151"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/prometheus-prometheus-oper-kubelet
  uid: 93c96dc1-6365-452e-b0e6-018c96393cf9
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    metricRelabelings:
    - action: replace
      sourceLabels:
      - node
      targetLabel: instance
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet

Pay attention to this part:

    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true

and the corresponding VMServiceScrape looks like this:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  creationTimestamp: "2020-08-18T04:05:27Z"
  generation: 1
  managedFields:
  - apiVersion: operator.victoriametrics.com/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:endpoints: {}
        f:jobLabel: {}
        f:namespaceSelector:
          .: {}
          f:matchNames: {}
        f:selector:
          .: {}
          f:matchLabels:
            .: {}
            f:k8s-app: {}
      f:status: {}
    manager: manager
    operation: Update
    time: "2020-08-18T04:05:27Z"
  name: prometheus-prometheus-oper-kubelet
  namespace: monitoring
  resourceVersion: "266920582"
  selfLink: /apis/operator.victoriametrics.com/v1beta1/namespaces/monitoring/vmservicescrapes/prometheus-prometheus-oper-kubelet
  uid: 11ae1bfb-1eca-48b4-8a26-00287c6d5a41
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    honorLabels: true
    metricRelabelConfigs:
    - action: replace
      sourceLabels:
      - node
      targetLabel: instance
    port: https-metrics
    relabelConfigs:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      cert: {}
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    honorLabels: true
    path: /metrics/cadvisor
    port: https-metrics
    relabelConfigs:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      cert: {}
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet

You can see that tlsConfig has an empty cert field:

  tlsConfig:
      ca: {}
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      cert: {}
      insecureSkipVerify: true

and the rendered vmagent.env.yaml in the pod has this section:

- job_name: monitoring/prometheus-prometheus-oper-kubelet/0
  honor_labels: true
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    cert_file: /etc/vmagent-tls/certs/monitoring__
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: kubelet
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: https-metrics
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_k8s_app
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: https-metrics
  - source_labels:
    - __metrics_path__
    target_label: metrics_path
  metric_relabel_configs:
  - source_labels:
    - node
    target_label: instance
    action: replace
- job_name: monitoring/prometheus-prometheus-oper-kubelet/1
  honor_labels: true
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system
  metrics_path: /metrics/cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    cert_file: /etc/vmagent-tls/certs/monitoring__
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: kubelet
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: https-metrics
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_k8s_app
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: https-metrics
  - source_labels:
    - __metrics_path__
    target_label: metrics_path

The error is in the following part:

  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    cert_file: /etc/vmagent-tls/certs/monitoring__

As I understand it, there shouldn't be a cert_file at all according to the original ServiceMonitor definition.

Because of the wrong config file, the vmagent process crashes with an error that it couldn't find the specified cert_file.

If I can be of any help with this issue, feel free to ask :)

Thanks
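A possible guard, sketched in Go with illustrative field names, is to emit cert_file only when the converted tlsConfig actually references a certificate:

```go
package main

import "fmt"

// TLSConfig is a simplified stand-in for the converted tlsConfig;
// CertSecretName is empty when the object carries a bare `cert: {}`.
type TLSConfig struct {
	CAFile             string
	CertSecretName     string
	InsecureSkipVerify bool
}

// renderTLS builds the tls_config section of the scrape config,
// skipping cert_file entirely when no certificate was specified.
func renderTLS(c TLSConfig) map[string]string {
	out := map[string]string{}
	if c.InsecureSkipVerify {
		out["insecure_skip_verify"] = "true"
	}
	if c.CAFile != "" {
		out["ca_file"] = c.CAFile
	}
	if c.CertSecretName != "" {
		// Only now is a mounted cert path meaningful; an empty name
		// is what produced the bogus truncated path in this issue.
		out["cert_file"] = "/etc/vmagent-tls/certs/" + c.CertSecretName
	}
	return out
}

func main() {
	cfg := renderTLS(TLSConfig{
		CAFile:             "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
		InsecureSkipVerify: true,
	})
	_, hasCert := cfg["cert_file"]
	fmt.Println(hasCert) // false
}
```

With this check, the kubelet ServiceMonitor above would render only insecure_skip_verify and ca_file, matching the original Prometheus behavior.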
