
event_exporter's Introduction

Kubernetes Event Exporter


Kubernetes events to Prometheus bridge.

A collector that lists and watches Kubernetes events, tracks how long each event lasts based on its occurrences, and translates that information into Prometheus metrics.

Metrics Overview

  1. kube_event_count Count of a Kubernetes event seen over the last hour. The metric value is the same as the count field of the Event object in the cluster.
    kube_event_count{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.1640452bd04fc7bf",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
    
  2. kube_event_unique_events_total Total number of unique Kubernetes events that occurred over the last hour.
    kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.1640452bd04fc7bf",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
    
  3. event_exporter_build_info Build information about the event exporter.
    event_exporter_build_info{branch="v1.0",build_date="2020-10-22T10:11:29Z",build_user="Caicloud Authors",go_version="go1.13.15",version="v1.0.0"} 1
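
For illustration, here is a hedged sketch of a Prometheus alerting rule built on the metrics above; the group name, alert name, threshold, and duration are hypothetical, while the metric and label names match the examples in this list:

groups:
  - name: kubernetes-events           # hypothetical group name
    rules:
      - alert: FrequentWarningEvents  # hypothetical alert name
        # kube_event_count and its type label come from the metrics listed above;
        # the threshold of 10 occurrences is illustrative only
        expr: kube_event_count{type="Warning"} > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.reason }} on {{ $labels.involved_object_kind }}/{{ $labels.involved_object_name }} happened more than 10 times"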
    

Getting Started

Build

$ VERSION=v1.0.0 REGISTRY=docker.io make build

If you want more information about the build options, please refer to the Makefile in this repository.

Run

Running outside Kubernetes (the exporter looks for a kubeconfig in ~/.kube):

$ ./event_exporter  --kubeConfigPath=$HOME/.kube/config

Running inside Kubernetes (the exporter uses the in-cluster service account):

$ ./event_exporter

Check the metrics

curl http://<pod-ip>:9102/metrics
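
If the pod IP is not directly reachable from your machine, a port-forward works as well (assuming the deployment is named event-exporter, as in the sample metrics shown later in this document):

# forward local port 9102 to the exporter's metrics port, then scrape it locally
kubectl port-forward deploy/event-exporter 9102:9102
curl http://localhost:9102/metrics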

General Flags

Name            Example                                  Description
kubeMasterURL   --kubeMasterURL=                         Optional. The URL of the Kubernetes apiserver to use as a master
kubeConfigPath  --kubeConfigPath=$HOME/.kube/config      Optional. The path of the kubeconfig file
eventType       --eventType=Warning --eventType=Normal   Optional. List of allowed event types; the flag may be repeated. Defaults to Warning
port            --port=9102                              Optional. Port to expose event metrics on (default 9102)
version         --version                                Print version information
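
For example, to run the exporter outside the cluster, watch both Normal and Warning events, and keep the default port, the flags above combine like this:

# --eventType may be repeated once per allowed event type
./event_exporter --kubeConfigPath=$HOME/.kube/config --eventType=Warning --eventType=Normal --port=9102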

Use Kubernetes

You can deploy this exporter in a Kubernetes cluster using the image caicloud/event-exporter:${VERSION}; the available versions can be found on the releases page.

Deploy

kubectl apply -f deploy.yml

Then check the pod status:

kubectl get pods | grep event

Then fetch the metrics from the exporter pod (for example, curl http://<pod-ip>:9102/metrics); the output looks like this:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.6811e-05
go_gc_duration_seconds{quantile="0.25"} 2.6e-05
go_gc_duration_seconds{quantile="0.5"} 3.0795e-05
go_gc_duration_seconds{quantile="0.75"} 8.0126e-05
go_gc_duration_seconds{quantile="1"} 0.000186691
go_gc_duration_seconds_sum 0.001432397
go_gc_duration_seconds_count 24
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 27
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.15"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.29132e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.6787848e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.452877e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 236938
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 2.924731798616038e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.377728e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.29132e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.8359808e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 7.766016e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 21220
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.7688064e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6125824e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6033609023805106e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 258158
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 66096
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 81920
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.07428e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.772971e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 983040
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 983040
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2810744e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 18
# HELP event_exporter_build_info A metric with a constant '1' value labeled by version, branch,build_user,build_date and go_version from which event_exporter was built
# TYPE event_exporter_build_info gauge
event_exporter_build_info{branch="v1.0",build_date="2020-10-22T10:11:29Z",build_user="Caicloud Authors",go_version="go1.13.15",version="v1.0.0"} 1
# HELP kube_event_count Number of kubernetes event happened
# TYPE kube_event_count gauge
kube_event_count{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.1640452bd04fc7bf",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_count{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045435014f51c",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_count{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045638ee80ccb",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_count{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045efda48031f",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_count{involved_object_kind="Deployment",involved_object_name="my-nginx",involved_object_namespace="default",name="my-nginx.1640456cf4c9fbad",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_count{involved_object_kind="PersistentVolumeClaim",involved_object_name="prometheus-data-prometheus-0",involved_object_namespace="kube-system",name="prometheus-data-prometheus-0.163ff24070ae83e5",namespace="kube-system",reason="ProvisioningFailed",source="/persistentvolume-controller",type="Warning"} 6303
# HELP kube_event_unique_events_total Total number of kubernetes unique event happened
# TYPE kube_event_unique_events_total counter
kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.1640452bd04fc7bf",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045435014f51c",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045638ee80ccb",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="event-exporter",involved_object_namespace="default",name="event-exporter.164045efda48031f",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_unique_events_total{involved_object_kind="Deployment",involved_object_name="my-nginx",involved_object_namespace="default",name="my-nginx.1640456cf4c9fbad",namespace="default",reason="ScalingReplicaSet",source="/deployment-controller",type="Normal"} 1
kube_event_unique_events_total{involved_object_kind="PersistentVolumeClaim",involved_object_name="prometheus-data-prometheus-0",involved_object_namespace="kube-system",name="prometheus-data-prometheus-0.163ff24070ae83e5",namespace="kube-system",reason="ProvisioningFailed",source="/persistentvolume-controller",type="Warning"} 10
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.69
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.4009088e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.60335836753e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 6.90274304e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 174
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

License

event_exporter is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

event_exporter's People

Contributors

anoopwebs, bbbmj, caicloud-bot, ddysher, denysvitali, lichuan0620, pendoragon, reason2010, seymourtang, supereagle, turbotankist, zionwu


event_exporter's Issues

Debian stretch EOL

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Debian stretch reached EOL
What you expected to happen:
Update the Linux distro to a supported version.
How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Feature: Add configuration to limits reading events to a specific namespace

/kind feature

What happened: Unable to specify a namespace to read events from.

What you expected to happen:
In our cluster, we do not have the rights to see all cluster resources. When starting the application, we get the following error.

E0421 17:03:18.033794       1 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Event: events is forbidden: User "system:serviceaccount:dev:event-exporter" cannot list resource "events" in API group "" at the cluster scope

How to reproduce it (as minimally and precisely as possible):

  • Create a new namespace
  • Create a new ServiceAccount in that namespace
  • Create a new RoleBinding to view clusterRole.
  • Bind the pod to the namespaced service account
  • Should see errors

Anything else we need to know?:
I tried installing it on OpenShift in a newly created namespace/project.
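
For context, a hedged sketch of the namespaced RBAC described in the repro steps (the dev namespace comes from the error message; the other names are illustrative). Because the exporter currently lists events at cluster scope, a namespaced Role alone still yields the forbidden error above, which is exactly what this feature request asks to change:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: event-exporter   # illustrative name
  namespace: dev         # namespace taken from the error message
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: event-exporter
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: event-exporter
subjects:
- kind: ServiceAccount
  name: event-exporter   # illustrative service account name
  namespace: dev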

grafana dashboard

/kind feature

You probably already have some Grafana dashboards, and it would be helpful to have a starting point... Could you add an example Grafana dashboard (or upload one to the Grafana site and link to it in the documentation)?

Exporting specific events (filter by specific Event Reasons)

/kind feature

Is there a way today to specify a filter of Event reasons to the collector and generate only those metrics?
For example: I want this tool to export metrics only for Reason="OOMKilled" events.

Is it possible? What code changes would need to be made to achieve this?
Thanks!
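
One hedged workaround at scrape time, rather than a change to the exporter itself: a Prometheus metric_relabel_configs rule can keep only event series whose reason label matches. The reason value below is taken from the question; the job name and target are placeholders:

scrape_configs:
  - job_name: event-exporter        # illustrative job name
    static_configs:
      - targets: ['<pod-ip>:9102']  # placeholder target
    metric_relabel_configs:
      - source_labels: [reason]
        # keep series whose reason is OOMKilled, plus series with no reason
        # label at all (the empty alternative matches e.g. go_* metrics)
        regex: 'OOMKilled|'
        action: keep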

Existing event alert keeps getting deleted and fired again

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:

TL;DR: The alert keeps getting deleted and recreated while the Kubernetes event still exists.

The exporter reports existing events as new events every few minutes: even while a Kubernetes event still exists, the exporter reports the event as resolved (disappeared) and then recreates it (because the event still exists), which causes Prometheus to keep deleting and recreating alerts for the same event.

What you expected to happen:
When an event occurs, the exporter should report it and keep it until the event is deleted (and Prometheus will keep firing).

How to reproduce it (as minimally and precisely as possible):
Every event that keeps repeating behaves as described above.

Anything else we need to know?:
It could be a misconfiguration of the alert rules in Prometheus, but I wanted to check whether this is a known bug.

Security fix: upgrade golang-runtime version

kind feature

What happened:
This image runs on top of a very old Go runtime (1.13.11), while newer versions that include security fixes are available (1.20).

What you expected to happen:
Upgrade the Go runtime to the latest version.

How to reproduce it (as minimally and precisely as possible):
Scan your image via blackduck binary analysis and you will see the issues.
See screenshot for more details

Anything else we need to know?:
No


Update vendors

Is this a BUG REPORT or FEATURE REQUEST?: FEATURE REQUEST

kind feature

What happened:
The vendored dependencies are getting old; the last update was 3 years ago. Can they be updated?

What you expected to happen:
Updated Vendors

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

/kind bug

What happened:
when run helm command there is a warning
rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole

What you expected to happen:
update apiVersion: rbac.authorization.k8s.io/v1beta1 to apiVersion: rbac.authorization.k8s.io/v1

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
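
A minimal sketch of the suggested change; only the apiVersion line changes, and the rest of the ClusterRole stays the same:

# before
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
# after
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole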

UnexpectedAdmissionError Alerting

Want to label as "help wanted"
We've encountered an issue where pods end up in the UnexpectedAdmissionError state. I found this tool and it looks like exactly what I need to set up monitoring. I'm wondering whether I should have log.level set to warning, since the error is a "Warning" type.

Events:
  Type     Reason                    Age    From                    Message
  ----     ------                    ----   ----                    -------
  Normal   Scheduled                 2m51s  default-scheduler       Successfully assigned default/backend-549f576d5f-xzdv4 to std-16gb-g7mo
  Warning  UnexpectedAdmissionError  2m51s  kubelet, std-16gb-g7m

Would the event-exporter catch this? If so, I'm wondering what the event_reason parameter in the Prometheus query would be.
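
For reference, a hedged example of what such queries could look like, using the label names that appear elsewhere in this document (the pre-v1.0 kubernetes_events metric uses event_* labels, while the v1.0 metrics use shorter names; the reason value is taken from the event output above):

kubernetes_events{event_type="Warning", event_reason="UnexpectedAdmissionError"}
kube_event_count{type="Warning", reason="UnexpectedAdmissionError"}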

V1.0 image - "exec format error"

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

/kind bug

What happened:
After building v1.0 from the Makefile, I created the image,
started a new project on OpenShift,
and deployed with the v1.0 image.
I got this error:
standard_init_linux.go:178: exec user process caused "exec format error"

What you expected to happen:
expected container to start

How to reproduce it (as minimally and precisely as possible):
git pull && make
new project on openshift 3.11
deploy with deploy file

Failed to list *api.Event: events is forbidden

/kind bug

What happened:
I removed these two annotations because I'm not running on AWS.

         "service.beta.kubernetes.io/aws-load-balancer-backend-protocol": "http",
         "service.beta.kubernetes.io/aws-load-balancer-ssl-ports": "https"
E1001 06:28:54.277852       1 reflector.go:216] github.com/caicloud/event_exporter/store.go:111: Failed to list *api.Event: events is forbidden: User "system:serviceaccount:monitoring:event-exporter" cannot list resource "events" in API group "" at the cluster scope

What you expected to happen:
The events metrics (kubernetes_events) to be exported.

How to reproduce it (as minimally and precisely as possible):
kubectl --context current_context_name -n kube-system apply -f deploy.yaml

Feature request: OOM or cpu limit events

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

I think it is a feature, but don't even know if it is possible :)

What happened:
My K8s setup had a pod being OOM-killed and another hitting its CPU limit. I was expecting the k8s events to show those, but all I could see was the healthcheck failure.

What you expected to happen:
Some k8s event for when the pod limits are triggered.

How to reproduce it (as minimally and precisely as possible):
Set a pod with low memory and cpu limit and start it up

Anything else we need to know?:
I'm currently on k8s 1.15.

Does it support use on large-scale production clusters?

Is this a BUG REPORT or FEATURE REQUEST?:
No
What happened:
May I ask whether this exporter supports use on large-scale production clusters, and how it should be deployed for high availability?
Also, could the volume of event metric data cause Prometheus to consume too much storage? Are there any test records?
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Deploy event_exporter to Kubernetes cluster NO Metrics in Prometheus

This event_exporter is nice! We would like to use it to monitor our cluster behavior and health.
We tried to deploy it in the cluster with the following:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: event-exporter-sa
  namespace: kyma-system
  labels:
    app: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: event-exporter 
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-rb
  labels:
    app: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter
subjects:
- kind: ServiceAccount
  name: event-exporter-sa
  namespace: kyma-system

Deployment

{
   "apiVersion": "apps/v1",
   "kind": "Deployment",
   "metadata": {
      "labels": {
         "name": "event-exporter"
      },
      "name": "event-exporter"
   },
   "spec": {
      "replicas": 1,
      "revisionHistoryLimit": 2,
      "selector": {
         "matchLabels": {
            "app": "event-exporter"
         }
      },
      "strategy": {
         "type": "RollingUpdate"
      },
      "template": {
         "metadata": {
            "annotations": {
               "prometheus.io/path": "/metrics",
               "prometheus.io/port": "9102",
               "prometheus.io/scrape": "true"
            },
            "labels": {
               "app": "event-exporter",
            }
         },
         "spec": {
            "containers": [
               {
                  "command": [
                     "./event_exporter"
                  ],
                  "env": [ ],
                  "image": "caicloud/event-exporter:v0.2.0",
                  "imagePullPolicy": "Always",
                  "name": "event-exporter",
                  "ports": [
                     {
                        "containerPort": 9102,
                        "name": "http"
                     }
                  ],
                  "resources": {
                     "limits": {
                        "memory": "100Mi"
                     },
                     "requests": {
                        "memory": "40Mi"
                     }
                  }
               }
            ],
            "serviceAccountName": "event-exporter-sa",
            "terminationGracePeriodSeconds": 30
         }
      }
   }
}

Service

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"name":"event-exporter"},"name":"event-exporter","namespace":"kyma-system"},"spec":{"ports":[{"name":"http","port":80,"targetPort":9102}],"selector":{"app":"event-exporter"}}}
    prometheus.io/scrape: "true"
  creationTimestamp: "2020-04-16T15:53:31Z"
  labels:
    name: event-exporter
  name: event-exporter
  namespace: kyma-system
spec:
  clusterIP: 1*****
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 9102
  selector:
    app: event-exporter
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

In the event-exporter container, the logs show errors like this:

INFO  0417-02:21:32.480+00 store.go:110 | start event store...
INFO  0417-02:21:32.480+00 main.go:105 | Starting event_exporter (version=v0.0.1, branch=master, revision=6590adfed64518de2429d2a3becc588671704380)
INFO  0417-02:21:32.481+00 main.go:106 | Build context (go=go1.12.9, user=root@26f1ca322850, date=20190906-09:51:13)
INFO  0417-02:21:32.481+00 main.go:113 | Listening on :9102
E0417 02:21:32.482142       1 reflector.go:126] github.com/caicloud/event_exporter/store.go:111: Failed to list *v1.Event: Get https://100.64.0.1:443/api/v1/events?limit=500&resourceVersion=0: dial tcp 100.64.0.1:443: connect: connection refused
E0417 02:21:33.486319       1 reflector.go:126] github.com/caicloud/event_exporter/store.go:111: Failed to list *v1.Event: Get https://100.64.0.1:443/api/v1/events?limit=500&resourceVersion=0: dial tcp 100.64.0.1:443: connect: connection refused
E0417 02:21:34.487073       1 reflector.go:126] github.com/caicloud/event_exporter/store.go:111: Failed to list *v1.Event: Get https://100.64.0.1:443/api/v1/events?limit=500&resourceVersion=0: dial tcp 100.64.0.1:443: connect: connection refused

Opening the Prometheus metrics at http://localhost:9090/metrics, we cannot find any "kubernetes_events" metrics.

Question: What do the values mean for a kubernetes_events?

/kind feature

I noticed that some of the kubernetes_events metrics have a value of 0.
I'm assuming that a kubernetes_events value of 1 means the event happened around the time of scraping; however, I'm not sure what an event with a value of 0 means.

Like in the example:
# HELP kubernetes_events State of kubernetes events
# TYPE kubernetes_events gauge
kubernetes_events{event_kind="Pod",event_name="nginx-pc-534913751-2yzev",event_namespace="allen",event_reason="BackOff",event_source="kube-node-3/kubelet",event_subobject="spec.containers{nginx}",event_type="Normal"} 1
kubernetes_events{event_kind="Pod",event_name="nginx-pc-534913751-2yzev",event_namespace="allen",event_reason="Failed",event_source="kube-node-3/kubelet",event_subobject="spec.containers{nginx}",event_type="Warning"} 0

My question is: what do the 0 and 1 values mean for kubernetes_events?

Thank you!

Helm chart available?

Sorry for not following the template, but this is a question rather than a feature request: Is there an existing Helm chart available for event_exporter? I'm asking because comments in https://github.com/caicloud/event_exporter/blob/master/deploy/deploy.yaml suggest the use of helm template, but I am unable to find the source.

I also see that the included ClusterRoleBinding is bound to the general-purpose view ClusterRole. Is there a list of required privileges so that a dedicated ClusterRole can be created for this purpose?

Thanks!

pull v1.0 - manifest unkown

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Trying to pull v1.0 as stated in the deploy.yml ended in
Error response from daemon: manifest for caicloud/event-exporter:v1.0.0 not found: manifest unknown: manifest unknown
I'm guessing you haven't released it yet?

If that's the case, is the source buildable from the master branch?

thanks!

Event message no longer included as label in v1.0.0

Is this a BUG REPORT or FEATURE REQUEST?:
Potential bug report/regression or a feature request depending on motivations for v1.0.0

Uncomment only one, leave it on its own line:

kind bug

What happened:
The event message is no longer exported as a label.

I believe it was lost in the refactoring. I'm not sure if this was on purpose or by mistake.
This looks like the line in the PR where the event_message used to be: https://github.com/caicloud/event_exporter/pull/43/files#diff-ed648181f98484bd12541509e0ae7b5ad1d1de7674ab1721814b47ddd5c95de4L49

With the new implementation of the event metric defined here: https://github.com/caicloud/event_exporter/pull/43/files#diff-56f9d3288b78a9692046117d91600fe9b201066802b9db18b7f59d283808cd39R154
The message is now no longer present.

What you expected to happen:
I would expect message to be a label on the event metric that we could then pattern match with alerting rules.

Anything else we need to know?:
Like I said, I'm not sure if this is by design or not for v1.0.0 refactor.

Question: Why not report the timestamp

/kind feature

I'm interested in using this project, but I need to find a way to maintain order of events that are scraped in the same interval. For monitoring purposes the scrape timestamp is good enough, but if multiple events are created within a scrape interval it would be nice to have the option to sort them chronologically.

I want to be able to replay events in a time-series way in my monitoring solution. If Prometheus isn't appropriate for this, do you have any suggestions on what to back up events to for analytics purposes?

Container crashes

After running for a while, I see this printed over and over:

W0904 15:26:14.287525       1 reflector.go:334] github.com/caicloud/event_exporter/store.go:111: watch of *api.Event ended with: The resourceVersion for the provided watch is too old.
W0904 15:36:14.476497       1 reflector.go:334] github.com/caicloud/event_exporter/store.go:111: watch of *api.Event ended with: The resourceVersion for the provided watch is too old.

Then:

fatal error: concurrent map writes

goroutine 31137 [running]:
runtime.throw(0x1478e97, 0x15)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc42054ebe0 sp=0xc42054ebc0
runtime.mapassign1(0x12fd240, 0xc42028b890, 0xc42054ed58, 0xc42054ed68)
	/usr/local/go/src/runtime/hashmap.go:458 +0x8ef fp=0xc42054ecc8 sp=0xc42054ebe0
github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).queueActionLocked(0xc4200904d0, 0x14657ad, 0x4, 0x143ee20, 0xc420120800, 0x1, 0x0)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:314 +0x249 fp=0xc42054edb0 sp=0xc42054ecc8
github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Resync(0xc4200904d0, 0x0, 0x0)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:498 +0x179 fp=0xc42054ee88 sp=0xc42054edb0
github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc42032e0f0, 0xc420516ae0, 0xc4202ad860, 0xc420517740, 0xc42032e0f8)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:289 +0x1b5 fp=0xc42054ef88 sp=0xc42054ee88
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42054ef90 sp=0xc42054ef88
created by github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0x4c0

goroutine 1 [IO wait, 2756 minutes]:
net.runtime_pollWait(0x7f396b5cf870, 0x72, 0x0)
	/usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc420354990, 0x72, 0xc4204d3ab8, 0xc42001c0b0)
	/usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc420354990, 0x1e48ba0, 0xc42001c0b0)
	/usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).accept(0xc420354930, 0x0, 0x1e46e60, 0xc4202835c0)
	/usr/local/go/src/net/fd_unix.go:419 +0x238
net.(*TCPListener).accept(0xc4201b0058, 0x29e8d60800, 0x0, 0x0)
	/usr/local/go/src/net/tcpsock_posix.go:132 +0x2e
net.(*TCPListener).AcceptTCP(0xc4201b0058, 0xc4204d3be0, 0xc4204d3be8, 0xc4204d3bd8)
	/usr/local/go/src/net/tcpsock.go:209 +0x49
net/http.tcpKeepAliveListener.Accept(0xc4201b0058, 0x150cab0, 0xc420368b00, 0x1e53360, 0xc4201afb00)
	/usr/local/go/src/net/http/server.go:2608 +0x2f
net/http.(*Server).Serve(0xc420368080, 0x1e52c20, 0xc4201b0058, 0x0, 0x0)
	/usr/local/go/src/net/http/server.go:2273 +0x1ce
net/http.(*Server).ListenAndServe(0xc420368080, 0xc420368080, 0x2)
	/usr/local/go/src/net/http/server.go:2219 +0xb4
net/http.ListenAndServe(0x1465f8d, 0x5, 0x0, 0x0, 0xc4203e64b0, 0x0)
	/usr/local/go/src/net/http/server.go:2351 +0xa0
main.main()
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/main.go:107 +0x59a

goroutine 17 [syscall, 2756 minutes, locked to thread]:
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 7 [chan receive]:
github.com/caicloud/event_exporter/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x1e6ecc0)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/github.com/golang/glog/glog.go:879 +0x7a
created by github.com/caicloud/event_exporter/vendor/github.com/golang/glog.init.1
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/github.com/golang/glog/glog.go:410 +0x21d

goroutine 23 [syscall, 2756 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 24 [chan receive, 2756 minutes]:
main.(*EventStore).Run(0xc4201b5220)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/store.go:112 +0xf5
created by main.main
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/main.go:95 +0x1e4

goroutine 11122 [select, 3 minutes]:
net/http.(*persistConn).readLoop(0xc4204ee500)
	/usr/local/go/src/net/http/transport.go:1541 +0x9c9
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1062 +0x4e9

A bit more of that and then this repeating:

goroutine 15512 [select, 1 minutes]:
github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc4201b00f8, 0xc420516ae0, 0xc4202ad860, 0xc4204818c0, 0xc4201b0100)
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:283 +0x303
created by github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch
	/home/vagrant/gocode/src/github.com/caicloud/event_exporter/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:296 +0x4c0

Then the container is killed by k8s.

Here's the container information from k8s:

Containers:
  app:
    Container ID:  docker://87c603584b7d193716ecb79f70a10a8ed49e4050fc1e89412322e09a233473a9
    Image:         cargo.caicloud.io/sysinfra/event-exporter:latest
    Image ID:      docker-pullable://cargo.caicloud.io/sysinfra/event-exporter@sha256:826f54f71c3802f59164a108b87a7a5b002efccbd80e0d13c3da94140baf5c3a
    Port:          9102/TCP
    Host Port:     0/TCP
    Args:
      --logtostderr
    State:          Running
      Started:      Wed, 04 Sep 2019 14:03:32 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 02 Sep 2019 16:07:08 +0200
      Finished:     Wed, 04 Sep 2019 14:03:31 +0200
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     200m
      memory:  128Mi
    Requests:
      cpu:        100m
      memory:     128Mi
    Environment:  <none>

/kind bug

What happened:

Container crashes

What you expected to happen:

Container not to crash.

How to reproduce it (as minimally and precisely as possible):

Just run it for 1-5 days.

Docker image version is not up-to-date

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
Docker image version is still v0.0.1

What you expected to happen:
Docker image version is v0.1.0

How to reproduce it (as minimally and precisely as possible):
Run docker pull cargo.caicloud.io/sysinfra/event-exporter
Then start the container; the log says the version is 0.0.1.

time="2019-06-20T14:09:54+08:00" level=info msg="start event store..." source="store.go:110" 
time="2019-06-20T14:09:54+08:00" level=info msg="Starting event_exporter (version=v0.0.1, branch=doc, revision=4bc2df81dbed60cd33c379f6eabe4e3b8bcd6dac)" source="main.go:98" 
time="2019-06-20T14:09:54+08:00" level=info msg="Build context (go=go1.7.3, user=vagrant@vagrant-ubuntu-trusty-64, date=20161122-03:59:17)" source="main.go:99" 
time="2019-06-20T14:09:54+08:00" level=info msg="Listening on :9102" source="main.go:106"

Anything else we need to know?:
Would be nice to be able to run docker pull cargo.caicloud.io/sysinfra/event-exporter:v0.1.0

What is the etcd version that event_exporter using?

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
We need this information for doing some security scans on the app.

Add opportunity to filter events

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: All events are published into Prometheus

What you expected to happen: Some variable to select only certain events

How to reproduce it (as minimally and precisely as possible): N/A

Anything else we need to know?: N/A

Major Release Proposal (v1.0)

#28 has exposed a serious bug in our current design.

We are currently grouping events by the following labels:

label from
event_namespace event.InvolvedObject.Namespace
event_name event.InvolvedObject.Name
event_kind event.InvolvedObject.Kind
event_reason event.Reason
event_type event.Type
event_subobject event.InvolvedObject.FieldPath
event_message event.Message
event_source event.Source.Host/event.Source.Component

Due to the absence of both event.Name and event.UID, if one object produced multiple events with the same reason and error message at the same time, then our code would attempt to expose multiple metrics with an identical label set, and #28 would happen.

#36 attempted to address this problem by adding an event_metaname label. This would fix the problem, but it would make the label names confusing: it's not immediately obvious what event_metaname and event_name stand for. Another problem related to label naming is that Kubernetes has adopted a new metrics design best practice, and a metrics overhaul was implemented in Kubernetes 1.14. Our naming practice doesn't fit that standard.

For the reasons explained above, I propose that we make a major release (v1.0) that completely redefines the metrics. The code could use a cleanup in the process as well. We could test the changes alongside Compass 2.11.0 (which is also going through a non-compatible metrics overhaul), and release event_exporter v1.0.0 at the same time as Compass 2.11.0.

Security Report on Vulnerabilities Identified Through Prisma Scan.

1. Introduction
This report summarizes the vulnerabilities identified through the Prisma scan conducted.
The identified vulnerabilities have been categorized based on their severity levels, potential impacts, and recommended actions for remediation.

2. Vulnerabilities

2.1 Critical Vulnerabilities:
Vulnerability: CVE-2020-29652
Description: A nil pointer dereference in the golang.org/x/crypto/ssh component through v0.0.0-20201203163018-be400aefbc4c for Go allows remote attackers to cause a denial of service against SSH servers.

2.2 High Vulnerabilities:
Vulnerability: CVE-2023-44487
Description: The HTTP/2 protocol allows a denial of service (server resource consumption) because request cancellation can reset many streams quickly, as exploited in the wild in August through October 2023.

3. How to reproduce it (as minimally and precisely as possible):
Scan your image via Prisma and you will see the issues.
See screenshot for more details

4. Conclusion
The Prisma scan identified several vulnerabilities in the environment.
Immediate attention should be given to critical and high-severity vulnerabilities to mitigate potential risks.
Medium and low-severity vulnerabilities should also be addressed in a timely manner to strengthen the security posture.
Continuous monitoring and regular vulnerability assessments are recommended to ensure ongoing security.

Please review this report and prioritize the remediation efforts accordingly.


* collected metric "kubernetes_events" was collected before with the same name and label values

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

Error returned to prometheus scraping target: server returned HTTP status 500 Internal Server Error

An error has occurred while serving metrics:
2 error(s) occurred:
* collected metric "kubernetes_events" { label:<name:"event_kind" value:"Ingress" > label:<name:"event_message" value:"Ingress xxx/docker-registry-cache-docker-registry-caching-proxy" > label:<name:"event_name" value:"docker-registry-cache-docker-registry-caching-proxy" > label:<name:"event_namespace" value:"xxx" > label:<name:"event_reason" value:"UPDATE" > label:<name:"event_source" value:"/nginx-ingress-controller" > label:<name:"event_subobject" value:"" > label:<name:"event_type" value:"Normal" > gauge:<value:1 > } was collected before with the same name and label values
* collected metric "kubernetes_events" { label:<name:"event_kind" value:"Ingress" > label:<name:"event_message" value:"Ingress xxx/docker-registry-cache-docker-registry-caching-proxy" > label:<name:"event_name" value:"docker-registry-cache-docker-registry-caching-proxy" > label:<name:"event_namespace" value:"xxx" > label:<name:"event_reason" value:"UPDATE" > label:<name:"event_source" value:"/nginx-ingress-controller" > label:<name:"event_subobject" value:"" > label:<name:"event_type" value:"Normal" > gauge:<value:1 > } was collected before with the same name and label values

What you expected to happen:

A healthy endpoint with the healthy metrics, and the error in the logs.

How to reproduce it (as minimally and precisely as possible):

This is specific to one of our production clusters; unfortunately, we could not reproduce it.

Anything else we need to know?:

Checking the events with kubectl, we do see three events created at the same time with the same TYPE/REASON/OBJECT.

docs to integrate with prometheus

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:

There are no docs about how to integrate with Prometheus & Alertmanager.

What you expected to happen:

There should be docs about how to integrate with Prometheus & Alertmanager.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
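
Until such docs exist, here is a hedged minimal sketch of a prometheus.yml fragment that scrapes the exporter and points at an Alertmanager; the service and Alertmanager addresses are illustrative, and the example manifests earlier in this document also set prometheus.io/* scrape annotations, so annotation-based discovery is an alternative:

scrape_configs:
  - job_name: event-exporter
    static_configs:
      - targets: ['event-exporter.default.svc:9102']    # illustrative service address

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.monitoring.svc:9093'] # illustrative Alertmanager address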
