
heapster's Introduction

Heapster

RETIRED: Heapster is now retired. See the deprecation timeline for more information on support. We will not be making changes to Heapster.

The following are potential migration paths for Heapster functionality:

  • For basic CPU/memory HPA metrics: Use metrics-server.

  • For general monitoring: Consider a third-party monitoring pipeline that can gather Prometheus-formatted metrics. The kubelet exposes all the metrics exported by Heapster in Prometheus format. One such monitoring pipeline can be set up using the Prometheus Operator, which deploys Prometheus itself for this purpose.

  • For event transfer: Several third-party tools exist to transfer/archive Kubernetes events, depending on your sink. heptiolabs/eventrouter has been suggested as a general alternative.


Heapster enables Container Cluster Monitoring and Performance Analysis for Kubernetes (versions v1.0.6 and higher), and platforms which include it.

Heapster collects and interprets various signals like compute resource usage, lifecycle events, etc. Note that the model API, formerly used to provide REST access to its collected metrics, is now deprecated. Please see the model documentation for more details.

Heapster supports multiple sources of data. More information here.

Heapster supports the pluggable storage backends described here, and we welcome patches that add additional storage backends. Documentation on storage sinks is here. The current version of the storage schema is documented here.

Running Heapster on Kubernetes

Heapster can run on a Kubernetes cluster with a number of storage backends; see the sink documentation above for some common choices.

Running Heapster on OpenShift

Using Heapster to monitor an OpenShift cluster requires some additional changes to the Kubernetes instructions to allow communication between the Heapster instance and OpenShift's secured endpoints. To run standalone Heapster or a combination of Heapster and Hawkular-Metrics in OpenShift, follow this guide.

Troubleshooting guide here

Community

Contributions, questions, and comments are all welcome and encouraged! Developers hang out on Slack in the #sig-instrumentation channel (get an invitation here). We also have the kubernetes-dev Google Groups mailing list. If you are posting to the list, please prefix your subject with "heapster: ".


heapster's Issues

Heapster stops pushing data into Influxdb on Deis cluster

I have a Deis cluster set up on AWS and I've tried using Heapster to monitor it, but it pushes data into InfluxDB only for a while after a start/restart and then stops. I set everything up as explained in this guide. Here are my configs. Am I doing something wrong, or will Heapster simply not work with Deis?

core@ip-10-21-1-34 ~/monitoring $ cat cadvisor.service 
[Unit]
Description=cAdvisor Service
After=docker.socket
Requires=docker.socket

[Service]
TimeoutStartSec=10m
Restart=always
ExecStartPre=-/usr/bin/docker kill cadvisor
ExecStartPre=-/usr/bin/docker rm -f cadvisor
ExecStartPre=/usr/bin/docker pull google/cadvisor
ExecStart=/usr/bin/docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --publish=4194:4194 --publish=8080:8080 --name=cadvisor --net=host google/cadvisor:latest --logtostderr
ExecStop=/usr/bin/docker stop -t 2 cadvisor

[X-Fleet]
Global=true
core@ip-10-21-1-34 ~/monitoring $ cat influxdb.service 
[Unit]
Description=InfluxDB Service
After=docker.socket
Requires=docker.socket

[Service]
TimeoutStartSec=10m
Restart=always
ExecStartPre=-/usr/bin/docker kill influxdb
ExecStartPre=-/usr/bin/docker rm -f influxdb
ExecStartPre=/usr/bin/docker pull kubernetes/heapster_influxdb
ExecStart=/usr/bin/docker run --name influxdb -p 8083:8083 -p 8086:8086 -p 8090:8090 -p 8099:8099 kubernetes/heapster_influxdb
ExecStop=/usr/bin/docker stop -t 2 influxdb

[X-Fleet]
MachineID=60023ec21a6643ebb0a71a54454d1bcc
core@ip-10-21-1-34 ~/monitoring $ cat heapster.service 
[Unit]
Description=Heapster Service
After=docker.socket
Requires=docker.socket

[Service]
ExecStartPre=/bin/sh -c "docker history kubernetes/heapster:latest >/dev/null || docker pull kubernetes/heapster:latest"
ExecStart=/usr/bin/docker run -d -e INFLUXDB_HOST=10.21.***.***:8086 -e COREOS=true --name heapster --net=host --restart=on-failure:5 --publish=8082:8082 kubernetes/heapster:latest

[X-Fleet]
MachineOf=influxdb.service
core@ip-10-21-1-34 ~/monitoring $ cat grafana.service 
[Unit]
Description=Grafana service
Requires=docker.socket
After=docker.socket

[Service]
ExecStartPre=/bin/sh -c "docker history kubernetes/heapster_grafana:latest >/dev/null || docker pull kubernetes/heapster_grafana:latest"
ExecStart=/usr/bin/docker run -d -p 8081:80 -e INFLUXDB_HOST=52.11.***.*** -e INFLUXDB_NAME=k8s -e HTTP_USER=admin -e HTTP_PASS=****** kubernetes/heapster_grafana:v0.4

[X-Fleet]
MachineOf=heapster.service

Heapster logs:

core@ip-10-21-1-34 ~/monitoring $ docker logs -f heapster         
+ EXTRA_ARGS=
+ '[' '!' -z true ']'
+ EXTRA_ARGS=' --coreos'
+ '[' '!' -z ']'
+ '[' '!' -x ']'
+ HEAPSTER='/usr/bin/heapster  --coreos '
+ '[' '!' -z ']'
+ '[' '!' -z 10.21.***.***:8086 ']'
+ /usr/bin/heapster --coreos --sink influxdb --sink_influxdb_host 10.21.***.***:8086
I0311 10:28:25.571350       7 heapster.go:44] /usr/bin/heapster --coreos --sink influxdb --sink_influxdb_host 10.21.1.34:8086
I0311 10:28:25.571446       7 heapster.go:45] Heapster version 0.8
I0311 10:28:25.571555       7 heapster.go:46] Flags: alsologtostderr='false' bq_account='' bq_credentials_file='' bq_id='' bq_project_id='' bq_secret='notasecret' cadvisor_port='8080' coreos='true' external_hosts_file='/var/run/heapster/hosts' fleet_endpoints='http://127.0.0.1:4001' kubelet_port='10250' kubernetes_insecure='true' kubernetes_master='' listen_ip='' log_backtrace_at=':0' log_dir='' logtostderr='true' max_procs='0' poll_duration='10s' port='8082' sink='influxdb' sink_influxdb_buffer_duration='10s' sink_influxdb_host='10.21.1.34:8086' sink_influxdb_name='k8s' sink_influxdb_password='root' sink_influxdb_username='root' sink_memory_ttl='1h0m0s' stderrthreshold='2' v='0' vmodule=''
I0311 10:28:25.571585       7 influxdb.go:255] Using influxdb on host "10.21.1.34:8086" with database "k8s"
I0311 10:28:25.573334       7 heapster.go:57] Starting heapster on port 8082

Provide binaries as part of the release

I believe it would be beneficial to users if pre-built binaries were provided as part of the release. Kubernetes and cAdvisor do this, which makes it simpler to get started.

Collecting application level metrics

Currently Heapster collects metrics from cAdvisor to gather container-level metrics. Collecting application-level metrics would provide an even richer set of data. Applications would be required to implement an HTTP endpoint exposing the metrics they want to publish.

As a PoC, we have a Heapster fork, jadvisor (https://github.com/fabric8io/jadvisor), that collects JVM stats via JMX, exposed through Jolokia (http://jolokia.org/) for HTTP access to JMX. Currently the stats collected are hard-coded, or rather discovered from what is available in the JMX tree. I would like to make this more generic so it can handle any HTTP endpoint and hence be stack-agnostic.

The difference this would introduce to Heapster is the need to connect to individual pods rather than just to the kubelet/cAdvisor, which would significantly increase the number of connections made. This would likely require work on clustering and sharding to implement properly.

export diskio stats

cAdvisor supports diskio stats. These should be exported to the storage backends as well.

Collect and export kubernetes events and docker events

InfluxDB supports events natively; other backends might not. We can place structure around some specific events like container starts, deletions, restarts, etc., and export them as metrics, but that might be sub-optimal.
We need to explore how events are currently handled by various monitoring/alerting systems.

heapster and kubernetes: not getting any container stats

I'm currently testing heapster with kubernetes v0.5 and cadvisor 0.6.2.

Grafana doesn't display any graphs with per container stats (total cluster CPU and memory usage is displayed fine), it's just giving the error "Couldn't look up columns for series: stats"

Heapster is also displaying errors constantly:

E1121 09:30:41.665006 00007 kube.go:104] failed to get stats from kubelet on host with ip   192.168.200.230 - Got 'Internal Error: unable to unmarshal container info for "/docker/35c89c2b64d4d76ffda3cd7e3797273a6ba829bae81ca15364ed35765b546fbb" (failed to get container "/docker/35c89c2b64d4d76ffda3cd7e3797273a6ba829bae81ca15364ed35765b546fbb" with error: unknown container "/docker/35c89c2b64d4d76ffda3cd7e3797273a6ba829bae81ca15364ed35765b546fbb"
): invalid character 'i' in literal false (expecting 'l')
': invalid character 'I' looking for beginning of value

So I guess there's some problem with getting the data from cadvisor into InfluxDB?

Best regards,
Julian

Parallelize Heapster Core

Allow the Heapster core to fetch and export multiple streams simultaneously. This should also entail work to allow a single stream to fall behind and be robust to node failures.

Incorrect data type NodeList vs MinionList

I'm running Kubernetes 0.7.0 and Heapster 0.4.0 and I noticed the Heapster container exits after ~10 seconds. I see in its logs:

heapster.go:22] data of kind 'NodeList', obj of type 'MinionList'

CPU spike

Is this normal?

Almost 60% CPU usage on a 2 GB RAM VM (Fedora 20 on DigitalOcean), yet it's not even running in the first place:

[root@k8-master heapster]# k8 get pods
NAME                                   IMAGE(S)                       HOST                LABELS                             STATUS
548e1e240e7bc2780d000003               fedora/apache:latest           <unassigned>        name=548e1e240e7bc2780d000003      Pending
27a27940-83d9-11e4-bea6-040135044f01   kubernetes/heapster_influxdb   188.226.142.107/    name=influxGrafana                 Failed
                                       kubernetes/heapster_grafana                                                           
                                       dockerfile/elasticsearch                                                              
ffed25fd-83e4-11e4-bea6-040135044f01   kubernetes/heapster            188.226.142.107/    name=heapster,uses=influx-master   Failed
55de4dd2-83e8-11e4-bea6-040135044f01   kubernetes/heapster_influxdb   <unassigned>        name=influxGrafana                 Pending
                                       kubernetes/heapster_grafana                                                           
                                       dockerfile/elasticsearch                                                              
66471f57-83e3-11e4-bea6-040135044f01   kubernetes/heapster            188.226.142.107/    name=heapster,uses=influx-master   Failed

marathon

Does it support getting cAdvisor nodes from Marathon?

Duplicate data sent to sinks

While writing the GCM sink I found that Heapster sent a lot of duplicate data to the sinks. Something along the lines of:

Container: / - Time: 0s-10s
Container: / - Time: 0s-10s
Container: / - Time: 0s-10s
Container: / - Time: 30s-40s
Container: / - Time: 30s-40s
Container: / - Time: 30s-40s
Container: / - Time: 60s-70s
Container: / - Time: 60s-70s
Container: / - Time: 60s-70s
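The shape of the problem, and of a dedup pass keyed on the full (container, time window) sample, can be sketched in a couple of shell lines. This is a toy illustration of the idea, not Heapster code:

```shell
# Repeated (container, window) samples collapse to one copy each with sort -u:
printf 'Container: / - Time: 0s-10s\nContainer: / - Time: 0s-10s\nContainer: / - Time: 30s-40s\nContainer: / - Time: 30s-40s\n' \
  | sort -u     # -> 2 unique lines
```

In Heapster itself, deduplication would presumably belong before the sink boundary, so that every backend doesn't have to repeat the same work.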

Make heapster work outside of GCE

Currently Kubernetes does not expose the public IPs of services in GCE, which makes it necessary to obtain the public IP of the InfluxDB service using GCE's metadata servers. If there is no public-IP restriction on some of the supported Kubernetes platforms, heapster can be made to work by splitting influxdb and grafana into separate services.

cc @bmaltais

grafana images keeps restarting on vagrant

Deployed this on Kubernetes running on VirtualBox using Vagrant. The Grafana image keeps failing continuously.

[vagrant@kubernetes-minion-3 ~]$ sudo docker events
[2014-08-30 06:23:27 +0000 UTC] 89317b9961b190fd4d00273f91e04342f6f7208a3e6c111011770551bbca6673: (from vish/k8s_grafana:latest) die
[2014-08-30 06:23:36 +0000 UTC] 0e0427e166bc57924fd52d8631d7e8211a6dca4adc3541ee3d36800b01c486f6: (from vish/k8s_grafana:latest) create
[2014-08-30 06:23:36 +0000 UTC] 0e0427e166bc57924fd52d8631d7e8211a6dca4adc3541ee3d36800b01c486f6: (from vish/k8s_grafana:latest) start

grafana-influxdb-pod.json contains extra ','

I could not get grafana-influxdb-pod.json to install properly on Kubernetes. It would just sit there at the waiting stage... so I dug a bit and noticed two extra ',' in the JSON file. Removing those fixed the issue.

You might want to update the file ;-) Here is the file with the fix:

{
  "id": "influx-grafana",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "influxdb",
      "containers": [{
        "name": "influxdb",
        "image": "vish/k8s_influxdb",
        "ports": [{"containerPort": 8086, "hostPort": 8086},
                  {"containerPort": 8083, "hostPort": 8083},
                  {"containerPort": 8090, "hostPort": 8090},
                  {"containerPort": 8099, "hostPort": 8099}]
      }, {
        "name": "grafana",
        "image": "vish/k8s_grafana",
        "ports": [{"containerPort": 80, "hostPort": 80}],
        "env": [{"name": "HTTP_USER", "value": "admin"},
                {"name": "HTTP_PASS", "value": "admin"}]
      }, {
        "name": "elasticsearch",
        "image": "dockerfile/elasticsearch",
        "ports": [{"containerPort": 9200, "hostPort": 9200},
                  {"containerPort": 9300, "hostPort": 9300}]
      }]
    }
  },
  "labels": {
    "name": "influxdb"
  }
}
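A quick way to catch stray commas like these before handing a manifest to the apiserver is to run it through a strict JSON parser, which rejects trailing commas and reports the position. This sketch pipes a fragment with the offending comma through python3 -m json.tool, used here purely as a convenient validator:

```shell
# A strict parser rejects the trailing comma and prints the error location:
echo '{"labels": {"name": "influxdb",}}' | python3 -m json.tool \
  || echo "rejected: not valid JSON"
```

The same check works on the file itself: python3 -m json.tool grafana-influxdb-pod.json.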

deploy/run.sh doesn't detect COREOS env var

I've been trying to get Heapster up and running on a CoreOS cluster, and have been having issues. Using the v0.7 images, I get the following output from the container:

Feb 23 12:36:05 <snip> docker[18078]: Status: Image is up to date for kubernetes/heapster:v0.7
Feb 23 12:36:05 <snip> systemd[1]: Started Heapster Agent Service.
Feb 23 12:36:06 <snip> docker[18086]: + EXTRA_ARGS=
Feb 23 12:36:06 <snip> docker[18086]: + '[' '!' -z ']'
Feb 23 12:36:06 <snip> docker[18086]: + '[' '!' -z ']'
Feb 23 12:36:06 <snip> docker[18086]: + '[' '!' -x ']'
Feb 23 12:36:06 <snip> docker[18086]: + HEAPSTER='/usr/bin/heapster  '
Feb 23 12:36:06 <snip> docker[18086]: + '[' '!' -z ']'
Feb 23 12:36:06 <snip> docker[18086]: + '[' '!' -z 10.10.0.14:8086 ']'
Feb 23 12:36:06 <snip> docker[18086]: + /usr/bin/heapster --sink influxdb --sink_influxdb_host 10.10.0.14:8086
Feb 23 12:36:06 <snip> docker[18086]: I0223 12:36:06.631846       8 heapster.go:44] /usr/bin/heapster --sink influxdb --sink_influxdb_host 10.10.0.14:8086
Feb 23 12:36:06 <snip> docker[18086]: I0223 12:36:06.632198       8 heapster.go:45] Heapster version 0.7
Feb 23 12:36:06 <snip> docker[18086]: I0223 12:36:06.632341       8 heapster.go:46] Flags: alsologtostderr='false' bq_account='' bq_credentials_file='' bq_id='' bq_project_id='' bq_secret='notasecret' cadvisor_port='8080' coreos='false' external_hosts_file='/var/run/heapster/hosts' fleet_endpoints='http://127.0.0.1:4001' kubelet_port='10250' kubernetes_insecure='true' kubernetes_master='' listen_ip='' log_backtrace_at=':0' log_dir='' logtostderr='true' max_procs='0' poll_duration='10s' port='8082' sink='influxdb' sink_influxdb_buffer_duration='10s' sink_influxdb_host='10.10.0.14:8086' sink_influxdb_name='k8s' sink_influxdb_password='root' sink_influxdb_username='root' sink_memory_ttl='1h0m0s' stderrthreshold='2' v='0' vmodule=''
Feb 23 12:36:06 <snip> docker[18086]: I0223 12:36:06.632825       8 influxdb.go:255] Using influxdb on host "10.10.0.14:8086" with database "k8s"
Feb 23 12:36:07 <snip> docker[18086]: I0223 12:36:07.642797       8 heapster.go:57] Starting heapster on port 8082

The command used to run it is (as per https://github.com/GoogleCloudPlatform/heapster/tree/master/clusters/coreos):

/usr/bin/docker run --name %n -e INFLUXDB_HOST=10.10.0.14:8086 -e COREOS kubernetes/heapster:v0.7

As can be seen, the COREOS environment variable isn't picked up, because it's empty. [ ! -z $COREOS ] checks for a non-empty value, rather than checking whether the variable exists at all. Changing it to [ -v COREOS ] will correctly check for its existence.
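The difference between the two tests is easy to demonstrate ([ -v var ] is a bash 4.2+ conditional, so it is invoked via bash here):

```shell
export COREOS=        # set but empty -- the situation described above
[ ! -z "$COREOS" ] && echo "string test: seen" || echo "string test: missed"
bash -c '[ -v COREOS ]' && echo "-v test: seen" || echo "-v test: missed"
# -> string test: missed
# -> -v test: seen
```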

I applied this patch (3a10535) manually by changing run.sh in the container, and it seems to fix the issue. I can't seem to get Heapster to build to test it properly, though (I've not used Go before).

IPV6 only constraint possibly

Not sure if the issue is environmental (only me) or configuration baked into the pre-canned heapster and grafana images. Every time I test the URL (port 80) of the minion running the heapster pod, I get a "connection refused" error. All ports are open, and this is reproducible even from a curl command run via SSH locally on the heapster minion. It appears to be web-server config rather than something at the infrastructure layer.

I noticed the 'default' file in the grafana folder has "IPV6only=on" as a config item, and potentially the same exists in the underlying tutum/nginx image being used. I tried setting it to "off" and recreating a custom image, with no success.

As stated, this could be environmental, but the presence of "IPV6only=on" suggests this could be why NGINX is rejecting connections from my client browser.

The environment is CoreOS on AWS, deployed via a CloudFormation template.
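For reference, nginx refuses IPv4 connections when its only listen socket is the IPv6 one with ipv6only=on; a dual-stack server block listens on both families explicitly. The following is a hypothetical sketch of the relevant directives, not the image's actual config file:

```nginx
server {
    listen 80;                    # IPv4 socket
    listen [::]:80 ipv6only=on;   # separate IPv6 socket; ipv6only=on stops it
                                  # from handling IPv4 via mapped addresses
    server_name _;
}
```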

Ingest Prometheus metrics.

The Prometheus client model is being adopted to support custom metrics from Kubernetes core components. Heapster needs to be able to scrape and parse Prometheus metrics in order to annotate them and make them available to various backends.

High CPU usage for heapster v0.6

I updated to the latest versions of heapster (kubernetes/heapster:v0.6) and influxdb (kubernetes/heapster_influxdb:v0.3) and found high CPU usage by InfluxDB.

I downgraded to heapster_influxdb:v0.2 and still got high load; then I tried heapster:v0.5, and the CPU load seemed normal.

heapster v0.5 + heapster_influxdb:v0.3: (screenshot)

heapster v0.6 + heapster_influxdb:v0.3: (screenshot)

cAdvisor link

The cAdvisor link has an extra "https://", so the link is broken.

Separate Sources into Collectors

Separate sources into collectors under a stronger API, to allow us to collect from various sources and make that more configurable.

This will require us to annotate data outside of the collectors.

Add a REST endpoint to expose resource usage at various levels

Since heapster aggregates data across all nodes and containers, it can and probably should expose resource usage data aggregated at various levels:

  1. container
  2. pod
  3. service/controller
  4. labels

Instantaneous metrics will be useful for UI purposes, and derived metrics (mean, 90th percentile, max, etc.) will be useful for end users.

Add a deployment tool for kubernetes

As of now, Heapster setup and troubleshooting can be difficult. A deployment tool could verify that cAdvisor is installed on every node, set up all the Heapster components, and ensure that all the components are up and running. This tool could also help configure backends and dynamically generate kube configs.

Separate Sinks with an API

Sinks today are able to export data to a certain backend. We need to separate these so that we can run them out of process from the main Heapster, which will require a clear API.

Suggestion: use InfluxDB continuous queries

In real-life scenarios, the performance of the Grafana dashboard is abysmal, and the metrics are plain wrong, when graphs are built on top of the stats table directly. I suggest using continuous queries.

Why wrong? Imagine a single Docker container with the same name deployed on different machines. There will be multiple rows from different machines, so derivative(cpu_cumulative_usage) will return random values because there is no group by hostname.
Continuous queries also allow slicing the metric across more than one dimension, side-stepping Grafana limitations.

https://github.com/arkadijs/heapster/blob/master/sinks/influxdb.go#L298

Note that group by time(10s) must be twice as large as -poll_duration (5s in my fork), or else InfluxDB won't see lagged data. AFAIK, this is going to change in InfluxDB 0.9.

Also, I don't know your plans for disk-io stats, but filesystem stats greatly multiply the volume of data, as they are added as separate rows per filesystem. Consider writing them at a lower cadence, or aggregating across filesystems (which is not an entirely correct approach) and pushing them into the same record.
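The "wrong derivative" problem described above is easy to reproduce outside InfluxDB: interleave the same cumulative counter from two hosts and take successive differences without a group-by on hostname. A toy illustration, not Heapster code:

```shell
# Two hosts report the same container's cumulative CPU counter. Ungrouped,
# successive differences mix the two series and swing wildly:
printf 'hostA 100\nhostB 5000\nhostA 200\nhostB 5100\n' \
  | awk 'NR > 1 { print "delta", $2 - prev } { prev = $2 }'
# -> delta 4900, delta -4800, delta 4900
```

Grouping by hostname, which a continuous query can bake in, keeps each host's counter differenced against itself.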

Push metrics to backends in a separate thread

As of now, metrics are pushed to backends synchronously. This can cause housekeeping delays. Instead, cache the metrics internally and use a separate goroutine to export them to backends.

Make heapster resilient to kubelet failures

As of now, heapster fails stats collection completely if the kubelet fails to return stats for a container. This is undesirable; heapster should log the error and move on instead of failing completely.

lastQuery handling in cAdvisor source is wrong

There are two issues:

  1. lastQuery is not set in the loop, resulting in data duplication, because cAdvisor returns the last 64 stats instead. Example fix:
    arkadijs@2fc23c2#diff-3917589e63e61c59289ec2acb14ebcc8R66
  2. On a medium-to-large cluster with a medium-to-large number of containers, the logic of one lastQuery shared across all peers would be flawed anyway. It leads to missing data points, because Heapster + cAdvisor are not fast enough and Heapster queries all cAdvisor instances serially. A crappy fix that works in practice:
    arkadijs@2fc23c2#diff-3917589e63e61c59289ec2acb14ebcc8R37
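The first point boils down to advancing lastQuery past the newest sample actually ingested, so the next poll skips the window cAdvisor replays. A toy sketch of that idea (hypothetical, not the fork's actual code):

```shell
# cAdvisor re-serves a window of recent samples on every poll; tracking the
# newest timestamp seen and skipping anything at or before it avoids duplicates.
last=0
for t in 10 20 30 20 30 40; do      # the second poll replays 20 and 30
  if [ "$t" -gt "$last" ]; then
    echo "ingest $t"
    last=$t
  fi
done
# -> ingest 10, ingest 20, ingest 30, ingest 40
```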

Metrics pub-sub

While thinking about other services that would require metrics (e.g. alerting, autoscaling, etc.), switching to a pub-sub model would allow services to be built around the metrics data. For example, if heapster could publish metrics to, say, Apache Spark or Kafka, then other services could subscribe to topics and provide functionality.

We could split topics into a hierarchy, something like namespace / pod / container, aggregate at each level, and enforce security at those layers too.

Not getting graphs on heapster UI.

Below are the pods running on my system:

NAME                                   IMAGE(S)                            HOST                         LABELS                                   STATUS
f95a0a29-a78e-11e4-b11e-984be16ce9ba   kubernetes/heapster_influxdb:v0.2   10.227.125.10/10.227.125.10  name=influxGrafana                       Running
                                       kubernetes/heapster_grafana:v0.2
773cf11c-a78f-11e4-b11e-984be16ce9ba   kubernetes/heapster:v0.5            10.227.125.10/10.227.125.10  name=heapster,uses=monitoring-influxdb   Running
cf7bfe93-a78e-11e4-b11e-984be16ce9ba   google/cadvisor:0.8.0               10.227.125.10/10.227.125.10  name=cadvisor                            Running
cf7bb0a1-a78e-11e4-b11e-984be16ce9ba   google/cadvisor:0.8.0               10.227.125.3/10.227.125.3    name=cadvisor                            Running

I have two minions:

NAME            LABELS
10.227.125.10
10.227.125.3

When I query curl -v http://10.227.125.10:10250/stats/, I get the errors below:

  • Hostname was NOT found in DNS cache
  • Trying 10.227.125.10...
  • Connected to 10.227.125.10 (10.227.125.10) port 10250 (#0)

    GET /stats/ HTTP/1.1
    User-Agent: curl/7.35.0
    Host: 10.227.125.10:10250
    Accept: */*

    < HTTP/1.1 500 Internal Server Error
    < Content-Type: text/plain; charset=utf-8
    < Date: Thu, 29 Jan 2015 09:07:15 GMT
    < Content-Length: 39
    <
    Internal Error: no cadvisor connection
  • Connection #0 to host 10.227.125.10 left intact

While trying curl -v http://10.227.125.3:10250/stats/heapster/heapster/, I get the following:

  • Hostname was NOT found in DNS cache
  • Trying 10.227.125.3...
  • Connected to 10.227.125.3 (10.227.125.3) port 10250 (#0)

    GET /stats/heapster/heapster/ HTTP/1.1
    User-Agent: curl/7.35.0
    Host: 10.227.125.3:10250
    Accept: */*

    < HTTP/1.1 200 OK
    < Date: Thu, 29 Jan 2015 09:19:06 GMT
    < Content-Length: 2
    < Content-Type: text/plain; charset=utf-8
    <
  • Connection #0 to host 10.227.125.3 left intact
    {}

No graph is displayed in the heapster UI. Any suggestions on where it is going wrong?

Suggestion: get rid of the CoreOS / "buddy"

While flexible, the buddy / hosts-file approach complicates the deployment.
I ended up moving the Fleet logic into Heapster and getting rid of a noticeable chunk of the code. As there are currently no other types of buddies (Mesos?), I suggest doing something like the following:
arkadijs@7c4fc6c

  1. If there is Kubernetes - use listMinions.
  2. Otherwise try Fleet.

The code is a little bit messy - it couples KubeSource to ExternalSource - but my baseline is CoreOS with Deis, with optional Kubernetes, hence this practical solution.

Heapster fails to get stats from kubelet

Potentially related to the push of heapster 0.7. I've been seeing all heapster requests to my kubelets' /stats/ endpoints failing. I initially reported this as a Kubelet issue, but it looks possible that the new heapster added some sort of incompatibility with the kubelet.

The errors all look like this, and I expect it's very reproducible, as both clusters I've created since Friday have had these errors:

E0223 18:32:13.471559       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.173.100:10250/stats/default/kube-dns-0133o/70de926d-bb1a-11e4-b34e-42010af06d77/etcd - Got 'Internal Error: failed to retrieve cadvisor stats
': invalid character 'I' looking for beginning of value
E0223 18:32:33.378167       9 kubelet.go:80] failed to get stats from kubelet url: http://10.240.147.240:10250/stats/ - Got 'Internal Error: received empty response from "container info for \"/\""
': invalid character 'I' looking for beginning of value

Can't load data

I set up Heapster in a Kubernetes cluster with an InfluxDB backend and Grafana, as in the README.
I can log into Grafana using admin/admin and into the InfluxDB UI using root/root.
In the Grafana UI I get this error:

InfluxDB Error: Couldn't look up columns

In the InfluxDB UI, I click "Explore Data" for the k8s database and try this request:

select * from /.*/ limit 1

And I get an error message:

ERROR: Couldn't look up columns

Cannot override service name for influxdb storage when running in kubernetes

As ENTRYPOINT is set to /run.sh, it cannot be overridden in Kubernetes, where you can only specify a command. run.sh also has a hard-coded name for the InfluxDB service, INFLUX_MASTER.

Switching to CMD rather than ENTRYPOINT would allow users to run /usr/bin/heapster directly, passing in appropriate arguments; /run.sh would of course remain the default if no argument is specified.

Influxdb clustering

Unless I'm missing something, the heapster manifests for InfluxDB currently only support a single InfluxDB instance. Clustering InfluxDB will probably be necessary for any reasonably sized deployment. I assume this would be achieved by updating the seeds in config.toml, but I'm not sure of the best way to retrieve those seed servers.

Proxy calls to InfluxDB with nginx

So, having something like

- name: "INFLUXDB_HOST"
  value: '"+window.location.hostname+"/api/v1beta1/proxy/services/monitoring-influxdb'

is very limiting when it comes to running influxdb + grafana, especially inside Kubernetes.

See my suggested solution.

Heapster CoreOS buddy assumes fleet server is running on host

As per @andrewwebber's suggestion, the CoreOS buddy needs to take in fleet server information via a flag.

@MaheshRudrachar reported this issue:
I have set up 5 CoreOS instances on AWS and followed kelseyhightower's kubernetes-fleet-tutorial.
Basically I have:
1 dedicated etcd server,
3 minions, with 1 minion acting as the API server, and
1 dedicated minion for setting up Heapster.
All minions point to the dedicated etcd server.

Now, when I ran the units mentioned above:

  1. cAdvisor, InfluxDB & Grafana - work fine with status running

Heapster Agent: fails with log messages as follows:
Starting Heapster Agent Service...
heapster-agent
Pulling repository vish/heapster-buddy-coreos
Started Heapster Agent Service.
INFO log.go:73: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001:
ERROR log.go:81: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100m
timeout reached
heapster-agent.service: main process exited, code=exited, status=1/FAILURE
Unit heapster-agent.service entered failed state.

Heapster: fails with log messages as follows:
Pulling repository kubernetes/heapster
Started Heapster Agent Service.
/usr/bin/heapster --sink influxdb --sink_influxdb_host 127.0.0.1:8086
Heapster version 0.2
Cannot stat hosts_file /var/run/heapster/hosts. Error: stat /var/run/heapster/hosts: no such file or directory
heapster.service: main process exited, code=exited, status=1/FAILURE
heapster
Unit heapster.service entered failed state.
heapster.service holdoff time over, scheduling restart.
Stopping Heapster Agent Service...
Starting Heapster Agent Service...
