
origin-metrics's Introduction

Note

The way metrics is installed has changed: the deployer pod has been replaced by an Ansible-based installation.

Please see the OpenShift Metrics installation instructions for more information.

origin-metrics

About

Origin Metrics is designed to gather container, pod and node metrics from across an entire OpenShift cluster. These metrics can then be viewed in the OpenShift Console or exported to another system.

It achieves these goals via the following main components:

Heapster

Heapster gathers the metrics from across the OpenShift cluster. It retrieves metadata associated with the cluster from the master API and retrieves individual metrics from the /stats endpoint exposed on each OpenShift node.

It gathers system-level metrics such as CPU, memory, and network usage.

Heapster will then send these metrics to Hawkular Metrics and expose a REST endpoint that the horizontal pod autoscaler (HPA) uses.

Hawkular Metrics

Hawkular Metrics is the metric storage engine from the Hawkular project. It provides the means to create, access, and manage historically stored metrics via an easy-to-use, JSON-based REST interface.

Hawkular Metrics uses Cassandra as its metric datastore.

It receives metrics from the Heapster component and is also used to expose metrics to the console and other third party systems.
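For example, once the Hawkular Metrics route is exposed, you can check that the service is up via its status endpoint (a minimal sketch; the hostname is a placeholder for the route configured in your cluster):

$ curl -k https://hawkular-metrics.example.com/hawkular/metrics/status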

Cassandra

Cassandra is the data store for Hawkular Metrics. It stores and persists all the metrics coming in from Hawkular Metrics.

Hawkular OpenShift Agent

The Hawkular OpenShift Agent is a component which can be used to gather application level metrics coming from pods. This allows pods themselves to expose metrics that they wish to be collected.

This component runs as a daemon set across the OpenShift cluster and is responsible for gathering application-level metrics from each pod running on its node.

By default, the agent is deployed to the default project. Deploying it there allows the agent to monitor all pods, even when running with the ovs_multitenant plugin.

Note

The Hawkular OpenShift Agent is currently in Tech Preview.

Before You Begin

The metrics components require a properly installed and configured OpenShift cluster. How to install and configure OpenShift is beyond the scope of this document; please see the OpenShift documentation for how to set up your system.

The metrics installation requires integration with the OpenShift cluster, and if there are installation or configuration problems with your cluster you may encounter them when trying to run metrics. Some of the more common issues are checked by the deployer pod, which will give you an error message about your installation.

Please see the troubleshooting guide for a complete list of things to watch out for.

Installing

The OpenShift documentation describes how to install metrics.

Note

The old way of deploying metrics, which relies on the deployer pod, is now deprecated. For the old instructions please see the deployer installation page.

Deploying the Hawkular OpenShift Agent

A few steps are required to deploy the Hawkular OpenShift Agent into OpenShift.

You will first need to create the ConfigMap that the Agent uses to configure itself:

$ oc create -f hawkular-agent/hawkular-openshift-agent-configmap.yaml -n default

You will then need to process the Agent’s template and deploy its components:

$ oc process -f hawkular-agent/hawkular-openshift-agent.yaml | oc create -n default -f -

Finally, you will need to grant the hawkular-openshift-agent service account the proper permissions:

$ oc adm policy add-cluster-role-to-user hawkular-openshift-agent system:serviceaccount:default:hawkular-openshift-agent

Once this is completed, the Hawkular OpenShift Agent should be deployed into the default project.
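To quickly verify the deployment, you can list the agent pods (a simple sanity check; this assumes the pod names carry the hawkular-openshift-agent prefix from the template):

$ oc get pods -n default | grep hawkular-openshift-agent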

If you are running the latest OpenShift, you should be able to verify that the agent is functioning by checking that extra metrics show up on the agent's pod metrics page.

For instructions on how to expose custom metrics on your own pod, and for more information about the agent itself, please see the Hawkular OpenShift Agent project.

Accessing Metrics Directly

If you wish to access and manage metrics directly, you can do so via the Hawkular Metrics REST API. This will allow you to directly access the raw metrics data and export it for use in your own customized systems.
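For example, here is a minimal sketch of listing the gauge metric definitions for a project (the hostname and tenant are placeholders; origin-metrics stores each project's metrics under a tenant matching the project name):

$ curl -k -H "Authorization: Bearer $(oc whoami -t)" \
       -H "Hawkular-Tenant: myproject" \
       -X GET https://hawkular-metrics.example.com/hawkular/metrics/gauges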

For more information please see the Hawkular Metrics page.

Accessing the Hawkular Metrics Python Client

The Hawkular Metrics pod has been configured so that the Hawkular Python client is installed and given admin privileges by default. This allows you to check and query metrics directly from within the Hawkular Metrics pods.

The command to run the client is client; it can be run from a terminal within the OpenShift console or via the oc exec command.
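For example (the pod name is a placeholder; substitute the name reported by oc get pods in the openshift-infra project):

$ oc exec -it <hawkular-metrics-pod> -n openshift-infra -- client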

Please see the Hawkular Metrics python client project for more information.

Accessing Heapster Directly

The Heapster instance deployed as part of origin-metrics is configured to be accessible only via the API proxy. Access requires cluster-admin privileges or modifications to your policy to allow another user access to this API proxy endpoint.
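If you are logged in as such a user, a bearer token for the request below can be obtained with:

$ oc whoami -t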

For example, to reach the Heapster metrics endpoint, you would need to access it by doing something like:

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXX" \
       -X GET https://${KUBERNETES_MASTER}/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/api/v1/model/metrics

For more information about Heapster and how to access its APIs, please refer to the Heapster project.

Cleanup

If you wish to undeploy and remove everything deployed by the deployer, the following command can be used:

$ oc delete all,secrets,sa,templates --selector=metrics-infra -n openshift-infra
Note

The persistent volume claim will not be deleted by the above command. If you wish to permanently delete the data in persistent storage, you can run:

$ oc delete pvc --selector=metrics-infra

If you wish to remove the deployer’s components themselves:

$ oc delete sa,secret metrics-deployer -n openshift-infra

Docker Containers

All the Docker images for the Origin Metrics components are available on Docker Hub, so there should be no need to build them directly.
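For example, to pull one of the published images (the image name follows the openshift/origin-metrics-* naming shown elsewhere in this document):

$ docker pull docker.io/openshift/origin-metrics-heapster:latest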

If you wish to build your own images or hack on the project, please see the build instructions.

Known Issues

Please see the known issues page in the documentation.

origin-metrics's People

Contributors

bbguimaraes, burmanm, caruccio, caugello, danmcp, ekuric, hodrigohamalho, jayunit100, jcantrill, jimmidyson, jotak, jpkrohling, jsanda, jshaughn, kurktchiev, liggitt, mwringe, openshift-merge-robot, rubenvp8510, sdodson, simon3z, smarterclayton, sosiouxme, stevekuznetsov, tsegismont, vbehar, yupengzte


origin-metrics's Issues

Build Heapster Image

We should be building our own Heapster image. This will allow us to dictate the exact version we want to build and what base image we want to build upon.

hawkular-cassandra CrashLoopBackOff due to "Unable to gossip with any seeds"

When I deploy origin-metrics in the openshift-infra project, the hawkular-cassandra pod always goes into CrashLoopBackOff due to "Unable to gossip with any seeds".
Error log:
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1328)
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:754)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:688)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:580)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595)

cassandra.yaml in container:

seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.hawkular.openshift.cassandra.OpenshiftSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "hawkular-cassandra-1-mfzr2"
........
listen_address: hawkular-cassandra-1-mfzr2
........

All nodes of the Origin cluster are in the same AWS EC2 security group, and that security group allows all traffic within itself.

Heapster logs too verbose

Heapster seems to be logging (in its own logs) pretty much all the events it's gathering and every kubelet access:

I1027 17:08:25.006839       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/luke/mysql-1-qooo6/2ffbcba9-7cec-11e5-8032-fa163e7fc268/mysql"
I1027 17:08:25.006934       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/luke2/dancer-mysql-example-1-yoaty/305ce7d2-7cec-11e5-8032-fa163e7fc268/dancer-mysql-example"
I1027 17:08:25.007080       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/default/router-2-ldovr/754368be-7ce0-11e5-8032-fa163e7fc268/router"
I1027 17:08:25.007207       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/metrics/hawkular-cassandra-1-il6ik/312a2f0c-7cec-11e5-8032-fa163e7fc268/hawkular-cassandra-1"
I1027 17:08:25.007337       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/metrics/heapster-d0lxc/31735cd2-7cec-11e5-8032-fa163e7fc268/heapster"
I1027 17:08:25.007457       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/default/docker-registry-1-75htv/2f9c7a23-7cec-11e5-8032-fa163e7fc268/registry"
I1027 17:08:25.007654       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/luke/cakephp-mysql-example-1-e3u0g/2faef51b-7cec-11e5-8032-fa163e7fc268/cakephp-mysql-example"
I1027 17:08:25.006819       1 kube_events.go:81] Received new event: api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"docker-registry-1-fldgr.14112722de23749a", GenerateName:"", Namespace:"default", SelfLink:"/api/v1/namespaces/default/events/docker-registry-1-fldgr.14112722de23749a", UID:"95e34558-7cee-11e5-8032-fa163e7fc268", ResourceVersion:"330943", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:63581576786, nsec:0, loc:(*time.Location)(0x13e5b40)}}, DeletionTimestamp:(*unversioned.Time)(0xc20867dea0), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-1-fldgr", UID:"d7d56923-7ce6-11e5-8032-fa163e7fc268", APIVersion:"v1", ResourceVersion:"328772", FieldPath:"implicitly required container POD"}, Reason:"Killing", Message:"Killing with docker id 9a86af736859", Source:api.EventSource{Component:"kubelet", Host:"node2.osv3.example.com"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63581576786, nsec:0, loc:(*time.Location)(0x13e5b40)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63581576786, nsec:0, loc:(*time.Location)(0x13e5b40)}}, Count:1}
I1027 17:08:25.007784       1 kube_events.go:81] Received new event: api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"docker-registry-1-ojhv4.141121a1dd73fa50", GenerateName:"", Namespace:"default", SelfLink:"/api/v1/namespaces/default/events/docker-registry-1-ojhv4.141121a1dd73fa50", UID:"7eae9416-7ce0-11e5-8032-fa163e7fc268", ResourceVersion:"326946", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:63581570734, nsec:0, loc:(*time.Location)(0x13e5b40)}}, DeletionTimestamp:(*unversioned.Time)(0xc20867dfc0), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-1-ojhv4", UID:"7ea9c5f2-7ce0-11e5-8032-fa163e7fc268", APIVersion:"v1", ResourceVersion:"326864", FieldPath:""}, Reason:"FailedScheduling", Message:"no nodes available to schedule pods", Source:api.EventSource{Component:"scheduler", Host:""}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63581570734, nsec:0, loc:(*time.Location)(0x13e5b40)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63581570741, nsec:0, loc:(*time.Location)(0x13e5b40)}}, Count:4}
I1027 17:08:25.007905       1 kube_events.go:81] Received new event: api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"docker-registry-1-ojhv4.141121a561916de5", GenerateName:"", Namespace:"default", SelfLink:"/api/v1/namespaces/default/events/docker-registry-1-ojhv4.141121a561916de5", UID:"87af40d0-7ce0-11e5-8032-fa163e7fc268", ResourceVersion:"326977", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, DeletionTimestamp:(*unversioned.Time)(0xc208666180), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-1-ojhv4", UID:"7ea9c5f2-7ce0-11e5-8032-fa163e7fc268", APIVersion:"v1", ResourceVersion:"326864", FieldPath:""}, Reason:"Scheduled", Message:"Successfully assigned docker-registry-1-ojhv4 to node1.osv3.example.com", Source:api.EventSource{Component:"scheduler", Host:""}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, Count:1}
I1027 17:08:25.008022       1 kube_events.go:81] Received new event: api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"docker-registry-1-ojhv4.141121a56f3d5706", GenerateName:"", Namespace:"default", SelfLink:"/api/v1/namespaces/default/events/docker-registry-1-ojhv4.141121a56f3d5706", UID:"87ec7793-7ce0-11e5-8032-fa163e7fc268", ResourceVersion:"326985", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, DeletionTimestamp:(*unversioned.Time)(0xc2086663c0), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-1-ojhv4", UID:"7ea9c5f2-7ce0-11e5-8032-fa163e7fc268", APIVersion:"v1", ResourceVersion:"326976", FieldPath:"implicitly required container POD"}, Reason:"Pulled", Message:"Container image \"openshift3/ose-pod:v3.0.2.902\" already present on machine", Source:api.EventSource{Component:"kubelet", Host:"node1.osv3.example.com"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63581570749, nsec:0, loc:(*time.Location)(0x13e5b40)}}, Count:1}
I1027 17:08:25.007163       1 kubelet.go:110] about to query kubelet using url: "https://172.16.4.47:10250/stats/metrics/hawkular-metrics-vrf7d/315f8291-7cec-11e5-8032-fa163e7fc268/hawkular-metrics"
I1027 17:08:25.008377       1 kube_events.go:81] Received new event: api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"docker-registry-1-ojhv4.141121a8a986f5fe", GenerateName:"", Namespace:"default", SelfLink:"/api/v1/namespaces/default/events/docker-registry-1-ojhv4.141121a8a986f5fe", UID:"902fcfcb-7ce0-11e5-8032-fa163e7fc268", ResourceVersion:"327035", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:63581570763, nsec:0, loc:(*time.Location)(0x13e5b40)}}, DeletionTimestamp:(*unversioned.Time)(0xc208666600), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-1-ojhv4", UID:"7ea9c5f2-7ce0-11e5-8032-fa163e7fc268", APIVersion:"v1", ResourceVersion:"326976", FieldPath:"implicitly required container POD"}, Reason:"Created", Message:"Created with docker id 29c7dc064760", Source:api.EventSource{Component:"kubelet", Host:"node1.osv3.example.com"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63581570763, nsec:0, loc:(*time.Location)(0x13e5b40)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63581570763, nsec:0, loc:(*time.Location)(0x13e5b40)}}, Count:1}
...

The log gets huge pretty quickly.

UnknownHostException for service 'hawkular-cassandra-nodes'

Running the metrics-deployer template like this:

oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular.apps.variantweb.net,USE_PERSISTENT_STORAGE=false | oc create -f -

Pod hawkular-cassandra-1-ys1sm fails with:

UnknownHostException for service 'hawkular-cassandra-nodes'. It may not be up yet. Trying again

End state looks like this:

# oc get all
CONTROLLER                   CONTAINER(S)                   IMAGE(S)                                                     SELECTOR                              REPLICAS                  AGE
hawkular-cassandra-1         hawkular-cassandra-1           docker.io/openshift/origin-metrics-cassandra:latest          name=hawkular-cassandra-1             1                         10m
hawkular-metrics             hawkular-metrics               docker.io/openshift/origin-metrics-hawkular-metrics:latest   name=hawkular-metrics                 1                         10m
heapster                     heapster                       docker.io/openshift/origin-metrics-heapster:latest           name=heapster                         1                         10m
NAME                         HOST/PORT                      PATH                                                         SERVICE                               LABELS                    INSECURE POLICY   TLS TERMINATION
hawkular-metrics             hawkular.apps.variantweb.net                                                                hawkular-metrics                      metrics-infra=support                       passthrough
NAME                         CLUSTER_IP                     EXTERNAL_IP                                                  PORT(S)                               SELECTOR                  AGE
hawkular-cassandra           172.30.56.107                  <none>                                                       9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   10m
hawkular-cassandra-nodes     None                           <none>                                                       9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   10m
hawkular-metrics             172.30.101.166                 <none>                                                       443/TCP                               name=hawkular-metrics     10m
heapster                     172.30.89.20                   <none>                                                       80/TCP                                name=heapster             10m
NAME                         READY                          STATUS                                                       RESTARTS                              AGE
hawkular-cassandra-1-ys1sm   0/1                            Error                                                        0                                     10m
hawkular-metrics-p6f5l       0/1                            Pending                                                      0                                     10m
heapster-i6xcb               0/1                            Pending                                                      0                                     10m

Of note, the hawkular-cassandra-nodes service has no cluster IP, as per portalIp: None in its service definition.

Should CASSANDRA_NODES_SERVICE_NAME be set to "hawkular-cassandra" in the cassandra/Dockerfile?

Hawkular Metric logs are not going to the console

It looks like the logs are being written to the Hawkular Metrics log file and not output to the console. This means that 'oc logs' and 'docker logs' will not show the proper logs when used.

Create diagnostic tool

The deployment and installation can at times be a bit tricky. We should provide some sort of mechanism, beyond the Hawkular Metrics /status and Heapster /validate endpoints, for checking that things are properly installed and running correctly.
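In the meantime, those endpoints can be checked manually; for example, a sketch following the API proxy pattern used for Heapster elsewhere in this document (token placeholder as before, assuming /validate is reachable through the same proxy path):

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXX" \
       https://${KUBERNETES_MASTER}/api/v1/proxy/namespaces/openshift-infra/services/https:heapster:/validate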

When https://issues.jboss.org/browse/HWKMETRICS-325 is completed, it may also be something we need to document or expose for people to use as verification.

tenant by time

Hello @mwringe
Could you please provide some info on what the "tenant by time" table in Cassandra represents? Can it be used somehow to get the uptime of a project, and then to compute % CPU usage at the namespace level?

Regards,
Shilpa

Please provide enterprise variants of the deployer template

Currently we pull the deployer template into openshift-ansible, and the sync script there copies the origin content for enterprise. It would be helpful if an enterprise deployer template were maintained in your repo so that the productization team no longer needs to carefully review the diffs each time. Right now the diff is just IMAGE_PREFIX and IMAGE_VERSION, which are set to 'registry.access.redhat.com/openshift3/' and '3.1.1' respectively for the enterprise version, but perhaps there are other differences that would be better maintained by the subject matter experts rather than the installer folks.

See https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/examples-sync.sh#L42-L43

Heapster image version in Dockerfile should be consistent with binary version

https://github.com/openshift/origin-metrics/blob/master/heapster/Dockerfile#L20
The image tag is burmanm/hawkular-heapster:0.18.3, but the binary inside reports version 0.18.2. This is confusing; the two should be kept consistent.

[fedora@ip-172-18-1-128 sample-app]$ docker pull burmanm/hawkular-heapster:0.18.3
Trying to pull repository docker.io/burmanm/hawkular-heapster ... 0.18.3: Pulling from burmanm/hawkular-heapster
31f630c65071: Already exists 
64f8b1fd257d: Already exists 
fb393df2a950: Already exists 
66bcbc94f540: Already exists 
8dcc64a5cefd: Already exists 
d2bf4d4e1966: Already exists 
72e059bc253c: Already exists 
Digest: sha256:25f9d6428853eee2b5f894f018f1055dabbe8dd7de06c170f62390799995bb71
Status: Image is up to date for docker.io/burmanm/hawkular-heapster:0.18.3

[fedora@ip-172-18-1-128 sample-app]$ docker run -it burmanm/hawkular-heapster:0.18.3 /bin/bash
I1117 02:12:06.152959       1 heapster.go:60] /heapster /bin/bash
I1117 02:12:06.153110       1 heapster.go:61] Heapster version 0.18.2
I1117 02:12:06.156372       1 heapster.go:71] Starting heapster on port 8082

test cleanup runs regardless of --continue option

# oc version
oc v3.1.1.6-16-g5327e56
kubernetes v1.1.0-origin-1107-g4c8e6f4

# ./e2e-tests.sh --continue --debug --skipBuild
...
[DEBUG] The current status of the deployer:Completed
[INFO] The deployer was deployed in 110 seconds.

[INFO] ================================================================================
[INFO] Checking the deployment of Cassandra. Please wait.



[ERROR] Cassandra took longer than the timeout of 360 seconds
[INFO]
[ERROR] ================================================================================
[ERROR] Test failed
[ERROR] ================================================================================
[INFO]
[INFO] Performing test shutdown
[INFO] 

================================================================================
[INFO] Teardown took 1 seconds
[INFO] Exiting. The tests took took 484 seconds
[INFO]
[INFO] Deleting test project test-1455901202
[INFO]
[INFO] The tests took 489 seconds
[INFO]
[ERROR] Test failed

[INFO] Exiting. Origin-Metrics tests took took 489 seconds

Create Base Heapster Image

We need to create a base heapster image which will be solely responsible for building specific versions of heapster that we require on a centos base.

Self signed certificate issues

There are a few confusing issues when accessing Hawkular Metrics over https in some situations.

Chrome will just display a strange error message that the site is not available.

Firefox will display an error about the serial number in the certificate being already used from the same CA.

In either case, there is no easy way to bypass this issue and you are basically stuck without access to the site.

The culprit here is that we have in the past accepted a certificate signed with a CA named 'metrics-signer' and now we are trying to access a site signed with a CA also named 'metrics-signer' (even though the CA certificates between the two are different).

This should be an easy fix: we should do what OpenShift does, which is to append a timestamp to the signer name.

Alerts in Hawkular

Hi, I'm wondering if it is possible to create and access Hawkular alerts in OpenShift.
I'm trying this request, but it is not working:

 curl --insecure --silent --header "Authorization: Bearer $BEARER" --header "Hawkular-Tenant: _system" --request GET https://hawkular-metrics.example.com/hawkular/alerts

The response is a HTTP Status 404 - /hawkular/alerts The requested resource is not available.

Example: How to retrieve metrics?

Hi,

Could you please give me (or link me to) an example of how to read out gathered metrics? I'd assume I could use the REST API of Hawkular, but I'm always getting 403s, so some guidance would be great.
Thanks.
Markus

Heapster logging filtering "failed" messages

With the following commit:

6aae6d0

Heapster log entries such as the one below are filtered, but should not be:

I1110 15:43:20.067253 1 kubelet.go:96] failed to get stats from kubelet url: https://:10250/stats/viet/cakephp-example-1-build/c531725d-87d9-11e5-80d5-001a4a10176a/sti-build - request failed - "404 Not Found", response: "no matching container\n"
I1110 15:43:20.067300 1 kube_pods.go:110] failed to get stats for container "sti-build" in pod "..."/"cakephp-example-1-build"

CA serial number conflicts possible

This line is problematic:

echo 03 > $dir/ca.serial.txt # otherwise openssl chokes on the file

1. It generates certificates starting with serial number 3, which will conflict with other certificates signed early on by the master CA. If a browser (or possibly other clients) ever hits a serving certificate with a conflicting serial from the same CA, it will disallow the certificate.
2. It modifies a local copy of serial.txt, which means the serial numbers it signs certs for could (later) be reused by the master CA.

Is there a way to avoid injecting the CA and having the deployer sign certs locally, which the parent serial.txt file will have no record of?

hawkular-cassandra in CrashLoopBackOff

With the latest image I get hawkular-cassandra in CrashLoopBackOff:

About to generate seeds
Trying to access the Seed list [try #1]
Trying to access the Seed list [try #2]
Trying to access the Seed list [try #3]
Setting seeds to be hawkular-cassandra-1-ov80w
cat: /etc/ld.so.conf.d/*.conf: No such file or directory
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO 14:37:49 Loading settings from file:/opt/apache-cassandra-2.2.4/conf/cassandra.yaml
INFO 14:37:49 Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=; cluster_name=hawkular-metrics; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/cassandra_data/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/cassandra_data/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=hawkular-cassandra-1-ov80w; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=hawkular-cassandra-1-ov80w; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.hawkular.openshift.cassandra.OpenshiftSeedProvider, parameters=[{seeds=hawkular-cassandra-1-ov80w}]}]; server_encryption_options=; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]
INFO 14:37:49 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 14:37:50 Global memtable on-heap threshold is enabled at 125MB
INFO 14:37:50 Global memtable off-heap threshold is enabled at 125MB
WARN 14:37:50 Small commitlog volume detected at /cassandra_data/commitlog; setting commitlog_total_space_in_mb to 3837. You can override this in cassandra.yaml
WARN 14:37:50 Only 11043 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Exception (org.apache.cassandra.exceptions.ConfigurationException) encountered during startup: org.hawkular.openshift.cassandra.OpenshiftSeedProvider
Fatal configuration error; unable to start server. See log for stacktrace.
org.hawkular.openshift.cassandra.OpenshiftSeedProvider
Fatal configuration error; unable to start server. See log for stacktrace.
ERROR 14:37:50 Exception encountered during startup: org.hawkular.openshift.cassandra.OpenshiftSeedProvider
Fatal configuration error; unable to start server. See log for stacktrace.

It was working correctly before. Do you have any idea why?

Endpoint get gauges

Hello,

Hawkular is running on OpenShift without any errors, and I would like to get some metrics data using the Hawkular REST API.

I want to eventually get the CPU and/or memory usage for a given tenant.

I tried to first list the metric definitions, but it gives me a 405 for gauges:

curl -H "Authorization: Bearer 0bHEwTcpnwg9RqNPILTvac8FBCuO7PE6Kz1rkWRSgBw " -H "Hawkular-tenant: ametrics" -H "Accept: application/json" -X GET https://172.30.231.219/hawkular/metrics/gauges/ --insecure
{"errorMsg":"RESTEASY001545: No resource method found for GET, return 405 with Allow header"}

I then tried asking for gauges/data with a bucketDuration of 1 minute, and then it gave me the following error:
{"errorMsg":"Either metrics or tags parameter must be used"}

How can I find the metric names and/or tag parameters for gauge data?

It would be very helpful if you could kindly provide an example of getting CPU/memory usage for a tenant.

Thanks,
Shilpa

delete command doesn't delete SA, templates

The metrics-deployer-related secrets also have no labels:

Name:       metrics-deployer
Namespace:  metrics
Labels:     <none>
Annotations:    <none>

Type:   Opaque

Data
====
nothing:    0 bytes


Name:       metrics-deployer-dockercfg-z3rdf
Namespace:  metrics
Labels:     <none>
Annotations:    kubernetes.io/service-account.name=metrics-deployer,kubernetes.io/service-account.uid=99d8d499-8ccf-11e5-adc0-525400b33d1d,openshift.io/token-secret.name=metrics-deployer-token-bb1ww

Type:   kubernetes.io/dockercfg

Data
====
.dockercfg: {"172.30.250.181:5000":{"username":"serviceaccount","password":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1ldHJpY3MtZGVwbG95ZXItdG9rZW4tYmIxd3ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibWV0cmljcy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6Ijk5ZDhkNDk5LThjY2YtMTFlNS1hZGMwLTUyNTQwMGIzM2QxZCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptZXRyaWNzOm1ldHJpY3MtZGVwbG95ZXIifQ.XUM5iGihVHc3NpaFY73sBpUO-Hqa9BXyuE9cP3l7IGwlsDFlYCIOjqhnjny8-_J6KyjY5CV8fHjR5SaU05dUXLNPD0Ihs0tryD4yuW1GjxdISsuHvhuJRoub2tL-eIgoidOK0IMPA8C9tePYjWyh0H_elpMDm4Ys5WlC-0U13QLpVXA05B1Nc-THDU4EmLyJt0FGpsDL17763mnebCjCffgoIhNLz8NZ7GvA1gLnirslq0e4eEn7n-AaztC7CbcvqBBW3tQ39sYCQixFxfnX3m4xWppWWxVofsGyx78Vz8_WWMaTgB46_7dCsXi-5SfIqn21tzKclFSuNV6SCHbjcw","email":"[email protected]","auth":"c2VydmljZWFjY291bnQ6ZXlKaGJHY2lPaUpTVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5LmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUp0WlhSeWFXTnpJaXdpYTNWaVpYSnVaWFJsY3k1cGJ5OXpaWEoyYVdObFlXTmpiM1Z1ZEM5elpXTnlaWFF1Ym1GdFpTSTZJbTFsZEhKcFkzTXRaR1Z3Ykc5NVpYSXRkRzlyWlc0dFltSXhkM2NpTENKcmRXSmxjbTVsZEdWekxtbHZMM05sY25acFkyVmhZMk52ZFc1MEwzTmxjblpwWTJVdFlXTmpiM1Z1ZEM1dVlXMWxJam9pYldWMGNtbGpjeTFrWlhCc2IzbGxjaUlzSW10MVltVnlibVYwWlhNdWFXOHZjMlZ5ZG1salpXRmpZMjkxYm5RdmMyVnlkbWxqWlMxaFkyTnZkVzUwTG5WcFpDSTZJams1WkRoa05EazVMVGhqWTJZdE1URmxOUzFoWkdNd0xUVXlOVFF3TUdJek0yUXhaQ0lzSW5OMVlpSTZJbk41YzNSbGJUcHpaWEoyYVdObFlXTmpiM1Z1ZERwdFpYUnlhV056T20xbGRISnBZM010WkdWd2JHOTVaWElpZlEuWFVNNWlHaWhWSGMzTnBhRlk3M3NCcFVPLUhxYTlCWHl1RTljUDNsN0lHd2xzREZsWUNJT2pxaG5qbnk4LV9KNkt5alk1Q1Y4ZkhqUjVTYVUwNWRVWExOUEQwSWhzMHRyeUQ0eXVXMUdqeGRJU3N1SHZodUpSb3ViMnRMLWVJZ29pZE9LMElNUEE4Qzl0ZVBZald5aDBIX2VscE1EbTRZczVXbEMtMFUxM1FMcFZYQTA1QjFOYy1USERVNEVtTHlKdDBGR3BzREwxNzc2M21uZWJDakNmZmdvSWhOTHo4Tlo3R3ZBMWdMbmlyc2xxMGU0ZUVuN24tQWF6dEM3Q2JjdnFCQlczdFEzOXNZQ1FpeEZ4Zm5YM200eFdwcFdXeFZvZnNHeXg3OFZ6OF9XV01hVGdCNDZfN2RDc1hpLTVTZklxbjIxdHpLY2xGU3VOVjZTQ0hiamN3"}}


Name:       metrics-deployer-token-bb1ww
Namespace:  metrics
Labels:     <none>
Annotations:    kubernetes.io/service-account.name=metrics-deployer,kubernetes.io/service-account.uid=99d8d499-8ccf-11e5-adc0-525400b33d1d

Type:   kubernetes.io/service-account-token

Data
====
token:  eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1ldHJpY3MtZGVwbG95ZXItdG9rZW4tYmIxd3ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibWV0cmljcy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6Ijk5ZDhkNDk5LThjY2YtMTFlNS1hZGMwLTUyNTQwMGIzM2QxZCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptZXRyaWNzOm1ldHJpY3MtZGVwbG95ZXIifQ.XUM5iGihVHc3NpaFY73sBpUO-Hqa9BXyuE9cP3l7IGwlsDFlYCIOjqhnjny8-_J6KyjY5CV8fHjR5SaU05dUXLNPD0Ihs0tryD4yuW1GjxdISsuHvhuJRoub2tL-eIgoidOK0IMPA8C9tePYjWyh0H_elpMDm4Ys5WlC-0U13QLpVXA05B1Nc-THDU4EmLyJt0FGpsDL17763mnebCjCffgoIhNLz8NZ7GvA1gLnirslq0e4eEn7n-AaztC7CbcvqBBW3tQ39sYCQixFxfnX3m4xWppWWxVofsGyx78Vz8_WWMaTgB46_7dCsXi-5SfIqn21tzKclFSuNV6SCHbjcw
ca.crt: 1066 bytes


Name:       metrics-deployer-token-yv7j0
Namespace:  metrics
Labels:     <none>
Annotations:    kubernetes.io/service-account.name=metrics-deployer,kubernetes.io/service-account.uid=99d8d499-8ccf-11e5-adc0-525400b33d1d

Type:   kubernetes.io/service-account-token

Data
====
ca.crt: 1066 bytes
token:  eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1ldHJpY3MtZGVwbG95ZXItdG9rZW4teXY3ajAiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibWV0cmljcy1kZXBsb3llciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6Ijk5ZDhkNDk5LThjY2YtMTFlNS1hZGMwLTUyNTQwMGIzM2QxZCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptZXRyaWNzOm1ldHJpY3MtZGVwbG95ZXIifQ.nkThoQip8O137ts_dmVqjc55QYr1lVbMf9SVteLZz1PodC_XYcuim1OINmINSNkwEVBIh6GiIdwV16u-4rsn3dAZbpjEnN2LXuW6h6i434D5JZOlfMhX3jqwrSs_C1Kbtmrov3Ph8RVB1Pq6ICXRFR6fZi3MnX4Lv7IYWEbW3VvhhaA6w0JfNjJFcMzqQ24t6pA6EXo6ZSdxyeCrr8DsbP0p4-AIs9558gllXVU7FAbid3XvuWyX3V9aNN1ByT4cirqjOtOUDFv-jhAqPt_xIbWG1P9GBQIiGiYlcGZle2e2ElizQMvDFC4veL8p1GcASUFPkJQwuRWlszWieNZH8A

Make tput optional

I was unable to run the e2e test from Jenkins via Ansible:

stderr: tput: No value for $TERM and no -T specified

To get the test going, I had to explicitly add a TERM env var:

export TERM=xterm; ./e2e-tests.sh --debug --timeout=420 --skipBuild=true

The log is prefixed with special characters

(B�[m[INFO] Starting Origin-Metric end-to-end test�(B�[m
�(B�[m[INFO] �(B�[m
�(B�[m[INFO] Settings:�(B�[m
�(B�[m[INFO] Base Directory: /tmp/origin-metrics�(B�[m
�(B�[m[INFO] ================================================================================�(B�[m
�(B�[m[INFO] �(B�[m
�(B�[m[INFO] �(B�[m
�(B�[m[INFO] Building new images�(B�[m

Document issue with custom certificates

Currently, if someone uses a custom certificate it may not work properly out of the box.

This is due to the system expecting the certificate to be valid for both the external and internal hostnames.

We need to at least update the documentation to make it clear this is a requirement, at least until kubernetes-retired/heapster#771 is resolved.

stacked does not provide sum

Hello,
As suggested in the docs ("stacked: statistics are first computed for every matching metric and then those statistics are stacked (added)"), I should get the sum of memory usage if I am using only the memory/usage descriptor for a tenant.
But as you can see in my example below, the only change I see is in the number of samples:

[shilpa@vmapgmstapp2n1 ~]$ curl -H "Authorization: Bearer NLtkT4-4og6BZEauh4cFKX1dxyX77vG-TrLc1GwdRI4" -H "Hawkular-tenant: scalingsample" -H "Accept: application/json" -X GET https://172.30.231.219/hawkular/metrics/gauges/data?'tags=descriptor_name:memory/usage&bucketDuration=8h&stacked=true' --insecure | python -m json.tool

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 192 0 192 0 0 741 0 --:--:-- --:--:-- --:--:-- 741

[
{
"avg": 403436606.6938775,
"empty": false,
"end": 1447322513238,
"max": 457138176.0,
"median": 398397415.5333509,
"min": 361869312.0,
"percentile95th": 448614400.0,
"samples": 1,
"start": 1447293713238
}
]
[shilpa@vmapgmstapp2n1 ~]$ curl -H "Authorization: Bearer NLtkT4-4og6BZEauh4cFKX1dxyX77vG-TrLc1GwdRI4" -H "Hawkular-tenant: scalingsample" -H "Accept: application/json" -X GET https://172.30.231.219/hawkular/metrics/gauges/data?'tags=descriptor_name:memory/usage&bucketDuration=8h&stacked=false' --insecure | python -m json.tool

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 190 0 190 0 0 1042 0 --:--:-- --:--:-- --:--:-- 1043

[
{
"avg": 404510638.08,
"empty": false,
"end": 1447322544621,
"max": 457138176.0,
"median": 394453166.6205565,
"min": 361869312.0,
"percentile95th": 452808704.0,
"samples": 50,
"start": 1447293744621
}
]

can not get cpu usage from hawkular API

Hi,

I need to get the CPU usage for a pod using the Hawkular API provided. I have tried the request below but could not understand the output. How should I read the CPU or memory usage for a pod?

Note: memory/usage does not return any data here. Why?

Any help on this would be appreciated.

curl -H "Authorization: Bearer oxG3b4nOlTFPZ6Z1wxPtf_xPoe7Av5js4imU4jcxxdw" -H "Hawkular-tenant: test" -X GET https://hawkular-host/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_name:myapp-1-2dku9&stacked=true&buckets=3&start=`date -d -10minutes +%s%3N` --insecure | python -m json.tool

output

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 599 0 599 0 0 5703 0 --:--:-- --:--:-- --:--:-- 5759
[
{
"avg": 384177728.43243253,
"empty": false,
"end": 1452681612365,
"max": 386093564.0,
"median": 384191688.7601133,
"min": 382108657.0,
"percentile95th": 385785471.0322342,
"samples": 1,
"start": 1452681412331
},
{
"avg": 377499125.75,
"empty": false,
"end": 1452681412331,
"max": 381990129.0,
"median": 376540915.0343114,
"min": 373719674.0,
"percentile95th": 381664582.3546182,
"samples": 1,
"start": 1452681212297
},
{
"avg": 371449489.57500005,
"empty": false,
"end": 1452681212297,
"max": 373608416.0,
"median": 371337044.7875711,
"min": 369222075.0,
"percentile95th": 373280499.01336825,
"samples": 1,
"start": 1452681012263
}
]

Changing heapster start arguments

Hello,
I want to change the Heapster start arguments, so on my machine I made a change in origin-metrics/deployer/templates/heapster.yaml to use insecure auth:

"--source=kubernetes:${MASTER_URL}?auth=&insecure=true&useServiceAccount=true&kubeletPort=10255"

In my node-config.yaml I have:

kubeletArguments:
  "read-only-port":
    - "10255"

However, when I checked the Heapster logs, they still show the old arguments at start:

Starting Heapster with the following arguments: --source=kubernetes:https://mastername:8443?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250

Could you kindly suggest what I am missing here?

Thanks,
Shilpa

Heapster doesn't update metrics in hawkular sink

I've deployed Heapster and Hawkular through the metrics-deployer template. It creates 3 pods in a running state with no errors in the logs. However, in the OpenShift web console, the metrics graphs are empty. If I access Heapster via the API proxy I am able to see metrics, but when I access Hawkular the information is not updated.

heapster network metrics

Hi,

Are network metrics available at the project or pod level? I tried getting network metrics for a pod but it returns:

No JSON object could be decoded

Thanks,
Yash

heapster keeps restarting and throws null pointer exceptions

heapster-error.txt
Hello @mwringe ,
Due to the service account issue we were facing on OpenShift Enterprise 3.0.2, we reinstalled the complete metrics stack on 3.1. The pods get into a running state and we are able to see metrics in the UI.
But now Heapster keeps restarting all the time.
When I checked the docker logs, it shows a null pointer exception from Heapster.
Please find the Heapster logs attached (heapster-error.txt).

Could you please look into this issue?

Thanks,
Shilpa

origin-metrics-cassandra does not run off the shelf

Hi all,

Running the Docker image docker.io/openshift/origin-metrics-cassandra runs into problems.

Any idea how to fix this issue so origin-metrics-cassandra can be used standalone?

Thanks, Volker

Error log:

sudo docker run -i -t openshift/origin-metrics-cassandra
[sudo] password for volker: 
CompilerOracle: inline org/apache/cassandra/db/AbstractNativeCell.compareTo (Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/db/composites/AbstractSimpleCellNameType.compareUnsigned (Lorg/apache/cassandra/db/composites/Composite;Lorg/apache/cassandra/db/composites/Composite;)I
CompilerOracle: inline org/apache/cassandra/io/util/Memory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/io/util/SafeMemory.checkBounds (JJ)V
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.selectBoundary (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;II)I
CompilerOracle: inline org/apache/cassandra/utils/AsymmetricOrdering.strictnessOfLessThan (Lorg/apache/cassandra/utils/AsymmetricOrdering/Op;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare (Ljava/nio/ByteBuffer;[B)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compare ([BLjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/ByteBufferUtil.compareUnsigned (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/lang/Object;JI)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/lang/Object;JILjava/nio/ByteBuffer;)I
CompilerOracle: inline org/apache/cassandra/utils/FastByteOperations$UnsafeOperations.compareTo (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
INFO  16:38:19 Loading settings from file:/opt/apache-cassandra-2.2.1/conf/cassandra.yaml
INFO  16:38:19 Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=hawkular-metrics; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_directory=/opt/apache-cassandra/data/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_throughput_mb_per_sec=16; concurrent_counter_writes=32; concurrent_reads=32; concurrent_writes=32; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/opt/apache-cassandra/data/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=fca9830b7e0d; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; memtable_allocation_type=heap_buffers; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=fca9830b7e0d; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; seed_provider=[{class_name=org.hawkular.openshift.cassandra.OpenshiftSeedProvider, parameters=[{seeds=fca9830b7e0d}]}]; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]
Exception (org.apache.cassandra.exceptions.ConfigurationException) encountered during startup: Invalid yaml: file:/opt/apache-cassandra-2.2.1/conf/cassandra.yaml
org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml: file:/opt/apache-cassandra-2.2.1/conf/cassandra.yaml
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:120)
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:85)
    at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:127)
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:483)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595)
Caused by: Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=server_encryption_options for JavaBean=org.apache.cassandra.config.Config@4f83df68; Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
 in 'reader', line 10, column 1:
    cluster_name: 'hawkular-metrics'
    ^

    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:333)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:481)
    at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:475)
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:113)
    ... 5 more
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=server_encryption_options for JavaBean=org.apache.cassandra.config.Config@4f83df68; Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:299)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:189)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:331)
    ... 11 more
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:299)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:189)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:296)
    ... 13 more
Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.constructStandardJavaInstance(Constructor.java:477)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.construct(Constructor.java:365)
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:296)
    ... 16 more
ERROR 16:38:19 Exception encountered during startup
org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml: file:/opt/apache-cassandra-2.2.1/conf/cassandra.yaml
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:120) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:85) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:127) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:483) [apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595) [apache-cassandra-2.2.1.jar:2.2.1]
Caused by: org.yaml.snakeyaml.constructor.ConstructorException: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=server_encryption_options for JavaBean=org.apache.cassandra.config.Config@4f83df68; Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption;  in 'reader', line 10, column 1:
    cluster_name: 'hawkular-metrics'
    ^
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:333) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:481) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:475) ~[snakeyaml-1.11.jar:na]
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:113) ~[apache-cassandra-2.2.1.jar:2.2.1]
    ... 5 common frames omitted
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=server_encryption_options for JavaBean=org.apache.cassandra.config.Config@4f83df68; Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:299) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:189) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:331) ~[snakeyaml-1.11.jar:na]
    ... 11 common frames omitted
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=internode_encryption for JavaBean=org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions@55183b20; Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:299) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:189) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:296) ~[snakeyaml-1.11.jar:na]
    ... 13 common frames omitted
Caused by: org.yaml.snakeyaml.error.YAMLException: Unable to find enum value '/none' for enum class: org.apache.cassandra.config.EncryptionOptions$ServerEncryptionOptions$InternodeEncryption
    at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.constructStandardJavaInstance(Constructor.java:477) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.construct(Constructor.java:365) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182) ~[snakeyaml-1.11.jar:na]
    at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:296) ~[snakeyaml-1.11.jar:na]
    ... 16 common frames omitted
[volker@ose cassandra-hawkular]$ sudo docker run -i -t openshift/origin-metrics-cassandra
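Note: the root cause is visible in the exception itself: cassandra.yaml ended up with internode_encryption set to '/none' (note the stray leading slash, most likely from a bad template or environment-variable substitution), while Cassandra only accepts the enum values all, none, dc and rack. A minimal sketch for verifying and patching the file at the path shown in the stack trace:

$ grep internode_encryption /opt/apache-cassandra-2.2.1/conf/cassandra.yaml
$ sed -i 's|internode_encryption: /none|internode_encryption: none|' /opt/apache-cassandra-2.2.1/conf/cassandra.yaml

In a container image the real fix belongs wherever that value is generated, not in a running container, but the commands above confirm the diagnosis.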

Create Standalone Heapster Option

For the HPA, we should have a standalone heapster option.

This could mean either a standalone heapster template, or a parameter passed to the deployer to deploy only Heapster, without Hawkular Metrics and Cassandra.

Met "Could not connect to Cassandra cluster" in hawkular-metrics Pod

Following the docs at https://github.com/openshift/origin-metrics/blob/master/README.md , I tried this against the AWS instance devenv_fedora_2444.

[chunchen@F17-CCY daily]$ oc get po
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-omjzh   1/1       Running   0          17m
hawkular-metrics-7ovec       1/1       Running   0          17m
heapster-8w0l6               1/1       Running   1          17m

[chunchen@F17-CCY origin-metrics]$ oc logs hawkular-metrics-7ovec
<----------------------------------------------------snip------------------------------------------------------>
05:07:39,615 INFO [org.jboss.weld.deployer](MSC service thread 1-4) JBAS016002: Processing weld deployment hawkular-metrics-api-jaxrs.war
05:07:39,724 INFO [org.jboss.weld.deployer](MSC service thread 1-3) JBAS016005: Starting Services for CDI deployment: hawkular-metrics-api-jaxrs.war
05:07:39,790 INFO [org.jboss.weld.Version](MSC service thread 1-3) WELD-000900 1.1.29 (redhat)
05:07:39,816 INFO [org.jboss.weld.deployer](MSC service thread 1-4) JBAS016008: Starting weld service for deployment hawkular-metrics-api-jaxrs.war
05:07:41,189 INFO org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle HAWKMETRICS200002: Initializing metrics service
05:07:41,260 INFO [org.jboss.web](ServerService Thread Pool -- 50) JBAS018210: Register web context: /hawkular/metrics
05:07:41,645 INFO [org.jboss.as.server](ServerService Thread Pool -- 28) JBAS015859: Deployed "hawkular-metrics-api-jaxrs.war" (runtime-name : "hawkular-metrics-api-jaxrs.war")
05:07:41,647 INFO [org.jboss.as.server](ServerService Thread Pool -- 28) JBAS015859: Deployed "activemq-rar.rar" (runtime-name : "activemq-rar.rar")
05:07:41,747 INFO [org.jboss.as](Controller Boot Thread) JBAS015961: Http management interface listening on http://127.0.0.1:9990/management
05:07:41,752 INFO [org.jboss.as](Controller Boot Thread) JBAS015951: Admin console listening on http://127.0.0.1:9990
05:07:41,752 INFO [org.jboss.as](Controller Boot Thread) JBAS015874: JBoss EAP 6.4.1.GA (AS 7.5.1.Final-redhat-3) started in 9927ms - Started 267 of 303 services (62 services are lazy, passive or on-demand)
05:07:41,826 INFO com.datastax.driver.core.NettyUtil Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
05:07:41,943 INFO org.jboss.resteasy.cdi.i18n RESTEASY006050: Found BeanManager at java:comp/BeanManager
05:07:42,188 INFO org.hibernate.validator.internal.util.Version HV000001: Hibernate Validator 4.3.2.Final-redhat-2
05:07:42,421 WARN org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.252.165:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.252.165:9042] Channel has been closed))
05:07:42,418 WARN io.netty.util.concurrent.DefaultPromise An exception was thrown by com.datastax.driver.core.Connection$9.operationComplete(): java.util.concurrent.RejectedExecutionException: Task com.datastax.driver.core.Connection$9$1@48557a04 rejected from java.util.concurrent.ThreadPoolExecutor@706d85fa[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) [rt.jar:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [rt.jar:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [rt.jar:1.8.0_45]
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484) [guava-16.0.1.jar:]
at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:566) [cassandra-driver-core-2.2.0-rc2.jar:]
at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:542) [cassandra-driver-core-2.2.0-rc2.jar:]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:676) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_45]
<----------------------------------------------------snip------------------------------------------------------>
05:08:18,906 INFO org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle HAWKMETRICS200002: Initializing metrics service
05:08:27,668 INFO com.datastax.driver.core.policies.DCAwareRoundRobinPolicy Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
05:08:27,669 INFO com.datastax.driver.core.Cluster New Cassandra host hawkular-cassandra/172.30.252.165:9042 added
05:08:27,851 INFO org.hawkular.metrics.schema.SchemaManager HAWKMETRICS300002: Creating schema for keyspace hawkular_metrics
05:08:36,623 INFO org.hawkular.metrics.core.impl.MetricsServiceImpl HAWKMETRICS100001: Using a key space of 'hawkular_metrics'
05:08:36,915 INFO org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle HAWKMETRICS200005: Metrics service started

Can anyone take a look?
Thanks!
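Note: the log actually ends well. After the transient "Could not connect" warning at 05:07:42, the driver registered the Cassandra host at 05:08:27 and the metrics service started at 05:08:36 (HAWKMETRICS200005). The warning only means hawkular-metrics came up before Cassandra finished bootstrapping; it keeps retrying until the cluster is reachable. To confirm the service is healthy afterwards, the status endpoint can be queried (the hostname below is a placeholder):

$ curl --insecure https://hawkular-metrics.example.com/hawkular/metrics/status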

Building heapster-base fails.

Building the heapster-base image fails on "go get -d k8s.io/heapster"

[root@gitlab orig]# git clone https://github.com/openshift/origin-metrics.git
Cloning into 'origin-metrics'...
remote: Counting objects: 263, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 263 (delta 5), reused 0 (delta 0), pack-reused 246
Receiving objects: 100% (263/263), 229.12 KiB | 0 bytes/s, done.
Resolving deltas: 100% (135/135), done.
[root@gitlab orig]# cd origin-metrics/
[root@gitlab origin-metrics]# docker build heapster-base
Sending build context to Docker daemon 3.072 kB
Step 0 : FROM centos:centos7
---> ce20c473cd8a
Step 1 : MAINTAINER Hawkular Metrics [email protected]
---> Using cache
---> 2931e709de94
Step 2 : EXPOSE 8082
---> Using cache
---> 411ea865a710
Step 3 : ENV HEAPSTER_COMMIT af4752e
---> Running in 6c6b38d6a564
---> 5377cdf18dbd
Removing intermediate container 6c6b38d6a564
Step 4 : RUN yum install -y -q go git wget make && yum clean all && export GOPATH=/tmp/gopath && export PATH=$PATH:$GOPATH/bin && mkdir -p $GOPATH && cd $GOPATH && go get -d k8s.io/heapster && cd src/k8s.io/heapster && git checkout $HEAPSTER_COMMIT && make && cp heapster /opt && rm -rf $GOPATH && yum remove -y -q go git wget make
---> Running in e21912e46003
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
warning: /var/cache/yum/x86_64/7/base/packages/fipscheck-1.4.1-5.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Public key for fipscheck-1.4.1-5.el7.x86_64.rpm is not installed
Public key for glibc-common-2.17-106.el7_2.1.x86_64.rpm is not installed
Importing GPG key 0xF4A80EB5:
Userid : "CentOS-7 Key (CentOS 7 Official Signing Key) [email protected]"
Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
Package : centos-release-7-1.1503.el7.centos.2.8.x86_64 (@CentOS)
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
install-info: No such file or directory for /usr/share/info/wget.info.gz
Loaded plugins: fastestmirror
Cleaning repos: base extras systemdcontainer updates
Cleaning up everything
Cleaning up list of fastest mirrors
package k8s.io/heapster
imports github.com/bigdatadev/goryman
imports github.com/golang/protobuf/proto
imports github.com/coreos/etcd/client
imports github.com/coreos/fleet/client
imports github.com/coreos/fleet/pkg
imports github.com/coreos/fleet/registry
imports github.com/emicklei/go-restful
imports github.com/golang/glog
imports github.com/google/cadvisor/client
imports github.com/google/cadvisor/info/v1
imports github.com/hawkular/hawkular-client-go/metrics
imports github.com/influxdb/influxdb/client
imports github.com/optiopay/kafka
imports github.com/optiopay/kafka/proto
imports github.com/golang/snappy
imports google.golang.org/cloud/compute/metadata
imports golang.org/x/net/context
imports k8s.io/kubernetes/pkg/api
imports github.com/davecgh/go-spew/spew
imports github.com/ghodss/yaml
imports gopkg.in/yaml.v2
imports github.com/google/gofuzz
imports github.com/ugorji/go/codec
imports k8s.io/kubernetes/pkg/api/unversioned
imports k8s.io/kubernetes/pkg/fields
imports k8s.io/kubernetes/pkg/labels
imports k8s.io/kubernetes/pkg/api/resource
imports github.com/spf13/pflag
imports speter.net/go/exp/math/dec/inf
imports k8s.io/kubernetes/pkg/auth/user
imports k8s.io/kubernetes/pkg/util
imports github.com/docker/libcontainer/cgroups/fs
imports github.com/docker/docker/pkg/mount
imports github.com/docker/docker/pkg/units
imports github.com/docker/docker/pkg/units
imports github.com/docker/docker/pkg/units: cannot find package "github.com/docker/docker/pkg/units" in any of:
/usr/lib/golang/src/github.com/docker/docker/pkg/units (from $GOROOT)
/tmp/gopath/src/github.com/docker/docker/pkg/units (from $GOPATH)
The command '/bin/sh -c yum install -y -q go git wget make && yum clean all && export GOPATH=/tmp/gopath && export PATH=$PATH:$GOPATH/bin && mkdir -p $GOPATH && cd $GOPATH && go get -d k8s.io/heapster && cd src/k8s.io/heapster && git checkout $HEAPSTER_COMMIT && make && cp heapster /opt && rm -rf $GOPATH && yum remove -y -q go git wget make' returned a non-zero code: 1
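Note: go get -d fetches the current HEAD of every transitive dependency, so only HEAPSTER_COMMIT is pinned, not the packages it imports; here github.com/docker/docker no longer contains pkg/units (it was later split out into github.com/docker/go-units), so the fetch fails. One hedged workaround is to pre-populate GOPATH with an older docker checkout before running go get, since go get without -u will not re-fetch a repository that already exists in GOPATH. The tag below is illustrative, not a verified pin:

$ export GOPATH=/tmp/gopath
$ mkdir -p $GOPATH/src/github.com/docker
$ git clone https://github.com/docker/docker.git $GOPATH/src/github.com/docker/docker
$ (cd $GOPATH/src/github.com/docker/docker && git checkout v1.8.3)
$ cd $GOPATH && go get -d k8s.io/heapster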

cpu usage metrics

Hello,

I am trying to understand how CPU usage is calculated in Hawkular.
My query is:

curl -H "Authorization: Bearer VLN03sHe6cpdf4eObK2eBPhDjyYyjYhN-jjCJvHL28Y" -H "Hawkular-tenant: ametrics" -H "Accept: application/json" -X GET https://172.30.231.219/hawkular/metrics/counters/data?'metrics=hawkular-cassandra-1%2Ff0183cd1-832b-11e5-ac21-005056042fa2%2Fcpu%2Fusage&bucketDuration=8h'

I get the following result

[{"start":1446814142077,"end":1446842942077,"min":3.49936742588E12,"avg":3.80892379310292E12,"median":3.8093440641764077E12,"max":4.114033428162E12,"percentile95th":4.0840905122976865E12,"samples":5757,"empty":false}]

Could you please explain what min, max, and avg mean here in terms of CPU usage?

Thanks,
Shilpa
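Note: Heapster's cpu/usage metric is a cumulative counter of the CPU time the container has consumed, in nanoseconds, so min/avg/median/max in a bucket are statistics over those raw cumulative readings (hence values around 3.8E12, roughly 3800 seconds of total CPU time), not a utilization percentage. To see consumption per unit of time you would query the counter's rate instead of its raw data; a sketch, assuming the counter rate endpoint of this Hawkular Metrics version and reusing the metric id from the question:

$ curl -H "Authorization: Bearer <token>" -H "Hawkular-tenant: ametrics" -H "Accept: application/json" \
  "https://172.30.231.219/hawkular/metrics/counters/hawkular-cassandra-1%2Ff0183cd1-832b-11e5-ac21-005056042fa2%2Fcpu%2Fusage/rate?bucketDuration=8h"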

service account not found error

Hello,
Since we faced the issue kubernetes-retired/heapster#658, we decided to upgrade our old installation. I noticed that you are now using your own Heapster image. I followed the installation manual, but it leaves the hawkular-metrics pod stuck in pending state forever. When I run describe on the rc for hawkular-metrics, it shows the following error:
Error creating: Pod "hawkular-metrics-" is forbidden: service account cmetrics/hawkular was not found, retry after the service account is created

Kindly help.

Regards,
Shilpa
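Note: the error message names the missing object exactly: the service account "hawkular" in the cmetrics project. The metrics templates normally create this account, so its absence usually means that step failed or the templates were processed into a different project. A sketch for checking and recreating it (the account name is taken from the error message; the templates may also attach secrets and permissions to the account, so re-running the metrics setup is the safer fix):

$ oc get sa -n cmetrics
$ oc create serviceaccount hawkular -n cmetrics

Once the service account exists, the replication controller retries automatically and the pods should be created.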

Heapster keeps restarting for no apparent reason

Following discussion on #62 and kubernetes-retired/heapster#925:

We updated two clusters directly from OpenShift 1.1.0 to 1.1.3. One of them is working correctly, but on the other one the metrics have "holes":

[screenshot from 2016-02-27 16:31:18 showing gaps in the metrics graphs]

Apparently, the heapster pod is being deleted on a regular basis (though not at a stable frequency). Every 2-5 minutes, I see these events:

3:41:08 PM  heapster-a1hjp  Pod Normal  Started     Started container with docker id 42d99f4118df
3:41:07 PM  heapster-a1hjp  Pod Normal  Created     Created container with docker id 42d99f4118df
3:41:06 PM  heapster-a1hjp  Pod Normal  Pulled  Successfully pulled image "docker.io/openshift/origin-metrics-heapster:latest"
3:41:05 PM  hawkular-metrics-41gx2  Pod Normal  Started     Started container with docker id 75c76cd6921b
3:41:05 PM  hawkular-metrics-41gx2  Pod Normal  Created     Created container with docker id 75c76cd6921b
3:41:04 PM  heapster-a1hjp  Pod Normal  Pulling     pulling image "docker.io/openshift/origin-metrics-heapster:latest"
3:41:04 PM  hawkular-metrics-41gx2  Pod Normal  Pulled  Successfully pulled image "docker.io/openshift/origin-metrics-hawkular-metrics:latest"
3:41:03 PM  hawkular-metrics-41gx2  Pod Normal  Pulling     pulling image "docker.io/openshift/origin-metrics-hawkular-metrics:latest"
3:41:02 PM  hawkular-metrics-paek6  Pod Normal  Killing     Killing container with docker id 8e5ac35cd912: Need to kill pod.
3:41:00 PM  heapster-a1hjp  Pod Normal  Scheduled   Successfully assigned heapster-a1hjp to node-1
3:41:00 PM  heapster-ifp3k  Pod Normal  Killing     Killing container with docker id 968b35a9fc16: Need to kill pod.
3:41:00 PM  hawkular-metrics    ReplicationController   Normal  SuccessfulCreate    (events with common reason combined) (142 times in the last 4 hours, 31 minutes)
3:41:00 PM  hawkular-metrics-41gx2  Pod Normal  Scheduled   Successfully assigned hawkular-metrics-41gx2 to node-1
3:40:59 PM  heapster    ReplicationController   Normal  SuccessfulCreate    (events with common reason combined) (144 times in the last 4 hours, 34 minutes)

I have deployed (and redeployed) metrics with a single Cassandra node, a 20 GB PV, and a signed certificate.
The same configuration is working well on the first cluster.

There are no errors in the node's logs or in the Docker container logs.
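Note: "SuccessfulCreate ... 142 times in the last 4 hours" together with "Need to kill pod" suggests a controller is repeatedly replacing the pods rather than the containers crashing (crashing containers would show restarts or CrashLoopBackOff instead). A few hedged things to check, assuming the metrics were deployed to the openshift-infra project:

$ oc get rc -n openshift-infra                   # more than one rc managing the same pods?
$ oc get pods -n openshift-infra --show-labels   # pod labels matching more than one selector?
$ oc get events -n openshift-infra | grep -i kill

Overlapping replication-controller selectors, or a deployer job that keeps re-processing the template, would both produce this kind of create/kill loop.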

tags and label based search

Hello,

I am trying to retrieve metrics based on labels. I want to make sure that it only returns records when all of the search criteria (specified as tags) match.

For example, in the case below, I want to make sure that if my label "labels:metrics-infra:hawkular-cassandra" is not found, nothing is returned.

However, even if the label part doesn't match, or I give some incorrect label, the request still returns data (I think based on just the tag tags=descriptor_name:cpu/usage).
Is there a way to avoid this?

curl -H "Authorization: Bearer itvTorMGGX5tg1RuBFa0dErA3NdEN8ZhBeGGJ_byfbo" -H "Hawkular-tenant: ametrics" -H "Accept: application/json" -X GET https://172.30.231.219/hawkular/metrics/counters/data?'tags=descriptor_name:cpu/usage&tags=container_name:hawkular-cassandra&tags=labels:metrics-infra:hawkular-cassandra&bucketDuration=8h' --insecure

Thanks,
Shilpa
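Note, hedged: when the tags query parameter is repeated, only one of the repeated values may actually be applied, which would explain why the label filter is silently ignored. Hawkular Metrics expects a single tags parameter holding a comma-separated list of tag:value pairs, which are combined with AND semantics. A sketch of the combined form (the token is a placeholder, and how the embedded colon and any commas inside the labels value must be escaped is an assumption to verify):

$ curl -H "Authorization: Bearer <token>" -H "Hawkular-tenant: ametrics" -H "Accept: application/json" \
  "https://172.30.231.219/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,container_name:hawkular-cassandra,labels:metrics-infra:hawkular-cassandra&bucketDuration=8h" --insecure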
