kubernetes-graphite-cluster

A deployment-ready graphite cluster on top of Kubernetes. Find the full details here

Contents:

  1. A statsd proxy deployment and service for metric collection
  2. A statsd daemon deployment and service for metric aggregation and shipping
  3. Carbon relay deployment and service to spread metrics across several Graphite data nodes
  4. Graphite data nodes as a stateful set with persistent volumes
  5. Graphite query node to be used as a query gateway to the data nodes

Requirements:

  1. Kubernetes version 1.5.X (We're using StatefulSet)
  2. kubectl configured to work with your Kubernetes API
  3. Tested on Kubernetes 1.5.X/1.6.X (Without RBAC) on top of AWS/GKE
  4. Optional - Access to your own docker repository to store your own images. That's relevant if you don't want to use the default images offered here.

Environment Variables:

| Name | Default Value | Purpose |
| --- | --- | --- |
| DOCKER_REPOSITORY | nanit | Change it if you want to build and use a custom Docker repository. The nanit images are public, so leaving it as is should work out of the box. |
| SUDO | sudo | Whether docker commands should be prefixed with sudo. Change to "" to omit sudo. |
| STATSD_PROXY_REPLICAS | None | Number of replicas for the statsd proxy. |
| STATSD_DAEMON_REPLICAS | None | Number of StatsD daemons running behind the proxies. |
| CARBON_RELAY_REPLICAS | None | Number of replicas for the carbon relay. |
| GRAPHITE_NODE_REPLICAS | None | Number of Graphite data nodes in the cluster. This number affects both the carbon relay and the graphite master configuration. |
| GRAPHITE_NODE_DISK_SIZE | None | Size of the persistent disk to be allocated for each Graphite node. |
| GRAPHITE_NODE_CURATOR_RETENTION | None | Set this variable to run a cronjob which deletes metrics that haven't been written for X days. Leaving it blank will not run the curator. |
| GRAPHITE_NODE_STORAGE_CLASS | None | Storage class for the persistent volume claims of the Graphite node stateful set. |
| GRAPHITE_MASTER_REPLICAS | None | Number of replicas for the graphite query node. |

Deployment:

  1. Clone this repository
  2. Run:
export DOCKER_REPOSITORY=nanit && \
export STATSD_PROXY_REPLICAS=3 && \
export STATSD_DAEMON_REPLICAS=2 && \
export CARBON_RELAY_REPLICAS=3 && \
export GRAPHITE_NODE_REPLICAS=3 && \
export GRAPHITE_NODE_DISK_SIZE=30G && \
export GRAPHITE_NODE_CURATOR_RETENTION=5 && \
export GRAPHITE_MASTER_REPLICAS=1 && \
export GRAPHITE_NODE_STORAGE_CLASS=default && \
export STATSD_PROXY_ADDITIONAL_YAML="" && \
export STATSD_DAEMON_ADDITIONAL_YAML="" && \
export CARBON_RELAY_ADDITIONAL_YAML="" && \
export GRAPHITE_NODE_ADDITIONAL_YAML="" && \
export SUDO="" && \
make deploy
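
Since most of the variables in the table above have no default, a forgotten export may only surface partway through `make deploy`. The following is a minimal pre-flight sketch, not part of this repository: the function name `check_deploy_env` is hypothetical, and the list of required variables is an assumption based on the table (GRAPHITE_NODE_CURATOR_RETENTION is left out because it may be blank).

```shell
# Hypothetical pre-flight helper; not part of this repository.
# Prints the name of every required variable that is unset or empty
# and returns non-zero if at least one is missing.
check_deploy_env() {
  missing=0
  for name in STATSD_PROXY_REPLICAS STATSD_DAEMON_REPLICAS CARBON_RELAY_REPLICAS \
              GRAPHITE_NODE_REPLICAS GRAPHITE_NODE_DISK_SIZE \
              GRAPHITE_NODE_STORAGE_CLASS GRAPHITE_MASTER_REPLICAS; do
    # Look up the variable by name; the names come from the fixed list above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be chained in front of the deployment, for example `check_deploy_env && make deploy`.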

Usage:

After the deployment is done there are two endpoints of interest:

  1. statsd:8125 is the host for your metrics collection. It points to the statsd proxies.
  2. graphite:80 is the host for your metrics queries. It points to the graphite query node which queries all data nodes in the cluster.

Run kubectl get pods,statefulsets,svc and expect to see the following resources:

[Screenshot: K8s resources on a clean cluster]

The replicas of each resource may change according to your environment variables of course.

Verifying The Deployment:

To verify everything works as expected just paste the following into your terminal:

POD_NAME=$(kubectl get pods -l app=statsd -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it "$POD_NAME" -- bash
for i in {1..10}
do
  echo "test_counter:1|c" | nc -w1 -u statsd 8125
  sleep 1
done

apk --update add curl
curl 'graphite/render?target=stats.counters.test_counter.count&from=-10min&format=json'

You should see a lot of null values along with your few increments at the end.
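
The verification loop above sends a single counter. As a rough sketch, the same `nc` call can be wrapped in small helpers for other StatsD metric types (`c` for counters, `g` for gauges, `ms` for timers). The function names and the STATSD_HOST / STATSD_PORT variables are hypothetical and not part of this repository; the defaults match the statsd:8125 endpoint described under Usage.

```shell
# Hypothetical helpers; not part of this repository.
# Formats one StatsD datagram as <name>:<value>|<type>.
statsd_line() {
  printf '%s:%s|%s\n' "$1" "$2" "$3"
}

# Sends one datagram over UDP with the same nc flags as the verification loop.
statsd_send() {
  statsd_line "$1" "$2" "$3" | nc -w1 -u "${STATSD_HOST:-statsd}" "${STATSD_PORT:-8125}"
}
```

For example, `statsd_send test_counter 1 c` repeats one iteration of the loop, while `statsd_send queue_depth 42 g` would send a gauge (the metric name `queue_depth` is made up for illustration).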

Building your own images

If you want to build and use your own images, make sure to change the DOCKER_REPOSITORY environment variable to your own docker repository. The deployment will then build the images, push them to your docker repository and use them to create all the needed kubernetes deployments.

Changing an active cluster configuration

Graphite nodes and StatsD daemons are deployed as StatefulSets. The StatsD proxies continuously watch the Kubernetes API for StatsD daemon endpoints and update their configuration. Both the Graphite master and the carbon relays continuously watch the Kubernetes API for Graphite node endpoints and update their configuration.

That means you can scale each part independently, and the system reacts to your changes by updating its config file accordingly.
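
As an illustration, scaling one part could look like the sketch below. The helper `scale_cmd` is hypothetical and only prints the `kubectl scale` command so it can be reviewed before running it. The resource names `carbon-relay` and `graphite-node` are assumptions; check the real names with `kubectl get deployments,statefulsets` first.

```shell
# Hypothetical helper; not part of this repository.
# Prints a kubectl scale command for the given kind, name and replica count.
scale_cmd() {
  printf 'kubectl scale %s %s --replicas=%s\n' "$1" "$2" "$3"
}

# Assumed resource names; adjust to what your cluster actually reports.
scale_cmd deployment carbon-relay 5
scale_cmd statefulset graphite-node 4
```

Once the printed command looks right, run it directly. The watchers described above should then pick up the changed endpoints.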

Acknowledgement

  1. I have learnt a lot about Graphite clustering from this excellent article
  2. The docker images for the graphite nodes are based on this repository


kubernetes-graphite-cluster's Issues

Graphite default user/pass

It's not clear what the default Graphite user/pass is, and the default that Graphite documents doesn't work. What is the default for the Graphite administrative panel user/pass?

First-run database migrations required

I just set up a cluster per the recommendations here and found that I needed to set up the sqlite database for a first run; otherwise specific queries fail, such as graphite/metrics/find?query=*:

# curl graphite/metrics/find\?query=\*
<body style="background-color: #666666; color: black;">
<center>
<h2 style='font-family: "Arial", sans-serif'>
<p>Graphite encountered an unexpected error while handling your request.</p>
<p>Please contact your site administrator if the problem persists.</p>
</h2>
<br/>
<div style="width: 50%; text-align: center; font-family: monospace; background-color: black; font-weight: bold; color: #ff4422;">

</div>

<div style="width: 70%; text-align: left; background-color: black; color: #44ff22; border: thin solid gray;">
<pre>
Traceback (most recent call last):
  File &quot;/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py&quot;, line 149, in get_response
    response = self.process_exception_by_middleware(e, request)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py&quot;, line 147, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File &quot;/opt/graphite/webapp/graphite/metrics/views.py&quot;, line 150, in find_view
    profile = getProfile(request)
  File &quot;/opt/graphite/webapp/graphite/user_util.py&quot;, line 25, in getProfile
    return default_profile()
  File &quot;/opt/graphite/webapp/graphite/user_util.py&quot;, line 41, in default_profile
    &#39;password&#39;: &#39;!&#39;})
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py&quot;, line 122, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/query.py&quot;, line 465, in get_or_create
    return self.get(**lookup), False
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/query.py&quot;, line 381, in get
    num = len(clone)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/query.py&quot;, line 240, in __len__
    self._fetch_all()
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/query.py&quot;, line 1074, in _fetch_all
    self._result_cache = list(self.iterator())
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/query.py&quot;, line 52, in __iter__
    results = compiler.execute_sql()
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py&quot;, line 848, in execute_sql
    cursor.execute(sql, params)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py&quot;, line 64, in execute
    return self.cursor.execute(sql, params)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/utils.py&quot;, line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py&quot;, line 64, in execute
    return self.cursor.execute(sql, params)
  File &quot;/usr/local/lib/python2.7/dist-packages/django/db/backends/sqlite3/base.py&quot;, line 323, in execute
    return Database.Cursor.execute(self, query, params)
OperationalError: no such table: auth_user

</pre>
</div>

</center>

Luckily, it wasn't too difficult to figure out how to set up the schema for a first run:

PYTHONPATH=/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

I'm going to propose that this be done either immediately after the database file is created, or in the entry point behind a check for a missing or zero-length database file.
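
A minimal sketch of what such an entry point check might look like. The function names are hypothetical, and the default database path is an assumption that should be adjusted to wherever the image keeps its sqlite file. The migrate command is the one shown above.

```shell
# Assumed location of the sqlite database; override GRAPHITE_DB if it differs.
GRAPHITE_DB="${GRAPHITE_DB:-/opt/graphite/storage/graphite.db}"

# Returns 0 when the given file is missing or has zero length.
needs_migration() {
  [ ! -s "$1" ]
}

# Runs the first-run migration only when the check above says it is needed.
run_migration_if_needed() {
  if needs_migration "$GRAPHITE_DB"; then
    PYTHONPATH=/opt/graphite/webapp django-admin.py migrate \
      --settings=graphite.settings --run-syncdb
  fi
}
```

Note that a non-empty database file that lacks the tables would not be caught by this check.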

Test on other providers than AWS

Currently the setup has only been tested on AWS.
It would be nice to have it tested and adjusted to other k8s configurations as well.

Deployment fails on Kubernetes 1.6.2 deployed on CentOS

I have a Kube 1.6.2 deployed over CentOS VMs using Kubeadm.

When I tried to deploy the carbon/graphite cluster, I got the following errors in the make output:

...
...
Step 22 : RUN chmod 0664 /opt/graphite/storage/graphite.db
 ---> Running in b4297ad8b242
 ---> e468c1e48708
Removing intermediate container b4297ad8b242
Step 23 : RUN cp /src/graphite-web/webapp/manage.py /opt/graphite/webapp
 ---> Running in 8454921eed63
 ---> fc42a56abfc4
Removing intermediate container 8454921eed63
Step 24 : RUN cd /opt/graphite/webapp/ && python manage.py migrate --run-syncdb --noinput
 ---> Running in b01ad861d148
Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 302, in execute
    settings.INSTALLED_APPS
  File "/usr/local/lib/python2.7/dist-packages/django/conf/__init__.py", line 55, in __getattr__
    self._setup(name)
  File "/usr/local/lib/python2.7/dist-packages/django/conf/__init__.py", line 43, in _setup
    self._wrapped = Settings(settings_module)
  File "/usr/local/lib/python2.7/dist-packages/django/conf/__init__.py", line 99, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/opt/graphite/webapp/graphite/settings.py", line 153, in <module>
    from graphite.local_settings import *  # noqa
  File "/opt/graphite/webapp/graphite/local_settings.py", line 246
    CLUSTER_SERVERS = [@@GRAPHITE_NODES@@]
                       ^
SyntaxError: invalid syntax
The command '/bin/sh -c cd /opt/graphite/webapp/ && python manage.py migrate --run-syncdb --noinput' returned a non-zero code: 1
make: *** [docker-graphite-master] Error 1 

This happens in the carbon-relay pod logs:

NAMESPACE   NAME                            READY   STATUS             RESTARTS   AGE
default     carbon-relay-2782403225-cvtk3   0/1     CrashLoopBackOff   5          23m
default     carbon-relay-2782403225-xf7nk   0/1     CrashLoopBackOff   9          23m
default     carbon-relay-2782403225-zmlf5   0/1     ErrImagePull       0          23m

Log:

kubectl logs carbon-relay-2782403225-xf7nk

+ /set-cluster-nodes.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to kubernetes port 443: Connection refused
+ exec /opt/graphite/bin/carbon-relay.py --debug --logdir=/var/log/carbon start
Traceback (most recent call last):
Starting carbon-relay (instance a)
  File "/opt/graphite/bin/carbon-relay.py", line 32, in <module>
    run_twistd_plugin(__file__)
  File "/opt/graphite/lib/carbon/util.py", line 96, in run_twistd_plugin
    runApp(config)
  File "/usr/local/lib/python2.7/dist-packages/twisted/scripts/twistd.py", line 23, in runApp
    _SomeApplicationRunner(config).run()
  File "/usr/local/lib/python2.7/dist-packages/twisted/application/app.py", line 376, in run
    self.application = self.createOrGetApplication()
  File "/usr/local/lib/python2.7/dist-packages/twisted/application/app.py", line 436, in createOrGetApplication
    ser = plg.makeService(self.config.subOptions)
  File "/opt/graphite/lib/twisted/plugins/carbon_relay_plugin.py", line 21, in makeService
    return service.createRelayService(options)
  File "/opt/graphite/lib/carbon/service.py", line 135, in createRelayService
    setupPipeline(['relay'], root_service, settings)
  File "/opt/graphite/lib/carbon/service.py", line 86, in setupPipeline
    setupRelayProcessor(root_service, settings)
  File "/opt/graphite/lib/carbon/service.py", line 175, in setupRelayProcessor
    for destination in util.parseDestinations(settings.DESTINATIONS):
  File "/opt/graphite/lib/carbon/util.py", line 121, in parseDestinations
    return [parseDestination(dest_string) for dest_string in destination_strings]
  File "/opt/graphite/lib/carbon/util.py", line 117, in parseDestination
    return server, int(port), instance
ValueError: invalid literal for int() with base 10: '@@GRAPHITE_NODES@@'
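
Both tracebacks show the literal `@@GRAPHITE_NODES@@` placeholder reaching the configuration, which suggests the substitution step did not complete (in the relay log, right after curl failed to reach the Kubernetes API). A minimal sketch of a guard that could fail fast with a clearer message, assuming placeholders always take the `@@NAME@@` form; the function name is hypothetical and not part of this repository:

```shell
# Hypothetical guard; not part of this repository.
# Returns 0 if the file still contains an unsubstituted @@NAME@@ placeholder.
has_placeholder() {
  grep -q '@@[A-Z_][A-Z_]*@@' "$1"
}
```

An entry point could call it on its config file before starting the process, for example `if has_placeholder "$CONF"; then echo "unsubstituted placeholder in $CONF" >&2; exit 1; fi`, where `CONF` is whichever file the substitution script rewrites.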


OSS License?

I'm interested in leveraging your work. Are you planning on applying an Open Source License?

RBAC support?

I set up RBAC support over at bittorrent@ca441f7, along with a change to use EBS, as our EFS volumes didn't have enough I/O.

Is there any interest in formatting the RBAC portions of that commit into a PR? If so, I would be happy to do so.
