kubernetes-nats-cluster

NATS cluster on top of Kubernetes made easy.

THIS PROJECT HAS BEEN ARCHIVED. SEE https://github.com/nats-io/nats-operator

NOTE: This repository provides a configurable way to deploy secure, available and scalable NATS clusters. However, a smarter solution is on the way (see #5).

Pre-requisites

  • Kubernetes cluster v1.8+ - tested with v1.9.0 on top of Vagrant + CoreOS
  • At least 3 nodes available (see Pod anti-affinity)
  • kubectl configured to access your cluster's API server
  • openssl for TLS certificate generation

Deploy

We will be deploying a cluster of 3 NATS instances, with the following set-up:

  • TLS enabled for client connections, but not for cluster (route) connections, because peer authentication requires real DNS SANs in the certificate
  • NATS client credentials: nats_client_user:nats_client_pwd
  • NATS route/cluster credentials: nats_route_user:nats_route_pwd
  • Logging: debug:false, trace:true, logtime:true

First, make sure to change nats.conf according to your needs. Then create a Kubernetes configmap to store it:

kubectl create configmap nats-config --from-file nats.conf
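
For reference, a minimal nats.conf matching the set-up above might look like the sketch below. It is not necessarily the repository's exact file: the /etc/nats/tls paths assume the tls-nats-server secret is mounted there (the mount path mentioned in the issues section), and the rest mirrors the credentials and logging settings listed above.

# Client and monitoring listeners.
port: 4222
http: 8222

# Logging, matching the set-up above.
debug: false
trace: true
logtime: true

# TLS for client connections, using the artifacts from the tls-nats-server secret.
tls {
  cert_file: "/etc/nats/tls/nats.pem"
  key_file: "/etc/nats/tls/nats-key.pem"
}

# NATS client credentials.
authorization {
  user: nats_client_user
  password: nats_client_pwd
}

# NATS route/cluster credentials; routes point at the headless service.
cluster {
  port: 6222
  authorization {
    user: nats_route_user
    password: nats_route_pwd
  }
  routes = [
    nats://nats_route_user:nats_route_pwd@nats:6222
  ]
}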

Next, we need to generate valid TLS artifacts:

openssl genrsa -out ca-key.pem 2048
openssl req -x509 -new -nodes -key ca-key.pem -days 10000 -out ca.pem -subj "/CN=kube-ca"
openssl genrsa -out nats-key.pem 2048
openssl req -new -key nats-key.pem -out nats.csr -subj "/CN=kube-nats" -config ssl.cnf
openssl x509 -req -in nats.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out nats.pem -days 3650 -extensions v3_req -extfile ssl.cnf
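
The last two commands expect an ssl.cnf defining the v3_req extensions, including DNS SANs for the in-cluster names of the nats service. The sketch below only illustrates what such a file could contain; it assumes the service lives in the default namespace of a cluster using the cluster.local domain.

[ req ]
req_extensions     = v3_req
distinguished_name = req_distinguished_name

[ req_distinguished_name ]

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage         = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName   = @alt_names

[ alt_names ]
DNS.1 = nats
DNS.2 = nats.default
DNS.3 = nats.default.svc
DNS.4 = nats.default.svc.cluster.local
DNS.5 = *.nats.default.svc.cluster.local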

Then, it's time to create a couple of Kubernetes secrets to store the TLS artifacts:

  • tls-nats-server for the NATS server TLS setup
  • tls-nats-client for NATS client apps, which need it to validate the self-signed certificate used to secure the NATS server

kubectl create secret generic tls-nats-server --from-file nats.pem --from-file nats-key.pem --from-file ca.pem
kubectl create secret generic tls-nats-client --from-file ca.pem

ATTENTION: Using self-signed certificates, and using the same certificate to secure both client and cluster connections, are significant security compromises. But for the sake of showing how it can be done, I'm fine with doing just that. In an ideal scenario, there should be:

  • One centralized PKI/CA
  • One certificate for securing NATS route/cluster connections
  • One certificate for securing NATS client connections
  • TLS route/cluster authentication should be enforced, so one TLS certificate per route/cluster peer
  • TLS client authentication should be enforced, so one TLS certificate per client

And finally, we deploy NATS:

kubectl create -f nats.yml

Logs should be enough to make sure everything is working as expected:

$ kubectl logs -f nats-0
[1] 2017/12/17 12:38:37.801139 [INF] Starting nats-server version 1.0.4
[1] 2017/12/17 12:38:37.801449 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2017/12/17 12:38:37.801580 [INF] Listening for client connections on 0.0.0.0:4242
[1] 2017/12/17 12:38:37.801772 [INF] TLS required for client connections
[1] 2017/12/17 12:38:37.801778 [INF] Server is ready
[1] 2017/12/17 12:38:37.802078 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2017/12/17 12:38:38.874497 [TRC] 10.244.1.3:33494 - rid:1 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"KGMPnL89We3gFLEjmp8S5J"}]
[1] 2017/12/17 12:38:38.956806 [TRC] 10.244.74.2:46018 - rid:3 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"Skc5mx9enWrGPIQhyE7uzR"}]
[1] 2017/12/17 12:38:39.951160 [TRC] 10.244.1.4:46242 - rid:4 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"nats_route_user","pass":"nats_route_pwd","tls_required":true,"name":"0kaCfF3BU8g92snOe34251"}]
[1] 2017/12/17 12:40:38.956203 [TRC] 10.244.74.2:46018 - rid:3 - <<- [PING]
[1] 2017/12/17 12:40:38.958279 [TRC] 10.244.74.2:46018 - rid:3 - ->> [PING]
[1] 2017/12/17 12:40:38.958300 [TRC] 10.244.74.2:46018 - rid:3 - <<- [PONG]
[1] 2017/12/17 12:40:38.961791 [TRC] 10.244.74.2:46018 - rid:3 - ->> [PONG]
[1] 2017/12/17 12:40:39.951421 [TRC] 10.244.1.4:46242 - rid:4 - <<- [PING]
[1] 2017/12/17 12:40:39.952578 [TRC] 10.244.1.4:46242 - rid:4 - ->> [PONG]
[1] 2017/12/17 12:40:39.952594 [TRC] 10.244.1.4:46242 - rid:4 - ->> [PING]
[1] 2017/12/17 12:40:39.952598 [TRC] 10.244.1.4:46242 - rid:4 - <<- [PONG]

Scale

WARNING: Due to the Pod anti-affinity rule, scaling up to n NATS instances requires n available Kubernetes nodes.

kubectl scale statefulsets nats --replicas 5

Did it work? Check the service and pods (e.g. with kubectl get svc,po):

NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
svc/kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP                      1h
svc/nats         ClusterIP   None         <none>        4222/TCP,6222/TCP,8222/TCP   4m

NAME        READY     STATUS    RESTARTS   AGE
po/nats-0   1/1       Running   0          4m
po/nats-1   1/1       Running   0          4m
po/nats-2   1/1       Running   0          4m
po/nats-3   1/1       Running   0          7s
po/nats-4   1/1       Running   0          6s

Access the service

Don't forget that services in Kubernetes are only accessible from containers in the cluster.

In this case, we're using a headless service.

Just point your client apps to:

nats:4222
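
As an illustration of what a client needs in practice, here is a minimal Go sketch using the go-nats client. It is not part of this repository: the subject name greeting and the CA path /etc/nats-client-tls/ca.pem are assumptions (the latter being wherever you mount the tls-nats-client secret in your client pod); the URL and credentials match the set-up above.

package main

import (
	"log"

	nats "github.com/nats-io/go-nats"
)

func main() {
	// Connect through the headless service, authenticating with the client
	// credentials from nats.conf and validating the server certificate
	// against the CA bundle from the tls-nats-client secret.
	nc, err := nats.Connect("tls://nats:4222",
		nats.UserInfo("nats_client_user", "nats_client_pwd"),
		nats.RootCAs("/etc/nats-client-tls/ca.pem"), // assumed mount path
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Publish a test message and make sure it reached the server.
	if err := nc.Publish("greeting", []byte("hello from Kubernetes")); err != nil {
		log.Fatal(err)
	}
	if err := nc.Flush(); err != nil {
		log.Fatal(err)
	}
}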

One of the main advantages of running NATS on top of Kubernetes is how resilient the cluster becomes, particularly during node restarts. However, if all NATS pods are scheduled onto the same node(s), this advantage decreases significantly and may even result in service downtime.

It is therefore highly recommended to adopt pod anti-affinity in order to increase availability. This is enabled by default (see nats.yml and the sketch below).
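
For reference, such a rule looks roughly like the following; the app: nats label selector is an assumption, so check nats.yml for the actual labels used by the StatefulSet.

# Placed under the pod template's spec.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nats
      # Never schedule two NATS pods onto the same node.
      topologyKey: kubernetes.io/hostname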

kubernetes-nats-cluster's People

Contributors

bmcustodio, kozlovic, pires

kubernetes-nats-cluster's Issues

Possible issue with tls-volume and config-volume mounted with common root dir

May or may not be an issue, but I found out that a process will not get access to tls-volume after a second or two of startup when the secret volume is mounted with the same root directory as another volume.

In your case, the tls-volume is mounted as /etc/nats/tls and the config volume is mounted as /etc/nats. This has caused an unexplained issue in nats-io/nats-streaming-server#458. The way to resolve this was to mount the config volume on /nats as opposed to /etc/nats.
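
A sketch of the suggested change, in Kubernetes volumeMounts terms (the volume names config-volume and tls-volume are taken from the issue title; the rest of nats.yml is omitted):

volumeMounts:
- name: config-volume
  mountPath: /nats          # previously /etc/nats, which contained the tls mount below
- name: tls-volume
  mountPath: /etc/nats/tls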

Updating to gnatsd 1.0.4 causes cryptic message

I attempted to update the gnatsd binary and got the following message from k8s while the container was in a crash loop:

kubectl logs -f po/nats-723182443-x4xb2
/gnatsd: line 1: 404:: not found

That's all that is displayed. Any thoughts or insights?

Add TLS support

With the Kubernetes internal DNS domain, it should be pretty straightforward to enable TLS for client, cluster and monitoring connections.

When seed is self, the server no longer joins the cluster

I have followed all the steps in the readme, however I have run into an issue. It appears as though NATS gets confused when it tries to cluster to itself. I get the error: "Detected route to self, ignoring "nats://ruser:T0pS3cr3t@nats:6222"" in the logs and the instance stops trying to contact the cluster. I think what is happening is that NATS isn't using all the addresses returned by DNS, but instead randomly selecting an A record from the results. It then gets its own IP address and tries to connect to itself. Is this a problem you have also noticed, or does it eventually work itself out?

TLS

Hi, TLS does not work.

Image for 1.0.0?

It would be useful to kind of "backport" 7f58995 to at least a previous version of NATS (e.g., 1.0.0). This would enable us to better test the upgrade scenario in pires/nats-operator until a new version of NATS is released.

In order not to mess with master now that it targets gnatsd-1.0.2, I'd suggest either of the following:

  • Create a gnatsd-1.0.0 branch, replace the gnatsd binary and tag so that Quay picks it up.
  • Manually create and push such an image to Quay.

I've set up what is necessary at

https://github.com/brunomcustodio/kubernetes-nats-cluster/tree/gnatsd-1.0.0

in case this sounds like a good idea. If you don't want to go through the trouble of building the image, you can docker pull quay.io/brunomcustodio/docker-nats:1.0.0, re-tag and push it to Quay.

Issue: No clusters formed; pods seem to always route just to themselves

Hi

Would you be able to deploy this to a Kubernetes cluster and confirm that the nats pods all form a cluster, please? When I try it, it doesn't work! I've tried to give you as much detail as I can below, but if you need more info just let me know. I'm keen to get NATS working in a reliable cluster on Kubernetes.

What goes wrong:

When I deploy 1 node it's fine. This is expected: the liveness probe returns without an error because there's only one service record for 'nats'.

Then I scale up the replica count to e.g. 7 (I've tried 3, 5 and 7). Each time, the individual nodes don't cluster together (by this I mean checking any of the /routez endpoints shows no routes). They last for a minute or so each, and then each gets killed and restarted (and this repeats). This is because the liveness probe's isFirstNatsInstanceInCluster returns false, and then a GET request to localhost:8222/routez returns no routes, so the probe exits with an error, which causes Kubernetes to restart the pod. I haven't seen any evidence of clustering at all, I'm afraid.

I've also done:

kubectl exec nats-[choose-one-of-the-pods] -- /route_checker

By executing the probe myself I've confirmed the above - that the probe always checks /routez and gets no routes. Adding a fmt.Printf statement confirms that data.routez is always zero.
I see the same thing if I do:

kubectl port-forward nats-[choose-one-of-the-pods] 8222 &

... and then go to http://localhost:8222/routez in my browser. No routes as expected.

Logs from one of the pods:

2016-11-25T23:35:01.942560989Z [10] 2016/11/25 23:35:01.942448 [INF] Starting nats-server version 0.9.4
2016-11-25T23:35:01.942689899Z [10] 2016/11/25 23:35:01.942667 [DBG] Go build version go1.7
2016-11-25T23:35:01.942730791Z [10] 2016/11/25 23:35:01.942709 [INF] Starting http monitor on 0.0.0.0:8222
2016-11-25T23:35:01.942825927Z [10] 2016/11/25 23:35:01.942799 [INF] Listening for client connections on 0.0.0.0:4222
2016-11-25T23:35:01.942890205Z [10] 2016/11/25 23:35:01.942874 [INF] TLS required for client connections
2016-11-25T23:35:01.942938460Z [10] 2016/11/25 23:35:01.942908 [DBG] Server id is HnKKN1x0nOinzD1NJhlAo8
2016-11-25T23:35:01.942961863Z [10] 2016/11/25 23:35:01.942950 [INF] Server is ready
2016-11-25T23:35:01.943291947Z [10] 2016/11/25 23:35:01.943237 [INF] Listening for route connections on 0.0.0.0:6222
2016-11-25T23:35:01.943398505Z [10] 2016/11/25 23:35:01.943381 [DBG] Trying to connect to route on nats:6222
2016-11-25T23:35:01.945178570Z [10] 2016/11/25 23:35:01.945142 [DBG] 10.124.3.35:6222 - rid:1 - Route connection created
2016-11-25T23:35:01.945225122Z [10] 2016/11/25 23:35:01.945210 [DBG] 10.124.3.35:6222 - rid:1 - Route connect msg sent
2016-11-25T23:35:01.945442564Z [10] 2016/11/25 23:35:01.945409 [DBG] 10.124.3.35:60662 - rid:2 - Route connection created
2016-11-25T23:35:01.945569664Z [10] 2016/11/25 23:35:01.945549 [TRC] 10.124.3.35:60662 - rid:2 - ->> [CONNECT {"verbose":false,"pedantic":false,"user":"ruser","pass":"T0pS3cr3t","tls_required":false,"name":"HnKKN1x0nOinzD1NJhlAo8"}]
2016-11-25T23:35:01.945697424Z [10] 2016/11/25 23:35:01.945666 [DBG] 10.124.3.35:60662 - rid:2 - Router connection closed
2016-11-25T23:35:01.945842955Z [10] 2016/11/25 23:35:01.945824 [DBG] 10.124.3.35:6222 - rid:1 - Router connection closed
2016-11-25T23:35:01.945942765Z [10] 2016/11/25 23:35:01.945923 [DBG] Detected route to self, ignoring "nats://ruser:T0pS3cr3t@nats:6222"

To me this looks like it's connecting to itself. Perhaps 'nats' in --routes nats://$USER:$PASS@$SVC:6222 in run.sh is always resolving to the pod's own IP, rather than what I expected, which is either a random address or a list. (Not sure how to check this, I'm afraid - from Go, net.LookupSRV definitely returns a list, as I've logged this to check. I'm not sure how to check from the command line though.)

More details:

  • I'm running this on Google Container Engine.
  • I've simply pulled the master repo and ran it as is.
  • I've tried both the TLS and non-TLS versions. I also tried removing the username and password auth from run.sh.

Thanks for the useful repo and the help.

Mike

PS:

  • I also had to change:
server     = flag.String("server", "localhost", "NATS server to query")

to:

server     = flag.String("server", "localhost:8222", "NATS server to query")

because the HTTP GET request in route_checker to localhost/routez would fail without the port number.
