
k8s-netchecker-server's Introduction

Status

Build Status Stories in Progress Go Report Card Code Climate License Apache 2.0 Docker Pulls

What it is and how it works

Diagram

Network checker is a Kubernetes application. Its main purpose is to check connectivity between the cluster's nodes. Network checker consists of two parts: a server (this repository) and an agent (developed here). Agents are deployed on every Kubernetes node using a DaemonSet. Agents come in two flavors, and the default setup includes two corresponding DaemonSets. The difference between them is that the "agent-hostnet" flavor is tapped into the host network namespace by setting hostNetwork: true in the corresponding Pod's specification. As shown on the diagram, both DaemonSets are enabled for each node, meaning exactly one pod of each kind will be deployed on each node.

The agents periodically gather network-related information (e.g. interface details, nslookup results, latency measurements) and send it to the server as periodic agent reports. A report includes the agent's pod name and its node name, which together uniquely identify the report.

The server is deployed in a dedicated pod using a Deployment and exposed inside the cluster via a Kubernetes Service resource. Thus, every agent can access the server by the Service's DNS name.

The server processes the incoming agent data (agents' reports) and stores it in persistent storage. The server can use either Kubernetes third party resources (TPR) or etcd as the persistent storage backend:

  • TPR. A new data type called agent was added to TPR, the Kubernetes API was extended with this new type, and all agent data is stored using it. When using TPR, the server is vulnerable to a stale-data issue: outdated records are never expired. The issue is solved by using etcd and its TTL feature. Please also note that TPR is deprecated starting from Kubernetes v1.7 and may be removed in future Kubernetes versions; it will not be supported in Netchecker then. No migration to Kubernetes CRD (the replacement for TPR) is planned either.
  • etcd. The recommended storage provider. When using etcd, the server is not affected by the issue described in the TPR section. In this case, agent data is stored in etcd under the /netchecker path.

The server also calculates metrics based on agent data. Metrics data is currently stored in the server's memory, which means it is lost when the server application is shut down or restarted; this is going to be reworked by moving metrics to persistent storage (etcd only) in the future.

The server provides an HTTP RESTful interface which currently includes the following requests (verb - URI - meaning of the operation):

  • GET/POST - /api/v1/agents/{agent_name} - get or create/update an agent's data record in persistent storage.
  • GET - /api/v1/agents/ - get the whole agent data dump.
  • GET - /api/v1/connectivity_check - get result of connectivity check between the server and the agents.
  • GET - /metrics - get the network checker metrics.

The main logic of network checking is implemented behind the connectivity_check endpoint; it is the only user-facing URI. To determine whether connectivity exists between the server and the agents, the former retrieves the list of pods using the Kubernetes API (filtering by the labels netchecker-agent and netchecker-agent-hostnet), then analyses the stored agent data. Success of the check is determined by two criteria. First, there must be an entry in the stored data for each retrieved agent pod; this means an agent request has made it through the network to the server, so the link within that agent-server pair is established and active. Second, the difference between the time of the check and the time when data was last received from a particular agent must not exceed two of the agent's reporting periods (there is a field in the payload holding the report interval). Otherwise the connection is considered lost and requests are not coming through. When etcd is used, the period of agent data obsolescence is set explicitly via a server parameter (-report-ttl, in seconds). Remember that each agent corresponds to one particular pod, unique to a particular node, so connectivity between the agents and the server implies connectivity between the corresponding nodes.

The results of the connectivity check, as represented in the endpoint's response, indicate possible connectivity issues: there is an Absent field listing agents that have not reported at all, and an Outdated field listing those whose reports fall outside the data obsolescence period.

One aspect of network checker's operation is worth mentioning. Payloads sent by the agents are relatively small and in some cases can be smaller than the MTU configured for the cluster's network links. When this happens, network checker will not catch problems with network packet fragmentation. For that reason, a special agent option can be used: -zeroextenderlength. By default its value is 1500. The parameter tells the agent to extend each payload by the given length so that it exceeds the packet fragmentation threshold. This dummy data has no effect on the server's processing of the agent's requests (reports).
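The padding idea can be sketched in a few lines (a sketch of the -zeroextenderlength behaviour, not the agent's actual implementation):

```go
package main

import (
	"bytes"
	"fmt"
)

// extendPayload appends zeroExtenderLength zero bytes to the report so the
// resulting packet exceeds the MTU and exercises fragmentation; the server
// is expected to discard the padding.
func extendPayload(report []byte, zeroExtenderLength int) []byte {
	return append(report, bytes.Repeat([]byte{0}, zeroExtenderLength)...)
}

func main() {
	report := []byte(`{"podname":"agent-a"}`)
	padded := extendPayload(report, 1500) // 1500 is the documented default
	fmt.Println(len(report), len(padded)) // 21 1521
}
```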

Usage

To start the server inside Kubernetes pod using Kubernetes TPR as a persistent storage and listen on port 8081, use the following command:

server -v 5 -logtostderr -kubeproxyinit -endpoint 0.0.0.0:8081

To start the server using etcd as a persistent storage, use the following setting:

-kubeproxyinit=false

Also, a few parameters are required to establish the connection with etcd:

-etcd-endpoints=https://192.0.10.11:4001,https://192.0.10.12:4001
-etcd-key=/var/lib/etcd/client.key (optional, omitted when using http)
-etcd-cert=/var/lib/etcd/client.pem (optional, omitted when using http)
-etcd-ca=/var/lib/etcd/ca.pem (optional, can be omitted even when using https)

For other possibilities regarding testing, building the code and Docker images, etc., please refer to the Makefile.

Deployment in Kubernetes cluster

There are two options for deploying the application.

First, use the ./examples/deploy.sh script. Users must provide all the needed environment variables (e.g. the name and tag for Docker images) before running the script.

Second, deploy as a Helm chart. If users have Helm installed in their Kubernetes cluster, they can build the chart from its description (./helm-chart/) and then deploy it (please refer to Helm's documentation for details).

Additional documentation

  • Metrics - metrics and Prometheus configuration how-to.

k8s-netchecker-server's People

Contributors

adidenko, alexeykasatkin, falkerson, vrovachev


k8s-netchecker-server's Issues

Add support for scalability

At the moment the server does not support horizontal scaling, so we need to add the ability to run several server instances under a single Kubernetes service (behind a load balancer). To do so, the first step is to separate the data storage from the server application.

Create documentation

We should have documentation describing each endpoint in the API, how the connectivity check works, etc.

Make it possible to test custom payload size

By default the agent sends a small amount of data, less than the default MTU (1500). So if the cluster has MTU configuration problems, netchecker will not detect them, since the small payload does not trigger packet fragmentation.

We can implement a separate parameter which would simply add some padding to the agent's JSON report, to be discarded on the server side. This would allow netchecker to test network MTU and packet fragmentation.

It should be possible to configure this parameter at runtime and to see the current size of the additional payload in the agents' reports.

Reorganize source code

Currently all Go source files lie in the root of the project, which makes navigating it inconvenient. The source files should be moved into separate directories, preferably sub-package and cmd directories.

Return error response when POSTing inappropriate data for agent

When the request body for the updateAgents handler cannot be read or decoded into the Go struct, the error is not propagated to the client: the agent cache is updated with the struct's zero value and the response has a successful status. In such cases clients should be notified by the response that their request is malformed.
