cloudfoundry-incubator / kubo-release
Kubernetes BOSH release
Home Page: https://www.cloudfoundry.org/container-runtime/
License: Apache License 2.0
Rationale - We already test the proxy as part of our CI on GCP. We should do the same on vSphere.
Acceptance Criteria
When deploying kubo-release 0.7.0 on bosh-lite, the master nodes are not able to start (more specifically, the kubernetes-controller-manager job).
Logs below:
==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stderr.log <==
+ declare pid=5750
+ ps -p 5750
+ __log 'Removing stale pidfile'
+ echo 'Removing stale pidfile'
+ rm /var/vcap/sys/run/kubernetes/kubernetes_controller_manager.pid
+ echo 5760
+ start_kubernetes_controller_manager
+ '[' -f /sys/class/dmi/id/product_serial ']'
+ chmod a+r /sys/class/dmi/id/product_serial
chmod: changing permissions of '/sys/class/dmi/id/product_serial': Read-only file system
==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stdout.log <==
------------ STARTING kubernetes_controller_manager_ctl at Tue Sep 12 07:32:00 UTC 2017 --------------
Removing stale pidfile
I was able to work around the issue by removing the chmod line shown above from the control script during a bosh deploy (using bosh ssh and vi).
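A minimal sketch of a more defensive guard for that step, assuming the surrounding ctl script stays as-is (the writability test is the only addition):

# Only attempt the chmod when the file exists AND its filesystem is writable;
# bosh-lite mounts /sys read-only, so the unguarded chmod aborts the script.
if [ -f /sys/class/dmi/id/product_serial ] && [ -w /sys/class/dmi/id/product_serial ]; then
  chmod a+r /sys/class/dmi/id/product_serial
fi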
Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml
in the Gitbot configuration repo.
If you don't have access you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.
Steps: add your repo to the config-production.yml file.
If there are any questions, please reach out to [email protected].
Our kubernetes-system-specs job currently achieves two things:
1 - Installing kube-dns, which is crucial for pod-to-pod communication
2 - Installing the Kubernetes Dashboard UI, which can be optional
Let's split those into separate jobs, making the dashboard optional, effectively deploying it via a bosh errand (see the sketch below).
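As a sketch of the proposed end state (the deployment and errand names here are assumptions, not the current job names), the dashboard could then be deployed on demand:

# Hypothetical usage once the dashboard is split into its own errand job.
bosh -d kubo run-errand kubernetes-dashboard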
The service cluster IP is 10.100.200.1 and the SSL certificate used is the same one as for the API. It looks like we need separate SSL certs for the Kubernetes service cluster IP (or the cluster IP added as a SAN on the API server certificate).
$ kubectl get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.100.200.1 <none> 443/TCP 57m
This causes issues for things that use Kubernetes services internally:
2017-06-23T20:08:47.881115281Z [main] 2017/06/23 20:08:47 Cannot initialize Kubernetes connection: Get https://10.100.200.1:443/api: x509: certificate is valid for 10.244.243.3, not 10.100.200.1
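One way to confirm which names/IPs the presented certificate is actually valid for (standard openssl commands; the IP is the service cluster IP from the error above):

# Inspect the certificate served on the service cluster IP and list its SANs.
echo | openssl s_client -connect 10.100.200.1:443 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'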
route-sync only syncs the first exported port of a k8s service (whether tcp-route-sync or http-route-sync); if the k8s service has two exported ports, only the first port is synced to the CF routing service. It would be more reasonable to sync all exported ports. See the sketch below for illustration.
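For illustration, a service like the following exposes two ports, but only the first (8080) would currently be propagated; the names and the label semantics are assumptions based on the tags mentioned above:

# Hypothetical two-port service; with the current behaviour only the first
# listed port (8080) would be synced to the CF routing tier.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: demo
  labels:
    tcp-route-sync: "1025"
spec:
  selector:
    app: demo
  ports:
  - name: web
    port: 8080
  - name: admin
    port: 9090
EOF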
BTW, is there any channel where we can keep track of the project status and collaborate with you?
route-sync broadcasts HTTP routes to the GoRouter via NATS (for k8s services tagged http-route-sync). This bypasses the Cloud Controller, creating the potential for route collisions.
The PoC was implemented this way; this needs to be addressed for production readiness.
The syslog-forwarding-setup job does not provide any additional functionality compared to syslog-release and should be removed.
This issue will be used to track the required steps and to automate them in the future.
[#146530055]
Rationale:
- Some IaaSes (e.g. OpenStack) have no concept of a load balancer, so an alternative way to expose app routes is needed.
In environments other than GCE/GKE, you need to deploy an ingress controller as a pod:
https://github.com/kubernetes/ingress/tree/master/controllers
An ingress controller is a daemon, deployed as a Kubernetes pod, that watches the apiserver's /ingresses endpoint for updates to the Ingress resource. Its job is to satisfy requests for ingress.
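For context, a minimal Ingress resource that such a controller would satisfy might look like this (hypothetical host and service names, using the extensions/v1beta1 API of that era):

# Minimal Ingress resource; an ingress controller pod watching /ingresses
# would program its proxy to route this host to the backing service.
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: demo-ingress
spec:
  rules:
  - host: demo.example.com
    http:
      paths:
      - backend:
          serviceName: demo
          servicePort: 8080
EOF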
Once a k8s service tagged tcp-route-sync is deleted, route-sync stops propagating the route to the TCP router. However, the CF route/domain record is not automatically removed.
The PoC was implemented this way; this needs to be addressed for production readiness.
kubo-release/src/route-sync/kubernetes/source.go
Lines 40 to 53 in 70b246a
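Until this is addressed, the stale record can presumably be cleaned up by hand with the CF CLI; the domain and port below are placeholders:

# Hypothetical manual cleanup of a TCP route left behind after the k8s
# service was deleted.
cf delete-route tcp.example.com --port 1025 -f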
In the latest pull of kubo-deployment (git clone https://github.com/cloudfoundry-incubator/kubo-deployment), the following file is missing from /root/kubo-deployment/manifests/ops-files/:
k8s_master_static_ip_vsphere.yml
Content of the file:
- type: replace
  path: /networks/type=manual/subnets/0/static
  value:
The result is that without this file, K8s cluster deployment fails (when a static IP for the master node is specified).
Can you push the file to the repository?
Thanks
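For reference, once the file is restored it would be applied like any other ops-file; a sketch, where the manifest filename kubo.yml is an assumption about this kubo-deployment checkout:

# Check that the ops-file interpolates cleanly against the manifest.
bosh interpolate manifests/kubo.yml \
  -o manifests/ops-files/k8s_master_static_ip_vsphere.yml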
The kubernetes-apiserver control script hardcodes ABAC as the Kubernetes authorization mode. ABAC is difficult to manage because the API server must be restarted in order to apply any change to the policy file. RBAC permission policies, on the other hand, are configured using kubectl or the Kubernetes API directly, without the need to modify the manifest (or to restart the API server). ABAC is also starting to be considered legacy on versions > 1.6.
Can you please consider switching the authorization mode from ABAC to RBAC? Or at least make it configurable, so those of us who want to use RBAC can enable it via the manifest?
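For comparison, under RBAC a policy change is just an API call, with no manifest edit or apiserver restart required; a sketch with placeholder names, using standard kubectl create subcommands:

# Grant read-only access to pods in the default namespace, live.
kubectl create role pod-reader --verb=get,list,watch --resource=pods -n default
kubectl create rolebinding read-pods --role=pod-reader --user=jane -n default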
Hi:
I've deployed kubo-release to AWS successfully following this guide, and it's working fine for simple pods. Hurray!
However, when I try to follow the NFS example from Kubernetes here,
I can't mount a shared NFS volume into two containers.
The only thing I changed in the example, to keep it simple on AWS, was to replace the volumes in nfs-server-rc.yaml with an EBS volume I had already created:
-      - name: mypvc
+      - name: ebs
      volumes:
-      - name: mypvc
-        persistentVolumeClaim:
-          claimName: nfs-pv-provisioning-demo
+      - name: ebs
+        awsElasticBlockStore:
+          volumeID: vol-0a6ed6179d.....
+          fsType: ext4
After creating the busybox-rc, one of the containers started up successfully, while the other failed after a while with errors like the one below:
Unable to mount volumes for pod "nfs-busybox-pp6nm_default(55ef084c-a499-11e7-b79a-02d7b3763e4e)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-pp6nm". list of unattached/unmounted volumes=[nfs]
The successful container seems to be working, with the NFS volume mounted as expected, if I exec into it with kubectl exec -it nfs-busybox-n0mlk sh
and run mount afterwards:
10.100.200.124:/ on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.20.2.10,local_lock=none,addr=10.100.200.124)
On the master and worker VMs I found some errors that might be relevant:
master node:
E0928 18:35:41.409327 7632 routecontroller.go:96] Couldn't reconcile node routes: error listing routes: unable to find route table for AWS cluster: kubernetes
worker node:
E0928 22:11:11.296393 8278 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs\" (\"55ef084c-a499-11e7-b79a-02d7b3763e4e\")" failed. No retries permitted until 2017-09-28 22:11:11.796361023 +0000 UTC (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "nfs" (UniqueName: "kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs") pod "nfs-busybox-pp6nm" (UID: "55ef084c-a499-11e7-b79a-02d7b3763e4e") : mount failed: exit status 32
Mounting command: mount
Mounting arguments: 10.100.200.124:/ /var/lib/kubelet/pods/55ef084c-a499-11e7-b79a-02d7b3763e4e/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: Connection timed out
I'm not 100% sure the error on the master node is linked to this issue, because that error message seems to have been there since the cluster was created.
Any thoughts? Thanks!
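If it helps, a couple of things worth checking from the failing worker (standard NFS client tooling; the server IP is the service IP from the mount output above):

# Check whether the NFS server's exports are reachable from this worker,
# then try the mount by hand to get a clearer error than the kubelet timeout.
showmount -e 10.100.200.124
mkdir -p /tmp/nfs-test && sudo mount -t nfs 10.100.200.124:/ /tmp/nfs-test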
Kubo deploys kubernetes-dashboard as part of the Kubernetes cluster deployment. The dashboard is exposed via a NodePort-type service, and the dashboard itself does not require authentication or authorization. The kubernetes-dashboard nodePort is always the same when a kube cluster is deployed with the Kubo ODB (in my case, the port is always 31000). So the service owner, and anyone else, can access the dashboard by going to http://<node-ip>:31000. The dashboard allows unauthenticated users to create, remove, or modify applications, which is a major security issue.
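The exposure is easy to confirm with standard kubectl (assuming the dashboard lands in kube-system, as the pod listings elsewhere in this tracker suggest):

# Show the dashboard service and its fixed NodePort; any client that can
# reach a node IP can then open the UI unauthenticated.
kubectl -n kube-system get svc kubernetes-dashboard -o wide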
kube-dns running on worker-node-1.
Procedure:
bosh ssh to worker-node-1,
then 'sudo su',
then 'init 6'.
Result:
kube-dns ends up in CrashLoopBackOff mode:
$ kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
heapster-1569517067-d4jhg 1/1 Running 1 1h 10.200.5.4 10.40.207.98
kube-dns-3329716278-kzzn4 1/3 CrashLoopBackOff 26 32m 10.200.5.5 10.40.207.98
kubernetes-dashboard-1367211859-cq6rp 1/1 Running 0 32m 10.200.10.3 10.40.207.100
monitoring-influxdb-564852376-rmgr9 1/1 Running 0 1h 10.200.10.2 10.40.207.100
I ran this very simple test:
-deploy K8s cluster: OK
-bosh instances: OK
Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a running z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 running z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d running z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98
-on vCenter, I shut down all those VMs
-on vCenter, I restarted all those VMs
bosh instances then shows:
Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a failing z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 failing z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d failing z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98
=> all etcd nodes are in a failing state.
Ideally, all the nodes should restart correctly.
(The workaround for this issue is to bosh restart one of the etcd nodes; see below.)
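The workaround as a concrete command (the instance ID is taken from the listing above; the deployment name is a placeholder):

# Restart a single etcd instance so the cluster can re-form.
bosh -d <deployment> restart etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a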
The post-deploy script in question checks that there are exactly 6 running pods in the kube-system namespace. It's natural to want to add other things to that namespace, such as the registry addon, which is required for using things like Azure Draft.
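A sketch of a less brittle check, which verifies the expected addons instead of asserting an exact pod count (the addon name prefixes are assumptions based on the pod listings elsewhere in this tracker):

# Require at least one Running pod per expected addon; extra pods such as a
# registry addon no longer break the check.
for addon in kube-dns kubernetes-dashboard heapster monitoring-influxdb; do
  kubectl -n kube-system get pods | grep "^${addon}" | grep -q Running \
    || { echo "addon ${addon} not running"; exit 1; }
done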
I deployed this release using kubo-deployment on vSphere, and the deployment failed when updating the worker/0 VM with this error message: Error: Unknown CPI error 'Unknown' with message 'Cannot complete login due to an incorrect user name or password.' in 'create_disk' CPI method. Redeploying (several times) resulted in the same error. It is odd that BOSH was able to create VMs in a previous step but failed at this point.
I logged into my vCenter console and saw a bunch of login errors. But when I used the same user/password combination from the BOSH properties, I was able to log in without any problems.
When I looked into the kubernetes-controller-manager job logs I saw this error multiple times: F0820 18:41:39.739969 7530 controllermanager.go:176] error building controller context: cloud provider could not be initialized: could not init cloud provider "vsphere": ServerFaultCode: Cannot complete login due to an incorrect user name or password.
After some debugging I discovered that the vSphere cloud provider config does NOT enclose the password in double quotes, and my vCenter password contains special characters (the ; sign). So what is happening is that the controller-manager is NOT using the full password (only the characters up to the ; sign), and as a result it locks out my user periodically (each time monit restarts the process).
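As a sketch of the fix, the generated cloud-provider config should quote the credentials; the file path and key names below are assumptions about the job's template output, not the actual release layout:

# Hypothetical excerpt of the generated vSphere cloud-provider config; quoting
# the password keeps characters like ';' from truncating the value.
cat > /var/vcap/jobs/kubernetes-controller-manager/config/cloud-provider.ini <<'EOF'
[Global]
user = "admin@vsphere.local"
password = "p@ss;word"
EOF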
An operator may wish to use a separate etcd cluster that is not provisioned by Kubo.
For this to happen we need to be able to:
Issue submitted as requested via comment on proposal doc: https://docs.google.com/document/d/1ZOFD5nBQC_vh9CmKHOGT7ugtNaJQ1t03jkLVsyDOH6k/edit?usp=sharing
The mount path points to a location inside the container; we don't need to point it to /var/vcap/...
See
kubo-release/src/route-sync/kubernetes/source.go
Lines 44 to 52 in 70b246a
Are there any intentions to change the name of the kubo-etcd release to something else in the future? Right now it is named etcd, and this may cause conflicts with the etcd-release from which it was forked.
I don't know what triggered the worker node to be automatically recreated; have you ever had this issue?
bosh -v
version 2.0.1-74fad57-2017-02-15T20:16:56Z
bosh: BOSH 261
Before
Using environment '192.168.1.252' as '?'
Task 3019. Done
Deployment 'k8s-cn-bj'
Instance Process State AZ IPs VM CID VM Type Uptime Load CPU CPU CPU CPU Memory Swap System Ephemeral Persistent
(1m, 5m, 15m) Total User Sys Wait Usage Usage Disk Usage Disk Usage Disk Usage
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9 running z1 192.168.1.66 146f3bbb-9b3f-4189-a707-01486408eef7 common - 0.14, 0.07, 0.01 - 0.4% 0.3% 1.0% 5% (198 MB) 0% (0 B) 40% (32i%) 1% (0i%) 6% (0i%)
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1 running z1 192.168.1.63 c49c7cdc-d353-48d1-bb56-c8967c393726 common - 0.00, 0.00, 0.00 - 1.6% 0.4% 0.0% 9% (354 MB) 0% (0 B) 40% (32i%) 3% (0i%) 40% (32i%)
master/a8209093-a2a4-4499-927c-acf4a433305b running z1 192.168.1.69 593f4973-2698-4d7f-9d2f-ebf9e2d84711 common - 0.03, 0.04, 0.04 - 1.1% 0.5% 0.0% 10% (385 MB) 0% (0 B) 40% (32i%) 3% (0i%) 40% (32i%)
proxy/2263ec27-13d8-483b-965a-a56334adc19c running z1 192.168.1.68 90f2715d-7a7f-4694-ae0f-20ed041b4fc7 common - 0.20, 0.34, 0.40 - 2.1% 9.9% 0.0% 4% (166 MB) 0% (0 B) 40% (32i%) 4% (0i%) 40% (32i%)
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48 running z1 192.168.1.67 d9a5beb7-ba71-4242-8ea1-c563a9980023 worker - 0.15, 0.11, 0.14 - 1.0% 0.7% 0.0% 14% (2.2 GB) 0% (0 B) 43% (33i%) 1% (0i%) 75% (90i%)
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059 running z1 192.168.1.64 20bf88a1-4f2b-4ec7-af4f-bd897ed700db worker - 0.02, 0.23, 0.26 - 1.0% 0.8% 0.0% 18% (3.0 GB) 0% (0 B) 43% (33i%) 1% (0i%) 80% (81i%)
worker/62817668-f2f8-42e7-96b3-9a868550ebd9 running z1 192.168.1.71 02623c1f-aaa5-4dc6-824d-f28790ed2896 worker - 0.02, 0.12, 0.14 - 0.6% 0.7% 0.0% 8% (1.3 GB) 0% (0 B) 99% (32i%) 1% (0i%) 70% (95i%)
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0 running z1 192.168.1.62 f5dd72af-94dd-45c3-9386-54f29f317d01 worker - 0.16, 0.15, 0.15 - 1.1% 1.0% 0.0% 9% (1.5 GB) 0% (0 B) 43% (33i%) 1% (0i%) 93% (30i%)
8 vms
Succeeded
After
Using environment '192.168.1.252' as '?'
Task 3037. Done
Deployment 'k8s-cn-bj'
Instance Process State AZ IPs VM CID VM Type
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9 running z1 192.168.1.66 146f3bbb-9b3f-4189-a707-01486408eef7 common
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1 running z1 192.168.1.63 c49c7cdc-d353-48d1-bb56-c8967c393726 common
master/a8209093-a2a4-4499-927c-acf4a433305b running z1 192.168.1.69 593f4973-2698-4d7f-9d2f-ebf9e2d84711 common
proxy/2263ec27-13d8-483b-965a-a56334adc19c running z1 192.168.1.68 90f2715d-7a7f-4694-ae0f-20ed041b4fc7 common
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48 running z1 192.168.1.67 d9a5beb7-ba71-4242-8ea1-c563a9980023 worker
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059 running z1 192.168.1.64 20bf88a1-4f2b-4ec7-af4f-bd897ed700db worker
worker/62817668-f2f8-42e7-96b3-9a868550ebd9 running z1 192.168.1.72 51461431-480c-447c-b668-39c2d16c4d21 worker
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0 running z1 192.168.1.62 f5dd72af-94dd-45c3-9386-54f29f317d01 worker
8 vms
Succeeded
→ kubectl get nodes -o wide
NAME STATUS AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION
192.168.1.62 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.64 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.67 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.71 NotReady 10d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.72 Ready 1h v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
Besides, route-sync kept syncing the missing worker to the routing service.