cloudfoundry-incubator / kubo-release
Kubernetes BOSH release
Home Page: https://www.cloudfoundry.org/container-runtime/
License: Apache License 2.0
Rationale - We already test the proxy as part of our CI on GCP. We should do the same on vSphere.
Acceptance Criteria
When deploying kubo-release 0.7.0 on bosh-lite, the master nodes are not able to start (more specifically, the kubernetes-controller-manager job).
Logs below:
==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stderr.log <==
+ declare pid=5750
+ ps -p 5750
+ __log 'Removing stale pidfile'
+ echo 'Removing stale pidfile'
+ rm /var/vcap/sys/run/kubernetes/kubernetes_controller_manager.pid
+ echo 5760
+ start_kubernetes_controller_manager
+ '[' -f /sys/class/dmi/id/product_serial ']'
+ chmod a+r /sys/class/dmi/id/product_serial
chmod: changing permissions of '/sys/class/dmi/id/product_serial': Read-only file system
==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stdout.log <==
------------ STARTING kubernetes_controller_manager_ctl at Tue Sep 12 07:32:00 UTC 2017 --------------
Removing stale pidfile
I was able to work around the issue by removing the chmod line shown above from the control script during a bosh deploy (using bosh ssh and vi).
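A minimal sketch of a more defensive guard for that step, assuming the surrounding ctl script stays as-is (the writability test is the only addition):

# Only attempt the chmod when the file exists AND its filesystem is writable;
# bosh-lite mounts /sys read-only, so the unguarded chmod aborts the script.
if [ -f /sys/class/dmi/id/product_serial ] && [ -w /sys/class/dmi/id/product_serial ]; then
  chmod a+r /sys/class/dmi/id/product_serial
fi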
Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml
in the Gitbot configuration repo.
If you don't have access you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.
Steps: add your repo to the config-production.yml file.
If there are any questions, please reach out to [email protected].
Our kubernetes-system-specs job currently achieves two things:
1 - Installing kube-dns, which is crucial for pod-to-pod communication
2 - Installing the Kubernetes Dashboard UI, which can be optional
Let's split those into separate jobs, making the dashboard optional, effectively deploying it via a bosh errand (see the sketch below).
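As a sketch of the proposed end state (the deployment and errand names here are assumptions, not the current job names), the dashboard could then be deployed on demand:

# Hypothetical usage once the dashboard is split into its own errand job.
bosh -d kubo run-errand kubernetes-dashboard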
The service cluster IP is 10.100.200.1 and the SSL certificate used is the same one as for the API. It looks like we need separate SSL certs for the Kubernetes service cluster IP (or the cluster IP added as a SAN on the API server certificate).
$ kubectl get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.100.200.1 <none> 443/TCP 57m
This causes issues for things that use Kubernetes services internally:
2017-06-23T20:08:47.881115281Z [main] 2017/06/23 20:08:47 Cannot initialize Kubernetes connection: Get https://10.100.200.1:443/api: x509: certificate is valid for 10.244.243.3, not 10.100.200.1
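One way to confirm which names/IPs the presented certificate is actually valid for (standard openssl commands; the IP is the service cluster IP from the error above):

# Inspect the certificate served on the service cluster IP and list its SANs.
echo | openssl s_client -connect 10.100.200.1:443 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'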
route-sync only syncs the first exported port of a k8s service (whether tcp-route-sync or http-route-sync); if the k8s service has two exported ports, only the first port is synced to the CF routing service. It would be more reasonable to sync all exported ports. See the sketch below for illustration.
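For illustration, a service like the following exposes two ports, but only the first (8080) would currently be propagated; the names and the label semantics are assumptions based on the tags mentioned above:

# Hypothetical two-port service; with the current behaviour only the first
# listed port (8080) would be synced to the CF routing tier.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: demo
  labels:
    tcp-route-sync: "1025"
spec:
  selector:
    app: demo
  ports:
  - name: web
    port: 8080
  - name: admin
    port: 9090
EOF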
BTW, is there any channel where we can keep track of the project status and collaborate with you?
route-sync broadcasts HTTP routes to the GoRouter via NATS (for k8s services tagged http-route-sync). This bypasses the Cloud Controller, creating the potential for route collisions.
The PoC was implemented this way; this needs to be addressed for production readiness.
The syslog-forwarding-setup job does not provide any additional functionality compared to syslog-release and should be removed.
This issue will be used to track the required steps and to automate them in the future.
[#146530055]
Rationale:
- Some IaaSes (e.g. OpenStack) have no concept of a load balancer, so an alternative way to expose app routes is needed.
In environments other than GCE/GKE, you need to deploy an ingress controller as a pod:
https://github.com/kubernetes/ingress/tree/master/controllers
An ingress controller is a daemon, deployed as a Kubernetes pod, that watches the apiserver's /ingresses endpoint for updates to the Ingress resource. Its job is to satisfy requests for ingress.
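For context, a minimal Ingress resource that such a controller would satisfy might look like this (hypothetical host and service names, using the extensions/v1beta1 API of that era):

# Minimal Ingress resource; an ingress controller pod watching /ingresses
# would program its proxy to route this host to the backing service.
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: demo-ingress
spec:
  rules:
  - host: demo.example.com
    http:
      paths:
      - backend:
          serviceName: demo
          servicePort: 8080
EOF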
Once a k8s service tagged tcp-route-sync is deleted, route-sync stops propagating the route to the TCP router. However, the CF route/domain record is not automatically removed.
The PoC was implemented this way; this needs to be addressed for production readiness.
kubo-release/src/route-sync/kubernetes/source.go
Lines 40 to 53 in 70b246a
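Until this is addressed, the stale record can presumably be cleaned up by hand with the CF CLI; the domain and port below are placeholders:

# Hypothetical manual cleanup of a TCP route left behind after the k8s
# service was deleted.
cf delete-route tcp.example.com --port 1025 -f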
In the latest pull of kubo-deployment (git clone https://github.com/cloudfoundry-incubator/kubo-deployment), the following file is missing from /root/kubo-deployment/manifests/ops-files/:
k8s_master_static_ip_vsphere.yml
Content of the file:
- type: replace
  path: /networks/type=manual/subnets/0/static
  value:
The result is that without this file, K8s cluster deployment fails (when a static IP for the master node is specified).
Can you push the file to the repository?
Thanks
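For reference, once the file is restored it would be applied like any other ops-file; a sketch, where the manifest filename kubo.yml is an assumption about this kubo-deployment checkout:

# Check that the ops-file interpolates cleanly against the manifest.
bosh interpolate manifests/kubo.yml \
  -o manifests/ops-files/k8s_master_static_ip_vsphere.yml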
The kubernetes-apiserver control script hardcodes ABAC as the Kubernetes authorization mode. ABAC is difficult to manage because the API server must be restarted in order to apply any change to the policy file. RBAC permission policies, on the other hand, are configured using kubectl or the Kubernetes API directly, without the need to modify the manifest (or to restart the API server). ABAC is also starting to be considered legacy on versions > 1.6.
Can you please consider switching the authorization mode from ABAC to RBAC? Or at least make it configurable, so those of us who want to use RBAC can enable it via the manifest?
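For comparison, under RBAC a policy change is just an API call, with no manifest edit or apiserver restart required; a sketch with placeholder names, using standard kubectl create subcommands:

# Grant read-only access to pods in the default namespace, live.
kubectl create role pod-reader --verb=get,list,watch --resource=pods -n default
kubectl create rolebinding read-pods --role=pod-reader --user=jane -n default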
Hi:
I've deployed kubo-release to AWS successfully following this guide, and it's working fine for simple pods. Hurray!
However, when I try to follow the NFS example from Kubernetes here,
I can't mount a shared NFS volume into two containers.
The only thing I changed in the example, to keep it simple on AWS, was to replace the volumes in nfs-server-rc.yaml with an EBS volume I had already created:
-      - name: mypvc
+      - name: ebs
      volumes:
-      - name: mypvc
-        persistentVolumeClaim:
-          claimName: nfs-pv-provisioning-demo
+      - name: ebs
+        awsElasticBlockStore:
+          volumeID: vol-0a6ed6179d.....
+          fsType: ext4
After creating the busybox-rc, one of the containers started up successfully, while the other failed after a while with errors like the one below:
Unable to mount volumes for pod "nfs-busybox-pp6nm_default(55ef084c-a499-11e7-b79a-02d7b3763e4e)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-pp6nm". list of unattached/unmounted volumes=[nfs]
The successful container seems to be working, with the NFS volume mounted as expected, if I exec into it with kubectl exec -it nfs-busybox-n0mlk sh
and run mount afterwards:
10.100.200.124:/ on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.20.2.10,local_lock=none,addr=10.100.200.124)
On the master and worker VMs I found some errors that might be relevant:
master node:
E0928 18:35:41.409327 7632 routecontroller.go:96] Couldn't reconcile node routes: error listing routes: unable to find route table for AWS cluster: kubernetes
worker node:
E0928 22:11:11.296393 8278 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs\" (\"55ef084c-a499-11e7-b79a-02d7b3763e4e\")" failed. No retries permitted until 2017-09-28 22:11:11.796361023 +0000 UTC (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "nfs" (UniqueName: "kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs") pod "nfs-busybox-pp6nm" (UID: "55ef084c-a499-11e7-b79a-02d7b3763e4e") : mount failed: exit status 32
Mounting command: mount
Mounting arguments: 10.100.200.124:/ /var/lib/kubelet/pods/55ef084c-a499-11e7-b79a-02d7b3763e4e/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: Connection timed out
I'm not 100% sure the error on the master node is linked to this issue, because that error message seems to have been there since the cluster was created.
Any thoughts? Thanks!
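If it helps, a couple of things worth checking from the failing worker (standard NFS client tooling; the server IP is the service IP from the mount output above):

# Check whether the NFS server's exports are reachable from this worker,
# then try the mount by hand to get a clearer error than the kubelet timeout.
showmount -e 10.100.200.124
mkdir -p /tmp/nfs-test && sudo mount -t nfs 10.100.200.124:/ /tmp/nfs-test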
Kubo deploys kubernetes-dashboard as part of the Kubernetes cluster deployment. The dashboard is exposed via a NodePort-type service, and the dashboard itself does not require authentication or authorization. The kubernetes-dashboard nodePort is always the same when a kube cluster is deployed with the Kubo ODB (in my case, the port is always 31000). So the service owner, and anyone else, can access the dashboard by going to http://<node-ip>:31000. The dashboard allows unauthenticated users to create, remove, or modify applications, which is a major security issue.
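The exposure is easy to confirm with standard kubectl (assuming the dashboard lands in kube-system, as the pod listings elsewhere in this tracker suggest):

# Show the dashboard service and its fixed NodePort; any client that can
# reach a node IP can then open the UI unauthenticated.
kubectl -n kube-system get svc kubernetes-dashboard -o wide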
kube-dns running on worker-node-1.
Procedure:
bosh ssh to worker-node-1,
then 'sudo su',
then 'init 6'.
Result:
kube-dns ends up in CrashLoopBackOff mode:
$ kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
heapster-1569517067-d4jhg 1/1 Running 1 1h 10.200.5.4 10.40.207.98
kube-dns-3329716278-kzzn4 1/3 CrashLoopBackOff 26 32m 10.200.5.5 10.40.207.98
kubernetes-dashboard-1367211859-cq6rp 1/1 Running 0 32m 10.200.10.3 10.40.207.100
monitoring-influxdb-564852376-rmgr9 1/1 Running 0 1h 10.200.10.2 10.40.207.100
I ran this very simple test:
-deploy K8s cluster: OK
-bosh instances: OK
Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a running z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 running z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d running z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98
-on vCenter, I shut down all those VMs
-on vCenter, I restarted all those VMs
bosh instances then shows:
Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a failing z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 failing z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d failing z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98
=> all etcd nodes are in a failing state.
Ideally, all the nodes should restart correctly.
(The workaround for this issue is to bosh restart one of the etcd nodes; see below.)
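The workaround as a concrete command (the instance ID is taken from the listing above; the deployment name is a placeholder):

# Restart a single etcd instance so the cluster can re-form.
bosh -d <deployment> restart etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a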
The post-deploy script in question checks that there are exactly 6 running pods in the kube-system namespace. It's natural to want to add other things to that namespace, such as the registry addon, which is required for using things like Azure Draft.
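A sketch of a less brittle check, which verifies the expected addons instead of asserting an exact pod count (the addon name prefixes are assumptions based on the pod listings elsewhere in this tracker):

# Require at least one Running pod per expected addon; extra pods such as a
# registry addon no longer break the check.
for addon in kube-dns kubernetes-dashboard heapster monitoring-influxdb; do
  kubectl -n kube-system get pods | grep "^${addon}" | grep -q Running \
    || { echo "addon ${addon} not running"; exit 1; }
done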
I deployed this release using kubo-deployment on vSphere, and the deployment failed when updating the worker/0 VM with this error message: Error: Unknown CPI error 'Unknown' with message 'Cannot complete login due to an incorrect user name or password.' in 'create_disk' CPI method. Redeploying (several times) resulted in the same error. It is odd that BOSH was able to create VMs in a previous step but failed at this point.
I logged into my vCenter console and saw a bunch of login errors. But when I used the same user/password combination from the BOSH properties, I was able to log in without any problems.
When I looked into the kubernetes-controller-manager job logs I saw this error multiple times: F0820 18:41:39.739969 7530 controllermanager.go:176] error building controller context: cloud provider could not be initialized: could not init cloud provider "vsphere": ServerFaultCode: Cannot complete login due to an incorrect user name or password.
After some debugging I discovered that the vSphere cloud provider config does NOT enclose the password in double quotes, and my vCenter password contains special characters (the ; sign). So what is happening is that the controller-manager is NOT using the full password (only the characters up to the ; sign), and as a result it locks out my user periodically (each time monit restarts the process).
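As a sketch of the fix, the generated cloud-provider config should quote the credentials; the file path and key names below are assumptions about the job's template output, not the actual release layout:

# Hypothetical excerpt of the generated vSphere cloud-provider config; quoting
# the password keeps characters like ';' from truncating the value.
cat > /var/vcap/jobs/kubernetes-controller-manager/config/cloud-provider.ini <<'EOF'
[Global]
user = "admin@vsphere.local"
password = "p@ss;word"
EOF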
An operator may wish to use a separate etcd cluster that is not provisioned by Kubo.
For this to happen we need to be able to:
Issue submitted as requested via comment on proposal doc: https://docs.google.com/document/d/1ZOFD5nBQC_vh9CmKHOGT7ugtNaJQ1t03jkLVsyDOH6k/edit?usp=sharing
The mount path points to a location inside the container; we don't need to point it to /var/vcap/...
See
kubo-release/src/route-sync/kubernetes/source.go
Lines 44 to 52 in 70b246a
Are there any intentions to change the name of the kubo-etcd release to something else in the future? Right now it is named etcd, and this may cause conflicts with the etcd-release from which it was forked.
I don't know what triggered the worker node to be automatically recreated; have you ever had this issue?
bosh -v
version 2.0.1-74fad57-2017-02-15T20:16:56Z
bosh: BOSH 261
Before
Using environment '192.168.1.252' as '?'
Task 3019. Done
Deployment 'k8s-cn-bj'
Instance Process State AZ IPs VM CID VM Type Uptime Load CPU CPU CPU CPU Memory Swap System Ephemeral Persistent
(1m, 5m, 15m) Total User Sys Wait Usage Usage Disk Usage Disk Usage Disk Usage
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9 running z1 192.168.1.66 146f3bbb-9b3f-4189-a707-01486408eef7 common - 0.14, 0.07, 0.01 - 0.4% 0.3% 1.0% 5% (198 MB) 0% (0 B) 40% (32i%) 1% (0i%) 6% (0i%)
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1 running z1 192.168.1.63 c49c7cdc-d353-48d1-bb56-c8967c393726 common - 0.00, 0.00, 0.00 - 1.6% 0.4% 0.0% 9% (354 MB) 0% (0 B) 40% (32i%) 3% (0i%) 40% (32i%)
master/a8209093-a2a4-4499-927c-acf4a433305b running z1 192.168.1.69 593f4973-2698-4d7f-9d2f-ebf9e2d84711 common - 0.03, 0.04, 0.04 - 1.1% 0.5% 0.0% 10% (385 MB) 0% (0 B) 40% (32i%) 3% (0i%) 40% (32i%)
proxy/2263ec27-13d8-483b-965a-a56334adc19c running z1 192.168.1.68 90f2715d-7a7f-4694-ae0f-20ed041b4fc7 common - 0.20, 0.34, 0.40 - 2.1% 9.9% 0.0% 4% (166 MB) 0% (0 B) 40% (32i%) 4% (0i%) 40% (32i%)
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48 running z1 192.168.1.67 d9a5beb7-ba71-4242-8ea1-c563a9980023 worker - 0.15, 0.11, 0.14 - 1.0% 0.7% 0.0% 14% (2.2 GB) 0% (0 B) 43% (33i%) 1% (0i%) 75% (90i%)
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059 running z1 192.168.1.64 20bf88a1-4f2b-4ec7-af4f-bd897ed700db worker - 0.02, 0.23, 0.26 - 1.0% 0.8% 0.0% 18% (3.0 GB) 0% (0 B) 43% (33i%) 1% (0i%) 80% (81i%)
worker/62817668-f2f8-42e7-96b3-9a868550ebd9 running z1 192.168.1.71 02623c1f-aaa5-4dc6-824d-f28790ed2896 worker - 0.02, 0.12, 0.14 - 0.6% 0.7% 0.0% 8% (1.3 GB) 0% (0 B) 99% (32i%) 1% (0i%) 70% (95i%)
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0 running z1 192.168.1.62 f5dd72af-94dd-45c3-9386-54f29f317d01 worker - 0.16, 0.15, 0.15 - 1.1% 1.0% 0.0% 9% (1.5 GB) 0% (0 B) 43% (33i%) 1% (0i%) 93% (30i%)
8 vms
Succeeded
After
Using environment '192.168.1.252' as '?'
Task 3037. Done
Deployment 'k8s-cn-bj'
Instance Process State AZ IPs VM CID VM Type
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9 running z1 192.168.1.66 146f3bbb-9b3f-4189-a707-01486408eef7 common
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1 running z1 192.168.1.63 c49c7cdc-d353-48d1-bb56-c8967c393726 common
master/a8209093-a2a4-4499-927c-acf4a433305b running z1 192.168.1.69 593f4973-2698-4d7f-9d2f-ebf9e2d84711 common
proxy/2263ec27-13d8-483b-965a-a56334adc19c running z1 192.168.1.68 90f2715d-7a7f-4694-ae0f-20ed041b4fc7 common
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48 running z1 192.168.1.67 d9a5beb7-ba71-4242-8ea1-c563a9980023 worker
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059 running z1 192.168.1.64 20bf88a1-4f2b-4ec7-af4f-bd897ed700db worker
worker/62817668-f2f8-42e7-96b3-9a868550ebd9 running z1 192.168.1.72 51461431-480c-447c-b668-39c2d16c4d21 worker
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0 running z1 192.168.1.62 f5dd72af-94dd-45c3-9386-54f29f317d01 worker
8 vms
Succeeded
→ kubectl get nodes -o wide
NAME STATUS AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION
192.168.1.62 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.64 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.67 Ready 27d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.71 NotReady 10d v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
192.168.1.72 Ready 1h v1.6.1 <none> Ubuntu 14.04.5 LTS 4.4.0-62-generic
Besides, route-sync kept syncing the missing worker to the routing service.