kubo-release's People

Contributors

alex-slynko, altonf4, andyliuliming, bsnchan, carlo-colombo, cf-london, cfcr, christianang, freynca, frodenas, iainsproat, jaimegag, jfmyers9, jhvhs, johnsonj, karampok, manifaust, mkjelland, mordebites, neil-hickey, obeyler, professor, seanos11, semanticallynull, shinji62, srm09, svrc, swalner-pivotal, tenczar, tvs

kubo-release's Issues

Add proxy testing on CI for vSphere

Rationale - We already test the proxy as part of our CI on GCP. We should do the same on vSphere.

Acceptance Criteria

  • Verify that the CI pipeline for vSphere includes a proxy job

kubernetes-controller-manager failing to start when running on bosh-lite

When deploying kubo-release 0.7.0 on bosh-lite, the master nodes are not able to start (more specifically, the kubernetes-controller-manager job fails).

Logs below:

==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stderr.log <==
+ declare pid=5750
+ ps -p 5750
+ __log 'Removing stale pidfile'
+ echo 'Removing stale pidfile'
+ rm /var/vcap/sys/run/kubernetes/kubernetes_controller_manager.pid
+ echo 5760
+ start_kubernetes_controller_manager
+ '[' -f /sys/class/dmi/id/product_serial ']'
+ chmod a+r /sys/class/dmi/id/product_serial
chmod: changing permissions of '/sys/class/dmi/id/product_serial': Read-only file system

==> /var/vcap/sys/log/kubernetes-controller-manager/kubernetes_controller_manager_ctl.stdout.log <==
------------ STARTING kubernetes_controller_manager_ctl at Tue Sep 12 07:32:00 UTC 2017 --------------
Removing stale pidfile

I was able to work around the issue by removing the chmod line shown above from the control script during a bosh deploy (using bosh ssh and vi).
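
For reference, a guard along these lines in the ctl script would tolerate the read-only /sys on bosh-lite instead of aborting (a sketch only, not the actual kubo fix):

if [ -f /sys/class/dmi/id/product_serial ]; then
  # /sys is mounted read-only on bosh-lite, so ignore a failed chmod rather than exit
  chmod a+r /sys/class/dmi/id/product_serial || echo 'product_serial not writable; continuing'
fi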

Please configure GITBOT

Pivotal uses GITBOT to synchronize Github issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access, you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.

Steps:

  • Fork this repo: cfgitbot-config
  • Add your project to config-production.yml file
  • Submit a PR

If there are any questions, please reach out to [email protected].

Optionally include the K8s dashboard as part of the cluster deployment

Our kubernetes-system-specs job currently does two things:
1 - Installing kube-dns, which is crucial for pod-to-pod communication
2 - Installing the Kubernetes dashboard UI, which can be optional

Let's split these into separate jobs, making the dashboard optional and deploying it via a bosh errand (rough sketch below).
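
For illustration, the dashboard part could become its own errand-style instance group in the deployment manifest; the names below are placeholders, not the actual kubo job names:

instance_groups:
- name: apply-dashboard-specs        # placeholder name
  lifecycle: errand
  instances: 1
  azs: [z1]
  vm_type: minimal
  stemcell: default
  networks:
  - name: default
  jobs:
  - name: kubernetes-dashboard-specs # placeholder job that applies only the dashboard manifests
    release: kubo

Running bosh run-errand apply-dashboard-specs would then deploy the dashboard only when wanted.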

SSL for Service Cluster IP

The service cluster IP is 10.100.200.1 and the SSL certificate used is the same one as for the API. It looks like we need a separate SSL cert for the service cluster IP for Kubernetes.

$ kubectl get services
NAME         CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.100.200.1   <none>        443/TCP   57m

This causes issues for anything that uses the kubernetes service internally:


2017-06-23T20:08:47.881115281Z [main] 2017/06/23 20:08:47 Cannot initialize Kubernetes connection: Get https://10.100.200.1:443/api: x509: certificate is valid for 10.244.243.3, not 10.100.200.1
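
To confirm which addresses the served certificate actually covers, something like this can be run from a pod or worker that can reach the service IP (the address is the cluster IP from above):

echo | openssl s_client -connect 10.100.200.1:443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

Per the error above it only covers 10.244.243.3, so the fix would be to also include 10.100.200.1 as a SAN (or to issue a separate cert for the service IP, as suggested here).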

Support syncing multiple ports to the routing API per service

The route-sync only syncs the first exported port of a k8s service, for both tcp-route-sync and http-route-sync; if the k8s service has two exported ports, only the first one is synced to the CF routing service. It would be more reasonable to sync all exported ports.
By the way, is there any channel to keep track of the project status and collaborate with you?

[Route Sync] Handling of HTTP route collisions

Route sync broadcasts HTTP routes to the GoRouter via NATS (for k8s services tagged http-route-sync). This bypasses Cloud Controller, causing potential for collisions.

The PoC was implemented this way. Need to address for production readiness.

As an OSS developer, I can deploy Ingress Controller to create routes for my K8s applications

Rationale:
- Some IaaS platforms (e.g. OpenStack) don't have the concept of a load balancer, so an alternative way to expose app routes is needed.

In environments other than GCE/GKE, you need to deploy a controller as a pod.
https://github.com/kubernetes/ingress/tree/master/controllers

An Ingress Controller is a daemon, deployed as a Kubernetes Pod, that watches the apiserver's /ingresses endpoint for updates to the Ingress resource. Its job is to satisfy requests for ingress.
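
For context, once such a controller pod is running, routes are declared with Ingress resources like the one below (hypothetical host and service names; extensions/v1beta1 is the API group for the Kubernetes versions kubo currently ships):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
  - host: my-app.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-app
          servicePort: 80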

[Route Sync] Cleanup of deleted TCP routes / services

Once a k8s service tagged tcp-route-sync is deleted, route sync will stop propagating the route to the TCP router. However, the CF route/domain record is not automatically removed.

The PoC was implemented this way. Need to address this for production readiness.

[Route Sync] K8s services with multiple node ports will be routed via the same port/router

for _, port := range service.Spec.Ports {
    if !isValidPort(port) {
        continue
    }
    portLabel, _ := strconv.Atoi(service.ObjectMeta.Labels["tcp-route-sync"])
    if portLabel == 0 {
        continue
    }
    frontendPort := route.Port(portLabel)
    nodePort := route.Port(port.NodePort)
    backends := getBackends(ips, nodePort)
    tcp := &route.TCP{Frontend: frontendPort, Backends: backends}
    routes = append(routes, tcp)
}
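
A rough sketch of syncing every exported port instead of a single label-driven one, reusing the helpers from the snippet above (the per-port label scheme here is hypothetical, not an existing convention):

for _, port := range service.Spec.Ports {
    if !isValidPort(port) {
        continue
    }
    // Hypothetical convention: one frontend port per service port via a suffixed
    // label such as "tcp-route-sync-<portName>"; fall back to the node port itself.
    frontend, err := strconv.Atoi(service.ObjectMeta.Labels["tcp-route-sync-"+port.Name])
    if err != nil || frontend == 0 {
        frontend = int(port.NodePort)
    }
    routes = append(routes, &route.TCP{
        Frontend: route.Port(frontend),
        Backends: getBackends(ips, route.Port(port.NodePort)),
    })
}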

Kubo 0.7.0: missing file in kubo-deployment/manifests/ops-files/ - k8s_master_static_ip_vsphere.yml

In the latest pull of kubo-deployment (git clone https://github.com/cloudfoundry-incubator/kubo-deployment), the following file is missing from /root/kubo-deployment/manifests/ops-files/:
k8s_master_static_ip_vsphere.yml

content of the file:

- type: replace
  path: /networks/type=manual/subnets/0/static
  value:
  - ((kubernetes_master_host))

The result is that, without this file, the K8s cluster deployment fails (when a static IP for the master node is specified).

Can you push the file to the repository?
Thanks.

Consider switching the authorization mode from ABAC to RBAC

The kubernetes-apiserver control script hardcodes ABAC as the Kubernetes authorization mode. ABAC is difficult to manage, as the API server must be restarted in order to apply any change to the policy file. RBAC permission policies, on the other hand, are configured using kubectl or the Kubernetes API directly, without the need to modify the manifest (and restart the API server). ABAC is also starting to be considered legacy on versions > 1.6.

Can you please consider switching the authorization mode from ABAC to RBAC? Or at least make it configurable, so those of us who want to use RBAC can enable it via the manifest?
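
For illustration, this would roughly amount to starting the apiserver with --authorization-mode=RBAC instead of ABAC (how that gets wired through the manifest/ctl template is up to kubo), after which permissions can be managed at runtime with kubectl, for example:

# grant a user full cluster access without touching a policy file or restarting the apiserver
kubectl create clusterrolebinding admin-binding \
  --clusterrole=cluster-admin --user=admin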

Timeout mounting nfs volume into containers

Hi:

I've deployed kubo-release to AWS successfully following this guide, and it's working fine for simple pods. Hurray!

However, when trying to follow the NFS example from Kubernetes here, I can't mount a shared NFS volume into two containers.

The only thing I changed in the example, to make it simpler on AWS, is the volumes section in nfs-server-rc.yaml, so that it directly uses an EBS volume I had already created:

-            name: mypvc
+            name: ebs
       volumes:
-        - name: mypvc
-          persistentVolumeClaim:
-            claimName: nfs-pv-provisioning-demo
+        - name: ebs
+          awsElasticBlockStore:
+            volumeID: vol-0a6ed6179d.....
+            fsType: ext4

After creating the busybox-rc, one of the containers started up successfully, while the other one failed after a while with errors like the one below:

Unable to mount volumes for pod "nfs-busybox-pp6nm_default(55ef084c-a499-11e7-b79a-02d7b3763e4e)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-pp6nm". list of unattached/unmounted volumes=[nfs]

The successful container seems to be working as expected, with the NFS volume mounted, if I exec into it with kubectl exec -it nfs-busybox-n0mlk sh and run mount afterwards:

10.100.200.124:/ on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.20.2.10,local_lock=none,addr=10.100.200.124)

I went to the master and worker VMs and found some errors that might be relevant:

master node:

E0928 18:35:41.409327    7632 routecontroller.go:96] Couldn't reconcile node routes: error listing routes: unable to find route table for AWS cluster: kubernetes

worker node:

E0928 22:11:11.296393    8278 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs\" (\"55ef084c-a499-11e7-b79a-02d7b3763e4e\")" failed. No retries permitted until 2017-09-28 22:11:11.796361023 +0000 UTC (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "nfs" (UniqueName: "kubernetes.io/nfs/55ef084c-a499-11e7-b79a-02d7b3763e4e-nfs") pod "nfs-busybox-pp6nm" (UID: "55ef084c-a499-11e7-b79a-02d7b3763e4e") : mount failed: exit status 32
Mounting command: mount
Mounting arguments: 10.100.200.124:/ /var/lib/kubelet/pods/55ef084c-a499-11e7-b79a-02d7b3763e4e/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: Connection timed out

I'm not 100% sure the error on the master node is related to this issue, because it seems that error message has been there since the cluster was created.

Any thoughts? Thanks!

Open unauthenticated/unauthorized access to kubernetes cluster

Kubo deploys kubernetes-dashboard as part of the Kubernetes cluster deployment. The dashboard is exposed via a NodePort-type service, and the dashboard itself does not require authentication or authorization. The kubernetes-dashboard nodePort is always the same when a kube cluster is deployed with Kubo ODB (in my case, the port is always 31000). So the service owner, and anyone else, can access the dashboard by going to http://<node-ip>:31000. The dashboard allows unauthenticated users to create, remove, or modify applications, which is a major security issue.

reboot of worker node hosting kube-dns results in kube-dns CrashLoopBackOff mode

kube-dns is running on worker-node-1.

Procedure:

  • bosh ssh to worker-node-1
  • sudo su
  • init 6

Result: kube-dns ends up in CrashLoopBackOff mode:

$ kubectl get pod -o wide -n kube-system
NAME                                    READY   STATUS             RESTARTS   AGE   IP            NODE
heapster-1569517067-d4jhg               1/1     Running            1          1h    10.200.5.4    10.40.207.98
kube-dns-3329716278-kzzn4               1/3     CrashLoopBackOff   26         32m   10.200.5.5    10.40.207.98
kubernetes-dashboard-1367211859-cq6rp   1/1     Running            0          32m   10.200.10.3   10.40.207.100
monitoring-influxdb-564852376-rmgr9     1/1     Running            0          1h    10.200.10.2   10.40.207.100

Kubo 0.7.0 (vSphere): all K8s cluster shut down and then powered up: etcd nodes are in failing state

I ran this very simple test:

- deploy the K8s cluster: OK
- bosh instances: OK
Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a running z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 running z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d running z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98

- on vCenter, I shut down all those VMs
- on vCenter, I restarted all those VMs
- now I get this state:

Instance Process State AZ IPs
etcd/5eb70526-4522-44c6-8ceb-96ce1ac70e1a failing z1 10.40.207.94
etcd/ed79acab-7a95-4285-8c04-2eda607af558 failing z1 10.40.207.93
etcd/f96ef4cf-042d-4d94-bcdb-75d823d8203d failing z1 10.40.207.95
master-haproxy/bd9f1ff6-356e-47bd-a7ef-289085d40583 running z1 10.40.207.92
master/47962537-e72c-4141-863b-3135d20a56bb running z1 10.40.207.96
master/7fbbbc8d-d7a0-4fda-9deb-4980adc9709e running z1 10.40.207.97
worker-haproxy/5cab4451-b93a-47a2-b2ee-663ea925dd67 running z1 10.40.207.101
worker/1b99456d-c198-4279-9cf9-6893509d248c running z1 10.40.207.99
worker/3ace8643-2d99-4264-927a-1f0eaeb03743 running z1 10.40.207.100
worker/455ddbef-d0e2-40e6-b26d-c1a23af383a6 running z1 10.40.207.98

=> All etcd nodes are in a failing state.

Ideally, all the nodes should come back up correctly.

(The workaround for this issue is to bosh restart one of the etcd nodes.)

vCenter password with special characters locks down my vCenter user

I deployed this release using kubo-deployment on vSphere, and the deployment failed when updating the worker/0 VM with this error message: Error: Unknown CPI error 'Unknown' with message 'Cannot complete login due to an incorrect user name or password.' in 'create_disk' CPI method. Redeploying again (several times) resulted in the same error. It is odd that bosh was able to create VMs in a previous step but failed at this point.

I logged into my vCenter console and saw a bunch of login errors. But when I used the same user/password combination used in the bosh properties, I was able to log in without any problems.

When I looked into the kubernetes-controller-manager job logs I saw this error multiple times: F0820 18:41:39.739969 7530 controllermanager.go:176] error building controller context: cloud provider could not be initialized: could not init cloud provider "vsphere": ServerFaultCode: Cannot complete login due to an incorrect user name or password.

After some debugging I discovered that the vSphere cloud provider config does NOT enclose the password in double quotes, and my vCenter password contains special characters (the ; sign). So what is happening is that the controller-manager is NOT using the full password (only the characters before the ; sign), and as a result it locks my user out periodically (each time monit restarts the process).
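
To illustrate with placeholder values: the vSphere cloud provider reads an INI-style config in which ';' starts a comment, so an unquoted password containing ';' gets cut off there:

[Global]
user     = "administrator@vsphere.local"
; without the surrounding quotes, the value below would be truncated at the ';'
password = "secret;with-semicolon"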

kubo-etcd release name same as the etcd-release

Is there any intention in the future to change the name of the kubo-etcd release to something else? Right now it is named etcd, and this may cause conflicts with the etcd-release from which it has been forked.

worker node auto recreate

I don't know what triggered the worker node auto-recreate. Have you ever had this issue?

bosh -v
version 2.0.1-74fad57-2017-02-15T20:16:56Z

bosh: BOSH 261

Before

Using environment '192.168.1.252' as '?'

Task 3019. Done

Deployment 'k8s-cn-bj'

Instance                                     Process State  AZ  IPs           VM CID                                VM Type  Uptime  Load              CPU    CPU   CPU   CPU   Memory        Swap      System      Ephemeral   Persistent
                                                                                                                                     (1m, 5m, 15m)     Total  User  Sys   Wait  Usage         Usage     Disk Usage  Disk Usage  Disk Usage
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9    running        z1  192.168.1.66  146f3bbb-9b3f-4189-a707-01486408eef7  common   -       0.14, 0.07, 0.01  -      0.4%  0.3%  1.0%  5% (198 MB)   0% (0 B)  40% (32i%)  1% (0i%)    6% (0i%)
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1  running        z1  192.168.1.63  c49c7cdc-d353-48d1-bb56-c8967c393726  common   -       0.00, 0.00, 0.00  -      1.6%  0.4%  0.0%  9% (354 MB)   0% (0 B)  40% (32i%)  3% (0i%)    40% (32i%)
master/a8209093-a2a4-4499-927c-acf4a433305b  running        z1  192.168.1.69  593f4973-2698-4d7f-9d2f-ebf9e2d84711  common   -       0.03, 0.04, 0.04  -      1.1%  0.5%  0.0%  10% (385 MB)  0% (0 B)  40% (32i%)  3% (0i%)    40% (32i%)
proxy/2263ec27-13d8-483b-965a-a56334adc19c   running        z1  192.168.1.68  90f2715d-7a7f-4694-ae0f-20ed041b4fc7  common   -       0.20, 0.34, 0.40  -      2.1%  9.9%  0.0%  4% (166 MB)   0% (0 B)  40% (32i%)  4% (0i%)    40% (32i%)
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48  running        z1  192.168.1.67  d9a5beb7-ba71-4242-8ea1-c563a9980023  worker   -       0.15, 0.11, 0.14  -      1.0%  0.7%  0.0%  14% (2.2 GB)  0% (0 B)  43% (33i%)  1% (0i%)    75% (90i%)
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059  running        z1  192.168.1.64  20bf88a1-4f2b-4ec7-af4f-bd897ed700db  worker   -       0.02, 0.23, 0.26  -      1.0%  0.8%  0.0%  18% (3.0 GB)  0% (0 B)  43% (33i%)  1% (0i%)    80% (81i%)
worker/62817668-f2f8-42e7-96b3-9a868550ebd9  running        z1  192.168.1.71  02623c1f-aaa5-4dc6-824d-f28790ed2896  worker   -       0.02, 0.12, 0.14  -      0.6%  0.7%  0.0%  8% (1.3 GB)   0% (0 B)  99% (32i%)  1% (0i%)    70% (95i%)
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0  running        z1  192.168.1.62  f5dd72af-94dd-45c3-9386-54f29f317d01  worker   -       0.16, 0.15, 0.15  -      1.1%  1.0%  0.0%  9% (1.5 GB)   0% (0 B)  43% (33i%)  1% (0i%)    93% (30i%)

8 vms

Succeeded

After

Using environment '192.168.1.252' as '?'

Task 3037. Done

Deployment 'k8s-cn-bj'

Instance                                     Process State  AZ  IPs           VM CID                                VM Type
etcd/5799fa5b-f068-4201-8cc8-a17596e0daa9    running        z1  192.168.1.66  146f3bbb-9b3f-4189-a707-01486408eef7  common
master/6de69941-ffe7-4de0-87a6-2ecd13182ea1  running        z1  192.168.1.63  c49c7cdc-d353-48d1-bb56-c8967c393726  common
master/a8209093-a2a4-4499-927c-acf4a433305b  running        z1  192.168.1.69  593f4973-2698-4d7f-9d2f-ebf9e2d84711  common
proxy/2263ec27-13d8-483b-965a-a56334adc19c   running        z1  192.168.1.68  90f2715d-7a7f-4694-ae0f-20ed041b4fc7  common
worker/3b7b66fd-91c0-4b60-ac53-789dfc918b48  running        z1  192.168.1.67  d9a5beb7-ba71-4242-8ea1-c563a9980023  worker
worker/59af0e9c-90dd-44ae-87e2-0c2b57f6e059  running        z1  192.168.1.64  20bf88a1-4f2b-4ec7-af4f-bd897ed700db  worker
worker/62817668-f2f8-42e7-96b3-9a868550ebd9  running        z1  192.168.1.72  51461431-480c-447c-b668-39c2d16c4d21  worker
worker/8d112de4-fe1a-41c3-b495-a0d6983623a0  running        z1  192.168.1.62  f5dd72af-94dd-45c3-9386-54f29f317d01  worker

8 vms

Succeeded

→ kubectl get nodes -o wide
NAME           STATUS     AGE       VERSION   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION
192.168.1.62   Ready      27d       v1.6.1    <none>        Ubuntu 14.04.5 LTS   4.4.0-62-generic
192.168.1.64   Ready      27d       v1.6.1    <none>        Ubuntu 14.04.5 LTS   4.4.0-62-generic
192.168.1.67   Ready      27d       v1.6.1    <none>        Ubuntu 14.04.5 LTS   4.4.0-62-generic
192.168.1.71   NotReady   10d       v1.6.1    <none>        Ubuntu 14.04.5 LTS   4.4.0-62-generic
192.168.1.72   Ready      1h        v1.6.1    <none>        Ubuntu 14.04.5 LTS   4.4.0-62-generic

Besides that, route-sync kept syncing the missing worker to the routing service.
