
gluster-kubernetes's Introduction

gluster-kubernetes


GlusterFS Native Storage Service for Kubernetes

gluster-kubernetes is a project to provide Kubernetes administrators a mechanism to easily deploy GlusterFS as a native storage service onto an existing Kubernetes cluster. Here, GlusterFS is managed and orchestrated like any other app in Kubernetes. This is a convenient way to unlock the power of dynamically provisioned, persistent GlusterFS volumes in Kubernetes.

Component Projects

  • Kubernetes, the container management system.
  • GlusterFS, the scale-out storage system.
  • heketi, the RESTful volume management interface for GlusterFS.

Presentations

You can find slides and videos of community presentations here.

>>> Video demo of the technology! <<<

Documentation

Quickstart

If you already have a Kubernetes cluster you wish to use, make sure it meets the prerequisites outlined in our setup guide.

This project includes a vagrant setup in the vagrant/ directory to spin up a Kubernetes cluster in VMs. To run the vagrant setup, you'll need the following prerequisites on your machine:

  • 4GB of memory
  • 32GB of storage minimum, 112GB recommended
  • ansible
  • vagrant
  • libvirt or VirtualBox

To spin up the cluster, simply run ./up.sh in the vagrant/ directory.
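
For example, a typical run might look like this (the --provider flag for VirtualBox appears in an issue later in this document; adjust for your hypervisor):

$ cd vagrant/
$ ./up.sh
# or, to use VirtualBox instead of libvirt:
$ ./up.sh --provider=virtualbox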

NOTE: If you plan to run ./up.sh more than once, note that the vagrant setup supports caching packages and container images. Please read the vagrant directory README for more information on how to configure and use the caching support.

Next, copy the deploy/ directory to the master node of the cluster.

You will have to provide your own topology file. A sample topology file is included in the deploy/ directory (the default location where gk-deploy expects it) which can be used as-is for the vagrant libvirt setup. When creating your own topology file, keep the following in mind (a rough sketch follows this list):

  • Make sure the topology file only lists block devices intended for heketi's use. heketi needs access to whole block devices (e.g. /dev/sdb, /dev/vdb) which it will partition and format.

  • The hostnames array is a bit misleading. manage should be a list of hostnames for the node, but storage should be a list of IP addresses on the node for backend storage communications.
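
A rough single-node sketch of the format (the hostname, IP address, and device names here are placeholders; consult the shipped topology.json.sample for the authoritative layout):

$ cat topology.json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [ "node0" ],
              "storage": [ "192.168.10.100" ]
            },
            "zone": 1
          },
          "devices": [ "/dev/vdb", "/dev/vdc" ]
        }
      ]
    }
  ]
}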

If you used the provided vagrant libvirt setup, you can run:

$ vagrant ssh-config > ssh-config
$ scp -rF ssh-config ../deploy master:
$ vagrant ssh master
[vagrant@master]$ cd deploy
[vagrant@master]$ mv topology.json.sample topology.json

The following commands are meant to be run with administrative privileges (e.g. sudo su beforehand).

At this point, verify the Kubernetes installation by making sure all nodes are Ready:

$ kubectl get nodes
NAME      STATUS    AGE
master    Ready     22h
node0     Ready     22h
node1     Ready     22h
node2     Ready     22h

NOTE: To see the version of Kubernetes (which will change based on the latest official releases), simply run kubectl version. This will help in troubleshooting.

Next, to deploy heketi and GlusterFS, run the following:

$ ./gk-deploy -g

If you already have a pre-existing GlusterFS cluster, you do not need the -g option.
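
If pods take a while to come up in your environment, the wait timeout can be raised as well; a sketch using the -w option that appears in an issue later in this document:

$ ./gk-deploy -g -w 300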

After this completes, GlusterFS and heketi should now be installed and ready to go. You can set the HEKETI_CLI_SERVER environment variable as follows so that it can be read directly by heketi-cli or sent to something like curl:

$ export HEKETI_CLI_SERVER=$(kubectl get svc/heketi --template 'http://{{.spec.clusterIP}}:{{(index .spec.ports 0).port}}')

$ echo $HEKETI_CLI_SERVER
http://10.42.0.0:8080

$ curl $HEKETI_CLI_SERVER/hello
Hello from Heketi

Your Kubernetes cluster should look something like this:

$ kubectl get nodes,pods
NAME      STATUS    AGE
master    Ready     22h
node0     Ready     22h
node1     Ready     22h
node2     Ready     22h
NAME                               READY     STATUS              RESTARTS   AGE
glusterfs-node0-2509304327-vpce1   1/1       Running             0          1d
glusterfs-node1-3290690057-hhq92   1/1       Running             0          1d
glusterfs-node2-4072075787-okzjv   1/1       Running             0          1d
heketi-3017632314-yyngh            1/1       Running             0          1d

You should now also be able to use heketi-cli or any other client of the heketi REST API (like the GlusterFS volume plugin) to create/manage volumes and then mount those volumes to verify they're working. To see an example of how to use this with a Kubernetes application, see the following:

Hello World application using GlusterFS Dynamic Provisioning
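
For orientation, here is a minimal, hedged sketch of the objects such an application relies on (the resturl, names, and sizes are placeholders, and the exact parameters accepted vary by Kubernetes version, as several issues below show):

$ cat gluster-storageclass.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: gluster-heketi
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.42.0.0:8080"

$ cat gluster-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gluster1
  annotations:
    volume.beta.kubernetes.io/storage-class: gluster-heketi
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

$ kubectl create -f gluster-storageclass.yaml -f gluster-pvc.yaml
$ kubectl get pvc gluster1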

Contact

The gluster-kubernetes developers hang out in #sig-storage on the Kubernetes Slack and in the #gluster and #heketi IRC channels on the freenode network.

And, of course, you are always welcome to reach us via Issues and Pull Requests on GitHub.

gluster-kubernetes's People

Contributors

ansiwen, brollb, bryanlarsen, deimosfr, dougbtv, fakod, frostman, humblec, jarrpa, johnstrunk, mjschmidt, nixpanic, obnoxxx, phlogistonjohn, pronix, raghavendra-talur, ramkrsna, ravishivt, roffe, saravanastoragenetwork, sroze, trifonnt, vbellur, yoriksar


gluster-kubernetes's Issues

invalid master/tasks/main.yml - undefined variable

Here is a new one

TASK [setup] *******************************************************************
ok: [master]

TASK [master : kubeadm init] ***************************************************
fatal: [master]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'ipv4'\n\nThe error appears to have been in '/home/screeley/git/gluster-kubernetes-dev/fix_metalink_error/gluster-kubernetes/vagrant/roles/master/tasks/main.yml': line 1, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: kubeadm init\n  ^ here\n"}
	to retry, use: --limit @/home/screeley/git/gluster-kubernetes-dev/fix_metalink_error/gluster-kubernetes/vagrant/site.retry

PLAY RECAP *********************************************************************
master                     : ok=24   changed=22   unreachable=0    failed=1   
node0                      : ok=23   changed=22   unreachable=0    failed=0   
node1                      : ok=23   changed=22   unreachable=0    failed=0   
node2                      : ok=23   changed=22   unreachable=0    failed=0   
node3                      : ok=23   changed=22   unreachable=0    failed=0   

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Some issues with `gk-deploy`

Long arguments seem to have issues:

$ ./gk-deploy --deploy-gluster --verbose                                                                                               
Unknown option 'deploy-gluster'.

$ ./gk-deploy -g --verbose                                                                                                          1  
Unknown option 'erbose'.

Short versions work, but there seem to be some other issues:

./gk-deploy -g -v                                                                                                                    
Starting Kubernetes deployment.
serviceaccount "heketi-service-account" created
Found secret 'heketi-service-account-token-3juax' in namespace 'default' for heketi-service-account.
  File "<stdin>", line 10
    print node['node']['hostnames']['manage'][0]
             ^
SyntaxError: Missing parentheses in call to 'print'
Deploying GlusterFS pods on .
The Deployment "glusterfs-" is invalid: metadata.name: Invalid value: "glusterfs-": must match the regex [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* (e.g. 'example.com')
Waiting for GlusterFS pods to start ... Checking status of pods matching 'glusterfs=pod':

Checking status of pods matching 'glusterfs=pod':
[...]

How do you configure the StorageClass for the vagrant deployment?

While using the vagrant setup, both @screeley and I are running into the problems encountered with #24. The GlusterFS provisioner has undergone a lot of changes in Kubernetes lately, so the first thing I did was ascertain the Kubernetes version, which is 1.4.5.

According to history of the provisioning README at the time 1.4.5 shipped (https://github.com/kubernetes/kubernetes/blob/1d527194656bad6a0f191f9fc6160bf7e931cf09/examples/experimental/persistent-volume-provisioning/README.md) the following storage class and claim should work, but they don't. Any idea @jarrpa @humblec ?

[root@master deploy]# cat gluster-sc.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterfs
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://10.36.0.0:8080"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-service-account-token-7ne36"

[root@master deploy]# cat gluster-claim.yaml
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "test-claim",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "glusterfs"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteMany"
    ],
    "resources": {
      "requests": {
        "storage": "5Gi"
      }
    }
  }
}

Provide documentation for supported deployment scenarios

We should provide a document or set of documentation that clearly describes the deployment scenarios this project supports. This includes diagrams and descriptions of all the components in each scenario and the relationships between the components.

Manage heketi.db with a pv/pvc

The current process creates a separate gluster volume to hold the heketi.db.
It could instead be managed via a pv/pvc, which simplifies the setup and gives a unified way to manage all the gluster volumes (no exceptions).
For example, one use case is that you can clean up the whole cluster using only kubectl delete ...
and create it again from scratch without extra actions.

As a proof of concept, I tried the following with success:

  1. created the 'deploy-heketi' resource
  2. bootstrapped the heketi.db with the topology file
  3. created a GlusterFS volume plus the pv/pvc
  4. created the final heketi deployment with the pvc mounted (modified template: heketi.yaml)
  5. copied the heketi.db from deploy-heketi to this volume
$ kubectl get pv
NAME          CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM 
heketidbstorage  4Gi        RWX           Retain       Bound   glusterfs/heketi       

$ kubectl get pvc --namespace glusterfs
NAME      STATUS    VOLUME            CAPACITY   ACCESSMODES   AGE
heketi    Bound     heketidbstorage   4Gi        RWX 

Would you be interested in moving to such a configuration, and in discussing the correct way to achieve it?

thin: Required device-mapper target(s) not detected in your kernel

I am hitting the following error when executing heketi-cli setup-openshift-heketi-storage:

Error: Unable to execute command on glusterfs1-1373000839-qq9jv:   /usr/sbin/modprobe failed: 1
  Cannot read thin-pool target version.
  thin: Required device-mapper target(s) not detected in your kernel.
  Run `lvcreate --help' for more information.

On the hosts there is Ubuntu 16.04.1 LTS installed.

What are the prerequisites for Heketi/Glusterfs to create volumes?
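
At minimum, the LVM thin-provisioning device-mapper modules need to be available in the host kernel. A sketch of loading them manually (the module names are the ones mentioned elsewhere in this document; persist them via your distribution's module-load configuration if needed):

$ sudo modprobe dm_thin_pool
$ sudo modprobe dm_snapshot
$ sudo modprobe dm_mirror
$ lsmod | grep dm_thin_pool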

tests: All functional tests should change to use prebuilt vagrant boxes

Currently we use generic Vagrant boxes, which we then set up as we want using Ansible. This model is completely unpredictable as time goes by because many things change. Instead, we need to change our tests to use prebuilt vagrant boxes which have been set up for our tests. This means that we also need a directory in Heketi so that anyone can build the boxes automatically; the boxes can then be submitted to Vagrant Atlas.

Configure topology via a configmap

To package Heketi correctly (e.g. with helm/kpm, see #39), or to have a smoother experience with Kubernetes, we should limit manual actions like 'pushing the topology' as much as possible.

I would like to edit/create a configmap with the topology file to have it updated.
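
A sketch of what creating such a ConfigMap might look like (the ConfigMap name is a placeholder, and heketi/gk-deploy would still need changes to consume it):

$ kubectl create configmap heketi-topology --from-file=topology.json
$ kubectl get configmap heketi-topology -o yaml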

Update README to reflect project intent

gluster-kubernetes should clarify that it is intended to facilitate the hyper-converged scenario of GlusterFS + heketi running within Kubernetes and on Kubernetes nodes.

Endpoint changes with k8s 1.5

In kubernetes 1.5, the "parameters" options in the storage class no longer use the "endpoints" parameter. As per kubernetes/kubernetes#34705:

When the persistent volumes are dynamically provisioned, the Gluster plugin automatically create an endpoint and a headless service in the name `gluster-dynamic-

Leaving it in will generate the below error when trying to create the PVC. This option needs to be removed from the Hello World storage class example. Also, gk-deploy should not even create the "heketi-storage-endpoints" at all since k8s handles the endpoint.

ubuntu@kube1:~$ kubectl describe pvc/gluster1
Name:           gluster1
Namespace:      default
Status:         Pending
Volume:
Labels:         <none>
Capacity:
Access Modes:
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath   Type            Reason                  Message
  ---------     --------        -----   ----                            -------------   --------        ------                  -------
  10m           14s             41      {persistentvolume-controller }                  Warning         ProvisioningFailed      Failed
to provision volume with StorageClass "gluster-heketi": glusterfs: invalid option "endpoint" for volume plugin kubernetes.io/glusterfs

Use namespace for heketi/gluster resources

If all resources of Heketi and GlusterFS reside in one namespace, it might be easier to filter for them on metrics or logging.

Also, to stop and remove all resources gracefully, you can then just call:

kubectl delete namespace glusterfs
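
A hedged sketch of that workflow, assuming gk-deploy's -n option selects the target namespace (the option is referenced in another issue in this document):

$ kubectl create namespace glusterfs
$ ./gk-deploy -g -n glusterfs
$ kubectl get pods -n glusterfs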

Host is not in ' Peer in Cluster' state

While executing heketi-cli setup-openshift-heketi-storage the following error shows up:

Error: Unable to execute command on glusterfs0-2272744551-a4ghp: volume create: heketidbstorage: failed: Host 120fa67f-b5fe-4232-8c77-0c78e1c1c8ce.pub.cloud.scaleway.com is not in ' Peer in Cluster' state

Topology info gives this:

Cluster Id: 645be219ee6b0598b4d51458f2c82a12

    Volumes:

    Nodes:

	Node Id: 18be84c12d63e0cba5b45a85145867f4
	State: online
	Cluster Id: 645be219ee6b0598b4d51458f2c82a12
	Zone: 1
	Management Hostname: 120fa67f-b5fe-4232-8c77-0c78e1c1c8ce.pub.cloud.scaleway.com
	Storage Hostname: 120fa67f-b5fe-4232-8c77-0c78e1c1c8ce.pub.cloud.scaleway.com
	Devices:
		Id:a19f21522ad62a555ce29fcfa374019c   Name:/dev/vdb            State:online    Size (GiB):46      Used (GiB):0       Free (GiB):46      
			Bricks:

	Node Id: 41a0f607a5669136219f3ccd09cb4583
	State: online
	Cluster Id: 645be219ee6b0598b4d51458f2c82a12
	Zone: 1
	Management Hostname: 220d4345-ea09-4ba3-bf8e-bc2c86bc821c.pub.cloud.scaleway.com
	Storage Hostname: 220d4345-ea09-4ba3-bf8e-bc2c86bc821c.pub.cloud.scaleway.com
	Devices:
		Id:71227ba841eb6ca845fb4315fe011b2c   Name:/dev/vdb            State:online    Size (GiB):46      Used (GiB):0       Free (GiB):46      
			Bricks:

	Node Id: 4fbef6294f6eedcff4fe86874cd4b93c
	State: online
	Cluster Id: 645be219ee6b0598b4d51458f2c82a12
	Zone: 1
	Management Hostname: f6e5fcaf-35bf-424b-a2f9-900d3d1a9b11.pub.cloud.scaleway.com
	Storage Hostname: f6e5fcaf-35bf-424b-a2f9-900d3d1a9b11.pub.cloud.scaleway.com
	Devices:
		Id:7b8fbfe3ad7de9c825f082f91d0bf6ac   Name:/dev/vdb            State:online    Size (GiB):46      Used (GiB):0       Free (GiB):46      
			Bricks:

What can I try to resolve this?

Suggestions

Thanks for this project - it's been very useful. I've been trying to use it on a 3 node Kubernetes cluster installed via kubeadm on CentOS 7. While experimenting, I found a few enhancements that would be useful:

  • The script expects heketi-cli to be in the path. Perhaps throw an error if it's not detected?
  • Executing rm -rf /var/lib/heketi on --abort on all the nodes would save me having to do it manually if I need to start from scratch
  • After --abort I need to manually run vgs followed by vgremove -y <volume group starting with vg_> so that the volumes are ready for a fresh install. Not sure if this can happen automatically on --abort (see the sketch after this list).
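
A sketch of that manual cleanup, consolidated (WARNING: destructive; it removes heketi state and every volume group whose name starts with vg_ on the node):

$ rm -rf /var/lib/heketi
$ vgs --noheadings -o vg_name | grep '^ *vg_' | xargs -r vgremove -y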

I also found I needed to ensure lvm2-monitor was active on all nodes or strange errors would crop up. If it wasn't running I had to do the following:

systemctl restart lvm2-lvmetad.service
systemctl restart lvm2-lvmetad.socket

I could then check the status with systemctl status lvm2-monitor

Hope this is useful!

Use DaemonSet for GlusterFS

When heketi/heketi#596 resolves and we get a new container image, update the GlusterFS definition to use DaemonSets instead of Deployments. For the OpenShift support, this means the use of Templates will also no longer be necessary.

Helm Chart

We've started work on a Helm Chart based off the manifests here.

So far it works with a few changes for standard token and api locations but doesn't persist the database or load the topology automatically. I thought I'd raise a ticket early for tracking and inputs but the addition of etcd and daemonset features to Heketi should let us wrap this up and push it upstream.

https://github.com/AcalephStorage/charts/tree/glusterfs/incubator/glusterfs

deploy-heketi stuck on "Adding device"

I managed to get the provided sample vagrant deployment working. I'm now trying to use the deploy logic on a pre-existing kubernetes deployment (built from https://github.com/att-comdev/halcyon-vagrant-kubernetes which uses Ubuntu for the nodes). In this setup, there are three nodes called node1, node2, node3 and I modified their Vagrantfile to add three drives each.

The ./gk-deploy script gets to the deploy-heketi stage and successfully adds the devices for node1. Then, it freezes on the second node and I have to kill the script. I'm not sure where to start debugging.

ubuntu@kube1:~/deploy$ export KUBECONFIG="/etc/kubernetes/admin.conf" && sudo -E bash ./gk-deploy -g -w 180
Using Kubernetes CLI.
Error from server: error when creating "./kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
'storagenode' already has a value (glusterfs), and --overwrite is false
'storagenode' already has a value (glusterfs), and --overwrite is false
'storagenode' already has a value (glusterfs), and --overwrite is false
Error from server: error when creating "./kube-templates/glusterfs-daemonset.json": daemonsets.extensions "glusterfs" already exists
Waiting for GlusterFS pods to start ... OK
Error from server: error when creating "STDIN": services "deploy-heketi" already exists
Error from server: error when creating "STDIN": deployments.extensions "deploy-heketi" already exists
Waiting for deploy-heketi pod to start ... OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    17  100    17    0     0   2303      0 --:--:-- --:--:-- --:--:--  2428
        Found node 172.16.35.11 on cluster bb595497757b43a3895fb6f7ef3ec791
                Adding device /dev/sdb ... Unable to add device: Unable to execute command on glusterfs-92gtg:   Can't initialize physical volume "
/dev/sdb" of volume group "vg_0490a9019cb9648662b6d0e2d47041ac" without -ff
                Adding device /dev/sdc ... Unable to add device: Unable to execute command on glusterfs-92gtg:   Can't initialize physical volume "
/dev/sdc" of volume group "vg_defad24db8970ec3c474909c04dd8052" without -ff
                Adding device /dev/sda ... Unable to add device: Unable to execute command on glusterfs-92gtg:   Can't initialize physical volume "
/dev/sda" of volume group "vg_1adc96baa051963403b95d438fdc1d84" without -ff
        Found node 172.16.35.12 on cluster bb595497757b43a3895fb6f7ef3ec791
                Adding device /dev/sdb ...

If this bug/question should be reported to heketi, let me know.
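
For reference, the "without -ff" message usually means the device still carries LVM metadata from a previous attempt. A hedged cleanup sketch for a disk that is dedicated to heketi (WARNING: this destroys any data and signatures on the device):

$ sudo wipefs -a /dev/sdb
# or, if the stale volume group is still visible:
$ sudo vgremove -y vg_0490a9019cb9648662b6d0e2d47041ac
$ sudo pvremove -y /dev/sdb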

Use current namespace instead of 'default'

Currently, the default behavior of gk-deploy is to deploy into the namespace 'default'. We should instead deploy to whatever the user's current namespace is (e.g. not require a -n option).

invalid option "secretName" for volume plugin kubernetes.io/glusterfs

I am trying to create this PersistentVolumeClaim:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim1
  annotations:
    volume.beta.kubernetes.io/storage-class: slow
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

When doing so this error shows up:

kubectl describe persistentvolumeclaim claim1                                                                                           368ms 
Name:		claim1
Namespace:	default
Status:		Pending
Volume:		
Labels:		<none>
Capacity:	
Access Modes:	
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  48m		1m		29	{persistentvolume-controller }			Warning		ProvisioningFailed	Failed to provision volume with StorageClass "slow": glusterfs: invalid option "secretNamespace" for volume plugin kubernetes.io/glusterfs
  49m		7s		170	{persistentvolume-controller }			Warning		ProvisioningFailed	Failed to provision volume with StorageClass "slow": glusterfs: invalid option "secretName" for volume plugin kubernetes.io/glusterfs

I set up this StorageClass before:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.gluster.svc:8080"
  # restuser: "admin"
  secretNamespace: "gluster"
  secretName: "heketi-secret"

metalink error on vagrant+ansible up.sh

The following error, although it moves between nodes on subsequent runs, has shown up consistently on at least one node for each up.sh run. This time it showed on node0. Is there something we can do to help this? I am going to try removing epel.repo prior to the yum installs, or possibly running yum clean metadata after the repo is installed.

fatal: [node0]: FAILED! => {"changed": true, "cmd": ["yum", "-y", "install", "wget", "screen", "git", "vim", "glusterfs-client", "heketi-client", "iptables", "iptables-utils", "iptables-services", "docker", "kubeadm"], "delta": "0:00:16.419865", "end": "2016-12-08 14:39:52.070970", "failed": true, "rc": 1, "start": "2016-12-08 14:39:35.651105", "stderr": "http://mirror.math.princeton.edu/pub/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://mirror.chpc.utah.edu/pub/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.symnds.com/distributions/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.us.leaseweb.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.sfo12.us.leaseweb.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirrors.syringanetworks.net/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirrors.mit.edu/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.cs.pitt.edu/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://pubmirror1.math.uh.edu/fedora-buffet/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://fedora-epel.mirror.lstn.net/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://ftp.osuosl.org/pub/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.nexcess.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.sjc02.svwh.net/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.oss.ou.edu/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for

Deploy of gluster template fails

Error from server: DaemonSet in version "v1beta1" cannot be handled as a DaemonSet: [pos 1495]: json: decode bool: got first char "

Deploy GlusterFS by default, with a prompt

By default we should assume that we are meant to deploy the GlusterFS DaemonSet. However, we should prompt the user whether they want to deploy GlusterFS or not. This prompt should also include a small note as to the firewall requirements on the node for GlusterFS. The -g option should still be retained, with the slight modification that it simply assumes 'yes' and skips the prompt entirely.

Peer not in cluster state errors when gk-deploy is run

The current ansible playbook builds the /etc/hosts file on the nodes by getting the IPs assigned to the eth1 device.

On my machines, this is the ip addr output for eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:2a:84:0c brd ff:ff:ff:ff:ff:ff
inet 192.168.10.227/24 brd 192.168.10.255 scope global dynamic eth1
valid_lft 2296sec preferred_lft 2296sec
inet 192.168.10.101/24 brd 192.168.10.255 scope global secondary eth1
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe2a:840c/64 scope link
valid_lft forever preferred_lft forever

As can be seen, this interface has two IPs. The one that is added to /etc/hosts is not the one in the topology file, which causes the gk-deploy script to fail.
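
A quick way to check for the mismatch on an affected node (a sketch; the hostname is a placeholder):

$ ip -4 addr show eth1 | grep inet
$ grep node0 /etc/hosts
$ grep -A 3 '"storage"' topology.json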

This repo needs CI

Now that there is code and deployment logic in this repo, we need to make sure it keeps working, and therefore the repo needs CI.

I advise to do this first before accepting any new changes.

vagrant quickstart fails: Device /dev/vdb not found (or ignored by filtering).

I get the following error on the kubernetes master when using the quickstart guide https://github.com/gluster/gluster-kubernetes#quickstart

[root@master deploy]# ./gk-deploy -g
Starting Kubernetes deployment.
serviceaccount "heketi-service-account" created
deployment "glusterfs-node0" created
deployment "glusterfs-node1" created
deployment "glusterfs-node2" created
Waiting for GlusterFS pods to start ... OK
service "deploy-heketi" created
deployment "deploy-heketi" created
Waiting for deploy-heketi pod to start ... OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    17  100    17    0     0    511      0 --:--:-- --:--:-- --:--:--   531
Creating cluster ... ID: 73f2e1d87c54137b27f52285e4b22a14
	Creating node node0 ... ID: 5dcc52cc01a199a59386e7863602e983
		Adding device /dev/vdb ... Unable to add device: Unable to execute command on glusterfs-node0-2509304327-edwgn:   Device /dev/vdb not found (or ignored by filtering).


		Adding device /dev/vdc ... Unable to add device: Unable to execute command on glusterfs-node0-2509304327-edwgn:   Device /dev/vdc not found (or ignored by filtering).


		Adding device /dev/vdd ... Unable to add device: Unable to execute command on glusterfs-node0-2509304327-edwgn:   Device /dev/vdd not found (or ignored by filtering).


	Creating node node1 ... ID: 173452ef339f96efd9b1d089469cb572
		Adding device /dev/vdb ... Unable to add device: Unable to execute command on glusterfs-node1-3290690057-kbbmh:   Device /dev/vdb not found (or ignored by filtering).


		Adding device /dev/vdc ... Unable to add device: Unable to execute command on glusterfs-node1-3290690057-kbbmh:   Device /dev/vdc not found (or ignored by filtering).


		Adding device /dev/vdd ... Unable to add device: Unable to execute command on glusterfs-node1-3290690057-kbbmh:   Device /dev/vdd not found (or ignored by filtering).


	Creating node node2 ... ID: 27ace16e97f55dffebce8126844a87df
		Adding device /dev/vdb ... Unable to add device: Unable to execute command on glusterfs-node2-4072075787-7t1gx:   Device /dev/vdb not found (or ignored by filtering).


		Adding device /dev/vdc ... Unable to add device: Unable to execute command on glusterfs-node2-4072075787-7t1gx:   Device /dev/vdc not found (or ignored by filtering).


		Adding device /dev/vdd ... Unable to add device: Unable to execute command on glusterfs-node2-4072075787-7t1gx:   Device /dev/vdd not found (or ignored by filtering).


Error: Error calling v.allocBricksInCluster: Id not found

the path "heketi-storage.json" does not exist
Timed out waiting for pods matching 'job-name=heketi-storage-copy-job'.
service "deploy-heketi" deleted
deployment "deploy-heketi" deleted
No resources found
Error from server: services "heketi-storage-endpoints" not found
serviceaccount "heketi-service-account" deleted
pod "glusterfs-node0-2509304327-edwgn" deleted
pod "glusterfs-node1-3290690057-kbbmh" deleted
pod "glusterfs-node2-4072075787-7t1gx" deleted
deployment "glusterfs-node0" deleted
deployment "glusterfs-node1" deleted
deployment "glusterfs-node2" deleted

Re-run 'vagrant provision' fails at kubeadm preflight checks

up.sh --provider=virtualbox results in failure:

fatal: [node2]: FAILED! => {"changed": true, "cmd": ["yum", "-y", "install", "centos-release-gluster", "epel-release"], "delta": "0:00:15.836969", "end": "2016-12-08 14:20:52.772676", "failed": true, "rc": 1, "start": "2016-12-08 14:20:36.935707", "stderr": "http://ftp.osuosl.org/pub/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.sfo12.us.leaseweb.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://mirrors.lug.mtu.edu/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.math.princeton.edu/pub/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.cs.princeton.edu/pub/mirrors/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.us.leaseweb.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://mirror.redsox.cc/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://mirror.steadfast.net/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.symnds.com/distributions/fedora-epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttps://mirror.chpc.utah.edu/pub/epel/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://fedora-epel.mirror.lstn.net/7/x86_64/repodata/repomd.xml: [Errno -1] repomd.xml does not match metalink for epel\nTrying other mirror.\nhttp://mirror.cs.p

Following up with 'vagrant provision', it also fails at what appear to be some preflight checks:

TASK [master : kubeadm init] ***************************************************
fatal: [master]: FAILED! => {"changed": true, "cmd": ["kubeadm", "init", "--token=abcdef.1234567890abcdef", "--use-kubernetes-version=v1.4.5", "--api-advertise-addresses=192.168.10.90"], "delta": "0:00:00.245555", "end": "2016-12-08 14:28:28.534215", "failed": true, "rc": 2, "start": "2016-12-08 14:28:28.288660", "stderr": "preflight check errors:\n\tPort 6443 is in use\n\tPort 2379 is in use\n\tPort 8080 is in use\n\tPort 9898 is in use\n\tPort 10250 is in use\n\tPort 10251 is in use\n\tPort 10252 is in use\n\t/etc/kubernetes/manifests is not empty\n\t/etc/kubernetes/pki is not empty\n\t/var/lib/etcd is not empty\n\t/var/lib/kubelet is not empty\n\t/etc/kubernetes/admin.conf already exists\n\t/etc/kubernetes/kubelet.conf already exists", "stdout": "Running pre-flight checks", "stdout_lines": ["Running pre-flight checks"], "warnings": []}
	to retry, use: --limit @/home/screeley/git/gluster-kubernetes-dev/test-dir/scale1/gluster-kubernetes/vagrant/site.retry

PLAY RECAP *********************************************************************
master                     : ok=21   changed=5    unreachable=0    failed=1   
node0                      : ok=20   changed=5    unreachable=0    failed=0   
node1                      : ok=20   changed=5    unreachable=0    failed=0   
node2                      : ok=4    changed=2    unreachable=0    failed=1   
node3                      : ok=20   changed=5    unreachable=0    failed=0   

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Heketi topology load failure

A topology load failure should not let the deployment proceed. It should fail with a warning and be able to resume from the load command.

Explore 3rd party alternatives to providing our own vagrant k8s deployment

Update Jan 11 2016: The purpose of this issue has changed and is now focused on the updated issue title: "Explore 3rd party alternatives to providing our own vagrant k8s deployment". The migration-to-ansible discussion has moved to #149.

I ended up getting this project working on my existing k8s cluster with Ubuntu 16.04 hosts. I'm really happy about that and excited about using it. Overall, this is a great project that helped me get started on understanding glusterfs and heketi. However, I do have some suggestions in the same spirit as #35.

  1. From my testing, the glusterfs version installed has to be the same version as the glusterfs-daemonset, 3.8 in this case. This should be better documented. Installing from Ubuntu 16.04 default repo will grab 3.7, which generated errors when trying to mount volumes. Also, you only want to install the glusterfs-client on all nodes as installing glusterfs-server generated other errors.
  2. Widen the scope of the ansible playbook to support environments outside of just the CentOS deployment from the Vagrantfile. For Ubuntu, this would involve installing glusterfs-client, loading dm_thin_pool, dm_snapshot, and dm_mirror modules, and opening up ufw roles if ufw is enabled.
  3. Why not have the ansible playbook automatically run the gk-deploy script on the master? Or better yet, why not convert the gk-deploy to be an ansible role? IMHO, I think the focus for the project should be on the ansible playbook (similar to how https://github.com/att-comdev/halcyon-kubernetes is organized). The user would simply specify an inventory file and the playbook would do everything: bring up the k8s cluster (optional if user already has a working k8s cluster), perform additional preparation for glusterfs+heketi like installing packages and opening up additional firewalld/ufw rules, running the gk-deploy logic, and finally running the hello world app (also optional). The Vagrantfile would still be provided to get users running few VM nodes with storage devices added if they don't already have a target cluster. This would provide a more painless method of getting the project working. Plus, ansible can be written more cleanly than a bash script especially when adding error checking or conditional checks about OS distribution or k8s/OpenShift versions.

Is SSH server in GlusterFS containers really needed?

Hi. Why do you run SSH in the GlusterFS containers? Maybe I am being naive, but wouldn't it be better to use kubectl exec to run commands in the containers? I am saying this because you run the DaemonSet with host networking and in privileged mode, which makes me question the security of this provisioning strategy. In some OpenStack installations the firewall can occasionally fail to apply the security groups, and if this happens it would expose the SSH servers to the internet.
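
For reference, kubectl exec already provides this kind of access without a separate SSH daemon (the pod name below is taken from the example output earlier in this document):

$ kubectl exec -it glusterfs-node0-2509304327-vpce1 -- gluster peer status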

Cluster Add support

Heketi supports multiple clusters. We should be able to do the following:

Get the topology file for the new cluster.
Add labels for the specific nodes.
Then run heketi topology load.
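
A rough sketch of that workflow with heketi-cli and kubectl (the topology file name and node name are placeholders; the storagenode=glusterfs label matches the gk-deploy output quoted in another issue here):

$ kubectl label nodes newnode0 storagenode=glusterfs
$ export HEKETI_CLI_SERVER=http://10.42.0.0:8080
$ heketi-cli topology load --json=topology-new-cluster.json
$ heketi-cli topology info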
