Ansible role to deploy a Spark cluster
- Install dependencies. This role requires Java to be installed on every node; one way to provide it is the geerlingguy.java role from Ansible Galaxy:
sudo ansible-galaxy install geerlingguy.java
- Clone this repo into /etc/ansible/roles:
sudo git clone [email protected]:slaclab/ansible-role-spark.git /etc/ansible/roles/ansible-spark
- modify your inventory file (either /etc/ansible/hosts or a file of your own, passed to ansible-playbook with -i) so that it looks something like this:
[all:vars]
ansible_user=centos
ansible_ssh_private_key_file=~/private-key.pem
[masters]
dhcp-os-129-163.slac.stanford.edu
[zookeepers:children]
masters
[spark-masters:children]
masters
[slaves]
dhcp-os-129-155.slac.stanford.edu
dhcp-os-129-160.slac.stanford.edu
dhcp-os-129-161.slac.stanford.edu
dhcp-os-129-162.slac.stanford.edu
[spark-workers:children]
slaves
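Before running anything against the cluster, it can help to confirm that each inventory group contains the hosts you expect. The sketch below is a plain-shell check that does not need Ansible installed. The `hosts_in_group` helper and the scratch file are illustrative assumptions, not part of this role, and the helper only reads hosts listed directly under a section (it does not resolve `:children`).

```shell
#!/bin/sh
# Write a shortened copy of the example inventory to a scratch file.
INVENTORY="$(mktemp)"
cat > "$INVENTORY" <<'EOF'
[masters]
dhcp-os-129-163.slac.stanford.edu
[spark-masters:children]
masters
[slaves]
dhcp-os-129-155.slac.stanford.edu
dhcp-os-129-160.slac.stanford.edu
dhcp-os-129-161.slac.stanford.edu
dhcp-os-129-162.slac.stanford.edu
[spark-workers:children]
slaves
EOF

# hosts_in_group GROUP FILE (hypothetical helper): print the entries listed
# directly under [GROUP]; ":children" indirection is not resolved.
hosts_in_group() {
    awk -v section="[$1]" '
        /^\[/ { inside = ($0 == section); next }
        inside && NF { print $1 }
    ' "$2"
}

# Report how many hosts each base group declares.
echo "masters: $(hosts_in_group masters "$INVENTORY" | grep -c .)"
echo "slaves:  $(hosts_in_group slaves "$INVENTORY" | grep -c .)"
```

With Ansible installed, `ansible-inventory -i <file> --graph` prints the resolved group tree, and `ansible all -i <file> -m ping` checks that every host is reachable over SSH.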
- create a playbook file somewhere (e.g. ~/spark.yml):
- name: spark master setup
  hosts: all
  roles:
    - role: geerlingguy.java
    - role: ansible-spark
- run the playbook
ansible-playbook [-i <path>/<to>/<hosts/inventory>] ~/spark.yml
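It is often worth parsing and dry-running a playbook before letting it change the nodes. The wrapper below is a minimal sketch of that sequence: `run_playbook`, `PLAYBOOK`, `INVENTORY`, and `DRY_RUN` are names invented for illustration, while `--syntax-check` and `--check` are standard `ansible-playbook` options. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical defaults; override them through the environment.
PLAYBOOK="${PLAYBOOK:-$HOME/spark.yml}"
INVENTORY="${INVENTORY:-}"    # empty: let Ansible use its default inventory
DRY_RUN="${DRY_RUN:-1}"       # 1: print the commands without running them

# run_playbook [EXTRA_ARGS...]: assemble the ansible-playbook command line,
# adding -i only when an inventory was given, then print and optionally run it.
run_playbook() {
    set -- ansible-playbook "$@"
    if [ -n "$INVENTORY" ]; then
        set -- "$@" -i "$INVENTORY"
    fi
    set -- "$@" "$PLAYBOOK"
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Stop at the first step that fails:
#   1. parse the playbook only
#   2. check mode: report what would change without applying it
#   3. the real run
run_playbook --syntax-check && run_playbook --check && run_playbook
```

Not every module supports check mode, so a clean `--check` pass is a useful signal rather than a guarantee.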
- validate cluster
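Point a web browser at http://<master>:7077/ and you should see the Spark master panel with all of the defined workers registered. If nothing is served on that port, try 8080, the default port of the Spark standalone master web UI.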
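The same check can be scripted for machines without a browser. This is a minimal sketch that assumes `curl` is available; `MASTER_HOST`, `ui_url`, and `probe_ui` are names made up for illustration. It tries port 7077 as given in this step and then 8080, the Spark standalone master's default web UI port.

```shell
#!/bin/sh
# ui_url HOST PORT (hypothetical helper): build the URL to probe.
ui_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# probe_ui HOST (hypothetical helper): try each candidate port and report
# the first one that answers an HTTP request successfully.
probe_ui() {
    for port in 7077 8080; do
        if curl -fs --max-time 5 -o /dev/null "$(ui_url "$1" "$port")"; then
            echo "web UI reachable at $(ui_url "$1" "$port")"
            return 0
        fi
    done
    echo "no web UI found on $1" >&2
    return 1
}

# Only probe when a master hostname is supplied in the environment.
if [ -n "${MASTER_HOST:-}" ]; then
    probe_ui "$MASTER_HOST"
fi
```

Run it as `MASTER_HOST=<master> sh check-ui.sh`, where `check-ui.sh` is whatever name you save the script under.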
- test cluster
/opt/spark/spark-2.0.0-bin-hadoop2.7/bin/spark-submit \
--master spark://<master>:7077 \
--supervise \
--class org.apache.spark.examples.SparkPi \
/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar 1000
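To turn the test job into a pass/fail check, one option is to look for the result line that SparkPi prints ("Pi is roughly ..."). The sketch below assumes the install path used above; `MASTER_HOST` and `pi_reported` are hypothetical names. It leaves out `--supervise`, which only applies in cluster deploy mode, because in the default client mode the driver's output arrives in the local shell where it can be inspected.

```shell
#!/bin/sh
SPARK_HOME="${SPARK_HOME:-/opt/spark/spark-2.0.0-bin-hadoop2.7}"
MASTER_HOST="${MASTER_HOST:-master.example.org}"   # hypothetical; use your master

# pi_reported TEXT (hypothetical helper): succeed if TEXT contains
# SparkPi's result line with a plausible value.
pi_reported() {
    printf '%s\n' "$1" | grep -q 'Pi is roughly 3\.1'
}

if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
    # Client mode (the default), so the driver's stdout is captured here.
    output=$("$SPARK_HOME/bin/spark-submit" \
        --master "spark://$MASTER_HOST:7077" \
        --class org.apache.spark.examples.SparkPi \
        "$SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar" 1000 2>&1)
    if pi_reported "$output"; then
        echo "cluster OK"
    else
        echo "SparkPi did not report a result" >&2
    fi
else
    echo "spark-submit not found under $SPARK_HOME; skipping" >&2
fi
```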
- integrate ZooKeeper
- add Ubuntu support