Cloudera Playbook

An Ansible Playbook that installs the Cloudera stack on RHEL/CentOS

Running the playbook

  • Create the Ansible configuration
$ vi ~/.ansible.cfg

[defaults]
# disable key check if host is not initially in 'known_hosts'
host_key_checking = False

[ssh_connection]
# if True, make ansible use scp if the connection type is ssh (default is sftp)
scp_if_ssh = True

  • Create the inventory of cluster hosts
$ vi ~/ansible_hosts

[scm_server]
<host>        license_file=/path/to/cloudera_license.txt

[db_server]
<host>

[krb5_server]
<host>        default_realm=<REALM>

[utility_servers:children]
scm_server
db_server
krb5_server

[gateway_servers]
<host>        host_template=HostTemplate-Gateway role_ref_names=HDFS-HTTPFS-1

[master_servers]
<host>        host_template=HostTemplate-Master1
<host>        host_template=HostTemplate-Master2
<host>        host_template=HostTemplate-Master3

[worker_servers]
<host>
<host>
<host>

[worker_servers:vars]
host_template=HostTemplate-Workers

[cdh_servers:children]
utility_servers
gateway_servers
master_servers
worker_servers

  • Run the playbook
$ ansible-playbook -i ~/ansible_hosts cloudera-playbook/site.yml

-i INVENTORY
   inventory host path or comma separated host list (default=/etc/ansible/hosts)

Ansible communicates with the hosts defined in the inventory over SSH. It assumes key-based authentication, so your public SSH key should be present in authorized_keys on each host. The connecting user also needs sudo privileges to install the required packages.

By default, Ansible connects to the remote hosts as the current user (as SSH would). To override the remote user name, specify the --user option on the command line or add the following variable to the inventory:

[all:vars]
ansible_user=ec2-user

AWS users can pass the PEM file of an EC2 key pair to Ansible’s --private-key option instead of relying on their default SSH key.
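
The connection options described above can be combined in a single command. The sketch below wraps the run command from earlier in a shell function. The names run_playbook, REMOTE_USER, and KEY_FILE are hypothetical and chosen only for illustration, and the ec2-user default simply mirrors the inventory example.

```shell
# Hypothetical wrapper; nothing is executed until the function is called.
run_playbook() {
  # --user overrides the remote user; --private-key points at the PEM file.
  # Extra arguments are passed through to ansible-playbook unchanged.
  ansible-playbook -i "$HOME/ansible_hosts" \
    --user "${REMOTE_USER:-ec2-user}" \
    --private-key "${KEY_FILE:?set KEY_FILE to the path of your PEM file}" \
    cloudera-playbook/site.yml "$@"
}
```

For example, with KEY_FILE set to a (hypothetical) path such as $HOME/.ssh/my-key.pem, calling run_playbook --list-hosts lists the hosts each play would target without running any tasks, which is a cheap way to check the inventory before a real run.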

Enabling Kerberos

The playbook can install a local MIT KDC and configure Hadoop Security. To enable Hadoop Security:

  • Specify the '[krb5_server]' host in the inventory (see above)
  • Set 'krb5_kdc_type' to 'mit' in group_vars/krb5_server

Overriding CDH service/role configuration

The playbook uses Cloudera Manager Templates to provision a cluster. As part of the template import process, Cloudera Manager applies Autoconfiguration rules that set properties such as memory and CPU allocations for the various roles.

If the cluster has different hardware or operational requirements, you can override these properties in group_vars/cdh_servers. For example:

cdh_services:
  - type: hdfs
    datanode_java_heapsize: 10737418240
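
The heap size in this example appears to be expressed in bytes: 10737418240 is 10 GiB. A minimal shell sketch for deriving such a value is shown here; heap_gib and heap_bytes are illustrative names and are not part of the playbook.

```shell
# Convert a heap size in GiB to the byte value used in group_vars/cdh_servers.
heap_gib=10
heap_bytes=$((heap_gib * 1024 * 1024 * 1024))
echo "datanode_java_heapsize: ${heap_bytes}"
# prints: datanode_java_heapsize: 10737418240
```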

These properties are added as variables to the instantiator block of the rendered template and can be referenced from the service configs. For example, in roles/cdh/templates/hdfs.j2:

"roleType": "DATANODE",
"configs": [{
  "name": "datanode_java_heapsize",
  "variable": "DATANODE_JAVA_HEAPSIZE"
}]
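
In this example the variable name is the property name in upper case. Assuming that convention holds for other overrides (an inference from this single example, not a documented rule), the variable name to reference for a given property can be derived as follows; property and variable are illustrative names.

```shell
property="datanode_java_heapsize"
# Upper-case the property name to get the assumed template variable name.
variable=$(printf '%s' "$property" | tr '[:lower:]' '[:upper:]')
echo "$variable"
# prints: DATANODE_JAVA_HEAPSIZE
```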

How to contribute

  • Fork the repo and create a topic branch
  • Push commits to your repo
  • Create a pull request!
