
lifeguard's Introduction

Lifeguard, keeping you safe in the ClusterPools

Welcome!

Welcome to the Open Cluster Management Lifeguard project. Lifeguard provides a set of helpful utility scripts to automate the creation, use, and management of the ClusterPool, ClusterDeployment (WIP), ClusterImageSet, and ClusterClaim resources provided by the Open Cluster Management, Red Hat Advanced Cluster Management, and Hive projects. Rest assured, these utility scripts don't do anything too extraordinary: ClusterPools, ClusterDeployments, ClusterClaims, and ClusterImageSets are just created and managed via Kubernetes resources, which means the scripts simply template and oc apply various yaml files "under the hood". Below, we'll overview all of the "submodules" for this project - the helper scripts this project provides - and how to use them!
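
To make that concrete, here is roughly the kind of manifest the scripts template and apply for a ClusterPool. This is an illustrative sketch only - every value below is a placeholder, and apply.sh remains the source of truth for the real template:

# illustrative only: a minimal AWS ClusterPool manifest applied to the hub cluster
cat <<EOF | oc apply -f -
apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: my-pool                 # placeholder pool name
  namespace: my-namespace       # placeholder namespace
spec:
  size: 1
  baseDomain: example.com
  imageSetRef:
    name: my-clusterimageset    # a ClusterImageSet on the hub
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds         # cloud credential Secret
      region: us-east-1
  pullSecretRef:
    name: ocp-pull-secret       # OCP pull secret
EOF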

This project is still a work in progress, so there may still be gaps in logic, especially around "retry" behavior for failed operations and user selections. We're working on patching these as we're able, and we're open to contributions!

Global Configuration

If you don't want colorized output or are using automation/a shell that can't show bash colorization, export COLOR=False for non-colorized output.

Installing as an oc Extension

You can easily install a basic version of lifeguard as an oc extension by running make install-oc-extension. If you wish to uninstall these oc extensions at a later date, simply run make uninstall-oc-extension.
You can change the installation directory by modifying the environment variable $INSTALL_DIRECTORY. The default location is /usr/local/bin.
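
For example, to install into a user-writable directory instead of /usr/local/bin (the path below is just an example):

# install the oc extensions into a custom directory
export INSTALL_DIRECTORY=$HOME/.local/bin
make install-oc-extension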

Prereqs

Advanced Cluster Management/Hive Installation

If you want to use ClusterPools (the primary focus of this repo), you'll first need a Kubernetes cluster running Hive v1.0.13+ or Red Hat Advanced Cluster Management for Kubernetes (RHACM) 2.1.0+ (which includes a productized version of Hive). Some of the features exposed in this utility are only present in Hive v1.0.16+ and RHACM 2.2.0+, but older versions won't break this utility; some features just may not work. Both Hive and RHACM can be installed via OperatorHub on OpenShift. You can also build and install Hive from source, and the RHACM team is iterating on an installable open source project as well under the open-cluster-management organization on GitHub.
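
As a quick sanity check that a suitable Hive is present on your hub (regardless of how it was installed), you can verify that the CRDs this utility relies on exist:

# confirm the Hive CRDs used by lifeguard are installed on the hub
oc get crd clusterpools.hive.openshift.io clusterclaims.hive.openshift.io clusterimagesets.hive.openshift.io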

Optional: Configuring RBAC

Once you have a cluster, we recommend that you configure RBAC groups and/or ServiceAccounts to federate access to your ClusterPools and ClusterDeployments. These Kubernetes resources represent OpenShift clusters, and their lifecycle determines the state of those OpenShift clusters, so it is important to restrict access to them, especially when used in automation.

We have some documentation and resources to help you make the best choices around RBAC for ClusterPools, derived from our own experience using ClusterPools on RHACM to serve multitenant users (internal dev squads, not true multitenancy) and at scale within CI scenarios. Our resources can be found in the docs directory of this repo.
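
As an illustrative sketch (not a substitute for the docs in this repo), granting an RBAC group admin access to the namespace that will hold your ClusterPools could look like the following, assuming a hypothetical group named my-team and a namespace named my-clusterpools:

# grant the "my-team" group admin access to the ClusterPool namespace
cat <<EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-team-clusterpool-admin
  namespace: my-clusterpools
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: my-team
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
EOF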

Creating and Consuming ClusterPools

ClusterPools

The ClusterPool submodule of this project provides an "easy way" to create your first ClusterPool on a target cluster.

Creating a ClusterPool

To create your first ClusterPool:

  1. oc login to the OCM/ACM/Hive cluster where you wish to host ClusterPools
  2. cd clusterpools and run apply.sh (named for the oc command it will leverage throughout)
  3. Follow the prompts; the script will guide you through all of the configuration, secret creation, and ClusterPool creation.

You may also consider defining a series of environment variables to "fully automate" the creation of additional ClusterPools once you have one under your belt. The prompts in apply.sh will note which environment variable can be defined to skip a given step, but here's a full list for convenience (a worked example follows the list):

CLUSTERPOOL_TARGET_NAMESPACE - namespace you want to create/destroy a clusterpool in
PLATFORM - cloud platform you wish to use, must be one of: AWS, AZURE, GCP
CLOUD_CREDENTIAL_SECRET - name of the secret to be used to access your cloud platform
OCP_PULL_SECRET - name of the secret containing your OCP pull secret
CLUSTERIMAGESET_NAME - name of the clusterimageset you wish to use for your clusterpool
CLUSTERPOOL_SIZE - "size" of your clusterpool/number of "ready" clusters in your pool
CLUSTERPOOL_NAME - your chosen name for the clusterpool
# AWS Specific
CLUSTERPOOL_AWS_REGION - aws region to use for your clusterpool
CLUSTERPOOL_AWS_BASE_DOMAIN - aws base domain to use for your clusterpool
# Azure Specific
CLUSTERPOOL_AZURE_REGION - azure region to use for your clusterpool
CLUSTERPOOL_AZURE_BASE_DOMAIN - azure base domain to use for your clusterpool
CLUSTERPOOL_AZURE_BASE_DOMAIN_RESOURCE_GROUP_NAME - name of the resource group containing your azure base domain dns zone
# GCP Specific
CLUSTERPOOL_GCP_REGION - gcp region to use for your clusterpool
CLUSTERPOOL_GCP_BASE_DOMAIN - gcp base domain to use for your clusterpool
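
For example, a fully non-interactive AWS ClusterPool creation might look like the following (all values are placeholders, and the referenced credential and pull secrets are assumed to already exist in the target namespace):

export CLUSTERPOOL_TARGET_NAMESPACE=my-namespace
export PLATFORM=AWS
export CLOUD_CREDENTIAL_SECRET=aws-creds
export OCP_PULL_SECRET=ocp-pull-secret
export CLUSTERIMAGESET_NAME=my-clusterimageset
export CLUSTERPOOL_SIZE=2
export CLUSTERPOOL_NAME=my-pool
export CLUSTERPOOL_AWS_REGION=us-east-1
export CLUSTERPOOL_AWS_BASE_DOMAIN=example.com
cd clusterpools && ./apply.sh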

Note: If you find that the above list does not fully automate clusterpool creation, then we made a mistake or need to update the list! Please let us know via a GitHub issue or contribute a patch!

Destroying a ClusterPool

Note: Deleting a ClusterPool will delete all unclaimed clusters in the pool, but any claimed clusters (clusters with an associated ClusterClaim) will remain until the ClusterClaim is deleted. You can check which ClusterPool a ClusterClaim is associated with by checking the spec.clusterPoolName entry in the ClusterClaim object via oc get ClusterClaim <cluster-claim-name> -n <namespace> -o json | jq '.spec.clusterPoolName'.

To delete a ClusterPool:

  1. oc login to the OCM/ACM/Hive cluster where you created ClusterPools
  2. cd clusterpools and run delete.sh (named for the oc command it will leverage)
  3. Follow the prompts; the script will guide you through locating and deleting your ClusterPool.

ClusterClaims

Claiming a Cluster from a ClusterPool (Creating a ClusterClaim)

To claim a cluster from a ClusterPool:

  1. oc login to the OCM/ACM/Hive cluster where you created your clusterpools
  2. cd clusterclaims and run apply.sh (named for the oc command it will leverage throughout)
  3. Follow the prompts; the script will guide you through all of the configuration, claim creation, and credentials extraction.

You may also consider defining a series of environment variables to "fully automate" the creation of new ClusterClaims once you have one claim under your belt. The prompts in apply.sh will note which environment variable can be defined to skip a given step, but here's a full list for convenience (a worked example follows the list):

CLUSTERPOOL_TARGET_NAMESPACE - namespace containing the ClusterPool you want to claim from (ClusterClaims are created in the same namespace as their pool)
CLUSTERCLAIM_NAME - chosen name for the ClusterClaim, must be unique and not contain `.`
CLUSTERPOOL_NAME - your chosen name for the clusterpool
CLUSTERCLAIM_GROUP_NAME - RBAC group to associate with the ClusterClaim
CLUSTERCLAIM_LIFETIME - lifetime for the cluster claim before automatic deletion, formatted as `1h2m3s` omitting units as desired (set to "false" to disable)
CLUSTERCLAIM_AUTO_IMPORT - set to "true" if you want the cluster imported into Advanced Cluster Management (defaults to "false"; no prompting)
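
For example, a fully non-interactive claim might look like the following (all values are placeholders):

export CLUSTERPOOL_TARGET_NAMESPACE=my-namespace
export CLUSTERPOOL_NAME=my-pool
export CLUSTERCLAIM_NAME=my-claim
export CLUSTERCLAIM_GROUP_NAME=my-team
export CLUSTERCLAIM_LIFETIME=12h
export CLUSTERCLAIM_AUTO_IMPORT=false
cd clusterclaims && ./apply.sh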

Note: If you find that the above list does not fully automate ClusterClaim creation, then we made a mistake or need to update the list! Please let us know via a GitHub issue or contribute a patch!

Configuring Your subjects List Correctly

We cover how we use RBAC Groups and ServiceAccounts in their own documents, but we'll briefly go over the subjects list in the ClusterClaim object here, as it pertains to the use of lifeguard.

The subjects array in the ClusterClaim defines a list of RBAC entities that should be granted access to the claimed cluster and its related resources, including the ClusterDeployment, user/password Secrets, and kubeconfig Secret. Hive will automatically grant the RBAC entities in this list access to the claimed cluster's resources. lifeguard handles the population of this list by:

  • Automatically detecting if the user who called lifeguard is a ServiceAccount type user and adding that ServiceAccount to the subjects array
  • Prompting the user to enter an RBAC Group to add to the subjects array - this is usually used to add the user's own RBAC group (in our case, the user's team) to the claim.

You can also add individual users, ServiceAccounts, and all ServiceAccounts in a given namespace to a claim, but these operations aren't exposed in lifeguard yet. We're open to requests and contributions!
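
For reference, a populated subjects list on a ClusterClaim looks roughly like the following (all names are placeholders), which you can inspect with oc get clusterclaim.hive <cluster-claim-name> -n <namespace> -o yaml:

spec:
  clusterPoolName: my-pool
  subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: my-team
  - kind: ServiceAccount
    name: my-automation-sa
    namespace: my-namespace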

If you see the following error, you likely have a misconfigured subjects array and haven't been granted permissions to read your claimed cluster's credentials:

Error from server (Forbidden): secrets "<clusterdeployment-name>-<identifier>-admin-password" is forbidden: User "<username>" cannot get resource "secrets" in API group "" in the namespace "<clusterdeployment-name>"

Getting Credentials for a Claimed Cluster

apply.sh will extract the credentials for the cluster you claimed and tell you how to access those credentials, but if you have a pre-existing claim, we have a utility script to handle just credential extraction.

To extract the credentials from a pre-existing claim:

  1. oc login to the OCM/ACM/Hive cluster where your ClusterClaim resides
  2. cd clusterclaims and run get_credentials.sh (named for the oc command it will leverage throughout, get)
  3. Follow the prompts; the script will guide you through the credential extraction (see the example below for a non-interactive run).
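
As the prompts themselves note, this script can also be driven non-interactively by exporting the relevant variables first (values are placeholders):

export CLUSTERPOOL_TARGET_NAMESPACE=my-namespace
export CLUSTERCLAIM_NAME=my-claim
cd clusterclaims && ./get_credentials.sh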

Reconciling Credential directories for Claimed Clusters

After creating multiple claims, you'll find that the local credential directories grow in number and some are no longer relevant. Additionally, there may be claims created by your team that you want to access. To clean up and reconcile local claims with the remote claims on the cluster, you can run reconcile_claims.sh.

Reconciling claims will:

  1. Optionally clean out the clusterclaims/backup/ directory
  2. Move claim folders not on the remote cluster to a clusterclaims/backup/ directory (if there are duplicate names, they will be overwritten)
  3. Re-fetch all claims found remotely using get_credentials.sh

Set export RECONCILE_SILENT="true" to bypass cleaning up the backup directory and automatically force all moves (remove and replace any duplicate backups).
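
For example, to run the reconcile non-interactively from automation (assuming you are already logged in via oc login):

cd clusterclaims
export RECONCILE_SILENT="true"
./reconcile_claims.sh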

Destroying a ClusterClaim and the Claimed Cluster

Note: Deleting a ClusterClaim immediately deletes the cluster that was allocated to the claim. You can find the namespace of the claimed cluster via oc get clusterclaim.hive <cluster-claim-name> -n <namespace> -o json | jq -r '.spec.namespace'.

To delete a ClusterClaim:

  1. oc login to the OCM/ACM/Hive cluster where your claim resides
  2. cd clusterclaims and run delete.sh (named for the oc command it will leverage)
  3. Follow the prompts; the script will guide you through locating and deleting your ClusterClaim.

lifeguard's Issues

Failed to get credentials for ready ClusterClaim

get_credentials.sh failed to fetch my credentials even though my cluster was ready.

% ./get_credentials.sh
* Testing connection
* Using cluster: https://api.collective.aws.red-chesterfield.com:6443
* 
* Using app
* Using: kevin
Cluster is not ready, current state: [Pending: :] [Hibernating: False:Running] [Unreachable: False:ClusterReachable]
Unable to extract credentials until cluster is claimed and ready.

The entire status property is missing on my ClusterClaim. Any idea if this could also occur when the ClusterClaim is not ready? I changed the condition to check for == "True" instead of != "False" in the attached PR, and that fixed my issue. I changed it for both the conditions that come from the ClusterClaim and those that come from the ClusterDeployment, but perhaps it's only the ClusterClaim that has this issue. If the fix is OK, we should make the same change in apply.sh as well.
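
A hypothetical sketch of the kind of change described (the variable name is made up; this is not the script's exact code):

# before: an absent/empty condition status fell into the "not ready" branch
if [[ "$PENDING_STATUS" != "False" ]]; then
  echo "Cluster is not ready" && exit 1
fi
# after: only an explicit "True" is treated as "not ready"
if [[ "$PENDING_STATUS" == "True" ]]; then
  echo "Cluster is not ready" && exit 1
fi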

When running apply.sh or get_credentials.sh on OSX, base64 decode fails

The base64 option for decode (-d) isn't supported on OSX. However, --decode is supported on both RHEL and OSX.
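
A portable workaround is to spell out the long option, which both GNU base64 (RHEL) and BSD base64 (OSX) accept, as the usage output below also shows. $ENCODED_VALUE is just a placeholder here:

# portable on both Linux and macOS
echo "$ENCODED_VALUE" | base64 --decode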

clusterclaims (main) $ ./get_credentials.sh 
* Testing connection
* Using cluster: https://api.collective.aws.red-chesterfield.com:6443
* 
        NAME    STATUS
(1)     demo    Active
(2)     leads   Active
- note: to skip this step in the future, export CLUSTERPOOL_TARGET_NAMESPACE
Enter the number corresponding to your desired Project/Namespace from the list above: 1
* Using demo
        NAME                AGE
(1)     dario-testing-461   17d
(2)     mdelder-golf        45h
- note: to skip this step in the future, export CLUSTERCLAIM_NAME
Enter the number corresponding to ClusterClaim you want to claim a cluster from: 2
* Using: mdelder-golf
* Cluster openshift-v461-XXX online claimed by mdelder-golf.
base64: invalid option -- d
Usage:  base64 [-hvD] [-b num] [-i in_file] [-o out_file]
  -h, --help     display this message
  -D, --decode   decodes input
  -b, --break    break encoded string into num character lines
  -i, --input    input file (default: "-" for stdin)
  -o, --output   output file (default: "-" for stdout)
base64: invalid option -- d
Usage:  base64 [-hvD] [-b num] [-i in_file] [-o out_file]
  -h, --help     display this message
  -D, --decode   decodes input
  -b, --break    break encoded string into num character lines
  -i, --input    input file (default: "-" for stdin)
  -o, --output   output file (default: "-" for stdout)
{
  "username": "",

Errors are printed to STDOUT instead of STDERR

All the scripts seem to use printf to print errors to stdout before exiting. For example:

https://github.com/open-cluster-management/lifeguard/blob/main/clusterpools/apply.sh#L38

printf "${RED}Unable to create AWS Credentials Secret. See above message for errors.  Exiting."
exit 3

I think as a standard practice, errors should be printed to stderr.
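
For example, the snippet above could be sent to stderr by appending >&2 to the printf:

printf "${RED}Unable to create AWS Credentials Secret. See above message for errors.  Exiting." >&2
exit 3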

I am invoking some of these scripts from other scripts, so in some cases I need to suppress the output or process the output for logging, etc. This made it difficult to debug because execution was stopped when your script exited with an error code, but I did not have the benefit of seeing the output saying that I was missing gsed.

I am happy to create a PR for this, but I wanted to check first if you are in agreement with this!
