suse / doc-cap Goto Github PK

View Code? Open in Web Editor NEW

11.0 24.0 9.0 5.89 MB

SUSE Cloud Application Platform Documentation

Home Page: https://documentation.suse.com/suse-cap/

documentation suse enterprise container cloud-foundry kubernetes

doc-cap's Introduction

SUSE Cloud Application Platform (CAP) Documentation

This is the source for the official SUSE Cloud Application Platform (CAP) Documentation

Released versions of the documentation will be published at https://documentation.suse.com/ once available.

Branches

On Jan 29, 2020, we changed to a new branching model. We have switched the default branch from develop to master.

Use the master branch as the basis of your commits/of new feature branches.
The develop branch has been deleted on the server. Do not to push to the develop branch. Your changes may go lost in the medium term and never make it to the proper branch.

How to Update Your Local Repository

If you created a local clone or GitHub fork of this repo before Jan 29, 2020, do the following:

Make sure that your master and develop branches do not contain any important changes. If there are changes on either branch, export them using git format-patch or put them on a different branch.
Go to the master branch: git checkout master .
To pull the latest changes from the remote repository and to delete references to branches that do not exist anymore on the server, run the following command: git pull --prune .
Delete your local develop branch: git branch -D develop.
To check for stale local branches, run: git branch -v. For any branches marked as [gone], check if you still need them. If not, delete them: git branch -D BRANCHNAME

Table 1. Overview of important branches

Name	Purpose
`master`	doc development (latest development version)
`maintenance/*`	maintenance for released versions

Contributing

Thank you for contributing to this repo. When creating a pull request, please follow the guidelines below:

If you want to contribute to the most recent release, create your pull request against the master branch (not develop). The master branch is protected.
If you want to contribute to a previous release, please create your pull request against the respective maintenance/* branch. These branches are also protected.
Make sure all validation (Travis CI) checks are passed.
For your pull request to be reviewed, please tag the relevant subject matter expert(s) from the development team (if applicable) and members of the documentation team.
Implement the required changes. If you have any questions, ping a documentation team member in #susedoc on RocketChat.
For help on style and structure, refer to the Documentation Styleguide.

Editing DocBook

To contribute to the documentation, you need to write DocBook.

You can learn about DocBook syntax at http://docbook.org/tdg5/en/html .
SUSE documents are generally built with DAPS (package daps) and the SUSE XSL Stylesheets (package suse-xsl-stylesheets).

Install the documentation environment with the following command:

sudo /sbin/OneClickInstallUI https://gitlab.nue.suse.com/susedoc/doc-ymp/raw/master/Documentation.ymp

Basic daps usage:
- $ daps -d DC-<YOUR_BOOK> validate: Make sure what you have written is well-formed XML and valid DocBook 5
- $ daps -d DC-<YOUR_BOOK> pdf: Build a PDF document
- $ daps -d DC-<YOUR_BOOK> html: Build multi-page HTML document
- $ daps -d DC-<YOUR_BOOK> optipng: Always optimize new PNG images
- Learn more at https://opensuse.github.io/daps

doc-cap's People

Stargazers

Watchers

Forkers

troytop alexjh adfinis-forks hairmare btat mudler tubbz-alt ayaz345

doc-cap's Issues

Creating and installing offline (cached) buildpacks

Most of the creating part is documented in this repo:
https://github.com/SUSE/cf-buildpack-packager-docker

For installing: https://docs.cloudfoundry.org/buildpacks/custom.html#share-buildpack-package

Architecture diagram

We need a diagram explaining the architecture of CAP. We have some material but it needs to be adapted to what is suitable for customers needs.

App Autoscaler

We'll be adding the App Autoscaler into 1.2, though it'll be considered experimental at this stage. This ties into an upstream project and will be off by default.

QA still need to test this but we can start documenting what we have whilst then pointing upstream.

Trello: https://trello.com/c/3wEpZBN4/641-8-implement-app-autoscaler-non-ha
Upstream: SUSE/scf#1594

Changing cluster sizing and configuration with helm upgrade (to same version)

Change for "Backup and Restore" - Buildpacks

With cf-backup-plugin-1.0.7 and 1.0.8-pre apps don't fail, but simply need the custom buildpack to be in place before they are restored. I also thought that for buildpacks itself it's not about backup/restore, but more about managing them - e.g. restoring an app using a custom buildpack of a newer version should work fine.

Therfore we should replace:

"Buildpacks are not saved. Attempts to restore applications using buildpacks not available on the target SUSE Cloud Foundry instance will fail. Saving and restoring buildpacks has to be performed separately, and restoration completed before the backup plugin is invoked."

by something like:

"Buildpacks are not saved. Applications using e.g. custom buildpacks not
available on the target SUSE Cloud Foundry instance will not get restored.
Custom buildpacks have to be managed separately, and relevant buildpacks
need to be in place before affected applications are restored."

LDAP integration

@cornelius reported in bsc 1079615:

We need to document how to integrate with an existing LDAP for user authentication.

One part is:

Documentation for mapping the users from an external LDAP group (specially the initial IT ops ids) to CF roles (admin, orgManager) This is a common activity after setting un a customer environment and a bit convoluted step.

Added support for structured properties such as requiring client certs in the gorouter

We have added a new property ROUTER_TLS_PEM to allow client certificates to be added in the gorouter: SUSE/scf#1613
Context: https://trello.com/c/ttQFiiL2/723-we-need-to-be-able-to-require-client-certs-in-the-gorouter

Key point:

ClientSSL: Please describe how the gorouter enforces two-way-ssl (i.e. how to "require" ClientSSL certifikates). Please describe how to configure the TrustStore of those certificates.

ServerSSL: Please describe how to chain multiple certificates together to terminate SSL for multiple domains.

Expressed as stories, these would be:

As a CAP operator
I want to enable validation of client certificates in TLS handshakes with clients as described in https://docs.cloudfoundry.org/adminguide/securingtraffic.html#gorouter_mutual_auth
so that end users can run applications that are only accessible to clients with specific valid certs

As a CAP operator
I want to be specify multiple signed/issued SSL certificates for the various domains hosted on my particular CAP cluster so that end users HTTPS connections terminate SSL at the CF router for multiple domains

This is part of a new internal functionality tied to structured properties now enabled within SCF. This isn't customer facing at all but we may be adding more functionality into this sphere depending on customer requests.

Addendum for CAP running on CaaSP 3

Context: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
Internal bug: https://bugzilla.suse.com/show_bug.cgi?id=1097668

CaaSP 3 introduces PodSecurityPolicy (PSP) support (SUSE/caasp-salt#428). This change adds 2 PSP's:

unprivileged (Default assigned to all users)

The unprivileged PodSecurityPolicy is intended to be a
reasonable compromise between the reality of Kubernetes workloads, and
suse:caasp:psp:privileged. By default, we'll grant this PSP to all
users and service accounts.

privileged

The privileged PodSecurityPolicy is intended to be given
only to trusted workloads. It provides for as few restrictions as possible
and should only be assigned to highly trusted users.

Currently, all the pods are created using the default serviceAccount in their namespace so they have the unprivileged psp applied. For example, this means the nfs-broker can't create pods because privileged mode and privilege escalation are disabled by default (error: cannot set allowPrivilegeEscalation to false and privileged to true).

For CaaSP, we'll need to create our own pod policy that suits our needs.

QA CSS created a slightly different PSP as a workaround that is applied when creating the namespaces. It is based on the unprivileged PSP; the only difference is that it enables the privileged mode and the privilege escalation. This would go into a file called cap-psp-rbac.yaml:

---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: suse.cap.psp
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  # Privileged
  #privileged: false      	<<< default in suse.caasp.psp.unprivileged
  privileged: true
  # Volumes and File Systems
  volumes:
    # Kubernetes Pseudo Volume Types
    - configMap
    - secret
    - emptyDir
    - downwardAPI
    - projected
    - persistentVolumeClaim
    # Networked Storage
    - nfs
    - rbd
    - cephFS
    - glusterfs
    - fc
    - iscsi
    # Cloud Volumes
    - cinder
    - gcePersistentDisk
    - awsElasticBlockStore
    - azureDisk
    - azureFile
    - vsphereVolume
  allowedFlexVolumes: []
  allowedHostPaths:
    # Note: We don't allow hostPath volumes above, but set this to a path we
    # control anyway as a belt+braces protection. /dev/null may be a better
    # option, but the implications of pointing this towards a device are
    # unclear.
    - pathPrefix: /opt/kubernetes-hostpath-volumes
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  #allowPrivilegeEscalation: false	   <<< default in suse.caasp.psp.unprivileged
  allowPrivilegeEscalation: true
  #defaultAllowPrivilegeEscalation: false  <<< default in suse.caasp.psp.unprivileged
  # Capabilities
  allowedCapabilities: []
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: false
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unsed in CaaSP
    rule: 'RunAsAny'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: suse:cap:psp
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['suse.cap.psp']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cap:clusterrole
roleRef:
  kind: ClusterRole
  name: suse:cap:psp
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: default
  namespace: uaa
- kind: ServiceAccount
  name: default
  namespace: scf
- kind: ServiceAccount
  name: default
  namespace: stratos

As a note, it would be a good practice to explicitly set allowPrivilegeEscalation to true in the charts so the defaultAllowPrivilegeEscalation can be set to false in the PSP. This way, it gives a finer security control on what the pods are authorized to do and restrict the default authorizations.

For CaaSP 2 users, they can implement this workaround already since PSP is optional.

Document how to run CAP on non-CaaSP Kubernetes systems

@cornelius reported in bsc 1072943:

CAP runs not only on CaaSP but on any Kubernetes which is matching a few requirements on available features. We should at least document that this is possible.

Document Buildpacks

@cornelius, @troytop reported in bsc 1072939:

Create a 'Buildpacks' section which describes installation and management of buildpacks. Offline buildpacks is only one component, we should also point out (and point at upstream docs) custom buildpacks, updating (or downgrading) buildpacks, and buildpack order (i.e. the detect heuristics)

As we are shipping only online buildpacks which download dependencies at runtime most operators of CAP will want to create offline buildpacks which can be used behind the firewall without access to the Internet. We need to document how to do this. There is some upstream documentation for this.

How to control distribution of services and pods across nodes

How to ensure that an even distribution of critical and workload pods are distributed across the kubernetes/CaaSP nodes or in some cases pinned to a particular Availability/Placement zone.
For example we should have the Diego-cell pods evenly distributed across AZs(min=3) and sized so that one AZ failure appropriately evicts an places the workloads on the remaining 2 as evenly as possible to 'fit' without application failures. Also the critcal router pods and mysql dbs should have restart in pods that have the best performance and load balanced across the AZs.

2.11 upgrade needs additional helm command line option

SUSE/cf-ci#147 shows how we do this in the QA upgrade pipeline.

Upgrading from 2.10 needs a --recreate-pods option given to the helm upgrade cf command. This should only be needed for 2.11. Upgrading from 2.11 to a future version shouldn't need this because all of the pods should always be in a ready state.

How to set up a proxy registry

For installing CAP customers need to download container images on all nodes which run CAP components. To avoid having to download the same images multiple times a proxy registry caching images is required.

It also solves part of the use case where the system where CAP is set up doesn't have access to the Internet and because of this can't directly access the SUSE registry where the images are coming from. The proxy registry still would need to have access to the SUSE registry in this case, though.

We need to document how to set this up.

The terminology is a bit confusing here. We use "proxy registry" to describe a pull-through cache, i.e. a registry which either delivers the requested image if it already has it or first pulls it from another registry. Their also is the term mirror which sometimes is used in the same context. This would mean a registry which is actively populated with all images even before any of them is requested. In the upstream terminology and implementation these terms are mixed.

Helm upgrade commands need the `--recreate-pods` flag

We need to document the need for the --recreate-pods when doing helm upgrade commands.

Important: This only applies to upgrades from one version to another.

Clean up diagrams

Diagrams are fuzzy. try Inkscape SVG

Document connecting to the Azure service broker.

We've tried out CAP with the built-in Azure service broker. The steps on its usage noted here, though we have to properly summarize the information as most of this written during discovery: https://trello.com/c/sCzODvSj/716-3-test-and-document-azure-service-broker

It is important to note that support for the Azure service broker itself is handled by Microsoft, not SUSE.

Stratos 2.0.0 released

paas-devel: "The Stratos team is pleased to announce Stratos 2.0.0.

Stratos 2.0.0 is a major release and features a new version of the front-end UI built using the latest Angular technology stack and sporting a Material Design based UX. The UI layout has been improved and a number of improvements and new features added.

Both Upstream and SUSE CAP releases are available. The SUSE CAP release will be mirrored to the CAP registry and the Helm chart published as soon as we get QA sign-off.

Upstream

This is built from the Cloud Foundry GitHub repository (https://github.com/cloudfoundry-incubator/stratos) and is the Community/upstream release of Stratos. Full details are here:

https://github.com/cloudfoundry-incubator/stratos/releases/tag/2.0.0

You can install this release using Helm from our upstream Helm repository by following the instructions here:

https://github.com/cloudfoundry-incubator/stratos/tree/v2-master/deploy/kubernetes

You can, of course, upgrade from a previous 1.x or 2.x RC/Beta version using Helm - see here: https://github.com/cloudfoundry-incubator/stratos/tree/v2-master/deploy/kubernetes#upgrading-your-deployment.

SUSE CAP

The SUSE CAP release is built from the SUSE/stratos GitHub repository (https://github.com/SUSE/stratos) and is a fork of upstream with a couple of changes - mainly a slightly different theme, a copyright message and inclusion of the EULA. In addition, images are built on top of SLE rather than OpenSUSE (which is used for the upstream release).

You can also install this release using Helm. You will need to download the Helm chart from here:

https://github.com/SUSE/stratos-cap/releases/download/2.0.0-cap/console-helm-chart-2.0.0-cap.tgz

When installing, you will need to provide credentials for the internal Docker registry.

I have created a PR in the kubernetes-charts-suse-com GitHub repository for the Helm chart changes for this release - this will not work until the images have been signed and copied to the public registry.

As soon as QA have signed off, I will mirror the images to the CAP registry and merge the Helm chart PR."

Document DNS requirements, with examples

Currently, the Deployment Guide handwaves away name services. We must provide authoritative requirements, and stop evading with xip.io and nip.io.

The template for postgres usb deployement has wrong indentation

@greygoo reported in bsc 1100388:

When trying to deploy postgres following the instructions at http://docserv.nue.suse.com/documents/CAP_1/cap-deployment/single-html/#sec.cap.configure-postgresql I realized that the template for usb-config-values.yaml does contain a wrong indentation.

This is the snippet from the docs, I marked the lines which are not correctly indented with an "X":

----------------------------------snip--------------------------------------

env:
  # Database access credentials
   SERVICE_POSTGRESQL_HOST: postgres.example.com
   SERVICE_POSTGRESQL_PORT: 5432
   SERVICE_POSTGRESQL_USER: pgsql-admin-user
   SERVICE_POSTGRESQL_PASS: pgsql-admin-password
  # The SSL connection mode when connecting to the database.  For a list of
  # valid values, please see https://godoc.org/github.com/lib/pq
X SERVICE_POSTGRESQL_SSLMODE: disable
  
  # CAP access credentials, from your original deployment configuration 
  # (see Section 2.4, “Configuring the SUSE Cloud Application Platform Production Deployment”)
  CF_ADMIN_USER: admin
  CF_ADMIN_PASSWORD: password
  CF_DOMAIN: example.com
  
  # Copy the certificates you extracted above, as shown in these
  # abbreviated examples, prefaced with the pipe character
  
  # SCF cert
  CF_CA_CERT: |
   -----BEGIN CERTIFICATE-----
   MIIE8jCCAtqgAwIBAgIUT/Yu/Sv4UHl5zHZYZKCy5RKJqmYwDQYJKoZIhvcNAQEN
   [...]
   xC8x/+zT0QkvcRJBio5gg670+25KJQ==
   -----END CERTIFICATE-----
   
   # UAA cert
X  UAA_CA_CERT: |
    -----BEGIN CERTIFICATE-----
    MIIE8jCCAtqgAwIBAgIUSI02lj0a0InLb/zMrjNgW5d8EygwDQYJKoZIhvcNAQEN
    [...]
    to2GI8rPMb9W9fd2WwUXGEHTc+PqTg==
    -----END CERTIFICATE-----
   SERVICE_TYPE: postgres   
    
Xkube:
X organization: cap
X registry: 
X   hostname: "registry.suse.com"
X   username: ""
X   password: ""

----------------------------------snip--------------------------------------

This would be a correctly indented config:

----------------------------------snip--------------------------------------

env:
  # Database access credentials
  SERVICE_POSTGRESQL_HOST: 172.24.44.151.xip.io
  SERVICE_POSTGRESQL_PORT: 5432
  SERVICE_POSTGRESQL_USER: pgsql-admin-user
  SERVICE_POSTGRESQL_PASS: postgres
  # The SSL connection mode when connecting to the database.  For a list of
  # valid values, please see https://godoc.org/github.com/lib/pq
  SERVICE_POSTGRESQL_SSLMODE: disable
  
  # CAP access credentials, from your original deployment configuration 
  # (see Section 2.4, “Configuring the SUSE Cloud Application Platform Production Deployment”)
  CF_ADMIN_USER: admin
  CF_ADMIN_PASSWORD: password
  CF_DOMAIN: 10.86.1.13.xip.io
  
  # Copy the certificates you extracted above, as shown in these
  # abbreviated examples, prefaced with the pipe character
  
  # SCF cert
  CF_CA_CERT: |
    -----BEGIN CERTIFICATE-----
    MIIE8jCCAtqgAwIBAgIUK6y/yKkVbYbfvWXXcFCX6zF5iiowDQYJKoZIhvcNAQEN
    BQAwETEPMA0GA1UEAxMGU0NGIENBMB4XDTE4MDcwNjA3MjkwMFoXDTQ4MDYyODA3
    MjkwMFowETEPMA0GA1UEAxMGU0NGIENBMIICIjANBgkqhkiG9w0BAQEFAAOCAg8A
    MIICCgKCAgEAzewG2q7g8bNF5TrmeaP3rLaiGDqi5YanszFK69RhHFivtV9zHnbj
    YwAbCe7BuY8oFkEmgYdwscRgX4IG/BhtaAtv9Bvkmt668IbNsvHlgXkHeTYz6fWR
    d5mJTnyIIOaguYW97WfoO6tecdJvXzhvzPWkzZTV6+fOSjlGtRk9o3WhAclA03l7
    63+KCXqTHmWU5iGzdOFMEXvyzgkAVP4Mby4dVWIN+TAO/RyX0dxaBC2enNYxUadW
    LwrE2rJ323/kT5dzITzdWPHmMBiYHomgQrXtcbfWblmYjGzF59hotfNEuYn/rBUW
    ub1J2QLA3rSIISqafhaqjYMj5WInnBWypY+yZr8qnrmUNruICDkwkpacbbmZGUyE
    UJhp/iI+YSiwVyX7BWN3uvtnbmGyh+NVUqvpB2fS6o4tigiVBfy+2hy6RLoWTFew
    UDhxNGBC4uuPxBg/6T4JrToflG9bXGaz3OP7eko41vqWbg1uNuNi6Zxn5DuAwZb1
    g+Tj8mpUZKNdVBJCF0l+aKU4mJZvSOtOWKliG5r0hstZ2T6eUR+g2VcuL2+XG/++
    /kVR/FNhOyWjzAAXp5nx/B9ZWfatLoh0bPXzxBWAMTbYLr91U/HIO+5vcZgd5xAo
    yj9isUY9c8rRuYdGpLhenwH4yvxT6kDxgoFkYHRIVl8gfaZB/7JmWV0CAwEAAaNC
    MEAwDgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFKob
    g9okSlD3UhrVIv+QXGLzGfIAMA0GCSqGSIb3DQEBDQUAA4ICAQB2BKWdmq1YCiMK
    F/RDXf/AytC4Xn21nYltxM95LyS0sueoMv2aFj4DD9ujgpzhpLBg/6j0KdWrNxzR
    GJ8ot+4ySRqI4ymJLu2cxj6254uzLvPIT6C2oHXrsZ/jpaOdJAy1tlloDyM73rNC
    GoaJ8xRdzqN4WCin8aDryAxtKMULKDf5CpVFCrjEV3Cp2IpXJXk1KkOsTTBJhY4o
    s3QaVcFkya0wtEXFiDhORDcIeK5t7s0oznhj5ezcMlu7CFmBqbO+q7CTRbldJEhj
    rbQirJrJ9RA4LfF/HVYiw6KQ3On4dd1hMRJnfBVz9TkMWhIqnA6na2265qQfuHPS
    TMRWgHjeIUo8d6kPADwybwE5vAcJGhNTYuBVprXWoGWpDKWQoStEXrXYxYGOjYbv
    f9JTQQoL7C7ZlzHEak6m9QPJI4zd9kJIyh0SZumTatEmo6yVBIrhVgT0q0oCV/1x
    QAiIKPIo/Mc8rW/iwBLU0vRUs2MvM+iqN8xsuHfrMdMfmZ48KlUrG/R+30UakA6I
    wdM45jMDWMvoCKbV1C+f1E9zeptaSAuAM6/SGPYaI8zxZiCb+eErM1t5U5XiAYnW
    bGYpmW1FAyfnJLXPz6LBshle2NvVxRtA/5pFDj6jzAE5yX5RL8ivW0zTLa/os4Ll
    oA2yHX7/uLZ7Dz3hNQnlHg5UE402Fw==
    -----END CERTIFICATE-----
   
  # UAA cert
  UAA_CA_CERT: |
    -----BEGIN CERTIFICATE-----
    MIIE8jCCAtqgAwIBAgIUMvKiDUPnkZzwU3NgfvXJpgmcUDwwDQYJKoZIhvcNAQEN
    BQAwETEPMA0GA1UEAxMGU0NGIENBMB4XDTE4MDcwNjA3MjQwMFoXDTQ4MDYyODA3
    MjQwMFowETEPMA0GA1UEAxMGU0NGIENBMIICIjANBgkqhkiG9w0BAQEFAAOCAg8A
    MIICCgKCAgEAoqJMj3i/CU3useDnrPOKURkcFZRhUiBzM2h73bs7Kof3qK+vTQpr
    e4rTf2MIsXds4+GYxdDCI2HbitiXQZkZS1n3uYurRuBMwn7mJpHTTu6VdBy68wQA
    U5AtvjIYl3N3BSJ8beVc3s/NjGgG1gOnZ0nKjrEl+sxALulQAM6ujisJ5M9dVvzU
    ydyMGsC+KBS8IoR37wByQ34XL+SMdw+WVNzAXn0BpLLxM8Owzz4CAjDTW4c2BSVo
    CCcU+/TG+Lu5CsgrqBGyhPBSiupbhAPQO0N/+kV5xyRA5NTnhqq8rtRuXqrgwY9T
    Mo3Lv4lYPP3YhaMyxZ+BTQR81/FqL/e5Mn2T1EmHj5yHaVmZtUGivjpIu83Ho+Ie
    /VUhjk0ZHoN3Qv2/etu//i37P1obyUrjfPZgkdGShK3N/Vpw+3ELT+CHUJpcPWn1
    DoNLtI4UKjQnh1XfudZKSB4bU+rgqZVmGsAMyrMFwKD/KlPSf9FmLe3JiEjdlzvh
    Yh6vcuJE5X0WqYypVQzq3Rji+erva1Vg8gf9kYmShtAbNVx8NO7Jat9WeqbgyFVz
    dNhHQqG/0d4RhsVChRUCHnhKEPWvRZ/bW4TSQmK8wqxVJBCPC+hZ436BrdEnWI/o
    wuS6h3XJEE6JMHYmcdX88gXP1ij8qs4DqjG0Geh77MrGMfBJnjR49X0CAwEAAaNC
    MEAwDgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFBAe
    OdcaZWm7KUMUbmfelqXGJ+XGMA0GCSqGSIb3DQEBDQUAA4ICAQAhu5bpQQK2P8in
    Al/hKmW/plOtJEw94aBprPYkSJojPaw2AEGXz1IelW8SBYTaAfZJGRJwQf/L+Rfq
    eV4EXN82zQz3SEqjk2xZ4Rg/YAXPkmPfS8az4Acb/ONtipfp4M3jrcmiKSoekbF2
    MNkfO+Xng9EAEUg0klZFu+tfGcMSGV7q2vWiYF80FQKQyJ6U8QwgA0HB3SBuSQ/H
    lVhQr9E8ohCDx7DurW9vK18OhpJHFoQlivfer/lBp87ofXQgeFu7HSEN3wyjSPB2
    zaCmw0MjeYlX5dtQA6F1e7zy6M3XhnWaq3UtFAb36AutJgIQq4PkCuWUp2NcwlbB
    xk2QNssfV/doxhguURgk8wQP04Wn/jdR1wgyiZKXPcxuLBkWJ2o58cf3paVq7lL+
    Cau/y9ODC3UTUg4NBsQ8bkIE3GaS9wTF0WXy2wlnnXr8TqGNXCq+gPniSq2vzugi
    qZ+9cm3yA2V0AN0wTaKBVtZynrStsl8HTCRcQYCC4EZoco4+lvxVVEimRLp4mOJD
    pYRoBvliBZQTtYhAxeK3fG2cH+jkS7OW9ML4oF0ObiIh2CLnZOXMU4ZRAIh2ZeKe
    k29e7yrAHds4C/RphDtOPV1/QNIEBGXDodxPB++V4bnDsfXEx8MsIzuy4+OXKXKl
    d6ccko7iKHt1WoDKWzsz3Ey1U66SMg==
    -----END CERTIFICATE-----
  SERVICE_TYPE: postgres   
    
  kube:
    organization: cap
    registry: 
      hostname: "registry.suse.com"
      username: ""
      password: ""

----------------------------------snip--------------------------------------

Example HA Configuration - Wrong UAA HA sizing parameters

The current UAA HA sizing parameters for the uaa-sizing.yaml seem to be wrong and were probably copy and pasted from the SCF sizings.

The uaa-sizing.yaml yaml file should probably look similar to:

sizing:
  mysql:
    count: 3
  mysql_proxy:
    count: 3
  uaa:
    count: 3

However, I'm unaware of the actual recommended HA sizing parameters.

Custom certificates

It would be helpful to document how to use custom certificates. This should answer questions like:

How do I deploy customer provided/CA certs across the cluster for different scenarios of SSL termination?
What techniques/scripts are there to propagate and update these certificates are there?
Where do custom certificates go, where are they stored, in which format, what are the relevant configuration files?
How to setup certificates on an HA Proxy or Nginx acting as load balancer ahead of the routers?

Document load balancer setup on Azure

We need to document how to set up load balancers with CAP on Azure. There is some more info in Bugzilla.

Upgrading a non-High Availability Deployment to High Availability

2.14.1 Upgrading a non-High Availability Deployment to High Availability

http://docserv.nue.suse.com/documents/CAP_1/cap-deployment/single-html/#sec.cap.ha-prod-upgrade

Some backslashes are missing in:

helm upgrade suse/cf
--name susecf-scf
--namespace scf
--values scf-config-values.yaml
--values scf-sizing.yaml
--set "secrets.UAA_CA_CERT=${CA_CERT}"

should be

helm upgrade suse/cf
--name susecf-scf
--namespace scf
--values scf-config-values.yaml
--values scf-sizing.yaml
--set "secrets.UAA_CA_CERT=${CA_CERT}"

Changes to cf role composition for 1.2.1

We've changed some of the role names as we move more in line with what the Cloud Foundry folks are doing upstream in later versions, so the manifest file will require updates from earlier CAP versions.

database role combines the previous mysql and mysql-proxy roles
diego-locket role merged into diego-api
log-apirole combines loggregator and syslog-rlp roles
Renamed syslog-adapterrole to adapter
Removed processes list from all roles
Removed duplicate routing_api.locket.api_location property

How to configure log level

The following options are exposed in the suse/cf Helm chart. These can be set in the values.yaml file when deploying or updating CAP:

  # The log destination to talk to. This has to point to a syslog server.
  SCF_LOG_HOST: ~

  # The port used by rsyslog to talk to the log destination. It defaults to 514,
  # the standard port of syslog.
  SCF_LOG_PORT: "514"

  # The protocol used by rsyslog to talk to the log destination. The allowed
  # values are tcp, and udp. The default is tcp.
  SCF_LOG_PROTOCOL: "tcp"

  # The cluster's log level: off, fatal, error, warn, info, debug, debug1,
  # debug2.
  LOG_LEVEL: "info"

May need some supporting information about what is captured at each log level if we can get that information.

logging to external syslog server

#130
#99

Teardown instructions.

This isn't urgent but we should take note of what's required to remove SCF/UAA.

PR: SUSE/scf#1611
Instructions: SUSE/scf@e6a572a

Context: https://trello.com/c/tbcLZ8FR/693-5-document-how-to-clean-up-an-scf-uaa-installation

add general guidance on DNS configuration for load balancers [edit]

List CAP DNS requirements and brief overview of using Azure DNS

Prerequisites

7.1 Prerequisites

http://docserv.nue.suse.com/documents/CAP_1/cap-deployment/single-html/#sec.cap.prereqs-service-broker

Regarding the command for "Then apply this new PSP configuration before you deploy your new service broker sidecar with this command: "

Considering the objects of cap-psp-rbac.yaml are already created, using 'kubectl create' would issue an error saying it's already created so you have to use 'kubectl apply'.

With apply, kubectl will say the created objects with no modifications are "unchanged" and it will modify "applied" the objects you want to modify.

Production grade load balancers

@cornelius reported in bsc 1079611:

Document how to setup a 'production grade' load-balancer and then updating the configuration with the routers for the CF cluster. Listing the links for quick look up for 'production grade' load-balancers (e.g F5 and all common ones)

This might be related to setting up load balancers via helm bsc 1073096

Expand PSP information to cover sidecars

Some excitement today whereby we discovered that sidecars weren't working with CaaSP v3.

PodSecurityPolicy was the issue whereby we had to create rules so that the documented postgres-sidecar and mysql-sidecar namespaces were similar to what we have for scf, uaa and stratos-ui. Our main CaaSP section should be applied to the MySQL and Postgres sidecar sections to indicate that whatever namespace is used, it will need a PSP. Without the PSP, the sidecar will fail on any version of CAP installed on CaaSP v3 (or v2 if PSP is optionally used).

Bug discovered: https://trello.com/c/nnpMhl8S/754-usb-postgres-sidecar-fails-when-deploying-with-context-deadline-exceeded
Section of docs with current PSP info: https://www.suse.com/documentation/cloud-application-platform-1/singlehtml/book_cap_deployment/book_cap_deployment.html#sec.cap.caasp-3
Section of docs that need PSP info: https://www.suse.com/documentation/cloud-application-platform-1/singlehtml/book_cap_deployment/book_cap_deployment.html#sec.cap.configure-mysql (and Postgres)

Sizing guide

The CAP setup can be controlled by sizing information in the Helm configuration. This configuration controls how many instances of each pod are created and how much memory is required. We need a guide for operators to help them chose the right values based on the parameters of their requirements.

Sizing guide and syslog server

This issue replaces #130, #99, and #97

Sizing guide

#97

Test Storage Class Doc Error

Hi,

There is an indentation error in the test-storage-class.yaml example.
StorageClassName is not a member of requests but spec

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-sc-persistent
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
      storageClassName: persistent

Should be:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-sc-persistent
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hostpath

2.3 Test Storage Class

https://www.suse.com/documentation/cloud-application-platform-1/singlehtml/book_cap_deployment/book_cap_deployment.html#sec.cap.teststorageclass-prod

Document HA option

@cornelius reported in bsc 1096212:

There is an option in the config now to enable HA defaults so that you don't have to set options for all services separately. It's config.HA. We should document this as it makes an HA setup as simple as it can get.

HA config for cc-clock and syslog-scheduler

Document updated config to allow HA for cc-clock and syslog-scheduler

Unclear AKS introduction

@cornelius reported in bsc 1094675:

There are two things which appear to be unclear to me in the introduction to the AKS section (http://docserv.suse.de/documents/CAP_1/cap-deployment/single-html/#cha.cap.depl-azure):

It talks about setting up an un-managed Kubernetes on AKS. AKS is a managed Kubernetes so I think it would be better to just say how to prepare AKS for installation of CAP, and maybe explain that AKS is a managed Kubernetes service.

The mention of the Kube minion is a bit confusing. On one hand I think the terminology used by Kubernetes is "node" now, not "minion" anymore. And on the other hand it's not clear what this actually means for somebody installing CAP. Is this a production setup or not? Would a load balancer make it one?

Setting up Azure AKS for CAP

Just "saving" the commands here, so that they don't just live on some disks.

Below, the commands on which we should base our official documentation for Azure AKS.
Those are based on the blog from Thomas Conte (http://hypernephelist.com/2018/05/17/deploying-suse-cap-on-aks.html), but fine tuned and do work around Azure/AKS#412.

# Prequisites
# kubectl, curl, sed, awk and jq commands


#
# Set parameters for your Azure AKS Cluster

# Set Resource Group name
export RGNAME="cap-aks"

# Set Azure Location
export REGION="westeurope"

# Set Kubernetes Node Count (min. 3 for CAP)
export NODECOUNT="3"

# Set Virtual Machine Size to use
export NODEVMSIZE="Standard_D2_v2"

# Set your public ssh key associated with your Azure account
export SSHKEYVALUE="~/.ssh/id_rsa.pub"


#
# Create the Resource Group

az group create --name $RGNAME --location $REGION


#
# Create the AKS Instance
# If you don't change the name, it will be the same as the "Resource Group"

export AKSNAME=$RGNAME

az aks create --resource-group $RGNAME --name $AKSNAME --node-count $NODECOUNT --admin-username "scf-admin" --ssh-key-value $SSHKEYVALUE --node-vm-size $NODEVMSIZE


#
# Retrieve the kubectl credentials

mv ~/.kube/config ~/.kube/config.old

az aks get-credentials --resource-group $RGNAME --name $AKSNAME


#
# Check kubernetes access

kubectl get nodes
kubectl get pods --all-namespaces


#
# Identify and set the Cluster Resource Group to work with

export MCRGNAME=$(az group list -o table | grep MC_"$RGNAME"_ | awk '{print$1}')


#
# Enable swap accounting

vmnodes=$(az vm list -g $MCRGNAME | jq -r '.[] | select (.tags.poolName | contains("node")) | .name')

for i in $vmnodes
do
   az vm run-command invoke -g $MCRGNAME -n $i --command-id RunShellScript --scripts "sudo sed -i 's|linux.*./boot/vmlinuz-.*|& swapaccount=1|' /boot/grub/grub.cfg"
done

for i in $vmnodes
do
   az vm restart -g $MCRGNAME -n $i
done


#
# Create and configure a Public IP and Load Balancer

az network public-ip create --resource-group $MCRGNAME --name $AKSNAME-public-ip --allocation-method Static

az network lb create --resource-group $MCRGNAME --name $AKSNAME-lb --public-ip-address $AKSNAME-public-ip --frontend-ip-name $AKSNAME-lb-front --backend-pool-name $AKSNAME-lb-back

nicnames=$(az network nic list --resource-group $MCRGNAME | jq -r '.[].name')

for i in $nicnames
do
    az network nic ip-config address-pool add \
    --resource-group $MCRGNAME \
    --nic-name $i \
    --ip-config-name ipconfig1 \
    --lb-name $AKSNAME-lb \
    --address-pool $AKSNAME-lb-back
done


#
# Configure load balancing and network security rules

# Relevant ports for CAP
export CAPPORTS="80 443 4443 2222 2793"

# Relevant ports for CAP incl. Stratos Web Console
export CAPPORTS="80 443 4443 2222 2793 8443"

for i in $CAPPORTS
do
    az network lb probe create --resource-group $MCRGNAME \
    --lb-name $AKSNAME-lb \
    --name probe-$i \
    --protocol tcp \
    --port $i

    az network lb rule create --resource-group $MCRGNAME \
    --lb-name $AKSNAME-lb \
    --name rule-$i \
    --protocol Tcp \
    --frontend-ip-name $AKSNAME-lb-front \
    --backend-pool-name $AKSNAME-lb-back \
    --frontend-port $i \
    --backend-port $i \
    --probe probe-$i
done

nsg=$(az network nsg list --resource-group=$MCRGNAME | jq -r '.[].name')
pri=200

for i in $CAPPORTS
do
    az network nsg rule create --resource-group $MCRGNAME \
    --priority $pri \
    --nsg-name $nsg \
    --name $AKSNAME-$i \
    --direction Inbound \
    --destination-port-ranges $i \
    --access Allow

    pri=$(expr $pri + 1)
done


#
# Print the public and private IPs for later use

echo -e "\n Resource Group:\t$RGNAME\n \
Public IP:\t\t$(az network public-ip show --resource-group $MCRGNAME --name $AKSNAME-public-ip --query ipAddress)\n \
Private IPs:\t\t\"$(az network nic list --resource-group $MCRGNAME | jq -r '.[].ipConfigurations[].privateIpAddress' | paste -s -d " " | sed -e 's/ /", "/g')\"\n"

Sample scf-config-values.yaml file for CAP on Azure AKS

secrets:
    # Password for user 'admin' in the cluster
    CLUSTER_ADMIN_PASSWORD: changeme

    # Password for SCF to authenticate with UAA
    UAA_ADMIN_CLIENT_SECRET: uaa-admin-client-secret

env:

    # Domain for SCF. DNS for *.DOMAIN must point to the a kube node's (not master)
    # external ip. This must match the value passed to the
    # cert-generator.sh script.
    DOMAIN: <Public IP>.xip.io
    # UAA host/port that SCF will talk to. If you have a custom UAA
    # provide its host and port here. If you are using the UAA that comes
    # with the SCF distribution, simply use the two values below and
    # substitute the cf-dev.io for your DOMAIN used above.
    UAA_HOST: uaa.<Public IP>.xip.io
    UAA_PORT: 2793
    #Azure deployment requires overlay
    GARDEN_ROOTFS_DRIVER: "overlay-xfs"
kube:
    # The IP address assigned to the kube node pointed to by the domain. 
    external_ips: [<Private IPs>]
    storage_class:
        # Make sure to change the value in here to whatever storage class you use
        persistent: "default"
        shared: "shared"
    # The registry the images will be fetched from. The values below should work for
    # a default installation from the suse registry.
    registry:
       hostname: "registry.suse.com"
       username: ""
       password: ""
    organization: "cap"

    auth: none

Upgrading a non-High Availability Deployment to High Availability

2.14 Example High Availability Configuration
2.14.1 Upgrading a non-High Availability Deployment to High Availability

http://docserv.nue.suse.com/documents/CAP_1/cap-deployment/single-html/#sec.cap.ha-prod
http://docserv.nue.suse.com/documents/CAP_1/cap-deployment/single-html/#sec.cap.ha-prod-upgrade

If UAA is deployed in HA (2 in this examples), getting the SECRET returns duplicated values, hence, getting the CA_CERT will return an empty value:

$ SECRET=$(kubectl get pods --namespace uaa
-o jsonpath='{.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}')

$ echo $SECRET
secrets-2.10.1-1 secrets-2.10.1-1

$ CA_CERT="$(kubectl get secret $SECRET --namespace uaa
-o jsonpath="{.data['internal-ca-cert']}" | base64 --decode -)"

$ echo $CA_CERT

This can be fixed by adding a simple awk to get the first value:

$ SECRET=$(kubectl get pods --namespace uaa
-o jsonpath='{.items[*].spec.containers[?(.name=="uaa")].env[?(.name=="INTERNAL_CA_CERT")].valueFrom.secretKeyRef.name}' | awk {'print $1'})

Set up NFS Volume Service

We need procedural docs on how an operator sets the PERSI_NFS_* values in scf-config-values.yaml and then cf enable-service-access ... to configure the NFS volume service for user apps.

https://github.com/cloudfoundry/nfs-volume-release

Here are the values from the suse/cf helm chart with comments:

  # Comma separated list of white-listed options that may be set during create
  # or bind operations.
  # Example:
  # "uid,gid,allow_root,allow_other,nfs_uid,nfs_gid,auto_cache,fsname,username,password"
  PERSI_NFS_ALLOWED_OPTIONS: "uid,gid,auto_cache,username,password"

  # Comma separated list of default values for nfs mount options. If a default
  # is specified with an option not included in PERSI_NFS_ALLOWED_OPTIONS, then
  # this default value will be set and it won't be overridable.
  PERSI_NFS_DEFAULT_OPTIONS: ~

  # Comma separated list of white-listed options that may be accepted in the
  # mount_config options. Note a specific 'sloppy_mount:true' volume option
  # tells the driver to ignore non-white-listed options, while a
  # 'sloppy_mount:false' tells the driver to fail fast instead when receiving a
  # non-white-listed option."
  #
  # Example:
  # "allow_root,allow_other,nfs_uid,nfs_gid,auto_cache,sloppy_mount,fsname"
  PERSI_NFS_DRIVER_ALLOWED_IN_MOUNT: "auto_cache"

  # Comma separated list of white-listed options that may be configured in
  # supported in the mount_config.source URL query params
  #
  # Example: "uid,gid,auto-traverse-mounts,dircache"
  PERSI_NFS_DRIVER_ALLOWED_IN_SOURCE: "uid,gid"

  # Comma separated list default values for options that may be configured in
  # the mount_config options, formatted as 'option:default'. If an option is not
  # specified in the volume mount, or the option is not white-listed, then the
  # specified default value will be used instead.
  #
  # Example:
  # "allow_root:false,nfs_uid:2000,nfs_gid:2000,auto_cache:true,sloppy_mount:true"
  PERSI_NFS_DRIVER_DEFAULT_IN_MOUNT: "auto_cache:true"
 
  # Comma separated list of default values for options in the source URL query
  # params, formatted as 'option:default'. If an option is not specified in the
  # volume mount, or the option is not white-listed, then the specified default
  # value will be applied.
  PERSI_NFS_DRIVER_DEFAULT_IN_SOURCE: ~

  # Disable Persi NFS driver
  PERSI_NFS_DRIVER_DISABLE: "false"

  # LDAP server host name or ip address (required for LDAP integration only)
  PERSI_NFS_DRIVER_LDAP_HOST: ""

  # LDAP server port (required for LDAP integration only)
  PERSI_NFS_DRIVER_LDAP_PORT: "389"

  # LDAP server protocol (required for LDAP integration only)
  PERSI_NFS_DRIVER_LDAP_PROTOCOL: "tcp"

  # LDAP service account user name (required for LDAP integration only)
  PERSI_NFS_DRIVER_LDAP_USER: ""

  # LDAP fqdn for user records we will search against when looking up user uids
  # (required for LDAP integration only)
  # Example: "cn=Users,dc=corp,dc=test,dc=com"
  PERSI_NFS_DRIVER_LDAP_USER_FQDN: ""

Document Admin password change

cf backup tool: refinement to how restore actually works.

There have been assumptions as to how the backup/restore plugin does restores, so we have some updates in a couple PRs:
cloudfoundry-incubator/cf-plugin-backup#60
cloudfoundry-incubator/cf-plugin-backup@aecdcb8

Context: https://trello.com/c/e1GeBtZb/712-cf-backup-better-documentation-as-to-how-restore-works

Enable Travis CI

I'm not an admin in this repo. You can go to Settings -> Integrations & Services and add "Travis CI". Just enter "notify.travis-ci.org" into the domain text field.

Backup/Restore

@cornelius reported in bsc 1079617:

We need to document how to backup and restore a CF cluster so that it can be reinstalled without losing data.

Backup/Restore will be part of the 1.1 release. We will release it as an upstream plugin so it has a little bit different status than the rest of the product which we deliver through our own channels with the standard SUSE support.

Feature work on the backup and restore CLI plugin is tracked on Trello: https://trello.com/c/LW3nZuG9

Log to ELK stack

The ELK stack (Elasticsearch, Logstash, Kibana) is a popular choice to handle logging. It would be good to have a guide how to configure CAP to drain logs to Logstash.

A welcome enhancement would be to document how to achieve this via helm install for ELK and pass the config into CF helm install.

How to set up a registry for an air-gapped environment

When the CAP system is completely separated from the Internet (i.e. there is an air-gap), there needs to be a way how to get the container images from the SUSE registry to the system where they are supposed to be installed. This can be solved by using a registry, pushing images there, and then moving the registry to the separated environment.

We need to document how to do that.

cf-backup-plugin has no details into what it doesn't do.

This work may be already underway but the backup/restore plugin currently has a table regarding the scope of what it does do.

We should expand the scope in some way to incorporate what it doesn't do. For instance, we should tell operators prior to backing up that it won't do certain tasks, like handle custom buildpacks. The table we have in https://github.com/SUSE/cf-plugin-backup/blob/master/README.md doesn't feel like the right place for information like that; operators should be told earlier.

This type of list will need expansion but we can accommodate that for future updates. We need a structure of some sort first.