
cp4d-deployment's Introduction

Cloud Pak for Data

IBM Cloud Pak for Data is an end-to-end platform that helps organizations in their journey to AI. It enables data engineers, data stewards, data scientists, and business analysts to collaborate using an integrated multicloud platform. Cloud Pak for Data uses IBM's deep analytics portfolio to help organizations meet data and analytics challenges. The required building blocks (collect, organize, analyze, infuse) for an information architecture are available using Cloud Pak for Data.

Cloud Pak for Data (any version) can be deployed on OCP provided the cluster meets the prerequisites defined in the Knowledge Center. Through the marketplace offerings on Amazon Web Services, Microsoft Azure, and the IBM Cloud Catalog, the user can do a one-click automated deployment of CP4D clusters with a pre-defined configuration.

This repository contains deployment steps to get you started on setting up Cloud Pak for Data on IBM Cloud Satellite locations.

Upgrade and Post-Installation activities

Please see Updating OpenShift Container Platform for upgrading Red Hat OpenShift Container Platform and Upgrading Cloud Pak for Data for upgrading an IBM Cloud Pak for Data installation.

For post-installation activities on the cluster, please refer to the instructions at Administering Cloud Pak for Data.

cp4d-deployment's People

Contributors

ajiyos, akinfemi, amir-khan17, anujsharma23, asifma, daslerjo, davidbesada, deepayr, fawazsiddiqi, golzt, gudrunka, lionelmace, maulik-shah999, nehajagtap17, parthakom2, praveshhibm, prsahoo, sandeepravutla473, satya-test, satyamodi, shaithal, shankar-pentyala, shankarpentyala07, shrumit, stevemar, vamshicholleti, vamshicholleti93


cp4d-deployment's Issues

AWS Private cluster and other changes.

  1. Based on whether the cluster is private or public, we need to use private_ip or public_ip to connect to the bootnode and execute terraform commands (see the sketch after this list).
  2. Mandatory variables validation.
  3. Create separate templates for healthcheck, autoscaler and workerocs for SZ and MZ.
  4. Provide a way to allow the user to define which AZs are used.
  5. EFS mount issue and encryption.
  6. Update the terraform destroy documentation.
  7. Exit on failure for storage and service installation.
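For item 1, a minimal Terraform sketch of the idea (variable, resource, and attribute names here are illustrative, not necessarily the ones used in this repo):

variable "private_cluster" {
  type        = bool
  description = "Set to true when the bootnode must be reached over its private IP"
  default     = false
}

locals {
  # Address used by the SSH connection that runs terraform on the bootnode
  bootnode_ip = var.private_cluster ? aws_instance.bootnode.private_ip : aws_instance.bootnode.public_ip
}

resource "null_resource" "run_on_bootnode" {
  connection {
    type        = "ssh"
    host        = local.bootnode_ip
    user        = "ec2-user"
    private_key = file(var.ssh_private_key_path)
  }

  provisioner "remote-exec" {
    inline = ["echo connected to ${local.bootnode_ip}"]
  }
}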

[managed-openshift/ibmcloud] Error installing wkc

When you deploy WKC, it looks like the pod c-db2oltp-wkc-db2u-0 never gets ready, and it restarts again and again.

Connecting to the pod, I could see that the script db2u_entrypoint.sh is registering db2 and applying the license, but the command never ends.
db2inst1 944 1 0 22:48 pts/0 00:00:00 /bin/bash -x /db2u/db2u_entrypoint.sh
db2inst1 1381 944 0 22:48 pts/0 00:00:00 /mnt/blumeta0/home/db2inst1/sqllib/adm/db2licm -a /db2u/license/db2u-lic

I have tried to run the db2licm command manually, but I see the same behavior.

Any idea why?

Single zone cluster creation on Azure does not work with OCP4.5

When creating an Azure cluster in a single zone with 3 master and 3 worker nodes, the cluster creation ends with 5 master nodes and at least 3 worker nodes. The master nodes are distributed over 2 availability zones, like in the following example. The problem is reproducible:
(Screenshot: node list showing 5 master nodes distributed across 2 availability zones)

[Known Issue] IBM cloud: dial tcp i/o timeout

Applying the templates from ibmcloud sometimes results in an error such as the following. This is caused by a race condition between the kubernetes provider and the cluster config data source.

Post "https://c100-e.jp-tok.containers.cloud.ibm.com:30625/api/v1/namespaces/kube-system/secrets": dial tcp 127.0.0.1:30625 i/o timeout

The solution is to run terraform apply again and the installation will continue as normal.

Variable FILESYSTEMID picking multiple EFS File System ID

Given ... An AWS account containing multiple CPD systems, each in their own VPC, each with an EFS file system in a particular region
When ... running Terraform apply
Expected ... The code in delete-efs.sh and ocp-install.tf that sets FILESYSTEMID, which is used to determine the EFS File System ID, is supposed to return only one value.
Actual ... The code FILESYSTEMID=$(aws efs describe-file-systems --query 'FileSystems[*].FileSystemId' --output text) in delete-efs.sh and ocp-install.tf returns multiple File System IDs when there is more than one EFS file system in the AWS account in that region.
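A possible way to make the lookup unambiguous (a sketch only; the --creation-token value below assumes the scripts create the EFS with a per-cluster token, which would also need to be added) is to filter instead of listing every file system in the region:

# Assumes the file system was created with: aws efs create-file-system --creation-token "${CLUSTER_NAME}-efs" ...
FILESYSTEMID=$(aws efs describe-file-systems \
  --creation-token "${CLUSTER_NAME}-efs" \
  --query 'FileSystems[0].FileSystemId' --output text)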

Terraform deployment error

After running

terraform apply

the cluster is being created, but I get an error message:

Error: Request failed with status code: 409, ServerErrorResponse: {"incidentID":"5fc8494525f02ac0-IAD","code":"E0007","description":"A cluster with the same name already exists. Choose another name.","type":"Provisioning"}

However, I don't see another cluster with the name I was using.

Azure: Terraform script az_resource_quota_validation.sh gets an error

cp4d-deployment/selfmanaged-openshift/azure/azure_infra/az_resource_quota_validation.sh

When the Terraform script gets to calling the az_resource_quota_validation.sh bash script, I get the following error:

null_resource.az_validation_check: Destroying... [id=5051546956155078295]
null_resource.az_validation_check: Destruction complete after 0s
null_resource.az_validation_check: Creating...
null_resource.az_validation_check: Provisioning with 'local-exec'...
null_resource.az_validation_check (local-exec): Executing: ["cmd" "/C" "chmod +x ./*.sh"]
null_resource.az_validation_check: Provisioning with 'local-exec'...
null_resource.az_validation_check (local-exec): Executing: ["cmd" "/C" "./az_resource_quota_validation.sh -appId 41e92f4e-604f-4c99-868f-b9e9f50ac12a -password [redacted] -subscriptionId 57142dd5-0c01-43d2-aa2e-524391595206 -region canadacentral -printlog false -is_wsl no -is_wkc no -is_wml no -is_dv no -is_wos no -is_spark no -is_cde no -is_streams no -is_streams_flows no -is_db2wh no -is_ds no -is_db2oltp no -is_dods no -is_spss no -is_bigsql no -is_pa no -is_ca no ; if [ $? -ne 0 ] ; then echo "Resource quota validation Failed" ; exit 1 ; fi"]
null_resource.az_validation_check (local-exec): '.' is not recognized as an internal or external command,
null_resource.az_validation_check (local-exec): operable program or batch file.

Error: Error running command './az_resource_quota_validation.sh -appId 41e92f4e-604f-4c99-868f-b9e9f50ac12a -password [redacted] -tenantId 490a6c2b-9d4c-46e8-90c0-ab0dce6bcca0 -subscriptionId 57142dd5-0c01-43d2-aa2e-524391595206 -region canadacentral -printlog false -is_wsl no -is_wkc no -is_wml no -is_dv no -is_wos no -is_spark no -is_cde no -is_streams
no -is_streams_flows no -is_db2wh no -is_ds no -is_db2oltp no -is_dods no -is_spss no -is_bigsql no -is_pa no -is_ca no ; if [
$? -ne 0 ] ; then echo "Resource quota validation Failed" ; exit 1 ; fi': exit status 1. Output: '.' is not recognized as an internal or external command,
operable program or batch file.

I also get some errors trying to execute that bash script on its own, without running 'terraform apply'. I did populate the env variables in the env.sh bash script as well as all the variables that needed to be defined in the variables.tf file. I can confirm I have enough vCPU cores available in my Azure subscription for the DSv3 family series required, so I don't think it's failing on the vCPU check; the script itself appears to be failing due to some sort of syntax error. Please see the output below from running the az_resource_quota_validation.sh bash script by itself:

$ ./az_resource_quota_validation.sh
Total vcpu rquired is 40


List of values entered


The client_id entered is :
The client secret entered is :
The TENANT_ID value entered is:
The subscriptionId value entered is :
The location which has been selected is :
Traceback (most recent call last):
File "", line 1, in
File "C:\Python39\lib\json_init_.py", line 293, in load
return loads(fp.read(),
File "C:\Python39\lib\json_init_.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


Executing curl command to get the usage data


The vCPU usage limit is written into the file : az_vcpu_limit_20210317-173323.json


Executing curl command to get the Network data


The Network usage limit is written into the file : az_network_limit_20210317-173323.json


Please find the default resource quota's required
As per the OCP4.5 documentation for azure, the minimum quota required are as follows:


Component Number of components required by default(minimum)


vCPU 40
vNet 1
Network Interfaces 6
Network security groups 2
Network load balancers 3
Public IP addresses 3
Private IP addresses 7


Summary of the resource quota details for the subscriptionId :
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Resource_name Required Available Validation_check
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
vCPU 40 0 FAILED
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
vNet 1 0 FAILED
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
networkInterface 6 0 FAILED
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
networkSecurityGroups 2 0 FAILED
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
loadBalancers 3 0 FAILED
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `"' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `,' is not a known regexp operator
publicIpAddresses 3 0 FAILED
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

failed
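The "'.' is not recognized as an internal or external command" message comes from cmd.exe: on Windows, Terraform's local-exec provisioner runs commands through cmd by default, which cannot execute a bash script such as az_resource_quota_validation.sh. A hedged sketch of a workaround (not verified against this repo; it assumes Git Bash or WSL bash is on the PATH) is to force a POSIX interpreter in the provisioner:

resource "null_resource" "az_validation_check" {
  provisioner "local-exec" {
    # Run through bash even on Windows so the .sh scripts can execute
    interpreter = ["bash", "-c"]
    command     = "chmod +x ./*.sh && ./az_resource_quota_validation.sh ..."
  }
}

Running terraform apply from a Linux host or from inside WSL avoids the problem entirely.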

SSO user instead of IAM user: the script errors out because of the check

Hi Team,

Is there any way to deploy CP4D using an SSO user instead of an IAM user? The SSO user has full admin access rights.
Here it is erroring out in libs_aws/iam_helper.py at:
targetArn = self.__iam_resource.CurrentUser().arn
targetUser = targetArn.split('/')[-1]
return targetUser

It is called at user_name = iam_helper.get_user_name() in aws_permission_validation.py.

Is cp4d-deployment designed for IAM users only? Please confirm.
We are trying this on the AWS cloud.
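For context, iam_resource.CurrentUser() resolves an IAM user, so it breaks when the caller is a federated/SSO session (an assumed role). A rough sketch of an alternative, assuming boto3 is already available as in iam_helper.py (get_caller_name is an illustrative name, not the repo's), would be to derive the caller from STS, which works for both IAM users and assumed roles:

import boto3

def get_caller_name():
    # For SSO the ARN looks like arn:aws:sts::123456789012:assumed-role/AdminRole/session-name
    arn = boto3.client("sts").get_caller_identity()["Arn"]
    return arn.split("/")[-1]

Whether the rest of aws_permission_validation.py can validate permissions against a role rather than a user would still need to be confirmed by the maintainers.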

Terraform enhancement - support installation of Scheduling Service and Watson Machine Learning Accelerator

Hi,

The request is to enhance the existing Terraform scripts to automate installation on AWS, IBM ROKS, and Azure:

  1. Openshift GPU operator (possible)
  2. Scheduling Service: https://www.ibm.com/docs/en/cloud-paks/cp-data/3.5.0?topic=service-installing-scheduling
  3. Watson Machine Learning Accelerator: https://www.ibm.com/docs/en/cloud-paks/cp-data/3.5.0?topic=accelerator-installing-watson-machine-learning

We have deployed the Scheduling Service and Watson Machine Learning Accelerator on IBM ROKS (PoC). These are my notes tracking the installation steps, which may be helpful -> https://ibm.box.com/s/qrkh8vs3nwaix7vlcxlww914dypqcjee

I am happy to work with folks that would help on this terraform automation.

Thanks.

Kelvin

Portworx licensing does not mention IBM Essentials option

I notice that the only option mentioned regarding Portworx licensing is the trial Enterprise license. As of CP4D version 2.5, Portworx has worked out an option to allow the bundled entitlement license to be used, which allows up to 8 nodes.

Is there a reason that the IBM Essentials version of the Portworx license does not appear in the ReadMe? It's fully supported on CP4D v3 (and has been since v2.5).

[managed-openshift/ROSA] Error installing CPD - Private ROSA

Hello Team,
Hope everyone is safe and well !!

I was trying to consume the CP4D Terraform asset for managed OpenShift on AWS (ROSA) and got stuck at the stage below.
I believe it is a potential bug :(

If we set the variable below to true:
variable "private_cluster" {
type = bool
description = "Endpoints should resolve to Private IPs"
default = false
}
Then Terraform tries to find the cluster specified in the variable.tf instead of creating it.

Sample Error Log:

module.ocp.null_resource.install_rosa (local-exec): W: You are choosing to make your cluster private. You will not be able to access your cluster until you edit network settings in your cloud provider.
? Are you sure you want to set cluster 'ibmrosa' as private? (y/N) 
module.ocp.null_resource.install_rosa: Creation complete after 1m11s [id=xxxxxxxxxxxxxxxxx]
^[[43;148R^[[43;148Rmodule.ocp.null_resource.create_rosa_user: Creating...
module.ocp.null_resource.create_rosa_user: Provisioning with 'local-exec'...
module.ocp.null_resource.create_rosa_user (local-exec): Executing: ["/bin/sh" "-c" "/Users/chayan/Documents/work/2021/scotgov/ORCI/cp4data/cp4d-deployment/managed-openshift/aws/terraform/installer-files/rosa create admin --cluster='ibmrosa' > /Users/chayan/Documents/work/2021/scotgov/ORCI/cp4data/cp4d-deployment/managed-openshift/aws/terraform/installer-files/.creds\necho \"Sleeping for 4mins\"\nsleep 240\n"]
module.ocp.null_resource.create_rosa_user (local-exec): E: Failed to get cluster 'ibmrosa': There is no cluster with identifier or name 'ibmrosa'
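From the log, the rosa CLI appears to be stuck on the interactive "Are you sure you want to set cluster 'ibmrosa' as private? (y/N)" confirmation, which a Terraform local-exec cannot answer, so the cluster is never actually created and the subsequent create_rosa_user step fails to find it. A hedged sketch of a possible fix (assuming the installed rosa version supports the auto-confirm flag) is to pass --yes on the create call so no prompt blocks the non-interactive run:

./installer-files/rosa create cluster --cluster-name "${CLUSTER_NAME}" --private --yes ...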

Terraform Destroy Cluster command doesn't Work after Installing Services on Existing Cluster

When we try to install a new service on an existing AWS/Azure cluster, the destroy_cluster entry in the tfstate file changes its public IP to the constant 3.34.127.192.

{
  "mode": "managed",
  "type": "null_resource",
  "name": "destroy_cluster",
  "provider": "provider.null",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "id": "7079104035232105590",
        "triggers": {
          "bootnode_public_ip": "3.34.127.192",
          "directory": "ocpfourx",
          "private-key-file-path": "~/.ssh/id_rsa",
          "username": "ec2-user"
        }
      },
      "dependencies": [
        "aws_instance.bootnode",
        "aws_internet_gateway.bootnode",
        "aws_key_pair.keypair",
        "aws_security_group.openshift-public-ssh",
        "aws_security_group.openshift-vpc",
        "aws_subnet.public1",
        "aws_vpc.cpdvpc"
      ]
    }
  ]
},

[managed-openshift/ibmcloud] Warning during 'terraform apply'

I have this warning when running terraform apply

Warning: External references from destroy provisioners are deprecated
  on portworx/portworx.tf line 73, in resource "null_resource" "volume_attachment":
  73:     environment = {
  74:       TOKEN             = data.ibm_iam_auth_token.this.iam_access_token
  75:       REGION            = var.region
  76:       RESOURCE_GROUP_ID = var.resource_group_id
  77:       CLUSTER_ID        = var.cluster_id
  78:       WORKER_ID         = data.ibm_container_vpc_cluster_worker.this[count.index].id
  79:       VOLUME_ID         = ibm_is_volume.this[count.index].id
  80:     }
Destroy-time provisioners and their connection configurations may only
reference attributes of the related resource, via 'self', 'count.index', or
'each.key'.
References to other resources during the destroy phase can cause dependency
cycles and interact poorly with create_before_destroy.
(and 5 more similar warnings elsewhere)
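The pattern Terraform documents for this warning is to copy the needed values into the resource's own triggers and have the destroy-time provisioner read them back through self. A minimal sketch (resource and script names are illustrative; the real portworx.tf has more context):

resource "null_resource" "volume_attachment" {
  count = var.worker_count

  triggers = {
    token     = data.ibm_iam_auth_token.this.iam_access_token
    region    = var.region
    worker_id = data.ibm_container_vpc_cluster_worker.this[count.index].id
    volume_id = ibm_is_volume.this[count.index].id
  }

  provisioner "local-exec" {
    when    = destroy
    command = "./detach_volume.sh"   # illustrative script name
    environment = {
      TOKEN     = self.triggers.token
      REGION    = self.triggers.region
      WORKER_ID = self.triggers.worker_id
      VOLUME_ID = self.triggers.volume_id
    }
  }
}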

AWS Credentials in osaws_var.tfvars.txt not used

Despite populating osaws_var.tfvars.txt with valid AWS credentials, I am getting this error message when I run:

[root@ip-172-31-42-158 aws_infra]# terraform apply -var-file=/root/cp4d-deployment-master/aws/aws_infra/osaws_var.tfvars.txt
data.template_file.awscreds: Refreshing state...
data.template_file.repo: Refreshing state...
data.template_file.clusterautoscaler: Refreshing state...
data.template_file.awsregion: Refreshing state...
data.template_file.crio-mc: Refreshing state...
data.template_file.efs-configmap: Refreshing state...
data.template_file.registry-conf: Refreshing state...
data.template_file.security-limits-mc: Refreshing state...
data.template_file.sysctl-machineconfig: Refreshing state...
data.template_file.portworx-override: Refreshing state...
data.template_file.registry-mc: Refreshing state...

Error: No valid credential sources found for AWS Provider.
        Please see https://terraform.io/docs/providers/aws/index.html for more information on
        providing credentials for the AWS Provider

  on vpc.tf line 1, in provider "aws":
   1: provider "aws" {

I can work around this issue in multiple ways. Personally, I have used the /root/.aws/credentials file. But I thought it might be useful for you to know there is an issue. I tested this with Terraform version 0.12.29.

By the way, I also found I needed to use the latest version of Terraform to work with your scripts. I suggest you add this to the documentation so that it's clear. I spent some time chasing 'bugs' that were simply due to having an out-of-date Terraform install.
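One likely explanation for the credentials problem (a guess, not verified against the repo) is that the tfvars values are only consumed by the template_file data sources, while the provider "aws" block itself has no credentials configured, so Terraform falls back to the standard credential chain (~/.aws/credentials, environment variables, instance profile). A sketch of wiring the same variables into the provider, with the variable names being assumptions about what the tfvars file defines:

provider "aws" {
  region     = var.region
  access_key = var.access_key_id
  secret_key = var.secret_access_key
}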

[managed-openshift/ibmcloud] Error when running 'terraform apply'

I always have this error when running 'terraform apply'

I run 'terraform apply' again and it works fine.

Error: Error Getting Subnet (02b7-cd6365b1-8117-4f60-b89d-4e3fe0d6a7de): provided token is invalid or expired
{
    "StatusCode": 401,
    "Headers": {
        "Cache-Control": [
            "max-age=0, no-cache, no-store, must-revalidate"
        ],
        "Cf-Cache-Status": [
            "DYNAMIC"
        ],
        "Cf-Ray": [
            "663405b15cd2d6c5-FRA"
        ],
        "Cf-Request-Id": [
            "0ad461e2d50000d6c5cf8d4000000001"
        ],
        "Connection": [
            "keep-alive"
        ],
        "Content-Length": [
            "134"
        ],
        "Content-Type": [
            "application/json; charset=utf-8"
        ],
        "Date": [
            "Tue, 22 Jun 2021 08:14:17 GMT"
        ],
        "Expect-Ct": [
            "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
        ],
        "Expires": [
            "-1"
        ],
        "Pragma": [
            "no-cache"
        ],
        "Server": [
            "cloudflare"
        ],
        "Strict-Transport-Security": [
            "max-age=31536000; includeSubDomains"
        ],
        "Vary": [
            "Accept-Encoding"
        ],
        "X-Content-Type-Options": [
            "nosniff"
        ],
        "X-Request-Id": [
            "613e6d0d-3999-4f72-a34b-3ba71d08b42c"
        ],
        "X-Xss-Protection": [
            "1; mode=block"
        ]
    },
    "Result": {
        "errors": [
            {
                "code": "not_authorized",
                "message": "provided token is invalid or expired"
            }
        ],
        "trace": "613e6d0d-3999-4f72-a34b-3ba71d08b42c"
    },
    "RawResult": null
}                         


  on portworx/portworx.tf line 21, in data "ibm_is_subnet" "this":
  21: data "ibm_is_subnet" "this" {



Error: Error updating database user (portworxuser) entry: Request failed with status code: 403, ServerErrorResponse: {"errors":"forbidden"}

  on portworx/portworx.tf line 90, in resource "ibm_database" "etcd":
  90: resource "ibm_database" "etcd" {

[managed-openshift/ibmcloud] Error installing DMC

When you select to install DMC from CP4D on IBM Cloud, the installation never ends, always showing this message:
dmcaddon-cr is Installing!!!!
module.cpd_install.null_resource.install_dmc[0]: Still creating... [1h30m50s elapsed]
module.cpd_install.null_resource.install_dmc[0]: Still creating... [1h31m0s elapsed]

and so on.

The problem looks like it is in the script .templates/cpd_install/scripts/install-dmc.sh, line 51:
./check-cr-status.sh dmcaddon dmcaddon-cr ${NAMESPACE} dmcStatus

The correct status to check is NOT dmcStatus; the correct status to check is dmcAddonStatus.
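In other words, line 51 of install-dmc.sh would become:

./check-cr-status.sh dmcaddon dmcaddon-cr ${NAMESPACE} dmcAddonStatus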

Document that Elastic File System is a Technology Preview feature

Based on OCP documentation:

Elastic File System is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

https://docs.openshift.com/container-platform/4.3/storage/persistent_storage/persistent-storage-efs.html

CP4D users should be aware of this limitation

The above applies to 4.x (all the way to 4.6 at the time of writing)

Allow Cluster Network IP to be a variable in AWS config.

In install-config.yaml the cluster network is hardcoded at 10.128.0.0/16. If you change the VPC and subnet variables in the AWS config to something other than 10.0.0.0/16, the install fails due to overlapping CIDR ranges. It should be possible to pass a variable through for the cluster network when changing the CIDR and subnet variables.
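For reference, the stanza that would need to be templated is the networking section of install-config.yaml; the sketch below follows the standard install-config schema, with the hostPrefix and serviceNetwork values being the usual defaults rather than values confirmed from this repo:

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/16      # would become a Terraform variable such as var.cluster_network_cidr
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16        # must match the VPC/subnet CIDR variables
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16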

Warning during 'terraform init' in managed_openshift/ibmcloud

I have this warning message when I run terraform init:

Warning: registry.terraform.io: This version of Terraform has an outdated GPG key and is unable to verify new provider releases. Please upgrade Terraform to at least 0.12.31 to receive new provider updates. For details see: https://discuss.hashicorp.com/t/hcsec-2021-12-codecov-security-event-and-hashicorp-gpg-key-exposure/23512

problem with terraform script

While running "terraform apply" right after the portworx storage classes get created, I get the following error. This continually happens now if I try "terraform apply" again.

Error: Error running command './reencrypt_route.sh cpd-tenant': exit status 1. Output: error: error executing jsonpath "{.items[0].metadata.name}": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
template was:
{.items[0].metadata.name}
object given to jsonpath engine was:
map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}

error: filespec must match the canonical format: [[namespace/]pod:]file/path
Error from server (NotFound): routes.route.openshift.io "cpd-tenant-cpd" not found
error: open /cert.crt: no such file or directory
rm: cannot remove '/cert.crt': No such file or directory

[managed-openshift/ibmcloud] install cpd-cli in the container

Hi,
could you install the cpd-cli directly in the container?

It would help with installing modules that are not available in the current script and with doing upgrades, checks, etc.

Here are my commands, for instance; the best would be for cpd-cli to be part of the container image.

$ sudo podman exec -it bp2s-container-1 bash --login
# cd cpd_install/scripts/
# export TEMPLATES_DIR=~/templates
# export CPD_REGISTRY_PASSWORD="<YOUR_CPD_KEY>"
# ./setup_cpd_cli.sh
# cd $TEMPLATES_DIR/cpd-cli

Error when running terraform apply: "The given key does not identify an element in this collection value"

Error: Invalid index

on vpc/vpc.tf line 19, in resource "ibm_is_vpc_address_prefix" "this":
19: zone = local.zones[count.index]
|----------------
| count.index is 1
| local.zones is list of string with 1 element

The given key does not identify an element in this collection value.

Error: Invalid index

on vpc/vpc.tf line 19, in resource "ibm_is_vpc_address_prefix" "this":
19: zone = local.zones[count.index]
|----------------
| count.index is 2
| local.zones is list of string with 1 element

The given key does not identify an element in this collection value.
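The errors suggest the resource's count is fixed at 3 while local.zones only contains one element. A minimal sketch of the kind of fix this needs (attribute values other than count and zone are illustrative) is to drive count from the zone list itself:

resource "ibm_is_vpc_address_prefix" "this" {
  count = length(local.zones)
  name  = "addr-prefix-${count.index}"
  zone  = local.zones[count.index]
  vpc   = ibm_is_vpc.this.id
  cidr  = local.address_prefixes[count.index]
}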

[managed-openshift/ROSA] Error installing CPD

Hello Team,
Hope everyone is safe and well !!

I was trying to consume the CP4D Terraform asset for managed OpenShift on AWS (ROSA) and got stuck on the availability zone selection.
The variables.tf file has the variables below for selecting single_zone or multi_zone:

# Enter the number of availability zones the cluster is to be deployed, default is multi zone deployment.
variable "az" {
  description = "single_zone / multi_zone"
  default     = "multi_zone"
}
variable "availability_zone1" {
  description = "example eu-west-2a"
  default     = ""
}
variable "availability_zone2" {
  description = "example eu-west-2b"
  default     = ""
}
variable "availability_zone3" {
  description = "example eu-west-2c"
  default     = ""
}

If I switch to single_zone and provide a value only for the availability_zone1 variable, Terraform complains:

E: Failed to create cluster: Only a single availability zone can be provided to a single-availability-zone cluster, instead received 2
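A hedged sketch of how the zone list could be built so that only one zone reaches the rosa call in single_zone mode (the local name is illustrative):

locals {
  azs = var.az == "single_zone" ? [var.availability_zone1] : [var.availability_zone1, var.availability_zone2, var.availability_zone3]
}

# then pass join(",", local.azs) (or local.azs) to the rosa create cluster invocation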

Enhancement Request: AWS: Option to set Ingress to use Network Load Balancer

From OpenShift 4.6, it is now possible in AWS to set the default IngressController to use a Network Load Balancer instead of a Classic Load Balancer.

The instructions to do it are located here: https://docs.openshift.com/container-platform/4.6/installing/installing_aws/installing-aws-network-customizations.html#nw-aws-nlb-new-cluster_installing-aws-network-customizations

Would it be possible to incorporate this procedure into the Terraform scripts?
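Per the linked documentation, this is done by adding an IngressController manifest before the cluster is created; a sketch of that manifest (field values taken from the OpenShift docs, not from this repo) looks like:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
      providerParameters:
        type: AWS
        aws:
          type: NLB

The Terraform scripts would need to drop this file into the installer's manifests/ directory (gated by a new variable) before running openshift-install.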

create-efs.sh not picking correct VPC

Given ... An AWS account containing multiple CPD systems, each in their own VPC, each with an EFS file system
When ... running Terraform apply (which is calling create-efs.sh) using a new VPC
Expected ... EFS file system to be created with Mount Targets and Security Groups applied
Actual ... create-efs.sh seems to set VPC_ID to multiple VPCs; the remainder of the script continues but fails to execute properly. The new EFS file system exists but has no Mount Targets.

create-efs-output.txt
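Similar to the FILESYSTEMID issue, a possible sketch of a fix (the tag key/value is an assumption; the VPC would need to be created or tagged accordingly) is to filter describe-vpcs down to the cluster's own VPC instead of listing all of them:

VPC_ID=$(aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=${CLUSTER_NAME}-vpc" \
  --query 'Vpcs[0].VpcId' --output text)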

Installation Progress option for Client using Script

I was installing CPD on Azure using Terraform script following this documentation: https://github.com/IBM/cp4d-deployment/blob/master/selfmanaged-openshift/azure/README.md#deployment-topology
When running the WSL installation after the CPD Lite installation using ./wait-for-service-install.sh wsl zen, the client would like to see how much progress has actually been made in the module installation (as a percentage or another progress format), instead of the script simply printing "wsl installing!!!!" as shown below.

(Screenshot: wait-for-service-install.sh output repeatedly printing "wsl installing!!!!")

[managed-openshift/ibmcloud] CP4D URL is not the right one in the output of the terraform commands

At the end of the installation script, the output gives the CP4D URL.

In my case it was not the right one.

Here is the output

Apply complete! Resources: 30 added, 0 changed, 0 destroyed.
 
Outputs:
 
cpd_url = https://cpd-tenant-cpd-bps2s2.bp2s-test2-cluster-d4c5e211c86b934b94ab6901c8e3e2d5-0000.eu-de.containers.appdomain.cloud/zen

The right URL is this one:
https://bps2s2-cpd-bps2s2.bp2s-test2-cluster-d4c5e211c86b934b94ab6901c8e3e2d5-0000.eu-de.containers.appdomain.cloud/

[managed-openshift/ibmcloud] Another error when running 'terraform apply'

I sometimes have this error when running 'terraform apply'

I run 'terraform apply' again and it works fine.

Error: Error updating database user (portworxuser) entry: Request failed with status code: 403, ServerErrorResponse: {"errors":"forbidden"}

  on portworx/portworx.tf line 90, in resource "ibm_database" "etcd":
  90: resource "ibm_database" "etcd" {

Parameterize the ELB Timeout Value in AWS Script update-elb-timeout.sh

It has been agreed that the ELB Timeout Value in update-elb-timeout.sh should be parameterized.

This is needed for Customers installing DataStage and using the Legacy DataStage Clients as well as for others that are installing Cognos.

In cpd-install.tf there is a script call to set up the idle timeout:
"./update-elb-timeout.sh ${local.vpcid}"

The affected line in update-elb-timeout.sh is:
aws elb modify-load-balancer-attributes --load-balancer-name $lbs --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":600}}"

At the moment the value defaults to 60 seconds and the script is hard-coded to set it to 600 seconds, but it sometimes needs to be set as high as 1800 or 3600 seconds. The maximum value allowed is 4000 seconds.
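A sketch of the requested parameterization (the variable name and passing the value as a second script argument are illustrative choices):

variable "elb_timeout" {
  description = "Idle timeout in seconds applied to the cluster ELBs (AWS maximum is 4000)"
  type        = number
  default     = 600
}

# cpd-install.tf: "./update-elb-timeout.sh ${local.vpcid} ${var.elb_timeout}"

# update-elb-timeout.sh: use the argument instead of the hard-coded 600
aws elb modify-load-balancer-attributes --load-balancer-name $lbs \
  --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":${2:-600}}}"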

Add support for DB2 Data Gate app in AWS

Hi,

We are the DB2 Data Gate team; our product runs on Cloud Pak for Data and has been generally available since June 19th.

At the moment we support IBM Cloud, and we are investigating getting a personal AWS environment. We are willing to support AWS and Azure, but for the moment we only want to focus on AWS.

The storage options we support are:

  • Network File System (NFS)
  • Portworx
  • Rook/Ceph
  • hostPath storage (local)
  • OCS (OpenShift Container Storage) (required storage class: ocs-storagecluster-cephfs (default))

As the current cp4d-deployment repo doesn't include DB2 Data Gate in the list of apps that can be installed using Terraform, I'm initiating this process with this issue.

Forked cp4d-deployment repo with my branch adding Db2 Data Gate: https://github.com/pelletierkevin/cp4d-deployment/tree/pelletierkevin_datagate

Single zone installation error on AWS

Error: Invalid index

on data.tf line 140, in data "template_file" "machineautoscaler":
140: az3 = coalesce(var.availability-zone3, local.avzone[2])
|----------------
| local.avzone is list of string with 2 elements

The given key does not identify an element in this collection value.

Error: Invalid index

on data.tf line 162, in data "template_file" "workerocs":
162: az3 = coalesce(var.availability-zone3, local.avzone[2])
|----------------
| local.avzone is list of string with 2 elements

The given key does not identify an element in this collection value.

Error: Invalid index

on data.tf line 184, in data "template_file" "machinehealthcheck":
184: az3 = coalesce(var.availability-zone3, local.avzone[2])
|----------------
| local.avzone is list of string with 2 elements
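These errors come from az3 = coalesce(var.availability-zone3, local.avzone[2]) being evaluated while local.avzone only has 2 elements. One hedged sketch of a fix (what the right fallback zone should be depends on how the templates consume az3) is to use element(), which wraps around instead of failing on an out-of-range index:

az3 = coalesce(var.availability-zone3, element(local.avzone, 2))

Alternatively, the az3 lookup could be skipped entirely when var.az is single_zone.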
