tbd-workshop-1-public-adac's Introduction

TBD Workshop 1.

Workshop goals

  1. Learn how to provision computing resources for running Big Data analyses using the Infrastructure as Code (IaC) approach.
  2. Learn how to set up opinionated CI/CD pipelines to deploy cloud infrastructure.
  3. Learn how to utilize linters for detecting security vulnerabilities in cloud infrastructure.
  4. Learn how to run Apache Spark code in a distributed way on a Hadoop cluster using Vertex AI notebooks and Dataproc services on GCP.
  5. Learn how to use Workload Identity Federation for secure authentication from GitHub Actions to Google Cloud.

High level architecture

[Figure: high-level architecture diagram]

Prerequisites

Software

GCP

  • Redeem a GCP coupon to create a billing account
  • Authenticate to GCP to obtain the default credentials used for running the code
# first revoke any stored credentials, if they exist
gcloud auth application-default revoke
# log in and obtain new application default credentials
gcloud auth application-default login

Project setup

  1. Export shared environment variables
export TF_VAR_tbd_semester=2023L
# format: 20xx for teachers, student ID number for students 
export TF_VAR_user_id=2003
# use your own billing account id
export TF_VAR_billing_account=016F99-F0B167-9A895D
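Terraform reads any environment variable prefixed with TF_VAR_ as an input variable, so a forgotten export tends to surface only later as a prompt or an error. A minimal pre-flight sketch, assuming a POSIX shell; the function name check_tf_vars is hypothetical and not part of the repository:

```shell
# Hypothetical helper: verify that the three workshop variables are exported.
check_tf_vars() {
  missing=0
  for name in TF_VAR_tbd_semester TF_VAR_user_id TF_VAR_billing_account; do
    # Indirect lookup through eval keeps this compatible with plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Exit status 0 when all variables are set, 1 otherwise.
  return "$missing"
}
```

Calling check_tf_vars before terraform apply (for example, check_tf_vars && terraform apply) stops early when one of the exports was skipped.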
  2. Enter the bootstrap folder, then initialize the project and the Terraform state bucket
cd bootstrap
terraform init
terraform apply
cd ..
  3. Set up CI/CD (GitHub Actions using Workload Identity Federation)
  • Edit the env/backend.tfvars file and set the bucket variable to the name of the Terraform state bucket
  • Edit the env/project.tfvars file and set the project_name and iac_service_account variables using the output from the bootstrap phase
  • Edit cicd_bootstrap/conf/github_actions.tfvars to set github_org and github_repo, e.g.:
  github_org  = "mwiewior"
  github_repo = "tbd-workshop-1-public"
  • Init state file and set env variables
cd cicd_bootstrap
terraform init -backend-config=../env/backend.tfvars
  • Apply
# authenticate Docker backend with GCP
gcloud auth configure-docker
# create CI/CD integration using Workload Identity
terraform apply -var-file ../env/project.tfvars -var-file conf/github_actions.tfvars -compact-warnings
cd ..
  4. Use the output variables to configure the GitHub Actions workflow .github/workflows/pull-request.yml. Do not hardcode these values in the YAML file. Set them as GitHub Actions secrets instead, keeping the secret names GCP_WORKLOAD_IDENTITY_PROVIDER_NAME and GCP_WORKLOAD_IDENTITY_SA_EMAIL.
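The two secrets can also be set from the command line instead of the GitHub web UI. A minimal sketch, assuming the GitHub CLI (gh) is installed, authenticated, and run from inside a clone of your repository; the function name set_wif_secrets and the GH override variable are hypothetical, and both values must be copied from the Terraform output of the cicd_bootstrap step:

```shell
# Hypothetical helper: store the Workload Identity values as repository secrets.
# $1 = workload identity provider name, $2 = service account email.
# Setting GH=echo prints the commands instead of running them (dry run).
set_wif_secrets() {
  provider_name="$1"
  sa_email="$2"
  "${GH:-gh}" secret set GCP_WORKLOAD_IDENTITY_PROVIDER_NAME --body "$provider_name"
  "${GH:-gh}" secret set GCP_WORKLOAD_IDENTITY_SA_EMAIL --body "$sa_email"
}
```

Keeping the secret names unchanged matters because the workflow file refers to them by name.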

  5. Install and configure pre-commit

pre-commit install
  6. Run all linters locally:
pre-commit run --all-files --config .pre-commit-config.yaml --verbose

TASKS to do

  • If pre-commit linters report any issues, please try to fix them 🛠️.
(base) ➜  tbd-workshop-1-public git:(feat/init-workshop-1) ✗ pre-commit run --all-files --config .pre-commit-config.yaml
Terraform fmt............................................................Passed
Terraform validate.......................................................Passed
Terraform docs...........................................................Passed
Terraform validate with tflint...........................................Passed
Checkov..................................................................Failed
- hook id: terraform_checkov
- exit code: 1

terraform scan results:

Passed checks: 44, Failed checks: 1, Skipped checks: 14

Check: CKV_GCP_89: "Ensure Vertex AI instances are private"
        FAILED for resource: module.vertex_ai_workbench.google_notebooks_instance.tbd_notebook
        File: /modules/vertex-ai-workbench/main.tf:48-61
        Calling File: /main.tf:29-39
        Guide: https://docs.bridgecrew.io/docs/ensure-gcp-vertex-ai-workbench-does-not-have-public-ips

                48 | resource "google_notebooks_instance" "tbd_notebook" {
                49 |   depends_on   = [google_project_service.notebooks]
                50 |   location     = local.zone
                51 |   machine_type = "e2-standard-2"
                52 |   name         = "${var.project_name}-notebook"
                53 |   container_image {
                54 |     repository = var.ai_notebook_image_repository
                55 |     tag        = var.ai_notebook_image_tag
                56 |   }
                57 |   network             = var.network
                58 |   subnet              = var.subnet
                59 |   instance_owners     = [var.ai_notebook_instance_owner]
                60 |   post_startup_script = "gs://${google_storage_bucket_object.post-startup.bucket}/${google_storage_bucket_object.post-startup.name}"
                61 | }

dockerfile scan results:

Passed checks: 103, Failed checks: 1, Skipped checks: 2

Check: CKV_DOCKER_1: "Ensure port 22 is not exposed"
        FAILED for resource: /modules/docker_image/resources/Dockerfile.EXPOSE
        File: /modules/docker_image/resources/Dockerfile:30-30
        Guide: https://docs.bridgecrew.io/docs/ensure-port-22-is-not-exposed

                30 | EXPOSE 8080 16384 16385 4040 22
github_actions scan results:

Passed checks: 99, Failed checks: 0, Skipped checks: 0




Lint Dockerfiles.........................................................Failed
- hook id: hadolint
- exit code: 1

modules/docker_image/resources/Dockerfile:22 DL3013 warning: Pin versions in pip. Instead of `pip install <package>` use `pip install <package>==<version>` or `pip install --requirement <requirements file>`

  • Modify Terraform code to use your custom Docker image in Vertex AI Workbench
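One of the findings in the sample output above, CKV_DOCKER_1, flags port 22 on the EXPOSE line of the Dockerfile. A minimal sketch of one way to drop that port, assuming SSH access to the container is not needed; the function name remove_ssh_port is hypothetical, and editing the line by hand works just as well:

```shell
# Hypothetical helper: remove port 22 from every EXPOSE line of a Dockerfile.
# $1 = path to the Dockerfile. Other ports (including e.g. 2222) are left alone.
# Note: a line that exposes only port 22 would be left as a bare "EXPOSE"
# and has to be deleted manually.
remove_ssh_port() {
  dockerfile="$1"
  tmp="$(mktemp)"
  # Write to a temporary file first, so this works with both GNU and BSD sed.
  sed -E '/^EXPOSE /{s/ 22( |$)/\1/;}' "$dockerfile" > "$tmp" \
    && cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}
```

For the file in the report, this would be invoked as remove_ssh_port modules/docker_image/resources/Dockerfile, followed by another pre-commit run to confirm the check passes.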

Commit your changes, push them to a branch, and open a PR against the main/master branch of YOUR repository. If you see a warning asking you to enable workflows, enable them and push your changes again.

Once all pull request checks have passed, merge your PR and wait until the release job finishes.

  7. Navigate to the Vertex AI Workbench menu item, find your notebook in the list, press CONNECT, and follow the instructions.

  8. In your JupyterLab environment, add a Python 3.8 kernel:
python3.8 -m ipykernel install --user --name pyspark
  9. Run a Hello-world PySpark application in YARN client mode.

  10. Additional tasks using Terraform:

  1. Add support for arbitrary machine types and worker nodes for a Dataproc cluster and JupyterLab instance
  2. Add support for preemptible/spot instances in a Dataproc cluster
  3. Perform additional hardening of the JupyterLab environment, i.e. disable sudo access and enable secure boot
  4. (Optional) Get access to the Apache Spark Web UI
  11. IMPORTANT ❗ ❗ ❗ Please remember to destroy all the resources after the workshop:
terraform init -backend-config=env/backend.tfvars
terraform destroy -no-color -var-file env/project.tfvars 

Requirements

Name       Version
terraform  ~> 1.4.0
docker     3.0.2
google     ~> 4.63.0

Providers

No providers.

Modules

Name                  Source                         Version
dataproc              ./modules/dataproc             n/a
gcr                   ./modules/gcr                  n/a
jupyter_docker_image  ./modules/docker_image         n/a
vertex_ai_workbench   ./modules/vertex-ai-workbench  n/a
vpc                   ./modules/vpc                  n/a

Resources

No resources.

Inputs

Name                        Description                Type    Default         Required
ai_notebook_instance_owner  Vertex AI workbench owner  string  n/a             yes
project_name                Project name               string  n/a             yes
region                      GCP region                 string  "europe-west1"  no

Outputs

No outputs.

Contributors

mwiewior
