
deploy-flyte's People

Contributors

davidmirror-ops

deploy-flyte's Issues

Add GPU support to GCP modules

This includes adding a programmatic way to label the node pool with the specific GPU type, applied only when the user indicates through a boolean flag that GPUs are required.
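
For context on what the label enables, here is a minimal sketch of a pod that targets a GPU node pool by label. It assumes the pool carries GKE's documented accelerator label `cloud.google.com/gke-accelerator`; the pod name, GPU type, and container image are hypothetical placeholders and not part of the modules.

```yaml
# Sketch only: a smoke-test pod pinned to a GPU node pool by label.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                                  # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # assumed GPU type
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04        # assumed image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1                             # request a single GPU
```

If the pod stays Pending, the node pool label or the GPU resource is probably missing, which is the condition the boolean flag would need to control.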

No ingress address

Hey guys,

After working through a few blockers, I've managed to get the cluster up and running, with flyte-binary running and healthy.

But I cannot get the ingress address:
[Screenshot: CleanShot 2023-09-10 at 18 10 34]

My Helm config:

configuration:
  inlineSecretRef: flyte-binary-inline-config-secret
  database:
    username: flyteadmin
    host: 'example'
    dbname: flyteadmin
  storage:
    metadataContainer: flyte-staging-data
    userDataContainer: flyte-staging-data
    provider: s3
    providerConfig:
      s3:
        region: 'eu-central-1'
        authType: 'iam'
  logging:
    level: 5
    plugins:
      cloudwatch:
        enabled: true
        templateUri: |-
          https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/eks/opta-development/cluster;stream=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log
  auth:
    enabled: true
    oidc:
      baseUrl: https://accounts.google.com
      clientId: itsmysecret
      clientSecret: mysecretasswell
    internal:
      clientSecret: example
      clientSecretHash: has example
    authorizedUris:
      - https://flyte.dev.example
  inline:
    cluster_resources:
      customData:
        - production:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
        - staging:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
        - development:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
    flyteadmin:
      roleNameKey: 'iam.amazonaws.com/role'
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - AWS_METADATA_SERVICE_TIMEOUT: 5
          - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20

    storage:
      cache:
        max_size_mbs: 10
        target_gc_percent: 100
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - K8S-ARRAY
        default-for-task-types:
          - container: container
          - container_array: K8S-ARRAY
    task_resources:
      defaults:
        cpu: 1
        memory: 1Gi
        storage: 100Mi

clusterResourceTemplates:
  inline:
    001_namespace.yaml: |
      apiVersion: v1
      kind: Namespace
      metadata:
        name: '{{ namespace }}'

    002_serviceaccount.yaml: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: default
        namespace: '{{ namespace }}'
        annotations:
          eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'

ingress:
  create: true
  commonAnnotations:
    kubernetes.io/ingress.class: nginx
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:eu-central-1:example...'
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
  host: flyte.dev.example

rbac:
  extraRules:
    - apiGroups:
        - ''
      resources:
        - pods
        - services
        - configmaps
      verbs:
        - '*'
    - apiGroups:
        - ''
      resources:
        - serviceaccounts
      verbs:
        - create
        - get
        - list
        - patch
        - update
    - apiGroups:
        - rbac.authorization.k8s.io
      resources:
        - rolebindings
        - roles
      verbs:
        - create
        - get
        - list
        - patch
        - update

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::example
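
An Ingress only receives an address once a controller matching its class is running in the cluster. The values above select the `nginx` class but also carry an `alb.ingress.kubernetes.io/...` annotation, which belongs to the AWS Load Balancer Controller. The fragment below is a sketch of an ALB-only variant, assuming that controller is installed. It reuses the chart keys already shown (`commonAnnotations`, `grpcAnnotations`, `host`) and leaves the truncated certificate ARN exactly as posted. Whether this resolves the missing address is an assumption, not a confirmed diagnosis.

```yaml
# Sketch only: assumes the AWS Load Balancer Controller is installed.
ingress:
  create: true
  commonAnnotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:eu-central-1:example...'
  grpcAnnotations:
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC
  host: flyte.dev.example
```

If the cluster runs ingress-nginx instead, the ALB annotation would be the one to drop, and the address would come from the nginx controller's Service.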

Nodes - Pods Tolerations

After getting a few things to work, it seems the nodes are tainted with flyte.org/node-role=worker, but the pods scheduled by Flyte do not carry a matching toleration. Any idea how to fix it?

[Screenshot: CleanShot 2023-09-12 at 11 50 29]
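
One way this is commonly handled is through the `default-tolerations` setting of Flyte's Kubernetes plugin, which adds tolerations to every task pod. The sketch below places it under the same `configuration.inline.plugins.k8s` block used in the Helm values from the earlier issue. The `NoSchedule` effect is an assumption, so it should be matched to the effect actually set on the node taint.

```yaml
# Sketch only: the toleration effect is assumed; match it to the node taint.
configuration:
  inline:
    plugins:
      k8s:
        default-tolerations:
          - key: flyte.org/node-role
            operator: Equal
            value: worker
            effect: NoSchedule
```

After a Helm upgrade, newly launched task pods should list the toleration in `kubectl describe pod`.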

CloudWatch not getting logs

Hey again :)

The logs are not reaching CloudWatch for some reason.

Logging section of the Helm values file:

logging:
    level: 4  # also tried 1
    plugins:
      cloudwatch:
        enabled: true
        templateUri: |-
          https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/eks/flyte-staging/cluster;stream=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log

CloudWatch can't find the log stream:

[Screenshot: CleanShot 2023-09-21 at 17 28 04]

Looking at the other log streams, it seems that most pods get log streams, but none of the workflow/task pods do.

Any idea what is misconfigured? (I also tried adding the CloudWatch full access policy to the role.)
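
Note that `templateUri` only builds the link Flyte shows in the console. A separate component in the cluster has to ship container logs to CloudWatch. One possible explanation, offered as an assumption rather than a diagnosis: if task pods run on nodes tainted with `flyte.org/node-role=worker`, a log-shipping DaemonSet without a matching toleration is never scheduled there, so those pods never get a stream. The fragment below sketches the toleration such a DaemonSet would need. The shipper (for example Fluent Bit) and the `NoSchedule` effect are assumptions.

```yaml
# Sketch only: fragment of a log-shipping DaemonSet's pod template.
spec:
  template:
    spec:
      tolerations:
        - key: flyte.org/node-role
          operator: Equal
          value: worker
          effect: NoSchedule   # assumed; match the taint's actual effect
```

It is also worth checking that the log group in `templateUri` matches the group the shipper actually writes to.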

Make GCP reference implementation permissions more fine-grained

The base assumption for this issue is that a reference implementation should follow the principle of least privilege, both to showcase a more secure deployment out of the box and to inform users who want or need to relax security controls about the minimum set of permissions Flyte requires.

The current GCP implementation is more permissive than necessary, specifically:

  1. When it creates the GCS bucket for metadata, it grants both GSAs (Google Service Accounts), flyte-worker and flyte-binary, the admin role.
  2. It does so by configuring a google_storage_bucket_iam_binding, an authoritative resource that leaves other members (GSAs in this case) unable to use the legacyBucketReader role for that bucket. This could be inconvenient if an organization has other tools that require access to metadata, especially in combination with ACLs.
  3. It's not clear why Flyte's GSAs should both be admins and also inherit the legacyBucketReader role.
  4. According to Google's recommendations, each Flyte service should have its own GSA and use a custom role with specific permissions.

Previous versions of the documentation and recent experiments by Flyte users indicate that it's possible to use a set of more granular permissions for Flyte services.

The working combination that implements the least privilege approach should be used.
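
As an illustration of what a custom role could look like, the sketch below uses the YAML role-definition format accepted by `gcloud iam roles create --file`. The file name, title, and above all the permission list are assumptions made for illustration. The issue does not state the minimal set, so the list would need to be validated against a working Flyte deployment.

```yaml
# flyte-storage-role.yaml (hypothetical file name)
# Sketch only: the permission list is an assumed starting point, not a verified minimum.
title: Flyte Storage Access
description: Object-level access for a Flyte service on its metadata bucket
stage: GA
includedPermissions:
  - storage.buckets.get
  - storage.objects.create
  - storage.objects.delete
  - storage.objects.get
  - storage.objects.list
```

On the Terraform side, granting such a role with the non-authoritative `google_storage_bucket_iam_member` resource, rather than `google_storage_bucket_iam_binding`, would leave other members' bindings on the bucket untouched.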
