foersleo commented on September 21, 2024

Thanks for bringing this up @chiragjn.

As discussed in the linked issue, the GSP firmware can be disabled through the nvidia kernel module option NVreg_EnableGpuFirmware (see also the driver's GSP documentation at https://download.nvidia.com/XFree86/Linux-x86_64/535.161.07/README/gsp.html).

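On Bottlerocket, module options can be set as kernel command-line parameters through user data. A module option is written as <module>.<option>=<value>, hence nvidia.NVreg_EnableGpuFirmware. Combine this with the reboot-to-reconcile option, which makes the node reboot once more during boot when these parameters were changed through user data, so that they take effect. Both settings are described in the Bottlerocket boot settings documentation (https://bottlerocket.dev/en/os/1.19.x/api/settings/boot/).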

To disable GSP, set the following options in your user data:

[settings.boot]
reboot-to-reconcile = true
[settings.boot.kernel-parameters]
"nvidia.NVreg_EnableGpuFirmware" =
[
  "0"
]

I tested this in an A/B setup with the following eksctl config: one nodegroup boots the image unmodified (ng-bottlerocket-g4, "vanilla"), and the other applies the settings that disable the GSP firmware (ng-bottlerocket-g4-nogsp):

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: bottlerocket-nvidia
  region: us-west-2
  version: '1.28'

nodeGroups:
  - name: ng-bottlerocket-g4
    instanceType: g4dn.xlarge
    desiredCapacity: 1
    amiFamily: Bottlerocket
    ami: ami-0afc36986e4122bb4
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    ssh:
      allow: true
      publicKeyName: ec2_rsa
    bottlerocket:
      settings:
        motd: "Hello from eksctl!"
  - name: ng-bottlerocket-g4-nogsp
    instanceType: g4dn.xlarge
    desiredCapacity: 1
    amiFamily: Bottlerocket
    ami: ami-0afc36986e4122bb4
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    ssh:
      allow: true
      publicKeyName: ec2_rsa
    bottlerocket:
      settings:
        motd: "Hello from eksctl!"
        boot:
          reboot-to-reconcile: true
          kernel-parameters:
            nvidia.NVreg_EnableGpuFirmware:
              - "0"

On the g4dn.xlarge instance with GSP disabled (ng-bottlerocket-g4-nogsp):

bash-5.1# /usr/libexec/nvidia/tesla/bin/nvidia-smi -q | grep "GSP Firmware Version"
    GSP Firmware Version                  : N/A
bash-5.1# cat /proc/cmdline 
nvidia.NVreg_EnableGpuFirmware="0" [...]

On the vanilla g4dn.xlarge instance (ng-bottlerocket-g4), where GSP stays enabled:

bash-5.1# /usr/libexec/nvidia/tesla/bin/nvidia-smi -q | grep "GSP Firmware Version"
    GSP Firmware Version                  : 535.161.07
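When checking more than one node, the two outputs above can be reduced to a yes/no answer. A minimal Python sketch, assuming you have already captured the text of nvidia-smi -q and /proc/cmdline from each node (collecting it, for example over SSM or SSH, is left out); the helper names are hypothetical. It treats a GSP Firmware Version of N/A as disabled, matching what the GSP-disabled node reported above, and strips the double quotes that appear around the value in /proc/cmdline.

```python
import re

# Splits on whitespace but keeps double-quoted sections together,
# roughly how the kernel tokenizes its command line.
CMDLINE_TOKEN = re.compile(r'(?:[^\s"]+|"[^"]*")+')


def gsp_firmware_version(smi_query_output: str):
    """Return the "GSP Firmware Version" field of `nvidia-smi -q` output, or None."""
    for line in smi_query_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "GSP Firmware Version":
            return value.strip()
    return None


def gsp_disabled(smi_query_output: str) -> bool:
    # The GSP-disabled node reported "N/A"; an absent field (None) is
    # not treated as disabled.
    return gsp_firmware_version(smi_query_output) == "N/A"


def kernel_param(cmdline: str, name: str):
    """Return the value of name=... on a kernel command line, quotes removed, or None."""
    for token in CMDLINE_TOKEN.findall(cmdline):
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value.replace('"', "")
    return None


if __name__ == "__main__":
    # Text as captured from the two nodes in the A/B test.
    nogsp_smi = "    GSP Firmware Version                  : N/A"
    vanilla_smi = "    GSP Firmware Version                  : 535.161.07"
    nogsp_cmdline = 'nvidia.NVreg_EnableGpuFirmware="0"'

    print(gsp_disabled(nogsp_smi))    # True
    print(gsp_disabled(vanilla_smi))  # False
    print(kernel_param(nogsp_cmdline, "nvidia.NVreg_EnableGpuFirmware"))  # 0
```

Checking both sources is useful: the command line confirms the setting was applied, and nvidia-smi confirms the driver actually honored it.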

Would this fix your issue or is there anything extra that you would need from Bottlerocket?

chiragjn commented on September 21, 2024

Oh, amazing, I didn't know about this!
I'll try this out with Karpenter user data and report back by tomorrow.

chiragjn commented on September 21, 2024

This works as expected! Thanks again :)
