@karthikvetrivel commented Jan 29, 2026

Relevant PR: NVIDIA/k8s-driver-manager#146

Description

Pass the GPU_WORKLOAD_CONFIG environment variable to the k8s-driver-manager init container in the vfio-manager DaemonSet so that it can skip unnecessary GPU unbind/rebind operations during rolling updates.
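
The DaemonSet-side change can be pictured as setting one env var on one init container. The sketch below is a minimal, self-contained illustration under stated assumptions: EnvVar and Container are simplified, hypothetical stand-ins for the Kubernetes core/v1 types of the same name (so it runs without external modules), and the init container name "k8s-driver-manager" and the value "vm-passthrough" are assumed for illustration rather than taken from the operator's code.

```go
package main

import "fmt"

// EnvVar and Container are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types, so this sketch runs without external modules.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// setEnv sets name=value on the container, overwriting an existing entry
// with the same name instead of appending a duplicate.
func setEnv(c *Container, name, value string) {
	for i := range c.Env {
		if c.Env[i].Name == name {
			c.Env[i].Value = value
			return
		}
	}
	c.Env = append(c.Env, EnvVar{Name: name, Value: value})
}

// injectWorkloadConfig finds the driver-manager init container by name
// (the name "k8s-driver-manager" is an assumption) and sets
// GPU_WORKLOAD_CONFIG on it. It reports whether the container was found.
func injectWorkloadConfig(initContainers []Container, workloadConfig string) bool {
	for i := range initContainers {
		if initContainers[i].Name == "k8s-driver-manager" {
			setEnv(&initContainers[i], "GPU_WORKLOAD_CONFIG", workloadConfig)
			return true
		}
	}
	return false
}

func main() {
	inits := []Container{{Name: "k8s-driver-manager"}}
	injectWorkloadConfig(inits, "vm-passthrough")
	fmt.Println(inits[0].Env) // [{GPU_WORKLOAD_CONFIG vm-passthrough}]
}
```

Overwriting rather than appending keeps the init container spec stable across reconciles, so repeated reconciliation does not accumulate duplicate env entries.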

Problem

During rolling updates of the vfio-manager DaemonSet, k8s-driver-manager unconditionally unbinds all GPUs from vfio-pci on startup. When the GPUs are already bound to vfio-pci, which is the desired state on these nodes, this needlessly disrupts active VM workloads that use GPU passthrough (KubeVirt, Kata Containers).
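
The driver-manager side lives in NVIDIA/k8s-driver-manager#146; the following is only a hypothetical sketch of the kind of check GPU_WORKLOAD_CONFIG makes possible, not that PR's code. The function names, the map of PCI address to current driver, and the example PCI addresses are all assumptions for illustration. An empty GPU_WORKLOAD_CONFIG keeps the old unconditional behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// shouldUnbindFromVFIO (hypothetical) decides whether one GPU should be
// unbound from vfio-pci at startup. workloadConfig is the value of
// GPU_WORKLOAD_CONFIG (empty if not passed); currentDriver is the driver
// the GPU is bound to right now.
func shouldUnbindFromVFIO(workloadConfig, currentDriver string) bool {
	if currentDriver != "vfio-pci" {
		return false // not bound to vfio-pci, so there is nothing to unbind
	}
	// Desired state already matches the current binding: leave it alone so
	// VMs using the GPU through passthrough keep running. Any other value,
	// including an empty one, falls back to unbinding.
	return workloadConfig != "vm-passthrough"
}

// gpusToUnbind (hypothetical) returns, in sorted order, the PCI addresses
// that should be unbound. bindings maps a PCI address to its current driver.
func gpusToUnbind(bindings map[string]string, workloadConfig string) []string {
	var out []string
	for addr, drv := range bindings {
		if shouldUnbindFromVFIO(workloadConfig, drv) {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical PCI addresses, for illustration only.
	bindings := map[string]string{
		"0000:3b:00.0": "vfio-pci",
		"0000:86:00.0": "vfio-pci",
	}
	fmt.Println(gpusToUnbind(bindings, ""))               // [0000:3b:00.0 0000:86:00.0]
	fmt.Println(gpusToUnbind(bindings, "vm-passthrough")) // []
}
```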

Design Rationale

vfio-manager runs only on vm-passthrough nodes. The DaemonSet's nodeSelector requires nvidia.com/gpu.deploy.vfio-manager: "true", and that label is set only on nodes whose workload config is gpuWorkloadConfigVMPassthrough. This holds whether the workload config comes from an explicit node label or from sandboxWorkloads.defaultWorkload, so every vfio-manager pod runs on a node whose desired GPU binding is vfio-pci.
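
The invariant above can be checked exhaustively in a small model. This is a sketch under assumptions: the per-node label key nvidia.com/gpu.workload.config and the config values "container", "vm-passthrough", and "vm-vgpu" are used for illustration, and resolveWorkloadConfig/deployLabels are hypothetical helpers modeling "explicit node label, else sandboxWorkloads.defaultWorkload". It checks that whenever the vfio-manager nodeSelector matches, the resolved config is vm-passthrough.

```go
package main

import "fmt"

// Label keys and values used for illustration; treat them as assumptions.
const (
	workloadConfigLabel = "nvidia.com/gpu.workload.config"
	vfioManagerLabel    = "nvidia.com/gpu.deploy.vfio-manager"
	vmPassthrough       = "vm-passthrough"
)

// resolveWorkloadConfig prefers an explicit per-node label and falls back to
// the cluster default (sandboxWorkloads.defaultWorkload).
func resolveWorkloadConfig(nodeLabels map[string]string, defaultWorkload string) string {
	if v := nodeLabels[workloadConfigLabel]; v != "" {
		return v
	}
	return defaultWorkload
}

// deployLabels returns the vfio-manager deploy label for a node: "true" only
// when the resolved config is vm-passthrough.
func deployLabels(nodeLabels map[string]string, defaultWorkload string) map[string]string {
	enabled := resolveWorkloadConfig(nodeLabels, defaultWorkload) == vmPassthrough
	return map[string]string{vfioManagerLabel: fmt.Sprint(enabled)}
}

// matchesSelector reports whether labels satisfy an equality-based nodeSelector.
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{vfioManagerLabel: "true"}
	configs := []string{"", "container", "vm-passthrough", "vm-vgpu"}
	holds := true
	// Try every explicit label (or none) against every cluster default.
	for _, explicit := range configs {
		for _, def := range configs[1:] {
			node := map[string]string{}
			if explicit != "" {
				node[workloadConfigLabel] = explicit
			}
			if matchesSelector(deployLabels(node, def), selector) &&
				resolveWorkloadConfig(node, def) != vmPassthrough {
				holds = false
			}
		}
	}
	fmt.Println("invariant holds:", holds) // invariant holds: true
}
```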

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint)
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules)
  • Test cases are added for new code paths

Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>