Add env variable to signal skip vfio-pci unbind #2079
+7
−0
Relevant PR: NVIDIA/k8s-driver-manager#146
Description
Pass the GPU_WORKLOAD_CONFIG environment variable to the k8s-driver-manager init container in the vfio-manager DaemonSet to prevent unnecessary GPU unbind/rebind operations during rolling updates.
Problem
During rolling updates of the vfio-manager DaemonSet, k8s-driver-manager unconditionally unbinds all GPUs from vfio-pci on startup. When the desired state is already vfio-pci binding, this causes unnecessary disruption to active VM workloads using GPU passthrough (KubeVirt, Kata Containers).
Design Rationale
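As a rough illustration of the intended behavior (not the actual k8s-driver-manager code), the skip decision could look like the Go sketch below. Only the GPU_WORKLOAD_CONFIG name comes from this PR; the helper shouldSkipVFIOUnbind and the literal value "vm-passthrough" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// workloadConfigEnv is the environment variable this PR passes to the init container.
const workloadConfigEnv = "GPU_WORKLOAD_CONFIG"

// vmPassthrough is an ASSUMED value for the VM passthrough workload config;
// the real string is whatever the operator and driver manager agree on.
const vmPassthrough = "vm-passthrough"

// shouldSkipVFIOUnbind is a hypothetical helper: it reports whether the
// desired state is already vfio-pci binding, in which case unbinding the
// GPUs on startup would only disrupt running VM workloads.
// getenv is injected so the decision can be exercised without touching
// the process environment.
func shouldSkipVFIOUnbind(getenv func(string) string) bool {
	return getenv(workloadConfigEnv) == vmPassthrough
}

func main() {
	if shouldSkipVFIOUnbind(os.Getenv) {
		fmt.Println("desired state is vfio-pci; skipping unbind")
		return
	}
	fmt.Println("unbinding GPUs from vfio-pci")
}
```

When the variable is unset or carries any other value, the sketch falls back to the existing unbind behavior, so older deployments would be unaffected.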
We know that vfio-manager only runs on vm-passthrough nodes: the DaemonSet's nodeSelector requires nvidia.com/gpu.deploy.vfio-manager: "true", which is only set for gpuWorkloadConfigVMPassthrough nodes. This is true regardless of whether the workload config comes from an explicit node label or sandboxWorkloads.defaultWorkload.
Checklist
(make lint)
(make validate-generated-assets)
(make validate-modules)