[Feature] Optional operator-level policy to require explicit GPU resource requests #2080

@AkshatDudeja77

Motivation

In some cluster environments, administrators want to enforce a strict policy where workloads must explicitly request GPU resources (e.g. nvidia.com/gpu) in order to gain access to GPUs on a node.

Today, containers can access GPUs indirectly, without passing through device plugin allocation (for example, by setting an environment variable such as NVIDIA_VISIBLE_DEVICES, or through runtime configuration). This makes cluster-wide GPU usage policies difficult to enforce.

Scope

This request is intentionally scoped to the GPU Operator rather than the Kubernetes device plugin.

Since containers that do not explicitly request GPU resources do not pass through device plugin allocation, enforcement at the device-plugin level is not feasible. However, the GPU Operator is well-positioned to provide optional, cluster-level policy and validation mechanisms before GPU workloads are admitted or configured.

Proposed Behavior

Introduce an optional, opt-in policy at the GPU Operator level that enforces explicit GPU resource requests.

When enabled, the operator would reject or warn about workloads that gain access to GPUs without explicitly requesting nvidia.com/gpu, for example via environment variables or runtime configuration.

This behavior would be disabled by default and only activated when explicitly configured by the cluster administrator.
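
To make the proposed behavior concrete, here is a minimal sketch of the kind of check such a policy might run, for instance at admission time. It is illustrative only and not an existing GPU Operator API: the function names, the `Decision` type, and the `mode` values ("off", "warn", "deny") are hypothetical. It also assumes that the only indirect path being checked is a literal NVIDIA_VISIBLE_DEVICES value in the container spec, and that the values "void", "none", and empty mean "no GPU".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

GPU_RESOURCE = "nvidia.com/gpu"
# Environment variable read by the NVIDIA container runtime to select GPUs.
GPU_ENV_VAR = "NVIDIA_VISIBLE_DEVICES"
# Assumption: these values mean the container is not asking for any GPU.
NO_GPU_VALUES = {"", "void", "none"}


@dataclass
class Decision:
    """Hypothetical result type: admit or reject, plus human-readable findings."""
    allowed: bool
    warnings: list[str] = field(default_factory=list)


def requests_gpu(container: dict[str, Any]) -> bool:
    """True if the container explicitly requests a non-zero nvidia.com/gpu."""
    resources = container.get("resources") or {}
    for section in ("limits", "requests"):
        quantity = (resources.get(section) or {}).get(GPU_RESOURCE)
        if quantity is not None and str(quantity) not in ("", "0"):
            return True
    return False


def exposes_gpu_via_env(container: dict[str, Any]) -> bool:
    """True if a literal NVIDIA_VISIBLE_DEVICES value selects GPUs.

    Only inline `value` entries are inspected; `valueFrom` references are
    treated as empty in this sketch. The last matching entry wins.
    """
    exposed = False
    for env in container.get("env") or []:
        if env.get("name") == GPU_ENV_VAR:
            value = (env.get("value") or "").strip().lower()
            exposed = value not in NO_GPU_VALUES
    return exposed


def evaluate_pod(pod: dict[str, Any], mode: str = "off") -> Decision:
    """Apply the opt-in policy to a pod manifest given as a plain dict."""
    if mode not in ("off", "warn", "deny"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "off":  # disabled by default, as the proposal requires
        return Decision(allowed=True)

    spec = pod.get("spec") or {}
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    findings = [
        f"container {c.get('name')!r} sets {GPU_ENV_VAR} without requesting {GPU_RESOURCE}"
        for c in containers
        if exposes_gpu_via_env(c) and not requests_gpu(c)
    ]
    # "warn" reports findings but still admits; "deny" rejects on any finding.
    return Decision(allowed=(mode == "warn" or not findings), warnings=findings)


if __name__ == "__main__":
    pod = {
        "spec": {
            "containers": [
                {"name": "trainer", "env": [{"name": GPU_ENV_VAR, "value": "all"}]}
            ]
        }
    }
    print(evaluate_pod(pod, mode="deny"))
```

A check like this would not cover GPU access configured outside the pod spec (for example, an image that bakes in the environment variable), which is consistent with the non-goals listed below.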

Non-Goals

  • This proposal does not attempt to block or modify low-level GPU access mechanisms outside the scope of the GPU Operator (e.g., direct runtime configuration or container toolkit behavior).

  • This proposal does not change default GPU exposure behavior unless the feature is explicitly enabled by a cluster administrator.

  • This proposal does not introduce enforcement at the Kubernetes device plugin level.

Metadata

Labels: feature, lifecycle/frozen
